Apache Tika 1.6

An open source toolkit, licensed under the Apache License 2.0, for extracting information from digital documents.
Tika allows search engines, content management systems and other web applications that work with various digital documents to easily detect, access and extract metadata and content from major file formats.

Main features:

  • Supported formats:
      • HyperText Markup Language
      • XML and derived formats
      • Microsoft Office document formats
      • OpenDocument Format
      • Portable Document Format
      • Electronic Publication Format
      • Rich Text Format
      • Compression and packaging formats
      • Text formats
      • Audio formats
      • Image formats
      • Video formats
      • Java class files and archives
      • The mbox format
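
As a rough illustration of the "detect" step for a few of the formats listed above, the sketch below matches the leading bytes of a file against a small table of well-known signatures ("magic numbers"). It uses only the Java standard library and is not Tika's implementation or API: the class name MagicSniffer, the sniff method and the four-entry table are hypothetical names chosen for this example, and Tika's own detection is far more extensive.

```java
import java.nio.charset.StandardCharsets;

/** Toy signature-based type detector. Hypothetical example, NOT part of Tika. */
final class MagicSniffer {

    /** Pairs a leading byte signature with the media type it indicates. */
    private static final class Signature {
        final byte[] magic;
        final String type;

        Signature(byte[] magic, String type) {
            this.magic = magic;
            this.type = type;
        }
    }

    // A deliberately tiny table; real detectors know many more signatures.
    private static final Signature[] SIGNATURES = {
        new Signature("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf"),
        new Signature("{\\rtf".getBytes(StandardCharsets.US_ASCII), "application/rtf"),
        new Signature(new byte[] {'P', 'K', 3, 4}, "application/zip"),
        new Signature(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE},
                      "application/java-vm"),
    };

    /** Returns the media type of the first matching signature, or a generic fallback. */
    static String sniff(byte[] head) {
        for (Signature s : SIGNATURES) {
            if (startsWith(head, s.magic)) {
                return s.type;
            }
        }
        return "application/octet-stream";
    }

    /** True if data begins with every byte of prefix, in order. */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(sample)); // prints application/pdf
    }
}
```

Because OpenDocument, EPUB and JAR files are all ZIP containers, a signature check like this one reports each of them as application/zip; telling them apart requires looking inside the archive.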

Last updated on: September 9th, 2014, 20:24 GMT
Developed by: Apache Software Foundation
License type: Apache License
Operating system(s): Windows / Linux / Mac OS / BSD / Solaris

What's New in version 1.5
  • Fixed bug in handling of embedded file processing in PDFs.
  • Added SourceCodeParser to support Java, Groovy and C++ source files.
  • Updated Tika Server to support multipart/form-data payloads.
  • Updated Tika Server to CXF 2.7.8.