Apache Tika 1.6

An open source toolkit for parsing, analyzing, and extracting metadata and content from files, with support for a broad range of file types.
Apache Tika was developed as a low-level toolkit for searching the content inside files.

Because it is only a library, Tika does little on its own, but it can be integrated into larger tools such as search engines, digital asset management systems, or CMSs to provide fully functional in-file search.

The library can read just a file's header to obtain quick, general information about the file, or it can parse the file's body and search it for various types of data in text or binary form.
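The header-only path can be illustrated with a minimal sketch of signature ("magic number") sniffing in plain Java. This is not Tika's implementation, and the class name `HeaderSniffer` and its methods are hypothetical names chosen for illustration. It only shows why a few leading bytes are often enough for a quick type guess, and why container formats still need a deeper look.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Illustrative header sniffer (hypothetical class, not part of Tika). */
final class HeaderSniffer {

    // Well-known leading byte signatures of a few common formats.
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] GIF = "GIF8".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    /** Guesses a media type from the first bytes of a file. */
    static String detect(byte[] header) {
        if (startsWith(header, PDF)) return "application/pdf";
        if (startsWith(header, PNG)) return "image/png";
        if (startsWith(header, GIF)) return "image/gif";
        // Many document formats (OpenDocument, EPUB, newer Office files) are
        // ZIP containers, so the header alone cannot tell them apart.
        if (startsWith(header, ZIP)) return "application/zip";
        return "application/octet-stream"; // unknown binary data
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeaderSniffer <file>");
            return;
        }
        byte[] buf = new byte[8];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // A single read may return fewer than 8 bytes; acceptable for a sketch.
            int n = Math.max(in.read(buf), 0);
            System.out.println(detect(Arrays.copyOf(buf, n)));
        }
    }
}
```

The ZIP case shows where header-only inspection stops being sufficient: to distinguish the document types packaged inside such a container, the file's body has to be examined as well.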

A wide range of file types is supported, and Tika can also be used from other programming languages through third-party bindings and wrappers.

Main features:

Supported formats:
  • Audio formats
  • CAD formats
  • Compression and packaging formats
  • Crypto formats
  • Electronic Publication Format
  • Executable programs and libraries
  • Feed and Syndication formats
  • Font formats
  • Help formats
  • HyperText Markup Language
  • Image formats
  • iWork document formats
  • Java class files and archives
  • Mail formats
  • Microsoft Office document formats
  • OpenDocument Format
  • Portable Document Format
  • Rich Text Format
  • Scientific formats
  • Source code
  • Text formats
  • Video formats
  • XML and derived formats

Last updated on: September 9th, 2014, 20:24 GMT
License type: Apache License
Developed by: Apache Software Foundation
Operating system(s): Windows / Linux / Mac OS / BSD / Solaris

What's New in This Release:
  • This release brings bug fixes and new features, including a new Translation API, support for more formats, and overall improvements in Tika's stability.