Apache Tika 1.8

An open source toolkit for parsing, analyzing, and extracting metadata and content from files, with support for a broad range of file types

What's new in Apache Tika 1.7:

  • This release includes bug fixes and new features, including a new Tesseract OCR parser, a new GDAL parser, more supported formats, and overall improvements in Tika's stability.
LICENSE TYPE: Apache License
USER RATING: 5.0/5 (1)
DEVELOPED BY: Apache Software Foundation
HOMEPAGE: tika.apache.org
LANGUAGE: Java
CATEGORY: C: \ Development Tools \ Other Libraries

Apache Tika was developed as a low-level toolkit for searching content inside other files.

Being a simple library, Tika doesn't do much on its own, but it can be integrated into more powerful tools such as search engines, digital asset management systems, or CMSs to provide fully functional in-file search.

The library can read just a file's header to report overall file information quickly, or it can parse the file's body to find various types of data, in text or binary format.
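
To make the header-only mode concrete, here is a minimal sketch of signature-based type detection written with the Java standard library alone. It is not Tika's code and does not use Tika's API: the class name HeaderSniffer and its methods are hypothetical, and the four signatures were chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Apache Tika.
final class HeaderSniffer {
    // A few well-known file signatures ("magic bytes").
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};
    private static final byte[] GIF = {'G', 'I', 'F', '8'};

    // Maps the leading bytes of a file to a media type, or a generic fallback.
    static String detect(byte[] header) {
        if (startsWith(header, PDF)) return "application/pdf";
        if (startsWith(header, PNG)) return "image/png";
        if (startsWith(header, ZIP)) return "application/zip";
        if (startsWith(header, GIF)) return "image/gif";
        return "application/octet-stream";
    }

    // Reads at most the first 8 bytes of the file; the body is never touched.
    static String detect(Path file) throws IOException {
        byte[] buffer = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < buffer.length) {
                int n = in.read(buffer, total, buffer.length - total);
                if (n < 0) break; // end of file reached early
                total += n;
            }
        }
        return detect(Arrays.copyOf(buffer, total));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeaderSniffer <file>...");
            return;
        }
        for (String arg : args) {
            System.out.println(arg + ": " + detect(Paths.get(arg)));
        }
    }
}
```

Because only a handful of bytes are read, a check like this stays fast even on very large files, which is the appeal of header-only inspection. A real toolkit such as Tika recognizes far more formats and can go on to parse the body for text and metadata.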

A wide range of file types is supported, and Tika can also be used from other programming languages thanks to a series of third-party bindings and wrappers.

Last updated on April 21st, 2015

Runs on: Windows / Linux / Mac OS / BSD / Solaris

#metadata extraction #content analysis #content parsing #metadata #extract #analysis #meta