It builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.
What's New in This Release:
· Added Solr 4.x (trunk) example schema.
· Added '/runtime' to svn ignore.
· Enabled application/xhtml+xml in plugin.xml of parse-html; plugin.xml now allows multiple mimetypes.
· Fixed parse-tika and parse-html to use relative URL resolution per RFC-3986.
· Upgraded to Tika 0.10. NOTE: Tika's new RTF parser may ignore more text in malformed documents than before; see TIKA-748 for details.
· Added Sonar targets to Ant build.xml.
· Upgraded SolrJ to version 3.4.0.
· Fixed the broken Ant pmd target.
· Upgraded Solr schema to version 1.4.
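
For context on the relative URL resolution item above, here is a minimal sketch of what RFC 3986 reference resolution produces. It is not the project's own code: it uses Python's standard `urllib.parse.urljoin` as a stand-in, and the base URL and references are the sample values from RFC 3986, section 5.4. The helper name `resolve_all` is hypothetical and exists only for this illustration.

```python
from urllib.parse import urljoin

# Base URL used by the examples in RFC 3986, section 5.4.
BASE = "http://a/b/c/d;p?q"


def resolve_all(base, references):
    """Resolve each relative reference against base.

    Returns a dict mapping each reference to its absolute URL.
    (Hypothetical helper for illustration only.)
    """
    return {ref: urljoin(base, ref) for ref in references}


if __name__ == "__main__":
    # Print each relative reference next to its resolved absolute URL.
    for ref, absolute in resolve_all(BASE, ["g", "../g", "?y", "#s"]).items():
        print(f"{ref!r:8} -> {absolute}")
```

Under RFC 3986, a query-only reference such as `?y` keeps the base path and replaces only the query component, and a fragment-only reference such as `#s` keeps both the base path and the base query.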