· Renamed HTMLParseFilter to ParseFilter.
· Remove remaining robots/IP blocking code in lib-http.
· Port logging to slf4j.
· External parser supports encoding attribute.
· Ivy configuration settings don't include Gora.
· Injector should add the metadata before calling injectedScore.
· Port Nutch benchmark to Nutchbase.
· Add parse-html back.
· MoreIndexingFilter missing date format.
· Timeout for Parser.
· Retry interval in crawl date is set to 0.
· Generate log output for solr indexer and dedup.
· Improved NutchConfiguration.
· SolrDeleteDuplicates needs to clone the SolrRecord objects.
· Native hadoop libs not available through maven.
· Separate the build and runtime environments.
· This release includes several improvements: upgrades of major components (including Tika 1.1 and Hadoop 1.0.0), improvements to the LinkRank and WebGraph elements, and a number of new plugins covering blacklisting, filtering, and parsing, to name a few.
· Added Solr 4x (trunk) example schema.
· Added '/runtime' to svn ignore.
· The application/xhtml+xml MIME type should be enabled in plugin.xml of parse-html; allow multiple MIME types in plugin.xml.
· Fixed parse-tika and parse-html to use relative URL resolution per RFC 3986.
· Upgraded to Tika 0.10. NOTE: Tika's new RTF parser may ignore more text in malformed documents than previously - see TIKA-748 for details.
· Added Sonar targets to Ant build.xml.
· Upgraded SolrJ to version 3.4.0.
· Ant pmd target is broken.
· Upgraded Solr schema to version 1.4.
· This release includes several improvements: improved RSS parsing support, tighter integration with Apache Tika, external parsing support, improved language identification, and a source release tarball that is an order of magnitude smaller (only about 2MB).
· Make index-more plugin configurable.
· Configurable file protocol parent directory crawling.
· Timeout for Parser.
· Website is still Lucene branded.
· Retry interval in crawl date is set to 0.
· Allow parsers to return multiple Parse objects.
· Removed redundant commons-logging jar from ontology plugin.
· Bug in SegmentReader causes infinite loop.
· Scoring filter should distribute score to all outlinks at once.
· Reduce number of warnings in nutch core.