Scrapy 0.24.4

A lightweight Web crawling framework written in Python for screen scraping and extracting data from the Web.
Scrapy is written entirely in Python and can be used for anything from simple data mining to page monitoring, Web search engines, and even code testing.

Scrapy is not a search engine in the true sense of the word, but it acts like one (without the indexing part). Nevertheless, Scrapy can be a great tool on which to build your search engine logic.

The true power of this framework lies in the versatility of its core: Scrapy is a system on which to build generic or dedicated search spiders (crawlers).

While this might sound very complicated to non-technical users, a quick look over the documentation and available tutorials shows how Scrapy has taken the hard work out of crawling and reduced the entire process to just a few lines of code (for simpler, smaller crawlers).

Main features:

  • Works with HTML and XML
  • Support for HTTP authentication
  • Detects robots.txt files
  • Detects and crawls sitemaps
  • Crawl depth restrictions
  • Fast turnaround
  • Built-in caching DNS resolver
  • Logging
  • Extendable via plugins
  • Portable codebase
  • Cross-platform tested
  • Tested code
  • Documentation
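
Several of the features listed above are usually switched on or tuned through a project's settings module. The fragment below is only a sketch: the setting names are those documented by Scrapy, while the bot name and every value are illustrative assumptions.

```python
# settings.py (sketch): plain module-level assignments read by Scrapy.
BOT_NAME = "example_bot"   # hypothetical project name

ROBOTSTXT_OBEY = True      # respect robots.txt rules found on crawled sites
DEPTH_LIMIT = 2            # follow links at most two hops from the start URLs (0 means no limit)
DNSCACHE_ENABLED = True    # keep the in-memory DNS cache switched on
LOG_LEVEL = "INFO"         # logging verbosity
```

Keeping these as plain assignments means crawl behavior can be changed without touching the spider code.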

Last updated on: August 20th, 2014, 15:23 GMT
License type: BSD License
Developed by: Pablo Hoffman
Operating system(s): Windows / Linux / Mac OS / BSD / Solaris

user rating 3



What's New in version 0.24.0
  • Add UTF8 encoding header to templates
  • Telnet console now binds to by default
  • Update debian/ubuntu install instructions
  • Disable smart strings in lxml XPath evaluations