Written in PHP, it can differentiate between normal navigation links and static data such as files, image sources and other multimedia files usually found embedded in a web page.
A Python version is also available.
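The distinction between navigation links and static data can be illustrated with a minimal Python sketch. This is not the crawler's actual code: the class name LinkClassifier, the tag list and the extension list are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Hypothetical mapping: tags that embed static resources and the
# attribute that carries their URL.
STATIC_TAGS = {"img": "src", "script": "src", "video": "src",
               "audio": "src", "source": "src", "embed": "src"}

# Hypothetical list of file extensions treated as static data.
STATIC_EXTENSIONS = (".pdf", ".zip", ".jpg", ".png", ".gif", ".mp3", ".mp4")


class LinkClassifier(HTMLParser):
    """Collects URLs from a page, split into navigation and static lists."""

    def __init__(self):
        super().__init__()
        self.navigation = []
        self.static = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            href = attrs["href"]
            # Ignore query string and fragment when checking the extension.
            path = href.split("?", 1)[0].split("#", 1)[0].lower()
            if path.endswith(STATIC_EXTENSIONS):
                self.static.append(href)
            else:
                self.navigation.append(href)
        elif tag in STATIC_TAGS and attrs.get(STATIC_TAGS[tag]):
            self.static.append(attrs[STATIC_TAGS[tag]])


def classify(html):
    """Return (navigation_links, static_resources) found in the HTML."""
    parser = LinkClassifier()
    parser.feed(html)
    return parser.navigation, parser.static
```

Calling classify() on a page's HTML returns two lists, so a crawler built this way could follow only the first list and record the second.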
Here are some key features of "PHP Web Crawler":
· The crawler can run in multiple instances
· Can run as a cron job
· All crawls are saved in a MySQL database
· It generates the table “urls” to store the crawls
· For each saved URL, it also stores the source URL, destination URL and anchor text
· Validates the URLs via a regular expression
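The storage and validation features above can be sketched roughly as follows. Everything here except the table name “urls” is an assumption: the column names, the regular expression and the function names are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs without a server.

```python
import re
import sqlite3

# Hypothetical pattern: accepts http(s) URLs with a host, an optional
# port and an optional path. The crawler's real expression is not shown
# in this listing.
URL_PATTERN = re.compile(r"^https?://[A-Za-z0-9.-]+(?::\d+)?(?:/\S*)?$")


def is_valid_url(url):
    """Return True if the URL matches the illustrative pattern."""
    return URL_PATTERN.match(url) is not None


def open_store():
    """Create the "urls" table (column names are assumed)."""
    # SQLite in memory is a stand-in for the MySQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " id INTEGER PRIMARY KEY,"
        " source_url TEXT NOT NULL,"
        " destination_url TEXT NOT NULL,"
        " anchor_text TEXT)"
    )
    return conn


def save_link(conn, source_url, destination_url, anchor_text):
    """Store one crawled link; skip it if the destination is invalid."""
    if not is_valid_url(destination_url):
        return False
    conn.execute(
        "INSERT INTO urls (source_url, destination_url, anchor_text)"
        " VALUES (?, ?, ?)",
        (source_url, destination_url, anchor_text),
    )
    conn.commit()
    return True
```

For example, save_link(conn, "http://example.com/", "http://example.com/about", "About") stores one row, while a destination such as "javascript:void(0)" is rejected by the pattern and not saved.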
Requirements:
· PHP