webbot description
webbot - A Java-based "web browser" that extracts all links from a web page and displays them. WEBBOT supports downloading and Perl regular expressions: once the links have been listed, the documents they point to can be downloaded.
WEBBOT(URL) lists all links in the file at URL. URL is a string giving the base page address; it must point to an HTML file. URL can also be a cell vector of URL strings.
WEBBOT(URL, WHAT) displays only specific links. WHAT is a string:
    'all_links':      displays all links (default).
    'page_links':     displays all links to an HTML web page*.
    'local_links':    displays all local links on the server*.
    'external_links': displays all links to external websites.
    'image_links':    displays all links to an image file**.
    'image_tags':     displays all image tags.
    '.xxx.yyyy.zz':   displays all links to files with any of the listed extensions; case is ignored ('zip' will find 'ZiP'); e.g. '.zip.gz.gzip.tar.Z'.
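The extension form of WHAT can be pictured as a case-insensitive match on the end of each link. The following is only a sketch of that idea, not WEBBOT's actual code: the link list is hypothetical, and the named regexp options assume a MATLAB release newer than R13.

```matlab
% Hypothetical links, used only to illustrate the filter.
links = {'http://www.example.com/a.ZIP', ...
         'http://www.example.com/b.html', ...
         'http://www.example.com/c.tar', ...
         'http://www.example.com/d.gif'};

what = '.zip.gz.gzip.tar.Z';           % same format as the WHAT example

% Split '.zip.gz.gzip.tar.Z' into {'zip','gz','gzip','tar','Z'}.
exts = regexp(what, '[^.]+', 'match');

% Build an alternation such as '\.(zip|gz|gzip|tar|Z)$'.
alt = sprintf('%s|', exts{:});
alt(end) = [];                         % drop the trailing '|'
pattern = ['\.(' alt ')$'];

% regexpi ignores case, so 'zip' also matches 'ZIP' or 'ZiP'.
hits = regexpi(links, pattern, 'start', 'once');
selected = links(~cellfun('isempty', hits));

disp(selected');                       % keeps a.ZIP and c.tar
```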
WEBBOT(URL, WHAT, ACT) performs an action on the links found. ACT is a string:
    'noaction': just displays the links (default).
    'download': downloads all links found to the local machine.
    'cartoons': downloads all image tags found on linked pages. This is useful for cartoon websites where each cartoon (e.g. "01.gif") is on its own HTML page (e.g. "c01.html").
    'follow.x': follows links to HTML pages and recursively performs the same action on each resulting page. 'x' is an integer giving the recursion depth (0 is equivalent to 'noaction').
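To make the depth parameter of 'follow.x' concrete, here is a minimal sketch of depth-limited link following. It is an illustration under stated assumptions, not WEBBOT's implementation: the "site" is a hypothetical in-memory map from page names to their links (no network access), and the traversal is written as a level-by-level loop instead of true recursion.

```matlab
% Hypothetical in-memory "site": page name -> links found on that page.
% Pages without an entry are treated as having no links.
site = containers.Map();
site('index.html') = {'a.html', 'b.html'};
site('a.html')     = {'c.html'};
site('c.html')     = {'d.html'};

depth   = 1;                  % the 'x' in 'follow.x'
current = {'index.html'};     % pages to process at the current level
visited = {};                 % pages already processed
found   = {};                 % every link collected so far

for level = 0:depth
    next = {};
    for k = 1:numel(current)
        page = current{k};
        if any(strcmp(page, visited))
            continue;         % never process the same page twice
        end
        visited{end+1} = page;
        if isKey(site, page)
            lks = site(page);
        else
            lks = {};
        end
        found = [found, lks]; % record the links of this page
        next  = [next, lks];  % and queue them for the next level
    end
    current = next;
end

disp(found');
```

With depth = 0 only the links of index.html are collected, which mirrors the statement that 0 is equivalent to 'noaction'; with depth = 1 the links of a.html and b.html are collected as well.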
lks = WEBBOT(URL, ...) returns a cell array containing the links of URL{end}, i.e. of the last URL when URL is a cell vector.
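The calling forms above might be used as follows. This is a hedged usage sketch: the address is a placeholder, the lowercase function name assumes the file is installed as webbot.m on the MATLAB path, and combining 'page_links' with 'follow.2' is an assumed but plausible pairing of the documented options.

```matlab
% Placeholder base page; substitute the address of a real HTML page.
url = 'http://www.example.com/index.html';

% List every link on the page (WHAT defaults to 'all_links').
webbot(url);

% List only the links that point to image files.
webbot(url, 'image_links');

% Download every .zip and .gz file linked from the page.
webbot(url, '.zip.gz', 'download');

% Follow HTML links two levels deep and keep the result as a cell array.
lks = webbot(url, 'page_links', 'follow.2');
```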
Notes:
    *  Links explicitly pointing to a .htm or .html URL.
    ** Image links are recognized by the following file types: .jpg .jpeg .gif .pict .bmp .tif .tiff .ras .png (.giff)
Requirements:
· MATLAB Release: R13