clipped from: stackoverflow.com   

How to implement a web scraper in PHP?

There is a book on this topic: "Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL".

PHP-Architect also covered it in a well-written article by Matthew Turland in the December 2007 issue.

Scraping generally involves three steps:


  • First, you send a GET or POST request to a specified URL.

  • Next, you receive the HTML that is returned as the response.

  • Finally, you parse out of that HTML the text you'd like to scrape.

My favorite program for working with regexes is RegexBuddy. I would advise you to try the demo of that product even if you have no intention of buying it. It is an invaluable tool, and it will even generate code for your regexes in your language of choice (including PHP).
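
The three steps above can be sketched roughly as follows. This is only a minimal illustration, assuming the cURL extension is enabled; the helper names `fetch_html` and `extract_links` are hypothetical, and the regex is deliberately simple (it only handles quoted `href` attributes), so treat it as a starting point rather than a robust parser.

```php
<?php
// Steps 1 and 2: send a GET request and return the HTML of the response.
// fetch_html() is a hypothetical helper name, not part of any library.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
    ]);

    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }

    return $html;
}

// Step 3: parse the wanted text out of the HTML with a regular expression.
// extract_links() is also hypothetical; it collects the href value of each <a> tag.
function extract_links(string $html): array
{
    preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $matches);

    return $matches[1]; // capture group 1 holds the URLs
}
```

Calling `extract_links(fetch_html($url))` with a URL of your choice would then return an array of the link targets found on that page. For a POST request, `CURLOPT_POST` and `CURLOPT_POSTFIELDS` could be added to the options array.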


    Usage:


    PHP Class: