Tuesday, June 7, 2011

Web scraping with an easy example

Web scraping (also called Web harvesting or Web data extraction) is a software technique for automatically extracting information from websites.
With a PHP script that downloads a page and applies regular expressions to its HTML, we can collect content from other websites automatically, without visiting them by hand or needing any special access to them.
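To show the core idea before touching a live site, here is a minimal sketch that runs `preg_match_all()` on a small, made-up HTML string (the markup and URLs are invented for illustration). Group 1 of the pattern captures each link's `href` and group 2 its text; `PREG_SET_ORDER` keeps the captures of each match together, so `$matches[0]` is the first link, `$matches[1]` the second, and so on.

```php
<?php
// A small, made-up HTML fragment standing in for a downloaded page.
$html = '<ul>
  <li><a href="/news/1" >First story</a></li>
  <li><a href="/news/2" >Second story</a></li>
</ul>';

// Group 1 captures the href value, group 2 the link text.
// PREG_SET_ORDER groups each match's captures together.
preg_match_all('/<a href="(.*?)" >(.*?)<\/a>/s', $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], " => ", $m[2], "\n";
}
// Prints:
// /news/1 => First story
// /news/2 => Second story
```

The lazy `(.*?)` groups stop at the first closing quote or `</a>`, which is why each link is captured separately instead of one greedy match spanning both.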

Here is one example:

Suppose we need to collect information from http://www.bdnews24.com/bangla/.

In particular, suppose we need the top (latest) news item with its image, and want to show it on our own website:
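<?php
// Download the page (needs allow_url_fopen enabled in php.ini).
$html = file_get_contents("http://www.bdnews24.com/bangla/");
if ($html === false) {
    die("Could not download the page.\n");
}

// Group 1: link URL; group 2: heading text of the first <a> in <div id="thdivbox">.
// The /s modifier lets . match newlines too.
preg_match_all(
    '/<div id="thdivbox">.*?<a href="(.*?)" >(.*?)<\/a>.*?<\/div>/s',
    $html,
    $posts,
    PREG_SET_ORDER
);

$link    = $posts[0][1];
$heading = $posts[0][2];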

Now you can display or store the heading text together with its target URL.
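Because the live page's markup may differ from what the pattern expects (and has very likely changed since this was written), the sketch below runs the same `thdivbox` pattern against a hypothetical HTML snippet that only imitates the assumed structure. It also checks that something matched before reading `$posts[0]`, and escapes the scraped values with `htmlspecialchars()` before echoing them into your own page.

```php
<?php
// Hypothetical markup imitating the structure the pattern expects;
// the real page's HTML may differ.
$html = '<div id="thdivbox"><img src="/img/top.jpg">
<a href="/bangla/details.php?id=1" >Top headline</a>
</div>';

preg_match_all(
    '/<div id="thdivbox">.*?<a href="(.*?)" >(.*?)<\/a>.*?<\/div>/s',
    $html,
    $posts,
    PREG_SET_ORDER
);

if (count($posts) === 0) {
    exit("Pattern did not match; the page layout may have changed.\n");
}

$link    = $posts[0][1];  // first match, group 1: the URL
$heading = $posts[0][2];  // first match, group 2: the anchor text

// Escape scraped text before echoing it into your own page.
printf('<a href="%s">%s</a>' . "\n",
    htmlspecialchars($link, ENT_QUOTES),
    htmlspecialchars($heading, ENT_QUOTES));
// Prints: <a href="/bangla/details.php?id=1">Top headline</a>
```

The `count()` check matters because regex scraping fails silently: if the site renames `thdivbox`, `$posts` is simply empty and `$posts[0]` would be undefined.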

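Similarly, to get the description text of the news item:

<?php
// If this runs in the same script as above, you can reuse $html instead.
$html = file_get_contents("http://www.bdnews24.com/bangla/");
if ($html === false) {
    die("Could not download the page.\n");
}

// Group 2 captures the anchor text inside <div align="left" >.
preg_match_all(
    '/<div align="left" >.*?<a href="(.*?)" >(.*?)<\/a>.*?<\/div>/s',
    $html,
    $posts,
    PREG_SET_ORDER
);

$text = $posts[0][2];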

Now you can display or store the description text.
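Regular expressions break easily when attribute order, spacing, or quoting in the HTML changes. As an alternative (a different technique from the regex approach above), PHP's built-in DOM extension can parse the page and select the same element with an XPath query. This is a minimal sketch on hypothetical markup; the `//div[@align="left"]//a` selector simply mirrors the regex above. For UTF-8 pages such as Bangla text, `loadHTML()` may need an explicit charset declaration in the markup to decode characters correctly.

```php
<?php
// Hypothetical markup imitating the structure the regex expects.
$html = '<html><body>
<div align="left" ><a href="/bangla/details.php?id=1" >Short description of the story</a></div>
</body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Every <a> inside a <div align="left">, in document order.
$nodes = $xpath->query('//div[@align="left"]//a');

if ($nodes->length > 0) {
    $text = trim($nodes->item(0)->textContent);
    echo $text, "\n";
}
// Prints: Short description of the story
```

Unlike the regex, this version does not care about the extra space before `>` or the order of attributes, so it tends to survive small layout changes better.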
