Tuesday, October 14, 2008

Screen scraping with ASP.NET/C#

Here is an excellent resource on how to screen scrape using ASP.NET/C#.

The author of the article explains the steps, with code, for screen scraping data from another website and integrating it into your own ASP.NET application.

The code to fetch the HTML page we want to scrape is given below:

// Requires: using System.IO; using System.Net;

// Open the requested URL
WebRequest req = WebRequest.Create(strURL);

// Get the stream from the returned web response
StreamReader stream = new StreamReader(req.GetResponse().GetResponseStream());

// Collect the page content in a StringBuilder
System.Text.StringBuilder sb = new System.Text.StringBuilder();
string strLine;

// Read the stream a line at a time and place each one
// into the StringBuilder
while ((strLine = stream.ReadLine()) != null)
{
    // Ignore blank lines
    if (strLine.Length > 0)
        sb.Append(strLine);
}

// Finished with the stream so close it now
stream.Close();

// Cache the streamed site now so it can be used
// without reconnecting later
m_strSite = sb.ToString();

We can then write our own logic to extract the portion of the page that we want to use.
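
As a rough illustration of what such extraction logic might look like, here is a minimal sketch that pulls out the text between two markers in the fetched HTML. The class name ScrapeHelper and the method ExtractBetween are hypothetical names chosen for this example and do not come from the article; the sketch also assumes the markers appear literally in the page, which will not hold for every site.

```csharp
using System;

public static class ScrapeHelper
{
    // Hypothetical helper (not from the original article): returns the text
    // between the first occurrence of startMarker and the next occurrence of
    // endMarker, or null if either marker cannot be found.
    public static string ExtractBetween(string html, string startMarker, string endMarker)
    {
        if (html == null || startMarker == null || endMarker == null)
            return null;

        // Locate the start marker, ignoring case since HTML tags may vary in case
        int start = html.IndexOf(startMarker, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
            return null;

        // Move past the marker itself so it is not included in the result
        start += startMarker.Length;

        // Locate the end marker, searching only after the start marker
        int end = html.IndexOf(endMarker, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0)
            return null;

        return html.Substring(start, end - start);
    }
}
```

For instance, calling ScrapeHelper.ExtractBetween(m_strSite, "<title>", "</title>") on the cached page would return the page title if both tags are present. Plain string searching like this tends to break when the remote site changes its markup, so treat it as a starting point only.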

