I needed to parse an HTML file and extract all of its anchor tags. First I tried some string manipulation, but that was a mess! Then I tried regular expressions, but since I am not good at them (not even bad :) ), that gave me a really hard time. As always, the web was there to save me, and by combining my searching and programming skills I was finally able to write a piece of code that extracts all anchor "<a>" tags with a given CSS class from an HTML file.
The code is given below. It first reads a web page and saves its HTML in a string variable, then runs the regular expression over that string.
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://enggwaqas.spaces.live.com");
try
{
    string szResult;
    // Download the page and read its HTML into a string.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader sr = new StreamReader(response.GetResponseStream()))
    {
        szResult = sr.ReadToEnd();
    }

    // Match anchors whose href is directly followed by class="linkClass".
    string pattern = @"<a.*?href=[""'](?<url>.*?)[""'] ?(class=[""']linkClass[""']).*?>(?<name>.*?)</a>";
    MatchCollection matches = Regex.Matches(szResult, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
    foreach (Match m in matches)
        Console.WriteLine(m.Value);
}
catch (Exception e)
{
    // Report the failure instead of silently swallowing it.
    Console.WriteLine(e.Message);
}
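Note that this will not extract every anchor tag, only those with the CSS class set to 'linkClass' (and, as the pattern is written, only when the class attribute comes right after href). Why? Because I wrote the code that way :)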
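If you do want every anchor, one option is to drop the class condition from the pattern and read the url and name groups from each match. What follows is only a rough sketch and not the code I actually used: AnchorExtractor and ExtractAll are names made up for this example, and the simplified pattern assumes the href value is quoted and contains no quote characters of its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorExtractor
{
    // Hypothetical helper (not from the original post): returns one
    // (url, name) pair per anchor with a quoted href, whatever its CSS class.
    public static List<KeyValuePair<string, string>> ExtractAll(string html)
    {
        // Same idea as the pattern above, minus the class="linkClass" condition.
        const string pattern =
            @"<a\b[^>]*?href=[""'](?<url>.*?)[""'][^>]*>(?<name>.*?)</a>";

        var links = new List<KeyValuePair<string, string>>();
        MatchCollection matches = Regex.Matches(
            html, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);

        foreach (Match m in matches)
        {
            // The named groups carry the href value and the link text.
            links.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["name"].Value));
        }
        return links;
    }
}
```

Calling AnchorExtractor.ExtractAll(szResult) on the downloaded HTML should give a list where each Key is a link target and each Value is the text between the opening and closing tags. For messy real-world pages a proper HTML parser is likely to be more reliable than any regular expression.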