C# - External web page does not load every time in HtmlAgilityPack
I am using HtmlAgilityPack to scrape part of a web page, but I do not always get the expected output.

    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    web.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4";
    HtmlAgilityPack.HtmlDocument doc = web.Load(url);
    var resultPriceTable = doc.DocumentNode.SelectNodes("//div[@class='resultsset']//table");

resultPriceTable comes back null in nearly 50% of cases. From debugging I found that this line

    HtmlAgilityPack.HtmlDocument doc = web.Load(url);

is causing the issue: it is not loading the URL. How can I fix this?
Thanks in advance.
Try loading the page via WebClient or HttpWebRequest/HttpWebResponse, then pass the result to HtmlAgilityPack.
This code sample tries to download the page up to 5 times if it gets an empty string or a WebException.
In production code, don't swallow exceptions; you need to handle them (or at least log them).
Sample:

    // Requires: using System.Net;
    string url = "http://google.com/";
    string html = string.Empty;
    int tries = 5;
    while (tries > 0)
    {
        tries--; // count every attempt, whether it succeeds or fails
        using (var client = new WebClient())
        {
            client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4");
            try
            {
                html = client.DownloadString(url);
                if (!string.IsNullOrEmpty(html))
                {
                    break; // got content, stop retrying
                }
            }
            catch (WebException)
            {
                // Ignored here for brevity; handle or log it in production code.
            }
        }
    }

    // Parse the downloaded markup instead of letting HtmlWeb fetch the page.
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
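Since the sample above silently ignores the WebException, here is a rough sketch of how the same retry idea might look with the failures logged and a short pause between attempts. The class name PageFetcher, the methods DownloadWithRetry and Fetch, the delay parameter, and logging to Console.Error are all assumptions made for illustration; they are not part of HtmlAgilityPack or of the original answer.

```csharp
using System;
using System.Net;
using System.Threading;

public static class PageFetcher
{
    // Hypothetical helper: calls `download` up to `maxTries` times and returns
    // the first non-empty result, or null if every attempt fails.
    public static string DownloadWithRetry(Func<string> download, int maxTries, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxTries; attempt++)
        {
            try
            {
                string html = download();
                if (!string.IsNullOrEmpty(html))
                {
                    return html;
                }
                Console.Error.WriteLine("Attempt {0}: empty response", attempt);
            }
            catch (WebException ex)
            {
                // Log instead of swallowing, so repeated failures stay visible.
                Console.Error.WriteLine("Attempt {0} failed: {1} ({2})", attempt, ex.Message, ex.Status);
            }

            // Pause before the next attempt, but not after the last one.
            if (attempt < maxTries)
            {
                Thread.Sleep(delay);
            }
        }
        return null;
    }

    // Hypothetical convenience wrapper around WebClient, as used in the sample above.
    public static string Fetch(string url, string userAgent)
    {
        return DownloadWithRetry(() =>
        {
            using (var client = new WebClient())
            {
                client.Headers.Add(HttpRequestHeader.UserAgent, userAgent);
                return client.DownloadString(url);
            }
        }, 5, TimeSpan.FromSeconds(1));
    }
}
```

A null return means every attempt failed, so check for it before calling doc.LoadHtml(html). Passing the download step in as a delegate keeps the retry loop independent of WebClient, so the same loop could wrap an HttpWebRequest call instead.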