web scraping - Getting a non-object error randomly when traversing with PHP DOMDocument
Below is the code:
// $doc is the DOMDocument of the listing page, loaded earlier (not shown)
$xpath = new DOMXPath($doc);

// start at the root element
$query = '//div[contains(@class, "hudpagepad")]/div/ul/li/a';
$nodelist = @$xpath->query($query);

// size 104
$size = $nodelist->length;
for ($i = 1; $i <= $size; $i++) {
    $node = $nodelist->item($i - 1);
    $url = $node->getAttribute("href");
    $error = scrapeurl($url);
}

function scrapeurl($url) {
    $cfm = new DOMDocument();
    $cfm->loadHTMLFile($url);
    $cfmpath = new DOMXPath($cfm);
    $pointer = $cfm->getElementById('content-area');
    $filter = 'table/tr';
    // problem lies here
    $state = $pointer->firstChild->nextSibling->nextSibling->nodeValue;
    $nodelist = $cfmpath->query($filter, $pointer);
}
It basically traverses a list of links and scrapes each link with the scrapeurl method.
I don't know what the problem is. Randomly I get a non-object type error when trying to use $pointer; other times it passes through without error and the values are correct.
Does anyone know what the problem is? I'm guessing the problem occurs when a page is not loaded properly?
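One way to make the random failure harmless instead of fatal is to check every step that can fail before dereferencing it. The sketch below is a hypothetical variant (`scrapeurlSafe` is a name introduced here, not part of the original code) that assumes a page which fails to load, or which has no `#content-area`, should simply be skipped by returning null. It also walks the sibling chain one step at a time, since whitespace text nodes or a different page layout can make `firstChild->nextSibling->nextSibling` end on null.

```php
<?php
// Hypothetical defensive variant of scrapeurl(): every step that can fail
// is checked instead of chaining property access on a possibly-null value.
// Returning null on failure is an illustrative choice.
function scrapeurlSafe(string $url): ?string
{
    $cfm = new DOMDocument();

    // Collect libxml parse warnings instead of printing them; scraped HTML
    // is rarely valid, so warnings are expected.
    $previous = libxml_use_internal_errors(true);
    $loaded = $cfm->loadHTMLFile($url);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // loadHTMLFile() returns false when the page could not be loaded.
    if ($loaded === false) {
        return null;
    }

    // getElementById() returns null when no element matches; the original
    // code then fails on $pointer->firstChild with a non-object error.
    $pointer = $cfm->getElementById('content-area');
    if ($pointer === null) {
        return null;
    }

    // Equivalent of firstChild->nextSibling->nextSibling, stopping early
    // if any step in the chain is missing.
    $node = $pointer->firstChild;
    for ($step = 0; $step < 2 && $node !== null; $step++) {
        $node = $node->nextSibling;
    }

    return $node === null ? null : $node->nodeValue;
}
```

In the loop, the result can then be checked for null so that broken pages are skipped rather than stopping the whole run.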
I found the idea for an answer here:
http://sharovatov.wordpress.com/2009/11/01/php-loadhtmlfile-and-a-html-file-without-doctype/
It is better to use a 'manual' XPath query than getElementById, because getElementById breaks if the doctype of the loaded document is not well formed.
So use this instead:
$pointer = $cfmpath->query("//*[@id='content-area']")->item(0);
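Because `DOMXPath::query()` returns a `DOMNodeList` rather than a single element, the element itself comes from `item(0)`, which is null when nothing matched. A minimal sketch, assuming the goal is still to read the `table/tr` rows under `#content-area` (`contentAreaRows` is a hypothetical helper name), combines the XPath lookup with a null check and then evaluates the relative `table/tr` query with `$pointer` as the context node:

```php
<?php
// Hypothetical helper: locate #content-area with XPath instead of
// getElementById() and return its table rows, or [] if it is missing.
function contentAreaRows(DOMDocument $cfm): array
{
    $cfmpath = new DOMXPath($cfm);

    // query() returns a DOMNodeList (or false for an invalid expression);
    // item(0) is the first matching element, or null if none matched.
    $matches = $cfmpath->query("//*[@id='content-area']");
    $pointer = $matches === false ? null : $matches->item(0);
    if ($pointer === null) {
        return [];
    }

    // 'table/tr' is a relative path, evaluated with $pointer as context.
    $rows = $cfmpath->query('table/tr', $pointer);
    return $rows === false ? [] : iterator_to_array($rows);
}
```

Note that libxml's HTML parser does not insert the implicit `tbody` that browsers add, which is why `table/tr` can match rows directly.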
Or create a helper function:
function getElementById($id) {
    global $dom;
    $xpath = new DOMXPath($dom);
    return $xpath->query("//*[@id='$id']")->item(0);
}
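The helper function above reads the document from a global and interpolates `$id` directly into the query, so an id containing a single quote would produce an invalid XPath expression. A sketch of one possible refinement, with hypothetical names `xpathLiteral` and `getElementByIdXPath`, passes the document as a parameter and turns the id into a valid XPath 1.0 string literal (using `concat()` when the id contains both quote types):

```php
<?php
// Hypothetical: build an XPath 1.0 string literal for an arbitrary value.
function xpathLiteral(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Contains both quote types: split on ' and rejoin with concat().
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

// Hypothetical variant of the helper that takes the document as a parameter
// instead of using a global; returns null when no element has that id.
function getElementByIdXPath(DOMDocument $dom, string $id): ?DOMNode
{
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[@id=' . xpathLiteral($id) . ']');
    return $nodes === false ? null : $nodes->item(0);
}
```

Passing the document explicitly lets the same helper be used on each page loaded inside scrapeurl, rather than only on whichever document is stored in `$dom`.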
Thanks for the attempted help!