web scraping - Getting non-object type randomly when traversing with PHP DOMDocument


I have the code below:

    $xpath = new DOMXPath($doc);
    // start at the root element
    $query = '//div[contains(@class, "hudpagepad")]/div/ul/li/a';
    $nodelist = @$xpath->query($query);

    // size is 104
    $size = $nodelist->length;
    for ($i = 1; $i <= $size; $i++) {
        $node = $nodelist->item($i - 1);
        $url = $node->getAttribute("href");

        $error = scrapeurl($url);
    }

    function scrapeurl($url) {
        $cfm = new DOMDocument();
        $cfm->loadHTMLFile($url);
        $cfmpath = new DOMXPath($cfm);
        $pointer = $cfm->getElementById('content-area');
        $filter = 'table/tr';

        // the problem lies here
        $state = $pointer->firstChild->nextSibling->nextSibling->nodeValue;

        $nodelist = $cfmpath->query($filter, $pointer);
    }

Basically, it traverses a list of links and scrapes each link with the scrapeurl function.

I don't know what the problem is here, but I randomly get a non-object type error when trying to use $pointer. Other times it passes through without any error and the values are correct.

Does anyone know what the problem is? I'm guessing the problem occurs when the page is not loaded properly?
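One way to test that guess is to check whether the load succeeded and whether the lookup returned null before dereferencing the result. The following is only a minimal sketch: the helper name findContentArea and the $why parameter are hypothetical, and it parses an in-memory HTML string with loadHTML rather than fetching a real URL.

```php
<?php
// Hypothetical helper (not from the original post): returns the element with
// id "content-area", or null with a short reason stored in $why.
function findContentArea(string $html, ?string &$why = null): ?DOMElement
{
    // Treat an empty response body as a failed page load.
    if (trim($html) === '') {
        $why = 'empty document';
        return null;
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of hiding them with @.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        $why = 'load failed';
        return null;
    }

    // getElementById returns null when nothing matches, which is what
    // produces the non-object error if the result is used unchecked.
    $node = $doc->getElementById('content-area');
    if ($node === null) {
        $why = 'no element with id content-area';
        return null;
    }
    return $node;
}
```

Logging $why together with the URL for every null result would show whether the failures line up with pages that did not load or with pages where the lookup itself came back empty.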

I found the idea for an answer here:

http://sharovatov.wordpress.com/2009/11/01/php-loadhtmlfile-and-a-html-file-without-doctype/

It is better to use a 'manual' XPath query than getElementById, because getElementById breaks if the doctype of the document you load is not well formed.

So use this instead:

$cfmpath->query("//*[@id='content-area']")

Or create a function:

    function getElementById($id) {
        global $dom;
        $xpath = new DOMXPath($dom);
        return $xpath->query("//*[@id='$id']")->item(0);
    }
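As a rough illustration, the helper above could also take the document as a parameter instead of relying on a global, and return null explicitly when nothing matches so the caller can check before chaining property accesses. The name findById and the sample HTML are assumptions made for this sketch, not part of the original code.

```php
<?php
// Hypothetical variant of the helper above: the document is passed in
// rather than read from a global.
function findById(DOMDocument $dom, string $id): ?DOMNode
{
    $xpath = new DOMXPath($dom);
    // Assumes $id contains no single quote, since it is spliced into the query.
    $matches = $xpath->query("//*[@id='$id']");
    if ($matches === false || $matches->length === 0) {
        return null;
    }
    return $matches->item(0);
}

// Sample markup without a doctype, standing in for a scraped page.
$html = '<html><body><div id="content-area"><p>NY</p></div></body></html>';
$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$area = findById($dom, 'content-area');
$missing = findById($dom, 'no-such-id');

// Check for null before touching child nodes.
$text = $area !== null ? trim($area->textContent) : null;
```

With this shape, a page that lacks the element yields null instead of an error, and the scraping loop can skip or log that URL.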

Thanks to everyone who attempted to help!

