ruby - Nokogiri and XPath for extracting table data
I'm a bit of a newbie, and I'm trying to scrape data from a table, but I'm not having much luck using XPath. I can get the first field I need, but then... nothing.
The table structure for each row is as follows:
<tr bgcolor="#fff7e7">
  <td valign="top"><font color="#8c4510"> <span id="datagrid1__ctl3_label2">index</span> </font></td>
  <td><font color="#8c4510"><a href="javascript:__dopostback('datagrid1$_ctl3$_ctl0','')"><font color="#8c4510">title</font></a></font></td>
  <td><font color="#8c4510"><a href="javascript:__dopostback('datagrid1$_ctl3$_ctl2','')"><font color="#8c4510">people</font></a></font></td>
  <td valign="top"><font color="#8c4510">date</font></td>
  <td><font color="#8c4510"><a href="javascript:__dopostback('datagrid1$_ctl3$_ctl4','')"> <font color="#8c4510">text</font></a></font></td>
  <td><font color="#8c4510"><a href="javascript:__dopostback('datagrid1$_ctl3$_ctl6','')"><font color="#8c4510">outcome</font></a></font></td>
  <td valign="top"> <font color="#8c4510"><a href="javascript:__dopostback('datagrid1$_ctl3$_ctl8','')"><font color="#8c4510">click link more</font></a></font></td>
</tr>
I'm trying to extract the index, title, people, text, and outcome fields, plus the link. I'm managing to extract the index, but I can't seem to get the rest.
In my Ruby code, the call that gets the table seems to be working, but the loop in which I extract the fields from each row of the table is not, apart from the index.
Any help would be great.
With the excerpt you gave there, I can extract the text and the links with the following XPath query:
require 'rubygems'
require 'nokogiri'

# Parse the saved page
f = File.open('test.html')
doc = Nokogiri::HTML(f)

# Print the text and href of every link inside a table cell
doc.xpath("//tr//td//a").each do |node|
  puts "#{node.text.strip}: #{node.attribute('href')}"
end
f.close
However, I'm not seeing the other rows in the table, so I'm not sure whether this works for the rest of them.