node.js - How to extract HTML content using XPath with a Node.js module

February 15, 2011

I am looking for an HTML content extractor that uses XPath. I have seen various Node.js modules:

jsdom, htmlparser2, xpath, cheerio

I found cheerio better for getting data by class, id, tag, etc., but I am not able to get data by specifying an XPath. Using the xpath Node.js module, I am able to get data with XPath from smaller HTML, but longer HTML gives different types of errors:

entity not found: @#[line:120,col:9], unclosed xml attribute @#[line:1,col:877]
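These two errors appear to come from parsing the page as XML. Assuming the xpath module is used as in its README, with xmldom's `DOMParser` turning the HTML string into a document first, HTML-only entities such as `&nbsp;` and unquoted attribute values such as `width=100` are not well-formed XML. To see which of these constructs a given page contains, a rough dependency-free scanner like the one below could be run over the downloaded HTML. `findXmlProblems` and `lineCol` are hypothetical names, and the regexes are a heuristic, not a real parser.

```javascript
// Hypothetical helper (not part of xpath, xmldom or cheerio): list places in
// an HTML string that a strict XML parser is likely to reject. This is a
// regex heuristic, not a parser, so expect some false positives/negatives.
const XML_PREDEFINED = new Set(["lt", "gt", "amp", "quot", "apos"]);

// Convert a 0-based string offset into 1-based line/column numbers.
function lineCol(text, offset) {
  const before = text.slice(0, offset);
  return {
    line: before.split("\n").length,
    col: offset - before.lastIndexOf("\n"),
  };
}

function findXmlProblems(html) {
  const problems = [];

  // Named entities such as &nbsp; are defined by HTML but not by XML,
  // which only predefines the five names in XML_PREDEFINED.
  for (const m of html.matchAll(/&([A-Za-z][A-Za-z0-9]*);/g)) {
    if (!XML_PREDEFINED.has(m[1])) {
      problems.push({ kind: "undefined entity", text: m[0], ...lineCol(html, m.index) });
    }
  }

  // Unquoted attribute values, e.g. <td width=100>, are valid HTML but not
  // well-formed XML. Scan each start tag for name=value pairs whose value
  // does not begin with a quote.
  for (const tag of html.matchAll(/<[A-Za-z][^>]*>/g)) {
    for (const attr of tag[0].matchAll(/\s([A-Za-z_:][-\w:.]*)=([^\s"'>]+)/g)) {
      const offset = tag.index + attr.index + 1; // +1 skips the leading whitespace
      problems.push({ kind: "unquoted attribute", text: attr[0].trim(), ...lineCol(html, offset) });
    }
  }

  // Report in document order.
  return problems.sort((a, b) => a.line - b.line || a.col - b.col);
}
```

The reported line/column pairs are 1-based, so they can be compared with the `[line:N,col:M]` positions in the error messages, although a heuristic like this will not always agree with the real parser.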

Note: I have no permission to change the HTML in any way.

For example, take this HTML:

<html>
<body>
  <div>
    <ul id="fruits">
      <li class="apple">apple</li>
      <li class="orange">orange</li>
      <li class="pear">pear</li>
    </ul>
  </div>
</body>
</html>

If I pass the XPath //*[@id="fruits"]/li[2] to the xpath Node.js module to find the element, I get no error and the result is "orange". But if I use the HTML of the page http://www.infotaxi.org/india_taxi/ahmedabad_taxi.htm (which is much longer) and access part of its text with the XPath

//*[@id="navlistmeniu"]/li[3]/a/b,

I get the error

entity not found: @#[line:120,col:9]
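One workaround for the "entity not found" error, assuming the xpath module is fed by a strict XML parser such as xmldom, is to rewrite HTML-only named entities as numeric character references in the in-memory copy of the page before parsing it. The page itself is not modified. `entitiesToNumeric` and its entity table are hypothetical and deliberately small (extend the table as needed), and this does nothing for unquoted attributes or unclosed tags such as `<br>`, so a forgiving HTML parser may still be needed for real pages.

```javascript
// Hypothetical helper: make entity usage acceptable to a strict XML parser
// by rewriting the in-memory string (the page itself is not modified).
// Only a small, illustrative subset of HTML entities is mapped here.
const HTML_ENTITY_CODES = new Map([
  ["nbsp", 160], ["copy", 169], ["reg", 174], ["laquo", 171], ["raquo", 187],
  ["middot", 183], ["ndash", 8211], ["mdash", 8212], ["hellip", 8230],
]);

// The only named entities XML defines without a DTD.
const XML_PREDEFINED = new Set(["lt", "gt", "amp", "quot", "apos"]);

function entitiesToNumeric(html) {
  // Match every "&", optionally followed by a complete character reference.
  return html.replace(/&(#\d+;|#x[0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)?/g, (match, ref) => {
    if (ref === undefined) return "&amp;";      // bare "&", e.g. in a query string
    if (ref[0] === "#") return match;           // numeric references are valid XML
    const name = ref.slice(0, -1);              // drop the trailing ";"
    if (XML_PREDEFINED.has(name)) return match; // &lt;, &amp;, etc. are fine
    if (HTML_ENTITY_CODES.has(name)) return `&#${HTML_ENTITY_CODES.get(name)};`;
    return "&amp;" + ref;                       // unknown entity: keep as literal text
  });
}
```

The converted string would then be handed to the XML parser in place of the raw HTML. Unknown entities are escaped rather than dropped, so their text still shows up in the extracted content.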

With cheerio I am able to extract data by class, id, tag, etc., but not by XPath.
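Since cheerio works with CSS selectors, another option for simple paths like the two in this question is to translate the XPath into an equivalent CSS selector. The sketch below handles only a small subset: child `/` and descendant `//` steps, a tag name or `*`, `[@attr="value"]` predicates, and a positional `[n]` predicate. `xpathToCss` is a hypothetical name, not part of cheerio or the xpath module, and anything outside that subset throws.

```javascript
// Hypothetical helper (not part of cheerio or the xpath module): translate a
// restricted XPath subset into a CSS selector. Supported per step:
//   separator "/" (child) or "//" (descendant),
//   a tag name or "*",
//   either [@attr="value"] predicates or one positional predicate [n].
// Anything else throws. A leading "/" is treated like "//".
function xpathToCss(xpath) {
  const stepRe =
    /^(\/\/?)([A-Za-z][\w-]*|\*)((?:\[@[\w-]+=(?:"[^"]*"|'[^']*')\])*)(?:\[(\d+)\])?/;
  const parts = [];
  let rest = xpath;

  while (rest.length > 0) {
    const m = stepRe.exec(rest);
    if (!m) throw new Error(`Unsupported XPath near: ${rest}`);
    const [whole, sep, name, attrPreds, position] = m;

    // XPath li[@class="x"][2] counts only the filtered siblings; CSS cannot
    // express that directly, so refuse the combination.
    if (attrPreds && position !== undefined) {
      throw new Error(`Cannot combine attribute and positional predicates: ${whole}`);
    }

    let step = name === "*" ? "" : name;
    for (const p of attrPreds.matchAll(/\[@([\w-]+)=(?:"([^"]*)"|'([^']*)')\]/g)) {
      const attr = p[1];
      const value = p[2] !== undefined ? p[2] : p[3];
      if (attr === "id" && /^[A-Za-z][\w-]*$/.test(value)) {
        step += `#${value}`;
      } else {
        step += `[${attr}="${value.replace(/"/g, '\\"')}"]`;
      }
    }

    if (position !== undefined) {
      // li[3] counts <li> siblings only, which is what :nth-of-type does.
      if (name === "*") throw new Error("Positional predicate on * is not supported");
      step += `:nth-of-type(${position})`;
    }

    if (parts.length > 0) parts.push(sep === "//" ? " " : " > ");
    parts.push(step || "*");
    rest = rest.slice(whole.length);
  }
  return parts.join("");
}
```

With cheerio this would be used roughly as `const $ = cheerio.load(html); $(xpathToCss('//*[@id="navlistmeniu"]/li[3]/a/b')).text()`. The positional predicate is mapped to `:nth-of-type` rather than `:nth-child` because XPath's `li[3]` counts only `li` siblings, while `:nth-child` would also count siblings of other element types.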

Please help.
