node.js - How to extract HTML content using XPath with a Node.js module
I am looking for an HTML content extractor that works with XPath. I have looked at various Node.js modules:
jsdom, htmlparser2, xpath, cheerio
I found cheerio better for getting data by class, id, tag, etc., but I am not able to get data with it by specifying an XPath. Using the xpath Node.js module I am able to get data by XPath for smaller HTML, but for longer HTML it gives different types of errors:
entity not found: @#[line:120,col:9], unclosed xml attribute @#[line:1,col:877]
Note: I have no permission to change the HTML in any way.
E.g. if the HTML is
<html>
  <body>
    <div>
      <ul id="fruits">
        <li class="apple">apple</li>
        <li class="orange">orange</li>
        <li class="pear">pear</li>
      </ul>
    </div>
  </body>
</html>
and I give the XPath
//*[@id="fruits"]/li[2]
to find the element using the xpath Node.js module, I get no error and the result is orange. But if I use the HTML of the page http://www.infotaxi.org/india_taxi/ahmedabad_taxi.htm (which is much longer) and try to access part of the text using the XPath
//*[@id="navlistmeniu"]/li[3]/a/b
I get the error
entity not found: @#[line:120,col:9]
Using cheerio I am able to extract data by class, id, tag, etc., but not by XPath.
Please help.
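Since cheerio accepts CSS selectors, one possible workaround for simple paths like the two quoted in this post is to translate the XPath into an equivalent CSS selector before querying. The following is only a minimal sketch under stated assumptions: xpathToCss is a hypothetical helper (it is not part of cheerio or of the xpath module), it only accepts paths that start with //, and each step may be a tag name or * with at most one predicate, either [@attr="value"] or a position [n]. Attribute values containing / are not handled.

```javascript
// Hypothetical helper: translate a small subset of XPath into a CSS selector.
// Supported: paths starting with "//", steps separated by "/" (child) or
// "//" (descendant), each step being a tag name or "*" with at most one
// predicate: [@attr="value"] or [n]. Anything else throws an Error.
function xpathToCss(xpath) {
  if (!xpath.startsWith('//')) {
    throw new Error('Only paths starting with // are supported: ' + xpath);
  }
  // One token = a separator ("//" or "/") followed by a step without slashes.
  const tokenPattern = /(\/\/|\/)([^\/]+)/g;
  // Groups: 1 = name, 2 = attribute, 3 = quote, 4 = value, 5 = position.
  const stepPattern =
    /^(\*|[A-Za-z][\w-]*)(?:\[(?:@([\w-]+)=(["'])(.*?)\3|(\d+))\])?$/;

  let css = '';
  let consumed = 0;
  let token;
  while ((token = tokenPattern.exec(xpath)) !== null) {
    if (token.index !== consumed) break; // unparsed text between tokens
    const step = stepPattern.exec(token[2]);
    if (!step) throw new Error('Unsupported step: ' + token[2]);
    const [, name, attr, , value, position] = step;

    // The leading "//" needs no combinator; later ones map to " " or " > ".
    if (css !== '') css += token[1] === '//' ? ' ' : ' > ';
    css += name;
    if (attr !== undefined) {
      css += '[' + attr + '=' + JSON.stringify(value) + ']';
    }
    if (position !== undefined) {
      // XPath li[2] counts only li siblings; *[2] counts all element siblings.
      css += (name === '*' ? ':nth-child(' : ':nth-of-type(') + position + ')';
    }
    consumed = tokenPattern.lastIndex;
  }
  if (consumed !== xpath.length) {
    throw new Error('Could not parse: ' + xpath);
  }
  return css;
}

console.log(xpathToCss('//*[@id="fruits"]/li[2]'));
// prints: *[id="fruits"] > li:nth-of-type(2)
console.log(xpathToCss('//*[@id="navlistmeniu"]/li[3]/a/b'));
// prints: *[id="navlistmeniu"] > li:nth-of-type(3) > a > b
```

The resulting string could then be passed to cheerio, for example $(selector).text() after const $ = cheerio.load(html). Because cheerio parses HTML leniently rather than as strict XML, this route may sidestep the "entity not found" and "unclosed xml attribute" errors, though that is untested against the page above.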