java - Jsoup.clean without adding HTML entities
I'm cleaning text of unwanted HTML tags (such as <script>) using

    String clean = Jsoup.clean(someInput, Whitelist.basicWithImages());

The problem is that it replaces, for instance, å with &aring; (which causes trouble for me since it's not "pure XML").

For example,

    Jsoup.clean("hello å <script></script> world", Whitelist.basicWithImages())

yields

    "hello &aring; world"

but I would like

    "hello å world"

Is there a simple way to achieve this? (I.e., simpler than converting &aring; back to å in the result.)
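You can configure jsoup's escape mode: using EscapeMode.xhtml gives you output without HTML named entities such as &aring; (only XML's basic ones, such as &amp; and &lt;, are still escaped).

Here's a complete snippet that accepts str as input and cleans it using Whitelist.simpleText():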
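    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Entities.EscapeMode;
    import org.jsoup.safety.Cleaner;
    import org.jsoup.safety.Whitelist;

    // Parse str into a Document.
    Document doc = Jsoup.parse(str);

    // Clean the document, keeping only what the whitelist allows.
    doc = new Cleaner(Whitelist.simpleText()).clean(doc);

    // Adjust the escape mode so that å is not written as &aring;.
    doc.outputSettings().escapeMode(EscapeMode.xhtml);

    // Get back the string of the body.
    str = doc.body().html();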
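For comparison, the fallback the question mentions (converting &aring; back to å after cleaning) might look roughly like the sketch below. It uses only the JDK, not jsoup, and assumes Java 9 or later. The class name EntityFallback, the method unescapeNamed, and the tiny entity table are made up for illustration; a real table would need every named entity the cleaner can emit. XML's own entities (&lt;, &gt;, &amp;, &quot;, &apos;) are deliberately left alone, since decoding them would undo the cleaning.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup.
final class EntityFallback {

    // Deliberately small, illustrative table: entity name -> character.
    // It must NOT contain lt, gt, amp, quot or apos.
    private static final Map<String, String> NAMED = Map.of(
            "aring", "\u00E5",   // å
            "Aring", "\u00C5",   // Å
            "auml", "\u00E4",    // ä
            "ouml", "\u00F6",    // ö
            "nbsp", "\u00A0");   // non-breaking space

    // Matches named references such as &aring; (numeric ones are ignored).
    private static final Pattern ENTITY =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String unescapeNamed(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String decoded = NAMED.get(m.group(1));
            // Names missing from the table are copied through unchanged.
            String replacement = (decoded != null) ? decoded : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this table, unescapeNamed("hello &aring; world") gives "hello å world", while "a &lt; b" comes back unchanged. Having to maintain such a table is the main reason the escape-mode setting above is the simpler route.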