java - Jsoup.clean without adding HTML entities


I'm cleaning text of unwanted HTML tags (such as <script>) using

String clean = Jsoup.clean(someInput, Whitelist.basicWithImages());

The problem is that it replaces, for instance, å with &aring; (which causes trouble for me since it's not "pure XML").

For example,

Jsoup.clean("hello å <script></script> world", Whitelist.basicWithImages())

yields

"hello &aring;  world" 

but I would like

"hello å  world" 

Is there a simple way to achieve this? (I.e. simpler than converting &aring; back to å in the result.)

You can configure Jsoup's escaping mode: using EscapeMode.xhtml will give you output without named HTML entities.
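To see why the named entity is a problem for an XML consumer in the first place: XML predefines only five entities (&amp;, &lt;, &gt;, &quot;, &apos;), so &aring; is an undeclared entity unless a DTD declares it. The following is a minimal sketch using only the JDK's built-in XML parser, not Jsoup; the class name XmlEntityCheck and the helper isWellFormedXml are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class XmlEntityCheck {

    // Returns true if the fragment parses as well-formed XML, false otherwise.
    static boolean isWellFormedXml(String fragment) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(fragment)));
            return true;
        } catch (SAXException e) {
            // Raised for well-formedness violations such as an undeclared entity.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e5 is the literal character å, written as an escape so the
        // source file's encoding does not matter.
        System.out.println(isWellFormedXml("<p>hello \u00e5 world</p>"));
        // Named HTML entity: not declared in plain XML.
        System.out.println(isWellFormedXml("<p>hello &aring; world</p>"));
        // Numeric character reference: always allowed in XML.
        System.out.println(isWellFormedXml("<p>hello &#229; world</p>"));
    }
}
```

The literal character and the numeric reference both parse, while the named entity is rejected, which is why output that contains no named HTML entities is what is wanted here.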

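Here's a complete snippet that accepts str as input and cleans it using Whitelist.simpleText():

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities.EscapeMode;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

// Parse str into a Document.
Document doc = Jsoup.parse(str);
// Clean the document.
doc = new Cleaner(Whitelist.simpleText()).clean(doc);
// Adjust the escape mode so named HTML entities are not emitted.
doc.outputSettings().escapeMode(EscapeMode.xhtml);
// Get back the string of the body.
str = doc.body().html();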
