regex - Using Perl to strip everything from a string except HTML Anchor Links
Using Perl, how can I use a regex to take a string that has random HTML in it, including one HTML link anchor like this:
<a href="http://example.com" target="_blank">whatever example</a>
and leave only that, getting rid of everything else? It should work no matter what is inside the <a tag alongside the href attribute, like title=, or style=, or whatever. It should leave the anchor text "whatever example" and the closing </a>.
You can take advantage of a stream parser such as HTML::TokeParser::Simple:
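#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

# Sample input: arbitrary text and markup surrounding a single link.
my $html = <<EO_HTML;
using perl, how can I use a regex to take a string that has random html in it
with one html link anchor, like this:

<a href="http://example.com" target="_blank">whatever <i>interesting</i> example</a>

and leave only that, getting rid of everything else? no matter what is inside
the href attribute with the <a, like title=, or style=, or whatever.
and leave the anchor: "whatever example" and the </a>?
EO_HTML

my $parser = HTML::TokeParser::Simple->new(string => $html);

# get_tag('a') skips ahead to the next <a ...> start tag;
# get_text('/a') returns the text up to the matching </a>,
# dropping any tags nested inside the link.
while (my $tag = $parser->get_tag('a')) {
    print $tag->as_is, $parser->get_text('/a'), "</a>\n";
}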
Output:
$ ./whatever.pl
<a href="http://example.com" target="_blank">whatever interesting example</a>
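Note that the output above keeps the link text but loses the <i> element, because get_text returns text only. If the markup nested inside the anchor needs to survive, one possible variation is to walk the tokens one by one and copy everything between the start and end tags verbatim. This is only a sketch: the helper name extract_anchors and the sample string are made up for illustration, and it assumes anchors are not nested inside each other.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): returns every <a>...</a>
# element found in $html, keeping any markup nested inside the anchor.
sub extract_anchors {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new(string => $html);

    my @anchors;
    my $current;    # defined only while we are inside an <a> element

    while (my $token = $parser->get_token) {
        if ($token->is_start_tag('a')) {
            # Start collecting a new anchor from its opening tag.
            $current = $token->as_is;
        }
        elsif (defined $current) {
            # Text or a nested tag inside the anchor: copy it verbatim.
            $current .= $token->as_is;
            if ($token->is_end_tag('a')) {
                push @anchors, $current;    # anchor is complete
                undef $current;
            }
        }
    }
    return @anchors;
}

# Made-up sample input for illustration.
my $html = 'junk <p>more <a href="http://example.com" target="_blank">'
         . 'whatever <i>interesting</i> example</a> junk</p>';

print "$_\n" for extract_anchors($html);
```

Run against the sample string, this should print the anchor with its <i> element intact. An anchor that is opened but never closed is silently dropped, which may or may not be what you want for messy input.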