regex - Using Perl to strip everything from a string except HTML Anchor Links

April 15, 2010

Using Perl, how can I use a regex to take a string that has random HTML in it, with one HTML link anchor, like this:

  <a href="http://example.com" target="_blank">whatever example</a>

and leave only that, getting rid of everything else? No matter what is inside the href attribute with the <a, like title=, or style=, or whatever. And leave the anchor text, "whatever example", and the </a>?

You can take advantage of a stream parser such as HTML::TokeParser::Simple:

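#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

# Sample input: the question text itself, with the link buried in it.
my $html = <<EO_HTML;

Using Perl, how can I use a regex to take a string that has random HTML in it
with one HTML link anchor, like this:

   <a href="http://example.com" target="_blank">whatever <i>interesting</i> example</a>

       and leave only that and get rid of everything else? No matter
   what is inside the href attribute with the <a, like title=, or style=, or
   whatever. And leave the anchor: "whatever example" and the </a>?
EO_HTML

my $parser = HTML::TokeParser::Simple->new(string => $html);

# Jump from one <a> start tag to the next, print the tag exactly as written,
# then the plain text up to the closing </a>, then the closing tag itself.
while (my $tag = $parser->get_tag('a')) {
    print $tag->as_is, $parser->get_text('/a'), "</a>\n";
}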

Output:

$ ./whatever.pl
<a href="http://example.com" target="_blank">whatever interesting example</a>
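
For comparison, the regex approach the question asks about could look roughly like the sketch below. It is not part of the answer above, and it only holds up on simple, well-formed input: it assumes no attribute value contains a ">", no anchors are nested, and no link-like text is hidden in comments or scripts. The helper name extract_anchors and the sample HTML are made up for illustration. Unlike the parser version, it keeps markup nested inside the link (such as the <i> tags) instead of flattening it to text.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (illustration only): return every <a ...>...</a>
# element found in a string, exactly as written in the input.
# Call it in list context; in scalar context the match returns true/false.
sub extract_anchors {
    my ($html) = @_;
    # <a\b      an opening "a" tag (\b keeps <abbr>, <area> and so on out)
    # [^>]*>    any attributes, up to the end of the opening tag
    # .*?       the link content, non-greedy; /s lets it span lines
    # </a\s*>   the closing tag; /i makes tag names case-insensitive
    return $html =~ m{(<a\b[^>]*>.*?</a\s*>)}gis;
}

# Made-up sample input with one link surrounded by other markup.
my $html = <<'EO_HTML';
<p>Some text <b>before</b> the link:
<a href="http://example.com" target="_blank">whatever <i>interesting</i> example</a>
and some <abbr title="x">text</abbr> after it.</p>
EO_HTML

print "$_\n" for extract_anchors($html);
```

If the input comes from arbitrary pages rather than markup you control, the stream parser shown earlier is the safer choice, since it does not depend on these assumptions.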
