regex - Using wildcards with sed


I have a log file that has embedded XML amongst the normal stdout, as follows:

2015-05-06 04:07:37.386 [info]process:102 - application submitted ==== 1
<application><firstname>test</firstname><studentssn>123456789</studentssn><address>123 test street</address><parentssn>123456780</parentssn><applicationid>2</applicationid></application>
2015-05-06 04:07:39.386 [info] process:103 - application completed ==== 1
2015-05-06 04:07:37.386 [info]process:104 - application submitted ==== 1
<application><firstname>test2</firstname><studentssn>323456789</studentssn><address>234 test street</address><parentssn>123456780</parentssn><applicationid>2</applicationid></application>
2015-05-06 04:07:39.386 [info] process:105 - application completed ==== 1

I am parsing it per the solution provided to me in "Parsing and manipulating log file with embedded XML". Per the post there, I use a .sed file with the following commands:

s|<firstname>[^<]*</firstname>|<firstname>***</firstname>|
s|<studentssn>[^<]*</studentssn>|<studentssn>***</studentssn>|
s|<address>[^<]*</address>|<address>***</address>|
s|<parentssn>[^<]*</parentssn>|<parentssn>***</parentssn>|

My question is: is there a way to use a wildcard match in the foo.sed file I have above? For example, if I wanted to match all *ssn tags and replace their contents with asterisks, rather than having one line each for studentssn and parentssn, and still yield the output below:

2015-05-06 04:07:37.386 [info]process:102 - application submitted ==== 1
<application><firstname>***</firstname><studentssn>***</studentssn><address>*******</address><parentssn>*********</parentssn>   <applicationid>2</applicationid></application>
2015-05-06 04:07:39.386 [info] process:103 - application completed ==== 1
2015-05-06 04:07:37.386 [info]process:104 - application submitted ==== 1
<application><firstname>***</firstname><studentssn>*********</studentssn><address>*****</address><parentssn>*********</parentssn>   <applicationid>2</applicationid></application>
2015-05-06 04:07:39.386 [info] process:105 - application completed ==== 1

Thanks in advance.

choroba's helpful answer works only with GNU sed, because using \| for alternation in a basic regular expression (implied by the absence of the -r option) is only supported there.

Also, the OP has since expressed a desire to use patterns to match similar element names.

Here's a solution that makes use of extended regular expressions and should work on both Linux (GNU sed) and BSD/OSX platforms (BSD sed):

sed -E 's%<([^/>]*name|[^/>]*ssn|address[^>]*)>[^<]*%<\1>***%g' file

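Note:

  • It is important to match the variable parts of element names with a negated bracket expression such as [^>]* rather than .* to ensure that matches remain confined to the opening tag. Also excluding / (that is, [^/>]*) keeps closing tags such as </firstname> from matching too, which would otherwise leave a stray *** after them.
  • BSD/OSX extended regular expressions (in accordance with POSIX extended regular expressions) do not support backreferences inside the regular expression itself (as opposed to the "backreferences" that refer to capture-group matches in the replacement string), so no attempt is made to match the closing tag as well.
  • While the command works on the stated platforms, it is not POSIX-compliant, because POSIX, at the time of writing, only mandates support for basic regular expressions in sed.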
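The first note can be checked with a small experiment. The following is only a sketch: the sample line is a hypothetical, shortened version of the XML in the question, the variable names are made up for illustration, and only the ssn pattern is used to keep the comparison readable.

```shell
# Hypothetical sample, shortened from the XML in the question.
line='<firstname>test</firstname><studentssn>123456789</studentssn>'

# .* is greedy and runs across tag boundaries up to the last "ssn>",
# so nothing is masked and *** lands after the closing tag.
greedy=$(printf '%s\n' "$line" | sed -E 's%<(.*ssn)>[^<]*%<\1>***%g')

# [^>]* stays inside one tag, but "/studentssn" also ends in "ssn",
# so the closing tag matches too and gets a stray *** appended.
no_gt=$(printf '%s\n' "$line" | sed -E 's%<([^>]*ssn)>[^<]*%<\1>***%g')

# [^/>]* additionally rejects "/", so only the opening tag matches.
confined=$(printf '%s\n' "$line" | sed -E 's%<([^/>]*ssn)>[^<]*%<\1>***%g')

printf '%s\n' "$greedy" "$no_gt" "$confined"
```

Only the last variant masks the value and leaves the rest of the line untouched, which is why the bracket expression should exclude both > and /.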

The above command is the equivalent of the following GNU sed command, which uses a basic regular expression. Note the need to escape (, ), and |:

sed 's%<\([^/>]*name\|[^/>]*ssn\|address[^>]*\)>[^<]*%<\1>***%g' file

Note that the use of alternation (\|) makes the command non-portable, because POSIX basic regular expressions do not support it.
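If portability matters more than brevity, one way around the missing alternation is to give each pattern its own s command, using only POSIX basic regular expressions. The sketch below writes them to a script file in the spirit of the foo.sed from the question. The temporary file, the variable names, and the sample line are assumptions made for illustration.

```shell
# Hypothetical sample, shortened from the XML in the question.
sample='<firstname>test</firstname><studentssn>123456789</studentssn><address>123 test street</address><applicationid>2</applicationid>'

# Temporary stand-in for foo.sed: one substitution per name pattern,
# so no \| alternation is needed.
script=$(mktemp)
cat > "$script" <<'EOF'
s%<\([^/>]*name\)>[^<]*%<\1>***%g
s%<\([^/>]*ssn\)>[^<]*%<\1>***%g
s%<\(address[^>]*\)>[^<]*%<\1>***%g
EOF

masked=$(printf '%s\n' "$sample" | sed -f "$script")
rm -f "$script"

printf '%s\n' "$masked"
```

This costs one line per pattern, but a single wildcard line such as the ssn one still covers both studentssn and parentssn, which is what the question asked for.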

