regex - Using wildcards with sed
I have a log file that has embedded XML amongst the normal stdout, as follows:
2015-05-06 04:07:37.386 [info]process:102 - application submitted ==== 1 <application><firstname>test</firstname><studentssn>123456789</studentssn><address>123 test street</address><parentssn>123456780</parentssn><applicationid>2</applicationid></application>
2015-05-06 04:07:39.386 [info] process:103 - application completed ==== 1
2015-05-06 04:07:37.386 [info]process:104 - application submitted ==== 1 <application><firstname>test2</firstname><studentssn>323456789</studentssn><address>234 test street</address><parentssn>123456780</parentssn><applicationid>2</applicationid></application>
2015-05-06 04:07:39.386 [info] process:105 - application completed ==== 1
I am parsing it per the solution provided to me in an earlier question about parsing and manipulating a log file with embedded XML. Per the post there, I use a .sed file with commands as follows:
s|<firstname>[^<]*</firstname>|<firstname>***</firstname>|
s|<studentssn>[^<]*</studentssn>|<studentssn>***</studentssn>|
s|<address>[^<]*</address>|<address>***</address>|
s|<parentssn>[^<]*</parentssn>|<parentssn>***</parentssn>|
My question is: is there a way to do a wildcard match in the foo.sed file I have above? For example, if I wanted to match all *ssn tags and replace their values with ***, I would rather have one line than separate lines for studentssn and parentssn, and still yield the output below:
2015-05-06 04:07:37.386 [info]process:102 - application submitted ==== 1 <application><firstname>***</firstname><studentssn>***</studentssn><address>*******</address><parentssn>*********</parentssn> <applicationid>2</applicationid></application>
2015-05-06 04:07:39.386 [info] process:103 - application completed ==== 1
2015-05-06 04:07:37.386 [info]process:104 - application submitted ==== 1 <application><firstname>***</firstname><studentssn>*********</studentssn><address>*****</address><parentssn>*********</parentssn> <applicationid>2</applicationid></application>
2015-05-06 04:07:39.386 [info] process:105 - application completed ==== 1
Thank you in advance.
choroba's helpful answer works with GNU sed only, because using \| for alternation in a basic regular expression (implied by the absence of the -r option) is only supported there.
Also, the OP has since expressed a desire to use patterns to match similar element names.
Here's a solution that makes use of extended regular expressions, which should work on both Linux (GNU sed) and BSD/OSX platforms (BSD sed):
sed -E 's%<([^>/]*name|[^>/]*ssn|address[^>]*)>[^<]*%<\1>***%g' file
Note:
- It is important to match the variable parts of the element names with a negated bracket expression such as [^>]* rather than .* to ensure that matches remain confined to the opening tag. Before name and ssn the bracket expression also excludes /, because [^>]*name would match closing tags such as </firstname> too and append a stray *** after them.
- BSD/OSX extended regular expressions (in accordance with POSIX extended regular expressions) do not support backreferences inside the regular expression itself (as opposed to the "backreferences" that refer to capture-group matches in the replacement string), so no attempt is made to match the closing tag as well.
- While the command works on the stated platforms, it is not POSIX-compliant, because POSIX only mandates support for basic regular expressions in sed.
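To try the extended-regex approach on a line from the question without touching a real log, here is a minimal sketch. The variable names (line, masked, greedy) are made up for illustration, and the wildcard before name and ssn is assumed to exclude / so that closing tags are left alone. The second command swaps in .* only to show why that is a bad idea.

```shell
# One XML fragment copied from the sample log in the question.
line='<application><firstname>test</firstname><studentssn>123456789</studentssn><address>123 test street</address><parentssn>123456780</parentssn><applicationid>2</applicationid></application>'

# Extended regex (-E): mask the text after any opening tag whose name
# ends in "name" or "ssn", or starts with "address".
masked=$(printf '%s\n' "$line" |
  sed -E 's%<([^>/]*name|[^>/]*ssn|address[^>]*)>[^<]*%<\1>***%g')
printf '%s\n' "$masked"

# Counter-example: .* is greedy and runs from the first "<" to the last
# "ssn>", so none of the values are masked.
greedy=$(printf '%s\n' "$line" | sed -E 's%<(.*ssn)>[^<]*%<\1>***%g')
printf '%s\n' "$greedy"
```

The first printf should show all four values replaced by *** with <applicationid>2</applicationid> untouched. The second leaves every value readable and only appends a stray *** after </parentssn>.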
The above command is the equivalent of the following GNU sed command, which uses a basic regular expression; note the need to escape (, ), and |:
sed 's%<\([^>/]*name\|[^>/]*ssn\|address[^>]*\)>[^<]*%<\1>***%g' file
Note that the use of alternation (\|) makes this command non-portable, because POSIX basic regular expressions do not support it.
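If portability matters more than brevity, one way around the missing alternation is to give each pattern its own s command, which stays within POSIX basic regular expressions. This is a minimal sketch using the same three patterns as the commands above, with / excluded from the wildcard so that closing tags are skipped (an assumption of this sketch). The function name mask_portable is hypothetical.

```shell
# Hypothetical helper: reads log lines on stdin and masks element values.
# Each -e expression is a plain BRE with one capture group, so no \| is needed.
mask_portable() {
  sed -e 's%<\([^>/]*name\)>[^<]*%<\1>***%g' \
      -e 's%<\([^>/]*ssn\)>[^<]*%<\1>***%g' \
      -e 's%<\(address[^>]*\)>[^<]*%<\1>***%g'
}

# Usage on a single sample fragment:
printf '%s\n' '<studentssn>123456789</studentssn><address>123 test street</address>' |
  mask_portable
```

The same three s commands can also be placed one per line in a script file such as the foo.sed from the question and run with sed -f foo.sed.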