More concise/efficient regex to match a string within matching parentheses
I want to match strings in which foo appears within select([...]), provided that any parentheses occurring inside are balanced. For example, it should match select(((foo))) or select(x(())(foo(x))()x((y)x)x()), but not select((foo) or select(x(foo)y().
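To pin down the expected results before looking at any regex, here is a minimal reference check in Python. It is not a regex and not part of the original question: it simply counts parentheses, and the function name select_contains_foo is made up for this sketch. It can serve as an oracle when testing candidate patterns.

```python
def select_contains_foo(text: str, needle: str = "foo") -> bool:
    """Return True if some select(...) with balanced parentheses contains needle.

    Hypothetical helper for illustration: a plain depth counter, no regex.
    """
    opener = "select("
    start = text.find(opener)
    while start != -1:
        body_start = start + len(opener)
        depth = 0
        # Begin scanning at the "(" that belongs to "select(".
        for i in range(body_start - 1, len(text)):
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
                if depth == 0:
                    # Found the ")" that closes "select(".
                    if needle in text[body_start:i]:
                        return True
                    break
        # Either unbalanced or no needle inside: try the next "select(".
        start = text.find(opener, start + 1)
    return False


for sample in ["select(((foo)))", "select((foo)", "select(x(foo)y()"]:
    print(sample, select_contains_foo(sample))
```

Unlike the regexes discussed here, this check has no nesting limit, so it is only meant as a way to state the goal precisely.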
I know I have to limit the maximum number of nested parentheses, and I came up with the following regular expression to solve the problem for 1 additional pair of parentheses:
select\((?: (?:[^()]*|[^()]*\([^()]*\)[^()]*)* foo (?:[^()]*|[^()]*\([^()]*\)[^()]*)* | (?:[^()]*|[^()]*\([^()]*\)[^()]*)* \([^()]*foo[^()]*\) (?:[^()]*|[^()]*\([^()]*\)[^()]*)* )\)
That means: within select([...]), either match foo with no or 1 pair of parentheses in front of or behind it, or match foo within 1 pair of parentheses, again with no or 1 pair of parentheses in front or behind.
Does anyone have a neater solution for this?
Expanding the regex to solve the problem for 2 additional pairs of parentheses looks like this:
select\((?: (?:[^()]*|[^()]*\((?:[^()]*|[^()]*\([^()]*\)[^()]*)*\)[^()]*)* foo (?:[^()]*|[^()]*\((?:[^()]*|[^()]*\([^()]*\)[^()]*)*\)[^()]*)* | (?:[^()]*|[^()]*\((?:[^()]*|[^()]*\([^()]*\)[^()]*)*\)[^()]*)* \((?: (?:[^()]*|[^()]*\([^()]*\)[^()]*)* foo (?:[^()]*|[^()]*\([^()]*\)[^()]*)* | (?:[^()]*|[^()]*\([^()]*\)[^()]*)* \([^()]*foo[^()]*\) (?:[^()]*|[^()]*\([^()]*\)[^()]*)* )\) (?:[^()]*|[^()]*\((?:[^()]*|[^()]*\([^()]*\)[^()]*)*\)[^()]*)* )\)
Here the indented part is the previous regex, and the "no or 1 pair of parentheses" parts have been expanded to "no, 1 or 2 pairs of parentheses".
I put the last regex on regex101: https://www.regex101.com/r/fj6cr4/1
The problem is that this regex (and even more so any further expanded version) is quite time-consuming, so I'm hoping for better ideas.
There are 2 things that should simplify (and speed up) your regex:
1. (?: [^()]* | [^()]*\([^()]*\)[^()]* )* is an example of catastrophic backtracking. The outer, repeated group should have only 2 alternatives: a sequence of non-parenthesised characters, or such a sequence between parentheses: (?: [^()]+ | \([^()]*\) )*. You are mixing the non-parenthesised characters [^()]* into both alternatives.
2. Instead of doing …foo…|…\(foo\)…, you should rather do …(?:foo|\(foo\))…, so that you don't have to repeat the lengthy … thing.
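As a quick sanity check on the first point, the following Python sketch compares the filler group from the question with the rewritten one on a few short strings. The names OLD_FILLER, NEW_FILLER, SAMPLES and accepts are assumptions made for this example, and re.VERBOSE is used only so the patterns can keep their spaces. It checks that both accept the same samples; it is neither a proof of equivalence nor a benchmark.

```python
import re

# Filler from the question: "[^()]*" appears in both alternatives, so the
# same text can be split between them in many different ways.
OLD_FILLER = re.compile(r"(?: [^()]* | [^()]*\([^()]*\)[^()]* )*", re.VERBOSE)

# Filler from the answer: one alternative starts with a non-parenthesis
# character, the other with "(", so they never compete for the same text.
NEW_FILLER = re.compile(r"(?: [^()]+ | \([^()]*\) )*", re.VERBOSE)

# Short strings only: some balanced one level deep, some not.
SAMPLES = ["", "abc", "a(b)c", "(a)(b)", "a(b", "a)b", "a((b))"]


def accepts(pattern, text):
    """True if the whole string is matched by the pattern."""
    return pattern.fullmatch(text) is not None


for s in SAMPLES:
    print(repr(s), accepts(OLD_FILLER, s), accepts(NEW_FILLER, s))
```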
With these two changes, your smaller expression becomes
select\( (?: [^()]+ | \([^()]*\) )* (?: foo | \([^()]*foo[^()]*\) ) (?: [^()]+ | \([^()]*\) )* \)
I'll leave applying these to the larger expression to you.
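One possible way to apply both points to deeper nesting is to generate the pattern instead of writing it by hand. The Python sketch below is an assumption about how that could look, not the answerer's own code; the function names filler, wrapped and select_regex are invented for the example. For depth 1 it produces the smaller expression above, without the spaces.

```python
import re

NON_PAREN = r"[^()]"


def filler(depth):
    """Pattern for text with balanced parentheses nested at most `depth` deep."""
    if depth == 0:
        return NON_PAREN + "*"
    # Point 1: either plain characters or one parenthesised group.
    return r"(?:" + NON_PAREN + r"+|\(" + filler(depth - 1) + r"\))*"


def wrapped(word, depth):
    """Pattern for `word`, optionally enclosed in up to `depth` pairs of parentheses."""
    if depth == 0:
        return re.escape(word)
    inner = filler(depth - 1) + wrapped(word, depth - 1) + filler(depth - 1)
    # Point 2: alternate only on the part that differs.
    return r"(?:" + re.escape(word) + r"|\(" + inner + r"\))"


def select_regex(word="foo", depth=1):
    """Compiled pattern for select(...) containing `word`, nesting limited to `depth`."""
    return re.compile(
        r"select\(" + filler(depth) + wrapped(word, depth) + filler(depth) + r"\)"
    )


two_levels = select_regex("foo", depth=2)
for sample in ["select(((foo)))", "select((foo)", "select(x(foo)y()"]:
    print(sample, bool(two_levels.search(sample)))
```

The nesting limit is still there, it is just a parameter now. On long inputs that fail to match, a [^()]+ inside a repeated group can still backtrack heavily in some engines, so it is worth timing a generated pattern on realistic data before relying on it.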