python - Matching terms that contain special characters with re.findall()?


Using re.findall(), I'm attempting to find every occurrence of each term from a list of terms in a string.

If a particular term contains special characters (e.g. a '+'), a match will not be found, or error messages may be generated. Using re.escape(), the error messages are avoided, but the terms with special characters are still not found within the string.

    import re

    my_list = ['java', 'c++', 'c#', '.net']
    my_string = ' python javascript c++ c++ c# .net java .net'
    matches = []

    for term in my_list:
        # Escape terms that contain regex metacharacters
        if any(x in term for x in ['+', '#', '.']):
            term = re.escape(term)

        print("\nlooking for term '%s'" % term)

        # \b on both sides is what prevents 'c++', 'c#' and '.net' from matching
        match = re.findall("\\b" + term + "\\b", my_string, flags=re.IGNORECASE)
        matches.append(match)

The above code will only find 'java' within the string. Any suggestions regarding how to find the terms with special characters within the string?

Caveat: I cannot change 'my_list' manually, because I don't know in advance which terms it will contain.

Update - It appears that the problem has to do with the word boundary specifiers within the regex (the "\b"), which break up the string along characters that include the non-alphanumeric chars in the string. It's unclear how to solve this in a clean and straightforward way, however.

Edit - This question is not a duplicate of this one; it incorporates the applicable solution from that post.
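A minimal sketch of the boundary behaviour described in the update, using a shortened, illustrative string rather than the original one. "\b" only matches between a word character (a letter, digit or underscore) and a non-word character, so it cannot match between '+' and a following space:

```python
import re

# Shortened sample string, chosen only for illustration
text = "c++ c# .net java"

# '+' followed by ' ' is non-word next to non-word: no boundary, so no match
with_boundary = re.findall(r"\b" + re.escape("c++") + r"\b", text)

# Dropping the trailing \b lets the escaped term match
without_trailing = re.findall(r"\b" + re.escape("c++"), text)

# A purely alphanumeric term is unaffected
plain_word = re.findall(r"\bjava\b", text)

print(with_boundary, without_trailing, plain_word)
```

The leading "\b" fails in the same way for '.net', since '.' is a non-word character preceded by a space.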

    import re

    my_list = ['java', 'c++', 'c#', '.net']
    my_string = ' python javascript c++ c++ c# .net java .net'
    matches = []

    for term in my_list:
        # Escape terms that contain regex metacharacters
        if any(x in term for x in ['+', '#', '.']):
            term = re.escape(term)

        print("\nlooking for term '%s'" % term)

        # Require whitespace (or the start/end of the string) around the term
        match = re.findall(r"(?:^|(?<=\s))" + term + r"(?=\s|$)", my_string, flags=re.IGNORECASE)
        matches.append(match)

Try this. The problem is the \b word boundary. In c++ there is no word boundary after the +, so it will not match. The same applies to the other terms.
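The same idea can be wrapped in a small helper. This is only a sketch: the name find_terms is hypothetical, re.escape() is applied to every term rather than only to those containing '+', '#' or '.', and the lookarounds are written as (?<!\S) and (?!\S), which express the same "whitespace or start/end of string" condition as the pattern above in a shorter form.

```python
import re

def find_terms(terms, text):
    """Return {term: matches} for whitespace-delimited occurrences of each term."""
    results = {}
    for term in terms:
        # (?<!\S): not preceded by a non-space character (start of string or whitespace)
        # (?!\S):  not followed by a non-space character (end of string or whitespace)
        pattern = r"(?<!\S)" + re.escape(term) + r"(?!\S)"
        results[term] = re.findall(pattern, text, flags=re.IGNORECASE)
    return results

my_list = ['java', 'c++', 'c#', '.net']
my_string = ' python javascript c++ c++ c# .net java .net'

found = find_terms(my_list, my_string)
for term in my_list:
    print(term, found[term])
```

Because the terms must be delimited by whitespace, an occurrence directly followed by punctuation (for example "c++,") would not be matched by this pattern.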

