python - Matching terms that contain special characters with re.findall()?
Using re.findall(), I'm attempting to find all occurrences of each term from a list of terms in a string.
If a particular term contains special characters (e.g. a '+'), a match will not be found, or error messages may be generated. Using re.escape() avoids the error messages, but the terms with special characters are still not found within the string.
    import re

    my_list = ['java', 'c++', 'c#', '.net']
    my_string = ' python javascript c++ c++ c# .net java .net'
    matches = []

    for term in my_list:
        # Escape terms that contain regex metacharacters
        if any(x in term for x in ['+', '#', '.']):
            term = re.escape(term)
        print("\nlooking for term '%s'" % term)
        match = re.findall("\\b" + term + "\\b", my_string, flags=re.IGNORECASE)
        matches.append(match)

The above code will only find 'java' within the string. Any suggestions on how to find the terms with special characters within the string?
Caveat: I cannot change 'my_list' manually, because I don't know in advance what terms it will contain.
Update - It appears that the problem has to do with the word boundary specifiers within the regex (the "\b"), which break the string along characters that include the non-alphanumeric characters in the terms. It's unclear how to solve this in a clean and straightforward way, however.
Edit - This question is not a duplicate of this - it already incorporates the applicable solution from that post.
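To make the word boundary issue concrete, here is a small sketch using the same sample string. The variable names are only for illustration. It shows that "\b" matches only between a word character and a non-word character, so it cannot match between '+' and a space, or between a space and '.'.

```python
import re

my_string = ' python javascript c++ c++ c# .net java .net'

# \b matches only between a word character ([A-Za-z0-9_]) and a non-word one.
# After the final '+' of 'c++' comes a space: both are non-word characters,
# so the trailing \b cannot match there.
with_both_boundaries = re.findall(r"\bc\+\+\b", my_string)
print(with_both_boundaries)   # []

# Dropping the trailing \b lets the escaped term match.
without_trailing = re.findall(r"\bc\+\+", my_string)
print(without_trailing)       # ['c++', 'c++']

# '.net' has the same problem at the front: a space followed by '.'.
dotnet_matches = re.findall(r"\b\.net\b", my_string)
print(dotnet_matches)         # []
```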
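    import re

    my_list = ['java', 'c++', 'c#', '.net']
    my_string = ' python javascript c++ c++ c# .net java .net'
    matches = []

    for term in my_list:
        # Escape terms that contain regex metacharacters
        if any(x in term for x in ['+', '#', '.']):
            term = re.escape(term)
        print("\nlooking for term '%s'" % term)
        # Require whitespace (or the start/end of the string) around the term
        match = re.findall(r"(?:^|(?<=\s))" + term + r"(?=\s|$)", my_string, flags=re.IGNORECASE)
        matches.append(match)

Try this. The problem is the \b word boundary. In 'c++' there is no word boundary after the final '+', because both the '+' and the space that follows it are non-word characters, so the pattern does not match. The same applies to the other terms.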
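As a variation on the same idea, the sketch below wraps the search in a helper. The name find_terms is hypothetical and not from the post. It assumes terms are always delimited by whitespace or the ends of the string, and uses the negative lookarounds (?<!\S) and (?!\S), which should behave the same as the pattern above for this input. It also escapes every term unconditionally, since re.escape() leaves plain alphanumeric terms unchanged.

```python
import re

def find_terms(terms, text):
    """Return {term: [matches]} for terms delimited by whitespace or string ends.

    Hypothetical helper for illustration only.
    """
    results = {}
    for term in terms:
        # (?<!\S): not preceded by a non-whitespace character
        # (?!\S):  not followed by a non-whitespace character
        pattern = r"(?<!\S)" + re.escape(term) + r"(?!\S)"
        results[term] = re.findall(pattern, text, flags=re.IGNORECASE)
    return results

my_list = ['java', 'c++', 'c#', '.net']
my_string = ' python javascript c++ c++ c# .net java .net'

results = find_terms(my_list, my_string)
for term in my_list:
    print(term, results[term])
```

One limitation of this approach: a term directly followed by punctuation (for example 'c++,') would not match, because the comma is a non-whitespace character.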