regex - Python re.sub with a list of words to find
I'm not very familiar with re, and I'm trying to iterate over a list and use re.sub to take out multiple items from a large block of text held in a variable, first_word.
I use re.sub to remove tags first, and that works fine. Next I want to remove all the strings in the exclusionlist variable, and I'm not sure how to do this.
Thanks for any help. Here is the code that raises the exception.
    exclusionlist = ['+','of','<et>f.','to','the','<l>l.</l>']
    for a in range(0, len(exclusionlist)):
        first_word = re.sub(exclusionlist[a], '', first_word)
And the exception:
        first_word = re.sub(exclusionlist[a], '', first_word)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 151, in sub
        return _compile(pattern, flags).sub(repl, string, count)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 245, in _compile
        raise error, v # invalid expression
    error: nothing to repeat
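The plus symbol is an operator in regex, meaning "one or more repetitions of the preceding item". E.g., x+ means one or more repetitions of x. The first entry in your exclusionlist is a bare +, so there is nothing before it to repeat, which is what the error says. If you want to find and replace actual + signs, you need to escape them, like this: re.sub(r'\+', '', string). So change the first entry in your exclusionlist to an escaped plus.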
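To make the escaping point concrete, here is a minimal sketch. The sample string text is made up for illustration, and re.escape is a standard-library helper that the original post does not use. It shows the unescaped pattern failing and two ways of matching a literal plus sign.

```python
import re

text = "1 + 2 + 3"  # hypothetical sample input

# An unescaped '+' is a quantifier with nothing before it,
# so the pattern fails to compile.
try:
    re.sub('+', '', text)
    failed = False
except re.error:
    failed = True

# Escaping the plus sign makes it match a literal '+'.
no_plus = re.sub(r'\+', '', text)

# re.escape builds the escaped pattern from any literal string.
same_result = re.sub(re.escape('+'), '', text)
```

Using re.escape is probably the safer habit when the strings come from a list, since you don't have to spot the special characters yourself.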
You can also eliminate the for loop, like this:
    exclusions = '|'.join(exclusionlist)
    first_word = re.sub(exclusions, '', first_word)
The pipe symbol | indicates a disjunction in regex, so x|y|z matches x or y or z.
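Putting the two ideas together, a rough sketch of the single-pattern approach might look like the following. It assumes every entry in exclusionlist is meant as a literal string, so each one is passed through re.escape before joining. The sample value of first_word is made up for illustration.

```python
import re

exclusionlist = ['+', 'of', '<et>f.', 'to', 'the', '<l>l.</l>']

# Escape each entry so '+', '.', and similar characters match literally,
# then join the entries with '|' to form one alternation pattern.
pattern = re.compile('|'.join(re.escape(item) for item in exclusionlist))

first_word = "the + sign <l>l.</l> of"  # hypothetical sample text
cleaned = pattern.sub('', first_word)
```

Note that this removes the strings wherever they occur, including inside longer words (the "to" in "tomorrow", for example), and it leaves the surrounding whitespace in place. If that matters for your text, the pattern would need extra handling.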