regex - Python re.sub with a list of words to find


I'm not very familiar with re and am trying to iterate over a list, using re.sub to take multiple items out of a large block of text held in the variable first_word.

I use re.sub to remove tags first, and that works fine. Next I want to remove all the strings in the exclusionlist variable, and I'm not sure how to do this.

Thanks for any help. Here is the code that raises the exception.

    exclusionlist = ['+', 'of', '<et>f.', 'to', 'the', '<l>l.</l>']
    for a in range(0, len(exclusionlist)):
        first_word = re.sub(exclusionlist[a], '', first_word)

And the exception:

        first_word = re.sub(exclusionlist[a], '',first_word)
      File "/library/frameworks/python.framework/versions/2.7/lib/python2.7/re.py", line 151, in sub
        return _compile(pattern, flags).sub(repl, string, count)
      File "/library/frameworks/python.framework/versions/2.7/lib/python2.7/re.py", line 245, in _compile
        raise error, v # invalid expression
    error: nothing to repeat

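The plus symbol is an operator in regex meaning "one or more repetitions of the preceding item". For example, x+ means one or more repetitions of x. A bare + has nothing before it to repeat, which is why the pattern fails to compile. If you want to find and replace actual + signs, you need to escape the plus like this: re.sub(r'\+', '', string). So change the first entry in your exclusionlist.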
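As a minimal sketch of the escaping point above, assuming every entry in exclusionlist is meant to be matched literally: the standard library's re.escape escapes each metacharacter for you, so the list does not have to be edited by hand. The sample text in first_word is made up for illustration.

```python
import re

exclusionlist = ['+', 'of', '<et>f.', 'to', 'the', '<l>l.</l>']
first_word = 'a + b'  # hypothetical sample text, not from the question

# An unescaped '+' is a quantifier with nothing before it, so compiling fails.
try:
    re.sub('+', '', first_word)
    raised = False
except re.error:
    raised = True

# re.escape turns each entry into a pattern that matches it literally.
for item in exclusionlist:
    first_word = re.sub(re.escape(item), '', first_word)
```

Escaping also covers the . in '<et>f.' and '<l>l.</l>', which would otherwise match any character rather than a literal dot.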

You can also eliminate the loop, like this:

    exclusions = '|'.join(exclusionlist)
    first_word = re.sub(exclusions, '', first_word)

The pipe symbol | indicates a disjunction in regex, so x|y|z matches x or y or z.
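The joined pattern can be combined with escaping so that no entry needs editing by hand. This is a sketch under the assumption that all entries are literal strings; strip_exclusions is a hypothetical helper name, and the sample input is invented.

```python
import re

exclusionlist = ['+', 'of', '<et>f.', 'to', 'the', '<l>l.</l>']

# Escape each entry, then join with | so one compiled pattern covers them all.
pattern = re.compile('|'.join(re.escape(item) for item in exclusionlist))


def strip_exclusions(text):
    """Remove every literal occurrence of an excluded string (hypothetical helper)."""
    return pattern.sub('', text)


result = strip_exclusions('1 + 2 <l>l.</l>')
```

Note that plain alternation also matches inside longer words (for example the "to" in "tomato"). If only whole words should be removed, wrapping the word entries in \b word boundaries is one option.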

