使用正则表达式 A液:
我知道d表示插入到thm中的字符串未被转义。 因此,在将其添加到正则表达式之前,您需要使用正则表达式语言中的 附加义务来跳过所有符号 - 这里是?和.。
我将?替换为[?]{1}和.并使用\.。结果正则表达式现在匹配测试字符串。
import re
thm = '/public_media/cache/84/b5/84b59e293cbdb7041b68a84977d62cf3.jpg?image_pk=82'
all_html_code = '''
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf '''
escaped_thm = thm.replace('.', '\.').replace('?','[?]{1}')
p = re.compile(r'' % escaped_thm)
test_img = ''''''
print p.match(test_img)
new_img_tag = ''
print p.sub(new_img_tag, all_html_code)
顺便说一句,你为什么要找?您可以直接替换的src属性:
escaped_thm = thm.replace('.', '\.').replace('?','[?]{1}')
p = re.compile(r'src="(%s)"' % escaped_thm)
replacement = '''src="/python/logo.jpg"'''
print p.sub(replacement, all_html_code)
输出1
lksj lksdfj ... lksdjf
输出2
lksj lksdfj ... lksdjf
询问逃脱正则表达式的符号有道之后(Regular expression to escape regular expressions) - 我可以回复修改re.escape而不是两个replace方法。
使用LXML
我宁愿像这样使用XPath。