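Python: How to tell whether a URL starts with http
Suppose a text file holds many strings, some starting with http and some not. How do you filter out the URLs?
For example, take a file test.txt with the following content: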
http://www.sogou.com
this is a url
this is http://www.sogou.com address
The first approach is to check whether the line contains http:
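#encoding: utf-8
with open("test.txt", "r") as f:
    content = f.readlines()
for line in content:
    # Keep any line that contains "http" anywhere in it
    if "http" in line:
        # Strip the trailing newline so print does not emit a blank line
        print(line.rstrip("\n"))
Output: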
http://www.sogou.com
this is http://www.sogou.com address
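The containment check can also be tried without a file on disk. The following is a minimal sketch, assuming the three sample lines are held in a list; `lines_containing` is a hypothetical helper name introduced here for illustration and is not part of the original post.

```python
def lines_containing(lines, needle="http"):
    """Return the lines that contain `needle` anywhere in the string."""
    return [line for line in lines if needle in line]


# The same three lines as test.txt, kept in memory (an assumption for the demo).
sample = [
    "http://www.sogou.com",
    "this is a url",
    "this is http://www.sogou.com address",
]

for line in lines_containing(sample):
    print(line)
```

Because `in` tests for a substring at any position, the third line is kept even though it does not start with http.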
To get only the lines that start with http:
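#encoding: utf-8
import re
with open("test.txt", "r") as f:
    content = f.readlines()
for line in content:
    # re.match only succeeds if "http" is at the very start of the line
    r = re.match("http", line)
    if r is not None:
        # Strip the trailing newline so print does not emit a blank line
        print(line.rstrip("\n"))
Output: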
http://www.sogou.com
re.match matches from the start of the string. If the pattern matches there, it returns a match object; otherwise it returns None.
Is there a simpler way?
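To make the "from the start" behavior concrete, here is a small sketch that compares re.match with re.search on the three sample lines. re.search does not appear in the original text; it is brought in here only as a contrast, since it scans the whole string rather than just the beginning.

```python
import re

# The same three lines as test.txt, kept in memory (an assumption for the demo).
samples = [
    "http://www.sogou.com",
    "this is a url",
    "this is http://www.sogou.com address",
]

# re.match only succeeds when the pattern matches at position 0.
match_results = [re.match("http", s) is not None for s in samples]

# re.search scans the whole string, so it also finds "http" mid-line.
search_results = [re.search("http", s) is not None for s in samples]

print(match_results)   # [True, False, False]
print(search_results)  # [True, False, True]
```

Only the first line passes the re.match test, which is why the second approach prints a single URL.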
#encoding: utf-8
with ope