Regular Expressions: Retrieving Data for a Crawler with JavaScript's exec, Java's find, and Python's findall

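This article uses a worked example to show how to extract target names in a fixed format with regular expressions in JavaScript, Java, and Python, giving the code for each language along with its output.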

Goal

Retrieving data for a crawler: given the text
There is the first target: RipperAaron, with another target: GZhao. Then we have the third target: RipperLOM.
extract the name that follows each "target: ", namely RipperAaron, GZhao, and RipperLOM.
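
All three implementations in this article rely on the same pattern, (target:\s)(\w+)\b. As a rough illustration of what each part does, the sketch below writes that pattern in Python's verbose mode with one comment per piece. The names TARGET_PATTERN, text, and first are hypothetical and exist only for this example.

```python
import re

# TARGET_PATTERN is a hypothetical name used only in this sketch.
TARGET_PATTERN = re.compile(
    r"""
    (target:\s)   # group 1: the literal label followed by one whitespace character
    (\w+)         # group 2: the name, one or more word characters
    \b            # word boundary right after the name
    """,
    re.VERBOSE,
)

text = (
    "There is the first target: RipperAaron, with another target: GZhao. "
    "Then we have the third target: RipperLOM. "
)

# search() returns only the first match; group(0) is the whole match.
first = TARGET_PATTERN.search(text)
print(first.group(0))  # target: RipperAaron
print(first.group(2))  # RipperAaron
```

Since \w+ is greedy, it already stops at the end of the name, so the trailing \b does not change which text is matched here.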

1. JavaScript's exec

JavaScript code

<script>
    var str = "There is the first target: RipperAaron, with another target: GZhao. ";
    str += "Then we have the third target: RipperLOM. ";
    // Group 1 captures the "target: " label; group 2 captures the name after it.
    var reg = /(target:\s)(\w+)\b/g;
    var result;
    var targets = [];
    // With the g flag, each exec() call resumes at reg.lastIndex and returns null once no match remains.
    while ((result = reg.exec(str)) !== null) {
        console.log(result);
        console.log("Full match\t" + result[0]);
        console.log("Group 1\t" + result[1]);
        console.log("Group 2\t" + result[2]);
        targets.push(result[2]);
        console.log("Match index\t" + result["index"]);
    }
    console.log(targets);
</script>

The final result is ["RipperAaron", "GZhao", "RipperLOM"]. Console output:

(3) ["target: RipperAaron", "target: ", "RipperAaron", index: 19, input: "There is the first target: RipperAaron, with anoth…: GZhao. Then we have the third target: RipperLOM", groups: undefined]
Full match	target: RipperAaron
Group 1	target: 
Group 2	RipperAaron
Match index	19
(3) ["target: GZhao", "target: ", "GZhao", index: 53, input: "There is the first target: RipperAaron, with anoth…: GZhao. Then we have the third target: RipperLOM", groups: undefined]
Full match	target: GZhao
Group 1	target: 
Group 2	GZhao
Match index	53
(3) ["target: RipperLOM", "target: ", "RipperLOM", index: 91, input: "There is the first target: RipperAaron, with anoth…: GZhao. Then we have the third target: RipperLOM", groups: undefined]
Full match	target: RipperLOM
Group 1	target: 
Group 2	RipperLOM
Match index	91
(3) ["RipperAaron", "GZhao", "RipperLOM"]

2. Java's find

Java code

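import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindTargets {
	public static void main(String[] args) {
		String input = "There is the first target: RipperAaron, with another target: GZhao. ";
		input += "Then we have the third target: RipperLOM. ";
		// Backslashes are doubled because this is a Java string literal.
		String regex = "(target:\\s)(\\w+)\\b";
		Pattern p = Pattern.compile(regex);
		Matcher m = p.matcher(input);
		List<String> targets = new ArrayList<String>();
		// Each find() call advances to the next match and returns false when none is left.
		while (m.find()) {
			System.out.println(m.group(0));
			System.out.println(m.group(1));
			System.out.println(m.group(2));
			targets.add(m.group(2));
			System.out.println("start:" + m.start() + ",end:" + m.end());
		}
		System.out.println(targets);
	}
}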

The final result is ["RipperAaron", "GZhao", "RipperLOM"]. Program output:

target: RipperAaron
target: 
RipperAaron
start:19,end:38
target: GZhao
target: 
GZhao
start:53,end:66
target: RipperLOM
target: 
RipperLOM
start:91,end:108
[RipperAaron, GZhao, RipperLOM]

3. Python's findall

Python code

import re

input1 = "There is the first target: RipperAaron, with another target: GZhao. "
input1 += "Then we have the third target: RipperLOM. "
# Raw string, so \s, \w and \b reach the regex engine unchanged.
regex = r"(target:\s)(\w+)\b"
pattern = re.compile(regex)
# With two capture groups, findall() returns a list of (group 1, group 2) tuples.
matches = pattern.findall(input1)
print(matches)
targets = [match[1] for match in matches]
print(targets)

The final result is ["RipperAaron", "GZhao", "RipperLOM"]. Program output:

[('target: ', 'RipperAaron'), ('target: ', 'GZhao'), ('target: ', 'RipperLOM')]
['RipperAaron', 'GZhao', 'RipperLOM']
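
Unlike the exec and find loops, findall() does not report where each match occurred. As a minimal sketch (not part of the original code), re.finditer() yields match objects that expose start() and end(), and a pattern with a single capture group makes findall() return plain strings instead of tuples. The variable names text and names are assumptions made for this example.

```python
import re

text = (
    "There is the first target: RipperAaron, with another target: GZhao. "
    "Then we have the third target: RipperLOM. "
)
pattern = re.compile(r"(target:\s)(\w+)\b")

# finditer() yields one match object per hit, so positions are available.
targets = []
for match in pattern.finditer(text):
    targets.append(match.group(2))
    print(match.group(0), "start:", match.start(), "end:", match.end())
print(targets)

# With only one capture group, findall() returns a list of strings.
names = re.findall(r"target:\s(\w+)", text)
print(names)
```

The reported positions should agree with the index values from the JavaScript run and the start/end values from the Java run (19 to 38, 53 to 66, 91 to 108).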

Related reading: functions and methods for using regular expressions in JavaScript, Java, and Python

Regular expressions: basic syntax in JavaScript, Java, and Python

See https://blog.youkuaiyun.com/weixin_43473435/article/details/83831719

Python: regex syntax and the re module's match(), search(), findall(), sub(), and split()

See https://blog.youkuaiyun.com/weixin_43473435/article/details/83830707

Java: regex syntax, lookingAt(), matches(), and find()

See https://blog.youkuaiyun.com/weixin_43473435/article/details/83793500

JavaScript: regex syntax, reg.test(), reg.exec(), str.search(), and str.replace()

See https://blog.youkuaiyun.com/weixin_43473435/article/details/83756259
