Learning Perl: 8.7. General Quantifiers


8.7. General Quantifiers

A quantifier in a pattern means to repeat the preceding item a certain number of times. You've seen three quantifiers: *, +, and ?. But if none of those three suits your needs, use a comma-separated pair of numbers inside curly braces ({ }) to specify how few and how many repetitions are allowed.

The pattern /a{5,15}/ will match from five to fifteen repetitions of the letter a. If the a appears only three times, that's too few, so it won't match. If it appears five times, it's a match. If it appears ten times, that's still a match. If it appears twenty times, the pattern still matches, but just the first fifteen will be part of the match since that's the upper limit.
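To see those four cases side by side, here is a small sketch that builds strings of three, five, ten, and twenty a's and tries the pattern against each. The variable names are made up for illustration; $& is Perl's match variable holding the part of the string that actually matched.

```perl
use strict;
use warnings;

my %matched_length;   # run length => how many a's the match covered

for my $count (3, 5, 10, 20) {
    my $string = 'a' x $count;
    if ($string =~ /a{5,15}/) {
        # $& holds the part of the string that actually matched
        $matched_length{$count} = length $&;
        print "$count a's: matched $matched_length{$count} of them\n";
    } else {
        print "$count a's: no match\n";
    }
}
```

The string of twenty a's still matches; the quantifier only stops the match from taking more than fifteen of them.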

If you omit the second number (but include the comma), there's no upper limit to the number of times the item will match. So, /(fred){3,}/ will match if there are three or more instances of fred in a row (with no extra characters, like spaces, allowed between each fred and the next). There's no upper limit, so that would match 88 instances of fred, if you had a string with that many.
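A quick way to check the open-ended form is to try it on a few hand-built strings. This is only a sketch; the sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my @strings = (
    'fredfred',          # only two in a row: too few
    'fredfredfred',      # three in a row: enough
    'fred fred fred',    # spaces interrupt the run
    'fred' x 88,         # no upper limit, so this is fine too
);

my %verdict_for;
for my $string (@strings) {
    $verdict_for{$string} = $string =~ /(fred){3,}/ ? 'matches' : 'no match';

    # Shorten very long strings so the report stays readable
    my $label = length($string) > 20 ? substr($string, 0, 17) . '...' : $string;
    print "$verdict_for{$string}: $label\n";
}
```

The parentheses matter here: without them, /fred{3,}/ would apply the quantifier to the letter d alone.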

If you omit the comma as well as the upper bound, the number given is an exact count: /\w{8}/ will match exactly eight word characters (occurring as part of a larger string, perhaps). And /,{5}chameleon/ matches "comma comma comma comma comma chameleon." By George, that is nice.
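The following sketch tries both exact-count patterns. The test strings are invented for illustration. Note that "exact" describes the matched portion, not the whole string: since the pattern isn't anchored, a string with six commas still contains a run of five followed by chameleon.

```perl
use strict;
use warnings;

# Exactly eight word characters, found inside a longer word.
# "general" has only seven, so the match comes from "quantifiers".
my $eight = 'general quantifiers' =~ /\w{8}/ ? $& : undef;
print "Eight word characters: $eight\n";

# Exactly five commas directly before "chameleon".
my %comma_test;
for my $commas (4, 5, 6) {
    my $string = (',' x $commas) . 'chameleon';
    $comma_test{$commas} = $string =~ /,{5}chameleon/ ? 1 : 0;
    print "$commas commas: ", ($comma_test{$commas} ? 'matches' : 'no match'), "\n";
}
```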

The three quantifier characters that you saw earlier are common shortcuts. The star is the same as the quantifier {0,}, meaning zero or more. The plus is the same as {1,}, meaning one or more. And the question mark could be written as {0,1}. In practice, it's unusual to need any curly-brace quantifiers since the three shortcut characters are nearly always the only ones needed.
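If you'd like to convince yourself that each shortcut behaves like its curly-brace spelling, a sketch along these lines compares them on a few sample strings. The strings and the $mismatches counter are made up for this example; it is a spot check, not a proof.

```perl
use strict;
use warnings;

my $mismatches = 0;

for my $string (qw( frd fred freed )) {
    # Each shortcut is followed by its curly-brace equivalent
    my @hits = (
        $string =~ /fre*d/     ? 1 : 0,
        $string =~ /fre{0,}d/  ? 1 : 0,
        $string =~ /fre+d/     ? 1 : 0,
        $string =~ /fre{1,}d/  ? 1 : 0,
        $string =~ /fre?d/     ? 1 : 0,
        $string =~ /fre{0,1}d/ ? 1 : 0,
    );

    # Count any pair whose two spellings disagree
    $mismatches++ if $hits[0] != $hits[1];
    $mismatches++ if $hits[2] != $hits[3];
    $mismatches++ if $hits[4] != $hits[5];

    print "$string: star=$hits[0] plus=$hits[2] question=$hits[4]\n";
}

print "Pairs that disagreed: $mismatches\n";
```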
