文本中按规则分组区段随机抽样

该博客讨论如何从CSV文件中按特定规则随机抽样,要求在每个12小时时间段内抽取唯一位置,且位置之间不能在6小时内重叠。博主提出的问题涉及日期跨越和不重复采样,解决方案建议使用SPL编程语言,通过分组、过滤和随机选取实现所需功能。最终,抽样结果将写入新的文件。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

【问题】

This is a bit complex, and I greatly appreciate any help! I am trying to randomly sample rows from a .csv file. Essentially, I want a resulting file of unique locations (Locations are specified by Easting and Northing columns of the data file, below). I want to randomly pull 1 location per 12 hour period per SessionDate in this file (12 hour periods divided into: between 0631 and 1829 hours and between 1830 and 0630 hours; Given as Start: and End: in Data File, below); BUT if any 2 locations are within 6 hours of each other (based on their Start: time), for that location to be tossed, and a new location to be randomly drawn, and for this sampling to continue until no new locations are drawn (i.e., sampling WITHOUT replacement). I have been trying to do this with python, but my experience is very limited. I tried first putting each row into a dictionary, and recently each row into a list, as follows:

import random
import csv
f = open('file.csv', "U")
list = []
for line in f:
    list.append(line.split(','))

I'm unsure where to go from here - how to sample from these lists the way I need to, then write them to an output file with my'unique' locations.

Here is the top few lines of my data file:

SessionDate Start:  End:    Easting Northing
27-04-07   18:00   21:45   174739  9785206
28-04-07   18:00   21:30   171984  9784738
28-04-07   18:00   21:30   171984  9784738
28-04-07   18:00   21:30   171984  9784738
28-04-07   18:00   21:30   171984  9784738

It gets a bit complicated as some of the observations span midnight, so they may be on different dates, but can be within 6 hours of each other (which is why I have this criterion), for example:

SessionDate Start:  End:    Easting Northing
27-04-07   22:30   23:25   171984  9784738
28-04-07   0:25    1:30    174739  9785206

【回答】

这类问题建议用SPL,可以简单且完整地实现设计思路:

ABCD
1=file("D:\\source.txt").import@t(SessionDate: date:"dd-MM-yy",Start:time:"HH:mm",End:time:"HH:mm",Easting,Northing).derive(datetime(SessionDate,Start):DateStart)
2for A1.group(Easting,Northing)=null=A2.sort(DateStart).group@o(between(Start,time("6:30","HH:mm"):time("18:30","HH:mm")))
3for C2=if(B2,B3.select(interval@s(B3.DateStart,DateStart)>60*60*6),B3)
4if C3!=[]=@|(B2=C3(rand(C3.len())+1))
5=file("D:\\result.txt").export@t(D4)

A1:读入文件,第一行作为列名,并设置SessionDate、Start和End的数据类型,最后将SessionDate和Start拼成完整日期DateStart。

A2:将A1按Easting,Northing分组,循环每一组。循环体即缩进的B2-D4,可用A2来引用循环变量。类似地,B3的循环体是C3-D4。

B2:暂存上一组选出记录,初值为null。

C2:将A2再按DateStart每隔12小时交替分组。@o表示不排序直接分组。

B3:循环C3每组数据。

C3:第1组(B2==null)数据直接返回,否则过滤出与B2间隔大于6小时的记录。

C4-D4:如果C3过滤的结果不空,则随机取记录存入B2,并将该记录追加到D4。@表示当前格(D4)。

D4结果为:

A5:将存在D4中的计算结果写出到文件result.txt。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值