Excel 工作表做 JOIN

在合并两个Excel工作表时,需求是如果单个位置坐标位于范围坐标内,则将它们合并到同一行。讨论了使用SQL和VBA的解决方案,但建议使用SPL进行更简洁的操作,能快速实现对数千行数据的处理,并生成期望的输出结果。
部署运行你感兴趣的模型镜像

【问题】

I’m working on combining two excel worksheets. Before I start, I’d like to mention that I also have mysql workbench, so I’m open to working on this issue in either sql or vba (I should learn both). I’m working with .bed files, which are lists of genomic coordinates. In short, the data is indexed by chromosome number (ie:chr2) and then has a numerical start and stop location on the chromosome. These numerical locations can span a large range (ie:100-10,000) or be a single position (ie: 999-1000). I have a list of coordinates that cover a large range, and in a separate file I have a list of single positions.

Example of a file with ranges:

chromosome    start   stop
chr1          45616321
chr3          984211253

Example of file with single positions:

chromosome   start   stop
chr1          52135214
chr3          1025410255

I would like to combine these worksheets such that if a location in my list of single positions is found within the range in my list of ranges, the locations for both are listed in the same row. The lists are 1000s of locations long, so I’d also like this program to loop through every row. Using the example data listed above, I’d like my output to look like the following:

Example of desired output:

chromosome  start   stop  chromosome  start  stop
chr1         45616321    chr1      52135214
chr3         984211253    chr3     1025410255

There is a high probability that multiple single positions will fall within a single range, and I would like these to be listed as separate rows.

有人给出 SQL: 正确,只是要出库入库,稍显麻烦。

select a.chromosome, a.start, a stop, b.chromosome, b.start, b.stop from ranges_table a, positions_table b where b.start >= a.start and b.stop <= a.stop

有人给出 VBA:正确,SQL+ 出库入库代码。

SubSqlJoin()
Dim oConnAsNew ADODB.Connection
Dim oRSAsNew ADODB.Recordset
Dim sPath
Dim sSQLAsString, wbAsWorkbook
Set wb =ThisWorkbook
    sSQL ="select a.chromosome, a.start, a stop," & _
"b.chromosome, b.start, b.stop" & _
"from <ranges_table> a, <positions_table> b" & _
"where b.start >= a.start and b.stop <= a.stop"
    sSQL =Replace(sSQL,“<ranges_table>”, _
Rangename(wb.Worksheets(“Ranges”).Range(“A1”).CurrentRegion))
    sSQL =Replace(sSQL,“<positions_table>”, _
Rangename(wb.Worksheets(“Positions”).Range(“A1”).CurrentRegion))
If wb.Path <>""Then
      sPath = wb.FullName
Else
MsgBox"The workbook must be saved first!"
ExitSub
EndIf
    oConn.Open"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=‘" & sPath &"’;" & _
“Extended Properties=‘Excel 12.0;HDR=Yes;IMEX=1’;”
    oRS.Open sSQL, oConn
IfNot oRS.EOFThen
        wb.Worksheets(“Results”).Range(“A2”).CopyFromRecordset oRS
Else
MsgBox"No records found"
EndIf
    oRS.Close
    oConn.Close
EndSub
FunctionRangename(rAsRange)AsString
Rangename =“[” & r.Parent.Name &“$” & _
               r.Address(False,False) &“]”
EndFunction

【回答】

用 VBA 太麻烦了,用 SPL 几句就算出来了

A
1=file(“D:\\range.xlsx”).xlsimport@t()
2=file(“D:\\position.xlsx”).xlsimport@t()
3=xjoin(A1;A2,start>A1.start && stop<A1.stop)
4=file(“D:\\result.xlsx”).xlsexport(A3,#1.chromosome,#1.start,#1.stop,#1.chromosome,#2.start,#2.stop)

运行结果:

A1:读取 excel 文件返回序表

A2:读取 excel 文件返回序表

A3:将两个序列进行叉乘

A4:将序表输出到新 excel 文件中

您可能感兴趣的与本文相关的镜像

Stable-Diffusion-3.5

Stable-Diffusion-3.5

图片生成
Stable-Diffusion

Stable Diffusion 3.5 (SD 3.5) 是由 Stability AI 推出的新一代文本到图像生成模型,相比 3.0 版本,它提升了图像质量、运行速度和硬件效率

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值