LD regression

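This post describes a method for estimating the decay of linkage disequilibrium (LD) with distance in R, based on the formula proposed by Hill and Weir. The population recombination parameter is estimated by fitting a non-linear model.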


source: http://www.r-bloggers.com/estimate-decay-of-linkage-disequilibrium-with-distance/


It is well known that linkage disequilibrium (LD) decays with distance. Several functions have been proposed to estimate such decay. Among the most widely used are the Hill and Weir (1) formula for describing the decay of r² and a formula proposed by Abecasis (2) for describing the decay of D'.
I wrote R functions to estimate the decay of LD according to both formulas for a paper I recently published (3), but I post here only the one according to Hill and Weir (just because it is the only one currently in a "publishable" form!). Please refer to the original publications for details. Here I just use a non-linear model to fit the data to the decay function.
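
For reference, the expectation that the code in this post fits can be written out as below. This is only a transcription of the R expression used here, with ρ standing for C × distance and n for the sample size; see Hill and Weir (1) for the derivation and the assumptions behind it.

```latex
E(r^2) = \left[\frac{10+\rho}{(2+\rho)(11+\rho)}\right]
         \left[1+\frac{(3+\rho)\,(12+12\rho+\rho^2)}{n\,(2+\rho)(11+\rho)}\right],
\qquad \rho = C \cdot \text{distance}
```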

Input:
n: sample size
LD.data: estimates of LD, as r², between pairs of markers
distance: the distance between the two markers of each pair
(note that LD.data and distance must be in the same order and of the same length, since element i of each vector describes the same pair of markers)
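
One way to keep the two vectors aligned is to build both from the same table of marker pairs. The sketch below assumes you start from a vector of marker positions and a symmetric matrix of pairwise r² values; the names pos, r2.matrix and pairs, and all the numbers, are hypothetical and only serve as illustration.

```r
# Hypothetical inputs for illustration: marker positions and a symmetric
# matrix of pairwise r2 values, with markers in the same order in both.
pos <- c(100, 119, 149, 230)
r2.matrix <- matrix(c(1.00, 0.40, 0.10, 0.05,
                      0.40, 1.00, 0.30, 0.02,
                      0.10, 0.30, 1.00, 0.20,
                      0.05, 0.02, 0.20, 1.00), nrow = 4, byrow = TRUE)

# Row/column indices of every marker pair, each pair taken once
# from the upper triangle of the matrix.
pairs <- which(upper.tri(r2.matrix), arr.ind = TRUE)

# Both vectors are derived from the same index table, so element i of
# distance and element i of LD.data refer to the same pair of markers.
distance <- abs(pos[pairs[, "col"]] - pos[pairs[, "row"]])
LD.data <- r2.matrix[pairs]

stopifnot(length(distance) == length(LD.data))
```

Taking only the upper triangle avoids counting each pair twice and leaves out the diagonal, where a marker is compared with itself.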

Output:
HW.nonlinear: object obtained after fitting the non-linear model
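new.rho: estimate of the population recombination parameter per unit of distance (the fitted coefficient C; the parameter for a given pair of markers is C × distance)
fpoints: expected r² at each observed distance, obtained from the fitted non-linear model.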

Below you find the commands, including some sample data. Any feedback is appreciated!

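# Sample data: pairwise distances and the matching r2 estimates
distance <- c(19, 49, 81, 91, 104, 131, 158, 167, 30)
LD.data <- c(0, 0.07, 0.018, 0.007, 0, 0.09, 0.09, 0.05, 0)
n <- 52  # sample size

# Starting value for C, the recombination parameter per unit of distance
HW.st <- c(C = 0.1)
# Fit the Hill and Weir expectation of r2 by non-linear least squares
HW.nonlinear <- nls(LD.data ~ ((10 + C*distance) / ((2 + C*distance) * (11 + C*distance))) * (1 + ((3 + C*distance) * (12 + 12*C*distance + (C*distance)^2)) / (n * (2 + C*distance) * (11 + C*distance))), start = HW.st, control = nls.control(maxiter = 100))
tt <- summary(HW.nonlinear)
new.rho <- tt$parameters[1]  # point estimate of C
# Expected r2 at each observed distance under the fitted model
fpoints <- ((10 + new.rho*distance) / ((2 + new.rho*distance) * (11 + new.rho*distance))) * (1 + ((3 + new.rho*distance) * (12 + 12*new.rho*distance + (new.rho*distance)^2)) / (n * (2 + new.rho*distance) * (11 + new.rho*distance)))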
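
Since the same expression appears twice in the commands above, it may be convenient to wrap it in a function and reuse it both for the fit and for drawing a smooth fitted curve. What follows is a minimal sketch, not part of the published analysis: the names hw.expected.r2, C.hat, grid and curve.r2 are hypothetical, and the data are synthetic (generated from the formula with C = 0.5 plus a small deterministic perturbation) so that the example runs on its own.

```r
# Expected r2 under the Hill and Weir formula used in this post.
# C: recombination parameter per unit of distance; n: sample size.
# The function name is hypothetical, introduced only for illustration.
hw.expected.r2 <- function(C, distance, n) {
  rho <- C * distance
  ((10 + rho) / ((2 + rho) * (11 + rho))) *
    (1 + ((3 + rho) * (12 + 12 * rho + rho^2)) / (n * (2 + rho) * (11 + rho)))
}

# Synthetic data (an assumption, not real LD estimates): the formula
# evaluated at C = 0.5, plus a small deterministic perturbation.
n <- 52
distance <- seq(10, 500, by = 10)
LD.data <- hw.expected.r2(0.5, distance, n) + 0.002 * sin(seq_along(distance))

# Same kind of non-linear fit as above, with the formula factored out.
fit <- nls(LD.data ~ hw.expected.r2(C, distance, n), start = c(C = 0.1),
           control = nls.control(maxiter = 100))
C.hat <- coef(fit)[["C"]]

# Evaluate the fitted curve on a dense, sorted grid of distances.
grid <- seq(min(distance), max(distance), length.out = 200)
curve.r2 <- hw.expected.r2(C.hat, grid, n)

# Draw observed points and fitted curve when running interactively.
if (interactive()) {
  plot(distance, LD.data, pch = 19, xlab = "distance", ylab = "r2")
  lines(grid, curve.r2, lty = 3)
}
```

Evaluating the curve on a sorted grid rather than at the observed distances matters when the distances are not in increasing order, as in the sample data above, because lines() connects points in the order given.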

References:
(1) Hill WG, Weir BS (1988) Variances and covariances of squared linkage disequilibria in finite populations. Theor Popul Biol 33:54–78
(2) Abecasis GR et al (2001) Extent and distribution of linkage disequilibrium in three genomic regions. Am J Hum Genet 68:191–197
(3) Marroni et al (2011) Nucleotide diversity and linkage disequilibrium in Populus nigra cinnamyl alcohol dehydrogenase (CAD4) gene. Tree Genetics & Genomes, DOI 10.1007/s11295-011-0391-5.

