<Question3> of R & Biostatistics

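Question 1

Requirement: complete the following tasks in R and write out the R code for each.

  1. Check R's current working directory, set the working directory to the folder that contains the data, and list the files in that folder;
  2. Import the data file homework3_data.csv into R;
  3. View the number of rows and columns, the first 5 rows, and the data type;
  4. Compute descriptive statistics for the measured values and draw a boxplot;
  5. Download and install the R package pwr, and read its help documentation to learn how to use it.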

Question 2

R language application. Please use R to resolve the following issues and display your R code and results.

  1. For a normal random variable X with mean 4.0 and standard deviation 1.0, find the probability that X is less than 2.0. Find the value K so that P(X>K) = 0.05.

  2. When tossing a fair coin 8 times, find the probability of seeing no heads (hint: this is a binomial distribution). Find the probability of seeing exactly 4 heads. Find the probability of seeing more than 5 heads.

  3. Simulate a sample of 1000 random data points from a normal distribution with mean 100 and standard deviation 8, and store the result in a vector. Plot a histogram and a boxplot of the vector you just created. Using the data above, test the hypothesis that the mean equals 100 (using t.test).

Question 3

Company A produces biological reagents and some laboratory equipment. The weekly production of reagent M follows a normal distribution with a mean of 200 and a standard deviation of 16. Recently, new production methods have been introduced, and a sample of 50 weekly production figures for reagent M has a mean of 203.5.

  1. The boss would like to investigate whether there has been a change in weekly production of reagent M. Test using the 0.01 significance level.

  2. Suppose the boss wants to know whether there has been an increase in weekly production of reagent M. To put it another way, can we conclude, because of the improved production methods, that the mean production of M was more than 200? Test using the 0.01 significance level.

Question 4

This question uses the data file “datasets.csv”, which derives from a microarray dataset investigating gene expression in a certain disease. The data has been processed: the first row is the sample serial number (S1 - S20), and the first column is the gene (G1 - G100). The numbers are the expression values of each gene. Please answer the following questions (R code required).

  1. Please draw a density plot (PDF) to investigate the distribution of G3 gene expression among the 20 samples, and then calculate its minimum, median and variance using the appropriate R functions.

  2. Please draw a boxplot to compare the distribution of expression across all genes among the 20 samples. Check whether there are any outliers. If there are, delete them and redraw the boxplot.


---
title: "Question"
author: "HHTING"
output:
  pdf_document:
    keep_tex: yes
    latex_engine: xelatex
  word_document: default
  html_document:
    df_print: paged
header-includes: \usepackage{ctex}
---

Question 1

1.

Check R's current working directory

getwd()

Set the working directory to the folder that contains the data

setwd("D:/")

List the files in that directory

dir()

2.

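Import the data file into R (the assignment calls it homework3_data.csv; the local copy read here is named homework3-1_data.csv)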

data3_1 = read.csv(file = "./homework3-1_data.csv", header = TRUE)

3.

View the number of rows and columns

Number of rows:

nrow(x = data3_1)

Number of columns:

ncol(x = data3_1)

or

dim(x = data3_1)

View the first 5 rows

head(x = data3_1, n = 5)

View the data type

class(x = data3_1)

or

mode(x = data3_1)

or

typeof(x = data3_1)
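
For a data frame, class() reports only the type of the container, so the type of each column is often the more useful answer to "what is the data type". A minimal sketch, assuming a small made-up data frame `demo` (hypothetical, not the homework data, whose column names are not shown here):

```r
# Hypothetical stand-in for data3_1; the real columns are not shown in this post
demo <- data.frame(id = 1:3,
                   group = c("a", "b", "a"),
                   value = c(1.5, 2.0, 3.25),
                   stringsAsFactors = FALSE)

class(demo)                       # type of the object as a whole
col_types <- sapply(demo, class)  # type of each column
col_types
str(demo)                         # compact overview: dimensions, types, first values
```

The same two calls, sapply(data3_1, class) and str(data3_1), should work unchanged on the imported data.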

4.

Compute descriptive statistics for the measured values

summary(object = data3_1)

Draw a boxplot

boxplot(x = data3_1)

5.

Download and install the R package pwr

# install.packages("pwr")

Read the help documentation to learn how to use it

??pwr

Question 2

1.

the probability that X is less than 2.0

pnorm(q = 2.0, mean = 4.0, sd = 1.0)

the value K so that P(X>K) = 0.05

qnorm(p = 1-0.05, mean = 4.0, sd = 1.0)
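
One way to check the value of K is to ask for the upper-tail quantile directly with lower.tail = FALSE and then plug the result back into pnorm(). This is a small sketch of that check; the variable names `p_less`, `k` and `check` are introduced here for illustration only.

```r
# P(X < 2) for X ~ N(4, 1)
p_less <- pnorm(q = 2.0, mean = 4.0, sd = 1.0)

# K such that P(X > K) = 0.05, requested as an upper-tail quantile
k <- qnorm(p = 0.05, mean = 4.0, sd = 1.0, lower.tail = FALSE)

# Plugging K back in should return the upper-tail probability 0.05
check <- pnorm(q = k, mean = 4.0, sd = 1.0, lower.tail = FALSE)

c(p_less = p_less, k = k, check = check)
```

Both forms, qnorm(1 - 0.05, ...) and qnorm(0.05, ..., lower.tail = FALSE), give the same K.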

2.

the probability of seeing no heads

dbinom(x = 0, size = 8, prob = 0.5)

the probability of seeing exactly 4 heads

dbinom(x = 4, size = 8, prob = 0.5)

the probability of seeing more than 5 heads

pbinom(5, size = 8, prob = 0.5, lower.tail = FALSE)
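
Because "more than 5 heads" means 6, 7 or 8 heads, the upper-tail call can be cross-checked by summing the individual probabilities. The sketch below compares three routes to the same number; the variable names are illustrative.

```r
# P(X > 5) from the upper tail of the binomial CDF
p_upper <- pbinom(5, size = 8, prob = 0.5, lower.tail = FALSE)

# Same probability as P(X = 6) + P(X = 7) + P(X = 8)
p_sum <- sum(dbinom(6:8, size = 8, prob = 0.5))

# Same probability by counting outcomes: (28 + 8 + 1) / 2^8 = 37 / 256
p_exact <- (choose(8, 6) + choose(8, 7) + choose(8, 8)) / 2^8

all.equal(p_upper, p_sum)
all.equal(p_upper, p_exact)
```

Note that pbinom(5, ...) with lower.tail = FALSE excludes 5 itself, which is what "more than 5" requires.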

3.

Simulate a sample of 1000 random data points from a normal distribution with mean 100 and standard deviation 8 and store the result in a vector.

x <- rnorm(n = 1000, mean = 100, sd = 8)

the histogram of the vector you just created

hist(x)

the boxplot of the vector you just created

boxplot(x)

test the hypothesis that the mean equals 100

$H_{0}$: $\mu_{1}=\mu_{0}$, the population mean equals 100

$H_{a}$: $\mu_{1}\neq\mu_{0}$, the population mean does not equal 100

The standard deviation is treated as unknown, so the mean can be tested with a one-sample t-test, using the statistic

$t=\frac{\overline{X}-\mu_{0}}{s/\sqrt{n}}$

t.test(x, alternative = "two.sided", mu = 100)

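In this run the p-value is greater than the significance level (0.05, the default), so we fail to reject the null hypothesis that the mean equals 100. Because the sample is random and no seed is set, the p-value changes from run to run.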
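
To make the test reproducible and to see how the t statistic relates to the output of t.test(), the statistic can also be computed by hand. This is a sketch only: the seed value 123 and the names `x_demo`, `t_manual` and `p_manual` are assumptions introduced here, not part of the original answer.

```r
set.seed(123)  # arbitrary seed, chosen here only so the run can be repeated
x_demo <- rnorm(n = 1000, mean = 100, sd = 8)

# t statistic by hand: (sample mean - mu0) / (s / sqrt(n))
n_demo <- length(x_demo)
t_manual <- (mean(x_demo) - 100) / (sd(x_demo) / sqrt(n_demo))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n_demo - 1)

# Built-in test for comparison
res <- t.test(x_demo, alternative = "two.sided", mu = 100)
c(t_manual = t_manual, t_builtin = unname(res$statistic))
c(p_manual = p_manual, p_builtin = res$p.value)
```

The hand-computed and built-in values should agree up to floating-point error.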

Question 3

1.

① State the null and alternative hypotheses
$H_{0}$: $\mu_{1}=\mu_{0}=200$

$H_{a}$: $\mu_{1}\neq\mu_{0}=200$

② Choose the significance level and the test
Significance level $\alpha=0.01$; the standard deviation is known, with $\overline{X}=203.5$, $\mu_{0}=200$, $\sigma_{0}=16$, $n=50$. To test the mean we can use a Z-test, with the statistic

$Z=\frac{\overline{X}-\mu_{0}}{\sigma_{0}/\sqrt{n}}$

X_bar = 203.5
mu = 200
sigma = 16
n = 50
Z = (X_bar - mu) / (sigma / sqrt(n))
Z

Substituting the values gives

$Z=\frac{\overline{X}-\mu_{0}}{\sigma_{0}/\sqrt{n}}=\frac{203.5-200}{16/\sqrt{50}}=1.54680$

qnorm(p = 0.01/2)

$Z_{\alpha/2}=-2.57583$

qnorm(p = 1 - 0.01/2)

$Z_{1-\alpha/2}=2.57583$

$Z_{\alpha/2}\leq Z\leq Z_{1-\alpha/2}$

∴ We fail to reject the null hypothesis: at the 0.01 level, weekly production under the new method does not differ significantly from 200.
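
The same decision can be reached through a p-value instead of critical values. The sketch below recomputes Z from the figures given in the question and derives the two-sided p-value, which comes out at roughly 0.12; the names `x_bar`, `mu0` and `p_two_sided` are illustrative.

```r
# Figures from the question
x_bar <- 203.5
mu0   <- 200
sigma <- 16
n     <- 50

z <- (x_bar - mu0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a |Z| at least this large under H0
p_two_sided <- 2 * pnorm(-abs(z))
p_two_sided

# Reject H0 only if the p-value is below alpha = 0.01
p_two_sided < 0.01
```

Since the p-value is well above 0.01, the comparison returns FALSE, matching the critical-value result.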

2.

① State the null and alternative hypotheses
$H_{0}$: $\mu_{1}=\mu_{0}=200$

$H_{a}$: $\mu_{1}>\mu_{0}=200$

② Choose the significance level and the test
Significance level $\alpha=0.01$; the standard deviation is known, with $\overline{X}=203.5$, $\mu_{0}=200$, $\sigma_{0}=16$, $n=50$. To test the mean we can use a one-sided (upper-tail) Z-test, with the statistic

$Z=\frac{\overline{X}-\mu_{0}}{\sigma_{0}/\sqrt{n}}$

X_bar = 203.5
mu = 200
sigma = 16
n = 50
Z = (X_bar - mu) / (sigma / sqrt(n))
Z

Substituting the values gives

$Z=\frac{\overline{X}-\mu_{0}}{\sigma_{0}/\sqrt{n}}=\frac{203.5-200}{16/\sqrt{50}}=1.54680$

qnorm(p = 1 - 0.01)

$Z_{1-\alpha}=2.32635$

$Z\leq Z_{1-\alpha}$

∴ We fail to reject the null hypothesis: at the 0.01 level, there is no significant increase in weekly production under the new method.
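
For the one-sided question only the upper tail counts, so the p-value is the probability of a Z at least as large as the observed one. A short sketch under the same figures (variable names are illustrative); the p-value comes out at roughly 0.06:

```r
# Observed Z from the figures in the question
z <- (203.5 - 200) / (16 / sqrt(50))

# Upper-tail p-value for H_a: mu > 200
p_upper <- pnorm(z, lower.tail = FALSE)
p_upper

# Reject H0 only if the p-value is below alpha = 0.01
p_upper < 0.01
```

The one-sided p-value is half the two-sided one, yet it is still above 0.01, so the conclusion does not change.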

Question 4

1.

data3_4 = as.matrix(read.csv(file = "./homework3-4_data.csv", header = TRUE, row.names = 1))

The density plot of G3 gene expression among 20 samples

plot(density(x = data3_4["G3",]))

Its minimum

min(x = data3_4["G3",])

Its median

median(x = data3_4["G3",])

Its variance

var(x = data3_4["G3",])

2.

The boxplot of expression values for all genes in each of the 20 samples

boxplot(x = data3_4)

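The boxplot of expression values for all genes in each of the 20 samples with outliers hidden (outline = FALSE only suppresses the outlier points in the plot; the data themselves are unchanged)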

boxplot(x = data3_4, outline = FALSE)
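
If the outliers should actually be removed from the data rather than only hidden in the plot, one option is to flag them per sample with boxplot.stats() and set them to NA before redrawing. This is a sketch under stated assumptions: `expr` is simulated data standing in for data3_4 (100 genes by 20 samples), and the seed and the planted outlier are made up for illustration.

```r
set.seed(1)  # arbitrary seed for the simulated data

# Simulated stand-in for data3_4: genes in rows (G1-G100), samples in columns (S1-S20)
expr <- matrix(rnorm(100 * 20, mean = 8, sd = 1), nrow = 100,
               dimnames = list(paste0("G", 1:100), paste0("S", 1:20)))
expr["G1", "S1"] <- 30  # plant one obvious outlier

# For each sample (column), replace values flagged by the 1.5 * IQR boxplot rule with NA
expr_clean <- apply(expr, 2, function(col) {
  col[col %in% boxplot.stats(col)$out] <- NA
  col
})

boxplot(expr_clean)  # NA values are dropped when the boxes are drawn
```

After removal the quartiles shift slightly, so the redrawn boxplot may flag a few new points; whether to repeat the step is a judgment call.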