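Question 1
Requirement: complete the following operations in R and give the R code used.
- Check R's current working directory, set the working directory to the folder containing the data, and list the files in that folder;
- Import the data file homework3_data.csv into R;
- Check the number of rows and columns, the first 5 rows, and the data types;
- Compute descriptive statistics for the measurements in the data and draw a boxplot;
- Download and install the R package pwr, and read its help documentation to learn how to use it.
Question 2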
R language application. Please use R to resolve the following issues and display your R code and results.
- For a normal random variable X with mean 4.0 and standard deviation 1.0, find the probability that X is less than 2.0. Find the value K such that P(X > K) = 0.05.
- When tossing a fair coin 8 times, find the probability of seeing no heads (hint: this is a binomial distribution). Find the probability of seeing exactly 4 heads. Find the probability of seeing more than 5 heads.
- Simulate a sample of 1000 random data points from a normal distribution with mean 100 and standard deviation 8, and store the result in a vector. Plot a histogram and a boxplot of the vector you just created. Using the data above, test the hypothesis that the mean equals 100 (using t.test).
Question 3
Company A produces biological reagents and some laboratory equipment. The weekly production of reagent M follows the normal probability distribution with a mean of 200 and a standard deviation of 16. Recently, new production methods have been introduced and 50 reagent M were produced whose mean is 203.5.
- The boss would like to investigate whether there has been a change in weekly production of reagent M. Test using 0.01 significance level.
- Suppose the boss wants to know whether there has been an increase in weekly production of reagent M. To put it another way, can we conclude, because of the improved production methods, that the mean production of M was more than 200? Test using 0.01 significance level.
Question 4
This question uses the data file "datasets.csv", which derives from a microarray dataset investigating gene expression in a certain disease. The data have been processed: the first row contains the sample serial numbers (S1 - S20), and the first column contains the genes (G1 - G100). The numbers are the expression values of each gene. Please answer the following questions (R code required).
- Please draw a density plot (PDF) to investigate the distribution of G3 gene expression among the 20 samples, and then calculate its minimum, median and variance using the appropriate functions in R.
- Please draw a boxplot to compare the distribution of the expression of all genes among the 20 different samples. Check whether there are any outliers; if there are, delete them and redraw the plot.
---
title: "Question"
author: "HHTING"
output:
  pdf_document:
    keep_tex: yes
    latex_engine: xelatex
  word_document: default
  html_document:
    df_print: paged
header-includes: \usepackage{ctex}
---
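Question 1
1.
Check R's current working directory
getwd()
Set R's working directory to the directory containing the data
setwd("D:/")
List the files in that directory
dir()
2.
Import the data file homework3_data.csv into R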
data3_1 = read.csv(file = "./homework3-1_data.csv", header = TRUE)
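Before importing, it can help to confirm that the file really is where R is looking. The sketch below is self-contained: it writes a small made-up CSV to a temporary directory and reads it back. The file name `example_data.csv` and the columns `id` and `value` are assumptions for illustration only, not the contents of the homework file.

```r
# Hypothetical stand-in for the homework file; names are assumptions.
tmp_dir <- tempdir()
csv_path <- file.path(tmp_dir, "example_data.csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
          file = csv_path, row.names = FALSE)

# Confirm the file exists before trying to read it.
found <- file.exists(csv_path)
print(found)

# header = TRUE treats the first line as column names.
example_df <- read.csv(file = csv_path, header = TRUE)
print(example_df)
```

The same `file.exists()` check can be applied to the real file path after `setwd()`.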
3.
Check the number of rows and columns
Number of rows:
nrow(x = data3_1)
Number of columns:
ncol(x = data3_1)
or
dim(x = data3_1)
View the first 5 rows of the data
head(x = data3_1, n = 5)
Check the data type
class(x = data3_1)
or
mode(x = data3_1)
or
typeof(x = data3_1)
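class(), mode() and typeof() describe the data frame object as a whole. To see the type of each column, str() or sapply() with class are common choices. A minimal sketch on a made-up data frame (the columns are hypothetical, not those of the homework data):

```r
# Hypothetical data frame used only to illustrate the calls.
df <- data.frame(group = c("a", "b", "c"),
                 value = c(1.5, 2.0, 3.5),
                 count = c(1L, 2L, 3L),
                 stringsAsFactors = FALSE)

# Type of the object as a whole.
obj_class <- class(df)    # "data.frame"
obj_type  <- typeof(df)   # "list": a data frame is stored as a list of columns

# Type of each column.
col_classes <- sapply(df, class)
print(col_classes)
str(df)
```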
4.
Compute descriptive statistics for the measurements in the data
summary(object = data3_1)
Draw a boxplot
boxplot(x = data3_1)
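If the data frame also holds non-numeric columns (for example a sample label), it is safer to restrict the statistics and the plot to the numeric ones. The sketch below uses a made-up data frame; the column names `sample` and `measurement` are assumptions.

```r
# Hypothetical data; 'measurement' stands in for the measured values.
df <- data.frame(sample = c("s1", "s2", "s3", "s4", "s5"),
                 measurement = c(4.1, 5.3, 4.8, 6.0, 5.2))

# Keep only the numeric columns.
is_num <- sapply(df, is.numeric)
num_df <- df[, is_num, drop = FALSE]

# Descriptive statistics: quartiles and mean from summary(), plus the
# standard deviation, which summary() does not report.
print(summary(num_df))
meas_mean <- mean(num_df$measurement)
meas_sd <- sd(num_df$measurement)
print(c(mean = meas_mean, sd = meas_sd))
```

Calling `boxplot(num_df)` afterwards draws one box per numeric column.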
5.
Download and install the R package pwr
# install.packages("pwr")
Read the help documentation to learn how to use it
??pwr
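`??pwr` runs a keyword search over the installed help pages; `help(package = "pwr")` opens the package index directly. As a rough illustration of what the package is for, the sketch below asks for the per-group sample size of a two-sample t-test. The effect size d = 0.5 and power 0.8 are illustrative values, not part of the assignment, and the call is guarded so it only runs when pwr is installed.

```r
# Only run the example if the pwr package has been installed.
has_pwr <- requireNamespace("pwr", quietly = TRUE)

if (has_pwr) {
  # d = 0.5 and power = 0.8 are illustrative values, not from the assignment.
  res <- pwr::pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.8,
                         type = "two.sample", alternative = "two.sided")
  print(res)
} else {
  message("pwr is not installed; run install.packages(\"pwr\") first.")
}
```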
Question 2
1.
the probability that X is less than 2.0
pnorm(q = 2.0, mean = 4.0, sd = 1.0)
the value K so that P(X>K) = 0.05
qnorm(p = 1-0.05, mean = 4.0, sd = 1.0)
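Because P(X > K) = 0.05 is the same statement as P(X ≤ K) = 0.95, the quantile can also be requested from the upper tail directly, and the answer can be checked by feeding K back into pnorm(). A short check (the variable names are introduced here for illustration):

```r
# K from the lower-tail quantile at 0.95 ...
K_lower <- qnorm(p = 1 - 0.05, mean = 4.0, sd = 1.0)
# ... and from the upper-tail quantile at 0.05; the two should agree.
K_upper <- qnorm(p = 0.05, mean = 4.0, sd = 1.0, lower.tail = FALSE)

# Feed K back in: the upper-tail probability should come out as 0.05.
p_check <- pnorm(q = K_lower, mean = 4.0, sd = 1.0, lower.tail = FALSE)

print(c(K_lower = K_lower, K_upper = K_upper, p_check = p_check))
```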
2.
the probability of seeing no heads
dbinom(x = 0, size = 8, prob = 0.5)
the probability of seeing exactly 4 heads
dbinom(x = 4, size = 8, prob = 0.5)
the probability of seeing more than 5 heads
pbinom(5, size = 8, prob = 0.5, lower.tail = FALSE)
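With a fair coin all 2^8 = 256 sequences of tosses are equally likely, so each answer can be cross-checked by counting sequences with choose(). The sketch below compares the counted values with the binomial functions; the variable names are introduced here for illustration.

```r
n_toss <- 8
p_head <- 0.5

# Exact values by counting: choose(8, k) of the 2^8 equally likely sequences.
p_none <- choose(n_toss, 0) / 2^n_toss         # 1/256
p_four <- choose(n_toss, 4) / 2^n_toss         # 70/256
p_gt5  <- sum(choose(n_toss, 6:8)) / 2^n_toss  # (28 + 8 + 1)/256

# The same probabilities from the binomial functions.
comparison <- data.frame(
  event    = c("no heads", "exactly 4 heads", "more than 5 heads"),
  counting = c(p_none, p_four, p_gt5),
  binomial = c(dbinom(0, size = n_toss, prob = p_head),
               dbinom(4, size = n_toss, prob = p_head),
               sum(dbinom(6:8, size = n_toss, prob = p_head)))
)
print(comparison)
```

Summing dbinom() over 6 to 8 gives the same result as pbinom() with q = 5 and lower.tail = FALSE, since "more than 5" excludes 5 itself.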
3.
Simulate a sample of 1000 random data points from a normal distribution with mean 100 and standard deviation 8 and store the result in a vector.
x <- rnorm(n = 1000, mean = 100, sd = 8)
the histogram of the vector you just created
hist(x)
the boxplot of the vector you just created
boxplot(x)
test the hypothesis that the mean equals 100
$H_{0}$: $\mu_{1}=\mu_{0}$, the population mean equals 100
$H_{a}$: $\mu_{1}\neq\mu_{0}$, the population mean does not equal 100
The population standard deviation is treated as unknown, so the mean can be tested with a t-test, using the statistic
$$t=\frac{\overline{X}-\mu_{0}}{s/\sqrt{n}}$$
t.test(x, alternative = "two.sided", mu = 100)
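If the p-value is greater than the significance level (0.05, matching the default 95% confidence level of t.test), we fail to reject the null hypothesis: the data are consistent with a population mean of 100. Because the sample is random, the exact p-value changes from run to run.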
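Since rnorm() draws a new sample on every run, fixing a seed makes the test reproducible. The sketch below also recomputes the t statistic by hand from the formula above and compares it with the value returned by t.test(). The seed 123 is an arbitrary choice, not part of the original answer.

```r
set.seed(123)  # arbitrary seed, chosen only for reproducibility
x <- rnorm(n = 1000, mean = 100, sd = 8)

res <- t.test(x, alternative = "two.sided", mu = 100)

# Test statistic computed by hand, for comparison with t.test().
t_manual <- (mean(x) - 100) / (sd(x) / sqrt(length(x)))

print(c(t_from_test = unname(res$statistic), t_manual = t_manual))
print(res$p.value)

# Decision at the 0.05 level.
if (res$p.value > 0.05) {
  message("Fail to reject H0: no evidence that the mean differs from 100.")
} else {
  message("Reject H0 at the 0.05 level.")
}
```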
Question 3
1.
① State the null and alternative hypotheses
$H_{0}$: $\mu_{1}=\mu_{0}=200$
$H_{a}$: $\mu_{1}\neq\mu_{0}=200$
② Determine the significance level and the test
Significance level $\alpha=0.01$; the standard deviation is known, with $\overline{X}=203.5$, $\mu_{0}=200$, $\sigma_{0}=16$, $n=50$. To test the mean we can use a Z-test, with the statistic
$$Z=\frac{\overline{X}-\mu_{0}}{\sigma_{0}/\sqrt{n}}$$
X_bar = 203.5
mu = 200
sigma = 16
n = 50
Z = (X_bar - mu) / (sigma / sqrt(n))
Z
Substituting the values gives
$$Z=\frac{\overline{X}-\mu_{0}}{\sigma_{0}/\sqrt{n}}=\frac{203.5-200}{16/\sqrt{50}}=1.54680$$
qnorm(p = 0.01/2)
$Z_{\alpha/2}=-2.57583$
qnorm(p = 1 - 0.01/2)
$Z_{1-\alpha/2}=2.57583$
$\because Z_{\alpha/2}\leq Z\leq Z_{1-\alpha/2}$
$\therefore$ we fail to reject the null hypothesis: at the 0.01 level, weekly production under the new method shows no significant change from the original method.
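The same decision can be reached through the p-value instead of the critical values. For a two-sided Z-test the p-value is twice the tail area beyond |Z|, which here comes out at roughly 0.12, well above 0.01. A minimal sketch, with variable names introduced for illustration:

```r
X_bar  <- 203.5
mu0    <- 200
sigma0 <- 16
n      <- 50
alpha  <- 0.01

Z <- (X_bar - mu0) / (sigma0 / sqrt(n))

# Two-sided p-value: probability of a statistic at least as extreme as |Z|
# in either tail of the standard normal distribution.
p_two_sided <- 2 * pnorm(-abs(Z))

# Reject H0 only if the p-value falls below the significance level.
reject_two_sided <- p_two_sided < alpha

print(c(Z = Z, p_value = p_two_sided))
print(reject_two_sided)
```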
2.
① State the null and alternative hypotheses
$H_{0}$: $\mu_{1}=\mu_{0}=200$
$H_{a}$: $\mu_{1}>\mu_{0}=200$
② Determine the significance level and the test
Significance level $\alpha=0.01$; the standard deviation is known, with $\overline{X}=203.5$, $\mu_{0}=200$, $\sigma_{0}=16$, $n=50$. To test the mean we can use a one-sided (upper-tail) Z-test, with the statistic
$$Z=\frac{\overline{X}-\mu_{0}}{\sigma_{0}/\sqrt{n}}$$
X_bar = 203.5
mu = 200
sigma = 16
n = 50
Z = (X_bar - mu) / (sigma / sqrt(n))
Z
Substituting the values gives
$$Z=\frac{\overline{X}-\mu_{0}}{\sigma_{0}/\sqrt{n}}=\frac{203.5-200}{16/\sqrt{50}}=1.54680$$
qnorm(p = 1 - 0.01)
$Z_{1-\alpha}=2.32635$
$\because Z\leq Z_{1-\alpha}$
$\therefore$ we fail to reject the null hypothesis: at the 0.01 level, weekly production under the new method has not increased significantly.
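For the upper-tail test the p-value is the area to the right of Z only, so it is half the two-sided value: roughly 0.06, still above 0.01. A minimal sketch, with variable names introduced for illustration:

```r
X_bar  <- 203.5
mu0    <- 200
sigma0 <- 16
n      <- 50
alpha  <- 0.01

Z <- (X_bar - mu0) / (sigma0 / sqrt(n))

# Upper-tail p-value: P(standard normal > Z).
p_upper <- pnorm(Z, lower.tail = FALSE)

# Reject H0 in favour of an increase only if the p-value is below alpha.
reject_upper <- p_upper < alpha

print(c(Z = Z, p_value = p_upper))
print(reject_upper)
```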
Question 4
1.
data3_4 = as.matrix(read.csv(file = "./homework3-4_data.csv", header = TRUE, row.names = 1))
The density plot of G3 gene expression among 20 samples
plot(density(x = data3_4["G3",]))
Its minimum
min(x = data3_4["G3",])
Its median
median(x = data3_4["G3",])
Its variance
var(x = data3_4["G3",])
2.
The boxplot of all genes expression among 20 different samples
boxplot(x = data3_4)
The boxplot of all genes expression among 20 different samples with outliers hidden (outline = FALSE only suppresses drawing the outlier points; it does not delete them from the data)
boxplot(x = data3_4, outline = FALSE)
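To actually delete the outliers rather than hide them, one option is to flag the values that fall beyond the boxplot whiskers in each sample with boxplot.stats() and set them to NA. The sketch below runs on simulated data: the matrix `expr` is made up, and only its 100 x 20 shape and the G/S naming mirror the dataset described in the question. Treating "outlier" as the per-sample 1.5 x IQR whisker rule is an assumption.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility
# Simulated stand-in for the expression matrix: 100 genes x 20 samples.
expr <- matrix(rnorm(100 * 20, mean = 8, sd = 1), nrow = 100, ncol = 20,
               dimnames = list(paste0("G", 1:100), paste0("S", 1:20)))
expr[1, 1] <- 30  # plant one obvious outlier

# For one sample (column), replace values beyond the whiskers with NA.
remove_outliers <- function(v) {
  out_vals <- boxplot.stats(v)$out
  v[v %in% out_vals] <- NA
  v
}

# Apply the rule to every sample.
expr_clean <- apply(expr, 2, remove_outliers)

# Number of values removed per sample.
n_removed <- colSums(is.na(expr_clean))
print(n_removed)
```

boxplot() ignores NA values, so `boxplot(x = expr_clean)` redraws the comparison without the removed points. The quartiles are recomputed on the cleaned data, so a few new points may still be drawn beyond the whiskers.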