predict.glm -> which class does it predict?

 

Peter Schüffler-2
Hi, 

I have a question about logistic regression in R. 

Suppose I have a small list of proteins P1, P2, P3 that predict a 
two-class target T, say cancer/noncancer. Let's further say I know that I 
can build a simple logistic regression model in R:

model <- glm(T ~ ., data=data.frame(Y), family=binomial)   (Y is the data set of 
the proteins). 

This works fine. T is a factor with levels cancer and noncancer; the 
proteins are numeric. 

Now I want to use predict.glm to make predictions for new data: 

predict(model, newdata=testsamples, type="response")    (testsamples is 
a small set of new samples). 

The result is a vector of probabilities, one for each sample in 
testsamples. But the probability of WHAT? Of belonging to the first level 
of T, or to the second level of T? 

Is the following expression 
factor(predict(model, newdata=testsamples, type="response") >= 0.5) 
TRUE when the new sample is classified as cancer, or when it is 
classified as noncancer? And why not the other way around? 

Thank you, 

Peter 

______________________________________________ 
[hidden email] mailing list 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide  http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. 

Re: predict.glm -> which class does it predict?

Marc Schwartz-3
On Jul 10, 2009, at 9:46 AM, Peter Schüffler wrote: 

> The result is a vector of probabilities, one for each sample in 
> testsamples. But the probability of WHAT? Of belonging to the first 
> level of T, or to the second level of T? 

> Is the following expression 
> factor(predict(model, newdata=testsamples, type="response") >= 0.5) 
> TRUE when the new sample is classified as cancer, or when it is 
> classified as noncancer? And why not the other way around? 


As per the Details section of ?glm: 

A typical predictor has the form response ~ terms where response is   
the (numeric) response vector and terms is a series of terms which   
specifies a linear predictor for response. ***For binomial and   
quasibinomial families the response can also be specified as a factor   
(when the first level denotes failure and all others success)*** or as   
a two-column matrix with the columns giving the numbers of successes   
and failures. A terms specification of the form first + second   
indicates all the terms in first together with all the terms in second   
with any duplicates removed. 


So, given your description above, you are predicting   
"noncancer"...that is, you are predicting the probability of the   
second level of the factor ("success"), given the covariates. 
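A minimal sketch of the three response forms allowed by that ?glm passage, using made-up data and an intercept-only model (the names `y`, `y01`, `successes`, `failures` and the `fit_*` objects are hypothetical). In every form the modeled "success" is the second level, "noncancer", so all three intercepts should agree:

```r
# Made-up response; default level order is "cancer" (failure), "noncancer" (success)
y <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))

# 1. Factor response: the first level counts as failure, all others as success
fit_factor <- glm(y ~ 1, family = binomial)

# 2. Numeric 0/1 response, with 1 marking the second level
y01 <- as.integer(y != levels(y)[1])
fit_numeric <- glm(y01 ~ 1, family = binomial)

# 3. Two-column matrix of successes and failures (all four samples pooled in one row)
successes <- sum(y01)
failures <- sum(1 - y01)
fit_matrix <- glm(cbind(successes, failures) ~ 1, family = binomial)

# Each intercept estimates log(3/1), the log odds of "noncancer"
print(c(factor  = unname(coef(fit_factor)),
        numeric = unname(coef(fit_numeric)),
        matrix  = unname(coef(fit_matrix))))
```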

If you want to predict "cancer", alter the factor levels thusly: 

   T <- factor(T, levels = c("noncancer", "cancer")) 

By default, R sorts the factor levels alphabetically, so "cancer" would be   
first. 

Think of it in terms of using a 0/1 integer code for absence/presence,   
where you are predicting the probability of a '1', i.e. the presence of   
the event or characteristic of interest. 
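A short sketch of the releveling fix and of the equivalent explicit 0/1 code, on made-up four-sample data (the names `status`, `status_rev` and `is_cancer` are hypothetical). Reordering the levels flips which class is the "success", so the two probabilities should sum to one:

```r
# Made-up data; default (alphabetical) level order: "cancer", "noncancer"
status <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))
p_noncancer <- predict(glm(status ~ 1, family = binomial), type = "response")

# Put "noncancer" first so that "cancer" becomes the modeled "success"
status_rev <- factor(status, levels = c("noncancer", "cancer"))
p_cancer <- predict(glm(status_rev ~ 1, family = binomial), type = "response")

# Same thing with an explicit 0/1 code: 1 = presence of cancer
is_cancer <- as.integer(status == "cancer")
p_cancer01 <- predict(glm(is_cancer ~ 1, family = binomial), type = "response")

# Columns: P(noncancer), P(cancer), P(cancer); each row should read 0.75, 0.25, 0.25
print(round(cbind(p_noncancer, p_cancer, p_cancer01), 4))
```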

BTW, using 'T' as the name of the response vector is not a good habit: 

 > T 
[1] TRUE 

'T' is not a reserved word like TRUE but an ordinary built-in variable   
that is set to TRUE, so it can be overwritten. R is generally smart   
enough to know the difference, but it is better to avoid getting into   
trouble by not using it. 
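A quick console check of that point: base R binds `T` to TRUE, an assignment such as `T <- ...` masks it, and the reserved word TRUE cannot be reassigned at all. The factor assigned below is made up for illustration:

```r
print(identical(T, TRUE))   # TRUE: base R binds T to TRUE

# Using T as a data name silently masks the built-in shorthand
T <- factor(c("cancer", "noncancer"))
print(identical(T, TRUE))   # FALSE: T now refers to the factor

# TRUE is a reserved word, so assigning to it is an error
res <- try(eval(parse(text = "TRUE <- 0")), silent = TRUE)
print(inherits(res, "try-error"))   # TRUE

# Removing the user-level T makes base::T visible again
rm(T)
print(identical(T, TRUE))   # TRUE
```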

HTH, 

Marc Schwartz 


Re: predict.glm -> which class does it predict?

Peter Dalgaard
Peter Schüffler wrote:

> The result is a vector of probabilities, one for each sample in 
> testsamples. But the probability of WHAT? Of belonging to the first level 
> of T, or to the second level of T? 

> Is the following expression 
> factor(predict(model, newdata=testsamples, type="response") >= 0.5) 
> TRUE when the new sample is classified as cancer, or when it is 
> classified as noncancer? And why not the other way around?

It's the probability of the 2nd level of a factor response (termed 
"success" in the documentation, even when you're modeling the probability 
of disease or death...), just like when interpreting the logistic 
regression itself. 
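Applied to the original question, a minimal sketch of turning the predicted probabilities into class labels, on hypothetical one-protein data (`train`, `test`, the `P1` values, `lev` and `pred_class` are all made up). Because the probability refers to `levels(status)[2]`, the threshold maps TRUE to the second level rather than to a hard-coded class name:

```r
# Hypothetical training data: one protein measurement and a two-level response
train <- data.frame(
  P1 = c(1.2, 2.8, 3.1, 0.7, 2.2, 3.9, 1.5, 2.5),
  status = factor(c("cancer", "noncancer", "noncancer", "cancer",
                    "cancer", "noncancer", "noncancer", "cancer"))
)
fit <- glm(status ~ P1, data = train, family = binomial)

# Hypothetical new samples
test <- data.frame(P1 = c(0.5, 2.0, 4.0))

# type = "response" gives P(status == levels(status)[2]), i.e. P(noncancer) here
p <- predict(fit, newdata = test, type = "response")

# p >= 0.5 selects the second level; otherwise the first level
lev <- levels(train$status)
pred_class <- factor(ifelse(p >= 0.5, lev[2], lev[1]), levels = lev)
print(data.frame(P1 = test$P1, p_noncancer = round(p, 3), pred_class))
```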

I find it easiest to sort out this kind of issue by experimentation in 
simplified situations. E.g. 

 > x <- sample(c("A","B"),10,replace=TRUE) 
 > x 
  [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A" 
 > table(x) 
x 
A B 
4 6 

(notice that the relative frequency of B is 0.6) 

 > glm(x~1,binomial) 
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1 
In addition: Warning message: 
In model.matrix.default(mt, mf, contrasts) : 
   variable 'x' converted to a factor 

(OK, so it won't go without conversion to factor. This is a good thing.) 

 > glm(factor(x)~1,binomial) 

Call:  glm(formula = factor(x) ~ 1, family = binomial) 

Coefficients: 
(Intercept) 
      0.4055 

Degrees of Freedom: 9 Total (i.e. Null);  9 Residual 
Null Deviance:    13.46 
Residual Deviance: 13.46 AIC: 15.46 

(The intercept is positive, corresponding to log odds for a probability > 0.5; 
i.e., it must be the probability of "B": 0.4055 == log(6/4)) 
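For an intercept-only logistic model the fitted probability is simply the sample proportion of the second level, so the intercept in the output above can be checked by hand:

```latex
\hat\beta_0 = \operatorname{logit}(\hat p_B)
            = \log\frac{\hat p_B}{1-\hat p_B}
            = \log\frac{6/10}{4/10}
            = \log\frac{6}{4} \approx 0.4055,
\qquad
\hat p_B = \frac{1}{1 + e^{-\hat\beta_0}} = \frac{1}{1 + 4/6} = 0.6 .
```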

 > predict(glm(factor(x)~1,binomial)) 
        1         2         3         4         5         6         7         8 
0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 
        9        10 
0.4054651 0.4054651 
 > predict(glm(factor(x)~1,binomial),type="response") 
   1   2   3   4   5   6   7   8   9  10 
0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 
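The two calls differ only in scale: the default type = "link" returns log odds, and type = "response" applies the inverse logit. A minimal reproducible check, with the ten responses from the transcript typed in instead of sampled (`eta` and `prob` are illustrative names):

```r
# The ten responses shown above, fixed instead of drawn with sample()
x <- c("B", "A", "B", "B", "A", "B", "B", "A", "B", "A")
fit <- glm(factor(x) ~ 1, family = binomial)

eta  <- unname(predict(fit))                     # type = "link": log odds of "B"
prob <- unname(predict(fit, type = "response"))  # probability of "B", the 2nd level

# plogis() is the inverse logit, mapping the link scale to the response scale
print(all.equal(plogis(eta), prob))        # TRUE
# qlogis() is the logit: qlogis(0.6) equals log(6/4)
print(all.equal(qlogis(0.6), log(6 / 4)))  # TRUE
```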

As for why it's not the other way around, well, if it had been, then you 
could have asked the same question.... 


-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B 
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K 
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918 
~~~~~~~~~~ - ( [hidden email])              FAX: (+45) 35327907 


Re: predict.glm -> which class does it predict?

Gabor Grothendieck
2009/7/10 Peter Dalgaard < [hidden email]>:

> It's the probability of the 2nd level of a factor response (termed "success" 
> in the documentation, even when you're modeling the probability of disease or 
> death...), just like when interpreting the logistic regression itself. 



Or more specifically: 

> resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer")) 
> mod <- glm(resp ~ 1, family = binomial) 
> predict(mod, type = "response") 
   1    2    3    4 
0.75 0.75 0.75 0.75 

and since noncancer occurs 75% of the time in the sample, it is clearly 
predicting the probability of noncancer. 


Re: predict.glm -> which class does it predict?

Peter Dalgaard

> As for why it's not the other way around, well, if it had been, then you 
> could have asked the same question.... 

...and come to think of it, it is rather convenient that it meshes 
with the default ordering of levels in factor(x) if x is 0/1 or FALSE/TRUE. 
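A quick check of that convenience on made-up data (`y`, `fit_num` and `fit_fac` are illustrative names): factor() sorts 0 before 1 and FALSE before TRUE, so the modeled "success" is 1/TRUE, and a factor response built from 0/1 data gives the same fit as the numeric 0/1 response:

```r
# Default level order puts 0 and FALSE first, so 1 and TRUE become "success"
print(levels(factor(c(1, 0, 1))))            # "0" "1"
print(levels(factor(c(TRUE, FALSE, TRUE))))  # "FALSE" "TRUE"

# Made-up 0/1 response: the numeric and factor versions give the same intercept
y <- c(1, 0, 1, 1, 0, 1)
fit_num <- glm(y ~ 1, family = binomial)
fit_fac <- glm(factor(y) ~ 1, family = binomial)
print(all.equal(coef(fit_num), coef(fit_fac)))  # TRUE
```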



Reposted from: https://www.cnblogs.com/xiaojikuaipao/p/9561886.html
