All common measures generally assume a ground-truth notion of relevance: every document is known to be either relevant or non-relevant to a particular query.
1. Precision and Recall
Precision is the fraction of the documents retrieved that are relevant to the user’s information need.
Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.
Let $A$ be the set of retrieved documents and $B$ the set of relevant documents. So, we will have
$$\text{precision} = \frac{|A \cap B|}{|A|}, \qquad \text{recall} = \frac{|A \cap B|}{|B|}$$
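As a minimal sketch of these two fractions (the set values and the function name `precision_recall` are illustrative, not from any particular library):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall from a set of retrieved document IDs
    and a set of relevant document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 3 of the 4 retrieved documents are relevant; 6 relevant documents exist.
print(precision_recall({1, 2, 3, 4}, {2, 3, 4, 7, 8, 9}))  # (0.75, 0.5)
```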
2. Fall-out
Fall-out is the proportion of non-relevant documents that are retrieved, out of all non-relevant documents available:
$$\text{fall-out} = \frac{|\{\text{non-relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{non-relevant documents}\}|}$$
It can be looked at as the probability that a non-relevant document is retrieved by a query.
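A corresponding sketch for fall-out, assuming the whole document collection is available so the set of non-relevant documents can be formed (all names and values are illustrative):

```python
def fallout(retrieved, relevant, collection):
    """Fraction of the non-relevant documents in the collection that were retrieved."""
    non_relevant = set(collection) - set(relevant)
    if not non_relevant:
        return 0.0
    return len(set(retrieved) & non_relevant) / len(non_relevant)

# 1 of the 4 non-relevant documents was (wrongly) retrieved.
print(fallout({1, 2, 3}, {2, 3, 7}, collection=range(1, 8)))  # 0.25
```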
3. F-measure
F-measure or F-score is the weighted harmonic mean of precision and recall.
The traditional F-measure or balanced F-score is
$$F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
The general formula for a non-negative real $\beta$ is
$$F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$$
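A small sketch of the general $F_\beta$ ($\beta = 1$ gives the balanced F-score); the function name and example values are illustrative:

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 weights recall higher."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_measure(0.75, 0.5))           # F1 = 0.6
print(f_measure(0.75, 0.5, beta=2))   # weights recall more heavily
```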
4. Average Precision
By computing precision and recall at every position in the ranked sequence of documents, one can plot a precision-recall curve, plotting precision $p(r)$ as a function of recall $r$.
Average Precision computes the average value of $p(r)$ over the interval from $r = 0$ to $r = 1$:
$$\text{AveP} = \int_0^1 p(r)\,dr$$
This integral is in practice replaced with a finite sum over every position in the ranked sequence of documents:
$$\text{AveP} = \sum_{k=1}^{n} P(k)\,\Delta r(k)$$
where $k$ is the rank in the sequence of retrieved documents, $n$ is the number of retrieved documents, $P(k)$ is the precision at cut-off $k$ in the list, and $\Delta r(k)$ is the change in recall from items $k-1$ to $k$.
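A sketch of this finite sum for binary relevance judgments; `ranked_relevance` (a list of 0/1 judgments in rank order) and `n_relevant` (the total number of relevant documents for the query) are assumed inputs:

```python
def average_precision(ranked_relevance, n_relevant):
    """Sum of P(k) * delta-recall(k) over a ranked list of 0/1 relevance judgments."""
    hits, score = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            # Delta recall is 1/n_relevant exactly when a relevant document appears at rank k.
            score += (hits / k) * (1.0 / n_relevant)
    return score

print(average_precision([1, 0, 1, 0, 0], n_relevant=3))  # (1/1 + 2/3) / 3 ≈ 0.556
```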
5. R-Precision
Precision at position $R$ in the ranking of results for a query that has $R$ relevant documents.
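A one-line sketch, reusing the same kind of 0/1 ranked judgments as above:

```python
def r_precision(ranked_relevance, R):
    """Precision at cut-off R for a query with R relevant documents."""
    return sum(ranked_relevance[:R]) / R

print(r_precision([1, 0, 1, 0, 0], R=3))  # 2 relevant in the top 3 -> 0.667
```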
6. Mean average precision
Mean average precision for a set of queries is the mean of the average precision scores for each query.
$$\text{MAP} = \frac{\sum_{q=1}^{Q} \text{AveP}(q)}{Q}$$
where $Q$ is the number of queries.
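Mean average precision is then just the arithmetic mean of the per-query values; the sketch below reuses the hypothetical `average_precision` function from above with made-up per-query judgments:

```python
# Hypothetical per-query (ranked 0/1 judgments, number of relevant documents) pairs.
queries = [([1, 0, 1, 0, 0], 3), ([0, 1, 1, 1], 3)]

# Mean of the per-query average precision scores (uses average_precision defined above).
map_score = sum(average_precision(r, n) for r, n in queries) / len(queries)
print(map_score)
```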
7. Discounted cumulative gain
DCG uses a graded relevance scale of documents from the result set to evaluate the usefulness, or gain, of a document based on its position in the result list.
The DCG accumulated at a particular rank position $p$ is defined as
$$\text{DCG}_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}$$
where $rel_i$ is the graded relevance of the result at position $i$.
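A sketch of this definition ($rel_1$ plus the log-discounted gains from rank 2 onward); the graded relevance values in the example are made up:

```python
import math

def dcg_at_p(graded_relevance, p):
    """DCG accumulated at rank position p: rel_1 + sum_{i=2..p} rel_i / log2(i)."""
    dcg = 0.0
    for i, rel in enumerate(graded_relevance[:p], start=1):
        dcg += rel if i == 1 else rel / math.log2(i)
    return dcg

print(dcg_at_p([3, 2, 3, 0, 1, 2], p=6))  # ≈ 8.10
```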
Precision and Recall
1. Information Retrieval
- Precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.
- Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents.
2. Classification task
- Precision is defined as the number of true positives divided by the total number of elements labeled as belonging to the positive class (i.e. the sum of true positives and false positives). Precision is also called positive predictive value (PPV).
- Recall is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives). Recall is also called sensitivity or true positive rate.
3. Relationship
Often, there is an inverse relationship between precision and recall. Usually, precision and recall scores are not discussed in isolation. Instead, either values for one measure are compared at a fixed level of the other measure, or both are combined into a single measure (such as the F-measure).
Confusion Matrix (contingency table)
Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class.
The confusion matrix allows more detailed analysis than accuracy. Accuracy is not a reliable metric for the real performance of a classifier, because it yields misleading results if the data set is unbalanced (that is, when the numbers of samples in the different classes vary greatly).
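A small sketch with assumed counts from a 2×2 confusion matrix, showing how accuracy can look high on an unbalanced data set while precision and recall reveal the weak performance on the positive class:

```python
# Hypothetical confusion-matrix counts for an unbalanced binary problem
# (95 actual negatives, 5 actual positives).
tp, fp, fn, tn = 1, 1, 4, 94

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # 0.95, dominated by the majority class
precision = tp / (tp + fp)                   # 0.50  (positive predictive value)
recall    = tp / (tp + fn)                   # 0.20  (sensitivity / true positive rate)
print(accuracy, precision, recall)
```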