All common measures generally assume a ground-truth notion of relevance: every document is known to be either relevant or non-relevant to a particular query.
1. Precision and Recall
Precision is the fraction of the documents retrieved that are relevant to the user’s information need.
Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.
Let $A$ be the set of retrieved documents and $B$ the set of relevant documents. So, we will have
$$\text{precision} = \frac{|A \cap B|}{|A|}, \qquad \text{recall} = \frac{|A \cap B|}{|B|}$$
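As a minimal sketch of these two fractions (the set values and the function name `precision_recall` are illustrative, not from any particular library):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall from a set of retrieved document IDs
    and a set of relevant document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 3 of the 4 retrieved documents are relevant; 6 relevant documents exist.
print(precision_recall({1, 2, 3, 4}, {2, 3, 4, 7, 8, 9}))  # (0.75, 0.5)
```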
2. Fall-out
Fall-out is the proportion of non-relevant documents that are retrieved, out of all non-relevant documents available:
$$\text{fall-out} = \frac{|\{\text{non-relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{non-relevant documents}\}|}$$
It can be looked at as the probability that a non-relevant document is retrieved by a query.
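A corresponding sketch for fall-out, assuming the whole document collection is available so the set of non-relevant documents can be formed (all names and values are illustrative):

```python
def fallout(retrieved, relevant, collection):
    """Fraction of the non-relevant documents in the collection that were retrieved."""
    non_relevant = set(collection) - set(relevant)
    if not non_relevant:
        return 0.0
    return len(set(retrieved) & non_relevant) / len(non_relevant)

# 1 of the 4 non-relevant documents was (wrongly) retrieved.
print(fallout({1, 2, 3}, {2, 3, 7}, collection=range(1, 8)))  # 0.25
```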
3. F-measure
F-measure or F-score is the weighted harmonic mean of precision and recall.
The traditional F-measure or balanced F-score is
$$F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
The general formula for a non-negative real $\beta$ is
$$F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$$
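A small sketch of the general $F_\beta$ ($\beta = 1$ gives the balanced F-score); the function name and example values are illustrative:

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 weights recall higher."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_measure(0.75, 0.5))           # F1 = 0.6
print(f_measure(0.75, 0.5, beta=2))   # weights recall more heavily
```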
4. Average Precision
By computing precision and recall at every position in the ranked sequence of documents, one can plot a precision-recall curve, plotting precision $p(r)$ as a function of recall $r$.
Average Precision computes the average value of $p(r)$ over the interval from $r = 0$ to $r = 1$:
$$\text{AveP} = \int_0^1 p(r)\,dr$$
This integral is in practice replaced with a finite sum over every position in the ranked sequence of documents:
$$\text{AveP} = \sum_{k=1}^{n} P(k)\,\Delta r(k)$$
where $k$ is the rank in the sequence of retrieved documents, $n$ is the number of retrieved documents, $P(k)$ is the precision at cut-off $k$ in the list, and $\Delta r(k)$ is the change in recall from items $k-1$ to $k$.
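A sketch of this finite sum for binary relevance judgments; `ranked_relevance` (a list of 0/1 judgments in rank order) and `n_relevant` (the total number of relevant documents for the query) are assumed inputs:

```python
def average_precision(ranked_relevance, n_relevant):
    """Sum of P(k) * delta-recall(k) over a ranked list of 0/1 relevance judgments."""
    hits, score = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            # Delta recall is 1/n_relevant exactly when a relevant document appears at rank k.
            score += (hits / k) * (1.0 / n_relevant)
    return score

print(average_precision([1, 0, 1, 0, 0], n_relevant=3))  # (1/1 + 2/3) / 3 ≈ 0.556
```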
5. R-Precision
Precision at position $R$ in the ranking of results for a query that has $R$ relevant documents.
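A one-line sketch, reusing the same kind of 0/1 ranked judgments as above:

```python
def r_precision(ranked_relevance, R):
    """Precision at cut-off R for a query with R relevant documents."""
    return sum(ranked_relevance[:R]) / R

print(r_precision([1, 0, 1, 0, 0], R=3))  # 2 relevant in the top 3 -> 0.667
```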
6. Mean average precision
Mean average precision for a set of queries is the mean of the average precision scores for each query.
$$\text{MAP} = \frac{\sum_{q=1}^{Q} \text{AveP}(q)}{Q}$$
where $Q$ is the number of queries.
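Mean average precision is then just the arithmetic mean of the per-query values; the sketch below reuses the hypothetical `average_precision` function from above with made-up per-query judgments:

```python
# Hypothetical per-query (ranked 0/1 judgments, number of relevant documents) pairs.
queries = [([1, 0, 1, 0, 0], 3), ([0, 1, 1, 1], 3)]

# Mean of the per-query average precision scores (uses average_precision defined above).
map_score = sum(average_precision(r, n) for r, n in queries) / len(queries)
print(map_score)
```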
7. Discounted cumulative gain
DCG uses a graded relevance scale of documents from the result set to evaluate the usefulness, or gain, of a document based on its position in the result list.
The DCG accumulated at a particular rank position $p$ is defined as
$$\text{DCG}_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}$$
where $rel_i$ is the graded relevance of the result at position $i$.
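A sketch of this definition ($rel_1$ plus the log-discounted gains from rank 2 onward); the graded relevance values in the example are made up:

```python
import math

def dcg_at_p(graded_relevance, p):
    """DCG accumulated at rank position p: rel_1 + sum_{i=2..p} rel_i / log2(i)."""
    dcg = 0.0
    for i, rel in enumerate(graded_relevance[:p], start=1):
        dcg += rel if i == 1 else rel / math.log2(i)
    return dcg

print(dcg_at_p([3, 2, 3, 0, 1, 2], p=6))  # ≈ 8.10
```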
Precision and Recall
1. Information Retrieval
- Precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.
- Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents.
2. Classification task
- Precision is defined as the number of true positives divided by the total number of elements labeled as belonging to the positive class (i.e. the sum of true positives and false positives). Precision is also called positive predictive value (PPV).
- Recall is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives). Recall is also called sensitivity or true positive rate.
3. Relationship
Often, there is an inverse relationship between precision and recall. Usually, precision and recall scores are not discussed in isolation. Instead, either values for one measure are compared at a fixed level of the other measure, or both are combined into a single measure (such as the F-measure).
Confusion Matrix (contingency table)
Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class.
The confusion matrix allows more detailed analysis than accuracy. Accuracy is not a reliable metric for the real performance of a classifier, because it yields misleading results if the data set is unbalanced (that is, when the numbers of samples in the different classes vary greatly).
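A small sketch with assumed counts from a 2×2 confusion matrix, showing how accuracy can look high on an unbalanced data set while precision and recall reveal the weak performance on the positive class:

```python
# Hypothetical confusion-matrix counts for an unbalanced binary problem
# (95 actual negatives, 5 actual positives).
tp, fp, fn, tn = 1, 1, 4, 94

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # 0.95, dominated by the majority class
precision = tp / (tp + fp)                   # 0.50  (positive predictive value)
recall    = tp / (tp + fn)                   # 0.20  (sensitivity / true positive rate)
print(accuracy, precision, recall)
```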