UIUC Coursera Course Text Retrieval and Search Engines: Week 3 Quiz

Latest recommended article published 2018-01-11 17:16:42

License: CC 4.0 BY-SA

Tags: #UIUC #Coursera course #Text Retrieval #Search Engines #Quiz

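This article covers the use of language models in probability calculations: whether word order affects phrase probabilities under a language model, computing word probabilities, maximum likelihood estimation, smoothing techniques, query likelihood evaluation, the effect of parameter tuning on probabilities, the effect of feedback on a retrieval system's precision and recall, and the use of Rocchio feedback in retrieval systems. Working through these techniques gives readers a fuller understanding of the foundations of information retrieval and natural language processing.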

Warning: The hard deadline has passed. You can attempt it, but you will not get credit for it. You are welcome to try it as a learning exercise.

In accordance with the Coursera Honor Code, I certify that the answers here are my own work.

Question 1

Assume you are using a unigram language model to calculate the probabilities of phrases. Then, the probabilities of generating the phrases “study text mining” and “text mining study” are not equal, i.e., P(“study text mining”) ≠ P(“text mining study”).

False

True
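As a learning aid, here is a minimal sketch of how a unigram model scores a phrase: it multiplies the probabilities of the individual words, treating each word as generated independently. The word probabilities below are hypothetical values chosen for illustration and do not come from the quiz.

```python
from fractions import Fraction
from math import prod

# Hypothetical unigram probabilities, chosen only for illustration.
unigram = {
    "study": Fraction(1, 10),
    "text": Fraction(1, 5),
    "mining": Fraction(1, 20),
}


def phrase_probability(phrase, model):
    """Multiply the per-word probabilities of a whitespace-separated phrase."""
    return prod(model[word] for word in phrase.split())


p_a = phrase_probability("study text mining", unigram)
p_b = phrase_probability("text mining study", unigram)
print(p_a, p_b, p_a == p_b)
```

Because multiplication is commutative, reordering the words leaves the product unchanged in this sketch.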

Question 2

You are given a vocabulary composed of only four words: “the”, “computer”, “science”, and “technology”. Below are the probabilities of three of these four words given by a unigram language model.

Word	Probability
the	0.4
computer	0.2
science	0.3

What is the probability of generating the phrase “the technology” using this unigram language model?

0.5

0.04

0.0024

0.1
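The arithmetic behind this question can be sketched as follows, assuming the four words are the whole vocabulary so that their probabilities sum to 1. The variable names are illustrative only.

```python
from fractions import Fraction

# Probabilities given in the table above.
given = {
    "the": Fraction(4, 10),
    "computer": Fraction(2, 10),
    "science": Fraction(3, 10),
}

# The probabilities over the full vocabulary must sum to 1,
# so the missing word takes the remaining probability mass.
p_technology = 1 - sum(given.values())
model = {**given, "technology": p_technology}

# Under a unigram model, a phrase probability is the product of word probabilities.
p_phrase = model["the"] * model["technology"]
print(p_technology, p_phrase, float(p_phrase))
```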

Question 3

You are given the query Q= “online courses” and two documents:
D1 = “online courses search engine”
D2 = “online education is affordable”
Assume you are using the maximum likelihood estimator without smoothing to calculate the probabilities of words in documents (i.e., the estimated p(w|D) is the relative frequency of word w in the document D). Based on the unigram query likelihood model, which of the following choices is correct?

P(Q|D1) = 1/16, P(Q|D2) = 0

P(Q|D1) = 0, P(Q|D2) = 1/4

P(Q|D1) = 1/16, P(Q|D2) = 1/4

P(Q|D1) = 1/2, P(Q|D2) = 1/2
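A minimal sketch of the unsmoothed query likelihood computation, assuming whitespace tokenization and relative-frequency estimates as described in the question. The function name `mle_query_likelihood` is made up for this example.

```python
from collections import Counter
from fractions import Fraction


def mle_query_likelihood(query, document):
    """P(Q|D) as a product of relative word frequencies, with no smoothing."""
    words = document.split()
    counts = Counter(words)
    likelihood = Fraction(1)
    for term in query.split():
        # A query term absent from the document contributes probability 0.
        likelihood *= Fraction(counts[term], len(words))
    return likelihood


query = "online courses"
d1 = "online courses search engine"
d2 = "online education is affordable"
print(mle_query_likelihood(query, d1), mle_query_likelihood(query, d2))
```

Note that a single unseen query term drives the whole product to zero, which is the motivation for smoothing.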

Question 4

Assume the same scenario as in Question 3, but using linear interpolation (Jelinek-Mercer) smoothing with λ = 0.5. Furthermore, you are given the following probabilities of some of the words in the collection language model:

Word	P(w|C)
online	1/4
courses	1/4
education	1/8

Based on the unigram query likelihood model, which of the following choices is correct?

P(Q|D1) = 1/16, P(Q|D2) = 1/16

P(Q|D1) = 1/16, P(Q|D2) = 1/32

P(Q|D1) = 1/32, P(Q|D2) = 1/32

P(Q|D1) = 1/16, P(Q|D2) = 0
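The smoothed computation can be sketched in the same way. This example assumes the common form of Jelinek-Mercer smoothing, p(w|D) = (1 − λ)·p_ML(w|D) + λ·p(w|C); with λ = 0.5 the two weights are equal, so the result does not depend on which term λ multiplies. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Collection language model probabilities from the table above.
collection = {
    "online": Fraction(1, 4),
    "courses": Fraction(1, 4),
    "education": Fraction(1, 8),
}


def jm_query_likelihood(query, document, collection_model, lam):
    """P(Q|D) with linear interpolation between document and collection models."""
    words = document.split()
    counts = Counter(words)
    likelihood = Fraction(1)
    for term in query.split():
        p_ml = Fraction(counts[term], len(words))
        likelihood *= (1 - lam) * p_ml + lam * collection_model[term]
    return likelihood


lam = Fraction(1, 2)
query = "online courses"
d1 = "online courses search engine"
d2 = "online education is affordable"
print(jm_query_likelihood(query, d1, collection, lam),
      jm_query_likelihood(query, d2, collection, lam))
```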

Question 5

BM25 has more free parameters to tune than the ranking function based on Dirichlet Prior smoothing.

True

False

Question 6

Assume you are using Dirichlet Prior smoothing to estimate the probabilities of words in a certain document. What happens to the smoothed probability of the word when the parameter μ is increased?

It becomes closer to the probability of the word in the collection language model

It becomes closer to the maximum likelihood estimate of the probability derived from the document

It does not change

It tends to 1
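To see the effect of μ numerically, here is a small sketch assuming the standard Dirichlet Prior estimate p(w|D) = (c(w, D) + μ·p(w|C)) / (|D| + μ). The count, document length, and collection probability are hypothetical numbers picked for illustration.

```python
def dirichlet_smoothed(count, doc_length, p_collection, mu):
    """Dirichlet Prior smoothed estimate of p(w|D)."""
    return (count + mu * p_collection) / (doc_length + mu)


# Hypothetical values: the word occurs 3 times in a 10-word document
# and has probability 0.05 in the collection language model.
count, doc_length, p_collection = 3, 10, 0.05

mus = [0, 100, 1000, 100000]
estimates = [dirichlet_smoothed(count, doc_length, p_collection, mu) for mu in mus]
for mu, estimate in zip(mus, estimates):
    print(mu, round(estimate, 5))
```

With μ = 0 the estimate equals the relative frequency in the document, and the printed values move away from it as μ grows.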

Question 7

It is possible that pseudo feedback decreases the precision and recall of a certain retrieval system.

True

False

Question 8

Refer to the Rocchio feedback formula in the slides. If you want to eliminate the effect of non-relevant documents when doing feedback, which of the following parameters must be set to zero?

γ and