Improving web-query processing through semantic knowledge and user feedback-3

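This post presents a method that uses knowledge from ResearchCyc and WordNet to enhance web queries. The method improves retrieval through query expansion and query refinement so that the results better match the user's needs.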

4. Methodology

The methodology uses semantic, linguistic, and factual information from ResearchCyc and WordNet to process web queries. The two major aspects of the methodology are query expansion and query refinement. During query expansion, the query is “expanded” with new terms to improve the retrieval performance. The expansion is usually performed using synonyms of the initial query terms Qw. Query expansion also takes into account the actions or properties of the initial terms (non-taxonomic relationships) and instances (in some cases). For example, the query “Pet” may be expanded as Pet and Animal.

Query refinement is the incremental process of transforming a query into a new query that more accurately reflects the user’s information need [17]. The goal is not to obtain better results but to change (shrink/grow) what the user is looking for (the expected result). To do so, the user is asked to disambiguate the query, after which the query may be reformulated automatically using the semantic knowledge of the ontology. This process uses generalization or specialization relationship types as well as non-taxonomic relationships and instances. For example, the query “Pet” may be refined by asking the user:

You are interested in Pets, but are you interested in any activity related to Pets? (1) buying/selling Pets, (2) Pet Stores in your area, (3) Providers of Animal Therapy

Supposing that the user is interested in Pet Stores in his/her area and he/she lives in Atlanta, then using the ResearchCyc knowledge the query may be automatically redefined as: “Pet Store” and Atlanta and Georgia and buy.
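The Pet refinement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `REFINEMENTS` and `STATE_OF_CITY` are hypothetical stand-ins for knowledge that the methodology would retrieve from ResearchCyc, and `refine_query` is an invented helper, not part of any real API.

```python
# Hypothetical stand-ins for ResearchCyc knowledge (not real ontology data).
REFINEMENTS = {
    "Pet": {
        1: ["Pet", "buy", "sell"],  # buying/selling Pets
        2: ["Pet Store", "buy"],    # Pet Stores in the user's area
        3: ["Animal Therapy"],      # providers of Animal Therapy
    }
}
# City -> enclosing state, assumed to come from geographic facts in the ontology.
STATE_OF_CITY = {"Atlanta": "Georgia"}


def quote(term):
    """Quote multi-word terms so they are treated as a phrase."""
    return f'"{term}"' if " " in term else term


def refine_query(term, choice, user_city=None):
    """Build a Boolean query for the refinement option the user picked."""
    terms = list(REFINEMENTS[term][choice])
    if user_city is not None:
        location = [user_city]
        if user_city in STATE_OF_CITY:
            location.append(STATE_OF_CITY[user_city])
        # Place the location right after the main concept.
        terms[1:1] = location
    return " and ".join(quote(t) for t in terms)


print(refine_query("Pet", 2, "Atlanta"))
```

Keeping the user's location separate from the refinement options means the same option table can serve users in different cities.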

The proposed methodology consists of four phases: (a) Query Parsing, (b) Query Expansion, (c) Query Refinement, and (d) Query Submission. The query parsing phase parses the natural language query using POS tagging and identifies the types of terms: nouns (and noun phrases), verbs, adjectives, adverbs, etc. These terms form the initial query. The query expansion phase adds similar terms to the query and, where appropriate, negative knowledge: the user identifies the correct word sense, and the other word senses are added as negative knowledge since the user is not interested in them. The query refinement phase reformulates the query to focus better on the needs of the user. This is accomplished by using the taxonomic and non-taxonomic relationships in ResearchCyc. The query submission phase creates the final query according to the syntax required by the search engine used, submits the query, and returns the results to the user. After each step, the user is asked whether the query reflects his/her intention. If so, the final query is constructed using the appropriate syntax and submitted to the search engine. The steps in the methodology are summarized in Table 3.


 

Table 3. Steps of the methodology

| Phase | Step | Description | Knowledge used | Result |
|---|---|---|---|---|
| Query parsing | 1 | Query is parsed using a POS tagger in order to identify the terms used | None | A set of query terms (t1, …, tn) that will be used as the initial query |
| Query expansion | 2 | The concepts in ResearchCyc that represent the query terms are identified | Linguistic information from ResearchCyc (use of WordNet is also recommended due to the limited linguistic information contained in ResearchCyc) | A set of ResearchCyc concepts (c1, …, cm with m ≥ n) relevant to the query |
| | 3 | There may be more than one concept ci for each term tj. This step finds the appropriate word sense of each term in the context of the query. The process is mostly manual, although heuristics are applied to disambiguate terms automatically in some cases | General information about the concepts is used, such as descriptions, subtypes, and supertypes. To disambiguate concepts automatically, disjoint constraints, generalization/specialization relationships, and general relationship types are used | Two sets: (1) one concept per query term (each concept representing the relevant meaning of a term), and (2) the concepts that represent the discarded senses |
| | 4 | Extends the query with other concepts closely related to the query concepts | Some generic relationship types of ResearchCyc are used to identify the elements closely related to each concept ci | A set of concepts (cl1, …, clk) closely related to the concepts of the query |
| Query refinement | 5 | Several refinements of the query are identified and presented to the user; the user may proceed with the initial query or choose a refinement and generate a refined query | The 82 part-of relationship types and other relationship types that denote semantic closeness between concepts are used to identify possible refinements for the query concepts | If the user has selected any refinement, the output is one ResearchCyc concept for each of the query terms |
| Query submission | 6 | Construct the final Boolean query using appropriate syntax | Linguistic information is used. The denotation words of the selected and discarded concepts are incorporated into the query using the search engine syntax | A string that represents the final query |
| | 7 | Submit the query to the search engine and provide the results back to the user | None | Results of query execution |

 

 

The following example shows how concepts in ResearchCyc can be used to reason about query terms, to select an appropriate sense, and to choose terms to add to the query. Suppose the user wants to know where to drink mate, a kind of tea frequently drunk in Argentina, and writes the query “drinking mate in Barcelona”. In the first step, the query is parsed and the output is the set of initial query terms: drinking, mate, and Barcelona. The word drinking has three senses in ResearchCyc: Alcoholic beverage, Drink as a noun, and Drink as a verb (the act of drinking). The word mate has three senses; the Paraguayan tea sense is identified from WordNet because it is not defined in ResearchCyc. We use the links between the supertypes of mate in WordNet and ResearchCyc to identify that tea (a supertype of mate) is related to the concept Tea-Beverage in ResearchCyc. Finally, Barcelona has only one sense: city of Barcelona. Thus, the result of the second step for this query is:

{{AlcoholicBeverage, Drink, DrinkEvent}, {SexualCopulation, partner, Tea-Beverage}, {CityOfBarcelona}}.

The above query has three senses for the word drinking. In the third step of the methodology, the second sense, Drink, is automatically discarded because it is the supertype of the sense AlcoholicBeverage of the same word. User interaction is then needed to establish that the user is interested in the activity of drinking rather than in the alcoholic beverage. Therefore, the concept DrinkEvent is selected as the appropriate sense for the first query term. Note that the appropriate sense of the second word, “mate”, can be inferred automatically because Tea-Beverage is related to a particular sense of each of the other two words. Specifically, Tea-Beverage is related to DrinkEvent because Tea-Beverage is a subtype of Drink, and Drink is related to DrinkEvent through a relationship denoting that the action of drinking involves consuming a drink. Tea-Beverage is related to CityOfBarcelona because CityOfBarcelona is an instance of City, which is a subtype of Place; Place is related to Event-Localized, which is a supertype of DrinkEvent, through the relationship type EventOccurs. Hence, the other senses of the second word of the query are automatically discarded and added as negative knowledge. The third step therefore returns the following two lists:

{DrinkEvent, Tea-Beverage, CityOfBarcelona}

{{AlcoholicBeverage}, {SexualCopulation, partner}, { }}

The first list denotes the relevant meaning for the query, while the second list represents the discarded senses. In the fourth step of our example, the query is expanded with the concept Spain because there is a relationship called CountryOfCity that relates CityOfBarcelona to the concept Spain. Hence, the concept Spain is added to the list of relevant concepts. Assume that the user does not select any refinement in this example (fifth step). Finally, the sixth step returns the following query:

“Drinking mate tea Barcelona Spain – alcoholic – love – sexual”
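Two of the automatic steps in this example are easy to express in code: discarding a sense that is a supertype of another sense of the same word (step 3), and expanding a city with its country (step 4). The Python sketch below is a minimal illustration. The `SUPERTYPES` and `COUNTRY_OF_CITY` dictionaries are hypothetical fragments standing in for ResearchCyc, and the function names are invented for this example.

```python
# Hypothetical taxonomy fragment: concept -> direct supertypes.
SUPERTYPES = {
    "AlcoholicBeverage": {"Drink"},
    "Tea-Beverage": {"Drink"},
}
# Hypothetical instance of the CountryOfCity relationship.
COUNTRY_OF_CITY = {"CityOfBarcelona": "Spain"}


def ancestors(concept):
    """Return all direct and indirect supertypes of a concept."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in SUPERTYPES.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def prune_supertype_senses(senses):
    """Drop every sense that is a supertype of another sense of the same term."""
    return [
        s for s in senses
        if not any(s in ancestors(other) for other in senses if other != s)
    ]


def expand_with_country(concepts):
    """Append the country of each city concept, if one is known."""
    expanded = list(concepts)
    for concept in concepts:
        country = COUNTRY_OF_CITY.get(concept)
        if country and country not in expanded:
            expanded.append(country)
    return expanded


print(prune_supertype_senses(["AlcoholicBeverage", "Drink", "DrinkEvent"]))
print(expand_with_country(["DrinkEvent", "Tea-Beverage", "CityOfBarcelona"]))
```

The pruning step leaves AlcoholicBeverage and DrinkEvent, which is the point at which the text above asks the user to choose between the two.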

5. Prototype architecture and implementation

The methodology has been implemented in a prototype using J2EE technologies. The prototype interfaces with Google and AlltheWeb search engines. The query expansion module and the query refinement module interact with ResearchCyc through its Java API [5], which is used for querying the concepts of ResearchCyc and making inferences about the concepts related to user query terms.

The architecture of the prototype is shown in Fig. 2 and consists of two parts: the client side and the Java-enabled server side. The client is a web browser that presents the web pages created on the server side to gather information from the user and to present the query results. The server side contains four major components: (1) query parser module, (2) query expansion module, (3) query refinement module, and (4) query generation module.


 

 


 

Fig. 2. System architecture.


 

 

The Query Parser Module captures the user’s query, parses it with the QTag parser (http://www.english.bham.ac.uk/staff/omason/software/qtag.html), and returns the part of speech of each term. From this, a baseline query is created. The Query Expansion Module interfaces with the ResearchCyc knowledge sources and supports the query expansion steps. For each query term, it obtains the related concepts and synsets and lets the user select the appropriate word sense. Based on the user’s input, appropriate synonyms and negative knowledge are added to the query. In some cases, the word sense can be identified automatically, namely when some of the terms in the query are relationship types that relate to other concepts in the query for only one of the possible senses. The Query Refinement Module interfaces with ResearchCyc and adds personal information that is relevant to the query to restrict the search domains. Based on the user’s selected synset, hypernyms and hyponyms for the selected sense of the term are obtained from ResearchCyc and WordNet. This module uses taxonomic and non-taxonomic relationships from ResearchCyc to reason about concepts and to propose appropriate refinements to the query. When the user chooses to refine a query, the new query is sent back to the query expansion module because the refined query can be expanded with new information and further refined before being executed. The Query Generation Module creates the augmented query using the appropriate syntax for the search engine. Boolean operators are used to construct the final query, and care is taken to ensure that the final query meets the syntax requirements. The Search Engine Interface enables the final query to be submitted to various search engines and forwards the results back to the user.
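The control flow among the modules, including the loop from refinement back to expansion, might look like the following Python sketch. It is an assumption-laden outline rather than the prototype's code (which is written in Java): `process_query` and every callable passed to it are hypothetical stand-ins, and the toy functions only mimic the “buying fork Georgia” example.

```python
def process_query(text, parse, expand, propose, choose, generate):
    """Drive the server-side modules; all callables are hypothetical stand-ins.

    `choose` models the user: it returns refined terms, or None to accept.
    """
    terms = parse(text)                  # Query Parser Module
    while True:
        expanded = expand(terms)         # Query Expansion Module
        options = propose(terms)         # Query Refinement Module
        refined = choose(terms, options)
        if refined is None:              # user accepts the current query
            return generate(expanded)    # Query Generation Module
        terms = refined                  # refined query is expanded again


# Toy stand-ins for the modules (illustrative only).
def toy_expand(terms):
    extra = ["the United States"] if "Georgia" in terms else []
    return list(terms) + extra


def toy_propose(terms):
    return {"fork": ["kitchenware"]} if "fork" in terms else {}


def toy_choose(terms, options):
    if "fork" in options:
        return ["kitchenware" if t == "fork" else t for t in terms]
    return None


final_query = process_query(
    "buying fork Georgia", str.split, toy_expand, toy_propose, toy_choose, " ".join
)
print(final_query)
```

Passing the modules in as callables keeps the loop independent of any particular ontology or search engine.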

5.1. Implementation

The prototype is implemented as a web application using JSP (JavaServer Pages). This development environment was chosen because it makes the system portable and easily accessible through the World Wide Web. On the client side, web pages are used to gather information from the user, such as the initial query, the user’s selection of the relevant senses for the query terms, and the refinements to apply. On the server side, several modules have been implemented using Java servlets. These modules are used to parse the query, identify the correct senses of the query terms in the ontology, identify other concepts from the ontology that are also relevant to the query, and identify possible query refinements. The query expansion module and the query refinement module interact with ResearchCyc through its Java API [5]. The query generation module interfaces with the Google and AlltheWeb search engines.

5.2. Sample query

This section illustrates how our system works using a sample query. Assume that a user from Atlanta (Georgia) is looking to buy forks. The user may therefore pose the query “buying fork Georgia”. If we execute this query in Google, only one of the first 10 results is relevant to the user.

In our system, the user would type the query in the initial web page (Fig. 3) and click on the “Parse Query” button. The query is sent to the server and parsed to identify the query terms (the nouns and verbs contained in the query among others). The query terms in the example are {buying, fork, georgia}.


 

 


 

Fig. 3. Initial web page for specifying the query.


 

 

The query expansion module identifies the ResearchCyc concepts that are linguistically related to the query terms. If more than one concept is related to a single query term then the system creates a web page and sends it to the user (Fig. 4). This web page shows the different meanings of the ambiguous query terms and allows the user to choose the correct sense. In our example, the term Georgia has three different meanings in the ontology: the University of Georgia, the state of Georgia in the US and the country of Georgia in Europe. Then, the user selects the meaning Georgia-State and clicks on the “Query Expansion and Refinement” button to continue. Due to the incompleteness of the ontology, an option called “none of the previous senses” has been added to the web page. In the event that none of the senses defined in the ontology fits with the query term in the context of the query, the user can select this option.


 

 


 

Fig. 4. Disambiguation web page.


 

 

Based on the user’s selection, the query expansion module identifies the concepts that can be used to expand the query. At this point, only geographical information and the part-of relationship types are used to identify these expansions. Since Georgia-State is a state of the US, the query is expanded with the term the United States. The query generation module creates the final query using the denotations of the correct meaning of the query terms and their expansions. The discarded meanings are added to the query as negative information. The resulting query is:

fork georgia buying – “the university of georgia” – “the republic of georgia” the United States
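A query string of this shape can be assembled from three lists: the selected terms, the discarded senses, and the expansions. The Python sketch below is a minimal illustration and not the prototype's query generation module. `build_query` is a hypothetical helper, and it assumes the common search engine convention of a leading minus sign for exclusion and double quotes for phrases, so its spacing differs slightly from the query printed above.

```python
def phrase(term):
    """Wrap multi-word terms in double quotes so they match as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(terms, discarded, expansions):
    """Join selected terms, negated discarded senses, and expansions."""
    parts = list(terms)
    # Discarded senses become negative knowledge: -term or -"multi word".
    parts += [f"-{phrase(d)}" for d in discarded]
    # Expansions are appended unquoted, as in the example query.
    parts += list(expansions)
    return " ".join(parts)


query = build_query(
    terms=["fork", "georgia", "buying"],
    discarded=["the university of georgia", "the republic of georgia"],
    expansions=["the United States"],
)
print(query)
```

Keeping the three lists separate until the last moment makes it straightforward to emit a different syntax for each target search engine.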

Next, the system identifies the possible refinements of the query by studying the knowledge related to the relevant concepts. When all the possible refinements have been identified, the system creates a web page that contains the created query and these refinements (Fig. 5). In this example, the system presents seven refinements for the term fork (kitchenware, eating, utensil organizer, hand, grip, control, action with only one performer), three refinements for Georgia (Governor, US Governor, state official), and two for buying (drug trafficking and invest in hedge fund).


 

 


 

Fig. 5. Query refinement web page for initial query.


 

 

In the web page shown in Fig. 5, the user has two options:

1. Generate the final query: If no refinements have been selected, the user can generate the final query by clicking “Construct Final Query”. The final query is then created and presented to the user using the syntax of the Google and AlltheWeb search engines. The user can click on the corresponding button for the search engine that he or she wants to use. Since the query is presented in a text box, the user can modify it before submission. The results of the query are presented directly by the search engine.

2. Refine the query: The user can check the proposed refinements to see if any of them better represents his or her needs. Suppose that in our example the user chooses the kitchenware refinement for the term fork. The query is then refined by substituting the term fork (and its negative and expanded knowledge) with the term kitchenware. Since the user is interested in neither drugs nor politics, he or she does not need to select further refinements for the other two terms, Georgia and buying. When refinements are selected, the “Construct Final Query” button changes to “Refine & Expand Query”. When the user clicks on this button, the query is modified to “buying kitchenware Georgia” and sent to the query expansion module again because the new query term (kitchenware) may involve additional expansions. Since the user may also be interested in refinements of the new query, the possible refinements are created and presented to the user in a web page.

Assume that, after this refinement, the user agrees with the expanded query and therefore executes it by clicking on the “Construct Final Query” button. The final query is then created, and the user can execute it using the Google or AlltheWeb search engines. Our system greatly improved the relevance of the results for this query: eight of the first 10 results returned for the refined query were relevant Web pages, compared to only one for the initial query.
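The substitution of fork by kitchenware, together with the knowledge attached to the replaced term, can be modeled with a small per-term record. The Python sketch below is illustrative only: `TermState` and `apply_refinement` are hypothetical names, and the prototype's actual data structures are not described in the text.

```python
from dataclasses import dataclass, field


@dataclass
class TermState:
    """A query term plus the knowledge attached to it during expansion."""
    term: str
    negatives: list = field(default_factory=list)   # discarded senses
    expansions: list = field(default_factory=list)  # closely related concepts


def apply_refinement(query, old_term, new_term):
    """Swap old_term for new_term, dropping old_term's attached knowledge."""
    return [
        TermState(new_term) if state.term == old_term else state
        for state in query
    ]


query = [
    TermState("buying"),
    TermState("fork"),
    TermState(
        "Georgia",
        negatives=["the university of georgia", "the republic of georgia"],
        expansions=["the United States"],
    ),
]
refined = apply_refinement(query, "fork", "kitchenware")
refined_text = " ".join(state.term for state in refined)
print(refined_text)
```

Because the replacement term starts with empty negative and expanded knowledge, it has to pass through the expansion step again, which matches the loop described above.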

6. Validation

Sample queries have been executed, with the results shown in Table 4. The base query and the expanded query were executed in Google, and the number of relevant hits in the top 10 results was recorded for each (Relevance Score). These sample queries show that the addition of semantic and linguistic knowledge helps improve query results.


 

Table 4. Query results

| Base query | Relevance score (Google) | Expanded query | Relevance score (our method) |
|---|---|---|---|
| Flute bohemian drink | 1 | (Flute OR champagne flute) Bohemian (Drink OR beverage) – woodwind – drinking | 4 |
| Blues Suicide | 3 | Suicide (Blues OR depression) – blues music – the blues style of music | 10 |
| Find cookie stores | 3 | Cookie (stores OR retail store) (Find OR encountering) – http cookie - http cookie – storing – retail space – fund – conscious activity | 10 |
| Pirates punishment | 0 | punishment (Pirates OR pirate) – pirating – buccaneer – whitworth college | 4 |
| Coach agency rules | 1 | (agency OR organization) Coach (rules OR code of conduct) – federal agency – coaching – bus – governing – ruler – principle | 5 |
| Image download sunset | 7 | Wallpaper sunset download – sundown | 10 |
| Monkey virus | 4 | Monkey virus – computer virus | 6 |
| Buccaneer history | 8 | History piracy | 10 |

 

 

In the first query, the user searches for information about bohemian flutes, which are glasses made in the Bohemian region (Czech Republic). The base query returns a lot of information about flutes (the tube-shaped musical instrument). In the second query, the user is interested in the relationship between suicide and feelings of sadness. In the third query, the user is interested in information about biscuit stores. Executing this query using a search engine yields irrelevant results because a cookie is also a web mechanism for storing information on a user’s computer. The fourth query returns web pages about the sailors who engaged in piracy during the 17th, 18th, and 19th centuries. However, the user is interested in the punishments for piracy in the sense of illegal copying of copyrighted material. The fifth query returns a great deal of irrelevant information regarding vehicles and traveling. In the expanded query, the possible meanings of the term coach are narrowed down and the query results therefore improve. In the sixth query, the user wants to download images of sunsets to use as wallpaper on his or her computer. Since our tool allows refining the term image to wallpaper, some of the results that dealt with screensavers can be discarded. The seventh query does not produce good results because the phrase monkey virus is ambiguous: there is a computer virus and a biological virus with the same name. In the refined query, the user indicates a preference for the biological virus sense. In the last query, the user is looking for the history of buccaneers. Some of the results obtained deal with the football team Tampa Bay Buccaneers. In all of the above queries, the refinements proposed by our system improve the query results.
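The relevance scores in Table 4 can be summarized with a short calculation. The Python sketch below copies the scores from the table and computes the mean fraction of relevant results in the top 10; the variable and function names are chosen for this illustration, and the aggregate is not a figure reported in the text.

```python
# (base query, relevant hits in top 10 for the base query, for the expanded query)
TABLE_4 = [
    ("Flute bohemian drink", 1, 4),
    ("Blues Suicide", 3, 10),
    ("Find cookie stores", 3, 10),
    ("Pirates punishment", 0, 4),
    ("Coach agency rules", 1, 5),
    ("Image download sunset", 7, 10),
    ("Monkey virus", 4, 6),
    ("Buccaneer history", 8, 10),
]
TOP_K = 10  # relevance is judged on the first 10 results


def mean_precision(scores, k=TOP_K):
    """Average fraction of relevant results among the top k, over all queries."""
    return sum(scores) / (k * len(scores))


base = [b for _, b, _ in TABLE_4]
expanded = [e for _, _, e in TABLE_4]
base_p10 = mean_precision(base)
expanded_p10 = mean_precision(expanded)
all_improved = all(e > b for b, e in zip(base, expanded))

print(f"base: {sum(base)} relevant hits, mean precision {base_p10:.4f}")
print(f"expanded: {sum(expanded)} relevant hits, mean precision {expanded_p10:.4f}")
print(f"every query improved: {all_improved}")
```

Across these eight sample queries, the base queries yield 27 relevant hits out of 80 and the expanded queries 59 out of 80, and each individual query improves. With so few queries, this indicates a trend rather than a statistically robust result.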

 

