CS209-Course-Notes

These notes cover the main types of online advertising — search ads, native ads, and display ads — and how they work and are implemented. They also cover key techniques in information retrieval systems, such as inverted indexes and query processing.

Lecture 1:

Ads types

Search Ads:

  • logic: match the ad’s keywords to the user’s query
  • ad format: text, image
  • ad position: mainline, sidebar, top or bottom of the search results

Native Ads:

  • logic: match the ad’s keywords to the context of the web page or app
  • ad format: text, image; the style should also match the context of the web page or app
  • ad position: embedded in the original content of the web page or app

Display Ads:

  • logic: match the user’s demographics and interests to the ad’s category; interests are collected from user behavior: page dwell time, clicks, video engagement time
  • ad format: image, animation (GIF), video, audio
  • ad position: sidebar, top or bottom of the page or app

Ads data structure

What is a campaign:

  • A campaign focuses on a theme or a group of products
  • Set a budget
  • Choose your audience
  • Write your ad, including keywords and ad content

Ad:

  • AdID
  • CampaignID
  • Keywords
  • Bid
  • Description
  • LandingPage

Campaign

  • CampaignID
  • Budget
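The ad and campaign records above can be sketched as simple data classes. The field names follow the notes; the types and example values are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Campaign:
    campaign_id: int
    budget: float  # total spend limit for the campaign

@dataclass
class Ad:
    ad_id: int
    campaign_id: int     # links the ad back to its Campaign
    keywords: list[str]  # terms matched against the user's query
    bid: float           # price offered per click
    description: str     # text shown in the ad
    landing_page: str    # URL the ad click leads to

# hypothetical example records
camp = Campaign(campaign_id=1, budget=500.0)
ad = Ad(ad_id=10, campaign_id=camp.campaign_id,
        keywords=["beach", "furniture"], bid=0.35,
        description="Outdoor beach furniture sale",
        landing_page="https://example.com/beach")
```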

Search Ads Workflow


Lecture 2:

Information Retrieval(IR)

Finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (stored on computers), e.g. web search, e-mail search, etc.

Inverted index

For each term t, we must store a list of all documents that contain t; each document is identified by a docID.

Inverted index construction

  • Tokenization
    Cut the character sequence into word tokens
  • Normalization
    Map text and query terms to the same form: lowercase, U.S.A. -> USA
  • Stemming
    We may wish different forms of a word to match: am, are, is -> be; cars, car’s, cars’ -> car
  • Stop words
    Omit very common words such as prepositions: of, on

How to process a Boolean query like A AND B, e.g. “Alice AND Bruce”?
- Locate each term’s postings and merge (intersect) them; the index has the form (term : docs).
How to process a phrase query like “A B”, e.g. “star wars”?
- Use a positional index of the form (term, num of docs; doc1: pos1, pos2, …; doc2: pos1, pos2, …) and check the distance between positions in shared documents.
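The two query types can be sketched with a small positional inverted index. This is a minimal illustration (tiny hard-coded documents, no normalization), not production code:

```python
from collections import defaultdict

docs = {1: "star wars is a space opera",
        2: "the cold wars",
        3: "a star is born"}

# positional index: term -> {docID -> [positions]}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for pos, term in enumerate(text.split()):
        index[term].setdefault(doc_id, []).append(pos)

def and_query(a, b):
    """A AND B: intersect the two postings lists."""
    return sorted(index[a].keys() & index[b].keys())

def phrase_query(a, b):
    """Phrase "A B": in shared docs, check B appears right after A."""
    hits = []
    for doc_id in and_query(a, b):
        if any(p + 1 in index[b][doc_id] for p in index[a][doc_id]):
            hits.append(doc_id)
    return hits

print(and_query("star", "is"))       # -> [1, 3]
print(phrase_query("star", "wars"))  # -> [1]
```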

Application of IR in search ads

  • build an inverted index for ads: key -> term in keywords, value -> list(AdID)
  • build a forward index for ad detail info
  • process the query
  • rank the ad candidates

How to rank ads candidates?

  • Relevance score = number of query words matched in the ad’s keywords / total number of keywords
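The four steps above can be sketched end to end with the relevance formula just given. The ad data and field names are made-up assumptions:

```python
from collections import defaultdict

# forward index: AdID -> ad detail info (here just the keywords)
ads = {
    101: {"keywords": ["beach", "furniture", "outdoor"]},
    102: {"keywords": ["office", "furniture"]},
    103: {"keywords": ["beach", "towel"]},
}

# inverted index: keyword term -> list(AdID)
inverted = defaultdict(list)
for ad_id, ad in ads.items():
    for term in ad["keywords"]:
        inverted[term].append(ad_id)

def rank_ads(query):
    terms = query.lower().split()
    # candidate retrieval via the inverted index
    candidates = {ad_id for t in terms for ad_id in inverted.get(t, [])}
    scored = []
    for ad_id in candidates:
        kws = ads[ad_id]["keywords"]
        matched = sum(1 for t in terms if t in kws)
        scored.append((matched / len(kws), ad_id))  # relevance score
    return sorted(scored, reverse=True)

print(rank_ads("outdoor beach furniture"))  # ad 101 matches all its keywords
```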

Web service

Web services are client and server applications that communicate over the WWW via the Hypertext Transfer Protocol (HTTP).

Components:

  • Client: PC, mobile phone, tablet
  • Protocol: HTTP
  • Web Server: Tomcat, nginx, IIS, Jetty
  • Data Layer: SQL database, NoSQL, document store

HTTP is an application-level protocol for distributed, collaborative, hypermedia information systems. It is used to deliver data on the WWW.

  • Connectionless
    The HTTP client, i.e. a browser, initiates an HTTP request; after the request is made, the client disconnects from the server and waits for a response. The server processes the request and re-establishes the connection with the client to send the response back.
  • Media independent
    Any type of data can be sent by HTTP as long as both the client and the server know how to handle the data content.
  • Stateless
    HTTP is connectionless, which is a direct result of HTTP being a stateless protocol. The server and client are aware of each other only during the current request.

How does a web server handle an HTTP request?

  • AuthTrans
    Verify any authorization info sent in the request
  • NameTrans
    Translate the logical URL into a local file system path
  • PathCheck
    Check the local file system path for validity and check that the requester has access privileges to the requested resource on the file system
  • ObjectType
    Determine the Multipurpose Internet Mail Extensions (MIME) type of the requested resource
  • ParseParams
    Process incoming request data read by the Service step
  • Service (generate response)
    Generate and return the response to the client
  • Error
    If an error happens, the server logs the error message and aborts the process
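Several of these steps can be sketched with Python’s standard-library `http.server`. This is an illustrative static-file handler only (the document root is an assumption; AuthTrans, ParseParams, and protections such as rejecting ".." paths are omitted):

```python
import mimetypes
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

DOC_ROOT = "/var/www/html"  # assumption: the server's document root

class StaticHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            # NameTrans: translate the logical URL into a local file path
            path = os.path.join(DOC_ROOT, self.path.lstrip("/"))
            # PathCheck: validity and access privileges
            if not os.path.isfile(path) or not os.access(path, os.R_OK):
                self.send_error(404)
                return
            # ObjectType: determine the MIME type of the resource
            mime, _ = mimetypes.guess_type(path)
            # Service: generate and return the response
            with open(path, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", mime or "application/octet-stream")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except Exception as exc:
            # Error: log the message and abort this request
            self.log_error("request failed: %s", exc)
            self.send_error(500)

# To run: HTTPServer(("", 8080), StaticHandler).serve_forever()
```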

Map Reduce

  • Map
    Divides the input into ranges and creates a map task to transform each partition
    input: any string
    output: (key, value) pairs
  • Shuffle
    Distribute partitions to different machines by key
  • Reduce
    Collects the various results and combines them to answer the larger problem that the master node needs to solve
    input: key, list(value)
    output: combined value per key
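The classic word-count example shows the three phases on a single machine (the shuffle here is just a grouping by key, standing in for distribution across machines):

```python
from collections import defaultdict

# Map: input any string, output (key, value) pairs
def map_fn(line):
    for word in line.split():
        yield word, 1

# Reduce: input (key, list(value)), output the combined result
def reduce_fn(key, values):
    return key, sum(values)

lines = ["to be or not to be", "to do is to be"]

# Map phase
mapped = [kv for line in lines for kv in map_fn(line)]

# Shuffle phase: group pairs by key
shuffled = defaultdict(list)
for key, value in mapped:
    shuffled[key].append(value)

# Reduce phase
counts = dict(reduce_fn(k, vs) for k, vs in shuffled.items())
print(counts)  # counts["to"] == 4, counts["be"] == 3
```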

Lecture 3:

Query Rewrite

  • Goal
    Find queries related to the issued one, which would allow us to retrieve relevant ads that were not matched by the original query
  • Approach
    Find the K nearest neighbors of the original query: semantically similar queries
  • Intuition
    If we can find a vector representation of the query, then we can calculate similarity as the cosine of the two vectors

Normally, a customer generates a query like “outdoor beach furniture”; we first find its K nearest neighbors (similar queries) and compare their similarity. To generate a vector, the input one-hot vector is used to look up the word vector, and the word vector is multiplied by the output layer. For example, the word vector of “ant” times the output-layer weights of the word “car” gives a value, which is passed to the softmax layer to calculate a probability.
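The cosine-similarity intuition can be shown directly. The vectors below are made up for illustration; in practice they would be averaged word2vec embeddings of the query terms:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# toy query vectors (hypothetical values)
q1 = [0.9, 0.1, 0.3]  # "outdoor beach furniture"
q2 = [0.8, 0.2, 0.4]  # "patio furniture"  (assumed similar)
q3 = [0.1, 0.9, 0.0]  # "car insurance"    (assumed unrelated)

print(cosine(q1, q2) > cosine(q1, q3))  # the semantically closer query scores higher
```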

word2vec

  • skip-gram model
    • for a given word in a sentence, what is the probability of each and every other word in our vocabulary appearing anywhere within a small window around the input word?
    • for example, given the word “trump”, a trained model will say that words like “president”, “elect” and “donald” have a high probability of appearing nearby, while unrelated words like “cook” and “movie” have a low probability
  • skip-gram model training
    • training data: vocabulary of V unique words
    • input word representation: a one-hot vector for each word; this vector has V components (one for every word in the vocabulary), with a “1” in the position corresponding to the word and 0s in all other positions
    • output: a single vector containing, for every word in the vocabulary, the probability that that word would appear near the input word
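A minimal sketch of the forward pass: the one-hot input selects one row of the input matrix (the word vector), which is scored against every output-layer vector, and softmax turns the scores into probabilities. The vocabulary and weights below are tiny made-up values, not trained parameters:

```python
import math

vocab = ["ant", "car", "insect", "drive"]
# input matrix: one 2-d "word vector" per word (made-up values)
W_in = {"ant": [0.5, 0.1], "car": [0.2, 0.9],
        "insect": [0.6, 0.2], "drive": [0.1, 0.8]}
# output-layer vector per word (made-up values)
W_out = {"ant": [0.4, 0.0], "car": [0.0, 0.7],
         "insect": [0.5, 0.1], "drive": [0.1, 0.6]}

def skipgram_probs(word):
    h = W_in[word]  # the one-hot input just selects this row
    scores = {w: h[0] * v[0] + h[1] * v[1] for w, v in W_out.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}  # softmax

p = skipgram_probs("ant")
print(round(sum(p.values()), 6))  # 1.0 -- softmax probabilities sum to one
```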

How to do query rewriting with word2vec?

  • term level: replace a query term with similar terms
  • phrase level: extract a phrase from the query, embed it, and replace it with similar phrases

Query Intent Extraction

  • Goal
    Generate sub-queries which best preserve the intent of the original query and allow us to retrieve more relevant ads
  • Approach
    A logistic regression classifier is used to determine the goodness of each sub-query
  • Intuition
    A historically good sub-query has more clicks on relevant ads that contain the terms of the sub-query

How to generate sub-queries?

  • remove stop words
  • generate n-grams as sub-queries (2 <= n <= N - 1, where N is the number of tokens)
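The two steps above can be sketched as follows; the stop-word list is a small assumed sample:

```python
STOP_WORDS = {"the", "a", "an", "of", "on", "for", "in"}  # assumed sample list

def sub_queries(query):
    # step 1: remove stop words
    tokens = [t for t in query.lower().split() if t not in STOP_WORDS]
    n_tokens = len(tokens)
    subs = []
    # step 2: n-grams with 2 <= n <= N - 1
    for n in range(2, n_tokens):
        for i in range(n_tokens - n + 1):
            subs.append(" ".join(tokens[i:i + n]))
    return subs

print(sub_queries("stella artois beer prices"))
# -> ['stella artois', 'artois beer', 'beer prices',
#     'stella artois beer', 'artois beer prices']
```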

How to quantify a good sub-query?
- Mutual Click Intent (MCI)

MCI Features

  • Click Intent Rank (CIR)
    CIR quantifies the contribution of each token to the query intent and indicates how important token v is in the query
    intuition: important tokens can generate good sub-queries
    example query: stella artois beer prices

CIR

Apply the PageRank algorithm over the query’s tokens to compute CIR.
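The notes do not specify how the token graph is built; a common choice (assumed here) is a graph over query tokens with co-occurrence-based edges, on which PageRank’s power iteration runs. The edge structure below is a made-up example for the query “stella artois beer prices”:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration; links maps node -> list of outgoing neighbors."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            share = damping * rank[n] / len(outs)
            for m in outs:
                new[m] += share
        rank = new
    return rank

# hypothetical token graph for "stella artois beer prices"
tokens = {
    "stella": ["artois", "beer"],
    "artois": ["stella", "beer"],
    "beer":   ["stella", "artois", "prices"],
    "prices": ["beer"],
}
cir = pagerank(tokens)
print(max(cir, key=cir.get))  # the best-connected token gets the highest CIR
```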
