Reading Notes: Forecasting High-Dimensional Data

Real-Time Online Forecasting of High-Dimensional Attribute Combinations


License: CC 4.0 BY-SA

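These notes cover how to forecast arbitrary attribute combinations in real time without excessive computation or storage. Only a subset of attribute combinations is forecast explicitly; the remaining combinations are forecast dynamically using high-dimensional attribute correlation models. This avoids the sparsity problem while keeping forecasts accurate enough to answer demand queries.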

ABSTRACT
•Challenge
–there are many possible attribute combinations that need to be forecast
•Approach
–only a subset of attribute combinations is explicitly forecast and stored
–the other combinations are forecast dynamically, on the fly, using high-dimensional attribute correlation models

INTRODUCTION
•Problem
–How do we forecast arbitrary attribute combinations without excessive computational and space requirements, while still maintaining real-time response?
•Advantages
–Handles newly added attributes
–Does not suffer from the sparsity problem
–Adapts to the needs of guaranteed contracts

Data and Query Model
•a data point (one user visit): Gender = Male, Age = 30, Location = California, Interested In Sports = True, Interested In Finance = False, Planning Vacation = True, Page-Category = Sports, ..., Time = 31 October 2009 11:00pm
•a query: (Page-Category = Sports ∧ (Gender = Male ∨ Age ∈ [25, 35]) ∧ Time ∈ [1 Aug 2009, 31 Oct 2009])
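The data point and query above can be sketched in Python as a dictionary of attribute values and a predicate over it. This is only an illustration: the dictionary layout and the name `matches_query` are assumptions, and the time range is assumed to include the whole of 31 Oct 2009.

```python
from datetime import datetime

# One data point: a single user visit described by its attribute values.
visit = {
    "Gender": "Male",
    "Age": 30,
    "Location": "California",
    "Interested In Sports": True,
    "Interested In Finance": False,
    "Planning Vacation": True,
    "Page-Category": "Sports",
    "Time": datetime(2009, 10, 31, 23, 0),
}


def matches_query(v):
    """Page-Category = Sports AND (Gender = Male OR Age in [25, 35]) AND Time in range."""
    # Assumption: the range covers all of 31 Oct 2009, so the upper bound is 1 Nov (exclusive).
    in_time = datetime(2009, 8, 1) <= v["Time"] < datetime(2009, 11, 1)
    in_demographic = v["Gender"] == "Male" or 25 <= v["Age"] <= 35
    return v["Page-Category"] == "Sports" and in_demographic and in_time
```

A query therefore defines a region of the attribute space, and a visit either falls inside that region or does not.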

Forecasting Problem Statement
•count forecast problem
–forecast the number of points in the query region
•sample forecast problem
–forecast a sample of points in the query region
–the sample is used to compute how many of the user visits in the query region have already been assigned to earlier guaranteed contracts
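One way to read the difference between the two problems is sketched below. The representation is hypothetical and not taken from the paper: each forecast point carries a `weight` (the number of future visits it stands for) and a `contracts` list (the guaranteed contracts it is already assigned to).

```python
def count_forecast(points, in_region):
    """Count forecast: the forecast number of visits inside the query region."""
    return sum(p["weight"] for p in points if in_region(p))


def sample_forecast(points, in_region):
    """Sample forecast: the forecast points themselves that fall inside the query region."""
    return [p for p in points if in_region(p)]


def already_assigned(sample):
    """Visits in the sample that earlier guaranteed contracts have already claimed."""
    return sum(p["weight"] for p in sample if p["contracts"])
```

Under these assumptions, a count alone cannot say how much of the region is still free, whereas a sample can be inspected point by point.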

Solution Overview
•When a query arrives
–we first map a subset of the query attributes to an attribute combination that has a time-series forecast, to obtain future trend information
–then we multiply this trend count by the correlation ratios (obtained from the correlation model) for the remaining query attributes to obtain the forecast count for the query
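The two steps above amount to scaling a trend count by a sequence of ratios. A minimal sketch, in which the function name and the example numbers are illustrative assumptions rather than values from the paper:

```python
def forecast_count(trend_count, correlation_ratios):
    """Scale the time-series trend count by each correlation ratio in turn."""
    count = trend_count
    for ratio in correlation_ratios:
        count *= ratio
    return count


# Hypothetical numbers: the trend forecast for Page-Category = Sports is
# 1,000,000 visits, and the correlation model says half of those visits
# come from male users.
sports_trend = 1_000_000
male_given_sports = 0.5
estimate = forecast_count(sports_trend, [male_given_sports])
```

Only the trend combinations need stored time-series forecasts; every other combination is derived at query time from the ratios.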

•Which attribute combinations do we forecast trends for?
–beyond the scope of this paper
•How do we effectively represent correlations in a high-dimensional space?
–a naive Bayesian model
–a partially independent model
–a fully correlated model
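Of the three options, the naive Bayesian model is the simplest to illustrate. These notes do not give the paper's formulas, so the sketch below uses the textbook independence assumption: each extra attribute's ratio is estimated separately from historical visits that match the trend combination, and the ratios are multiplied. The function names and the dictionary-based history are assumptions.

```python
def conditional_ratio(history, base, attr, value):
    """Fraction of visits matching `base` that also have attr == value."""
    base_visits = [v for v in history if all(v.get(k) == x for k, x in base.items())]
    if not base_visits:
        return 0.0  # no evidence for this base combination
    hits = sum(1 for v in base_visits if v.get(attr) == value)
    return hits / len(base_visits)


def naive_bayes_ratio(history, base, extra):
    """Combine per-attribute ratios as if the extra attributes were independent given `base`."""
    ratio = 1.0
    for attr, value in extra.items():
        ratio *= conditional_ratio(history, base, attr, value)
    return ratio
```

Because each ratio is estimated over all visits matching the base combination, the estimate stays usable even when very few historical visits match every query attribute at once. The price is that correlations among the extra attributes are ignored, which is presumably what the partially independent and fully correlated models are meant to address.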

System Architecture