CIKM competition
TASK: Design a language-independent algorithm to automatically detect the intent of unlabeled queries in the dataset.
Target categories: “VIDEO”, “NOVEL”, “GAME”, “TRAVEL”, “LOTTERY”, “ZIPCODE”, and “OTHER”.
Datasets:
dataset D1: 970,000 queries that are longer than 2 bytes and occur at least 10 times a day.
dataset D2: A query in D1 can carry several labels; D2 allows each query at most 2 categories and filters out queries with more than 2. This reduces the set to about 70,000 queries.
dataset D3: Collect “click” and “session” data from user logs.
dataset D4: Encode the Chinese characters as numeric codes.
dataset D5: 50% of the data is used for training, with some missing values; the results are written to D5.
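
The D1, D2 and D4 steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the competition's actual preprocessing: the function names are hypothetical, query length is measured in UTF-8 bytes (the notes do not say which encoding was used), the input log is assumed to cover a single day, and the numeric codes are simply assigned in order of first appearance.

```python
from collections import Counter


def filter_frequent(day_log, min_count=10):
    """D1-style filter (hypothetical helper).

    Keeps queries that are longer than 2 bytes and occur at least
    `min_count` times in one day's log. Byte length is measured in
    UTF-8, which is an assumption.
    """
    counts = Counter(day_log)
    return {
        query
        for query, count in counts.items()
        if len(query.encode("utf-8")) > 2 and count >= min_count
    }


def filter_by_label_count(labeled, max_labels=2):
    """D2-style filter (hypothetical helper).

    `labeled` maps each query to its set of categories. Queries with
    more than `max_labels` categories are dropped.
    """
    return {
        query: labels
        for query, labels in labeled.items()
        if len(labels) <= max_labels
    }


def encode_chars(queries):
    """D4-style encoding (hypothetical helper).

    Maps every distinct character to an integer code, assigned in
    order of first appearance, and returns the encoded queries
    together with the character-to-code table.
    """
    vocab = {}
    encoded = []
    for query in queries:
        # setdefault assigns the next free code only to unseen characters
        encoded.append([vocab.setdefault(ch, len(vocab)) for ch in query])
    return encoded, vocab


if __name__ == "__main__":
    day_log = ["abc"] * 10 + ["ab"] * 20 + ["xyz"] * 3
    print(filter_frequent(day_log))

    labeled = {
        "q1": {"VIDEO"},
        "q2": {"VIDEO", "GAME"},
        "q3": {"VIDEO", "GAME", "NOVEL"},
    }
    print(sorted(filter_by_label_count(labeled)))

    print(encode_chars(["abca", "cd"]))
```

Because the codes depend only on the order in which characters are first seen, the encoding step carries no knowledge of any particular language, which fits the language-independent goal stated in the task.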