Data Mining Concepts

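This article outlines several key concepts and terms in information technology, including use cases, stemming, stop words, broad query and exact match, double-blind experiments, TF-IDF, random walks, Monte Carlo methods, heuristics, and recall and precision.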

  1. Use case: a use case is a list of steps, typically defining interactions between a role (known in UML as an "actor") and a system, to achieve a goal. The actor can be a human or an external system.
  2. e.g.
  3. Stemming: In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form.
  4. Stop words: In computing, stop words are words which are filtered out prior to, or after, processing of natural language data (text).
  5. Broad query and exact match: in the results of a broad query, the keywords can appear in any order; an exact match requires them to appear in exactly the order given.
  6. Double-blind experiment: an experimental procedure in which neither the subjects of the experiment nor the persons administering the experiment know the critical aspects of the experiment; "a double-blind procedure is used to guard against both experimenter bias and placebo effects".
  7. tf-idf: term frequency–inverse document frequency, a numerical statistic that reflects how important a word is to a document in a collection or corpus. The number of times a term occurs in a document is called its term frequency. Tf–idf is the product of two statistics, term frequency and inverse document frequency:
    \mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d) \times \mathrm{idf}(t,D), \qquad \mathrm{idf}(t,D) = \log\frac{|D|}{1 + |\{d \in D : t \in d\}|}
    where tf(t,d) can be the raw frequency of term t in document d, |D| is the number of documents in the corpus, and |{d ∈ D : t ∈ d}| is the number of documents in which t appears (the added 1 avoids division by zero).
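
The tf-idf formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the toy corpus are assumptions made for the example, tf is taken to be the raw count, and idf uses the "+1" smoothing from the formula.

```python
import math
from collections import Counter

def tf(term, doc):
    """Raw term frequency: number of times `term` occurs in `doc` (a list of tokens)."""
    return Counter(doc)[term]

def idf(term, corpus):
    """Smoothed idf: log(|D| / (1 + number of documents containing `term`))."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + containing))

def tfidf(term, doc, corpus):
    """tf-idf as the product of the two statistics."""
    return tf(term, doc) * idf(term, corpus)

# Hypothetical toy corpus, already tokenized and lower-cased.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
    "a dog barked".split(),
]

print(tfidf("cat", corpus[0], corpus))  # "cat" appears in 2 of 4 documents
print(tfidf("the", corpus[0], corpus))  # "the" appears in 3 of 4 documents
```

With this smoothing, a term found in all but one document gets an idf of log(1) = 0, and a term found in every document gets a slightly negative idf, so very common words contribute little or nothing to the score.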

  8. Random walk: A random walk is a mathematical formalization of a path that consists of a succession of random steps.
  9. Monte Carlo methods: also known as statistical simulation methods; they rely on repeated random sampling, in contrast to deterministic methods.
  10. Heuristic: refers to experience-based techniques for problem solving, learning, and discovery. Examples of this method include using a rule of thumb, an educated guess, an intuitive judgment, or common sense. The most fundamental heuristic is trial and error.
  11. Recall and precision: precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved. For example, suppose a program for recognizing dogs in scenes identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications are correct, but 3 are actually cats, the program's precision is 4/7 while its recall is 4/9.
  12.  recall is 0.6; precision is 0.75
  13. 1 kilogram equals approximately 2.2 pounds.
  14. CMS: Content Management System.
  15. SERP: Search Engine Results Page, the page of results a search engine returns in response to a keyword query.
  16. Web graph: the directed graph formed by taking all World Wide Web pages as nodes and the hyperlinks between them as edges.
  17. Ad-hoc & A priori:
    • Ad-hoc: generally signifies a solution designed for a specific problem or task, non-generalizable, and not intended to be adapted to other purposes.
    • A priori: a priori knowledge or justification is independent of experience.
  18. Heuristic: refers to experience-based techniques for problem solving, learning, and discovery that give a solution which is not guaranteed to be optimal. Where the exhaustive search is impractical, heuristic methods are used to speed up the process of finding a satisfactory solution via mental shortcuts to ease the cognitive load of making a decision. Examples of this method include using a rule of thumb, an educated guess, an intuitive judgment, stereotyping, or common sense.
  19. True positive, etc:
  20. Positive = identified and negative = rejected.
    
    Therefore:
    True positive = correctly identified
    False positive = incorrectly identified
    True negative = correctly rejected
    False negative = incorrectly rejected
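  21. Power law: a power law relationship between two quantities x and y can be written as y = ax^k, where a and k are constants. For positive x and y, taking logarithms gives log y = log a + k log x, so a power law appears as a straight line with slope k on a log-log plot.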
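
Returning to the true/false positive and negative counts defined earlier in the list, precision and recall can be computed directly from them. The sketch below is only an illustration: the function names are assumptions, and the counts are taken from the dog-recognition example (7 identifications, 4 of them correct, 9 dogs in the scene).

```python
def precision(tp, fp):
    """Fraction of identified instances that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant instances that were identified: TP / (TP + FN)."""
    return tp / (tp + fn)

# Counts from the dog-recognition example.
tp = 4        # dogs correctly identified (true positives)
fp = 7 - tp   # cats incorrectly identified as dogs (false positives)
fn = 9 - tp   # dogs the program missed (false negatives)

print(precision(tp, fp))  # 4/7
print(recall(tp, fn))     # 4/9
```

True negatives do not enter either formula, which is why the number of cats the program correctly ignored never had to be stated in the example.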

  22. A priori and a posteriori: Latin for "from the earlier" and "from the later", respectively.
  23. s.t.: abbreviation for "such that" or "subject to".
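
The random walk and Monte Carlo entries in the list can be illustrated together. The following is a minimal sketch under stated assumptions: the walk is one-dimensional and moves +1 or -1 with equal probability, and the function names, step count, trial count and seed are all chosen for the example. It estimates the mean squared distance from the origin by averaging over many simulated walks; for this particular walk the exact value after n steps is n, so the estimate can be checked against it.

```python
import random

def random_walk(n_steps, rng):
    """Final position of a 1-D walk that moves +1 or -1 with equal probability each step."""
    position = 0
    for _ in range(n_steps):
        position += rng.choice((-1, 1))
    return position

def mean_squared_distance(n_steps, n_trials, seed=0):
    """Monte Carlo estimate of E[position^2] after n_steps, averaged over n_trials walks."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(random_walk(n_steps, rng) ** 2 for _ in range(n_trials))
    return total / n_trials

estimate = mean_squared_distance(n_steps=100, n_trials=10_000)
print(estimate)  # should be close to the exact value, 100
```

A deterministic method would derive the value n analytically; the Monte Carlo approach approximates it by sampling, and the error shrinks as the number of trials grows.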

Reposted from: https://www.cnblogs.com/wade-case/archive/2013/04/03/2998870.html
