Log-Likelihood Ratio


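At my previous company I used LLR to compute similar items. I'm job hunting at the moment, so this post is a quick review.
The core of the LLR method is analyzing event counts, in particular how often events occur together. The counts we need are usually:
1. The number of times both events occurred together (k_11)
2. The number of times one event occurred without the other (k_12, k_21)
3. The number of times neither event occurred (k_22)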

                 | Event A                 | Everything but A
Event B          | A and B together (k_11) | B, but not A (k_12)
Everything but B | A without B (k_21)      | Neither A nor B (k_22)
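To make the four counts concrete, here is a minimal sketch that tallies them from a list of shopping baskets. The basket data, the class name ContingencyCounts, and the method count are all made up for illustration; they are not from Mahout or from the original project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class ContingencyCounts {
    /**
     * Returns {k11, k12, k21, k22} for events "basket contains a" (event A)
     * and "basket contains b" (event B), following the table layout above.
     */
    public static long[] count(List<Set<String>> baskets, String a, String b) {
        long k11 = 0, k12 = 0, k21 = 0, k22 = 0;
        for (Set<String> basket : baskets) {
            boolean hasA = basket.contains(a);
            boolean hasB = basket.contains(b);
            if (hasA && hasB) {
                k11++;          // A and B together
            } else if (hasB) {
                k12++;          // B, but not A
            } else if (hasA) {
                k21++;          // A without B
            } else {
                k22++;          // neither A nor B
            }
        }
        return new long[] {k11, k12, k21, k22};
    }

    public static void main(String[] args) {
        // Hypothetical baskets, only for demonstration.
        List<Set<String>> baskets = List.of(
            Set.of("milk", "bread"),
            Set.of("milk", "bread", "eggs"),
            Set.of("milk"),
            Set.of("bread"),
            Set.of("eggs"));
        long[] k = count(baskets, "milk", "bread");
        System.out.println(Arrays.toString(k)); // prints [2, 1, 1, 1]
    }
}
```

Every basket lands in exactly one of the four cells, so the counts always add up to the total number of baskets.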

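Once we have these counts, computing the log-likelihood ratio score is straightforward.
LLR = 2 sum(k) (H(k) - H(rowSums(k)) - H(colSums(k)))
H denotes the Shannon entropy, written here as sum(p * log(p)) without the usual leading minus sign, which is why the score comes out non-negative. In R it can be computed as follows: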

# (k == 0) adds 1 inside the log for empty cells, so they contribute 0 * log(1) = 0
H = function(k) { N = sum(k); return(sum(k/N * log(k/N + (k == 0)))) }
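For readers who don't use R, here is a rough Java port of the same formula for a 2x2 table. It is only a sketch: the class name EntropyLlr and the method names h and llr are my own, and it follows the sum(p * log(p)) definition of H used in the R function.

```java
public class EntropyLlr {
    /** sum(p * ln p) over the cells, with p = k / N; empty cells contribute 0. */
    public static double h(double... k) {
        double n = 0.0;
        for (double x : k) {
            n += x;
        }
        double sum = 0.0;
        for (double x : k) {
            if (x > 0) {
                double p = x / n;
                sum += p * Math.log(p);
            }
        }
        return sum;
    }

    /** LLR = 2 * N * (H(k) - H(rowSums(k)) - H(colSums(k))) for a 2x2 table. */
    public static double llr(double k11, double k12, double k21, double k22) {
        double n = k11 + k12 + k21 + k22;
        double hMatrix = h(k11, k12, k21, k22);
        double hRows = h(k11 + k12, k21 + k22);
        double hCols = h(k11 + k21, k12 + k22);
        return 2.0 * n * (hMatrix - hRows - hCols);
    }

    public static void main(String[] args) {
        // Events that always occur together: 40 * ln(2), roughly 27.73.
        System.out.println(llr(10, 0, 0, 10));
        // Independent events: the score is close to 0.
        System.out.println(llr(10, 10, 10, 10));
    }
}
```

The term in parentheses is the mutual information between the row and column variables, so the score is 0 when the two events are independent and grows as the table departs from independence.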

Below is the code from Mahout:

/**
   * Calculates the Raw Log-likelihood ratio for two events, call them A and B.  Then we have:
   * <p/>
   * <table border="1" cellpadding="5" cellspacing="0">
   * <tbody><tr><td>&nbsp;</td><td>Event A</td><td>Everything but A</td></tr>
   * <tr><td>Event B</td><td>A and B together (k_11)</td><td>B, but not A (k_12)</td></tr>
   * <tr><td>Everything but B</td><td>A without B (k_21)</td><td>Neither A nor B (k_22)</td></tr></tbody>
   * </table>
   *
   * @param k11 The number of times the two events occurred together
   * @param k12 The number of times the second event occurred WITHOUT the first event
   * @param k21 The number of times the first event occurred WITHOUT the second event
   * @param k22 The number of times something else occurred (i.e. was neither of these events)
   * @return The raw log-likelihood ratio
   *
   * <p/>
   * Credit to http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html for the table and the descriptions.
   */
  public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    Preconditions.checkArgument(k11 >= 0 && k12 >= 0 && k21 >= 0 && k22 >= 0);
    // note that we have counts here, not probabilities, and that the entropy is not normalized.
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    if (rowEntropy + columnEntropy < matrixEntropy) {
      // round off error
      return 0.0;
    }
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
  }

  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  /**
   * Merely an optimization for the common two argument case of {@link #entropy(long...)}
   * @see #logLikelihoodRatio(long, long, long, long)
   */
  private static double entropy(long a, long b) {
    return xLogX(a + b) - xLogX(a) - xLogX(b);
  }

  /**
   * Merely an optimization for the common four argument case of {@link #entropy(long...)}
   * @see #logLikelihoodRatio(long, long, long, long)
   */
  private static double entropy(long a, long b, long c, long d) {
    return xLogX(a + b + c + d) - xLogX(a) - xLogX(b) - xLogX(c) - xLogX(d);
  }
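Here is a small usage sketch for similar-item ranking. It re-implements the same arithmetic as the Mahout methods above in a standalone class (without Guava), then sorts a few candidate items by score. The class name LlrRanking, the rank method, and all the counts are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LlrRanking {
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy of raw counts: xLogX(total) - sum of xLogX(count). */
    static double entropy(long... counts) {
        long total = 0;
        double parts = 0.0;
        for (long c : counts) {
            total += c;
            parts += xLogX(c);
        }
        return xLogX(total) - parts;
    }

    public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        if (k11 < 0 || k12 < 0 || k21 < 0 || k22 < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by round-off to 0.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    static double score(long[] k) {
        return logLikelihoodRatio(k[0], k[1], k[2], k[3]);
    }

    /** Candidate names sorted by descending LLR; each value is {k11, k12, k21, k22}. */
    public static List<String> rank(Map<String, long[]> candidates) {
        List<String> names = new ArrayList<>(candidates.keySet());
        names.sort(Comparator.comparingDouble((String name) -> -score(candidates.get(name))));
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical co-occurrence counts of three candidates against one target item.
        Map<String, long[]> candidates = new LinkedHashMap<>();
        candidates.put("itemY", new long[] {10, 10, 10, 10}); // independent of the target
        candidates.put("itemZ", new long[] {8, 2, 2, 8});     // moderately associated
        candidates.put("itemX", new long[] {10, 0, 0, 10});   // always bought together
        System.out.println(rank(candidates)); // prints [itemX, itemZ, itemY]
    }
}
```

One thing to keep in mind: LLR measures how far the table is from independence, not the direction. A table like (0, 10, 10, 0), where the two items never appear together, gets the same score as (10, 0, 0, 10), so in practice it is worth checking that k_11 is larger than expected before treating a high score as similarity.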

Reference:
http://tdunning.blogspot.hk/2008/03/surprise-and-coincidence.html
