The Definitive Guide to Designing Product Metrics


By Zachary Thomas (zthomas.nc@gmail.com, Twitter, LinkedIn)

(This article was originally written as a Quip doc)

Background

If you have a technology product, you’ll want metrics for it too. That’s the advice given by Reddit’s co-founders in this video: you’ll want to track something and its trends to avoid data debt when starting a product. Product and engineering teams at technology companies now view this as conventional wisdom. A product’s success and quality are only as good as the metrics that verify them. You can’t improve it if you don’t measure it.

Reading the news and watching movies like The Social Dilemma, there seems to be a broad desire by the technology industry and society to move past merely using engagement metrics as measures of success. That said, I’ve encountered surprisingly few approachable online resources that discuss how to design a good metric. That’s pretty surprising, since this is one of the core responsibilities of data science teams! This guide is meant to be a step forward in filling that gap.

Two buckets of metrics: Precision and Recall

Before we design new metrics, we should understand what existing metrics already measure. Personally, I have found it helpful to bucket metrics into two broad categories: precision and recall.

Analysts can classify existing metrics into these two buckets and find measurement gaps for new metrics to address. Or, this framework can help contextualize proposed new metrics among existing ones. The first recall metric might be more impactful than the tenth precision metric and vice versa.

Precision Metrics 🎯

Precision metrics measure usage and feedback on the current iteration of the product. These metrics are usually derived from product logging. Teams use them to measure growth and optimize features. In fact, I would say the vast majority of metrics that analysts work with and design are precision-like in nature.

Examples include:

  • DAU, MAU and other usage metrics: Understanding the total usage and engagement of products and their features is the core deliverable of product analytics teams

  • Read more: See the non-revenue metrics in the “Advertising” section of Y Combinator’s Key Metrics guide

  • CSAT and other product-focused survey metrics: CSAT-like metrics (rating features or products on a scale of 1 to 5, asking why via free-form responses) focus on collecting feedback on the current state of the product

  • Read more: This GetFeedback article outlines two ways to calculate CSAT and different industry benchmarks

  • Latency metrics: Latency metrics measure product load times and infrastructure performance

  • Read more: Latency metrics are not usually handled by product analyst teams, but very mature products may want to understand the relationship between latency and product growth and satisfaction. This article by Treynor Sloss, Nukala and Rau on how Google thinks about infrastructure metrics may offer some ideas in this area
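
To make the precision bucket concrete, here is a minimal sketch of how DAU, MAU, and a percentage-style CSAT could be computed from raw inputs. The event log and survey ratings below are invented for illustration, not real product data:

```python
from datetime import date, timedelta

# Hypothetical event log of (user_id, activity_date) pairs from product logging
events = [
    ("u1", date(2021, 3, 1)), ("u2", date(2021, 3, 1)),
    ("u1", date(2021, 3, 2)), ("u3", date(2021, 3, 15)),
]

def dau(events, day):
    """Distinct users active on a given day."""
    return len({user for user, d in events if d == day})

def mau(events, day, window=30):
    """Distinct users active over the trailing `window` days."""
    start = day - timedelta(days=window - 1)
    return len({user for user, d in events if start <= d <= day})

def csat(ratings):
    """Percentage-style CSAT: share of responses scoring 4 or 5 on a 1-5 scale."""
    return 100.0 * sum(1 for r in ratings if r >= 4) / len(ratings)

print(dau(events, date(2021, 3, 1)))    # 2 distinct users on March 1
print(mau(events, date(2021, 3, 30)))   # 3 distinct users in the trailing 30 days
print(csat([5, 4, 3, 5, 2, 4, 1, 5]))   # 62.5 (5 of 8 responses are a 4 or 5)
```

Note that the GetFeedback article linked above describes more than one CSAT formulation; this sketch uses the common top-two-box percentage.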

Recall Metrics 📚

Recall metrics track product performance against a ground truth. If precision metrics measure growth and optimize existing features, recall metrics help measure product quality and drive new feature development. Unlike precision metrics, recall metrics may need more than product logging to measure. User surveys or data labeling by teams of humans may serve as the data inputs to recall metrics. Sometimes, User Experience Research teams will own survey-based recall metrics instead of data science teams which tend to be more focused on logging-based metrics.

Examples include:

  • Net Promoter Score (NPS): Arguably, the most famous recall metric is NPS, as customer loyalty for a product is a function of possible alternatives

  • Read more: This Qualtrics overview includes tips on NPS follow-up questions in addition to the metric’s calculation

  • Recall for recommender or search systems: Recall metrics measure if the end user’s intent was actually fulfilled by a recommendation or search system’s results

  • Read more: As slide 21 of this Intro to ML lecture shows, you can use actual user likes/clicks as the ground truth for recall grading

  • Note: This approach may inflate recall scores since it excludes potential likes that weren’t even available as options on the product. For example, a recall metric should ideally penalize an e-commerce product’s recommendations if a user searches for a product and doesn’t see it listed at all 🤯. Teams can produce high-coverage recall metrics if humans manually score samples of recommender system output based on user intent. Analyst input or guidance can help reduce bias in these manual labeling processes

  • Competitive analysis metrics: For products with strong competitors, analysts can design product preference, quality, or task completion metrics to compare products over time

  • Read more: This blog post describes how Loup Ventures asked Siri, Google Assistant, and Alexa the same 800 queries and graded each based on response correctness and query understanding. They repeated the exercise three months later to see how much each product had improved on the same set of queries
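
The NPS and recommender-recall ideas above can be sketched in a few lines. The survey scores and click data are hypothetical, and as the note above warns, click-based ground truth can inflate recall:

```python
def nps(scores):
    """NPS on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

def recall_at_k(recommended, relevant, k):
    """Share of ground-truth relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)

# Hypothetical inputs: survey scores, plus likes/clicks as recommender ground truth
print(nps([10, 9, 8, 7, 6, 10, 3, 9]))  # (4 promoters - 2 detractors) / 8 -> 25.0
print(recall_at_k(["a", "x", "b", "y", "z"], ["a", "b", "c"], k=5))  # 2 of 3 found
```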

The Metric Lifecycle

Now that we’ve established what kind of metrics exist, let’s dive into the process of creating, reporting, and possibly sunsetting a metric.

Where do new metrics come from? 👶

Broadly, shifts in product direction and market, product, or customer maturity can drive the need for new metrics. More specifically, new metrics can come from:

  • Establishing the user funnel or evolving the existing one: New products will derive their initial set of metrics from their user acquisition and engagement funnels

  • Read more: The best tutorial I’ve seen for designing a user engagement funnel and its associated metrics is Lesson 3 from Udacity’s course on A/B testing

  • Current metrics don’t respond to feature launches: New features may stop showing statistical and practical improvements on a product’s funnel metrics. This might happen when a product saturates a market, or when the product becomes sufficiently complex. Analyst teams in this situation may need to design new, more actionable metrics

  • Read more: In the Causal Proximity section of Designing and Evaluating Metrics, Sean Taylor describes how product and engineering teams should be able to impact the drivers of a metric through feature launches. Metrics lacking this property aren’t actionable

  • User complaints or notable losses: Repeated negative user or customer feedback as well as press and market analyst commentary may drive the creation of quality-focused metrics

  • Read more: As described in this New America case study (see text around footnotes 82–89), starting around 2016 Youtube began measuring video satisfaction via user surveys to better optimize Youtube’s recommendations around user happiness and satisfaction instead of watch time

  • Directives from Leadership + Annual Planning: On a more practical note, analyst teams often invest in new metrics during annual planning to align metrics with the product or company strategy for the following year
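
As a toy illustration of funnel-derived metrics, step-to-step conversion rates can be read straight off the funnel counts (all numbers below are invented):

```python
# Hypothetical acquisition/engagement funnel counts from product logging
funnel = [
    ("visited", 10000),
    ("signed_up", 2500),
    ("activated", 1500),
    ("retained_30d", 600),
]

# Step-to-step conversion rates are the usual first metrics read off a funnel
for (prev_step, prev_n), (step, n) in zip(funnel, funnel[1:]):
    print(f"{prev_step} -> {step}: {100.0 * n / prev_n:.1f}%")
```

A drop in any one of these step conversions is usually more actionable than a change in the top-line count alone.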

Proposing a new metric: Interview Frameworks and Goodhart’s Law 📝

Once analyst teams establish the need for a new metric, analysts begin metric design work. Data science teams expect these analysts to reason through the potential second-order effects of using a proposed metric in experiments and performance tracking.

Data scientist interviews often ask case study questions around this reasoning process. In my experience, the interviewer will either ask the candidate to design a metric or will propose a metric and ask the candidate to evaluate it. Here is a framework to approach these questions:

  • 🤔 Ask clarifying questions to understand data inputs: Make sure you and the interviewer align on what user actions or other data inputs could inform the metric. For example, on a social media product, scrolling, likes, link sharing, status posting, messages sent, etc. could all inform a success metric focused on engagement

  • Tip: Since data inputs will be product specific, I would recommend studying the product your interviewer will ask about and creating a cheatsheet of the metrics you could envision teams at the company using

  • For example, annotating “What are the most important ride sharing metrics for a company like Lyft or Uber?” and its related questions could help familiarize yourself with ride-sharing metrics and their inputs

  • Actually trying out a product and understanding its mechanics and features also helps — a surprising number of candidates don’t do this!

  • 🤝 Align on what behaviors or properties the metric should measure: Repeat the question back to the interviewer and ask about edge cases. For example, if an interviewer at an e-commerce platform asks you to define a metric classifying successful customer accounts, it may be worth asking whether the team expects this metric to classify an account with high transaction volume but low and declining NPS as successful

  • 🧑‍🎓 Propose the metric: My advice is to err towards simplicity and let follow-on questions help decide if you need to make your metric more complex

  • Tip: For a top-line metric, interviewers may follow up on what time granularity your metric should use. For example, active users on a daily, weekly, or monthly time scale?

  • ⬇️ Discuss what behaviors the metric will discourage: What happens if users only end up doing what the metric measures and stop other behaviors on the product? Answering this hypothetical question teases out undesired second-order effects of a metric

  • Example: You propose scrolling as the success metric for a feed-based social media product. If users ended up only scrolling on the product, this would zero out usage on actions that reduce scrolling like posting, commenting, and clicking on links. Is the team alright with incentivizing this outcome?

  • 🚧 Discuss how Product teams could artificially increase (“hack”) the metric: Goodhart’s Law states that when people know their performance is based on a metric, they adjust their behavior to optimize that metric. In the context of technology products, product managers, designers and engineers may start altering the product in order to increase a success metric despite negative trade-offs

  • Example: You propose scrolling as the success metric for a feed-based social media product. The metric incentivizes designers to make content blocks long or default text very large to force users to scroll more. The metric also incentivizes product managers and engineers to create features and algorithms that help launch content farms and propagate their content so that users have more items to scroll through. Together these changes decrease the quality of the product but increase the amount of scrolling users do

  • 🏆🙅‍♂️ Discuss what relevant properties aren’t measured by the metric: Unfortunately, simple and clear metrics will not measure all relevant properties or user behaviors at once. You should outline to your interviewer which relevant properties your metric does not measure. You should still be able to argue that your proposed metric correlates with those relevant properties, or explain whether a new metric is needed for them

  • Example: If an interviewer asked you to choose a single success metric for a payments platform, you could argue that “transactions completed” works best and is correlated with other important metrics like CSAT and total payments processed, even though transactions completed does not directly measure those properties

This framework encompasses what I have experienced in metric design interviews. In the following sections, I discuss more about empirically validating a proposed metric when on an analyst team and obtaining stakeholder buy-in.

Validating a new metric with Experiments and Analysis 📈

After proposing a metric, the next step is to complete data and experimental analyses demonstrating that the metric behaves as expected and is actionable. There are a couple of steps needed to validate a metric:

  • (If relevant) Show the distribution of the metric at different threshold values: If a metric is threshold based, show how different possible values of the threshold impact the metric’s distribution

  • Example: What percent of users would a churn metric classify as churned if we set churn’s threshold at 7, 14, 21, or 28 days? Actually show the distribution at each value as part of your explanation for choosing a particular value

  • Correlation analysis with relevant existing metrics: Demonstrating correlations is especially useful for new metrics that are quality-focused or refinements of existing metrics

  • Example: On a ride-sharing app, I would expect user satisfaction to decrease or level off as the number of stops added during a ride increases. If the new satisfaction metric instead increases as the stops increase, then that might suggest a logging issue or warrant a separate investigation into understanding this counterintuitive user behavior

  • Precision/Recall against ground truth: Analysts can make metrics that describe certain actions as “good”, “bad”, or “quality” more meaningful if they use surveys or user studies as ground truth to validate those labels

  • Example: Based on a data analysis of logs, a product analyst on a business application might propose defining a “good workflow completion” as one with 3 or fewer clicks. To convincingly make the case that 3 or fewer clicks is “good”, the analyst could also collect user survey responses on satisfaction for different workflow lengths and measure the proposed metric’s precision and recall relative to the survey responses

  • Sensitivity analysis through experiments: If new features that Product teams believe will move a metric consistently do not change the metric, then the metric may not be actionable as designed. The “Validation” and “Experiment” bullet points of the Lifecycle of a Metric section of Designing and Evaluating Metrics describe how analysts can use saved historical experiment data to show whether a new metric has a practical and statistically significant effect that an experiment can measure
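
The first three validation steps can be sketched with toy data; every number below is invented for illustration, and real validation would use production logs and survey responses:

```python
from statistics import mean

# Step 1: show how the churn threshold choice changes the metric's distribution
days_inactive = [1, 3, 5, 8, 10, 14, 16, 21, 25, 30, 40, 60]  # per user, invented
for threshold in (7, 14, 21, 28):
    churned = sum(1 for d in days_inactive if d >= threshold)
    print(f"{threshold:>2}-day threshold -> "
          f"{100.0 * churned / len(days_inactive):.0f}% churned")

# Step 2: correlation of the proposed metric with a relevant existing metric
def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

stops_added = [0, 1, 2, 3, 4]             # stops added during a ride
satisfaction = [4.8, 4.5, 4.1, 3.9, 3.8]  # average rating per bucket, invented
print(round(pearson(stops_added, satisfaction), 2))  # strongly negative, as expected

# Step 3: precision/recall of a "good workflow = 3 or fewer clicks" label
# against (invented) survey ground truth, where 1 = user reported satisfaction
clicks = [1, 2, 3, 4, 5, 2, 6, 3]
satisfied = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1 if c <= 3 else 0 for c in clicks]
tp = sum(p * s for p, s in zip(predicted, satisfied))
print(tp / sum(predicted), tp / sum(satisfied))  # precision 0.8, recall 1.0
```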

Getting stakeholder approval on a new metric 🤹

At some point in the validation process, the data science team will need to present the new metric and its behavior to engineering and product teams for feedback and buy-in on using it as a product success metric.

Metric validation is a form of data analysis, so I would keep in mind Roger Peng’s advice on the topic: A data analysis is successful if the audience to which it is presented accepts the results. Make sure the Product and Engineering teams’ gut checks pass with this proposed metric and the analysis associated with it.

Monitoring and reporting a metric 🗓️📊

After stakeholder approval, your metric should be ready for logging! Communicating, visualizing, and telling stories with data is a topic that could fill a whole book. That said, here are some tips which might be helpful:

Metric Formatting and Communication

  • Use a 7-day rolling average to account for the weekday/weekend variation that affects most products’ daily metrics

  • Teams often recommend reporting metrics based on surveys or sampled logs with 95% confidence intervals

  • Align with your team on how to communicate changes in percentage-based metrics — it always gets confusing, quickly
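
A small sketch of the first two tips, using a hypothetical DAU series and a normal-approximation confidence interval:

```python
from statistics import mean, stdev

def rolling_mean(values, window=7):
    """Trailing window average; None until a full window is available."""
    return [None if i + 1 < window else sum(values[i + 1 - window : i + 1]) / window
            for i in range(len(values))]

def mean_ci95(sample):
    """Approximate 95% confidence interval for a sampled metric's mean."""
    half = 1.96 * stdev(sample) / len(sample) ** 0.5
    return mean(sample) - half, mean(sample) + half

# Hypothetical DAU series with weekend dips on every 6th and 7th day
dau = [100, 102, 104, 103, 105, 70, 72, 101, 103, 106, 104, 107, 71, 73]
print(rolling_mean(dau)[-1])  # 95.0: the weekday/weekend swings largely cancel

# Communicating percentage-based metrics: a move from 10% to 12% is
# +2 percentage points but a 20% relative increase; agree on the phrasing.
```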

Dashboards

  • Both engineers and analysts can easily launch dashboards, so it can become difficult for stakeholders to know which ones are best suited to answer their questions. At large technology companies, data science teams will create their own “trusted” set of dashboards on a specific website/application that Product Leadership views as the source of truth

  • It’s simple advice, but I like what Eric Mayefsky said in this blog post — actually look at the dashboards you create! Use them to inspire deeper data investigations. “Soak in the data. Don’t think of the dashboards and reports you build as products for someone else — spend time on a regular basis, ideally daily, just messing around”

Quarterly/Scheduled Metric Reviews

  • Besides experiment reporting, scheduled metric reviews and trend analysis with Engineering and Product leadership can drive metric impact at larger organizations

  • Common questions to be answered include attribution of metric shifts to new features/customers or external effects, comparison of metric values to forecasts, deep dives into why metrics are trending downwards or missing goals, cohort analyses, etc.

Sunsetting a metric 🌅

In my experience, analysts rarely deprecate metrics. Instead, data engineering teams and code stewardship drive metric deprecation.

Data teams try to optimize the time it takes to calculate core metrics, often on a daily basis. The more metrics their daily jobs need to compute, the longer those jobs take to run and the greater the probability of a job failing. These engineering teams have an incentive to stop calculating metrics that analysts and product teams neither monitor nor find useful.

Unimportant or unadopted metrics may lack owners after their creators leave a team, and their logging may go unclaimed after a long period without updates. Data engineering teams have leeway to deprecate these ownerless metrics.

Summing it up: the metric design checklist 📋

So there you have it! We’ve discussed…

  • 🎯📚 Two buckets of metrics: precision and recall
  • 👶 Why and where new metrics come from
  • 📝 How to think through a metric’s 2nd order effects (and how that question appears in interviews)
  • 📈 Validating a metric with correlation and sensitivity analysis
  • 🗓️📊 Tips for metric monitoring and reporting
  • 🌅 Why and how metric sunsetting occurs

As always, I consider this to be a living document and I am open to feedback — feel free to shoot me a note at zthomas.nc@gmail or on LinkedIn if you have any thoughts! Thanks.

Translated from: https://towardsdatascience.com/the-definitive-guide-to-designing-product-metrics-ba5d9e8e07e9
