Apache NiFi Flow Fingerprint Security Vulnerability



In this post, I will discuss a security vulnerability discovered in Apache NiFi (CVE-2020-1942): flow fingerprints containing sensitive property descriptor values appeared in logs. During a troubleshooting session with a NiFi user who had implemented custom Apache NiFi processors, Andy LoPresto, a NiFi PMC member and committer, discovered that sensitive values were output to logs when a processor failed to sync up with a NiFi cluster.


OK, there were a lot of terms and concepts you just read. If you didn’t quite get it all — don’t fret! Let’s break it down and get into detail to understand what is happening.


First, what's a cluster?

Before diving into the vulnerability, I'd like to do a quick overview of Apache NiFi clusters. Depending on your dataset, and in most use cases, a single NiFi instance may not be powerful enough to process high volumes of data. Clustering gives NiFi Administrators or DataFlow Managers (DFMs) the ability to run multiple instances on different servers and, through a single interface, make changes and monitor the dataflow.



NiFi clustering employs a Zero-Master paradigm. Each node in the cluster performs the same tasks on the data, but each operates on a different set of data.


Cluster Coordinator

Under this paradigm, one node is elected Cluster Coordinator (using Apache ZooKeeper) and is responsible for three main tasks:


1. Decide which nodes are allowed to join the cluster.


2. Synchronize cluster nodes with current flows.


3. Disconnect nodes that do not have a heartbeat status after a certain amount of time.


Therefore, when the DFM makes a change from any NiFi node, it is replicated throughout the cluster.
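Task 3 above can be sketched as a simple timeout check. Here is a minimal illustration in Python, assuming a hypothetical 40-second timeout and function names of my own choosing; this is not NiFi's actual code:

```python
# Illustrative sketch (not NiFi's implementation) of heartbeat-based
# disconnection: drop nodes whose last heartbeat is older than a timeout.
HEARTBEAT_TIMEOUT_SECONDS = 40.0  # assumed value; configurable in practice

def nodes_to_disconnect(last_heartbeat, now):
    """Return node IDs whose heartbeat has lapsed beyond the timeout."""
    return sorted(node for node, ts in last_heartbeat.items()
                  if now - ts > HEARTBEAT_TIMEOUT_SECONDS)

# A node silent for 60s is disconnected; one silent for only 10s is kept.
print(nodes_to_disconnect({"node-1": 940.0, "node-2": 990.0}, 1000.0))  # ['node-1']
```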


Examining Flow Fingerprints

Now that we know what a cluster is, let’s focus on one of the responsibilities of the Cluster Coordinator — determining if a node is allowed to join the cluster. When a node is added to the cluster, the Cluster Coordinator will first look at that node’s flow.xml.gz — where a flow fingerprint can be derived using attributes related to data processing.


The flow fingerprint can contain properties such as processor IDs, processor relationships, and processor properties. One set of properties that the Cluster Coordinator checks is the processor flow configurations. If the flow configurations are empty, this indicates a new node, which will be allowed to join the cluster and inherit the current flow configurations. On the other hand, if flow configurations are present and they do not match the configurations of the rest of the nodes, that node will not be allowed to join the cluster.
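The join decision just described can be sketched as follows; `can_inherit` and the fingerprint strings are illustrative names, not NiFi's actual API:

```python
def can_inherit(local_fingerprint: str, cluster_fingerprint: str) -> bool:
    """Sketch of the Cluster Coordinator's flow-inheritance check."""
    # An empty local flow marks a new node: it joins and inherits
    # the cluster's current flow configuration.
    if not local_fingerprint:
        return True
    # Otherwise the local flow must match the cluster flow exactly.
    return local_fingerprint == cluster_fingerprint

print(can_inherit("", "cluster-fp"))          # True: new node inherits
print(can_inherit("local-fp", "cluster-fp"))  # False: node rejected
```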


For example, let’s say we have a GetFTP processor in a cluster. Below is the flow.xml and (some of) its properties:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<flowController encoding-version="1.4">
  <maxTimerDrivenThreadCount>10</maxTimerDrivenThreadCount>
  <maxEventDrivenThreadCount>1</maxEventDrivenThreadCount>
  <registries/>
  <parameterContexts/>
  <rootGroup>
    <id>adb378b3-0170-1000-426f-ff54a5486f97</id>
    <name>NiFi Flow</name>
    <position x="0.0" y="0.0"/>
    <comment/>
    <processor>
      <id>add68dbc-0170-1000-ffff-ffff9e996a54</id>
      <name>GetFTP</name>
      <position x="464.0" y="104.0"/>
      <styles/>
      <comment/>
      <class>org.apache.nifi.processors.standard.GetFTP</class>
      <bundle>
        <group>org.apache.nifi</group>
        <artifact>nifi-standard-nar</artifact>
        <version>1.10.0</version>
      </bundle>
      <maxConcurrentTasks>1</maxConcurrentTasks>
      <schedulingPeriod>0 sec</schedulingPeriod>
      <penalizationPeriod>30 sec</penalizationPeriod>
      <yieldPeriod>1 sec</yieldPeriod>
      <bulletinLevel>WARN</bulletinLevel>
      <lossTolerant>false</lossTolerant>
      <scheduledState>STOPPED</scheduledState>
      <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
      <executionNode>ALL</executionNode>
      <runDurationNanos>0</runDurationNanos>
      <property>
        <name>Hostname</name>
        <value>myHost.com</value>
      </property>
      <property>
        <name>Port</name>
        <value>21</value>
      </property>
      <property>
        <name>Username</name>
        <value>myUsername</value>
      </property>
      <property>
        <name>Password</name>
        <value>myPassword</value>
      </property>
      <property>
        <name>Connection Mode</name>
        <value>Passive</value>
      </property>
      <property>
        <name>Transfer Mode</name>
        <value>Binary</value>
      </property>
      ...
      <autoTerminatedRelationship>success</autoTerminatedRelationship>
    </processor>
  </rootGroup>
  <controllerServices/>
  <reportingTasks/>
</flowController>

In a real NiFi instance, the sensitive values like the password are always stored in an encrypted format — the decrypted example value is shown here for clarity. Next, we manually make local changes to the properties, such as modifying the password.


When this node spins back up, the Cluster Coordinator will examine the flow fingerprints, determine it does not match with the other nodes and, therefore, not allow that node to join the cluster. Now, we are left with a homeless node!


Discovering the vulnerability

This was the scenario during a recent troubleshooting session with a NiFi user who implemented custom processors. Local changes were made to their processors while NiFi was offline. After restarting NiFi, the node was unable to join the cluster.


To help pinpoint the error, NiFi PMC member Andy LoPresto took to the logs with the level set to ‘TRACE’. Upon further inspection, he discovered that when a node failed to join the cluster, the flow fingerprints were printed along with their property names and values.


2020-01-16 14:43:00,458 TRACE [main] o.a.n.c.StandardFlowSynchronizer Exporting snippets from controller
2020-01-16 14:43:00,458 TRACE [main] o.a.n.c.StandardFlowSynchronizer Getting Authorizer fingerprint from controller
2020-01-16 14:43:00,459 TRACE [main] o.a.n.c.StandardFlowSynchronizer Checking flow inheritability
2020-01-16 14:43:00,474 TRACE [main] o.a.n.c.StandardFlowSynchronizer Local Fingerprint Before Hash = NO_VALUENO_PARAMETER_CONTEXTSadb378b3-0170-1000-426f-ff54a5486f97NO_VALUENO_VALUENO_VERSION_CONTROL_INFORMATIONadd68dbc-0170-1000-ffff-ffff9e996a54NO_VALUEorg.apache.nifi.processors.standard.GetFTPNO_VALUEorg.apache.nifinifi-standard-nar1.10.010 sec30 sec1 secWARNfalseTIMER_DRIVENALL0Hostname=myHost.comPassword=myModifiedPasswordUsername=myUsernamesuccess
2020-01-16 14:43:00,474 TRACE [main] o.a.n.c.StandardFlowSynchronizer Proposed Fingerprint Before Hash = NO_VALUENO_PARAMETER_CONTEXTSadb378b3-0170-1000-426f-ff54a5486f97NO_VALUENO_VALUENO_VERSION_CONTROL_INFORMATIONadd68dbc-0170-1000-ffff-ffff9e996a54NO_VALUEorg.apache.nifi.processors.standard.GetFTPNO_VALUEorg.apache.nifinifi-standard-nar1.10.010 sec30 sec1 secWARNfalseTIMER_DRIVENALL0Hostname=myHost.comPassword=myPasswordUsername=myUsernamesuccess
2020-01-16 14:43:00,477 ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow from cluster due to: org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1026)
at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:539)
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1028)
at org.apache.nifi.NiFi.<init>(NiFi.java:158)
at org.apache.nifi.NiFi.<init>(NiFi.java:72)
at org.apache.nifi.NiFi.main(NiFi.java:301)
Caused by: org.apache.nifi.controller.UninheritableFlowException: Proposed configuration is not inheritable by the flow controller because of flow differences: Found difference in Flows:
Local Fingerprint: he.nifinifi-standard-nar1.10.010 sec30 sec1 secWARNfalseTIMER_DRIVENALL0Hostname=myHost.comPassword=myModifiedPasswordUsername=myUsernamesuccess
Cluster Fingerprint: he.nifinifi-standard-nar1.10.010 sec30 sec1 secWARNfalseTIMER_DRIVENALL0Hostname=myHost.comPassword=myPasswordUsername=myUsernamesuccess
at org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:315)
at org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1368)
at org.apache.nifi.persistence.StandardXMLFlowConfigurationDAO.load(StandardXMLFlowConfigurationDAO.java:88)
at org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:812)
at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1001)
… 5 common frames omitted

At the level where these properties are printed, plaintext values that are potentially sensitive are not yet encrypted. As a result, the logs output both sensitive and non-sensitive data.


Now what?

Of course we never want to expose sensitive data. It’s also a direct violation of the OWASP Top 10 most critical security risks, ranked third on the list under A3: Sensitive Data Exposure.


The OWASP Top 10 outline points out that sensitive data must always be encrypted at rest and in transit, taking care not to use weak or outdated cryptographic algorithms.


So, we know not to expose sensitive data and there are a number of ways to prevent this. A simple solution that generally comes to mind is to disable printing the values. But in order to better narrow down cluster errors, comparing any discrepancy in flow fingerprints is key.
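To illustrate why the logged fingerprints are useful for debugging, a small helper can locate where two fingerprint strings first diverge. This is hypothetical code, not part of NiFi:

```python
def first_difference(a: str, b: str) -> int:
    """Return the index where two fingerprint strings first differ,
    or -1 if they are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string may be a prefix of the other.
    return -1 if len(a) == len(b) else min(len(a), len(b))

# The mismatch starts right after "Password=my".
print(first_difference("Password=myPassword", "Password=myModifiedPassword"))  # 11
```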


Argon2

To keep this capability while keeping our sensitive data safe, a hashing algorithm came into play. In line with OWASP’s advice to use strong cryptographic algorithms, the Argon2 hashing algorithm — winner of the July 2015 Password Hashing Competition — was introduced to the modules.


A partial solution

Similar to other commonly used hashing algorithms, such as Scrypt and Bcrypt, Argon2 concatenates a random salt with a given input and outputs a hashed value. For password hashing, this is desirable as it thwarts accessing plaintext values.


This is not fully effective in our use case. Hashing sensitive property values solves the issue of protecting sensitive data, but a random salt produces a different hashed value each time, so we cannot determine whether one value equals another.
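To see the problem concretely, here is a minimal sketch using `hashlib.scrypt` as a stand-in for Argon2 (which is not in the Python standard library); function names and parameters are illustrative:

```python
import hashlib
import os

def hash_with_random_salt(value: str) -> bytes:
    """Hash a value with a fresh random salt, as password hashing normally does."""
    salt = os.urandom(16)  # new random salt on every call
    return hashlib.scrypt(value.encode(), salt=salt, n=2**14, r=8, p=1)

# The same password hashes differently each time, so two otherwise
# identical fingerprints would never compare equal.
print(hash_with_random_salt("myPassword") == hash_with_random_salt("myPassword"))  # False
```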


The complete solution

A static salt can be swapped in. Therefore, when two identical values are hashed, they will also produce matching hashed values. The FingerprintFactory class is where the flow fingerprint is built. The class contains a method that determines whether the processor property value is encrypted — i.e., a sensitive value. If the value is marked encrypted, it will use Argon2 and a static salt to return a hashed value to be added to the flow fingerprint.
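A minimal sketch of the deterministic approach, again with `hashlib.scrypt` standing in for Argon2 and an illustrative salt value; NiFi's actual implementation lives in its `FingerprintFactory`:

```python
import hashlib

STATIC_SALT = b"nifi-static-salt"  # illustrative; fixed across all nodes and values

def fingerprint_sensitive_value(value: str) -> str:
    """Hash a sensitive property value with a static salt for use in a fingerprint."""
    digest = hashlib.scrypt(value.encode(), salt=STATIC_SALT, n=2**14, r=8, p=1)
    return digest.hex()

# Equal values now hash identically, so fingerprints still compare...
print(fingerprint_sensitive_value("myPassword") == fingerprint_sensitive_value("myPassword"))  # True
# ...while the plaintext never appears in the fingerprint or the logs.
print(fingerprint_sensitive_value("myPassword") == fingerprint_sensitive_value("myModifiedPassword"))  # False
```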


Conclusion

We had a brief introduction to NiFi clusters and the Cluster Coordinator’s responsibilities. When the Cluster Coordinator determines a node is unable to join a cluster, the flow fingerprints are compared for discrepancies. Viewing the flow fingerprints in logs at the ‘TRACE’ level revealed a security vulnerability: processor property values, potentially containing sensitive values, were printed in plaintext.


The implementation of Argon2 secure hasher, in combination with a static salt, allows for deterministic logging of these values.


Originally published at https://medium.com/apache-nifi-security/apache-nifi-flow-fingerprint-security-vulnerability-f105a5a5b0f6
