How-to: Enable User Authentication and Authorization in Apache HBase

With the default Apache HBase configuration, everyone is allowed to read from and write to all tables available in the system. For many enterprise setups, this kind of policy is unacceptable. 

Administrators can set up firewalls that decide which machines are allowed to communicate with HBase. However, machines that can pass the firewall are still allowed to read from and write to all tables.  This kind of mechanism is effective but insufficient because HBase still cannot differentiate between multiple users that use the same client machines, and there is still no granularity with regard to HBase table, column family, or column qualifier access.

In this post, we will discuss how Kerberos is used with Hadoop and HBase to provide User Authentication, and how HBase implements User Authorization to grant users permissions for particular actions on a specified set of data.

Secure HBase: Authentication & Authorization

A secure HBase aims to protect against sniffers, unauthenticated/unauthorized users and network-based attacks. It does not protect against authorized users who accidentally delete all the data. 

HBase can be configured to provide User Authentication, which ensures that only authenticated users can communicate with HBase. The authentication system is implemented at the RPC level, and is based on the Simple Authentication and Security Layer (SASL), which supports (among other authentication mechanisms) Kerberos. SASL allows authentication, encryption negotiation and/or message integrity verification on a per-connection basis (see the “hbase.rpc.protection” configuration property).
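To make the per-connection protection levels concrete, here is a minimal Python sketch of how the values usually accepted by “hbase.rpc.protection” line up with the standard SASL quality-of-protection tokens. The helper name sasl_qop is hypothetical, and the set of accepted values should be checked against the HBase version in use.

```python
# Illustrative mapping only; the helper name sasl_qop is hypothetical.
# Keys are hbase.rpc.protection values, values are SASL QoP tokens.
RPC_PROTECTION_TO_SASL_QOP = {
    "authentication": "auth",    # authenticate the peer only
    "integrity": "auth-int",     # authentication plus message integrity checks
    "privacy": "auth-conf",      # authentication, integrity and encryption
}


def sasl_qop(rpc_protection: str) -> str:
    """Return the SASL QoP token for an hbase.rpc.protection value."""
    try:
        return RPC_PROTECTION_TO_SASL_QOP[rpc_protection.lower()]
    except KeyError:
        raise ValueError(f"unknown hbase.rpc.protection value: {rpc_protection!r}")
```

Each level includes the guarantees of the previous one, so “privacy” is the setting that protects against sniffers.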

The next step after enabling User Authentication is to give an admin the ability to define a series of User Authorization rules that allow or deny particular actions. The Authorization system, also known as the Access Controller Coprocessor or Access Control List (ACL), is available from HBase 0.92 (CDH4) onward and gives the ability to define authorization policy (Read/Write/Create/Admin), with table/family/qualifier granularity, for a specified user.

Kerberos

Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography. The Kerberos protocol uses strong cryptography (AES, 3DES, …) so that a client can prove its identity to a server (and vice versa) across an insecure network connection. After a client and server have used Kerberos to prove their identities, they can also encrypt all of their communications to assure privacy and data integrity as they go about their business.

Ticket exchange protocol

At a high level, to access a service using Kerberos, each client must follow three steps:

  • Kerberos Authentication: The client authenticates itself to the Kerberos Authentication Server and receives a Ticket Granting Ticket (TGT).
  • Kerberos Authorization: The client requests a service ticket from the Ticket Granting Server, which issues a ticket and a session key if the client TGT sent with the request is valid.
  • Service Request: The client uses the service ticket to authenticate itself to the server that is providing the service the client is using (e.g. HDFS, HBase, …).
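The three steps can be sketched as a toy model in Python. This is not the real Kerberos wire protocol: tickets here are only HMAC-sealed JSON, the password is checked directly instead of being used to derive a key, and session keys, timestamps and encryption are left out. All names, keys and credentials below are hypothetical.

```python
import hashlib
import hmac
import json


def seal(key: bytes, payload: dict) -> dict:
    """Attach an HMAC so that only holders of `key` can create or verify a ticket."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"body": payload, "mac": hmac.new(key, body, hashlib.sha256).hexdigest()}


def verify(key: bytes, ticket: dict) -> dict:
    """Return the ticket payload if the HMAC matches, else raise PermissionError."""
    body = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["mac"]):
        raise PermissionError("invalid ticket")
    return ticket["body"]


# Hypothetical long-term secrets; each service knows only its own key.
TGS_KEY = b"tgs-secret"
SERVICE_KEYS = {"hbase": b"hbase-secret"}
USER_PASSWORDS = {"alice": "correct horse"}


def authentication_server(user: str, password: str) -> dict:
    """Step 1: check the user's credentials and issue a TGT."""
    if USER_PASSWORDS.get(user) != password:
        raise PermissionError("authentication failed")
    return seal(TGS_KEY, {"user": user, "type": "TGT"})


def ticket_granting_server(tgt: dict, service: str) -> dict:
    """Step 2: validate the TGT and issue a ticket for one service."""
    user = verify(TGS_KEY, tgt)["user"]
    return seal(SERVICE_KEYS[service], {"user": user, "service": service})


def service_request(service: str, ticket: dict) -> str:
    """Step 3: the service validates the ticket and learns who the caller is."""
    payload = verify(SERVICE_KEYS[service], ticket)
    if payload.get("service") != service:
        raise PermissionError("ticket issued for another service")
    return payload["user"]
```

The point of the sketch is the chain of trust: the service never sees the user's password, it only trusts a ticket that could have been produced by the Ticket Granting Server.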

HBase, HDFS, ZooKeeper SASL

Since HBase depends on HDFS and ZooKeeper, secure HBase relies on a secure HDFS and a secure ZooKeeper. This means that the HBase servers need to create a secure service session, as described above, to communicate with HDFS and ZooKeeper.

All the files written by HBase are stored in HDFS. As in Unix filesystems, the access control provided by HDFS is based on users, groups and permissions.  All the files created by HBase have “hbase” as user, but this access control is based on the username provided by the system, and everyone that can access the machine is potentially able to “sudo” as the user “hbase”. Secure HDFS adds the authentication steps that guarantee that the “hbase” user is trusted.

ZooKeeper has an Access Control List (ACL) on each znode that allows read/write access to the users based on user information in a similar manner to HDFS.

HBase ACL

Now that our users are authenticated via Kerberos, we are sure that the username that we received is one of our trusted users.  Sometimes this is not granular enough: we also want to control whether a specified user is able to read or write a particular table. To do that, HBase provides an Authorization mechanism that allows restricted access for specified users.

To enable this feature, you must enable the Access Controller coprocessor by adding it to hbase-site.xml under the master and region server coprocessor classes. (See the HBase security configuration documentation for the full setup.)
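As a rough illustration, the following Python snippet builds the corresponding hbase-site.xml properties. The property names and the AccessController class name are written from the HBase security documentation as best recalled and should be verified against your HBase release; secure deployments typically need additional properties (for example for Kerberos authentication) that are not shown here.

```python
import xml.etree.ElementTree as ET

# Assumed property names and class name; verify against your HBase version.
SECURITY_PROPERTIES = {
    "hbase.security.authorization": "true",
    "hbase.coprocessor.master.classes":
        "org.apache.hadoop.hbase.security.access.AccessController",
    "hbase.coprocessor.region.classes":
        "org.apache.hadoop.hbase.security.access.AccessController",
}


def build_configuration(properties: dict) -> ET.Element:
    """Build a <configuration> element with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


# Serialize to text that could be merged into an existing hbase-site.xml.
xml_text = ET.tostring(build_configuration(SECURITY_PROPERTIES), encoding="unicode")
print(xml_text)
```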

A coprocessor is code that runs inside each HBase Region Server and/or Master.  It is able to intercept most operations (put, get, delete, …), and run arbitrary code before and/or after the operation is executed. 

Using this ability to execute some code before each operation, the Access Controller coprocessor can check the user rights and decide if the user can or cannot execute the operation.
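The interception idea can be shown with a small Python model. It is only a sketch of the pattern: the real coprocessor API is Java, and the class and method names below (ToyRegion, ToyAccessController, pre_put) are hypothetical.

```python
class AccessDeniedError(Exception):
    """Raised when a pre-operation hook rejects the request."""


class ToyAccessController:
    """Hypothetical observer that checks a permission before an operation runs."""

    def __init__(self, allowed):
        # allowed: set of (user, action) pairs, e.g. {("alice", "WRITE")}
        self.allowed = set(allowed)

    def pre_put(self, user, table, row):
        if (user, "WRITE") not in self.allowed:
            raise AccessDeniedError(f"{user} may not write to {table}")


class ToyRegion:
    """Hypothetical region that calls observer hooks before each put."""

    def __init__(self, table, observers):
        self.table = table
        self.observers = list(observers)
        self.rows = {}

    def put(self, user, row, value):
        for observer in self.observers:   # hooks run before the operation
            observer.pre_put(user, self.table, row)
        self.rows[row] = value            # reached only if no hook raised
```

Because the check runs inside the server, a client cannot bypass it by talking to the region directly.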

Rights management and _acl_ table

The HBase shell has a couple of commands that allow an admin to manage user rights:

  • grant <user> <permissions> [table] [family] [qualifier]
  • revoke <user> [table] [family] [qualifier]

As you see, an admin has the ability to restrict user access based on the table schema:

  • Give User-W only read rights to Table-X/Family-Y (grant 'User-W', 'R', 'Table-X', 'Family-Y')
  • Give User-W the full read/write rights to Qualifier-Z (grant 'User-W', 'RW', 'Table-X', 'Family-Y', 'Qualifier-Z')

An admin also has the ability to grant global rights, which operate at the cluster level, such as creating tables, balancing regions, shutting down the cluster and so on:

  • Give User-W the ability to create tables (grant 'User-W', 'C')
  • Give User-W the ability to manage the cluster (grant 'User-W', 'A')
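The scoping rules behind these examples can be sketched in Python. The model below is a simplification and not HBase's actual evaluation logic: it assumes a grant with a missing table, family or qualifier applies to everything beneath that level, and the names Grant, covers and is_allowed are hypothetical.

```python
from typing import NamedTuple, Optional


class Grant(NamedTuple):
    user: str
    actions: str                      # any combination of "R", "W", "C", "A"
    table: Optional[str] = None       # None means a global grant
    family: Optional[str] = None      # None means every family in the table
    qualifier: Optional[str] = None   # None means every qualifier in the family


def covers(grant, user, action, table, family, qualifier):
    """True if this grant allows `action` for `user` on the requested scope."""
    if grant.user != user or action not in grant.actions:
        return False
    for granted, requested in ((grant.table, table),
                               (grant.family, family),
                               (grant.qualifier, qualifier)):
        # A level left unset in the grant matches anything at that level.
        if granted is not None and granted != requested:
            return False
    return True


def is_allowed(grants, user, action, table, family=None, qualifier=None):
    """True if any grant in the list covers the request."""
    return any(covers(g, user, action, table, family, qualifier) for g in grants)


grants = [
    Grant("User-W", "R", "Table-X", "Family-Y"),
    Grant("User-W", "RW", "Table-X", "Family-Y", "Qualifier-Z"),
]
```

With these two grants, User-W can read any qualifier in Family-Y but can write only to Qualifier-Z.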

All the permissions are stored in a table created by the Access Controller coprocessor, called _acl_. The row key of this table is the table name that you specify in the grant command. The _acl_ table has just one column family and each qualifier describes the granularity of rights for a particular table/user.  The value contains the actual rights granted.

As you can see, the HBase shell commands are tightly related to how the data is stored. The grant command adds or updates one row, and the revoke command removes one row from the _acl_ table.
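A small Python model makes the storage layout easier to picture. It assumes the qualifier is encoded as the user name followed by an optional family and qualifier, separated by commas; that encoding, and the class name ToyAclTable, are assumptions for illustration rather than a statement of the exact on-disk format. In this model each grant is one cell inside the row for its table.

```python
class ToyAclTable:
    """In-memory stand-in for _acl_: {row key (table name): {qualifier: permissions}}."""

    def __init__(self):
        self.rows = {}

    @staticmethod
    def _qualifier(user, family=None, qualifier=None):
        # Assumed encoding: user[,family[,qualifier]]
        if qualifier is not None and family is None:
            raise ValueError("a qualifier requires a family")
        return ",".join(p for p in (user, family, qualifier) if p is not None)

    def grant(self, user, permissions, table, family=None, qualifier=None):
        """Add or overwrite the entry for this user and scope."""
        row = self.rows.setdefault(table, {})
        row[self._qualifier(user, family, qualifier)] = permissions

    def revoke(self, user, table, family=None, qualifier=None):
        """Remove the entry for this user and scope, if present."""
        row = self.rows.get(table, {})
        row.pop(self._qualifier(user, family, qualifier), None)
        if not row:
            self.rows.pop(table, None)   # drop the row once it is empty
```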

Access Controller under the hood

As mentioned previously, the Access Controller coprocessor uses the ability to intercept each user request, and check if the user has the rights to execute the operations.

For each operation, the Access Controller needs to query the _acl_ table to see if the user has the rights to execute the operation.

However, querying the table on every request would hurt performance. To avoid this, the _acl_ table is used for persistence and ZooKeeper is used to speed up the rights lookup. Each region server loads the _acl_ table into memory and gets notified of changes by the ZKPermissionWatcher. In this way, every region server always has the current permissions, and each permission check is performed against an in-memory map.
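The load-once-then-listen pattern can be sketched in Python as follows. The store below stands in for both the persistent _acl_ table and the ZooKeeper notifications; it is a simplified model with hypothetical names, and permissions are tracked per table only.

```python
class ToyPermissionStore:
    """Stands in for the persistent _acl_ table plus change notifications."""

    def __init__(self):
        self.table = {}      # (user, table) -> permissions string
        self.watchers = []   # callbacks invoked after every change

    def grant(self, user, table, permissions):
        self.table[(user, table)] = permissions
        for notify in self.watchers:
            notify(user, table, permissions)

    def revoke(self, user, table):
        self.table.pop((user, table), None)
        for notify in self.watchers:
            notify(user, table, None)


class ToyRegionServer:
    """Keeps a full in-memory copy and answers checks without querying the store."""

    def __init__(self, store):
        self.cache = dict(store.table)          # initial load of the whole table
        store.watchers.append(self.on_change)   # subscribe to later updates

    def on_change(self, user, table, permissions):
        if permissions is None:
            self.cache.pop((user, table), None)
        else:
            self.cache[(user, table)] = permissions

    def is_allowed(self, user, table, action):
        # Pure dictionary lookup: no round trip to the store per request.
        return action in self.cache.get((user, table), "")
```

The trade-off is memory on every region server in exchange for constant-time permission checks on the request path.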

Roadmap

While Kerberos is a stable, well-tested and proven authentication system, the HBase ACL feature is still very basic and its semantics are still evolving. HBASE-6096 is the umbrella JIRA that tracks all the improvements planned for v2 of the ACL feature.

Another open topic on authorization and access control is implementing a per-KeyValue security system (HBASE-6222) that will give the ability to have different values on the same cell associated with a security tag. That would allow showing a particular piece of information based on the user’s permissions.

Conclusion

HBase Security adds two extra features that allow you to protect your data against sniffers or other network attacks (by using Kerberos to authenticate users and encrypt communications between services), and allow you to define User Authorization policies, restrict operations, and limit data visibility for particular users. 

Matteo Bertozzi is a Software Engineer at Spotify and an HBase Consultant at Cloudera. 

Ref: http://blog.cloudera.com/blog/2012/09/understanding-user-authentication-and-authorization-in-apache-hbase/
