Authorization and Authentication In Hadoop


Source: http://blog.cloudera.com/blog/2012/03/authorization-and-authentication-in-hadoop


One of the more confusing topics in Hadoop is how authorization and authentication work in the system. The first and most important thing to recognize is the subtle, yet extremely important, differentiation between authorization and authentication, so let’s define these terms first:

Authentication is the process of determining whether someone is who they claim to be.

Authorization is the function of specifying access rights to resources.

In simpler terms, authentication is a way of proving who I am, and authorization is a way of determining what I can do.

Authentication

If Hadoop is configured with all of its defaults, Hadoop doesn’t do any authentication of users. This is an important realization to make, because it can have serious implications in a corporate data center. Let’s look at an example of this.

Let’s say Joe User has access to a Hadoop cluster. The cluster does not have any Hadoop security features enabled, which means that there are no attempts made to verify the identities of users who interact with the cluster. The cluster’s superuser is hdfs, and Joe doesn’t have the password for the hdfs user on any of the cluster servers. However, Joe happens to have a client machine which has a set of configurations that will allow Joe to access the Hadoop cluster, and Joe is very disgruntled. He runs these commands:

sudo useradd hdfs
sudo -u hdfs hadoop fs -rmr /

The cluster goes off and does some work, and comes back and says “Ok, hdfs, I deleted everything!”.

So what happened here? Well, in an insecure cluster, the NameNode and the JobTracker don’t require any authentication. If you make a request, and say you’re hdfs or mapred, the NN/JT will both say “ok, I believe that,” and allow you to do whatever the hdfs or mapred users have the ability to do.
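This "believe whatever the client claims" behavior corresponds to Hadoop's default authentication mode. As a sketch, the controlling property in core-site.xml looks like this (shown with its default value):

```xml
<!-- core-site.xml: the default, insecure setting -->
<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value> <!-- identity is asserted by the client, never verified -->
</property>
```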

Hadoop has the ability to require authentication, in the form of Kerberos principals. Kerberos is an authentication protocol which uses “tickets” to allow nodes to identify themselves. If you need a more in depth introduction to Kerberos, I strongly recommend checking out the Wikipedia page.

Hadoop can use the Kerberos protocol to ensure that when someone makes a request, they really are who they say they are. This mechanism is used throughout the cluster. In a secure Hadoop configuration, all of the Hadoop daemons use Kerberos to perform mutual authentication, which means that when two daemons talk to each other, they each make sure that the other daemon is who it says it is. Additionally, this allows the NameNode and JobTracker to ensure that any HDFS or MR requests are being executed with the appropriate authorization level.
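As a rough sketch, switching a cluster to Kerberos is driven by properties along these lines in core-site.xml (exact settings, keytab locations, and principal names vary by Hadoop version and distribution, so treat this as illustrative rather than complete):

```xml
<!-- core-site.xml: require Kerberos authentication and enable service-level authorization -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```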

Authorization

Authorization is a much different beast than authentication. Authorization tells us what any given user can or cannot do within a Hadoop cluster, after the user has been successfully authenticated. In HDFS this is primarily governed by file permissions.

HDFS file permissions are very similar to BSD file permissions. If you’ve ever run `ls -l` in a directory, you’ve probably seen a record like this:

drwxr-xr-x  2 natty hadoop  4096 2012-03-01 11:18 foo
-rw-r--r--  1 natty hadoop    87 2012-02-13 12:48 bar

On the far left, there is a string of letters. The first letter determines whether a file is a directory or not, and then there are three sets of three letters each. Those sets denote owner, group, and other user permissions, and the “rwx” are read, write, and execute permissions, respectively. The “natty hadoop” portion says that the files are owned by natty, and belong to the group hadoop. As an aside, a stated intention is for HDFS semantics to be “Unix-like when possible.” The result is that certain HDFS operations follow BSD semantics, and others are closer to Unix semantics.
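To make the triplet structure concrete, here is a small, hypothetical Python sketch (not part of Hadoop) that decodes a mode string like the ones in the listing above into its owner, group, and other permission sets:

```python
def parse_mode(mode):
    """Decode an ls-style mode string, e.g. 'drwxr-xr-x', into
    (is_directory, owner_perms, group_perms, other_perms)."""
    if len(mode) != 10:
        raise ValueError("expected a 10-character mode string")
    is_dir = mode[0] == "d"
    # Three triplets: owner, group, other; each position is r, w, x, or '-'
    triplets = [mode[1:4], mode[4:7], mode[7:10]]
    names = ("read", "write", "execute")
    owner, group, other = [
        {name for name, ch in zip(names, t) if ch != "-"} for t in triplets
    ]
    return is_dir, owner, group, other

# The directory entry from the listing above: owner rwx, group r-x, other r-x
print(parse_mode("drwxr-xr-x"))
```

The same decoding applies to the output of `hadoop fs -ls`, since HDFS prints permissions in the same notation.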

The real question here is: what is a user or group in Hadoop? The answer is: they’re strings of characters. Nothing more. Hadoop will very happily let you run a command like

hadoop fs -chown fake_user:fake_group /test-dir

The downside to doing this is that if that user and group really don’t exist, no one will be able to access that file except the superusers, which, by default, includes hdfs, mapred, and other members of the hadoop supergroup.

In the context of MapReduce, the users and groups are used to determine who is allowed to submit or modify jobs. In MapReduce, jobs are submitted via queues controlled by the scheduler. Administrators can define who is allowed to submit jobs to particular queues via MapReduce ACLs. These ACLs can also be defined on a job-by-job basis. Similar to the HDFS permissions, if the specified users or groups don’t exist, the queues will be unusable, except by superusers, who are always authorized to submit or modify jobs.
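As an illustrative sketch only, MR1-era queue ACLs were expressed with properties of roughly this shape in mapred-queue-acls.xml; the user and group lists below are made up, and exact property names and value formats vary across versions:

```xml
<!-- mapred-queue-acls.xml (MR1-era): who may submit to the default queue -->
<property>
  <name>mapred.queue.default.acl-submit-job</name>
  <value>natty,joe analysts</value> <!-- comma-separated users, a space, then groups -->
</property>
<property>
  <name>mapred.queue.default.acl-administer-jobs</name>
  <value> hadoop</value> <!-- no individual users; only the hadoop group -->
</property>
```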

The next question to ask is: how do the NameNode and JobTracker figure out which groups a user belongs to?

When a user runs a hadoop command, the NameNode or JobTracker gets some information about the user running that command. Most importantly, it knows the username of the user. The daemons then use that username to determine what groups the user belongs to. This is done through the use of a pluggable interface, which has the ability to take a username and map it to a set of groups that the user belongs to. In a default installation, the user-group mapping implementation forks off a subprocess that runs `id -Gn [username]`. That provides a list of groups like this:

natty@vorpal:~/cloudera $ id -Gn natty
natty adm lpadmin netdev admin sambashare hadoop hdfs mapred
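The default mapping described above can be sketched in a few lines of Python (a hypothetical illustration of the mechanism, not Hadoop's actual Java implementation, which uses a shell-based group mapping class behind a pluggable interface):

```python
import getpass
import subprocess

def groups_for(user):
    # Mirror what the default shell-based mapping does: fork `id -Gn <user>`
    # and split the output into a list of group names.
    result = subprocess.run(
        ["id", "-Gn", user], capture_output=True, text=True, check=True
    )
    return result.stdout.split()

# Look up the groups of whoever is running this script
print(groups_for(getpass.getuser()))
```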

The Hadoop daemons then use this list of groups, along with the username, to determine if the user has appropriate permissions to access the file being requested. There are also other implementations that come packaged with Hadoop, including one that allows the system to be configured to get user-group mappings from an LDAP or Active Directory system. This is useful if the groups necessary for setting up permissions are resident in an LDAP system, but not in Unix on the cluster hosts.
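A sketch of pointing the mapping at LDAP instead of the shell, using Hadoop's LdapGroupsMapping class; the server URL and bind DN below are placeholders, and real deployments need additional bind-password and search-filter properties:

```xml
<!-- core-site.xml: resolve user-group mappings via LDAP rather than `id -Gn` -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldap://ldap.example.com:389</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.bind.user</name>
  <value>cn=hadoop,ou=services,dc=example,dc=com</value>
</property>
```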

Something to be aware of is that the set of groups that the NameNode and JobTracker are aware of may be different from the set of groups that a user belongs to on a client machine. All authorization is done at the NameNode/JobTracker level, so the users and groups on the DataNodes and TaskTrackers don’t affect authorization, although they may be necessary if Kerberos authentication is enabled. Additionally, it is very important that the NameNode and the JobTracker both be aware of the same groups for any given user, or there may be undefined results when executing jobs. If there’s ever any doubt about what groups a user belongs to, `hadoop dfsgroups` and `hadoop mrgroups` may be used to find out what groups a user belongs to, according to the NameNode and JobTracker, respectively.

Putting it all together

A proper, safe security protocol for Hadoop may require a combination of authorization and authentication. Admins should look at their security requirements and determine which solutions are right for them, and how much risk they can take on regarding their handling of data. Additionally, if you are going to enable Hadoop’s Kerberos features, I strongly recommend looking into Cloudera Manager, which helps make the Kerberos configuration and setup significantly easier than doing it all by hand.

