A failure syncing Sentry permissions to HDFS ACLs, causing all HDFS ACL entries to be lost

1. Background

The cluster is CDH 6.2.0 with Sentry handling authorization. After a Hadoop configuration change and restart, both the Hive CLI and direct file loads into HDFS started failing with permission errors, while services going through HiveServer2 kept working (which is expected: HiveServer2 authorizes against Sentry and, once authorized, accesses the data as the hive user). The grants in Sentry itself looked correct, but inspecting the directories and files on HDFS showed that the ACL entries had all disappeared.
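A symptom like this, where grants in Sentry look fine but `getfacl` on HDFS shows nothing, fits how the Sentry HDFS plugin works: the NameNode serves Sentry-derived ACLs from an in-memory cache that is filled by periodic updates pulled from the Sentry service, and when the cache cannot be populated, managed paths fall back to plain HDFS permissions. A minimal sketch of that fallback (my own illustration of the mechanism, not Sentry's actual code; all names here are hypothetical):

```java
// Hypothetical illustration: Sentry-derived ACLs are only served while the
// NameNode-side cache has fresh data; otherwise plain HDFS permissions apply,
// so the ACLs appear to have been "lost".
public class SentryAclFallbackSketch {
    /** True when the last successful update is older than the stale threshold. */
    static boolean isStale(long lastUpdateMs, long nowMs, long staleThresholdMs) {
        return nowMs - lastUpdateMs > staleThresholdMs;
    }

    /** What a client effectively sees for a Sentry-managed path. */
    static String effectiveAcls(boolean cachePopulated, boolean stale) {
        if (cachePopulated && !stale) {
            return "sentry-derived ACLs";   // cache is healthy
        }
        return "plain HDFS permissions";    // sync broken: ACL entries vanish
    }

    public static void main(String[] args) {
        System.out.println(effectiveAcls(true, false));
        System.out.println(effectiveAcls(false, true));
    }
}
```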

2. Root cause and troubleshooting

Searching the Hadoop NameNode logs turned up an exception while synchronizing Sentry permissions to HDFS ACLs:

2024-11-21 12:31:58,976 ERROR org.apache.sentry.core.common.transport.RetryClientInvocationHandler: failed to execute getAllUpdatesFrom
java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor294.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.sentry.core.common.transport.RetryClientInvocationHandler.invokeImpl(RetryClientInvocationHandler.java:95)
        at org.apache.sentry.core.common.transport.SentryClientInvocationHandler.invoke(SentryClientInvocationHandler.java:41)
        at com.sun.proxy.$Proxy22.getAllUpdatesFrom(Unknown Source)
        at org.apache.sentry.hdfs.SentryUpdater.getUpdates(SentryUpdater.java:49)
        at org.apache.sentry.hdfs.SentryAuthorizationInfo.update(SentryAuthorizationInfo.java:125)
        at org.apache.sentry.hdfs.SentryAuthorizationInfo.run(SentryAuthorizationInfo.java:220)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.sentry.core.common.exception.SentryHdfsServiceException: Thrift Exception occurred !!
        at org.apache.sentry.hdfs.SentryHDFSServiceClientDefaultImpl.getAllUpdatesFrom(SentryHDFSServiceClientDefaultImpl.java:140)
        ... 16 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.sentry.hdfs.service.thrift.SentryHDFSService$Client.recv_get_authz_updates(SentryHDFSService.java:171)
        at org.apache.sentry.hdfs.service.thrift.SentryHDFSService$Client.get_authz_updates(SentryHDFSService.java:158)
        at org.apache.sentry.hdfs.SentryHDFSServiceClientDefaultImpl.getAllUpdatesFrom(SentryHDFSServiceClientDefaultImpl.java:107)
        ... 16 more
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 30 more
2024-11-21 12:31:58,977 ERROR org.apache.sentry.core.common.transport.RetryClientInvocationHandler: Thrift call failed
org.apache.thrift.transport.TTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.sentry.core.common.transport.RetryClientInvocationHandler.invokeImpl(RetryClientInvocationHandler.java:110)
        at org.apache.sentry.core.common.transport.SentryClientInvocationHandler.invoke(SentryClientInvocationHandler.java:41)
        at com.sun.proxy.$Proxy22.getAllUpdatesFrom(Unknown Source)
        at org.apache.sentry.hdfs.SentryUpdater.getUpdates(SentryUpdater.java:49)
        at org.apache.sentry.hdfs.SentryAuthorizationInfo.update(SentryAuthorizationInfo.java:125)
        at org.apache.sentry.hdfs.SentryAuthorizationInfo.run(SentryAuthorizationInfo.java:220)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.sentry.hdfs.service.thrift.SentryHDFSService$Client.recv_get_authz_updates(SentryHDFSService.java:171)
        at org.apache.sentry.hdfs.service.thrift.SentryHDFSService$Client.get_authz_updates(SentryHDFSService.java:158)
        at org.apache.sentry.hdfs.SentryHDFSServiceClientDefaultImpl.getAllUpdatesFrom(SentryHDFSServiceClientDefaultImpl.java:107)
        at sun.reflect.GeneratedMethodAccessor294.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.sentry.core.common.transport.RetryClientInvocationHandler.invokeImpl(RetryClientInvocationHandler.java:95)
        ... 12 more
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 30 more

The message is actually quite clear: the Sentry HDFS plugin timed out while pulling permission updates from the Sentry service. But since the Sentry project is no longer maintained and there is very little material about it online, it was not obvious which parameters to change, so the only option left was to dig through the source code.
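The timeout itself comes from Sentry's client transport layer. One knob worth trying (my reading of Sentry's client transport configuration, not something verified on this cluster; check the exact key and default against your Sentry version) is the client RPC connection timeout, raised via sentry-site.xml or the corresponding Cloudera Manager safety valve:

```xml
<!-- Hypothetical example: Thrift client connect/read timeout in milliseconds.
     Key name as I read it from Sentry's ServiceConstants.ClientConfig;
     verify it exists in your version before relying on it. -->
<property>
  <name>sentry.service.client.server.rpc-connection-timeout</name>
  <value>600000</value>
</property>
```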
First, the entry point. On any Hadoop cluster managed by Cloudera Manager you can uncheck the Sentry option to see which configuration it modifies, as shown below:

[screenshot: the configuration entries changed by the Sentry checkbox in Cloudera Manager]

Then find the start() method of this class:

public void start() {
    if (started) {
      throw new IllegalStateException("Provider already started");
    }
    started = true;
    try {
      if (!conf.getBoolean(DFSConfigKeys.DFS_NAMENODE_ACLS_ENABLED_KEY,
              false)) {
        throw new RuntimeException("HDFS ACLs must be enabled");
      }
      Configuration conf = new Configuration(this.conf);
      conf.addResource(SentryAuthorizationConstants.CONFIG_FILE, true);
      user = conf.get(SentryAuthorizationConstants.HDFS_USER_KEY,
              SentryAuthorizationConstants.HDFS_USER_DEFAULT);
      group = conf.get(SentryAuthorizationConstants.HDFS_GROUP_KEY,
              SentryAuthorizationConstants.HDFS_GROUP_DEFAULT);
      permission = FsPermission.createImmutable(
              (short) conf.getLong(SentryAuthorizationConstants.HDFS_PERMISSION_KEY,
                      SentryAuthorizationConstants.HDFS_PERMISSION_DEFAULT));
      originalAuthzAsAcl = conf.getBoolean(
              SentryAuthorizationConstants.INCLUDE_HDFS_AUTHZ_AS_ACL_KEY,
              SentryAuthorizationConstants.INCLUDE_HDFS_AUTHZ_AS_ACL_DEFAULT);

      LOG.info("Starting");
      LOG.info("Config: hdfs-user[{}] hdfs-group[{}] hdfs-permission[{}] " +
              "include-hdfs-authz-as-acl[{}]",
              new Object[]{user, group, permission, originalAuthzAsAcl});

      if (authzInfo == null) {
        authzInfo = new SentryAuthorizationInfo(conf);
      }
      authzInfo.start();
    } catch (Exception ex) {
      throw new RuntimeException(ex);
    }
  }
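The start() method reads its settings not from hdfs-site.xml but from a separate resource, SentryAuthorizationConstants.CONFIG_FILE. A sketch of what that file can contain, with key names and defaults as I read them from SentryAuthorizationConstants (verify them against your Sentry version; the values shown are only the common CDH defaults):

```xml
<!-- Hypothetical sketch of the Sentry HDFS plugin resource file.
     Key names taken from SentryAuthorizationConstants; defaults are
     reportedly hive/hive/0771 for user/group/permission. -->
<configuration>
  <property>
    <name>sentry.authorization-provider.hdfs-user</name>
    <value>hive</value>
  </property>
  <property>
    <name>sentry.authorization-provider.hdfs-group</name>
    <value>hive</value>
  </property>
  <property>
    <name>sentry.authorization-provider.include-hdfs-authz-as-acl</name>
    <value>true</value>
  </property>
</configuration>
```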