HBase pitfall: diagnosing a cluster-wide read/write hang in HBase

Author: 伍增田 (Tommy WU), zxpns18@126.com
In the production environment, HBase and the HDFS DataNodes are co-deployed on the same machines, with short-circuit reads enabled to improve read performance. A number of disks had accumulated bad blocks, disk I/O slowed down drastically, and business requests backed up in the RegionServer request queues.
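For context, short-circuit reads are switched on by two client-side settings that must agree with the DataNode; they normally live in hdfs-site.xml on both the DataNodes and the RegionServers. The sketch below shows the equivalent programmatic form (a minimal illustration; the socket path is the one that appears in the logs below):

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Read local block replicas through file descriptors handed over by the DataNode,
// instead of streaming the data over a TCP connection.
conf.setBoolean("dfs.client.read.shortcircuit", true);
// UNIX domain socket shared by the DataNode and the local HDFS client (the RegionServer here).
conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");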

2024-05-20 08:22:25,671 WARN [RpcServer.default.RWQ.Fifo.read.handler=148,queue=16,port=16020] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x1f681631): failed to load 1589308713_BP-1719712434-10.80.10.150-1522218330932
2024-05-20 08:22:25,671 WARN [RpcServer.default.RWQ.Fifo.read.handler=98,queue=14,port=16020] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x1f681631): failed to get 1589308713_BP-1719712434-10.80.10.150-1522218330932

2024-05-20 08:22:25,735 WARN [RpcServer.default.RWQ.Fifo.read.handler=144,queue=12,port=16020] hdfs.BlockReaderFactory: BlockReaderFactory(fileName=/hbase/data/default/alerts/6aedb1697e29f1e4be88996849fbb716/data/e20886e05d7b4fbcac8e209800a76709, block=BP-1719712434-10.80.10.150-1522218330932:blk_1589343399_515603961): I/O error requesting file descriptors. Disabling domain socket DomainSocket(fd=753,path=/var/lib/hadoop-hdfs/dn_socket)
java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable
at org.apache.hadoop.net.unix.DomainSocket.readArray0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.access$000(DomainSocket.java:45)
at org.apache.hadoop.net.unix.DomainSocket$DomainInputStream.read(DomainSocket.java:532)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2292)
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:542)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1161)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1086)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1439)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1402)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:805)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1565)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1769)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1594)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1488)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:340)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:852)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:802)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:326)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:227)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:470)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:369)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.<init>(KeyValueHeap.java:103)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.<init>(KeyValueHeap.java:81)
at org.apache.hadoop.hbase.regionserver.StoreScanner.resetKVHeap(StoreScanner.java:407)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:257)
at org.apache.hadoop.hbase.regionserver.MobStoreScanner.<init>(MobStoreScanner.java:44)

2024-05-20 08:22:35,748 WARN [RpcServer.default.RWQ.Fifo.read.handler=144,queue=12,port=16020] hdfs.DFSClient: Connection failure: Failed to connect to /10.80.6.63:50010 for file /hbase/data/default/alerts/6aedb1697e29f1e4be88996849fbb716/data/e20886e05d7b4fbcac8e209800a76709 for block BP-1719712434-10.80.10.150-1522218330932:blk_1589343399_515603961:java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.80.6.63:39489 remote=/10.80.6.63:50010]
java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.80.6.63:39489 remote=/10.80.6.63:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2292)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:423)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1161)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1086)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1439)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1402)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:805)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1565)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1769)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1594)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1488)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:340)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:852)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:802)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:326)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:227)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:470)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:369)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.<init>(KeyValueHeap.java:103)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.<init>(KeyValueHeap.java:81)
at org.apache.hadoop.hbase.regionserver.StoreScanner.resetKVHeap(StoreScanner.java:407)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:257)
at org.apache.hadoop.hbase.regionserver.MobStoreScanner.<init>(MobStoreScanner.java:44)
at org.apache.hadoop.hbase.regionserver.HMobStore.createScanner(HMobStore.java:159)
at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1943)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6181)

2024-05-20 08:46:03,987 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: datanode1:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_FDS operation src: unix:/var/lib/hadoop-hdfs/dn_socket dst:
java.net.SocketException: write(2) error: Broken pipe
at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitFds(DataXceiver.java:336)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitFds(Receiver.java:187)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:89)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
at java.lang.Thread.run(Thread.java:748)
2024-05-20 08:46:03,987 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unregistering SlotId(cb6ffa8d5956cfac1e4ce76e9bc18914:19) because the requestShortCircuitFdsForRead operation failed.

Analysis of the short-circuit read errors reported by the RegionServer
(The DataNode-side "Broken pipe" above is most likely the mirror of the same failure: by the time the DataNode got around to writing its response, the client had already timed out and closed the domain socket.)
java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:542)

private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
    Slot slot) throws IOException {
    ShortCircuitCache cache = clientContext.getShortCircuitCache();
    final DataOutputStream out =
        new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
    SlotId slotId = slot == null ? null : slot.getSlotId();
    // Sends the REQUEST_SHORT_CIRCUIT_FDS op to the local DataNode over the UNIX domain socket.
    new Sender(out).requestShortCircuitFds(block, token, slotId, 1,
        failureInjector.getSupportsReceiptVerification());
    DataInputStream in = new DataInputStream(peer.getInputStream());
    // Blocks here waiting for the DataNode's response; a slow DataNode pushes this past the
    // socket receive timeout, producing the SocketTimeoutException in the log above.
    BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
        PBHelper.vintPrefixed(in));
    DomainSocket sock = peer.getDomainSocket();
    failureInjector.injectRequestFileDescriptorsFailure();

The timeout happens in PBHelper.vintPrefixed(in): the client has asked the local DataNode, over the DomainPeer (a UNIX domain socket), for the file descriptors of the block it wants to short-circuit read, and the response does not arrive in time.
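What that call amounts to is roughly the following (a paraphrased sketch, not the upstream code): the DataNode's answer is a varint-length-prefixed BlockOpResponseProto, and the very first read of the length byte is where the handler thread sits blocked until the DataNode replies or the socket receive timeout fires.

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import com.google.protobuf.CodedInputStream;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.BlockOpResponseProto;

// Paraphrased sketch of reading one varint-delimited response message from the domain socket.
static BlockOpResponseProto readResponse(DataInputStream in) throws IOException {
  int firstByte = in.read();               // blocks until the DataNode replies or RECEIVE_TIMEOUT expires
  if (firstByte == -1) {
    throw new EOFException("DataNode closed the domain socket");
  }
  int size = CodedInputStream.readRawVarint32(firstByte, in);  // varint length prefix
  byte[] body = new byte[size];
  IOUtils.readFully(in, body, 0, size);    // then the protobuf payload itself
  return BlockOpResponseProto.parseFrom(body);
}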
Short-circuit reads depend on UNIX domain sockets. Java does not support them natively, so Hadoop implements them itself via JNI; the Java-side entry point is org.apache.hadoop.net.unix.DomainSocket.
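On the Java side the client obtains a connected socket roughly like this (a minimal sketch against the public DomainSocket API; the values shown match the deployment described above):

import java.io.InputStream;
import org.apache.hadoop.net.unix.DomainSocket;

// Connect to the DataNode's domain socket; this goes through the JNI connect0/setup code shown below.
DomainSocket sock = DomainSocket.connect("/var/lib/hadoop-hdfs/dn_socket");
// Receive timeout on the socket. When it expires, the native read(2) returns EAGAIN, which surfaces
// in Java as the "read(2) error: Resource temporarily unavailable" SocketTimeoutException.
sock.setAttribute(DomainSocket.RECEIVE_TIMEOUT, 120000);
InputStream in = sock.getInputStream();    // reads go through DomainSocket$DomainInputStream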
The native implementation lives in hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c. The socket itself is created by setup():

static jthrowable setup(JNIEnv *env, int *ofd, jobject jpath, int doConnect)
{
  const char *cpath = NULL;
  struct sockaddr_un addr;
  jthrowable jthr = NULL;
  int fd = -1, ret;

  fd = socket(PF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) {
    ret = errno;
    jthr = newSocketException(env, ret,
        "error creating UNIX domain socket with SOCK_STREAM: %s",
        terror(ret));
    goto done;
  }
  memset(&addr, 0, sizeof(&addr));
  addr.sun_family = AF_UNIX;
  cpath = (*env)->GetStringUTFChars(env, jpath, NULL);
  if (!cpath) {
    jthr = (*env)->ExceptionOccurred(env);
    (*env)->ExceptionClear(env);
    goto done;
  }
  ret = snprintf(addr.sun_path, sizeof(addr.sun_path),
                 "%s", cpath);
  if (ret < 0) {
    ret = errno;
    jthr = newSocketException(env, EIO,
        "error computing UNIX domain socket path: error %d (%s)",
        ret, terror(ret));
    goto done;
  }
  if (ret >= sizeof(addr.sun_path)) {
    jthr = newSocketException(env, ENAMETOOLONG,
        "error computing UNIX domain socket path: path too long.  "
        "The longest UNIX domain socket path possible on this host "
        "is %zd bytes.", sizeof(addr.sun_path) - 1);
    goto done;
  }
#ifdef SO_NOSIGPIPE
  /* On MacOS and some BSDs, SO_NOSIGPIPE will keep send and sendto from causing
   * EPIPE.  Note: this will NOT help when using write or writev, only with
   * send, sendto, sendmsg, etc.  See HDFS-4831.
   */
  ret = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, (void *)&ret, sizeof(ret))) {
    ret = errno;
    jthr = newSocketException(env, ret,
        "error setting SO_NOSIGPIPE on socket: error %s", terror(ret));
    goto done;
  }
#endif
  if (doConnect) {
    RETRY_ON_EINTR(ret, connect(fd, 
        (struct sockaddr*)&addr, sizeof(addr)));
    if (ret < 0) {
      ret = errno;
      jthr = newException(env, "java/net/ConnectException",
              "connect(2) error: %s when trying to connect to '%s'",
              terror(ret), addr.sun_path);
      goto done;
    }
  } else {
    RETRY_ON_EINTR(ret, unlink(addr.sun_path));
    RETRY_ON_EINTR(ret, bind(fd, (struct sockaddr*)&addr, sizeof(addr)));
    if (ret < 0) {
      ret = errno;
      jthr = newException(env, "java/net/BindException",
              "bind(2) error: %s when trying to bind to '%s'",
              terror(ret), addr.sun_path);
      goto done;
    }
    /* We need to make the socket readable and writable for all users in the
     * system.
     *
     * If the system administrator doesn't want the socket to be accessible to
     * all users, he can simply adjust the +x permissions on one of the socket's
     * parent directories.
     *
     * See HDFS-4485 for more discussion.
     */
    if (chmod(addr.sun_path, 0666)) {
      ret = errno;
      jthr = newException(env, "java/net/BindException",
              "chmod(%s, 0666) failed: %s", addr.sun_path, terror(ret));
      goto done;
    }
    if (listen(fd, LISTEN_BACKLOG) < 0) {
      ret = errno;
      jthr = newException(env, "java/net/BindException",
              "listen(2) error: %s when trying to listen to '%s'",
              terror(ret), addr.sun_path);
      goto done;
    }
  }

done:
  if (cpath) {
    (*env)->ReleaseStringUTFChars(env, jpath, cpath);
  }
  if (jthr) {
    if (fd > 0) {
      RETRY_ON_EINTR(ret, close(fd));
      fd = -1;
    }
  } else {
    *ofd = fd;
  }
  return jthr;
}

JNIEXPORT void JNICALL
Java_org_apache_hadoop_net_unix_DomainSocket_anchorNative(
JNIEnv *env, jclass clazz)
{
  fd_init(env); // for fd_get, fd_create, etc.
}

JNIEXPORT jint JNICALL
Java_org_apache_hadoop_net_unix_DomainSocket_connect0(
JNIEnv *env, jclass clazz, jstring path)
{
  int ret, fd;
  jthrowable jthr = NULL;

  jthr = setup(env, &fd, path, 1);
  if (jthr) {
    (*env)->Throw(env, jthr);
    return -1;
  }
  if (((jthr = setAttribute0(env, fd, SEND_TIMEOUT, DEFAULT_SEND_TIMEOUT))) || 
      ((jthr = setAttribute0(env, fd, RECEIVE_TIMEOUT, DEFAULT_RECEIVE_TIMEOUT)))) {
    RETRY_ON_EINTR(ret, close(fd));
    (*env)->Throw(env, jthr);
    return -1;
  }
  return fd;
}

setAttribute0(env, fd, RECEIVE_TIMEOUT, DEFAULT_RECEIVE_TIMEOUT) sets the socket receive timeout; DEFAULT_RECEIVE_TIMEOUT is 120000 ms. If the DataNode does not respond within that window, the native read(2) on the domain socket returns EAGAIN ("Resource temporarily unavailable"), which the JNI layer surfaces as exactly the SocketTimeoutException seen in the RegionServer log above.

The corresponding implementation on the HDFS DataNode side:
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitFds(DataXceiver.java:336)

@Override
  public void requestShortCircuitFds(final ExtendedBlock blk,
      final Token<BlockTokenIdentifier> token,
      SlotId slotId, int maxVersion, boolean supportsReceiptVerification)
        throws IOException {
    updateCurrentThreadName("Passing file descriptors for block " + blk);
    DataOutputStream out = getBufferedOutputStream();
    checkAccess(out, true, blk, token,
        Op.REQUEST_SHORT_CIRCUIT_FDS, BlockTokenSecretManager.AccessMode.READ);
    BlockOpResponseProto.Builder bld = BlockOpResponseProto.newBuilder();
    FileInputStream fis[] = null;
    SlotId registeredSlotId = null;
    boolean success = false;
    try {
      try {
        if (peer.getDomainSocket() == null) {
          throw new IOException("You cannot pass file descriptors over " +
              "anything but a UNIX domain socket.");
        }
        if (slotId != null) {
          boolean isCached = datanode.data.
              isCached(blk.getBlockPoolId(), blk.getBlockId());
          datanode.shortCircuitRegistry.registerSlot(
              ExtendedBlockId.fromExtendedBlock(blk), slotId, isCached);
          registeredSlotId = slotId;
        }
        // Opens the block's data file and meta file on local disk; on a disk with bad sectors this
        // can stall the DataXceiver thread for a long time (see the FsDatasetImpl code below).
        fis = datanode.requestShortCircuitFdsForRead(blk, token, maxVersion);
        Preconditions.checkState(fis != null);
        bld.setStatus(SUCCESS);
        bld.setShortCircuitAccessVersion(DataNode.CURRENT_BLOCK_FORMAT_VERSION);
      } catch (ShortCircuitFdsVersionException e) {
        bld.setStatus(ERROR_UNSUPPORTED);
        bld.setShortCircuitAccessVersion(DataNode.CURRENT_BLOCK_FORMAT_VERSION);
        bld.setMessage(e.getMessage());
      } catch (ShortCircuitFdsUnsupportedException e) {
        bld.setStatus(ERROR_UNSUPPORTED);
        bld.setMessage(e.getMessage());
      } catch (IOException e) {
        bld.setStatus(ERROR);
        bld.setMessage(e.getMessage());
      }
      // Writes the response back over the domain socket; if the client has already timed out and
      // closed its end, this write fails with "Broken pipe" (the DataNode ERROR shown earlier).
      bld.build().writeDelimitedTo(socketOut);

Inside datanode.requestShortCircuitFdsForRead(blk, token, maxVersion), the DataNode opens two streams whose file descriptors will be handed to the client:
fis[0] = (FileInputStream) data.getBlockInputStream(blk, 0);   // the block data file
fis[1] = DatanodeUtil.getMetaDataInputStream(blk, data);       // the block's checksum (meta) file
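After the response status is written back, the open file descriptors themselves are passed to the client over the same domain socket. Roughly, at the end of DataXceiver.requestShortCircuitFds (a paraphrased sketch, not the exact upstream code):

if (success) {
  // Hand the raw file descriptors of the data file and the meta file to the client process.
  java.io.FileDescriptor[] fds = new java.io.FileDescriptor[fis.length];
  for (int i = 0; i < fds.length; i++) {
    fds[i] = fis[i].getFD();
  }
  byte[] buf = new byte[] { 0 };                       // one byte of ordinary data sent alongside the descriptors
  DomainSocket sock = peer.getDomainSocket();
  sock.sendFileDescriptors(fds, buf, 0, buf.length);   // SCM_RIGHTS file-descriptor passing over the UNIX domain socket
}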

FsDatasetImpl.java
  @Override // FsDatasetSpi
  public InputStream getBlockInputStream(ExtendedBlock b,
      long seekOffset) throws IOException {
    File blockFile = getBlockFileNoExistsCheck(b, true);
    if (isNativeIOAvailable) {
      // Opens the block file on disk; this is where a disk full of bad sectors makes the call crawl.
      return NativeIO.getShareDeleteFileInputStream(blockFile, seekOffset);
    } else {
      try {
        return openAndSeek(blockFile, seekOffset);
      } catch (FileNotFoundException fnfe) {
        throw new IOException("Block " + b + " is not valid. " +
            "Expected block file at " + blockFile + " does not exist.");
      }
    }
  }
NativeIO.getShareDeleteFileInputStream(blockFile, seekOffset): obtaining the block's InputStream was blocked for long stretches because the disks had accumulated too many bad sectors. The figure below shows our disk I/O latency monitoring; latency reached as high as 12 seconds.
[Figure: disk I/O latency monitoring, with I/O latency peaking at about 12 seconds]
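To confirm slow media independently of HDFS, one crude check is to time how long it takes to open and read the head of a suspect block file directly on the DataNode host (a hypothetical probe, illustrative only):

import java.io.File;
import java.io.FileInputStream;

// Hypothetical sanity check: time a small read from a suspect block file on the DataNode host.
public class BlockReadProbe {
  public static void main(String[] args) throws Exception {
    File blockFile = new File(args[0]);      // pass the path of a block file to probe
    long t0 = System.nanoTime();
    try (FileInputStream fin = new FileInputStream(blockFile)) {
      byte[] buf = new byte[4096];
      fin.read(buf);                         // on healthy media this returns in milliseconds
    }
    long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
    System.out.println("open + first read took " + elapsedMs + " ms");
  }
}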
Resolution:
1. First take the failing disks out of service: remove them from dfs.datanode.data.dir in the DataNode's hdfs-site.xml, then apply the change online with hdfs dfsadmin -reconfig datanode datanode1:50020 start (the corresponding status subcommand shows the progress).
2. Replace the faulty disks with healthy ones afterwards.
3. With the bad disks out of the data directories, the problem disappeared and the cluster has been running stably since.
