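One day the cluster reported Unhealthy Nodes, which reduced its overall compute capacity. On inspection, many of the disks on the affected node had reached the 90% utilization threshold. YARN has configuration properties that govern this behavior, as follows: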
| Property | Default | Description |
| --- | --- | --- |
| yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage | 90 | The maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0. If the value is greater than or equal to 100, the nodemanager will check for a full disk. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs. |
| yarn.nodemanager.disk-health-checker.min-healthy-disks | 0.25 | The minimum fraction of disks that must be healthy for the nodemanager to launch new containers. This applies to both yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs. |
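To make the interaction between the two properties concrete, here is a minimal Python sketch of the check they describe. It is not the NodeManager's actual implementation. The function names, the directory paths, and the use of `shutil.disk_usage` to measure utilization are all assumptions made for illustration. The two constants mirror the default values listed above.

```python
import shutil
from typing import Callable, Iterable, List

# Defaults taken from the two properties above.
MAX_UTILIZATION_PERCENT = 90.0   # max-disk-utilization-per-disk-percentage
MIN_HEALTHY_FRACTION = 0.25      # min-healthy-disks


def disk_utilization_percent(path: str) -> float:
    """Approximate used percentage of the filesystem holding `path`.

    Assumption: utilization is derived from the space still available to
    the current user, which may differ slightly from how YARN measures it.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * (1.0 - usage.free / usage.total)


def healthy_dirs(
    dirs: Iterable[str],
    max_percent: float = MAX_UTILIZATION_PERCENT,
    utilization: Callable[[str], float] = disk_utilization_percent,
) -> List[str]:
    """Return the directories whose utilization does not exceed the threshold."""
    return [d for d in dirs if utilization(d) <= max_percent]


def node_is_healthy(
    dirs: Iterable[str],
    max_percent: float = MAX_UTILIZATION_PERCENT,
    min_fraction: float = MIN_HEALTHY_FRACTION,
    utilization: Callable[[str], float] = disk_utilization_percent,
) -> bool:
    """True if the fraction of healthy directories is at least `min_fraction`."""
    all_dirs = list(dirs)
    if not all_dirs:
        return False
    good = healthy_dirs(all_dirs, max_percent, utilization)
    return len(good) / len(all_dirs) >= min_fraction


if __name__ == "__main__":
    # Hypothetical directories and utilization figures, for illustration only.
    fake_usage = {
        "/data1/yarn/local": 95.0,
        "/data2/yarn/local": 91.0,
        "/data3/yarn/local": 92.0,
        "/data4/yarn/local": 50.0,
    }
    print(healthy_dirs(fake_usage, utilization=fake_usage.__getitem__))
    print(node_is_healthy(fake_usage, utilization=fake_usage.__getitem__))
```

With the defaults, a node with four configured directories stays usable as long as at least one of them is at or below 90% utilization. Once every directory exceeds the threshold, the healthy fraction drops below 0.25 and the node would be reported as unhealthy.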