vCenter 6.7 suddenly reports a 503 error

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x0000558877143430] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

1. Checked the SSL certificate; it had not expired.

2. Entered the shell to check the service status and found that vpxd could not start:

service-control --status --all
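
The status output is long, so a small helper can make it easier to see where a single service ended up. The sketch below assumes that service-control --status --all prints section headers ending in a colon (such as Running: and Stopped:), each followed by space-separated service names. service_state is a hypothetical helper name, not part of vCenter.

```shell
#!/bin/sh
# service_state NAME
# Reads `service-control --status --all` output on stdin and prints the
# section (e.g. Running, Stopped) under which NAME is listed.
# Assumption: each section is introduced by a header line ending in ":".
service_state() (
    name=$1
    section=unknown
    while read -r line; do
        case $line in
            *:) section=${line%:}; continue ;;
        esac
        # Service names on a line are separated by whitespace.
        for svc in $line; do
            if [ "$svc" = "$name" ]; then
                echo "$section"
                exit 0
            fi
        done
    done
    echo "not-listed"
    exit 1
)

# Usage on the appliance:
#   service-control --status --all | service_state vpxd
```

The function body runs in a subshell, so its variables and its exit calls do not affect the calling shell.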

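Tried to start the service manually: service-control --start vpxd

The final error was as follows:

After searching through various sources, I found that the cause might be expired certificates belonging to other services.

Run the following command to check the status of the certificates other than STS: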

for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done

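Look closely at the output: some certificates have expiry dates that differ from the rest, and several of them, including those for vpxd and the vSphere Client, show as already expired. Expired certificates like these prevent the services from starting, and the page then returns the 503 error. (Expired bak_* certificates at the bottom of the list can also cause this; the fix for those is described further down.)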
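
Scanning a long list of dates by eye is error-prone, so the comparison can be automated. The following is a minimal sketch, assuming GNU date is available on the appliance and that the Not After values look like "Jan  1 00:00:00 2030 GMT". flag_expired and trim are hypothetical helper names; the optional argument (an epoch timestamp to compare against) exists only to make the function easy to test.

```shell
#!/bin/sh
# flag_expired [NOW_EPOCH]
# Reads the "Alias" / "Not After" lines printed by the vecs-cli loop on
# stdin and prints EXPIRED, OK, or UNKNOWN for each certificate.
# Assumptions: GNU date, and dates formatted like "Jan  1 00:00:00 2030 GMT".
trim() { printf '%s' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'; }

flag_expired() (
    now=${1:-$(date +%s)}
    alias_name="?"
    while IFS= read -r line; do
        case $line in
            *Alias*)
                # Keep everything after the first colon as the alias.
                alias_name=$(trim "${line#*:}")
                ;;
            *"Not After"*)
                when=$(trim "${line#*:}")
                if ! ts=$(date -u -d "$when" +%s 2>/dev/null); then
                    echo "UNKNOWN $alias_name ($when)"
                elif [ "$ts" -lt "$now" ]; then
                    echo "EXPIRED $alias_name ($when)"
                else
                    echo "OK $alias_name ($when)"
                fi
                ;;
        esac
    done
)

# Usage: pipe the output of the vecs-cli loop into the function, e.g.
#   for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do ...; done | flag_expired
```

Lines that match neither pattern, such as the STORE headers, are ignored.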

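Solution: regenerate the certificates

Run /usr/lib/vmware-vmca/bin/certificate-manager

Choose option 4 (Regenerate a new VMCA Root Certificate and replace all certificates). For the prompts that follow, the defaults are fine in most cases. Wait for the run to finish.

(Do not use option 8, Reset all Certificates, lightly. My problem came from using option 8: it failed with an error partway through and left some certificates not updated.)

Check the service status: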

service-control --status --all  

The key services are back to normal.

Check the certificate status again:

for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done

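Note that the expiry date of the data-encipherment certificate is still not updated. Fixing it requires the official repair script. I reproduce its contents here so that you can copy it yourself.

Create:    vi fix_encipherment_cert.sh

Make it executable:    chmod +x fix_encipherment_cert.sh

Run:    ./fix_encipherment_cert.sh   # the script as listed prints the vpxd restart command at the end; it does not restart vpxd itself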

#!/bin/bash
# Run this from the vCenter Server where data-encipherment certificate is expired and needs to be replaced

echo "Replacing Certificate in data-encipherment VECS Store"

echo ""
PNID=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep PNID | awk '{print $4}'|tr -d '"')
echo "Detected PNID: $PNID"

echo ""
PSC=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep DCName | awk '{print $4}'|tr -d '"')
echo "Detected PSC: $PSC"

echo ""
echo "Taking backup of old certificate and private key to /tmp directory"
/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store data-encipherment --alias data-encipherment --output /tmp/old-data-encipherment.crt
/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store data-encipherment --alias data-encipherment --output /tmp/old-data-encipherment.key

echo ""
echo "Deleting the existing certificate from the VECS store"
/usr/lib/vmware-vmafd/bin/vecs-cli entry delete -y --store data-encipherment --alias data-encipherment

echo ""
echo "Generating new certificate using the existing private key and add to the VECS store"
/usr/lib/vmware-vmca/bin/certool --server=$PSC --genCIScert --dataencipherment --privkey=/tmp/old-data-encipherment.key --cert=/tmp/tmp-data-encipherment.crt --Name=data-encipherment --FQDN=$PNID

echo ""
echo "Listing the new certificate in VECS Store"
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store data-encipherment --text | egrep 'Alias|Serial Number:|Subject:|Not Before|Not After'

echo ""
echo "*************************************************************************************************************************"
echo "  Completed the script execution, please follow the manual steps in case the script fails to replace the Certificate"
echo ""
echo "  VPXD Service needs to be restarted for the changes to take effect, otherwise Guest OS Customizations might fail"
echo "  Please execute following command to restart the service: "
echo ""
echo "  service-control --stop vpxd && service-control --start vpxd "
echo "*************************************************************************************************************************"
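
The script as listed goes straight from the backup step to the delete step without checking that the backup files were actually written. A more cautious variant could add a guard between those two steps. The following is only a sketch and is not part of the official script; require_nonempty is a hypothetical helper name.

```shell
#!/bin/sh
# require_nonempty FILE...
# Returns non-zero (and names the offending file on stderr) unless every
# FILE exists and is non-empty.
require_nonempty() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Backup missing or empty: $f" >&2
            return 1
        fi
    done
    return 0
}

# Possible use between the backup and delete steps of the script:
#   require_nonempty /tmp/old-data-encipherment.crt \
#                    /tmp/old-data-encipherment.key || exit 1
```

Stopping before the delete matters because the certool step reuses the backed-up private key.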

After the fix completes, check the certificate status:

for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done

The certificate is now updated.

Finally, restart the services or reboot the appliance:

service-control --stop --all && service-control --start --all

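Expired non-STS bak_* certificates

Handle these with the official script, which is linked at the bottom left of this KB page:

Certificate alarm - Clearing BACKUP_STORES certificates in the VCSA

In case the script can no longer be downloaded later, I keep a copy of the code here: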

#!/bin/bash

#Cesar Badilla Monday, November 16, 2020 10:41:17 PM 

echo "######################################################"
echo;echo "These are the current Certificate Stores:";echo
		
		for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done; 
echo;echo "If there is any expired or expiring Certificates within the BACKUP_STORES please continue to run this script";echo "######################################################";echo 

	read -p "Have you taken powered off snapshots of all PSC's and VCSA's within the SSO domain(Y|y|N|n)" -n 1 -r

	if [[ ! $REPLY =~ ^[Yy]$ ]]
	then 
	exit 1
	fi
echo

		for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store BACKUP_STORE |grep -i "alias" | cut -d ":" -f2);do echo BACKUP_STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store BACKUP_STORE --alias $i -y; done 
		
	for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done | grep -i 'BACKUP_STORE_H5C'&> /dev/null

	if [ $? == 0 ]; then 
		for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store BACKUP_STORE_H5C |grep -i "alias" | cut -d ":" -f2); do echo BACKUP_STORE_H5C $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store BACKUP_STORE_H5C --alias $i -y; done
	
echo 
echo "--------------------------------------------------------";
fi

echo "######################################################";
echo;echo "The resulting BACKUP_STORES after the cleanups are: ";echo

		for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done

echo "######################################################";echo "--------------------------------------------------------"; echo "--------------------------------------------------------";
echo "Results: ";
echo "--------------------------------------------------------"; echo "--------------------------------------------------------";
echo;echo "The Certificate BACKUP_STORES were successfully cleaned";echo;
echo "Please acknowlege and reset to green any certificate related alarm."
echo "Restart services on all PSC's and VCSA's in the SSO Domain with command.";echo;echo "service-control --stop --all && service-control --start --all(optional)."
echo "--------------------------------------------------------";
echo;echo "If you could not restart the services, please monitor
the VCSA for 24 hours and the alarm should not reappear 
after the acknowlegement."
echo;echo "######################################################"



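Create: vi clean_backup_stores.sh
Make it executable: chmod +x clean_backup_stores.sh
Run: ./clean_backup_stores.sh

The script first lists the certificate stores so that you can confirm whether either backup store (BACKUP_STORE or BACKUP_STORE_H5C) really contains expired certificates. It then asks you to confirm that snapshots have been taken (you must take a fresh full snapshot first; this is very important):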

Have you taken powered off snapshots of all PSC's and VCSA's within the SSO domain(Y|y|N|n)

Enter y.
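
Before answering y, it can help to preview exactly which entries the script would delete. The sketch below mirrors the script's alias extraction (grep for alias, then cut on the colon) but only prints the delete commands instead of running them. preview_deletes is a hypothetical helper name.

```shell
#!/bin/sh
# preview_deletes STORE
# Reads `vecs-cli entry list --store STORE` output on stdin and prints the
# delete command the cleanup script would run for each alias.
# Nothing is deleted.
preview_deletes() {
    store=$1
    # `read` strips the whitespace that follows the colon in the listing.
    grep -i "alias" | cut -d ":" -f2 | while read -r entry; do
        [ -n "$entry" ] || continue
        echo "/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store $store --alias $entry -y"
    done
}

# Usage on the appliance:
#   /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store BACKUP_STORE | preview_deletes BACKUP_STORE
```

If the printed list contains anything other than the expected bak_* entries, stop and investigate before running the real script.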

If you see the following error:

 -bash: ./clean_backup_stores.sh: /bin/bash^M: bad interpreter: No such file or directory.

then run:

sed -i -e 's/\r$//' clean_backup_stores.sh
./clean_backup_stores.sh
# Pay attention to the prompts
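
The ^M in that error is a carriage return: the file was saved with Windows (CRLF) line endings, which often happens when a script is pasted through a Windows editor. A small check like the one below can detect this before any pasted script is run. has_crlf is a hypothetical helper name, and the usage lines assume GNU sed.

```shell
#!/bin/sh
# has_crlf FILE
# Succeeds (exit 0) if FILE contains at least one carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Usage:
#   if has_crlf clean_backup_stores.sh; then
#       sed -i -e 's/\r$//' clean_backup_stores.sh
#   fi
```

The same check applies to fix_encipherment_cert.sh if it was created by copy and paste.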

 

Restart all services:

service-control --stop --all && service-control --start --all

 

 
