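Symptom
For some reason, one OBServer in a 3-replica cluster was stopped. About four days passed before we tried to start it again, and at that point it would not start.
Troubleshooting
OCP errors during startup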
2024-01-27 10:44:17.594 ERROR 1213 --- [pool-manual-subtask-executor7,fabc5c389ceb456f,90944f18012e] com.alipay.ocp.core.util.ExceptionUtils : Checked Exception: com.alipay.ocp.core.exception.UnexpectedException occurred with code error.ob.server.start.failed, and args [9000001]
2024-01-27 10:44:17.614 INFO 1213 --- [pool-manual-subtask-executor7,fabc5c389ceb456f,90944f18012e] c.a.o.c.m.t.model.SubtaskInstanceEntity : Set state for subtask: 19024880, current state: RUNNING, new state: FAILED
2024-01-27 10:44:17.637 WARN 1213 --- [pool-manual-subtask-executor7,fabc5c389ceb456f,90944f18012e] c.a.o.c.t.engine.runner.RunnerFactory : Execute task failed, subtask=SubtaskInstanceEntity{id=19024880, name=Start observer process, state=FAILED, operation=RETRY, className=com.alipay.ocp.service.task.business.server.StartObServerProcessTask, seriesId=1, startTime=2024-01-27T10:34:08.819+08:00, endTime=2024-01-27T10:44:17.633+08:00}, failedMessage=Failed to start OBServer 9,000,001.
com.alipay.ocp.core.exception.UnexpectedException: [OCP UnexpectedException]: status=500 INTERNAL_SERVER_ERROR, errorCode=OB_SERVER_START_FAILED, args=9000001
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_312]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_312]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_312]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_312]
at com.alipay.ocp.core.util.ExceptionUtils.newException(ExceptionUtils.java:96) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.util.ExceptionUtils.throwException(ExceptionUtils.java:90) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.util.ExceptionUtils.unExpected(ExceptionUtils.java:77) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.service.task.business.server.StartObServerProcessTask.run(StartObServerProcessTask.java:67) ~[ocp-service-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.task.runtime.Subtask.retry(Subtask.java:49) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.metadb.task.model.SubtaskInstanceEntity.retry(SubtaskInstanceEntity.java:237) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.task.engine.runner.JavaTaskRunner.doExecute(JavaTaskRunner.java:30) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.task.engine.runner.JavaTaskRunner.run(JavaTaskRunner.java:20) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:118) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.redirectOutputIfNotSysSchedule(RunnerFactory.java:190) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.task.engine.runner.RunnerFactory.run(RunnerFactory.java:107) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at com.alipay.ocp.core.task.engine.coordinator.worker.subtask.ReadySubtaskWorker.lambda$submitTask$3(ReadySubtaskWorker.java:122) ~[ocp-core-3.3.3-20220905.jar!/:3.3.3-20220905]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_312]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_312]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_312]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_312]
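
Most of the output above is framework stack trace. The useful facts are the non-INFO entries and the errorCode/args pair. Here is a minimal sketch in Python for pulling those out of a saved copy of the OCP log. The function name summarize and both regular expressions are assumptions written against the lines shown here, not part of OCP, so they may need adjusting for other OCP versions.

```python
import re

# Leading "timestamp LEVEL pid ---" part of the OCP log lines shown above.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(ERROR|WARN|INFO)\s+\d+\s+---"
)
# "errorCode=OB_SERVER_START_FAILED, args=9000001" on the exception line.
CODE_RE = re.compile(r"errorCode=(\w+), args=(\S+)")


def summarize(lines):
    """Return (entries, codes).

    entries: (timestamp, level) for every ERROR or WARN log line.
    codes:   (errorCode, args) for every line that carries an errorCode.
    """
    entries, codes = [], []
    for line in lines:
        header = LINE_RE.match(line)
        if header and header.group(2) != "INFO":
            entries.append((header.group(1), header.group(2)))
        code = CODE_RE.search(line)
        if code:
            codes.append((code.group(1), code.group(2)))
    return entries, codes
```

Called with the lines of the log file, for example summarize(open(path, encoding="utf-8")) where path is wherever the log was saved, it returns the two lists and skips the stack frames.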

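This post records a case in which one OBServer in a 3-replica OceanBase cluster could not be started after being stopped for about four days. Troubleshooting covered the OCP errors raised during startup and the observer.log.wf log, and the official website was searched to locate the problem. The log sequence numbers turned out to differ widely, which suggested that the local clog was too old. Deleting the contents of the clog and ilog directories and restarting resolved the problem.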
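
The cleanup step can be sketched in Python as follows. This is a more conservative variant than the one described: it moves the contents of clog and ilog into a backup directory instead of deleting them, so they can be put back if the restart still fails. The function name move_log_contents and both directory arguments are hypothetical; the post does not give the actual store path, so the caller has to supply it. The sketch assumes the observer process on this node is already stopped and that the other two replicas are healthy.

```python
import shutil
from pathlib import Path


def move_log_contents(store_dir, backup_dir, subdirs=("clog", "ilog")):
    """Move everything inside store_dir/clog and store_dir/ilog to backup_dir.

    The clog and ilog directories themselves are left in place (empty).
    Returns the list of destination paths that were created.
    store_dir and backup_dir are supplied by the caller; nothing is assumed
    about where OceanBase keeps them on a given host.
    """
    store, backup = Path(store_dir), Path(backup_dir)
    moved = []
    for name in subdirs:
        src = store / name
        if not src.is_dir():
            continue  # this layout has no such directory; nothing to do
        dst = backup / name
        dst.mkdir(parents=True, exist_ok=True)
        for entry in sorted(src.iterdir()):
            target = dst / entry.name
            shutil.move(str(entry), str(target))
            moved.append(target)
    return moved
```

Moving rather than deleting takes the same disk space until the backup is removed, but it keeps the step reversible.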



