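While restarting the CDH cluster today I ran into this error: Completed only 2/3 steps. First failure: Role not started due to unhealthy host slave5. One of the ZooKeeper instances failed to start, as shown in the screenshots below: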

CDH-ERROR

CDH-ERROR-02

The cause was a wrong CDH startup order: this machine most likely started cloudera-scm-agent first and only then started cloudera-scm-server. The correct order is to start cloudera-scm-server first, then cloudera-scm-agent.

Solution:

Stop all services

First, stop cloudera-scm-server

service cloudera-scm-server stop

Then stop cloudera-scm-agent on the other machines

service cloudera-scm-agent stop

Start the services

First, start cloudera-scm-server

service cloudera-scm-server start

Then start cloudera-scm-agent

service cloudera-scm-agent start
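
The four steps above can be collected into one small helper so the order is harder to get wrong. This is only a sketch, not an official Cloudera tool: the function name restart_plan and the host names cm-master and slave4 are made up for illustration (slave5 is the host from the error message), and it assumes every host is reachable over ssh as a user allowed to run service. It only prints the commands; it does not execute anything.

```shell
# Print the restart commands in the order described above, one per line.
# Usage: restart_plan <server-host> <agent-host>...
# Assumption: hosts are reachable over ssh with rights to run `service`.
restart_plan() {
  local server_host="$1"
  shift
  local host

  # Stop phase: the server first, then the agent on every other machine.
  # `ssh -n` keeps ssh from reading stdin, so the output can be piped to a shell.
  echo "ssh -n ${server_host} service cloudera-scm-server stop"
  for host in "$@"; do
    echo "ssh -n ${host} service cloudera-scm-agent stop"
  done

  # Start phase: the server first, then the agents.
  echo "ssh -n ${server_host} service cloudera-scm-server start"
  for host in "$@"; do
    echo "ssh -n ${host} service cloudera-scm-agent start"
  done
}

# Example with hypothetical host names; review the output before running it.
restart_plan cm-master slave4 slave5
```

Printing instead of executing is deliberate: the plan can be checked by eye first, then run line by line or piped into bash.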

Once startup finished, all services were back to normal

CDH-ERROR-03
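
Since the fix depends on cloudera-scm-server being up before any agent starts, it can help to wait until the server actually accepts connections. The sketch below is one way to do that and is not part of the original fix: wait_for_port is a hypothetical helper name, it relies on bash's /dev/tcp redirection and on the coreutils timeout command, and it assumes the server listens for agents on port 7182 (the Cloudera Manager default, which may differ on your cluster).

```shell
# Return 0 once host:port accepts a TCP connection, 1 if it does not
# within roughly timeout_s seconds (default 60).
# Assumptions: bash with /dev/tcp support and coreutils `timeout` are available.
wait_for_port() {
  local host="$1" port="$2" timeout_s="${3:-60}"
  local deadline=$(( $(date +%s) + timeout_s ))

  while [ "$(date +%s)" -lt "$deadline" ]; do
    # Try to open a socket in a child bash; cap each attempt at 2 seconds
    # so an unresponsive host cannot stall the loop.
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, running `wait_for_port cm-master 7182 120 && service cloudera-scm-agent start` on an agent machine would start the agent only after the server port opens; cm-master is again a placeholder host name.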

References: