CB cluster will not start - RabbitMQ service out of sync in clustered environment

Version

Cb Response 4.x, 5.x

Topic

This document describes how to sync the RabbitMQ service in a clustered environment.

Symptoms

The following error may appear on the console (standard output) during cluster start:

node8: ('channel %d is closed: %s : %s', 1, 404, "NOT_FOUND - home node 'rabbit@CB-SERVER-CLUSTER-HEAD-NODE' of durable queue 'alliance.data' in vhost '/' is down or inaccessible")

node8: Traceback (most recent call last):

node8: File "/opt/jenkins/builds/workspace/build-cbent-hotfix-5.0.2/code/coreservices/src/cb/utils/exceptions.py", line 57, in decorator

node8: File "/usr/lib/python2.6/site-packages/cb/maintenance/cbstartup/main.py", line 132, in main

node8: File "/usr/lib/python2.6/site-packages/cb/maintenance/cbstartup/main.py", line 82, in run

node8: File "/opt/jenkins/builds/workspace/build-cbent-hotfix-5.0.2/code/coreservices/src/cb/maintenance/cbstartup/actions/init_rabbitmq.py", line 106, in execute

node8: File "/opt/jenkins/builds/workspace/build-cbent-hotfix-5.0.2/code/coreservices/src/cb/maintenance/cbstartup/actions/init_rabbitmq.py", line 187, in _declare_persistent_objects

node8: File "/usr/lib/python2.6/site-packages/haigha/classes/queue_class.py", line 71, in declare

node8: return self.channel.add_synchronous_cb( self._recv_declare_ok )

node8: File "/usr/lib/python2.6/site-packages/haigha/channel.py", line 293, in add_synchronous_cb

node8: self.close_info['reply_text'] )

node8: ChannelClosed: ('channel %d is closed: %s : %s', 1, 404, "NOT_FOUND - home node 'rabbit@CB-SERVER-CLUSTER-HEAD-NODE' of durable queue 'alliance.data' in vhost '/' is down or inaccessible")

Note: In this article, "node" refers to both the master and the minions.
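
When reviewing saved console output for this symptom, the node and queue named in the error can be extracted with a short filter. The following is a minimal sketch, assuming the message format matches the sample above. The function name extract_home_node is hypothetical and is not part of Cb Response.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cb Response).
# Reads console output on stdin and, for every line containing the
# "home node ... of durable queue ..." error, prints "<home node> <queue>".
extract_home_node() {
    sed -n "s/.*home node '\([^']*\)' of durable queue '\([^']*\)'.*/\1 \2/p"
}
```

For example, `extract_home_node < console-output.txt` (the file name is an assumption) prints one "node queue" pair per matching line and nothing for the traceback lines.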

Cause

In any clustered environment (not just Carbon Black), RabbitMQ requires that the last node to be stopped is the first node to be started. If the RabbitMQ service hangs during a routine cbcluster stop, later attempts to start the cluster may fail because of this ordering requirement.

Solution

  1. Stop the cluster from the master:
    /usr/share/cb/cbcluster stop
  2. Kill all cb-owned processes on all nodes (starting with the master):
    killall -KILL -u cb
  3. Start the cb-rabbitmq service on all nodes (starting with the master):
    service cb-rabbitmq start
    Note: RabbitMQ will not start until the node that was stopped last has been started.
  4. Stop cb-rabbitmq service one by one on each node:
    service cb-rabbitmq stop
    Warning: Do NOT continue to the next node until RabbitMQ has fully stopped on the current node.
  5. Start the cluster from the master:
    /usr/share/cb/cbcluster start
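
The steps above can be sketched as a per-node checklist. The script below is only an illustration under stated assumptions: it prints the commands and does not execute them, the node names are supplied by the operator (master first), and both function names are hypothetical. The article does not specify a node order for step 4, so the sketch simply reuses the argument order. The wait_until_stopped helper polls whatever check command it is given. Pairing it with `service cb-rabbitmq status` assumes that the init script returns a non-zero status once the service is stopped, which should be verified on the target system.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Cb Response).

# print_recovery_plan MASTER [MINION...]
# Prints the recovery sequence, one "[node] command" line per action.
# Nothing is executed; the output is a checklist for the operator.
print_recovery_plan() {
    if [ "$#" -lt 1 ]; then
        echo "usage: print_recovery_plan MASTER [MINION...]" >&2
        return 1
    fi
    master="$1"

    # Step 1: stop the cluster from the master.
    echo "[$master] /usr/share/cb/cbcluster stop"

    # Step 2: kill all cb-owned processes, master first.
    for node in "$@"; do
        echo "[$node] killall -KILL -u cb"
    done

    # Step 3: start cb-rabbitmq on every node, master first.
    for node in "$@"; do
        echo "[$node] service cb-rabbitmq start"
    done

    # Step 4: stop cb-rabbitmq one node at a time (order is an assumption).
    for node in "$@"; do
        echo "[$node] service cb-rabbitmq stop (wait for a full stop before the next node)"
    done

    # Step 5: start the cluster from the master.
    echo "[$master] /usr/share/cb/cbcluster start"
}

# wait_until_stopped TIMEOUT_SECONDS CHECK_CMD [ARGS...]
# Runs CHECK_CMD about once per second. Returns 0 as soon as CHECK_CMD
# fails (taken to mean "no longer running"), or 1 if it still succeeds
# after TIMEOUT_SECONDS.
wait_until_stopped() {
    timeout="$1"
    shift
    elapsed=0
    while "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `print_recovery_plan cb-master cb-minion1 cb-minion2` (hypothetical host names) lists the commands in order, and `wait_until_stopped 120 service cb-rabbitmq status` could be run on a node after step 4 before moving on to the next one.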

Note: If RabbitMQ still won't start, follow this guide next: The Mnesia Fix - Resetting RabbitMQ

Creation Date: 08-28-2015