Environment
- EDR Server: All Supported Versions
Symptoms
- Datagrid starts on the minions, but the SupervisorD service does not start on the master. Datagrid logs on the minions may report that PGSQL has not been started on the master.
- No error messages are logged on the master node, other than an attempted service start recorded in /var/log/messages.
- SupervisorD logs do not show an attempted service start, but services can be started individually using the /usr/share/cb/cbservice utility.
- Note: VMware Carbon Black Support has not encountered this issue often. If the file count described under Cause is not high (1000 or fewer files/directories), startup troubleshooting should typically begin with the services that are failing to start on their respective node.
Cause
- An unexpectedly large number of files ('unexpected' in the sense that a service crash may have created additional files) is being scanned for permissions on cb-enterprise startup. Until these files are scanned, the services will not start on the master node. After a full scan and startup, the Mnesia directory is reset, but if thousands of files exist the scan may take longer than expected, causing the services to time out and the cluster to fail to start.
- In this case, the /var/cb/data/rabbitmq/mnesia directory can be the cause. To confirm whether this issue applies, check the number of files in the /var/cb/data/rabbitmq/mnesia directory via:
ls -l /var/cb/data/rabbitmq/mnesia | wc -l
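The check above can also be scripted. The following is a minimal sketch, not an official Carbon Black tool: the `count_entries` helper and the `MNESIA_DIR` and `THRESHOLD` variables are hypothetical names introduced here, and the threshold of 1000 is taken from the guidance in this article. Because `ls -l` prints a leading "total" line, the count reported here is one lower than the `ls -l | wc -l` result.

```shell
#!/bin/sh
# Hypothetical helper: count the entries directly inside a directory
# (files and subdirectories, including hidden ones, without recursing).
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Assumed defaults; override via the environment if needed.
MNESIA_DIR="${MNESIA_DIR:-/var/cb/data/rabbitmq/mnesia}"
THRESHOLD="${THRESHOLD:-1000}"   # "high" file count per this article

if [ -d "$MNESIA_DIR" ]; then
    count=$(count_entries "$MNESIA_DIR")
    if [ "$count" -gt "$THRESHOLD" ]; then
        echo "$count entries in $MNESIA_DIR: high, a Mnesia reset may apply"
    else
        echo "$count entries in $MNESIA_DIR: not high, check the failing services instead"
    fi
else
    echo "$MNESIA_DIR not found"
fi
```

Run it as root on the master node so that the directory is readable. The script only reads the directory and does not modify it.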
Resolution
- If the number of files/directories in the Mnesia directory is high (more than 1000 files/directories), proceed to the article referenced in the next step. If the number is not high, this KB is most likely not the solution; see the "Related Content" below.
- Reset the Mnesia directory by backing it up and then removing it, as described in this article: EDR: How to reset Mnesia for RabbitMQ.
Related Content