Environment
- EDR Server: All Versions
- CentOS/RHEL: Version 7.x
Symptoms
- cb-enterprise services do not fully start due to a systemd timeout
- 'Result: timeout' error seen when running: systemctl status cb-enterprise
Cause
- This article has historically applied to:
- Instances with limited resources.
- Instances that start slowly or load multiple services at startup, which may experience a timeout.
- Symptoms:
- The cb-enterprise services take longer to start than the default systemd timeout.
- When the timeout is reached, the services stop initializing.
- This can be verified from the startup time of the cb-enterprise services in the /var/log/messages log. Anything over the default 5 minutes will result in a timeout.
- Notice:
- There is no harm in extending this timeout, but doing so may mask another underlying condition. Extremely long startups are not typically expected. If the services take more than 10 minutes to fully start, check the individual services that are failing to start or contact VMware Carbon Black Support.
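As a rough aid for the log check described above, the sketch below computes the number of seconds between two timestamps copied by hand from /var/log/messages. The helper name elapsed_seconds is hypothetical, the sample timestamps are illustrative only, and the exact text of the cb-enterprise startup lines varies, so the start and end lines have to be identified manually.

```shell
# Print the number of seconds between two timestamps.
# Requires GNU date, which is the default on CentOS/RHEL.
elapsed_seconds() {
    local start end
    start=$(date -d "$1" +%s) || return 1
    end=$(date -d "$2" +%s) || return 1
    echo $(( end - start ))
}

# Find candidate lines first, for example:
#   grep -i cb-enterprise /var/log/messages | tail -n 50
# Then pass the first and last startup timestamps (illustrative values):
#   elapsed_seconds "Jan 10 10:00:00" "Jan 10 10:06:30"
```

A result above 300 seconds means the startup exceeded the default 5 minute timeout.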
Resolution
- Check the DNS response. A bad or slow DNS entry can cause services to time out.
- Get the DNS entries on the server
cat /etc/resolv.conf
- Use nslookup to resolve localhost against each DNS server listed in resolv.conf, as shown below (8.8.8.8 is an example DNS server). If even one server fails to respond or responds slowly, it can cause timeouts.
# nslookup
>server <dns address 1>
>localhost
>server 8.8.8.8
>localhost
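The interactive session above can also be run non-interactively. The following is a minimal sketch, assuming bash and the standard nslookup client; the function names list_nameservers and check_nameservers are hypothetical helpers, not part of EDR.

```shell
# Print every nameserver address listed in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no file is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Time a localhost lookup against each nameserver in turn.
check_nameservers() {
    local ns
    for ns in $(list_nameservers "$@"); do
        echo "== $ns"
        time nslookup -timeout=5 localhost "$ns"
    done
}

# Usage:
#   check_nameservers
```

Any server that fails to answer, or whose lookup takes several seconds, is a candidate cause of the startup timeout.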
- If a proxy is configured, confirm that localhost and 127.0.0.1 are included in the no_proxy environment variable. If they are not, add them.
export | grep -i proxy
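To check the no_proxy value directly rather than reading the export output by eye, a small helper along these lines could be used. This is a sketch assuming bash and a comma-separated no_proxy value; the name missing_no_proxy_entries is hypothetical.

```shell
# Print whichever of "localhost" and "127.0.0.1" are absent from a
# comma-separated no_proxy value. Prints nothing if both are present.
missing_no_proxy_entries() {
    local value=",${1// /}," entry   # strip spaces, pad with commas
    for entry in localhost 127.0.0.1; do
        case "$value" in
            *",$entry,"*) ;;          # entry is present
            *) echo "$entry" ;;       # entry is missing
        esac
    done
}

# Usage:
#   missing_no_proxy_entries "${no_proxy:-$NO_PROXY}"
```

Any entry printed would then need to be added to no_proxy wherever the proxy variables are defined for the EDR services.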
- If the services timed out at any point and entered a failed state, systemctl may need to be told to clear the failed state before the services can start
- Stop the services again
- Check for any processes still running as the cb user and kill them
ps -aef | grep cb
- Once all processes are cleared, run the following command to reset the failed state
systemctl reset-failed
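Because grep cb matches any line containing "cb" (including the grep process itself), filtering on the owning user can give a cleaner list. The sketch below assumes the procps ps shipped with CentOS/RHEL; the helper name owned_by is hypothetical.

```shell
# Read `ps -eo user,pid,args` output on stdin and keep only the rows
# whose owner matches the given user (default: cb). Skips the header row.
owned_by() {
    awk -v u="${1:-cb}" 'NR > 1 && $1 == u'
}

# Usage:
#   ps -eo user,pid,args | owned_by cb
# Once the list is empty, clear the failed state for just this unit:
#   systemctl reset-failed cb-enterprise.service
```

Passing the unit name to reset-failed limits the reset to cb-enterprise; without an argument, the failed state of all units is reset.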
- If the above items do not correct the issue, the next step is to extend the timeout using a unit file:
- Create a cb-enterprise unit file by running the following command:
systemctl edit --full cb-enterprise.service
- Increase the following parameter in the file:
TimeoutSec=10min
- Save the file and exit the editor
- Reload the systemd units
systemctl daemon-reload
- To enable start at boot:
- Add the following lines to the bottom of the /etc/systemd/system/cb-enterprise.service file:
[Install]
WantedBy=multi-user.target
- Save the file.
- Run the following command from the terminal:
systemctl enable cb-enterprise.service
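After editing the unit file, it can be worth confirming that the new timeout and the boot setting took effect. The following is a minimal sketch; unit_timeout is a hypothetical helper that only reads the file on disk, while the systemctl commands in the comments report what systemd has actually loaded.

```shell
# Print the TimeoutSec value set in a unit file, if any.
unit_timeout() {
    sed -n 's/^[[:space:]]*TimeoutSec[[:space:]]*=[[:space:]]*//p' "$1"
}

# Usage:
#   unit_timeout /etc/systemd/system/cb-enterprise.service
# Values as loaded by systemd (run after systemctl daemon-reload):
#   systemctl show -p TimeoutStartUSec cb-enterprise.service
#   systemctl is-enabled cb-enterprise.service
```

If the file shows the new value but systemctl show does not, the daemon-reload step was most likely missed.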
Additional Notes
- If the problem persists, disable gevent resolution by adding the following variable to cb.conf on all EDR servers, then restart cb-enterprise or the cluster. Consider notifying Customer Support if this option was needed.
UseGeventAresResolver=False