Cluster Backlog Growing Due to Redis Communication Issue

Version

5.2.x

Issue

The cluster has a growing backlog. Sensors are checking in but not submitting data.

Symptoms

/var/log/cb/enterprise/enterprise.log:

2016-03-01 23:31:01 [14244] <err>  cb.enterprise.tasks.throttle_calc - Error updating sensor data throttle.  Trying again in 10 seconds.

Traceback (most recent call last):

  File "/usr/lib/python2.6/site-packages/cb/enterprise/tasks/throttle_calc.py", line 45, in work

  File "/usr/lib/python2.6/site-packages/cb/core/throttle/throttler.py", line 28, in update

  File "/usr/lib/python2.6/site-packages/cb/core/throttle/throttle_rates.py", line 22, in update

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/rate.py", line 24, in update

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/rate.py", line 45, in update

  File "/usr/lib/python2.6/site-packages/cb/core/throttle/pressured_rate.py", line 22, in _calc_rate

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/rate_pressure.py", line 33, in update

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/pressures/datastore_cache_pressure.py", line 34, in _update

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/pressures/datastore_cache_pressure.py", line 53, in _get_stats

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/stats/query.py", line 43, in collect_stats

  File "/usr/lib/python2.6/site-packages/cb/core/stats/query_source.py", line 114, in _collect_stats

  File "/usr/lib/python2.6/site-packages/cb/core/stats/query_source.py", line 117, in _query_redis

  File "/usr/lib/python2.6/site-packages/redis/client.py", line 1387, in hgetall

    return self.execute_command('HGETALL', name)

  File "/usr/lib/python2.6/site-packages/redis/client.py", line 397, in execute_command

    connection.send_command(*args)

  File "/usr/lib/python2.6/site-packages/redis/connection.py", line 306, in send_command

    self.send_packed_command(self.pack_command(...

2016-03-01 23:31:01 [14244] <err>  cb.enterprise.tasks.throttle_calc -

                                            ...*args))

  File "/usr/lib/python2.6/site-packages/redis/connection.py", line 288, in send_packed_command

    self.connect()

  File "/usr/lib/python2.6/site-packages/redis/connection.py", line 235, in connect

    raise ConnectionError(self._error_message(e))

ConnectionError: Error 111 connecting 10.107.74.45:6379. Connection refused.

2016-03-01 23:32:11 [14244] <err>  cb.enterprise.tasks.throttle_calc - Error updating sensor data throttle.  Trying again in 10 seconds.

Traceback (most recent call last):

  File "/usr/lib/python2.6/site-packages/cb/enterprise/tasks/throttle_calc.py", line 45, in work

  File "/usr/lib/python2.6/site-packages/cb/core/throttle/throttler.py", line 28, in update

  File "/usr/lib/python2.6/site-packages/cb/core/throttle/throttle_rates.py", line 22, in update

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/rate.py", line 24, in update

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/rate.py", line 45, in update

  File "/usr/lib/python2.6/site-packages/cb/core/throttle/pressured_rate.py", line 22, in _calc_rate

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/rate_pressure.py", line 33, in update

  File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__

    self.gen.throw(type, value, traceback)

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/rate_pressure.py", line 63, in _all_stop_context

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/rate_pressure.py", line 33, in update

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/pressures/datastore_cache_pressure.py", line 34, in _update

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/throttle/pressures/datastore_cache_pressure.py", line 53, in _get_stats

  File "/opt/jenkins/builds/workspace/build-cbent-release-5.1.1/code/coreservices/src/cb/core/stats/query.py", line 43, in collect_stats

  File "/usr/lib/python2.6/site-packages/cb/core/stats/query_source.py", line 114, in _collect_stats

  File "/usr/lib/python2.6/site-packages/cb/core/stats/query_sources/datastore_solr_client_stats.py", line 8, in next_time_interval

  File "/usr/lib/python2.6/site-packages/cb/core/stats/query_source.py", line 41, in next_time_interval

KeyError: 'num_bytes_pushed'
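
To check whether a server's log shows the signatures above, the log lines can be counted with a short script. This is a minimal sketch rather than a vendor tool: it assumes Python 3, the function name `count_throttle_errors` is hypothetical, and the patterns are taken only from the log excerpt in this article, so they may not match other versions.

```python
import re

# Patterns copied from the log excerpt in this article (assumed, not official).
THROTTLE_RETRY = re.compile(r"Error updating sensor data throttle")
REDIS_REFUSED = re.compile(r"ConnectionError: Error 111 connecting \S+:6379\. Connection refused")
MISSING_KEY = re.compile(r"KeyError: 'num_bytes_pushed'")


def count_throttle_errors(lines):
    """Count how often each symptom appears in an iterable of log lines."""
    counts = {"throttle_retry": 0, "redis_refused": 0, "missing_key": 0}
    for line in lines:
        if THROTTLE_RETRY.search(line):
            counts["throttle_retry"] += 1
        if REDIS_REFUSED.search(line):
            counts["redis_refused"] += 1
        if MISSING_KEY.search(line):
            counts["missing_key"] += 1
    return counts
```

Passing an open file handle for /var/log/cb/enterprise/enterprise.log returns the three counts. A non-zero `missing_key` count after the Redis errors have stopped would be consistent with the behavior described in this article.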

Cause

The sensor eventlog throttle may end up in a permanently bad state if the Redis connection is lost and then restored.

Solution

Workaround

Restart services:

  1. Standalone server:
    service cb-enterprise restart
  2. Cluster:
    /usr/share/cb/cbcluster stop
    /usr/share/cb/cbcluster start
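
Before restarting, it may be worth confirming that the Redis port accepts connections again, since a restart while Redis is still refusing connections would presumably just reproduce the first error. The following is a minimal sketch, assuming Python 3 on a host that can reach Redis. `redis_port_open` is a hypothetical helper, not part of the product, and it only tests TCP reachability; it does not speak the Redis protocol.

```python
import socket


def redis_port_open(host, port=6379, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused (errno 111 on Linux), timeouts, and DNS failures.
        return False
```

For example, `redis_port_open("10.107.74.45")` would test the address that appears in the log excerpt above; substitute the Redis host for your own environment.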

Resolution

This issue is resolved in 5.3.1 and 6.x. See the 5.3.1 release notes for more information on CB-8472.

Article Information
Creation Date: 04-13-2017