
CB Protection Windows 10 Upgrade Performance

We've heard from many customers that Windows 10 updates take longer to complete when the CB Protection agent is enabled. Additionally, the amount of extra time varies from customer to customer.

While we expect any security product to incur some sort of performance impact, we want that impact to be minimal. We have invested significant time and resources into investigating the issue and we want to share what we’ve learned so far. 

 

Define Problem, Set Scope, & Set Goal

At the outset of this problem we had an early realization: “Windows Update” is a non-specific term. It could encompass any number of software upgrades that have nothing to do with the core operating system (e.g. Patch Tuesday upgrading non-OS Microsoft software), or it could be a major Windows upgrade (e.g. Windows 10 Redstone 1 -> Redstone 2). It also doesn’t describe how the upgrade was performed: Is the customer’s network going to slow the upgrade down? What process will they use to approve the files in the upgrade? What types of rules exist in the customer’s environment?

These variables affect the outcome significantly, and some of these scenarios are not possible for us to reproduce in testing (e.g. it’s not realistic for us to test every upgrade scenario with every rule set that each individual customer has). Therefore, we needed to limit the scope of the investigation. We defined two axioms:

  1. Our solution must be repeatable; anything we do must be something that any customer could do with minimal environmental impacts.
  2. Our solution must be realistic: CBP agent settings must be something that a customer could use in a realistic setting. Running the upgrade with CBP in disabled mode (no enforcement) doesn’t help us determine what customers are seeing.

 

The Goal

We needed to distil the essence of the problem that our customers were experiencing. Based on the feedback from those directly engaged with our customers (Support, Sustaining Engineers, Customer Success), the message was this: Performance of Windows 10 Upgrades has been declining with each new release of the agent since 7.x. 

Our goal was to understand where our performance went and return the agent to 7.x performance levels.

 

Plan the Experiments

To accommodate axiom 1, we decided that our tests would only track core Windows OS upgrades (upgrades like Redstone 1 -> Redstone 2). This would limit some of the environment differences that could arise if we were testing Microsoft updates (updates for any Microsoft product installed on a machine, which could differ across environments). We also removed the network from the equation. We acknowledge that it’s not realistic for all customers to have the upgrade package fully downloaded before starting the upgrade, and that network traffic could play a part in some customers’ pain. Unfortunately, for the sake of repeatability we opted to remove this.

In evaluating axiom 1, we also needed to answer the question: when does an upgrade start and end? This is a hard question for a number of reasons: Is an upgrade with a reboot complete when the computer allows you to log in, or after you’ve logged in? What if the Windows upgrade service is still running in the background; do you count that time or not? If the CB Protection agent is still analyzing new files, is the upgrade still going or not? Does the upgrade start when the “Go” button is clicked or when the user is logged off?

To answer these questions, we considered how customers use their computers: when they hit “Go”, the upgrade starts, and when they’ve completed the login afterwards, it’s likely done. We used this as a starting guide, but we found that after login the machine can still be upgrading in the background, the CBP agent is often still working hard, and the performance of the machine is still in question. To find a point where the upgrade was considered ‘done’ from a Windows point of view, we instead moved the test endpoint to when the Windows upgrade service ‘trustedinstaller.exe’ had terminated. At that point, we could be confident that the upgrade was done and the system was starting to make its way towards a steady state.
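One way to picture that endpoint is as a polling loop. The sketch below is a minimal illustration in Python, not necessarily the harness used for these tests. The function names `trustedinstaller_running`, `wait_for_exit`, and `upgrade_duration` are hypothetical, the process check assumes the Windows `tasklist` command is available, and the start time is assumed to have been recorded when “Go” was clicked and kept somewhere that survives a reboot.

```python
import subprocess
import time


def trustedinstaller_running() -> bool:
    """Return True if tasklist reports a TrustedInstaller.exe process (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq TrustedInstaller.exe", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=False,
    )
    # When nothing matches, tasklist prints an INFO message without the image name.
    return "trustedinstaller.exe" in result.stdout.lower()


def wait_for_exit(is_running, poll_seconds=5.0, clock=time.time, sleep=time.sleep):
    """Poll until is_running() returns False, then return the clock reading."""
    while is_running():
        sleep(poll_seconds)
    return clock()


def upgrade_duration(start_time, is_running=trustedinstaller_running, **kwargs):
    """Seconds from the recorded start time until the upgrade service has exited."""
    return wait_for_exit(is_running, **kwargs) - start_time
```

The process check is passed in as a parameter so the loop can be exercised without a Windows machine; on a real test system, `upgrade_duration(start_time)` would be called after the post-upgrade login.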

Evaluating axiom 2 was a lot easier than axiom 1. We needed to make sure that the agent was running in a mode that still enforced the rules in a way that would mimic the behavior of customer systems. For this, we chose a visibility policy that does all the same work as a Medium or High enforcement policy, but without the risk of blocking and complicating the tests. The last thing we needed to determine was the set of rules for CBP. We needed substantive rules so that we weren’t just running with the default rules and producing unrealistic results. At first, we reached out to a handful of customers to try to mimic their rulesets, but that effort wasn’t as productive as we had hoped: it turns out customers are understandably uncomfortable sharing their rules. Instead, we used a copy of the rules that we use internally at Carbon Black. This got us to a place where we were comfortable starting large-scale testing.

 

Running the Tests to Collect Data

Running the tests consisted of repeatedly installing the CBP agent, allowing the agent to finish initialization and reach a steady state, then beginning the Windows upgrade. We divided one test set into a number of tests. Each test was a set of trials with the same CBP agent version and OS upgrade path. The only difference across tests was the CBP agent version. The following illustrates an example test set.

 

| | Test A | Test B |
|---|---|---|
| Agent Version | 8.1.4.173 | 8.1.6.130 |
| OS Start | Win 10 RS1 | Win 10 RS1 |
| OS End | Win 10 RS5 | Win 10 RS5 |
| Trial Count | 3 | 3 |
| Trial 1 | Duration in Minutes and Seconds | Duration in Minutes and Seconds |
| Trial 2 | Duration in Minutes and Seconds | Duration in Minutes and Seconds |
| Trial 3 | Duration in Minutes and Seconds | Duration in Minutes and Seconds |
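The layout of a test set (several tests, each with a fixed agent version and upgrade path, each run for a number of trials) could be modeled roughly as follows. This is only a sketch in Python: the class `UpgradeTest` and the helper `build_test_set` are hypothetical names introduced for illustration, and the values are the ones from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UpgradeTest:
    """One test: a fixed agent version and OS upgrade path, run for several trials."""

    agent_version: str
    os_start: str
    os_end: str
    trial_count: int
    durations_s: List[float] = field(default_factory=list)  # one entry per finished trial

    def record(self, seconds: float) -> None:
        """Store the duration of one completed trial."""
        if len(self.durations_s) >= self.trial_count:
            raise ValueError("all trials for this test are already recorded")
        self.durations_s.append(seconds)

    @property
    def complete(self) -> bool:
        return len(self.durations_s) == self.trial_count


def build_test_set(agent_versions, os_start, os_end, trial_count):
    """Create one test per agent version; everything else is held constant."""
    return [UpgradeTest(v, os_start, os_end, trial_count) for v in agent_versions]


example_set = build_test_set(["8.1.4.173", "8.1.6.130"], "Win 10 RS1", "Win 10 RS5", 3)
```

Holding the upgrade path and trial count constant inside `build_test_set` mirrors the idea that the agent version is the only thing that differs from one test to the next.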

 

While preparing the system, we tested with a narrow selection of agents to keep the runtime down. However, as we transitioned to official tests, we ran the following versions of CBP:

  • None*
  • 7.2.3.3703 (Patch 6) 
  • 8.0.0.2621 (Patch 7)
  • 8.1.0.3546 (Patch 2)
  • 8.1.5.5 
  • 8.1.6.132**

* None is the term used whenever a CBP agent was not installed. This was the control.
** 8.1.6.132 is an experimental build that was used to attempt to remedy some of the problems found in early runs. 

During our system preparation, we found a slight variance on the system: if you ran the same test multiple times, the upgrade time would vary by +/- 15 seconds (a 30-second total spread). To account for that variance and still get viable data, we kept raising the trial count until the averages stopped behaving as though they were weighted by a single outlier and began to fit the data better. This happened at around 10 trials for each test.
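The idea of raising the trial count until the average settles can be illustrated with a running mean. The sketch below is one possible stopping rule, not the criterion actually used for these tests: the function names, the `tolerance_s` threshold, and the `window` size are all assumptions chosen for illustration.

```python
from statistics import mean


def running_means(durations):
    """Mean of the first k durations, for k = 1..len(durations)."""
    return [mean(durations[:k]) for k in range(1, len(durations) + 1)]


def trials_until_stable(durations, tolerance_s=2.0, window=3):
    """Return the smallest trial count at which the running mean has moved by
    less than tolerance_s for `window` consecutive added trials, or None if
    the running mean never settles within the data provided."""
    means = running_means(durations)
    calm = 0  # consecutive trials whose addition barely moved the mean
    for k in range(1, len(means)):
        if abs(means[k] - means[k - 1]) < tolerance_s:
            calm += 1
            if calm >= window:
                return k + 1  # means[k] is the mean of the first k + 1 trials
        else:
            calm = 0
    return None
```

With only a few trials, a single slow run shifts the mean noticeably; as more trials are added, each new run moves it less, which is the behavior the rule above looks for.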

 

What We Found

To avoid dumping data, this document will only show the average trial information. Below is a table representing all of the average upgrade times for each agent version upgrading from RS1 to RS5. 

| Agent Version | Average Upgrade Time (H:MM:SS.ffffff) |
|---|---|
| None | 0:23:29.110000 |
| 7.2.3.3703 (Patch 6) | 0:31:25.395000 |
| 8.0.0.2621 (Patch 7) | 0:31:25.207778 |
| 8.1.0.3546 (Patch 2) | 0:32:02.960000 |
| 8.1.5.5 | 0:32:43.007000 |
| 8.1.6.132 | 0:31:34.340000 |

 

This data gave us a handful of takeaways:

Agent Impacts Performance

The agent has always had an impact on Windows upgrades. No agent version we tested came in below a 33% performance hit (7.2.3 being the smallest).
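The size of the hit can be checked directly from the averages reported above. The following is a small Python sketch that converts each average to seconds and expresses it as a percentage increase over the no-agent control; the helper names `to_seconds` and `slowdown_vs_control` are hypothetical, and the timings are copied from the averages in this article.

```python
def to_seconds(stamp: str) -> float:
    """Convert an H:MM:SS.ffffff string to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Average RS1 -> RS5 upgrade times as reported in this article.
AVERAGES = {
    "None": "0:23:29.110000",
    "7.2.3.3703": "0:31:25.395000",
    "8.0.0.2621": "0:31:25.207778",
    "8.1.0.3546": "0:32:02.960000",
    "8.1.5.5": "0:32:43.007000",
    "8.1.6.132": "0:31:34.340000",
}


def slowdown_vs_control(averages, control="None"):
    """Percentage increase in upgrade time relative to the control run."""
    base = to_seconds(averages[control])
    return {
        version: (to_seconds(stamp) / base - 1) * 100
        for version, stamp in averages.items()
        if version != control
    }


if __name__ == "__main__":
    for version, pct in slowdown_vs_control(AVERAGES).items():
        print(f"{version}: {pct:.1f}% longer than the no-agent control")
```

Every agent version in the table comes out above 33% relative to the control, which is consistent with the statement above.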

8.1.X Agent Slower 

The 8.1.X series was approaching 10% slower than 7.2.3. Customers who pointed out that upgrades were faster with earlier agents were correct.

8.1.6 Agent Better

8.1.6 showed us that it wasn’t because of quality improvements. While developing the process to do full-scale, repeatable Windows 10 upgrades, we found a number of small bugs whose fixes, on their own, would each have yielded a negligible performance improvement. When the totality of 8.1.6 was examined and all these bugs were fixed, we started to see an impact. It’s important to point this out because performance problems often shape up to be death by 1000 paper cuts. A small change here, a small change there, eventually created a 1% slowdown.

Continued Focus

While we believe we’ve made good strides to move the needle on performance, we’re aware we can still improve. 

 

CBP Future Goals and Plans

Some of our customers may be saying to themselves “Hey, I wish it was only a 33% slowdown because I’m seeing a 100% increase in the amount of time it takes to complete a Windows upgrade.”

We will shortly be starting an Early Access Program for the 8.1.6 Windows agent. We’re hoping to get customers who have been experiencing upgrade performance problems to take part in this program and provide us with detailed results of testing the agent.

We don’t anticipate that the changes we’ve made will resolve the issue for all customers, but we do hope that it starts helping everyone move in the right direction.

Our next steps include:

  • Modify our axioms as we get feedback from 8.1.6 to expand the scope of our experiments
  • Continue our research and get a better understanding of how Windows 10 performs different types of upgrades
  • Continue analysis of Windows 10 upgrade data to isolate non-performant areas within the agent
  • Perform more regular Windows 10 upgrade testing to catch performance changes much earlier in the development process

If you would like to take part in testing the 8.1.6 agent, can deploy to a non-production environment, and have the time to capture and provide feedback, we’d like to hear from you. Please contact lhowarth@carbonblack.com for consideration.


Thank you,

Larry

Comments

@lhowarth  this is good stuff thank you!

Do you expect any of this benefit to also apply to Windows Server 2016?

It's great that you guys are looking at this. I'm a little worried that your testing results don't reflect the scale of the impact that your customers are seeing, though. In our testing last year of monthly patches against Windows Server 2016 we saw an impact of 300%, not 33% (from 22 minutes without Bit9 to 71 minutes with Bit9).

I would argue that most of us aren't as concerned with the cost of a single, every 2 year upgrade.  Instead, we're concerned about the 24 monthly patches that will happen in between.

@flakshack 

"I'm a little worried that your testing results don't reflect the scale of the impact that your customers are seeing, though. In our testing last year of monthly patches against Windows Server 2016 we saw an impact of 300%, not 33% (from 22 minutes without Bit9 to 71 minutes with Bit9)."

We know that we have customers who have been reporting different results. We were not able to reproduce those results internally.

We also know that the changes in 8.1.6 were just the beginning for us. Since the 8.1.6 code freeze we've identified other performance improvements that will make it into the 8.1.8 agent. While I'm hopeful that the 8.1.6 changes will bring performance improvements to impacted customers, I'm committed to making sure we get those numbers back to a reasonable place for all customers.

I'm hoping customers like yourself would be willing to join our Early Access Program (EAP) so we can work directly with you in the future.

Thank you,

 

Larry

Article Information
Creation Date: 08-01-2019