This incident was caused by traffic to our load balancers exceeding their configured maximum number of connections. The growth in traffic itself was steady and expected, and we had anticipated and planned for the higher connection count. However, one very specific configuration value in the load balancing tier was incorrect. It prevented connections from rising above the misconfigured threshold, even though there was ample CPU and network capacity available to handle the load.
As anticipated, the settings carried over unchanged from the previous OS to the upgraded OS. However, the upgraded version of the operating system handled those settings in subtly different ways, and this inadvertently capped the number of connections available across our load balancer fleet at a lower value.
As a result, we are now keenly aware of the many settings that have to be configured, and reconfigured, across the load balancing tier. The load balancer fleet is now more robust, and we have added tests to verify that these settings are correct.
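The failure mode described above comes down to a simple rule: the usable connection count is bounded by the smallest limit anywhere in the stack, not by the value set in the load balancer's own configuration. The sketch below illustrates one way such a check could work. It is a hypothetical example, not the tests actually deployed. The layer names, numbers, and the 1.5x headroom factor are all illustrative assumptions, and it assumes each limit has already been read from the running system (how those values are collected is left open), since the same setting can behave differently after an OS upgrade.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionLimit:
    # Hypothetical label for the layer imposing the limit (LB config, OS setting, ...).
    layer: str
    # Effective value observed on the running host, not the value in the config file.
    max_connections: int


def effective_ceiling(limits: list[ConnectionLimit]) -> ConnectionLimit:
    """Return the binding limit: the smallest max_connections across all layers."""
    if not limits:
        raise ValueError("at least one limit is required")
    return min(limits, key=lambda limit: limit.max_connections)


def layers_below_target(
    limits: list[ConnectionLimit], expected_peak: int, headroom: float = 1.5
) -> list[ConnectionLimit]:
    """Return every layer whose limit is below expected_peak * headroom."""
    required = expected_peak * headroom
    return [limit for limit in limits if limit.max_connections < required]


if __name__ == "__main__":
    # Illustrative numbers only: the LB config allows plenty of connections,
    # but an OS-level setting, as applied after the upgrade, is lower.
    limits = [
        ConnectionLimit("load balancer config", 200_000),
        ConnectionLimit("OS setting after upgrade", 65_536),
    ]
    print(effective_ceiling(limits).layer)
    print([limit.layer for limit in layers_below_target(limits, expected_peak=80_000)])
```

Running it prints `OS setting after upgrade` twice: once as the binding ceiling and once as the only layer flagged for having too little headroom. Checking every layer against the planned peak, rather than only the load balancer's own configuration, is what would catch a limit that silently changed underneath an unchanged setting.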