Analysis of a Google Satellite Machine Outage Post-Mortem
1. Incident Overview
On August 11, 2014, from 17:10 to 17:50 PST8PDT, Google experienced a significant incident in which all satellite machines were sent to diskerase. The incident had far-reaching impacts on frontend services.
- Impact:
  - Frontend queries dropped.
  - Some ads were not served.
  - Latency increased for nearly two days for all services normally served from satellites.
- Root Cause: A long-standing input validation bug in the Traffic Admin server was triggered by the manual re-execution of a workflow to decommission the a12bcd34 satellite. This bug removed the machine constraint on the decom action, sending all satellite machines to diskerase.
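
The failure mode in the root cause (a missing constraint read as "no restriction") can be illustrated with a minimal sketch. The Traffic Admin server's code is not public, so everything here is an assumption made for illustration: the `DecomRequest` type, the `FLEET` mapping, both function names, and every satellite and machine name other than a12bcd34 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecomRequest:
    """Hypothetical decommission request; not Google's actual schema."""
    satellite: str = ""  # an empty string models a constraint lost upstream


# Hypothetical fleet: satellite name -> machines. Only a12bcd34 is from the post.
FLEET = {
    "a12bcd34": ["m1", "m2"],
    "e56fgh78": ["m3"],
    "i90jkl12": ["m4", "m5"],
}


def select_targets_unvalidated(req, fleet):
    """Buggy pattern: an empty constraint silently matches every machine."""
    return sorted(
        machine
        for satellite, machines in fleet.items()
        if not req.satellite or satellite == req.satellite
        for machine in machines
    )


def select_targets_validated(req, fleet):
    """Safer pattern: reject requests whose constraint is empty or unknown."""
    if not req.satellite:
        raise ValueError("decom request has no machine constraint")
    if req.satellite not in fleet:
        raise ValueError(f"unknown satellite: {req.satellite!r}")
    return sorted(fleet[req.satellite])
```

With the unvalidated selector, `DecomRequest()` selects all five machines, while `DecomRequest("a12bcd34")` selects only `m1` and `m2`. The validated selector raises `ValueError` for the empty request instead of widening the action to the whole fleet. Treating an absent filter as an error rather than a wildcard is one common way to guard destructive operations.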