Why Twilio wasn't affected by the AWS crash (http://www.twilio.com/engineering/2011/04/22/why-twilio-wasnt-affected-by-todays-aws-issues/).
There are some important lessons:
1, By building simple services composed of a single host, rather than multiple dependent hosts, one can create replicated instances that can survive host failures.
For example, if we had an application that consisted of business logic components A, B, and C, each of which had to live on a separate host, we could compose service groups (A, B, C), (A, B, C)…, or we could create component pools (A, A, …), (B, B, …), (C, C, …).
With the composition (A, B, C), a single machine failure would result in the loss of a whole system group. By decomposing resources into independent pools, a single host failure only results in the loss of a single host's worth of functionality.
// I think this is a good lesson for system design.
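To make the difference concrete, here is a minimal sketch that counts how many hosts stay usable after failures under each layout. The function names and the way failed hosts are identified are my own assumptions for illustration, not anything from the Twilio post.

```python
def usable_hosts_grouped(num_groups, components, failed):
    """Layout (A, B, C), (A, B, C), ...

    `failed` is a set of (group_index, component) pairs.
    A group is only usable if every host in it is up, so one
    failed host takes its whole group out of service.
    """
    broken_groups = {group for group, _ in failed}
    return (num_groups - len(broken_groups)) * len(components)


def usable_hosts_pooled(pool_size, components, failed):
    """Layout (A, A, ...), (B, B, ...), (C, C, ...)

    `failed` is a set of (component, host_index) pairs.
    Any A can work with any B and any C, so a failed host
    only removes itself from its pool.
    """
    return pool_size * len(components) - len(failed)


if __name__ == "__main__":
    components = ("A", "B", "C")
    # 9 hosts in total, and the first B host dies in both layouts.
    print(usable_hosts_grouped(3, components, {(0, "B")}))
    print(usable_hosts_pooled(3, components, {("B", 0)}))
```

With three groups of three hosts, losing one B host leaves 6 usable hosts in the grouped layout but 8 in the pooled layout.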
2, Short timeouts and quick retries
When running multiple redundant copies of a service, software should identify failures quickly and retry, routing around failed or slow instances.
So our strategy of avoiding slow locations follows this principle.
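A rough sketch of the idea, assuming each replica is just a callable that either answers within the timeout or raises. The helpers `make_replica` and `call_with_failover` and the 0.2 second default are hypothetical, and the replicas only simulate latency instead of doing real network calls.

```python
def make_replica(name, latency_s, up=True):
    """Build a fake replica (hypothetical helper for this sketch).

    The replica raises ConnectionError when it is down and
    TimeoutError when its simulated latency exceeds the timeout.
    """
    def replica(request, timeout_s):
        if not up:
            raise ConnectionError(f"{name} is down")
        if latency_s > timeout_s:
            raise TimeoutError(f"{name} took longer than {timeout_s}s")
        return f"{name} handled {request}"
    return replica


def call_with_failover(replicas, request, timeout_s=0.2):
    """Try each replica in turn with a short timeout.

    A failed or slow replica is skipped right away and the
    request is retried on the next redundant copy.
    """
    last_error = None
    for replica in replicas:
        try:
            return replica(request, timeout_s)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc  # remember why this copy was skipped
    raise RuntimeError("all replicas failed") from last_error


if __name__ == "__main__":
    replicas = [
        make_replica("r1", 0.0, up=False),  # dead host
        make_replica("r2", 5.0),            # far too slow
        make_replica("r3", 0.05),           # healthy
    ]
    print(call_with_failover(replicas, "ping"))
```

The short timeout is what keeps the total wait small: a slow copy costs at most `timeout_s` before the next one is tried.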
3, Relax consistency requirements
When strict consistency is not required, you can partition the reading and writing of data.
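One possible reading of this, sketched as a toy in-memory store: writes go to a primary, reads are served by replicas that may lag behind until replication runs. The class `ReplicatedStore` and its methods are assumptions for illustration only; the post does not describe how Twilio actually splits reads from writes.

```python
class ReplicatedStore:
    """Toy store that separates the write path from the read path."""

    def __init__(self, num_replicas=2):
        self.primary = {}                                   # write path
        self.replicas = [{} for _ in range(num_replicas)]   # read path
        self._pending = []   # writes not yet copied to the replicas
        self._next = 0       # round-robin counter for reads

    def write(self, key, value):
        """Writes only touch the primary and return immediately."""
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key):
        """Reads are spread over replicas and may return stale data."""
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)

    def replicate(self):
        """Copy pending writes to every replica (would run asynchronously)."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


if __name__ == "__main__":
    store = ReplicatedStore()
    store.write("status", "up")
    print(store.read("status"))  # replicas have not caught up yet
    store.replicate()
    print(store.read("status"))
```

The trade-off is visible in the usage: a read right after a write can miss the new value, which is only acceptable when strict consistency is not required.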
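This post shares the approaches that kept Twilio's system from being affected by the AWS failure: building simple services composed of a single host, using short timeouts with quick retries, and relaxing consistency requirements. These approaches help improve system stability and availability.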