Experiences in building a reliable service

Latest recommended article published 2021-02-22 15:19:49


License: CC 4.0 BY-SA


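This post shares lessons from Twilio's system design on avoiding the impact of AWS failures: building simple services composed of a single host, using short timeouts with quick retries, and relaxing consistency requirements. These approaches help improve system stability and availability.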

Notes on why Twilio wasn't affected by the AWS crash (http://www.twilio.com/engineering/2011/04/22/why-twilio-wasnt-affected-by-todays-aws-issues/).

There are some important lessons:

1, By building simple services composed of a single host, rather than multiple dependent hosts, one can create replicated instances that can survive host failures.

For example, if we had an application that consisted of business logic components A, B, and C, each of which had to live on a separate host, we could compose service groups (A, B, C), (A, B, C)… or we could create component pools (A, A, …), (B, B, …), (C, C, …). With the composition (A, B, C), a single machine failure would result in the loss of a whole system group. By decomposing resources into independent pools, a single host failure only results in the loss of a single host’s worth of functionality.

// I think this is a good lesson for system design.
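
The difference between service groups and component pools can be illustrated with a small sketch. This is only a toy model under assumed names (the host labels `a1`, `b1`, … and all four helper functions are hypothetical, not anything from Twilio's systems): a group works only if all of its hosts are alive, while a pool keeps serving as long as at least one host per component survives.

```python
# Toy model: two replicas of an application made of components A, B, C.
# All host names and helper functions here are illustrative assumptions.

groups = [("a1", "b1", "c1"), ("a2", "b2", "c2")]   # composition (A, B, C), (A, B, C)
pools = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]  # pools (A, A), (B, B), (C, C)


def usable_hosts_grouped(groups, failed):
    """Hosts still doing useful work when hosts are tied into fixed groups.

    A group with any failed member is lost entirely, so its healthy
    hosts are stranded as well.
    """
    return sum(len(g) for g in groups if not failed & set(g))


def usable_hosts_pooled(pools, failed):
    """Hosts still doing useful work when each component has its own pool."""
    return sum(1 for pool in pools for host in pool if host not in failed)


def can_serve_grouped(groups, failed):
    """True if at least one group is fully intact."""
    return any(not failed & set(g) for g in groups)


def can_serve_pooled(pools, failed):
    """True if every component pool has at least one live host."""
    return all(any(host not in failed for host in pool) for pool in pools)


if __name__ == "__main__":
    one_failure = {"b1"}
    print(usable_hosts_grouped(groups, one_failure))  # whole group (a1, b1, c1) is lost
    print(usable_hosts_pooled(pools, one_failure))    # only b1 is lost

    two_failures = {"b1", "c2"}
    print(can_serve_grouped(groups, two_failures))    # both groups are broken
    print(can_serve_pooled(pools, two_failures))      # a1/a2, b2, c1 can still serve
```

In this model a single failure costs the grouped layout three hosts but the pooled layout only one, and two failures in different components take the grouped layout down completely while the pooled layout keeps serving.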

2, Short timeouts and quick retries

By running multiple redundant copies of each service, software can quickly detect failures and retry, routing around failed or slow services.

So our own strategy of avoiding slow locations follows this principle.
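
A minimal sketch of the idea, assuming each replica is a callable that takes a request and a timeout and raises `TimeoutError` or `ConnectionError` on failure. The names `make_replica` and `call_with_failover` and the 0.2 s default are hypothetical; the replicas are simulated so no real network or sleeping is involved.

```python
def make_replica(name, latency_s, up=True):
    """Build a simulated replica (hypothetical stand-in for a real service call)."""
    def replica(request, timeout_s):
        if not up:
            raise ConnectionError(f"{name} is down")
        if latency_s > timeout_s:
            # A real client would give up after timeout_s; here we just model it.
            raise TimeoutError(f"{name} exceeded {timeout_s}s")
        return f"{name}:{request}"
    return replica


def call_with_failover(replicas, request, timeout_s=0.2):
    """Try each redundant replica in turn with a short timeout.

    A slow or failed replica is abandoned quickly and the request is
    retried on the next one, instead of waiting on a long timeout.
    """
    last_error = None
    for replica in replicas:
        try:
            return replica(request, timeout_s)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc  # remember why this replica was skipped
    raise RuntimeError("all replicas failed") from last_error


if __name__ == "__main__":
    replicas = [
        make_replica("slow", latency_s=5.0),
        make_replica("dead", latency_s=0.01, up=False),
        make_replica("healthy", latency_s=0.01),
    ]
    print(call_with_failover(replicas, "ping"))
```

The short timeout is what makes the retry cheap: with a multi-second timeout, the first slow replica alone would dominate the response time even though a healthy copy is available.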

3, Relax consistency requirements

When strict consistency is not required, you can partition the reading and writing of data.
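
One common way to partition reads and writes is to send writes to a primary and serve reads from replicas that are updated asynchronously. The sketch below is an in-memory toy under that assumption (the class `ReplicatedStore` and its methods are hypothetical, and the source does not say this is the specific scheme Twilio uses); it shows the trade-off: reads may briefly return stale data.

```python
class ReplicatedStore:
    """Toy key-value store: writes go to a primary, reads go to replicas.

    Replication is deferred until sync() is called, which models
    asynchronous (eventually consistent) replication.
    """

    def __init__(self, n_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self._pending = []  # writes not yet copied to the replicas
        self._next = 0      # round-robin cursor for reads

    def write(self, key, value):
        """Apply the write on the primary only and queue it for replication."""
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key):
        """Read from the next replica; the result may lag behind the primary."""
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)

    def sync(self):
        """Copy all queued writes to every replica, in write order."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


if __name__ == "__main__":
    store = ReplicatedStore()
    store.write("greeting", "hello")
    print(store.read("greeting"))  # replicas have not caught up yet
    store.sync()
    print(store.read("greeting"))  # replicas now reflect the write
```

Because readers never touch the primary, read capacity can grow by adding replicas, and losing one replica does not block writes. The price is that a read issued right after a write may not see it.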