Know Your Limits

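This article discusses how software engineers can understand and cope with resource limits, including time and space complexity and the performance characteristics of their systems. By comparing the efficiency of different search algorithms, it stresses the importance of using caches effectively.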

"Man's got to know his limitations." — Dirty Harry

Your resources are limited. You only have so much time and money to do your work, including the time and money needed to keep your knowledge, skills, and tools up-to-date. You can only work so hard, so fast, so smart, and so long. Your tools are only so powerful. Your target machines are only so powerful. So you have to respect the limits of your resources.

How to respect those limits? Know yourself, know your people, know your budgets, and know your stuff. Especially, as a software engineer, know the space and time complexity of your data structures and algorithms, and the architecture and performance characteristics of your systems. Your job is to create an optimal marriage of software and systems.

Space and time complexity are given as a function O(f(n)), where n is the size of the input and f(n) describes the asymptotic space or time required as n grows to infinity. Important complexity classes for f(n) include ln(n), n, n ln(n), n^e, and e^n. As graphing these functions clearly shows, as n gets bigger O(ln(n)) is ever so much smaller than O(n) and O(n ln(n)), which are ever so much smaller than O(n^e) and O(e^n). As Sean Parent puts it, for achievable n all complexity classes amount to near-constant, near-linear, or near-infinite.
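
To make the gap between these classes concrete, here is a minimal Python sketch that tabulates the five functions for a few input sizes. The function names and the sample sizes are illustrative choices, not part of the original text.

```python
import math


def growth_table(sizes):
    """Return one row (n, ln n, n ln n, n^e, e^n) per input size."""
    rows = []
    for n in sizes:
        rows.append((
            n,
            math.log(n),       # ln(n)
            n * math.log(n),   # n ln(n)
            n ** math.e,       # n^e (polynomial)
            math.exp(n),       # e^n (exponential); overflows a float above n = 709
        ))
    return rows


def format_growth_table(sizes):
    """Render the rows as an aligned plain-text table."""
    lines = [f"{'n':>6} {'ln(n)':>8} {'n ln(n)':>10} {'n^e':>12} {'e^n':>12}"]
    for n, ln_n, n_ln_n, n_pow_e, e_pow_n in growth_table(sizes):
        lines.append(
            f"{n:>6} {ln_n:>8.2f} {n_ln_n:>10.1f} {n_pow_e:>12.3g} {e_pow_n:>12.3g}"
        )
    return "\n".join(lines)
```

Calling print(format_growth_table([10, 100, 500])) shows ln(n) staying in single digits while e^n already exceeds anything a real machine could count to, which is the near-constant, near-linear, near-infinite split in miniature.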

              access time   capacity
register      < 1 ns        64 b
cache line                  64 B
L1 cache      1 ns          64 KB
L2 cache      4 ns          8 MB
RAM           20 ns         32 GB
disk          10 ms         10 TB
LAN           20 ms         > 1 PB
Internet      100 ms        > 1 ZB

Complexity analysis is in terms of an abstract machine, but software runs on real machines. Modern computer systems are organized as hierarchies of physical and virtual machines, including language runtimes, operating systems, CPUs, cache memory, random-access memory, disk drives, and networks. The first table shows the limits on random access time and storage capacity for a typical networked server.

Note that capacity and speed vary by several orders of magnitude. Caching and lookahead are used heavily at every level of our systems to hide this variation, but they only work when access is predictable. When cache misses are frequent, the system thrashes. For example, randomly inspecting every byte on a hard drive could take 32 years. Even randomly inspecting every byte in RAM could take 11 minutes. Random access is not predictable. What is? That depends on the system, but re-accessing recently used items and accessing items sequentially are usually a win.
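
The arithmetic behind such estimates is one multiplication: bytes to visit times the cost of one random access. The sketch below assumes every byte costs one full uncached access and uses decimal units (1 GB = 10^9 bytes). The helper names are hypothetical.

```python
GB = 10**9            # decimal gigabyte (an assumption; binary units give slightly larger results)
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def random_scan_seconds(capacity_bytes, access_ns):
    """Seconds to touch every byte once, paying one random access per byte."""
    return capacity_bytes * access_ns / NS_PER_SECOND


def seconds_to_years(seconds):
    """Convert a duration in seconds to years."""
    return seconds / SECONDS_PER_YEAR


# RAM row of the table: 32 GB at 20 ns per random access.
ram_seconds = random_scan_seconds(32 * GB, 20)
ram_minutes = ram_seconds / 60   # about 10.7, which rounds to the 11 minutes quoted
```

The same function shows how sensitive the disk figure is to capacity: at 10 ms (10,000,000 ns) per access, 100 GB works out to roughly 32 years, and 10 TB to roughly a hundred times that.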

Algorithms and data structures vary in how effectively they use caches. For instance:

  • Linear search makes good use of lookahead, but requires O(n) comparisons.
  • Binary search of a sorted array requires only O(log(n)) comparisons.
  • Search of a van Emde Boas tree is O(log(n)) and cache-oblivious.

Search time (ns)

array size   linear   binary   vEB
         8       50       90    40
        64      180      150    70
       512     1200      230   100
      4096    17000      320   160
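
For readers unfamiliar with the third method, the following sketch shows the memory order behind it. It assumes "van Emde Boas tree" here means a static, complete binary search tree stored in van Emde Boas layout, and it assumes the top subtree gets the smaller half of the height when the height is odd (implementations differ on this). Nodes are named by their breadth-first, 1-based index, and the function name is hypothetical.

```python
def veb_order(height, root=1):
    """Breadth-first indices of a complete binary tree, listed in vEB layout order.

    The tree is cut at about half its height: the top subtree is laid out
    first, then each bottom subtree in left-to-right order, recursively.
    """
    if height == 1:
        return [root]
    top_height = height // 2              # assumed split convention
    bottom_height = height - top_height
    order = veb_order(top_height, root)
    # Leaves of the top subtree sit (top_height - 1) levels below its root.
    first_leaf = root << (top_height - 1)
    leaf_count = 1 << (top_height - 1)
    for leaf in range(first_leaf, first_leaf + leaf_count):
        for bottom_root in (2 * leaf, 2 * leaf + 1):
            order.extend(veb_order(bottom_height, bottom_root))
    return order
```

For a tree of height 4, veb_order(4) places nodes 1, 2, 3 together, then 4, 8, 9, then 5, 10, 11, and so on. A root-to-leaf search therefore touches a few small contiguous groups instead of one distant location per level, whatever the cache line size happens to be.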

How to choose? In the last analysis, by measuring. The second table shows the time required to search arrays of 64-bit integers via these three methods. On my computer:

  • Linear search is competitive for small arrays, but loses exponentially for larger arrays.
  • van Emde Boas wins hands down, thanks to its predictable access pattern.
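
A measurement of this kind can be sketched in a few lines. This is not the author's benchmark: it is a minimal Python harness, with hypothetical function names, covering only linear and binary search over an array of 64-bit integers. Interpreter overhead dominates in Python, so the absolute numbers will not resemble those in the table. Only the trend as the array grows is meaningful.

```python
import bisect
import timeit
from array import array


def linear_search(sorted_values, target):
    """Scan left to right; return the index of target, or -1 if absent."""
    for index, value in enumerate(sorted_values):
        if value >= target:
            return index if value == target else -1
    return -1


def binary_search(sorted_values, target):
    """Halve the search range each step; return the index of target, or -1."""
    index = bisect.bisect_left(sorted_values, target)
    if index < len(sorted_values) and sorted_values[index] == target:
        return index
    return -1


def time_search(search, size, repeats=1000):
    """Average nanoseconds per successful search in a sorted array of `size`."""
    values = array("q", range(0, 2 * size, 2))   # 'q' = signed 64-bit integers
    # Scatter the targets across the array so lookups are not sequential.
    targets = [values[(i * 7919) % size] for i in range(repeats)]

    def run():
        for target in targets:
            search(values, target)

    return timeit.timeit(run, number=1) / repeats * 1e9
```

Running time_search(linear_search, n) and time_search(binary_search, n) for n in 8, 64, 512, and 4096 gives a rough local counterpart to the first two columns of the table.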

"You pays your money and you takes your choice." — Punch


By Greg Colvin

This work is licensed under a Creative Commons Attribution 3
