14 numbers every developer should know

Jeff Dean, a famous Google engineer, popularized a list of latency numbers everyone should know. The list is a great resource for designing large-scale infrastructure systems.

Algorithms and their complexity often show up in critical parts of computer systems, but I find that few engineers have a good sense of how an O(n!) algorithm compares to an O(n^5) one.
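To make that comparison concrete, here is a small Python sketch (the helper name `factorial_overtakes` is just for illustration) that finds the first n at which n! exceeds n^k. For k = 5 the factorial is still smaller at n = 7 but overtakes at n = 8, and by n = 11 it is roughly 250 times larger.

```python
from math import factorial


def factorial_overtakes(k: int) -> int:
    """Return the smallest n >= 1 such that n! > n**k."""
    n = 1
    while factorial(n) <= n ** k:
        n += 1
    return n


# Print both growth curves side by side for small n.
for n in range(5, 12):
    print(f"n={n:2d}  n!={factorial(n):>10,}  n^5={n ** 5:>8,}")

print("n! overtakes n^5 at n =", factorial_overtakes(5))
```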

In the coding contest world, competitors think about these tradeoffs all the time. No wonder there's a set of numbers every algorithm designer should know.

The table below shows the limits that can be reached in a few seconds by algorithms of different complexities, n being the input size. I've added some example algorithms and data structures for each complexity class.

| maximum n | complexity | algorithms | data structures |
|---|---|---|---|
| 1,000,000,000 and higher | log n, sqrt n | binary search, ternary search, fast exponentiation, Euclid's algorithm | |
| 10,000,000 | n, n log log n, n log* n | set intersection, Eratosthenes sieve, radix sort, KMP, topological sort, Euler tour, strongly connected components, 2-SAT | disjoint sets, tries, hash_map, rolling hash, deque |
| 1,000,000 | n log n | sorting, divide and conquer, sweep line, Kruskal, Dijkstra | segment trees, range trees, heaps, treaps, binary indexed trees, suffix arrays |
| 100,000 | n log^2 n | divide and conquer | 2d range trees |
| 50,000 | n^1.585, n sqrt n | Karatsuba, square root trick | two level tree |
| 1,000 - 10,000 | n^2 | largest empty rectangle, Dijkstra, Prim (on dense graphs) | |
| 300 - 500 | n^3 | all pairs shortest paths, largest sum submatrix, naive matrix multiplication, matrix chain multiplication, Gaussian elimination, network flow | |
| 30 - 50 | n^4, n^5, n^6 | | |
| 25 - 40 | 3^(n/2), 2^(n/2) | meet in the middle | hash tables (for set intersection) |
| 15 - 24 | 2^n | subset enumeration, brute force, dynamic programming with exponential states | |
| 15 - 20 | n^2 2^n | dynamic programming with exponential states | bitsets, hash_map |
| 13 - 17 | 3^n | dynamic programming with exponential states | hash_map (to store the states) |
| 11 | n! | brute force, backtracking, next_permutation | |
| 8 | n^n | brute force, cartesian product | |
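A quick way to see the pattern in the table: plug each row's upper n into its complexity and every row lands somewhere between roughly 10^6 and 10^9 basic steps. The Python sketch below picks one representative complexity per row (where a row lists several) and one n from each range; those choices are ours, and constant factors are ignored.

```python
import math

# (label, representative n from the row's range, complexity function)
ROWS = [
    ("n",         10_000_000, lambda n: n),
    ("n log n",    1_000_000, lambda n: n * math.log2(n)),
    ("n log^2 n",    100_000, lambda n: n * math.log2(n) ** 2),
    ("n sqrt n",      50_000, lambda n: n * math.sqrt(n)),
    ("n^2",           10_000, lambda n: n ** 2),
    ("n^3",              500, lambda n: n ** 3),
    ("n^4",               50, lambda n: n ** 4),
    ("2^(n/2)",           40, lambda n: 2 ** (n / 2)),
    ("2^n",               24, lambda n: 2 ** n),
    ("n^2 2^n",           20, lambda n: n ** 2 * 2 ** n),
    ("3^n",               17, lambda n: 3 ** n),
    ("n!",                11, math.factorial),
    ("n^n",                8, lambda n: n ** n),
]


def operation_counts():
    """Map each complexity label to f(n) evaluated at the chosen n."""
    return {label: f(n) for label, n, f in ROWS}


for label, ops in operation_counts().items():
    print(f"{label:>10}: {ops:.1e}")
```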

These numbers aren't very precise: they assume in-memory operations and varying constant factors. Still, they give a good starting point when searching for a solution that fits your problem and your data size.
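As an example of a row steering the search: deciding whether some subset of about 40 numbers sums to a target is out of reach with plain 2^n enumeration (2^40 is about 10^12), but the 2^(n/2) row points to meet in the middle. The Python below is a minimal sketch of that idea, with illustrative function names: it enumerates the 2^(n/2) subset sums of each half and uses a hash set to intersect the two sides.

```python
def subset_sums(items):
    """All 2^len(items) subset sums, built by doubling the list once per item."""
    sums = [0]
    for x in items:
        sums += [s + x for s in sums]
    return sums


def has_subset_with_sum(nums, target):
    """Meet in the middle: about 2^(n/2) work per half instead of 2^n overall."""
    half = len(nums) // 2
    left = subset_sums(nums[:half])
    right = set(subset_sums(nums[half:]))  # hash set for O(1) average lookups
    # Every subset splits into a left part and a right part, so look for a
    # right-part sum that completes some left-part sum.
    return any(target - s in right for s in left)


print(has_subset_with_sum([3, 34, 4, 12, 5, 2], 9))   # 4 + 5
print(has_subset_with_sum([3, 34, 4, 12, 5, 2], 30))
```

Storing one half in a hash set keeps each lookup cheap on average; sorting both halves and walking them with two pointers is a common alternative when memory for the set is a concern.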

Let's go through an example.

Suppose you work for a GPS company and your project is to improve its directions feature. In school you learned to use Dijkstra's algorithm to find the shortest path between two nodes in a graph. Knowing these numbers, you will see that processing a graph with millions of edges takes seconds, because typical (binary heap) Dijkstra implementations run in O(m log n) time, where m is the number of edges and n the number of nodes.
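For reference, here is a minimal Dijkstra sketch in Python using a binary heap (`heapq`) with lazy deletion instead of decrease-key. The graph representation (a dict of adjacency lists with non-negative integer weights) and the small `roads` example are assumptions made for illustration. Each successful relaxation pushes one heap entry, so the heap holds at most m entries, which gives the O(m log n) bound.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, int]]]  # node -> [(neighbor, weight), ...]


def dijkstra(graph: Graph, source: int) -> Dict[int, int]:
    """Shortest distances from source; weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: u was already settled with a shorter distance
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))  # lazy insert instead of decrease-key
    return dist


# Hypothetical toy road network.
roads = {0: [(1, 4), (2, 1)], 2: [(1, 2), (3, 5)], 1: [(3, 1)]}
print(dijkstra(roads, 0))
```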

Now you face a few questions:

How fast do you want your code to be? Seconds? Hundreds of milliseconds?

A response on the web feels fast if it takes less than 500 milliseconds, so let's pick half a second.

How big is the graph? Do you want to solve the problem for a city, a country, or even a continent?

Each is roughly an order of magnitude larger than the previous one and calls for a different approach. Let's say we want to solve the problem for the whole of Europe.

Here are sizes for a few input sets:

| input | Europe | USA/CAN | USA (Tiger) |
|---|---|---|---|
| #nodes | 18 029 721 | 18 741 705 | 24 278 285 |
| #directed edges | 42 199 587 | 47 244 849 | 58 213 192 |
| #road categories | 13 | 13 | 4 |

Since we chose half a second as our execution time and our problem has roughly 40 million edges, the table makes it clear that m log n is too slow, so plain Dijkstra won't do. We need to look at how other algorithms, such as A* search or one based on highway hierarchies, behave on this problem.
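The "too slow" conclusion can be checked with back-of-the-envelope arithmetic. The Python sketch below treats m log2 n as the step count for Dijkstra on the Europe graph and compares it with the step count of the n log n row at its limit of 1,000,000; constant factors are ignored. Dijkstra comes out at roughly 10^9 steps, about 50 times the work that row fits into a few seconds, while our target is half a second.

```python
import math

# Europe road network figures from the table above.
EUROPE_NODES = 18_029_721
EUROPE_EDGES = 42_199_587

# Rough step count for binary-heap Dijkstra: m * log2(n).
dijkstra_steps = EUROPE_EDGES * math.log2(EUROPE_NODES)

# Steps the n log n row handles in a few seconds, at n = 1,000,000.
n_log_n_budget = 1_000_000 * math.log2(1_000_000)

print(f"Dijkstra on Europe: {dijkstra_steps:.2e} steps")
print(f"n log n row limit:  {n_log_n_budget:.2e} steps")
print(f"ratio:              {dijkstra_steps / n_log_n_budget:.0f}x")
```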
