Excerpts from Writing Efficient Programs

This excerpt sets out methods for improving program performance through space-for-time and time-for-space trade-offs, loop optimization, logic optimization, procedure optimization, and expression optimization. The techniques covered include data structure augmentation, storing precomputed results, caching, lazy evaluation, packing, interpreters, loop fusion, algebraic identities, short-circuiting, reordering logical tests, boolean variable elimination, collapsing procedure hierarchies, exploiting common cases, coroutines, transformations on recursive procedures, parallelism, compile-time initialization, common subexpression elimination, pairing computation, and exploiting word parallelism, all aimed at reducing run time and storage requirements.

SUMMARY OF THE RULES

The following list restates each rule from Chapters 4 and 5 and then briefly summarizes the major points made in the text. A list of the names of the rules can be found in Section 7.2 on page 110.

SPACE-FOR-TIME RULES

Space-For-Time Rule 1-Data Structure Augmentation: The time required for common operations on data can often be reduced by augmenting the structure with extra information or by changing the information within the structure so that it can be accessed more easily. (Page 39.)

  • Reference counters facilitate garbage collection by keeping additional information in dynamically allocated nodes.
  • Hints augment data structures by keeping a fast but possibly inaccurate structure along with a slow but robust structure.
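
A minimal C sketch of the reference-counter idea mentioned above (illustrative only; the node type and helper names are invented here, not taken from the book):

    #include <stdlib.h>

    struct node {
        int refcount;               /* the augmentation: one extra bookkeeping field */
        int data;
    };

    struct node *make_node(int data)
    {
        struct node *p = malloc(sizeof *p);
        if (p != NULL) {
            p->refcount = 1;        /* one reference: the caller's */
            p->data = data;
        }
        return p;
    }

    struct node *retain(struct node *p)  { p->refcount++; return p; }

    void release(struct node *p)
    {
        if (--p->refcount == 0)     /* last reference gone: reclaim immediately */
            free(p);
    }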

Space-For-Time Rule 2-Store Precomputed Results: The cost of recomputing an expensive function can be reduced by computing the function only once and storing the results. Subsequent requests for the function are then handled by table lookup rather than by computing the function. (Page 40.)

  • Peterson stored the value of evaluated board positions to reduce the time of a game playing program from 27.10 seconds to 0.18 seconds.
  • A procedure for computing Fibonacci numbers can be replaced by a table of the numbers.
  • Stu Feldman precomputed the number of ones in all eight-bit strings to reduce run time from over a week to less than two hours.
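
A minimal C sketch of the table-lookup idea, in the spirit of the eight-bit example above (illustrative only, not Feldman's code; the table and driver are invented here):

    #include <stdio.h>

    static int ones[256];                    /* ones[b] = number of 1 bits in byte b */

    static void build_table(void)            /* pay the computation cost once */
    {
        for (int b = 0; b < 256; b++) {
            int count = 0;
            for (int bits = b; bits != 0; bits >>= 1)
                count += bits & 1;
            ones[b] = count;
        }
    }

    int main(void)
    {
        build_table();
        unsigned word = 0xF0F0A5u;           /* each later query is just table lookups */
        int total = ones[word & 0xFF]
                  + ones[(word >> 8) & 0xFF]
                  + ones[(word >> 16) & 0xFF];
        printf("0x%X has %d one bits\n", word, total);
        return 0;
    }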

Space-For-Time Rule 3-Caching: Data that is accessed most often should be the cheapest to access. (Page 42.)

  • Jalics found that caching the last element retrieved from a table reduced the access cost from 2004 instructions to 4 instructions in 99% of the queries.
  • Chris Van Wyk's storage allocator cached the most popular kind of node and reduced the system run time by over fifty percent; Peter Deutsch cached activation records in an allocator and reduced run time by thirty percent.
  • In implementing a dictionary, keep most of the dictionary on disk but cache the most common words in core.
  • Rick Cattell cached recently used tuples in a database system to reduce the time of an access from 8 milliseconds to 1.4 milliseconds.
  • Caching can "backfire" and increase the run time of a program if locality is not present in the underlying data.
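
A minimal C sketch of caching the most recently retrieved entry, as in the Jalics example (illustrative only; the table layout and lookup routine are invented here, and the table is assumed to be filled elsewhere):

    #define TABLE_SIZE 1000

    struct entry { int key; int value; };

    static struct entry table[TABLE_SIZE];   /* assumed to be populated elsewhere */
    static int last_key = -1;                /* cache: key and value of the last hit */
    static int last_value;

    int lookup(int key)
    {
        if (key == last_key)                 /* cheap check first; with good locality */
            return last_value;               /* most queries stop here               */
        for (int i = 0; i < TABLE_SIZE; i++) /* slow path: full linear search */
            if (table[i].key == key) {
                last_key = key;
                last_value = table[i].value;
                return last_value;
            }
        return -1;                           /* not found */
    }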

Space-For-Time Rule 4-Lazy Evaluation: The strategy of never evaluating an item until it is needed avoids evaluations of unnecessary items. (Page 43.)

  • In building a table of Fibonacci numbers, only compute the numbers actually used.
  • Al Aho evaluated the elements of a table as they were needed and reduced the run time of a program from 30 seconds to less than half a second.
  • Brian Kernighan reduced the run time of a document formatter by twenty percent by calculating the width of the current line only as needed rather than for every input character.
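
A minimal C sketch of a lazily filled Fibonacci table, in the spirit of the first example above (illustrative only; zero is used as the "not yet computed" marker):

    #define MAXFIB 46                  /* fib(46) still fits in a 32-bit long */

    static long fibtab[MAXFIB + 1];    /* 0 means "not computed yet" */

    long fib(int n)                    /* assumes 1 <= n <= MAXFIB */
    {
        if (n <= 2)
            return 1;                  /* fib(1) = fib(2) = 1 */
        if (fibtab[n] == 0)            /* compute an entry only on first use */
            fibtab[n] = fib(n - 1) + fib(n - 2);
        return fibtab[n];
    }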

TIME-FOR-SPACE RULES

Time-For-Space Rule 1-Packing: Dense storage representations can decrease storage costs by increasing the time required to store and retrieve data. (Page 45.)

  • Storing integers in one decimal digit per eight-bit byte, two digits per byte, and in binary format represents three levels of packing.
  • The space of a database system could be reduced by one-third by packing three integers (between 0 and 1000) in two 16-bit words.
  • John Laird reduced the time required to read a file of real numbers by a factor of over 80 by packing the file.
  • Stu Feldman found that by unpacking a table he increased the data space slightly but decreased the code space by over four thousand words.
  • Overlaying reduces data space by storing data items that are never simultaneously active in the same memory space.
  • Code overlaying reduces code space by using the same storage for routines that are never simultaneously needed. Many operating systems provide this service automatically in their virtual memory systems.
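
A minimal C sketch of packing three integers in the range 0..1000 into a single 32-bit quantity (two 16-bit words), as in the database example above (illustrative encoding, not the book's code):

    #include <stdio.h>
    #include <stdint.h>

    #define RADIX 1001u                         /* the values range over 0..1000 */

    uint32_t pack3(unsigned a, unsigned b, unsigned c)
    {
        return (a * RADIX + b) * RADIX + c;     /* maximum value is under 2^30 */
    }

    void unpack3(uint32_t p, unsigned *a, unsigned *b, unsigned *c)
    {
        *c = p % RADIX;  p /= RADIX;            /* each access costs divisions: */
        *b = p % RADIX;  p /= RADIX;            /* the time we trade for space  */
        *a = p;
    }

    int main(void)
    {
        unsigned a, b, c;
        unpack3(pack3(7, 500, 999), &a, &b, &c);
        printf("%u %u %u\n", a, b, c);          /* prints 7 500 999 */
        return 0;
    }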

Time-For-Space Rule 2-Interpreters: The space required to represent a program can often be decreased by the use of interpreters in which common sequences of operations are represented compactly. (Page 47.)

  • Finite State Machines (FSM's) can be implemented by small tables; they are easy to define, code, prove correct, and maintain.
  • Brooks describes how an interpreter led to small space requirements for a console interpreter, and how the time spent in decoding a dense representation of a FORTRAN compiler was paid for by drastically reduced input and output costs.
  • In some systems the programmer should use the interpreter provided by the underlying machine architecture and "compile" common operations into machine code.
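
A minimal C sketch of a finite state machine driven by a small table; the word-counting machine and its two character classes are invented here purely for illustration:

    #include <stdio.h>

    enum state { OUT = 0, IN = 1 };             /* outside / inside a word */

    /* next[current state][character class]; class 0 = white space, 1 = other */
    static const enum state next[2][2] = {
        /* OUT */ { OUT, IN },
        /* IN  */ { OUT, IN }
    };

    int count_words(const char *s)
    {
        enum state st = OUT;
        int words = 0;
        for (; *s != '\0'; s++) {
            int cls = (*s == ' ' || *s == '\t' || *s == '\n') ? 0 : 1;
            if (st == OUT && next[st][cls] == IN)
                words++;                        /* transition into a word */
            st = next[st][cls];
        }
        return words;
    }

    int main(void)
    {
        printf("%d\n", count_words("writing  efficient programs"));  /* prints 3 */
        return 0;
    }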

LOOP RULES

Loop Rule 1-Code Motion Out of Loops: Instead of performing a certain computation in each iteration of a loop, it is better to perform it only once, outside the loop. (Page 52.)

  • Moving the calculation of a constant factor outside a for loop reduced its time from 13.8N microseconds to 7.9N microseconds.
  • Code cannot be moved out of loops if it has side effects that are desired on every iteration.
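
A minimal C sketch of the rule (illustrative; the loop-invariant factor here is invented, not the fragment measured in the text):

    #include <math.h>

    void scale(double x[], int n)
    {
        double factor = sqrt(exp(1.0));   /* moved out: the same value every iteration */
        for (int i = 0; i < n; i++)
            x[i] = x[i] * factor;         /* the inner loop now does only one multiply */
    }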

Loop Rule 2-Combining Tests: An efficient inner loop should contain as few tests as possible, and preferably only one. The programmer should therefore try to simulate some of the exit conditions of the loop by other exit conditions. (Page 53.)

  • Adding a sentinel in the last element of an unsorted vector reduced the time to search it from 7.3C to 4.1C microseconds.
  • Sentinels can decrease the robustness of a program. Improper use of a sentinel caused a C compiler to generate non-reentrant code; the bug surfaced rarely, but was fatal in those circumstances.
  • Sentinels are a common application of Loop Rule 2: we place a sentinel at the boundary of a data structure to reduce the cost of testing whether our search has exhausted the structure.
  • Bob Sproull described how the lexical analyzer of the SAIL compiler used a control character at the end of the input buffer as a sentinel to avoid testing for end-of-buffer on each input character.
  • Combining tests in the sequential search of a sorted array increased the run time from 6.8C microseconds to 7.3C microseconds (due to a system-dependent peculiarity); using sentinels finally reduced the search time to 4.1C microseconds.
  • Bob Sproull described how three tests could be combined into one to increase the speed of the inner loop of a screen editor.
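
A minimal C sketch of a sentinel search of an unsorted array (illustrative, not the book's fragment; it assumes the caller leaves one spare slot at x[n] for the sentinel):

    int search(int x[], int n, int key)   /* x must have room for n+1 elements */
    {
        x[n] = key;                       /* sentinel guarantees the loop terminates */
        int i = 0;
        while (x[i] != key)               /* a single test per iteration */
            i++;
        return (i < n) ? i : -1;          /* -1 means only the sentinel matched */
    }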

Loop Rule 3-Loop Unrolling: A large cost of some short loops is in modifying the loop indices. That cost can often be reduced by unrolling the loop. (Page 56.)

  • Unrolling a loop to sum an array of ten real numbers reduced the run time from 63.4 microseconds to 22.1 microseconds.
  • Unrolling the loop of a sequential search reduced its time from 4.1C microseconds to 3.4C microseconds.
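
A minimal C sketch of the first example, with the ten-element summing loop fully unrolled so that no index arithmetic or loop test remains (illustrative version, not the measured fragment):

    double sum10(const double x[10])
    {
        return x[0] + x[1] + x[2] + x[3] + x[4]
             + x[5] + x[6] + x[7] + x[8] + x[9];
    }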

Loop Rule 4-Transfer-Driven Loop Unrolling: If a large cost of an inner loop is devoted to trivial assignments, then those assignments can often be removed by repeating the code and changing the use of variables. Specifically, to remove the assignment I := J, the subsequent code must treat J as though it were I. (Page 59.)

  • Unrolling the inner loop of a routine for Fibonacci numbers reduced its time from 273 microseconds to 143 microseconds.
  • Knuth used unrolling to decrease the time of inserting an element into a linked list by 16 percent.
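
A minimal C sketch of the Fibonacci example: the usual loop spends much of its time on the copies a = b; b = t, and unrolling by two while renaming the variables removes those assignments entirely (illustrative code; it assumes n is even and at least 2):

    long fib_unrolled(int n)              /* returns fib(n) for even n >= 2 */
    {
        long a = 1, b = 1;                /* a = fib(1), b = fib(2) */
        for (int i = 2; i < n; i += 2) {
            a = a + b;                    /* a takes the role of fib(i+1) */
            b = b + a;                    /* b takes the role of fib(i+2) */
        }
        return b;
    }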

Loop Rule 5-Unconditional Branch Removal: A fast loop should contain no unconditional branches. An unconditional branch at the end of a loop can be removed by "rotating" the loop to have a conditional branch at the bottom. (Page 62.)

  • This technique is applicable only in low-level languages.

Loop Rule 6-Loop Fusion: If two nearby loops operate on the same set of elements, then combine their operational parts and use only one set of loop control operations. (Page 63.)

  • To find the maximum and minimum elements in an array, we make only one iterative pass through the array.
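
A minimal C sketch of the fused maximum/minimum pass (illustrative; it assumes n >= 1):

    void max_min(const int x[], int n, int *max, int *min)
    {
        *max = *min = x[0];
        for (int i = 1; i < n; i++) {     /* one pass, one set of loop control */
            if (x[i] > *max) *max = x[i];
            if (x[i] < *min) *min = x[i];
        }
    }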

LOGIC RULES

Logic Rule 1-Exploit Algebraic Identities: If the evaluation of a logical expression is costly, replace it by an algebraically equivalent expression that is cheaper to evaluate. (Page 66.)

  • Simple optimizations are often done by compilers; programmers must be careful that a change of this type does not result in slower code.
  • An algebraic identity allowed us to remove the square root in Fragment A2 to yield Fragment A3; this gave a speedup of almost a factor of two.
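
A minimal C sketch of the kind of identity used in the A2-to-A3 change: because the square root is monotone, sqrt(d) < bound is equivalent to d < bound*bound when bound is nonnegative, so the square root leaves the inner loop (illustrative, not the book's fragment):

    int closer_than(double dx, double dy, double bound)   /* assumes bound >= 0 */
    {
        /* before: return sqrt(dx*dx + dy*dy) < bound;  */
        return dx*dx + dy*dy < bound*bound;   /* same answer, no sqrt call */
    }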

Logic Rule 2-Short-Circuiting Monotone Functions: If we wish to test whether some monotone nondecreasing function of several variables is over a certain threshold, then we need not evaluate any of the variables once the threshold has been reached. (Page 67.)

  • A simple application is evaluating "and" and "or": to evaluate "A and B" we need not test B if A is false.
  • Short-circuiting the distance evaluation in Fragment A5 reduced the time of Fragment A6 by forty percent.
  • A more complex application of this rule exits from a loop as soon as the purpose of the loop has been accomplished.
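
A minimal C sketch of short-circuiting a distance-style computation: the partial sum of squared differences never decreases, so evaluation stops as soon as the threshold is reached (illustrative, not Fragments A5/A6):

    int within(const double a[], const double b[], int dims, double bound2)
    {
        double sum = 0.0;                       /* bound2 is the squared threshold */
        for (int i = 0; i < dims; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
            if (sum >= bound2)                  /* threshold reached: the remaining */
                return 0;                       /* terms cannot bring it back down  */
        }
        return 1;                               /* still under the bound */
    }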

Logic Rule 3-Reordering Tests: Logical tests should be arranged such that inexpensive and often successful tests precede expensive and rarely successful tests. (Page 69.)

  • This was used in testing the character types in a lexical analyzer.
  • This rule is used to push an expensive test inside a cheaper test.
  • Peter Weinberger used a single-line test in a Scrabble program that was able to avoid an expensive test in over 99% of the cases.

Logic Rule 4-Precompute Logical Functions: A logical function over a small finite domain can be replaced by a lookup in a table that represents the domain. (Page 72.)

  • Testing character types in a lexical analyzer is often implemented by a table of character types indexed by characters; Brian Kernighan reports that this reduced the run time of some programs by thirty to forty percent.
  • David Moon designed a fast interpreter for a PDP-8 that had one table entry for each of the 4096 possible instructions.
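
A minimal C sketch of a character-type table of the kind used in lexical analyzers (illustrative; it assumes an ASCII character set and uses only a few classes):

    enum ctype { OTHER, LETTER, DIGIT, SPACE };

    static enum ctype ctab[256];

    void build_ctab(void)                        /* run once at startup */
    {
        for (int c = 0; c < 256; c++) ctab[c] = OTHER;
        for (int c = 'a'; c <= 'z'; c++) ctab[c] = LETTER;
        for (int c = 'A'; c <= 'Z'; c++) ctab[c] = LETTER;
        for (int c = '0'; c <= '9'; c++) ctab[c] = DIGIT;
        ctab[' '] = ctab['\t'] = ctab['\n'] = SPACE;
    }

    /* afterwards, classifying a character is a single indexed load */
    #define CLASS(ch) (ctab[(unsigned char)(ch)])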

Logic Rule 5-Boolean Variable Elimination: We can remove boolean variables from a program by replacing the assignment to a boolean variable V by an if-then-else statement in which one branch represents the case that V is true and the other represents the case that V is false. (This generalizes to case statements and other logical control structures.) (Page 73.)

  • This rule usually decreases time slightly (say, less than 25 percent), but greatly increases code space.
  • More complex applications of this rule remove boolean variables from data structures by keeping separate structures for the true and false records.

PROCEDURE RULES

Procedure Rule 1-Collapsing Procedure Hierarchies: The run times of the elements of a set of procedures that (nonrecursively) call themselves can often be reduced by rewriting procedures in line and binding the passed variables. (Page 75.)

  • Rewriting the distance procedure in line reduced the run time of Fragment A4 from 21.2N² microseconds to 14.0N² microseconds.
  • Dennis Ritchie increased the speed of a macro processor by a factor of four by writing procedures in line.

Procedure Rule 2-Exploit Common Cases: Procedures should be organized to handle all cases correctly and common cases efficiently. (Page 76.)

  • Mary Shaw used this technique to increase the efficiency of the register SAVE and UNSAVE operations on the Rice University Computer; efficiently handling the special case of operating on all possible registers reduced the run time of some programs by thirty percent.
  • This rule encourages us to remove unneeded generality from subroutines; Chris Van Wyk increased the speed of a program by a factor of three by using a special-purpose procedure for intersecting line segments.
  • We should organize systems so that efficient cases are common cases; by ensuring that bit fields always start in the same positions in words, Rob Pike increased the efficiency of a raster graphics operation by a factor of two.

Procedure Rule 3-Coroutines: A multiple-pass algorithm can often be turned into a single-pass algorithm by use of coroutines. (Page 79.)

  • An intermediate file that is written sequentially and then read sequentially can often be removed by linking together the two programs as coroutines; this increases space requirements but reduces costly input/output operations.

Procedure Rule 4-Transformations on Recursive Procedures: The run time of recursive procedures can often be reduced by applying the following transformations: (Page 80.)

  • Code the recursion explicitly by use of a program stack.
  • If the final action of a procedure P is to call itself recursively, replace that call by a goto to its first statement; this is usually known as removing tail recursion. That goto can often be transformed into a loop.
  • If a procedure contains only one recursive call on itself, then it is not necessary to store the return address on the stack.
  • It is often more efficient to solve small subproblems by use of an auxiliary procedure, rather than by recurring down to problems of size zero or one.
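
A minimal C sketch of removing tail recursion, the second transformation above (illustrative list type and routine, not the book's code):

    #include <stddef.h>

    struct node { int value; struct node *next; };

    /* recursive version: the final action is the recursive call */
    struct node *find_rec(struct node *p, int key)
    {
        if (p == NULL || p->value == key)
            return p;
        return find_rec(p->next, key);      /* tail call */
    }

    /* the same procedure with the tail call turned into a loop */
    struct node *find_iter(struct node *p, int key)
    {
        while (p != NULL && p->value != key)
            p = p->next;                    /* "go to the first statement" */
        return p;
    }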

Procedure Rule 5-Parallelism: A program should be structured to exploit as much of the parallelism as possible in the underlying hardware. (Page 80.)

  • Kulsrud, Sedgewick, Smith, and Szymanski used techniques at many design levels to build a Quicksort program on a Cray-1 that can sort 800,000 elements in less than 1.5 seconds.

EXPRESSION RULES

Expression Rule 1-Compile-Time Initialization: As many variables as possible should be initialized before program execution. (Page 82.)

  • John Laird preprocessed data unchanged between runs of a program to reduce the program's run time from 120 seconds to 4 seconds.

Expression Rule 2-Exploit Algebraic Identities: If the evaluation of an expression is costly, replace it by an algebraically equivalent expression that is cheaper to evaluate. (Page 82.)

  • An algebraic identity yields a fast range test that compiler writers can use on two's-complement architectures.
  • We can often multiply or divide by powers of two by shifting left or right.
  • Strength reduction on a loop that iterates through the elements of an array replaces a multiplication by an addition. This technique generalizes to a large class of incremental algorithms.
  • David Jefferson used an incremental algorithm to reduce the number of characters sent to a terminal by a factor of over five.
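
A minimal C sketch of strength reduction in a loop: the multiplication i * step is replaced by a running value that is incremented on each iteration (illustrative fragment):

    void fill_ramp(double x[], int n, double step)
    {
        /* before: for (int i = 0; i < n; i++) x[i] = i * step; */
        double value = 0.0;
        for (int i = 0; i < n; i++) {
            x[i] = value;
            value += step;                /* an addition replaces the multiplication */
        }
    }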

Expression Rule 3-Common Subexpression Elimination: If the same expression is evaluated twice with none of its variables altered between evaluations, then the second evaluation can be avoided by storing the result of the first and using that in place of the second. (Page 84.)

  • We cannot eliminate the common evaluation of an expression with important side-effects.
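
A minimal C sketch of the rule (illustrative expression; the point is that the repeated subexpression is computed once and reused):

    #include <math.h>

    double scaled_sum(double x, double y)
    {
        /* before: return x / sqrt(x*x + y*y) + y / sqrt(x*x + y*y); */
        double len = sqrt(x*x + y*y);       /* evaluate the common subexpression once */
        return x / len + y / len;           /* reuse the stored result */
    }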

Expression Rule 4-Pairing Computation: If two similar expressions are frequently evaluated together, then we should make a new procedure that evaluates them as a pair. (Page 84.)

  • Knuth reported that both the sine and the cosine of a given angle can be computed together for 1.5 times the cost of computing either individually. Similarly, the maximum and the minimum elements of a vector can be found at about 1.5 times the cost of finding either one.

Expression Rule 5-Exploit Word Parallelism: Use the full word width of the underlying computer architecture to evaluate expensive expressions. (Page 85.)

  • When we OR two 32-bit sets together giving as output their 32-bit union, we are performing 32 operations in parallel.
  • Stu Feldman's program to count one bits in a word (described in Space-For-Time Rule 2) and Peter Weinberger's Scrabble program (described in Logic Rule 3) both use this rule.
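
A minimal C sketch of the set-union example: a set over a universe of at most 32 elements is stored in one 32-bit word, so a single OR forms the union of all members at once (illustrative representation):

    #include <stdint.h>

    typedef uint32_t set32;              /* bit i set  <=>  element i is present */

    set32 set_union(set32 a, set32 b)        { return a | b; }  /* 32 "unions" in parallel */
    set32 set_intersection(set32 a, set32 b) { return a & b; }
    int   set_member(set32 s, int i)         { return (s >> i) & 1; }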

From Writing Efficient Programs by Jon Bentley; Prentice-Hall, 1982. ISBN 0-13-970244-X

From http://users.erols.com/blilly/programming/Writing_Efficient_Programs.html
