Excerpts from Writing Efficient Programs

This excerpt sets out methods for improving program performance through space-for-time and time-for-space trade-offs, loop optimization, logic optimization, procedure optimization, and expression optimization. The techniques covered include data structure augmentation, storing precomputed results, caching, lazy evaluation, packing, interpreters, loop fusion, algebraic identities, short-circuiting, reordering logical tests, boolean variable elimination, collapsing procedure hierarchies, exploiting common cases, coroutines, transformations on recursive procedures, parallelism, compile-time initialization, common subexpression elimination, pairing computation, and exploiting word parallelism, all aimed at reducing run time and storage requirements.

SUMMARY OF THE RULES

The following list restates each rule from Chapters 4 and 5 and then briefly summarizes the major points made in the text. A list of the names of the rules can be found in Section 7.2 on page 110.

SPACE-FOR-TIME RULES

Space-For-Time Rule 1-Data Structure Augmentation: The time required for common operations on data can often be reduced by augmenting the structure with extra information or by changing the information within the structure so that it can be accessed more easily. (Page 39.)

  • Reference counters facilitate garbage collection by keeping additional information in dynamically allocated nodes.
  • Hints augment data structures by keeping a fast but possibly inaccurate structure along with a slow but robust structure.
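
A minimal C sketch of the reference-counter idea mentioned above (illustrative only; the node type and helper names are invented here, not taken from the book):

    #include <stdlib.h>

    struct node {
        int refcount;               /* the augmentation: one extra bookkeeping field */
        int data;
    };

    struct node *make_node(int data)
    {
        struct node *p = malloc(sizeof *p);
        if (p != NULL) {
            p->refcount = 1;        /* one reference: the caller's */
            p->data = data;
        }
        return p;
    }

    struct node *retain(struct node *p)  { p->refcount++; return p; }

    void release(struct node *p)
    {
        if (--p->refcount == 0)     /* last reference gone: reclaim immediately */
            free(p);
    }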

Space-For-Time Rule 2-Store Precomputed Results: The cost of recomputing an expensive function can be reduced by computing the function only once and storing the results. Subsequent requests for the function are then handled by table lookup rather than by computing the function. (Page 40.)

  • Peterson stored the value of evaluated board positions to reduce the time of a game playing program from 27.10 seconds to 0.18 seconds.
  • A procedure for computing Fibonacci numbers can be replaced by a table of the numbers.
  • Stu Feldman precomputed the number of ones in all eight-bit strings to reduce run time from over a week to less than two hours.
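
A minimal C sketch of the table-lookup idea, in the spirit of the eight-bit example above (illustrative only, not Feldman's code; the table and driver are invented here):

    #include <stdio.h>

    static int ones[256];                    /* ones[b] = number of 1 bits in byte b */

    static void build_table(void)            /* pay the computation cost once */
    {
        for (int b = 0; b < 256; b++) {
            int count = 0;
            for (int bits = b; bits != 0; bits >>= 1)
                count += bits & 1;
            ones[b] = count;
        }
    }

    int main(void)
    {
        build_table();
        unsigned word = 0xF0F0A5u;           /* each later query is just table lookups */
        int total = ones[word & 0xFF]
                  + ones[(word >> 8) & 0xFF]
                  + ones[(word >> 16) & 0xFF];
        printf("0x%X has %d one bits\n", word, total);
        return 0;
    }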

Space-For-Time Rule 3-Caching: Data that is accessed most often should be the cheapest to access. (Page 42.)

  • Jalics found that caching the last element retrieved from a table reduced the access cost from 2004 instructions to 4 instructions in 99% of the queries.
  • Chris Van Wyk's storage allocator cached the most popular kind of node and reduced the system run time by over fifty percent; Peter Deutsch cached activation records in an allocator and reduced run time by thirty percent.
  • In implementing a dictionary, keep most of the dictionary on disk but cache the most common words in core.
  • Rick Cattell cached recently used tuples in a database system to reduce the time of an access from 8 milliseconds to 1.4 milliseconds.
  • Caching can "backfire" and increase the run time of a program if locality is not present in the underlying data.
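
A minimal C sketch of caching the most recently retrieved entry, as in the Jalics example (illustrative only; the table layout and lookup routine are invented here, and the table is assumed to be filled elsewhere):

    #define TABLE_SIZE 1000

    struct entry { int key; int value; };

    static struct entry table[TABLE_SIZE];   /* assumed to be populated elsewhere */
    static int last_key = -1;                /* cache: key and value of the last hit */
    static int last_value;

    int lookup(int key)
    {
        if (key == last_key)                 /* cheap check first; with good locality */
            return last_value;               /* most queries stop here               */
        for (int i = 0; i < TABLE_SIZE; i++) /* slow path: full linear search */
            if (table[i].key == key) {
                last_key = key;
                last_value = table[i].value;
                return last_value;
            }
        return -1;                           /* not found */
    }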

Space-For-Time Rule 4-Lazy Evaluation: The strategy of never evaluating an item until it is needed avoids evaluations of unnecessary items. (Page 43.)

  • In building a table of Fibonacci numbers, only compute the numbers actually used.
  • Al Aho evaluated the elements of a table as they were needed and reduced the run time of a program from 30 seconds to less than half a second.
  • Brian Kernighan reduced the run time of a document formatter by twenty percent by calculating the width of the current line only as needed rather than for every input character.
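
A minimal C sketch of a lazily filled Fibonacci table, in the spirit of the first example above (illustrative only; zero is used as the "not yet computed" marker):

    #define MAXFIB 46                  /* fib(46) still fits in a 32-bit long */

    static long fibtab[MAXFIB + 1];    /* 0 means "not computed yet" */

    long fib(int n)                    /* assumes 1 <= n <= MAXFIB */
    {
        if (n <= 2)
            return 1;                  /* fib(1) = fib(2) = 1 */
        if (fibtab[n] == 0)            /* compute an entry only on first use */
            fibtab[n] = fib(n - 1) + fib(n - 2);
        return fibtab[n];
    }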

TIME-FOR-SPACE RULES

Time-For-Space Rule 1-Packing: Dense storage representations can decrease storage costs by increasing the time required to store and retrieve data. (Page 45.)

  • Storing integers in one decimal digit per eight-bit byte, two digits per byte, and in binary format represents three levels of packing.
  • The space of a database system could be reduced by one-third by packing three integers (between 0 and 1000) in two 16-bit words.
  • John Laird reduced the time required to read a file of real numbers by a factor of over 80 by packing the file.
  • Stu Feldman found that by unpacking a table he increased the data space slightly but decreased the code space by over four thousand words.
  • Overlaying reduces data space by storing data items that are never simultaneously active in the same memory space.
  • Code overlaying reduces code space by using the same storage for routines that are never simultaneously needed. Many operating systems provide this service automatically in their virtual memory systems.
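
A minimal C sketch of packing three integers in the range 0..1000 into a single 32-bit quantity (two 16-bit words), as in the database example above (illustrative encoding, not the book's code):

    #include <stdio.h>
    #include <stdint.h>

    #define RADIX 1001u                         /* the values range over 0..1000 */

    uint32_t pack3(unsigned a, unsigned b, unsigned c)
    {
        return (a * RADIX + b) * RADIX + c;     /* maximum value is under 2^30 */
    }

    void unpack3(uint32_t p, unsigned *a, unsigned *b, unsigned *c)
    {
        *c = p % RADIX;  p /= RADIX;            /* each access costs divisions: */
        *b = p % RADIX;  p /= RADIX;            /* the time we trade for space  */
        *a = p;
    }

    int main(void)
    {
        unsigned a, b, c;
        unpack3(pack3(7, 500, 999), &a, &b, &c);
        printf("%u %u %u\n", a, b, c);          /* prints 7 500 999 */
        return 0;
    }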

Time-For-Space Rule 2-Interpreters: The space required to represent a program can often be decreased by the use of interpreters in which common sequences of operations are represented compactly. (Page 47.)

  • Finite State Machines (FSM's) can be implemented by small tables; they are easy to define, code, prove correct, and maintain.
  • Brooks describes how an interpreter led to small space requirements for a console interpreter, and how the time spent in decoding a dense representation of a FORTRAN compiler was paid for by drastically reduced input and output costs.
  • In some systems the programmer should use the interpreter provided by the underlying machine architecture and "compile" common operations into machine code.
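
A minimal C sketch of a finite state machine driven by a small table; the word-counting machine and its two character classes are invented here purely for illustration:

    #include <stdio.h>

    enum state { OUT = 0, IN = 1 };             /* outside / inside a word */

    /* next[current state][character class]; class 0 = white space, 1 = other */
    static const enum state next[2][2] = {
        /* OUT */ { OUT, IN },
        /* IN  */ { OUT, IN }
    };

    int count_words(const char *s)
    {
        enum state st = OUT;
        int words = 0;
        for (; *s != '\0'; s++) {
            int cls = (*s == ' ' || *s == '\t' || *s == '\n') ? 0 : 1;
            if (st == OUT && next[st][cls] == IN)
                words++;                        /* transition into a word */
            st = next[st][cls];
        }
        return words;
    }

    int main(void)
    {
        printf("%d\n", count_words("writing  efficient programs"));  /* prints 3 */
        return 0;
    }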

LOOP RULES

Loop Rule 1-Code Motion Out of Loops: Instead of performing a certain computation in each iteration of a loop, it is better to perform it only once, outside the loop. (Page 52.)

  • Moving the calculation of a constant factor outside a for loop reduced its time from 13.8N microseconds to 7.9N microseconds.
  • Code cannot be moved out of loops if it has side effects that are desired on every iteration.
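
A minimal C sketch of the rule (illustrative; the loop-invariant factor here is invented, not the fragment measured in the text):

    #include <math.h>

    void scale(double x[], int n)
    {
        double factor = sqrt(exp(1.0));   /* moved out: the same value every iteration */
        for (int i = 0; i < n; i++)
            x[i] = x[i] * factor;         /* the inner loop now does only one multiply */
    }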

Loop Rule 2-Combining Tests: An efficient inner loop should contain as few tests as possible, and preferably only one. The programmer should therefore try to simulate some of the exit conditions of the loop by other exit conditions. (Page 53.)

  • Adding a sentinel in the last element of an unsorted vector reduced the time to search it from 7.3C to 4.1C microseconds.
  • Sentinels can decrease the robustness of a program. Improper use of a sentinel caused a C compiler to generate non-reentrant code; the bug surfaced rarely, but was fatal in those circumstances.
  • Sentinels are a common application of Loop Rule 2: we place a sentinel at the boundary of a data structure to reduce the cost of testing whether our search has exhausted the structure.
  • Bob Sproull described how the lexical analyzer of the SAIL compiler used a control character at the end of the input buffer as a sentinel to avoid testing for end-of-buffer on each input character.
  • Combining tests in the sequential search of a sorted array increased the run time from 6.8C microseconds to 7.3C microseconds (due to a system-dependent peculiarity); using sentinels finally reduced the search time to 4.1C microseconds.
  • Bob Sproull described how three tests could be combined into one to increase the speed of the inner loop of a screen editor.
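
A minimal C sketch of a sentinel search of an unsorted array (illustrative, not the book's fragment; it assumes the caller leaves one spare slot at x[n] for the sentinel):

    int search(int x[], int n, int key)   /* x must have room for n+1 elements */
    {
        x[n] = key;                       /* sentinel guarantees the loop terminates */
        int i = 0;
        while (x[i] != key)               /* a single test per iteration */
            i++;
        return (i < n) ? i : -1;          /* -1 means only the sentinel matched */
    }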

Loop Rule 3-Loop Unrolling: A large cost of some short loops is in modifying the loop indices. That cost can often be reduced by unrolling the loop. (Page 56.)

  • Unrolling a loop to sum an array of ten real numbers reduced the run time from 63.4 microseconds to 22.1 microseconds.
  • Unrolling the loop of a sequential search reduced its time from 4.1C microseconds to 3.4C microseconds.
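
A minimal C sketch of the first example, with the ten-element summing loop fully unrolled so that no index arithmetic or loop test remains (illustrative version, not the measured fragment):

    double sum10(const double x[10])
    {
        return x[0] + x[1] + x[2] + x[3] + x[4]
             + x[5] + x[6] + x[7] + x[8] + x[9];
    }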

Loop Rule 4-Transfer-Driven Loop Unrolling: If a large cost of an inner loop is devoted to trivial assignments, then those assignments can often be removed by repeating the code and changing the use of variables. Specifically, to remove the assignment I := J, the subsequent code must treat J as though it were I. (Page 59.)

  • Unrolling the inner loop of a routine for Fibonacci numbers reduced its time from 273 microseconds to 143 microseconds.
  • Knuth used unrolling to decrease the time of inserting an element into a linked list by 16 percent.
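
A minimal C sketch of the Fibonacci example: the usual loop spends much of its time on the copies a = b; b = t, and unrolling by two while renaming the variables removes those assignments entirely (illustrative code; it assumes n is even and at least 2):

    long fib_unrolled(int n)              /* returns fib(n) for even n >= 2 */
    {
        long a = 1, b = 1;                /* a = fib(1), b = fib(2) */
        for (int i = 2; i < n; i += 2) {
            a = a + b;                    /* a takes the role of fib(i+1) */
            b = b + a;                    /* b takes the role of fib(i+2) */
        }
        return b;
    }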

Loop Rule 5-Unconditional Branch Removal: A fast loop should contain no unconditional branches. An unconditional branch at the end of a loop can be removed by "rotating" the loop to have a conditional branch at the bottom. (Page 62.)

  • This technique is applicable only in low-level languages.

Loop Rule 6-Loop Fusion: If two nearby loops operate on the same set of elements, then combine their operational parts and use only one set of loop control operations. (Page 63.)

  • To find the maximum and minimum elements in an array, we make only one iterative pass through the array.
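
A minimal C sketch of the fused maximum/minimum pass (illustrative; it assumes n >= 1):

    void max_min(const int x[], int n, int *max, int *min)
    {
        *max = *min = x[0];
        for (int i = 1; i < n; i++) {     /* one pass, one set of loop control */
            if (x[i] > *max) *max = x[i];
            if (x[i] < *min) *min = x[i];
        }
    }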

LOGIC RULES

Logic Rule 1-Exploit Algebraic Identities: If the evaluation of a logical expression is costly, replace it by an algebraically equivalent expression that is cheaper to evaluate. (Page 66.)

  • Simple optimizations are often done by compilers; programmers must be careful that a change of this type does not result in slower code.
  • An algebraic identity allowed us to remove the square root in Fragment A2 to yield Fragment A3; this gave a speedup of almost a factor of two.
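
A minimal C sketch of the kind of identity used in the A2-to-A3 change: because the square root is monotone, sqrt(d) < bound is equivalent to d < bound*bound when bound is nonnegative, so the square root leaves the inner loop (illustrative, not the book's fragment):

    int closer_than(double dx, double dy, double bound)   /* assumes bound >= 0 */
    {
        /* before: return sqrt(dx*dx + dy*dy) < bound;  */
        return dx*dx + dy*dy < bound*bound;   /* same answer, no sqrt call */
    }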

Logic Rule 2-Short-Circuiting Monotone Functions: If we wish to test whether some monotone nondecreasing function of several variables is over a certain threshold, then we need not evaluate any of the variables once the threshold has been reached. (Page 67.)

  • A simple application is evaluating "and" and "or": to evaluate "A and B" we need not test B if A is false.
  • Short-circuiting the distance evaluation in Fragment A5 reduced the time of Fragment A6 by forty percent.
  • A more complex application of this rule exits from a loop as soon as the purpose of the loop has been accomplished.
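
A minimal C sketch of short-circuiting a distance-style computation: the partial sum of squared differences never decreases, so evaluation stops as soon as the threshold is reached (illustrative, not Fragments A5/A6):

    int within(const double a[], const double b[], int dims, double bound2)
    {
        double sum = 0.0;                       /* bound2 is the squared threshold */
        for (int i = 0; i < dims; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
            if (sum >= bound2)                  /* threshold reached: the remaining */
                return 0;                       /* terms cannot bring it back down  */
        }
        return 1;                               /* still under the bound */
    }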

Logic Rule 3-Reordering Tests: Logical tests should be arranged such that inexpensive and often successful tests precede expensive and rarely successful tests. (Page 69.)

  • This was used in testing the character types in a lexical analyzer.
  • This rule is used to push an expensive test inside a cheaper test.
  • Peter Weinberger used a single-line test in a Scrabble program that was able to avoid an expensive test in over 99% of the cases.

Logic Rule 4-Precompute Logical Functions: A logical function over a small finite domain can be replaced by a lookup in a table that represents the domain. (Page 72.)

  • Testing character types in a lexical analyzer is often implemented by a table of character types indexed by characters; Brian Kernighan reports that this reduced the run time of some programs by thirty to forty percent.
  • David Moon designed a fast interpreter for a PDP-8 that had one table entry for each of the 4096 possible instructions.
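
A minimal C sketch of a character-type table of the kind used in lexical analyzers (illustrative; it assumes an ASCII character set and uses only a few classes):

    enum ctype { OTHER, LETTER, DIGIT, SPACE };

    static enum ctype ctab[256];

    void build_ctab(void)                        /* run once at startup */
    {
        for (int c = 0; c < 256; c++) ctab[c] = OTHER;
        for (int c = 'a'; c <= 'z'; c++) ctab[c] = LETTER;
        for (int c = 'A'; c <= 'Z'; c++) ctab[c] = LETTER;
        for (int c = '0'; c <= '9'; c++) ctab[c] = DIGIT;
        ctab[' '] = ctab['\t'] = ctab['\n'] = SPACE;
    }

    /* afterwards, classifying a character is a single indexed load */
    #define CLASS(ch) (ctab[(unsigned char)(ch)])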

Logic Rule 5-Boolean Variable Elimination: We can remove boolean variables from a program by replacing the assignment to a boolean variable V by an if-then-else statement in which one branch represents the case that V is true and the other represents the case that V is false. (This generalizes to case statements and other logical control structures.) (Page 73.)

  • This rule usually decreases time slightly (say, less than 25 percent), but greatly increases code space.
  • More complex applications of this rule remove boolean variables from data structures by keeping separate structures for the true and false records.

PROCEDURE RULES

Procedure Rule 1-Collapsing Procedure Hierarchies: The run times of the elements of a set of procedures that (nonrecursively) call themselves can often be reduced by rewriting procedures in line and binding the passed variables. (Page 75.)

  • Rewriting the distance procedure in line reduced the run time of Fragment A4 from 21.2N² microseconds to 14.0N² microseconds.
  • Dennis Ritchie increased the speed of a macro processor by a factor of four by writing procedures in line.

Procedure Rule 2-Exploit Common Cases: Procedures should be organized to handle all cases correctly and common cases efficiently. (Page 76.)

  • Mary Shaw used this technique to increase the efficiency of the register SAVE and UNSAVE operations on the Rice University Computer; efficiently handling the special case of operating on all possible registers reduced the run time of some programs by thirty percent.
  • This rule encourages us to remove unneeded generality from subroutines; Chris Van Wyk increased the speed of a program by a factor of three by using a special-purpose procedure for intersecting line segments.
  • We should organize systems so that efficient cases are common cases; by ensuring that bit fields always start in the same positions in words, Rob Pike increased the efficiency of a raster graphics operation by a factor of two.

Procedure Rule 3-Coroutines: A multiple-pass algorithm can often be turned into a single-pass algorithm by use of coroutines. (Page 79.)

  • An intermediate file that is written sequentially and then read sequentially can often be removed by linking together the two programs as coroutines; this increases space requirements but reduces costly input/output operations.

Procedure Rule 4-Transformations on Recursive Procedures: The run time of recursive procedures can often be reduced by applying the following transformations: (Page 80.)

  • Code the recursion explicitly by use of a program stack.
  • If the final action of a procedure P is to call itself recursively, replace that call by a goto to its first statement; this is usually known as removing tail recursion. That goto can often be transformed into a loop.
  • If a procedure contains only one recursive call on itself, then it is not necessary to store the return address on the stack.
  • It is often more efficient to solve small subproblems by use of an auxiliary procedure, rather than by recurring down to problems of size zero or one.
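
A minimal C sketch of removing tail recursion, the second transformation above (illustrative list type and routine, not the book's code):

    #include <stddef.h>

    struct node { int value; struct node *next; };

    /* recursive version: the final action is the recursive call */
    struct node *find_rec(struct node *p, int key)
    {
        if (p == NULL || p->value == key)
            return p;
        return find_rec(p->next, key);      /* tail call */
    }

    /* the same procedure with the tail call turned into a loop */
    struct node *find_iter(struct node *p, int key)
    {
        while (p != NULL && p->value != key)
            p = p->next;                    /* "go to the first statement" */
        return p;
    }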

Procedure Rule 5-Parallelism: A program should be structured to exploit as much of the parallelism as possible in the underlying hardware. (Page 80.)

  • Kulsrud, Sedgewick, Smith, and Szymanski used techniques at many design levels to build a Quicksort program on a Cray-1 that can sort 800,000 elements in less than 1.5 seconds.

EXPRESSION RULES

Expression Rule 1-Compile-Time Initialization: As many variables as possible should be initialized before program execution. (Page 82.)

  • John Laird preprocessed data unchanged between runs of a program to reduce the program's run time from 120 seconds to 4 seconds.

Expression Rule 2-Exploit Algebraic Identities: If the evaluation of an expression is costly, replace it by an algebraically equivalent expression that is cheaper to evaluate. (Page 82.)

  • An algebraic identity yields a fast range test that compiler writers can use on two's-complement architectures.
  • We can often multiply or divide by powers of two by shifting left or right.
  • Strength reduction on a loop that iterates through the elements of an array replaces a multiplication by an addition. This technique generalizes to a large class of incremental algorithms.
  • David Jefferson used an incremental algorithm to reduce the number of characters sent to a terminal by a factor of over five.
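
A minimal C sketch of strength reduction in a loop: the multiplication i * step is replaced by a running value that is incremented on each iteration (illustrative fragment):

    void fill_ramp(double x[], int n, double step)
    {
        /* before: for (int i = 0; i < n; i++) x[i] = i * step; */
        double value = 0.0;
        for (int i = 0; i < n; i++) {
            x[i] = value;
            value += step;                /* an addition replaces the multiplication */
        }
    }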

Expression Rule 3-Common Subexpression Elimination: If the same expression is evaluated twice with none of its variables altered between evaluations, then the second evaluation can be avoided by storing the result of the first and using that in place of the second. (Page 84.)

  • We cannot eliminate the common evaluation of an expression with important side-effects.
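
A minimal C sketch of the rule (illustrative expression; the point is that the repeated subexpression is computed once and reused):

    #include <math.h>

    double scaled_sum(double x, double y)
    {
        /* before: return x / sqrt(x*x + y*y) + y / sqrt(x*x + y*y); */
        double len = sqrt(x*x + y*y);       /* evaluate the common subexpression once */
        return x / len + y / len;           /* reuse the stored result */
    }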

Expression Rule 4-Pairing Computation: If two similar expressions are frequently evaluated together, then we should make a new procedure that evaluates them as a pair. (Page 84.)

  • Knuth reported that both the sine and the cosine of a given angle can be computed together for 1.5 times the cost of computing either individually. Similarly, the maximum and the minimum elements of a vector can be found at about 1.5 times the cost of finding either one.

Expression Rule 5-Exploit Word Parallelism: Use the full word width of the underlying computer architecture to evaluate expensive expressions. (Page 85.)

  • When we OR two 32-bit sets together giving as output their 32-bit union, we are performing 32 operations in parallel.
  • Stu Feldman's program to count one bits in a word (described in Space-For-Time Rule 2) and Peter Weinberger's Scrabble program (described in Logic Rule 3) both use this rule.
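
A minimal C sketch of the set-union example: a set over a universe of at most 32 elements is stored in one 32-bit word, so a single OR forms the union of all members at once (illustrative representation):

    #include <stdint.h>

    typedef uint32_t set32;              /* bit i set  <=>  element i is present */

    set32 set_union(set32 a, set32 b)        { return a | b; }  /* 32 "unions" in parallel */
    set32 set_intersection(set32 a, set32 b) { return a & b; }
    int   set_member(set32 s, int i)         { return (s >> i) & 1; }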

From Writing Efficient Programs by Jon Bentley; Prentice-Hall, 1982. ISBN 0-13-970244-X

From http://users.erols.com/blilly/programming/Writing_Efficient_Programs.html
