Notes on Programming in C

This article examines the importance of clear expression in programming, from variable naming and pointer use to procedure calls and the management of complexity, and sets out a philosophy of clarity: well-chosen names, sensible use of pointers, and the avoidance of needless complexity make code more readable and maintainable.

By Rob Pike

===============================================================================================================================

Introduction

           Kernighan and Plauger's The Elements of Programming Style was an important and rightly influential book.  But sometimes I feel its concise rules were taken as a cookbook approach to good style instead of the succinct expression of a philosophy they were meant to be.  If the book claims that variable names should be chosen meaningfully, doesn't it then follow that variables whose names are small essays on their use are even better?  Isn't MaximumValueUntilOverflow a better name than maxval?  I don't think so.

      What follows is a set of short essays that collectively encourage a philosophy of clarity in programming rather than giving hard rules.  I don't expect you to agree with all of them, because they are opinion and opinions change with the times.  But they've been accumulating in my head, if not on paper until now, for a long time, and are based on a lot of experience, so I hope they help you understand how to plan the details of a program.  (I've yet to see a good essay on how to plan the whole thing, but then that's partly what this course is about.)  If you find them idiosyncratic, fine; if you disagree with them, fine; but if they make you think about why you disagree, that's better.  Under no circumstances should you program the way I say to because I say to; program the way you think expresses best what you're trying to accomplish in the program.  And do so consistently and ruthlessly.

      Your comments are welcome.

Issues of typography

           A program is a sort of publication.  It's meant to be read by the programmer, another programmer (perhaps yourself a few days, weeks or years later), and lastly a machine.  The machine doesn't care how pretty the program is - if the program compiles, the machine's happy - but people do, and they should.  Sometimes they care too much: pretty printers mechanically produce pretty output that accentuates irrelevant detail in the program, which is as sensible as putting all the prepositions in English text in bold font.  Although many people think programs should look like the Algol 68 report (and some systems even require you to edit programs in that style), a clear program is not made any clearer by such presentation, and a bad program is only made laughable.

      Typographic conventions consistently held are important to clear presentation, of course - indentation is probably the best known and most useful example - but when the ink obscures the intent, typography has taken over.  So even if you stick with plain old typewriter-like output, be conscious of typographic silliness.  Avoid decoration; for instance, keep comments brief and banner-free.  Say what you want to say in the program, neatly and consistently.  Then move on.

Variable names

           Ah, variable names.  Length is not a virtue in a name; clarity of expression is.  A global variable rarely used may deserve a long name, maxphysaddr say.  An array index used on every line of a loop needn't be named any more elaborately than i.  Saying index or elementnumber is more to type (or calls upon your text editor) and obscures the details of the computation.  When the variable names are huge, it's harder to see what's going on.  This is partly a typographic issue; consider
        for(i = 0; i < 100; i++)
                array[i] = 0;
vs.
        for(elementnumber = 0; elementnumber < 100; elementnumber++)
                array[elementnumber] = 0;
The problem gets worse fast with real examples.  Indices are just notation, so treat them as such.

      Pointers also require sensible notation.  np is just as mnemonic as nodepointer if you consistently use a naming convention from which np means ``node pointer'' is easily derived.  More on this in the next essay.
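A minimal sketch of such a convention, with a hypothetical Node type: once the convention is established, np says ``node pointer'' as clearly as any longer name would.

```c
#include <stddef.h>

/* A hypothetical list node; by convention, np is a pointer to a Node. */
typedef struct Node Node;
struct Node {
        int  val;
        Node *next;
};

/* Count the nodes in a list; np walks the list. */
int
nodecount(Node *np)
{
        int n;

        for(n = 0; np != NULL; np = np->next)
                n++;
        return n;
}
```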

      As in all other aspects of readable programming, consistency is important in naming.  If you call one variable maxphysaddr, don't call its cousin lowestaddress.

      Finally, I prefer minimum-length but maximum-information names, and then let the context fill in the rest.  Globals, for instance, typically have little context when they are used, so their names need to be relatively evocative.  Thus I say maxphysaddr (not MaximumPhysicalAddress) for a global variable, but np not NodePointer for a pointer locally defined and used.  This is largely a matter of taste, but taste is relevant to clarity.

      I eschew embedded capital letters in names; to my prose-oriented eyes, they are too awkward to read comfortably.  They jangle like bad typography.

The use of pointers.

            C is unusual in that it allows pointers to point to anything.  Pointers are sharp tools, and like any such tool, used well they can be delightfully productive, but used badly they can do great damage (I sunk a wood chisel into my thumb a few days before writing this).  Pointers have a bad reputation in academia, because they are considered too dangerous, dirty somehow.  But I think they are powerful notation, which means they can help us express ourselves clearly.

      Consider: When you have a pointer to an object, it is a name for exactly that object and no other.  That sounds trivial, but look at the following two expressions:

        np
        node[i]
The first points to a node, the second evaluates to (say) the same node.  But the second form is an expression; it is not so simple.  To interpret it, we must know what node is, what i is, and that i and node are related by the (probably unspecified) rules of the surrounding program.  Nothing about the expression in isolation can show that i is a valid index of node, let alone the index of the element we want.  If i and j and k are all indices into the node array, it's very easy to slip up, and the compiler cannot help.  It's particularly easy to make mistakes when passing things to subroutines: a pointer is a single thing; an array and an index must be believed to belong together in the receiving subroutine.
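A sketch of that difference in a subroutine interface, with a hypothetical Node type: the array-and-index version takes two arguments that must be believed to belong together; the pointer version takes one.

```c
typedef struct Node Node;
struct Node {
        int type;
};

/* Array and index: two things that must agree. */
int
typeat(Node node[], int i)
{
        return node[i].type;
}

/* Pointer: one thing, one object. */
int
nodetype(Node *np)
{
        return np->type;
}
```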

      An expression that evaluates to an object is inherently more subtle and error-prone than the address of that object.  Correct use of pointers can simplify code:

        parent->link[i].type
vs.
        lp->type.
If we want the next element's type, it's
        parent->link[++i].type
or
        (++lp)->type.
i advances but the rest of the expression must stay constant; with pointers, there's only one thing to advance.

      Typographic considerations enter here, too.  Stepping through structures using pointers can be much easier to read than with expressions: less ink is needed and less effort is expended by the compiler and computer.  A related issue is that the type of the pointer affects how it can be used correctly, which allows some helpful compile-time error checking that array indices cannot share.  Also, if the objects are structures, their tag fields are reminders of their type, so

             np->left
is sufficiently evocative; if an array is being indexed the array will have some well­chosen name and the expression will end up longer:
             node[i].left.
Again, the extra characters become more irritating as the examples become larger.

      As a rule, if you find code containing many similar, complex expressions that evaluate to elements of a data structure, judicious use of pointers can clear things up.  Consider what

        if(goleft)
             p->left=p->right->left;
        else
             p->right=p->left->right;
would look like using a compound expression for p.  Sometimes it's worth a temporary variable (here p) or a macro to distill the calculation.
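For comparison, a sketch of the same fragment with the tree kept in an array and p replaced by an index t (the tnode type and names are hypothetical):

```c
/* Tree nodes kept in an array; indices play the role of pointers. */
struct tnode {
        int left;       /* index of left child */
        int right;      /* index of right child */
};

void
relink(struct tnode tree[], int t, int goleft)
{
        if(goleft)
                tree[t].left = tree[tree[t].right].left;
        else
                tree[t].right = tree[tree[t].left].right;
}
```

Every step of the computation must be spelled out as a subscript; the pointer version says the same thing with less ink.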

Procedure names

           Procedure names should reflect what they do; function names should reflect what they return.  Functions are used in expressions, often in things like if's, so they need to read appropriately.
        if(checksize(x))
is unhelpful because we can't deduce whether checksize returns true on error or non-error; instead
        if(validsize(x))
makes the point clear and makes a future mistake in using the routine less likely.
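A sketch of what such a predicate might look like; the limit and its value are hypothetical:

```c
/* validsize returns true when n is an acceptable size.
 * MAXSIZE is a hypothetical limit, for illustration only. */
enum { MAXSIZE = 4096 };

int
validsize(long n)
{
        return n > 0 && n <= MAXSIZE;
}
```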

Comments

           A delicate matter, requiring taste and judgement.  I tend to err on the side of eliminating comments, for several reasons.  First, if the code is clear, and uses good type names and variable names, it should explain itself.  Second, comments aren't checked by the compiler, so there is no guarantee they're right, especially after the code is modified.  A misleading comment can be very confusing.  Third, the issue of typography: comments clutter code.

      But I do comment sometimes.  Almost exclusively, I use them as an introduction to what follows.  Examples: explaining the use of global variables and types (the one thing I always comment in large programs); as an introduction to an unusual or critical procedure; or to mark off sections of a large computation.

      There is a famously bad comment style:

        i=i+1;           /* Add one to i */
and there are worse ways to do it:
        /**********************************
         *                                *
         *          Add one to i          *
         *                                *
         **********************************/

                       i=i+1;
Don't laugh now, wait until you see it in real life.

      Avoid cute typography in comments, avoid big blocks of comments except perhaps before vital sections like the declaration of the central data structure (comments on data are usually much more helpful than on algorithms); basically, avoid comments.  If your code needs a comment to be understood, it would be better to rewrite it so it's easier to understand. Which brings us to

Complexity

            Most programs are too complicated - that is, more complex than they need to be to solve their problems efficiently.  Why? Mostly it's because of bad design, but I will skip that issue here because it's a big one.  But programs are often complicated at the microscopic level, and that is something I can address here.

      Rule 1.  You can't tell where a program is going to spend its time.  Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

      Rule 2.  Measure.  Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.

      Rule 3.  Fancy algorithms are slow when n is small, and n is usually small.  Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy.  (Even if n does get big, use Rule 2 first.)   For example, binary trees are always faster than splay trees for workaday problems.

      Rule 4.  Fancy algorithms are buggier than simple ones, and they're much harder to implement.  Use simple algorithms as well as simple data structures.

      The following data structures are a complete list for almost all practical programs:

array 
linked list 
hash table 
binary tree
Of course, you must also be prepared to collect these into compound data structures.  For instance, a symbol table might be implemented as a hash table containing linked lists of arrays of characters.
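A sketch of that symbol table, with hypothetical names and sizes: the hash table's buckets are linked lists, and each list element carries an array of characters.

```c
#include <stdlib.h>
#include <string.h>

enum { NHASH = 128 };           /* hypothetical table size */

typedef struct Sym Sym;
struct Sym {
        char *name;             /* array of characters */
        Sym  *next;             /* next symbol in this bucket's list */
};

static Sym *symtab[NHASH];      /* the hash table */

static unsigned
hash(const char *s)
{
        unsigned h;

        for(h = 0; *s != '\0'; s++)
                h = h*31 + (unsigned char)*s;
        return h % NHASH;
}

/* Return the symbol for name, installing it if absent. */
Sym*
lookup(const char *name)
{
        Sym *sp;
        unsigned h;

        h = hash(name);
        for(sp = symtab[h]; sp != NULL; sp = sp->next)
                if(strcmp(sp->name, name) == 0)
                        return sp;
        sp = malloc(sizeof *sp);
        sp->name = malloc(strlen(name) + 1);
        strcpy(sp->name, name);
        sp->next = symtab[h];
        symtab[h] = sp;
        return sp;
}
```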

      Rule 5.  Data dominates.  If you've chosen the right data structures and organized things well, the algorithms will almost always be self­evident.  Data structures, not algorithms, are central to programming.  (See Brooks p. 102.)

      Rule 6.  There is no Rule 6.

Programming with data.

           Algorithms, or details of algorithms, can often be encoded compactly, efficiently and expressively as data rather than, say, as lots of if statements.  The reason is that the complexity of the job at hand, if it is due to a combination of independent details, can be encoded.  A classic example of this is parsing tables, which encode the grammar of a programming language in a form interpretable by a fixed, fairly simple piece of code.  Finite state machines are particularly amenable to this form of attack, but almost any program that involves the `parsing' of some abstract sort of input into a sequence of some independent `actions' can be constructed profitably as a data-driven algorithm.
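As a small illustration (hypothetical, not from the text): a recognizer for unsigned decimal integers, with the transitions held in a table rather than in a chain of ifs.

```c
#include <ctype.h>

enum { START, INNUM, BAD, NSTATE };     /* states */
enum { DIGIT, OTHER, NCLASS };          /* input classes */

/* The whole algorithm is this table: newstate[state][class]. */
static const int newstate[NSTATE][NCLASS] = {
        /* START */ { INNUM, BAD },
        /* INNUM */ { INNUM, BAD },
        /* BAD   */ { BAD,   BAD },
};

/* isnumber returns true if s is a non-empty string of digits. */
int
isnumber(const char *s)
{
        int state;

        state = START;
        for(; *s != '\0'; s++)
                state = newstate[state][isdigit((unsigned char)*s) ? DIGIT : OTHER];
        return state == INNUM;
}
```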

      Perhaps the most intriguing aspect of this kind of design is that the tables can sometimes be generated by another program - a parser generator, in the classical case. As a more earthy example, if an operating system is driven by a set of tables that connect I/O requests to the appropriate device drivers, the system may be `configured' by a program that reads a description of the particular devices connected to the machine in question and prints the corresponding tables.

      One of the reasons data-driven programs are not common, at least among beginners, is the tyranny of Pascal.  Pascal, like its creator, believes firmly in the separation of code and data.  It therefore (at least in its original form) has no ability to create initialized data.  This flies in the face of the theories of Turing and von Neumann, which define the basic principles of the stored-program computer.  Code and data are the same, or at least they can be.  How else can you explain how a compiler works? (Functional languages have a similar problem with I/O.)

Function pointers

             Another result of the tyranny of Pascal is that beginners don't use function pointers.  (You can't have function-valued variables in Pascal.) Using function pointers to encode complexity has some interesting properties.

      Some of the complexity is passed to the routine pointed to.  The routine must obey some standard protocol - it's one of a set of routines invoked identically - but beyond that, what it does is its business alone.  The complexity is distributed.

      There is this idea of a protocol, in that all functions used similarly must behave similarly.  This makes for easy documentation, testing, growth and even making the program run distributed over a network - the protocol can be encoded as remote procedure calls.

      I argue that clear use of function pointers is the heart of object-oriented programming.  Given a set of operations you want to perform on data, and a set of data types you want to respond to those operations, the easiest way to put the program together is with a group of function pointers for each type.  This, in a nutshell, defines class and method.  The O-O languages give you more of course - prettier syntax, derived types and so on - but conceptually they provide little extra.
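A sketch of the idea in plain C, with a hypothetical Shape type: the function pointer in the structure is the method, and every shape obeys the same protocol.

```c
typedef struct Shape Shape;
struct Shape {
        double w, h;
        double (*area)(Shape*);         /* the "method" */
};

static double
rectarea(Shape *sp)
{
        return sp->w * sp->h;
}

static double
triarea(Shape *sp)
{
        return sp->w * sp->h / 2;
}

/* Callers go through the pointer; they need not know which shape it is. */
double
area(Shape *sp)
{
        return sp->area(sp);
}
```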

      Combining data-driven programs with function pointers leads to an astonishingly expressive way of working, a way that, in my experience, has often led to pleasant surprises. Even without a special O-O language, you can get 90% of the benefit for no extra work and be more in control of the result.  I cannot recommend an implementation style more highly.  All the programs I have organized this way have survived comfortably after much development - far better than with less disciplined approaches.  Maybe that's it: the discipline it forces pays off handsomely in the long run.

Include files

            Simple rule: include files should never include include files.  If instead they state (in comments or implicitly) what files they need to have included first, the problem of deciding which files to include is pushed to the user (programmer) but in a way that's easy to handle and that, by construction, avoids multiple inclusions.  Multiple inclusions are a bane of systems programming.  It's not rare to have files included five or more times to compile a single C source file.  The Unix /usr/include/sys stuff is terrible this way.

      There's a little dance involving #ifdef's that can prevent a file being read twice, but it's usually done wrong in practice - the #ifdef's are in the file itself, not the file that includes it.  The result is often thousands of needless lines of code passing through the lexical analyzer, which is (in good compilers) the most expensive phase.
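Done in the right place, the dance looks like this in the including file (foo.h and FOO_H_INCLUDED are hypothetical names; foo.h would simply define FOO_H_INCLUDED and contain no guard of its own):

```c
/* In the file that does the including, not in foo.h itself;
 * foo.h is never even opened a second time. */
#ifndef FOO_H_INCLUDED
#include "foo.h"
#endif
```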

      Just follow the simple rule.

