Compiler vs. Interpreter


Aphraaa


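This article explores in depth how a compiler works, from the transformation of source code into target code through phases such as lexical analysis, syntax analysis and semantic analysis. It also compares compilers and interpreters in terms of execution efficiency, error handling and memory requirements.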


What is a compiler?

https://hackernoon.com/compilers-and-interpreters-3e354a2e41cf

The simplest definition of a compiler is a program that translates code written in a high-level programming language (like JavaScript or Java) into low-level code (like Assembly or machine code) that can be executed by the computer or by another program such as a virtual machine.

For example, the Java compiler converts Java code to Java bytecode executable by the JVM (Java Virtual Machine). Other examples are V8, the JavaScript engine from Google, which converts JavaScript code to machine code, and GCC, which can convert code written in programming languages like C, C++, Objective-C and Go, among others, to native machine code.

What’s in the black box?

So far we’ve looked at a compiler as a magic black box which contains some spell to convert high-level code to low-level code. Let’s open that box and see what’s inside.

A compiler can be divided into two parts.

The first, generally called the front end, scans the submitted source code for syntax errors, checks (and infers if necessary) the type of each declared variable, and ensures that each variable is declared before use. If there is an error, it provides informative error messages to the user. It also maintains a data structure called the symbol table, which contains information about all the symbols found in the source code. Finally, if no error is detected, another data structure, an intermediate representation of the code, is built from the source code and passed as input to the second part.
The second part, the back end, uses the intermediate representation and the symbol table built by the front end to generate low-level code.

Both the front end and the back end perform their operations in a sequence of phases. Each phase generates a particular data structure from another data structure emitted by the phase before it.

The phases of the front end generally include lexical analysis, syntax analysis, semantic analysis and intermediate code generation, while the back end includes optimization and code generation.

Figure: Structure of a compiler

Lexical Analysis

The first phase of the compiler is the lexical analysis. In this phase, the compiler breaks the submitted source code into meaningful elements called lexemes and generates a sequence of tokens from the lexemes.

A lexeme can be thought of as a uniquely identifiable string of characters in the source programming language: for example, keywords such as if, while or func, identifiers, strings, numbers, operators, or single characters like (, ), . or :.

A token is an object describing a lexeme. Along with the value of the lexeme (the actual string of characters of the lexeme), it contains information such as its type (is it a keyword? an identifier? an operator? …) and the position (line and/or column number) in the source code where it appears.

Figure: Sequence of lexemes generated during lexical analysis

If the compiler encounters a string of characters for which it cannot create a token, it stops and reports an error: for example, when it encounters a malformed string or number, or an invalid character (such as a stray # outside a string or comment in Java).
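To make the lexical-analysis phase concrete, here is a minimal tokenizer sketch. It assumes a tiny language modelled on the sum example used in this article; the token kinds (KEYWORD, IDENT, NUMBER, OP), the keyword set and the `Token` class are hypothetical choices for illustration, not part of any real compiler. Each token records its type, its lexeme and its line/column position, and any character that cannot start a lexeme stops tokenization with an error.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    type: str    # hypothetical token kinds: KEYWORD, IDENT, NUMBER or OP
    value: str   # the lexeme: the actual characters from the source
    line: int    # 1-based line where the lexeme starts
    column: int  # 1-based column where the lexeme starts


KEYWORDS = {"func", "if", "while"}  # assumed keyword set for the toy language

# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=:(){},.]"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
    ("MISMATCH", r"."),  # any other character cannot start a lexeme
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    tokens = []
    line, line_start = 1, 0
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        column = m.start() - line_start + 1
        if kind == "NEWLINE":
            line, line_start = line + 1, m.end()
        elif kind == "MISMATCH":
            raise SyntaxError(f"invalid character {lexeme!r} at {line}:{column}")
        elif kind != "SKIP":
            if kind == "IDENT" and lexeme in KEYWORDS:
                kind = "KEYWORD"  # keywords are identifiers reserved by the language
            tokens.append(Token(kind, lexeme, line, column))
    return tokens


for tok in tokenize("func sum(n: Int): Int = {\n    n * (n + 1) / 2\n}")[:4]:
    print(tok)
```

Calling `tokenize("x # y")` raises a `SyntaxError` pointing at line 1, column 3, mirroring the "invalid character" case described above.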

Syntax Analysis

During syntax analysis, the compiler uses the sequence of tokens generated during lexical analysis to build a tree-like data structure called an Abstract Syntax Tree (AST). The AST reflects the syntactic and logical structure of the program.

Figure: Abstract Syntax Tree generated after syntax analysis

Syntax analysis is also the phase where any syntax errors are detected and reported to the user as informative messages. For instance, if we forget the closing brace } after the definition of the sum function used as the running example, the compiler should return an error stating that a } is missing, and the error should point to the line and column where the } is expected.

If no error is found during this phase, the compiler moves to the semantic analysis phase.
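Recursive descent is one common way to implement syntax analysis. The sketch below assumes a hypothetical grammar that covers only the arithmetic body of the sum function, represents AST nodes as plain tuples such as ("num", 1), ("var", "n") and (op, left, right), and reports a missing closing parenthesis with its column (parentheses stand in for the braces of the example, since this grammar has no blocks).

```python
import re


def tokenize(src):
    """Tiny lexer: returns (kind, text, column) triples, ending with an EOF marker."""
    tokens = []
    for m in re.finditer(r"\d+|[A-Za-z_]\w*|\S", src):
        text = m.group()
        if text.isdigit():
            kind = "NUM"
        elif text[0].isalpha() or text[0] == "_":
            kind = "NAME"
        else:
            kind = "OP"
        tokens.append((kind, text, m.start() + 1))
    tokens.append(("EOF", "", len(src) + 1))
    return tokens


class Parser:
    # Hypothetical grammar:
    #   expr   := term   (("+" | "-") term)*
    #   term   := factor (("*" | "/") factor)*
    #   factor := NUM | NAME | "(" expr ")"
    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        if tok[0] != "EOF":  # never move past the end marker
            self.pos += 1
        return tok

    def expect(self, text):
        _, got, col = self.advance()
        if got != text:
            raise SyntaxError(f"expected {text!r} at column {col}, found {got or 'end of input'!r}")

    def parse(self):
        node = self.expr()
        kind, text, col = self.peek()
        if kind != "EOF":
            raise SyntaxError(f"unexpected {text!r} at column {col}")
        return node

    def expr(self):
        node = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.advance()[1]
            node = (op, node, self.term())  # left-associative binary node
        return node

    def term(self):
        node = self.factor()
        while self.peek()[1] in ("*", "/"):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, text, col = self.advance()
        if kind == "NUM":
            return ("num", int(text))
        if kind == "NAME":
            return ("var", text)
        if text == "(":
            node = self.expr()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected {text or 'end of input'!r} at column {col}")


print(Parser("n * (n + 1) / 2").parse())
# ('/', ('*', ('var', 'n'), ('+', ('var', 'n'), ('num', 1))), ('num', 2))
```

Parsing "n * (n + 1 / 2" instead raises `SyntaxError: expected ')' at column 15, found 'end of input'`.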

Semantic Analysis

During semantic analysis, the compiler uses the AST generated during syntax analysis to check whether the program is consistent with all the rules of the source programming language. Semantic analysis encompasses:

Type inference. If the programming language supports type inference, the compiler will try to infer the type of all untyped expressions in the program. If a type is successfully inferred, the compiler will annotate the corresponding node in the AST with the inferred type information.
Type checking. Here, the compiler checks that all values assigned to variables and all arguments involved in an operation have the correct type. For example, the compiler makes sure that no variable of type String is assigned a Double value, that a value of type Bool is not passed to a function accepting a parameter of type Double, and that we're not trying to divide a String by an Int, as in "Hello" / 2 (unless the language definition allows it).
Symbol management. Along with performing type inference and type checking, the compiler maintains a data structure called the symbol table, which contains information about all the symbols (or names) encountered in the program. The compiler uses the symbol table to answer questions such as: Is this variable declared before use? Are there two variables with the same name in the same scope? What is the type of this variable? Is this variable available in the current scope? And many more.

The output of the semantic analysis phase is an annotated AST and the symbol table.
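A minimal sketch of these three tasks, assuming a hypothetical AST made of Python dicts (so each node can be annotated in place with a "type" key), a four-type system (Int, Double, String, Bool) with no implicit conversions, and Int / Int yielding Int. `SymbolTable` keeps one dict per scope; `check` infers a type for each node, records it on the node, and raises the hypothetical `SemanticError` for type mismatches, undeclared names and duplicate declarations.

```python
class SemanticError(Exception):
    pass


class SymbolTable:
    """Scoped symbol table: a stack of dicts mapping names to types."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, typ):
        if name in self.scopes[-1]:
            raise SemanticError(f"{name!r} is already declared in this scope")
        self.scopes[-1][name] = typ

    def lookup(self, name):
        for scope in reversed(self.scopes):  # innermost scope first
            if name in scope:
                return scope[name]
        raise SemanticError(f"{name!r} is used before it is declared")


LITERAL_TYPES = {int: "Int", float: "Double", str: "String", bool: "Bool"}


# Hypothetical AST node constructors.
def lit(value):
    return {"kind": "lit", "value": value}


def var(name):
    return {"kind": "var", "name": name}


def binop(op, left, right):
    return {"kind": "bin", "op": op, "left": left, "right": right}


def let(name, expr, declared=None):
    return {"kind": "let", "name": name, "expr": expr, "declared": declared}


def check(node, symbols):
    """Infer and check the type of `node`, annotate node["type"], and return it."""
    kind = node["kind"]
    if kind == "lit":
        typ = LITERAL_TYPES[type(node["value"])]
    elif kind == "var":
        typ = symbols.lookup(node["name"])
    elif kind == "bin":
        lt, rt = check(node["left"], symbols), check(node["right"], symbols)
        if lt != rt or lt not in ("Int", "Double"):
            raise SemanticError(f"cannot apply {node['op']!r} to {lt} and {rt}")
        typ = lt
    elif kind == "let":
        typ = check(node["expr"], symbols)  # infer the initializer's type
        if node["declared"] is not None and node["declared"] != typ:
            raise SemanticError(f"cannot assign a {typ} value to {node['name']!r} of type {node['declared']}")
        symbols.declare(node["name"], typ)
    else:
        raise SemanticError(f"unknown node kind {kind!r}")
    node["type"] = typ  # annotate the AST node
    return typ


symbols = SymbolTable()
program = [
    let("n", lit(4), declared="Int"),
    let("s", binop("/", binop("*", var("n"), binop("+", var("n"), lit(1))), lit(2))),
]
for stmt in program:
    check(stmt, symbols)
print(symbols.lookup("s"))  # prints Int
```

Checking `binop("/", lit("Hello"), lit(2))` or `var("x")` against a fresh table raises `SemanticError`, matching the "Hello" / 2 and declared-before-use questions above.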

Intermediate Code Generation

After the semantic analysis phase, the compiler uses the annotated AST to generate intermediate, machine-independent, low-level code. One such intermediate representation is three-address code.

The three-address code (3AC), in its simplest form, is a language in which an instruction is an assignment and has at most 3 operands.

Most instructions in 3AC are of the form a := b <operator> c or a := b.

The drawing above depicts 3AC generated from an annotated AST created during the compilation of the function

func sum(n: Int): Int = {
    n * (n + 1) / 2
}

The intermediate code generation concludes the front end phase of the compiler.
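A minimal sketch of 3AC generation for the body of the sum function. It assumes a hypothetical tuple AST (("num", k), ("var", name) or (op, left, right)) and a naive post-order scheme that introduces a fresh temporary for every operator; the target name id1 is borrowed from the optimization paragraph, and the exact temporaries may differ from the article's drawing.

```python
import itertools


def gen_3ac(ast, target):
    """Emit three-address code for an expression AST, storing the result in `target`."""
    code = []
    temps = itertools.count(1)  # numbers for fresh temporaries t1, t2, ...

    def emit(node):
        # Returns the operand (variable, constant or temporary) holding node's value.
        if node[0] == "num":
            return str(node[1])
        if node[0] == "var":
            return node[1]
        op, left, right = node
        a = emit(left)  # operands first (post-order traversal)
        b = emit(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} := {a} {op} {b}")  # at most three addresses: temp, a, b
        return temp

    result = emit(ast)
    code.append(f"{target} := {result}")
    return code


sum_ast = ("/", ("*", ("var", "n"), ("+", ("var", "n"), ("num", 1))), ("num", 2))
print("\n".join(gen_3ac(sum_ast, "id1")))
```

This prints the four instructions t1 := n + 1, t2 := n * t1, t3 := t2 / 2 and id1 := t3, each of the form a := b <operator> c or a := b.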

Optimization

In the optimization phase, the first phase of the back end, the compiler uses various optimization techniques to improve the generated intermediate code, for example by making it faster or shorter.

For instance, a very simple optimization on the 3AC in the previous example would be to eliminate the temporary assignment t3 := t2 / 2 and assign the value t2 / 2 directly to id1.
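A minimal sketch of that optimization as a pass over 3AC stored as strings of the form `x := a op b` (a representation assumed for illustration). It also assumes temporaries are named t1, t2, ... and folds a temporary into the copy that follows it only when that copy is the temporary's sole use; real optimizers generalise this idea with copy propagation and dead-code elimination.

```python
import re
from collections import Counter

TEMP = re.compile(r"t\d+")  # assumption: compiler temporaries are named t1, t2, ...


def eliminate_temp_copies(code):
    """Fold 'tK := <expr>' followed by 'x := tK' into 'x := <expr>' when tK is read only there."""
    instrs = [line.split(" := ", 1) for line in code]
    # How many times each name is read on the right-hand side of any instruction.
    reads = Counter(operand for _, rhs in instrs for operand in rhs.split())
    result, i = [], 0
    while i < len(instrs):
        dest, rhs = instrs[i]
        if (i + 1 < len(instrs) and TEMP.fullmatch(dest)
                and instrs[i + 1][1] == dest and reads[dest] == 1):
            result.append(f"{instrs[i + 1][0]} := {rhs}")  # assign the expression directly
            i += 2
        else:
            result.append(f"{dest} := {rhs}")
            i += 1
    return result


before = ["t1 := n + 1", "t2 := n * t1", "t3 := t2 / 2", "id1 := t3"]
print("\n".join(eliminate_temp_copies(before)))
```

The result is t1 := n + 1, t2 := n * t1, id1 := t2 / 2: one instruction and one temporary fewer.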

Code Generation

In this last phase, the compiler translates the optimized intermediate code into machine-dependent code: Assembly or any other low-level target language.
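To illustrate this last step, here is a sketch that maps each 3AC instruction onto a hypothetical single-register target machine; LOAD, STORE, ADD, SUB, MUL and DIV with register R1 are invented for illustration and are not a real instruction set. A tiny simulator, `run`, is included only to check that the generated code computes the same value as the source expression, assuming integer operands and integer division. The input is the optimized 3AC of the sum example.

```python
import operator

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}  # hypothetical target ISA
ALU = {"ADD": operator.add, "SUB": operator.sub,
       "MUL": operator.mul, "DIV": operator.floordiv}


def codegen(three_address_code):
    """Translate 3AC lines of the form 'x := a op b' or 'x := a' into toy assembly."""
    asm = []
    for line in three_address_code:
        dest, rhs = line.split(" := ")
        parts = rhs.split()
        asm.append(f"LOAD R1, {parts[0]}")  # bring the first operand into R1
        if len(parts) == 3:
            _, op, operand = parts
            asm.append(f"{OPCODES[op]} R1, {operand}")  # R1 := R1 op operand
        asm.append(f"STORE {dest}, R1")  # write R1 back to memory
    return asm


def run(asm, memory):
    """Simulate the toy machine (used only to check the generated code)."""
    regs = {}

    def value(operand):
        return int(operand) if operand.isdigit() else memory[operand]

    for instr in asm:
        opcode, args = instr.split(" ", 1)
        a, b = (s.strip() for s in args.split(","))
        if opcode == "LOAD":
            regs[a] = value(b)
        elif opcode == "STORE":
            memory[a] = regs[b]
        else:
            regs[a] = ALU[opcode](regs[a], value(b))
    return memory


asm = codegen(["t1 := n + 1", "t2 := n * t1", "id1 := t2 / 2"])
print("\n".join(asm))
print(run(asm, {"n": 4})["id1"])  # 10
```

A real back end would also allocate registers and choose instructions for a specific CPU; a single register keeps the mapping easy to follow.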

Compiler vs. Interpreter

Let’s conclude this article with a note about the difference between compilers and interpreters.

Interpreters and compilers are very similar in structure. The main difference is that an interpreter directly executes the instructions in the source programming language while a compiler translates those instructions into efficient machine code.

An interpreter will typically generate an efficient intermediate representation and immediately evaluate it. Depending on the interpreter, the intermediate representation can be an AST, an annotated AST or a machine-independent low-level representation such as the three-address code.
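To illustrate the difference, here is a minimal sketch that runs the same AST both ways, assuming the hypothetical tuple AST below and treating / as integer division. `interpret` walks the tree on every call, while `compile_to_python` translates it once into Python source and hands that to Python's built-in `compile()`, so later calls never revisit the tree. Python source stands in for machine code here; a real compiler would emit native code or bytecode.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}


def interpret(node, env):
    """Tree-walking interpreter: re-traverses the AST on every evaluation."""
    if node[0] == "num":
        return node[1]
    if node[0] == "var":
        return env[node[1]]
    op, left, right = node
    return OPS[op](interpret(left, env), interpret(right, env))


def compile_to_python(node, params):
    """Translate the AST once into Python source, then build a function from it."""
    def to_src(n):
        if n[0] == "num":
            return str(n[1])
        if n[0] == "var":
            return n[1]
        op, left, right = n
        py_op = "//" if op == "/" else op  # integer division, as in interpret()
        return f"({to_src(left)} {py_op} {to_src(right)})"

    src = f"def compiled({', '.join(params)}):\n    return {to_src(node)}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["compiled"]


sum_ast = ("/", ("*", ("var", "n"), ("+", ("var", "n"), ("num", 1))), ("num", 2))
fast_sum = compile_to_python(sum_ast, ["n"])  # translation happens once, here
print(interpret(sum_ast, {"n": 4}), fast_sum(4))  # 10 10
```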

Difference between Compiler and Interpreter

http://www.c4learn.com/c-programming/compiler-vs-interpreter/

No	Compiler	Interpreter
1	Takes the entire program as input	Takes a single instruction as input
2	Intermediate object code is generated	No intermediate object code is generated
3	Conditional control statements execute faster	Conditional control statements execute slower
4	Memory requirement: more (since object code is generated)	Memory requirement: less
5	The program need not be compiled every time	The high-level program is converted into a lower-level form every time it runs
6	Errors are displayed after the entire program is checked	Errors are displayed for each instruction as it is interpreted (if any)
7	Example: C compiler	Example: BASIC

https://www.programiz.com/article/difference-compiler-interpreter

Interpreter	Compiler
Translates the program one statement at a time.	Scans the entire program and translates it as a whole into machine code.
Takes less time to analyze the source code, but overall execution is slower.	Takes a long time to analyze the source code, but overall execution is comparatively faster.
No intermediate object code is generated, hence it is memory-efficient.	Generates intermediate object code, which further requires linking, hence it requires more memory.
Continues translating the program until the first error is met, at which point it stops. Hence debugging is easy.	Generates error messages only after scanning the whole program. Hence debugging is comparatively hard.
Programming languages like Python and Ruby use interpreters.	Programming languages like C and C++ use compilers.
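The error-reporting rows in both tables can be illustrated with Python's built-in `compile()` and `exec()`. In this sketch, the hypothetical `run_whole_program` translates the entire program before running any of it, so a syntax error on line 3 means nothing runs at all, while `run_statement_by_statement` translates and runs one line at a time and stops at the first bad line, after lines 1 and 2 have already produced output. Note that CPython itself normally compiles a whole file to bytecode before running it; feeding it single lines only mimics the statement-at-a-time model, and the sketch assumes every statement fits on one line.

```python
def run_whole_program(src):
    """Compiler-style: translate the entire program first, then run it."""
    output = []
    code = compile(src, "<program>", "exec")  # a SyntaxError here means nothing has run
    exec(code, {"emit": output.append})
    return output


def run_statement_by_statement(src):
    """Interpreter-style: translate and run one line at a time, stopping at the first error."""
    output = []
    env = {"emit": output.append}  # namespace shared across statements
    for lineno, line in enumerate(src.splitlines(), start=1):
        try:
            exec(compile(line, f"<line {lineno}>", "exec"), env)
        except SyntaxError:
            return output, lineno  # output produced so far, and the failing line
    return output, None


program = "emit(1)\nemit(2)\nemit(3 +)\nemit(4)"
print(run_statement_by_statement(program))  # ([1, 2], 3)
try:
    run_whole_program(program)
except SyntaxError as err:
    print("rejected before running:", err.msg)
```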

https://stackoverflow.com/questions/2377273/how-does-an-interpreter-compiler-work

Compiler characteristics:

spends a lot of time analyzing and processing the program
the resulting executable is some form of machine-specific binary code
the computer hardware interprets (executes) the resulting code
program execution is fast

Interpreter characteristics:

relatively little time is spent analyzing and processing the program
the resulting code is some sort of intermediate code
the resulting code is interpreted by another program
program execution is relatively slow

What is a translator?

An S -> T translator accepts code expressed in source language S, and translates it to equivalent code expressed in another (target) language T.

Examples of translators:

Compilers - translate high-level code to low-level code, e.g. Java -> JVM bytecode
Assemblers - translate assembly language code to machine code, e.g. x86 assembly -> x86 machine code
High-level translators - translate code from one programming language (PL) to another, e.g. Java -> C
Decompilers - translate low-level code to high-level code, e.g. JVM bytecode -> Java
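As a small illustration of the S -> T definition, here is a sketch of a high-level translator between two hypothetical expression languages: S is fully parenthesised prefix notation such as (* n (+ n 1)), and T is parenthesised infix notation such as (n * (n + 1)). Both languages are invented for the example; the point is that the output is equivalent code in the target language, which is not executed.

```python
def tokenize(src):
    # Pad parentheses with spaces so split() yields "(", ")", operators and atoms.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one S expression from the token list (consumed in place) into nested lists."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok != "(":
        return tok  # an atom: a name or a number
    if not tokens or tokens[0] == ")":
        raise SyntaxError("expected an operator after '('")
    op, args = tokens.pop(0), []
    while tokens and tokens[0] != ")":
        args.append(parse(tokens))
    if not tokens:
        raise SyntaxError("missing ')'")
    tokens.pop(0)  # drop the closing ")"
    return [op] + args


def to_infix(node):
    """Emit the equivalent fully parenthesised infix expression in language T."""
    if isinstance(node, str):
        return node
    op, *args = node
    return "(" + f" {op} ".join(to_infix(a) for a in args) + ")"


def translate(src):
    """S -> T: prefix expressions in, infix expressions out."""
    return to_infix(parse(tokenize(src)))


print(translate("(/ (* n (+ n 1)) 2)"))  # ((n * (n + 1)) / 2)
```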

What is an interpreter?

An S interpreter accepts code expressed in language S, and immediately executes that code. It works by fetching, analysing, and executing one instruction at a time.

This is great when the user is entering instructions interactively (think Python) and wants to see the output before typing the next instruction. It is also useful when the program is to be executed only once or needs to be portable.

Interpreting a program is much slower than executing native machine code
Interpreting a high-level language is ~100 times slower
Interpreting an intermediate-level language (such as JVM bytecode) is ~10 times slower
If an instruction is executed repeatedly, it is analysed again every time, which is time-consuming
On the plus side, there is no need to compile the code
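The fetch-analyse-execute cycle, and the cost of re-analysing repeated instructions, can be sketched with a hypothetical mini-language of one instruction per line: set, add, print and jnz (jump to a 1-based line number if a variable is not zero). The interpreter below decodes the text of an instruction every time it reaches it and counts those analyses.

```python
def interpret(program):
    """Fetch, analyse and execute one instruction at a time; count every analysis."""
    lines = program.strip().splitlines()
    env, output = {}, []
    analyses = 0  # how many times an instruction was decoded
    pc = 0        # index of the next instruction to fetch
    while pc < len(lines):
        text = lines[pc]          # fetch
        op, *args = text.split()  # analyse: decode the text again on every visit
        analyses += 1
        pc += 1
        if op == "set":           # execute
            env[args[0]] = int(args[1])
        elif op == "add":
            env[args[0]] += int(args[1])
        elif op == "print":
            output.append(env[args[0]])
        elif op == "jnz":         # jump to a 1-based line if the variable is not zero
            if env[args[0]] != 0:
                pc = int(args[1]) - 1
        else:
            raise SyntaxError(f"line {pc}: unknown instruction {op!r}")
    return output, analyses


countdown = """
set x 3
print x
add x -1
jnz x 2
"""
print(interpret(countdown))  # ([3, 2, 1], 10)
```

The program has only four instructions, but the loop body is analysed on every pass, so the interpreter performs ten analyses; a compiler would analyse each instruction once.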

Differences

Behaviour

A compiler translates source code to machine code, but does not execute the source or object code.
An interpreter executes source code one instruction at a time, but does not translate the source code.

Performance

A compiler takes quite a long time to translate the source program to native machine code, but subsequent execution is fast
An interpreter starts executing the source program immediately, but execution is slow

Interpretive compilers

An interpretive compiler is a good compromise between compilers and interpreters. It translates the source program into virtual machine (VM) code, which is then interpreted.

An interpretive compiler combines fast translation with moderately fast execution, provided that:

VM code is lower-level than the source language, but higher-level than native machine code
VM instructions have simple formats (can be quickly analysed by an interpreter)

Example: the JDK provides an interpretive compiler for Java: javac translates Java source code into JVM bytecode, which the JVM then executes.
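The two halves of an interpretive compiler can be sketched in a few lines: `compile_to_vm` translates an expression AST (a hypothetical tuple format) once into code for an invented stack VM, and `run_vm` interprets that code. The JVM is also a stack machine, but the PUSH, LOAD, ADD, SUB, MUL and DIV instructions here are illustrative and are not real JVM opcodes; integer division is assumed.

```python
import operator

BINARY = {"ADD": operator.add, "SUB": operator.sub,
          "MUL": operator.mul, "DIV": operator.floordiv}
OPCODE_FOR = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def compile_to_vm(node, code=None):
    """Translate an expression AST into instructions for a hypothetical stack VM."""
    if code is None:
        code = []
    if node[0] == "num":
        code.append(("PUSH", node[1]))  # push a constant
    elif node[0] == "var":
        code.append(("LOAD", node[1]))  # push a variable's value
    else:
        op, left, right = node
        compile_to_vm(left, code)       # operands first...
        compile_to_vm(right, code)
        code.append((OPCODE_FOR[op], None))  # ...then the operator (postfix order)
    return code


def run_vm(code, env):
    """Interpret VM code: each instruction is a simple (opcode, argument) pair, cheap to decode."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(env[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[opcode](left, right))
    return stack.pop()


sum_ast = ("/", ("*", ("var", "n"), ("+", ("var", "n"), ("num", 1))), ("num", 2))
vm_code = compile_to_vm(sum_ast)  # translate once...
print([run_vm(vm_code, {"n": n}) for n in range(1, 6)])  # ...interpret many times: [1, 3, 6, 10, 15]
```

The VM code is lower-level than the source (no nesting, no precedence rules) yet higher-level than native code, and every instruction has the same simple format, which matches the two conditions listed above.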

A compiler translates the entire program before it is run.

An interpreter translates one statement into machine language, executes it, and proceeds to the next statement.

Examples with Languages

Interpreted

Python
Ruby
PHP
Java (compiled to bytecode, then run by the JVM)
Perl
R
PowerShell

Compiled

C
C++
C#
Objective-C
Swift
Fortran

...

Java is Both a Compiled and Interpreted Language

https://techwelkin.com/compiler-vs-interpreter

When you write a Java program, the javac compiler converts it into bytecode. All Java programs run inside a JVM (this is the secret behind Java being a cross-platform language). The bytecode produced by javac is loaded into JVM memory when you launch the program with the java command, and the JVM executes it there: its interpreter runs the bytecode one instruction at a time, and modern JVMs such as HotSpot also just-in-time (JIT) compile frequently executed code into native machine code. The following flowchart shows how a Java program executes.

Figure: Execution of a Java program. Java is both a compiled and an interpreted language.