I’ve been reading SICP (Structure and Interpretation of Computer Programs) recently, and I feel that I’ve benefited a lot from it, so I’m going to summarize and share some of what I’ve learned.SICP is very comprehensive, and the content includes functional programming, hierarchical and abstraction of data structures, object-oriented, infinite streams, meta-cyclic interpreters, inert value, non-deterministic programming, logic programming, assembly language and machines, compilation principles, and more. The theme of this series of articles is continuation, and the main topics may include:
- Scheme language
- Scheme meta loop interpreter
- The magic of `call/cc'.
call/cc
via the CPS interpreter.call/cc
via CPS transforms.- …
I’ve been reading the last chapter recently and will see about updating some of the content when I finish the book. The basis of this content is the Scheme language, so we’ll start with an introduction to the Scheme language. The main purpose of this article is to let people who don’t know anything about Scheme understand the next few articles, so we won’t get into the details. If you want to know more about it, you can read the relevant documentation.
1 Characteristics of Scheme
Scheme is a dialect of Lisp. Lisp is the second oldest language in the world (Fortran being the first) and has many dialects. These dialects share a common feature - they are based on S-expressions.
S-expressions can be atomic expressions (atom) or lists. Atomic expressions can be numbers, such as 1
, 42
, 3.14
; strings, such as "hello"
; booleans, such as #t
, #f
; or just symbols, such as a
, if
, add
. A list, on the other hand, is a number of S-expressions in a pair of parentheses, separated by spaces:
|
|
Some examples of S-expressions are given below:
The first three S-expressions are atomic expressions. (add 1 2)
is a list of length 3, where the three elements are the symbol add
, the number 1, and the number 2. (display "Hello world")
is a list of length 2, where the first element is the symbol display
, and the second element is the string "Hello world"
. (list (list 1 2) (list "a" "b"))
is a list of length 3, the three elements are the symbol list
, list (list 1 2)
, and list (list "a" "b")
.
Scheme is composed entirely of S-expressions. In Scheme, the first element of a compound expression is used as the type of the expression, and the remaining elements are used as the parameters of the expression.
The type of the expression determines the semantics of the expression and the meaning of its arguments. For example, an if expression takes three arguments, the first of which is a condition, the second of which is an expression that is executed when the condition is true, and the third of which is an expression that is executed when the condition is false. Since S-expressions can be nested arbitrarily, they can be used to construct arbitrarily complex code. The following is an example of Scheme code:
|
|
You can see that S-expressions are nested in layers, forming a tree structure, which is actually the syntax tree. This means that the language actually writes out the syntax tree explicitly. Later we can see the benefits of this approach: the code can be directly represented as a data structure, and the code is extremely easy to parse and compile.
2 Programming Environment
Scheme, as a dialect of Lisp, has many dialects of its own, such as Chez Scheme, MIT Scheme, Racket, and so on. The environment we use is Racket, which is powerful and easy to use. We can download the latest version from its official website. Racket comes with an IDE called DrRacket, which we can use to learn to write Scheme.
Open DrRacket and you are ready to start Scheme programming. The first line of the program should declare the language #lang racket
. Once you have edited the program, click “Run” to execute the code.
Some people may not be used to a language that is full of parentheses, and it is cumbersome to read code that requires counting parentheses. But if the code is properly indented and aligned, the nesting relationships are readily apparent. We can make the parameters on a separate line and indent them by two spaces relative to the type:
Or the first argument is placed on the same line as the type, and subsequent arguments are aligned with the first argument:
If the first parameter is special, you can also have the first parameter on the same line as the type, with the remaining parameters on a separate line and indented by two spaces.
These are basically the three indentation styles. With DrRacket, indentation is automatic; you don’t need to worry about parentheses when reading code, you can just look at the code indentation, just like Python.
3 Basic Expressions
A high-level language must have these three elements:
- primitive expressions: the simplest, most basic elements that a language provides.
- methods of combination: methods for combining atomic expressions into composite elements.
- methods of abstraction: naming composite elements so that they can be manipulated as a whole.
We say assembly language is not a high-level language because it has very weak combinatorial and abstraction capabilities. For example, add $42 %eax
can represent eax + 42
, but to represent (eax + 42) * 3
, you have to write two instructions, because the language has no ability to nest combinations. As for abstraction, the function in assembly (subroutine, to be precise) is more like a goto, whereas Scheme is a very high-level language because it has very strong combinatorial and abstraction capabilities, as we’ll see later.
3.1 Atomic expressions
There are so many kinds of atomic expressions:
- Numbers. Can be integers
10
,-12
; floating point numbers3.14
; rational numbers1/2
,-3/5
, in the form of two integers separated by/
, taking care that there is no space in between, as this is an atom. - String. Identified by double quotes, such as
"Hello world"
. - Boolean. There are two types,
#t
and#f
. - Symbol. Also known as “variables”, or identifiers. For example,
pi
, with a value of3.141592653589793
;sqrt
, a built-in function. Unlike many languages, Scheme’s symbols are not limited to letters, numbers, and underscores; for example,reset!
,*important*
,+
, and1st-item
are all valid symbols.
3.2 Compound Expressions
There are two types of compound expressions in Scheme, special forms and function calls.The syntax for Scheme function calls is (function arg1 arg2 ...)
. Let the first element of an S-expression be a function. The remaining elements are function arguments. For example, the following expressions are function calls:
Here sqrt
, +
, and *
are function names that perform square root, addition, and multiplication operations, respectively. Unlike most other languages, Scheme has no operators; addition, subtraction, multiplication, division, comparison, etc. are all functions.
It may seem strange to beginners, but this syntax has great benefits. First of all, the relationship between expressions is clear and unambiguous, so the programmer does not need to remember the priority of operators, whether to combine left or right, and the program is easy to parse and compile. The usage is uniform, unlike C, where multiplication is a * b
and exponentiation is pow(a, b)
. You don’t need the complex operator overloading rules of C++, you can just define a function named +
, *
.
Some common functions and calls are given below:
|
|
The semicolon ;
is used as a one-line comment in Scheme.
Looking at this, you might think that the expression (if (> a b) a b)
is also a call to an if
function. But it’s not. When you evaluate a function, you evaluate each argument in turn, and then you call the function. In the case of if
, when (> a b)
is true, only a
should be evaluated, not b
. Conversely, only b
should be evaluated. Therefore if
cannot be a function, it should be a special form.
S An expression is like a representation of a syntax tree, and a special form is a particular syntax that defines which child nodes the syntax has and what they mean. Some common special forms and how to use them are given below.
|
|
4 Defining functions
The lambda
special form creates a function of the form (lambda (arg1 arg2 ...) body ...)
. where (arg1 arg2 ...)
is the argument list, and the remaining body ...
is the body of the function, which can consist of multiple expressions, and the return value of the function is the value of the last expression. Functions are usually defined in conjunction with define
, an example of which is given below.
This function implements Euclid’s algorithm to find the greatest common divisor of two integers a
and b
. The argument list is (a b)
, and the body of the function consists of a single if
expression. The if
expression checks if b
is 0 and returns a
if b
is 0, otherwise it recursively calls itself (gcd b (remainder a b))
. Now we can call gcd
:
4.1 Environment
Functions can be nested. For example, to define the function prime?
to determine whether a number is prime, we look for an integer greater than 1 that divides it. If we can’t find an integer that divides it, then it is a prime number.
The environment in which prime?
is located is called the global environment, and the environment in which iter
is located is the internal environment of prime?
. When define
is executed, it adds a variable to the environment it is in. When a function is called, a new environment is created that inherits the environment in which the function was defined; the function’s arguments are instantiated in the new environment. To find the value of an expression, the value of the variable is looked for in the current environment, and if it is not found, it is looked for in the higher environment, and so on. Thus to examine the behavior of a function, two elements must be considered: the code for the function, and the environment in which the function resides. Together, these two elements are sometimes called “c”.
When we execute (prime? 11)
in the global environment, there are these steps:
- Find the
prime?
variable in the global environment, discover that it is a function, call it. - Read the Env field of the closure and find that the environment of this this function is the global environment G, so create a new environment that inherits G, denoted E1.
- Instantiate the arguments in E1 with
n: 11
. - Start executing the code for
prime?
. - Execute
(define (iter i) ...)
, adding the variableiter
to E1.iter
is a function in an environment pointing to E1. - Execute
(iter 2)
, finditer
in E1, find that it is a function, call it. - Finds that the environment of this function is E1, so creates a new environment that inherits from E1, denoted E2.
- Instantiate arguments in E2 with
i: 2
. - Start executing the code for
iter
. - Executed to
(> (* i i) n)
:- Looked for variable
*
in E2 and couldn’t find it; then looked again in E1 and still couldn’t find it; finally found*
as a built-in function in G. - Looked for variable
i
in E2 and foundi: 2
. - Looked for variable
n
in E2, couldn’t find it; then looked in E1, foundn: 11
- …
- Looked for variable
- …
- Execute
(iter (+ i 1))
to findi: 2
in E2 anditer
in E1. Calliter
. - Finds that
iter
is in environment E1, so creates a new environment E3 that inherits from E1. - Instantiate parameters in E3 with
i: 3
. - Start executing the code for
iter
, and so on.
This is how the Scheme environment works. In the next article we will implement this mechanism to realize a Scheme interpreter.
Scheme functions are first class citizens and can be passed as arguments or returned as values. When a function is passed, its environment is also passed. Example:
The f
function returns a function that holds the environment created when f
was called. So we can use this function to get the values passed in when f
was called. We’ll see an interesting application of this mechanism later.
4.2 let
and let*
When we need an intermediate variable, e.g. to calculate $5(3x^2+1)^2 + 4(3x^2+1)$, we need an intermediate variable $t=3x^2+1$ in order to avoid double-counting, and we will use the let
special form for this purpose.
The syntax of let
has the following format:
It’s really a syntactic sugar, which is equivalent to creating a function using lambda
and then calling it immediately:
let
has the drawback that you can’t define the value of a later variable in a way that references the earlier variable, i.e. (let ([a 1] [b (+ a 1)]) b)
is illegal. So we have let*
:
It is also a syntactic sugar, equivalently:
let*
is implemented by nesting let
s, thus allowing references to preceding variables.
5 Data Structures
While the previous section covered combining and abstracting code, this section covers data structures. This series of articles will only use very simple data structures.
5.1 Ordered pairs and lists
To construct a composite structure, we use cons
to construct ordered pairs (pairs). car
gets the first element of the ordered pair and cdr
gets the second element of the ordered pair.
Ordered pairs can be arbitrarily nested, as in (cons (cons 1 2) (cons 3 4))
. Because they can be arbitrarily nested, it is theoretically possible to construct arbitrarily complex data structures from ordered pairs alone. If the ordered pairs are concatenated sequentially, a linked list is obtained:
|
|
Thus for lists, car
is used to get the first element of the list, cdr
is used to get the remaining elements of the list, and cons
inserts an element at the head of the list.
5.2 Quote
You may be wondering what the single quotes '
in '()
and '(2 3 4)
mean. Recall from Section 1 that S-expressions can be atomic expressions or lists. Yes, the list we’re talking about here is the same thing as the list created by the list
function. That is, the S expression (1 2 3 4)
is itself a list. But this expression is interpreted by Scheme as a call to function 1, passing in arguments 2, 3, 4. To represent the list itself, we use the quote special form. quote takes an S-expression as an argument, and instead of evaluating the expression, it returns it. Here are some examples of its use.
|
|
Since quote is so common, we have a simplified form. Preceding any S-expression with single quotes '
means quote the S-expression.
This is very important - it means that code can be parsed as data. This is a capability not available in other non-Lisp-based languages. We’ll be using it a lot in the next article, so let’s look at some simple examples first:
Here cadr
and caddr
are shortcut functions. (cadr x)
is equivalent to (car (cdr x))
, and (caddr x)
is equivalent to (car (cdr (cdr x)))
. This naming is also easy to remember: the a
and d
distributions in the middle indicate that car
and cdr
are called in turn.
We know that lists consist of ordered pairs.S expressions use parentheses to represent lists, so how can they represent more basic elements like ordered pairs? We can experiment:
|
|
If two elements in parentheses are separated by .
, it means that it is an ordered pair. But if the second element of the ordered pair is enclosed in parentheses, the .
and the parentheses of the second element are omitted:
Thus (cons 1 (cons 2 (cons 3 (cons 4 '()))))
) results in '(1 2 3 4)
, which looks like a list now. The advantage of this syntax is that it reflects the fact that lists are made up of ordered pairs (which can be written explicitly as (+ . (2 . (3 . ())))
), but also makes the list look comfortable (typically written (+ 2 3)
).
5.3 Quasiquote and unquote
Scheme also provides a pair of special forms for constructing specific lists: quasiquote
and unquote
. They also take an S-expression as an argument. (quasiquote exp)
can be abbreviated as `exp, and (unquote exp)
can be abbreviated as ,exp
. Like quote
, quasiquote
returns the S-expression as is, but evaluates the unquote
part of it.
A similar syntax is unquote-splicing
, which takes a list as an argument, (unquote-splicing list)
is abbreviated as ,@list
. It evaluates and expands the list:
5.4 Common Functions
Here are some common functions for manipulating lists.
pair?
Determines whether a list is an ordered pair.
list?
Determines if it is a list.
symbol?
Determines if it is a symbol.
null?
Determines if the list is empty.
memq
Finds the ordered pair in the list where car is equal to the given value.
assoc
Assuming that the elements of the list are all ordered pairs, find the element of the ordered pair car that is equal to the given value.
append
Connects two lists.
6 Functional Programming
Scheme advocates functional programming, and in addition to the fact that functions are first-class citizens, it also advocates “no assignment unless necessary”. So far, we haven’t introduced assignment statements. For imperative programming, you can’t even write a finite while loop without an assignment statement. In functional programming, however, we’re comfortable with all kinds of recursion.
Although it is not possible to change a variable by assignment, we can change the value of a parameter when we can call a function. One might argue that using recursion is poor performance because it consumes stack space. It’s true that the code above pushes the value of (car items)
onto the stack before calling (sum (cdr items))
, so that the sum can be calculated after sum
returns. But we only need to change the way we write it a little:
We find that the return value of the recursive call (iter (cdr i) (+ s (car i)))
is directly used as the return value of the original function (iter i s)
, and therefore does not need to be pushed onto the stack before being called. This is called tail recursion. Tail recursion is essentially iteration, because the recursive call to iter
is a constant iterative update of the variables i
and s
.
6.1 Accumulate
We just defined a function that sums all the elements. What about the product of all the elements? We can define a product
function.
We find that this function is almost identical to sum
. Both functions, given an initial value, perform some operation with the elements of the list in turn, and then iterate through the list in turn; the only differences are in the initial value (0 in one case and 1 in the other) and the operation (+
in one case and *
in the other). In Scheme, functions can be passed as values, and both +
and *
are functions. So we can define a generic function that passes initial values and operations as arguments:
6.2 Map
A similar function is map
. map
maps each element of a list to a new value by a given function.
map
also supports passing multiple lists, as in (map proc list1 list2 ...)
. These lists must be of equal length, and the number of lists must equal the number of arguments passed to the function. The elements of list1
are passed to proc
as the first argument, the elements of list2
are passed to proc
as the second element, and so on.
How to implement map
, Scheme supports defining functions with variable arguments. We can define (define (map proc . lists))
, in which case lists
is a list with the remaining arguments. Since (map proc list1 list2)
can also be written as (map proc . (list1 list2))
(see Section 5.2), it is easy to see how this works.
Conversely, if there are n arguments stored in a list, they can be passed to a specified function with apply
:
This allows us to implement the map
function:
6.3 Filter
To filter out matching functions from a list, you can use filter
. It accepts a function that returns a boolean value and a list as arguments, e.g.
We can also implement filter
:
Can you change
map
andfilter
to iterative (tail recursive) form?
7 Assignment
Although functional programming discourages the use of assignments, there are many scenarios in which it would be inconvenient not to use assignments at all, and there are some scenarios in which appropriate use of assignments can improve code performance and simplify some implementations.Scheme performs assignments using the set!
special form, which is used in the format (set! var val)
. set!
evaluates a val
expression and then assigns the value to var
. Example:
Introducing assignment adds a lot of uncertainty to the system. For functions that do not use assignments, passing in a definite argument is bound to result in a definite value, just like a math function. Once assignment is introduced, this is not necessarily the case. You can see the following example:
Here (account 10)
is called twice, passing in the same arguments but returning different values.Section 4.1 mentioned that when we pass a function as a value, the environment in which it resides is also passed. So we can use functions as data structures. The account
above is a function that can also be thought of as a piece of data.
Once constructed, ordered pairs in Racket cannot be modified. We can implement a modifiable ordered pair using a function:
|
|
The above example can be thought of as “object-oriented programming” in Scheme. The functions returned by mcons
are treated as objects, and the actions they perform are determined by the arguments passed to them, hence the message passing style. mcons
can be called a constructor, and the return value of a constructor is called to get a “member”; an expression like (mpair 'mcar)
is similar to mpair.mcar
in Java. The following code shows some usage of mcons
.
8 Summary
This article introduces some of the basics of Scheme, including the construction of S-expressions, the use of common special shapes, function calls and definitions, understanding the environment, ordered pairs and lists, the use of common functions, and assignment operations. These contents are enough to write many Scheme programs. In the next article, we will use Scheme to implement a Scheme interpreter, which implements most of the language features introduced in this article.