Arrays and slices are common data structures in the Go language, and many developers new to Go tend to confuse these two concepts. In addition to arrays, Go introduces another concept - slicing. Slicing has some similarities to arrays, but their differences lead to significant differences in usage. In this section we will introduce the underlying implementation of arrays from the Go compile-time runtime, which will include several common operations for initializing, accessing, and assigning values to arrays.
Overview
An array is a data structure consisting of a collection of elements of the same type. The computer allocates a contiguous piece of memory for the array to hold its elements, and we can use the indexes of the elements in the array to quickly access specific elements. Most common arrays are one-dimensional linear arrays, while multidimensional arrays have more common applications in numerical and graphical computing.
As a basic data type, arrays are usually described in two dimensions, namely the type of elements stored in the array and the maximum number of elements that the array can store. In Go language we tend to use the following representation of array types as shown below.
Go arrays cannot be changed in size after initialization, and arrays that store elements of the same type but different sizes are considered completely different in Go, and are the same type only if both conditions are the same.
The array type during compilation is generated by the above cmd/compile/internal/types.NewArray function, which contains two fields, the element type Elem and the size of the array Bound, which together form the array type, and whether the current array should be initialized on the stack is determined at compile time.
Initialization
Arrays in Go are created in two different ways, either by explicitly specifying the size of the array, or by using […] T to declare the array, and Go will derive the size of the array from the source code during compilation.
The above two declaration methods get exactly the same result during runtime. The latter declaration method is converted to the former one during compilation, which is the compiler’s derivation of the array size, and we will introduce the compiler’s derivation process below.
Upper bound derivation
The two different declarations lead to completely different treatment by the compiler. If we use the first way [10]T, then the type of the variable is extracted as soon as the compilation proceeds to the type checking stage, and the cmd/compile/internal/types.NewArray containing the array size is subsequently created using cmd/compile/internal/types. Array structure containing the array size.
When we declare an array using […] T, the compiler will derive the size of the array in the cmd/compile/internal/gc.typecheckcomplit function.
|
|
This truncated cmd/compile/internal/gc.typecheckcomplit calls cmd/compile/internal/gc.typecheckarraylit to count the number of elements in the array by traversing the elements.
So we can see that […] T{1, 2, 3} and [3]T{1, 2, 3} are exactly equivalent at runtime, […] T initialization is just a kind of syntactic sugar provided by the Go language to reduce the workload when we don’t want to count the number of elements in the array.
Statement conversion
For an array of literals, depending on the number of array elements, the compiler does two different optimizations in the cmd/compile/internal/gc.anylit function, which is responsible for initializing the literals.
- when the number of elements is less than or equal to 4, the elements of the array will be placed directly on the stack.
- when the number of elements is greater than 4, the elements of the array are placed in the static area and removed at runtime.
When the array has less than or equal to four elements, cmd/compile/internal/gc.fixedlit takes care of converting [3]{1, 2, 3} to a more primitive statement before the function is compiled.
|
|
When the number of elements in the array is less than or equal to four and the kind received by the cmd/compile/internal/gc.fixedlit function is initKindLocalCode, the above code splits the original initialization statement [3]int{1, 2, 3} into an expression declaring a variable and several assignment expressions. These expressions will complete the initialization of the array:
However, if the current array has more than four elements, cmd/compile/internal/gc.anylit will first get a unique staticname and then call the cmd/compile/internal/gc.fixedlit function to initialize the elements of the array in static storage and assign temporary variables to the array :
|
|
Assuming that the code needs to initialize [5]int{1, 2, 3, 4, 5}, then we can interpret the above procedure as the following pseudo-code.
To summarize, without considering escape analysis, if the number of elements in the array is less than or equal to 4, then all variables will be initialized directly on the stack, and if the array has more than 4 elements, the variables will be initialized in static storage and then copied to the stack.
Access and assignment
Whether on the stack or in static storage, arrays are a sequence of memory spaces in memory. We represent arrays by a pointer to the beginning of the array, the number of elements, and the size of the space occupied by the element type. If we do not know the number of elements in the array, an out-of-bounds access may occur; and if we do not know the size of the element types in the array, there is no way to know how many bytes of data should be taken out at a time, and whichever information is missing, we have no way of knowing what data is actually stored in this contiguous memory space: the
Out-of-bounds array access is a very serious error, and can be determined by static type checking during compilation in Go. cmd/compile/internal/gc.typecheck1 verifies that the index of the accessed array.
|
|
- the error “non-integer array index %v” is reported when accessing an array whose index is a non-integer.
- the error “invalid array index %v (index must be non-negative)” is reported when the index of the accessed array is negative.
- the error “invalid array index %v (out of bounds for %d-element array)” is reported when the index of the accessed array is out of bounds.
Some simple out-of-bounds errors for arrays and strings are found during compilation, e.g. accessing an array directly with an integer or constant; however, when accessing an array or string with a variable, the compiler cannot detect the error in advance, and we need the Go runtime to prevent illegal accesses.
Out-of-bounds operations on arrays, slices and strings found by the Go runtime are triggered by runtime.panicIndex and runtime.goPanicIndex, which trigger a runtime error and cause a crash and exit.
When the array access operation OINDEX successfully passes the compiler’s checks, it is converted into several SSA directives. Suppose we have Go code as shown below, compiling it in the following way will result in the ssa.html file.
The SSA code generated in the start phase is the first version of the intermediate code before the optimization, which is shown below for elem := arr[i].
In this intermediate code, we find that Go generates the instruction IsInBounds to determine the upper limit of the array and the PanicBounds instruction to crash the program if the condition is not met.
|
|
When the array subscript is not out of bounds, the compiler will first obtain the memory address of the array and the subscript to be accessed, calculate the address of the target element using PtrIndex, and finally load the element in the pointer into memory using the Load operation.
Of course, only when the compiler is unable to make a judgment on whether the array subscript is out of bounds or not, it will add the PanicBounds instruction to the runtime to make a judgment.
It not only finds simple out-of-bounds errors in advance during compilation and inserts function calls to check the upper limit of the array, but also ensures that no out-of-bounds occur during runtime with the inserted functions.
The array assignment and update operation a[i] = 2 also generates the SSA generation which calculates the memory address of the current element of the array during generation and then modifies the contents of the current memory address, these assignment statements are converted into SSA code as follows.
The assignment process will first determine the address of the target array, and then get the address of the target element through PtrIndex.
Finally, the Store instruction is used to store the data into the address. From the above SSA code, we can see that the above array addressing and assignment are done at the compilation stage without any runtime involvement.
Summary
Arrays are an important data structure in Go, and understanding its implementation can help us understand the language better. Through the analysis of its implementation, we know that accessing and assigning values to arrays depends on both the compiler and the runtime, and most of its operations are converted into direct memory reads and writes during compilation. panicIndex call to prevent out-of-bounds errors from occurring.