Implementation principles of Go language arrays

Arrays and slices are common data structures in the Go language, and many developers new to Go tend to confuse these two concepts. In addition to arrays, Go introduces another concept - slicing. Slicing has some similarities to arrays, but their differences lead to significant differences in usage. In this section we will introduce the underlying implementation of arrays from the Go compile-time runtime, which will include several common operations for initializing, accessing, and assigning values to arrays.

Overview

An array is a data structure consisting of a collection of elements of the same type. The computer allocates a contiguous piece of memory for the array to hold its elements, and we can use the indexes of the elements in the array to quickly access specific elements. Most common arrays are one-dimensional linear arrays, while multidimensional arrays have more common applications in numerical and graphical computing.

As a basic data type, arrays are usually described in two dimensions, namely the type of elements stored in the array and the maximum number of elements that the array can store. In Go language we tend to use the following representation of array types as shown below.

1
2

[10]int
[200]interface{}

Go arrays cannot be changed in size after initialization, and arrays that store elements of the same type but different sizes are considered completely different in Go, and are the same type only if both conditions are the same.

func NewArray(elem *Type, bound int64) *Type {
	if bound < 0 {
		Fatalf("NewArray: invalid bound %v", bound)
	}
	t := New(TARRAY)
	t.Extra = &Array{Elem: elem, Bound: bound}
	t.SetNotInHeap(elem.NotInHeap())
	return t
}

The array type during compilation is generated by the above cmd/compile/internal/types.NewArray function, which contains two fields, the element type Elem and the size of the array Bound, which together form the array type, and whether the current array should be initialized on the stack is determined at compile time.

Initialization

Arrays in Go are created in two different ways, either by explicitly specifying the size of the array, or by using […] T to declare the array, and Go will derive the size of the array from the source code during compilation.

1
2

arr1 := [3]int{1, 2, 3}
arr2 := [...]int{1, 2, 3}

The above two declaration methods get exactly the same result during runtime. The latter declaration method is converted to the former one during compilation, which is the compiler’s derivation of the array size, and we will introduce the compiler’s derivation process below.

Upper bound derivation

The two different declarations lead to completely different treatment by the compiler. If we use the first way [10]T, then the type of the variable is extracted as soon as the compilation proceeds to the type checking stage, and the cmd/compile/internal/types.NewArray containing the array size is subsequently created using cmd/compile/internal/types. Array structure containing the array size.

When we declare an array using […] T, the compiler will derive the size of the array in the cmd/compile/internal/gc.typecheckcomplit function.

func typecheckcomplit(n *Node) (res *Node) {
	...
	if n.Right.Op == OTARRAY && n.Right.Left != nil && n.Right.Left.Op == ODDD {
		n.Right.Right = typecheck(n.Right.Right, ctxType)
		if n.Right.Right.Type == nil {
			n.Type = nil
			return n
		}
		elemType := n.Right.Right.Type

		length := typecheckarraylit(elemType, -1, n.List.Slice(), "array literal")

		n.Op = OARRAYLIT
		n.Type = types.NewArray(elemType, length)
		n.Right = nil
		return n
	}
	...

	switch t.Etype {
	case TARRAY:
		typecheckarraylit(t.Elem(), t.NumElem(), n.List.Slice(), "array literal")
		n.Op = OARRAYLIT
		n.Right = nil
	}
}

This truncated cmd/compile/internal/gc.typecheckcomplit calls cmd/compile/internal/gc.typecheckarraylit to count the number of elements in the array by traversing the elements.

So we can see that […] T{1, 2, 3} and [3]T{1, 2, 3} are exactly equivalent at runtime, […] T initialization is just a kind of syntactic sugar provided by the Go language to reduce the workload when we don’t want to count the number of elements in the array.

Statement conversion

For an array of literals, depending on the number of array elements, the compiler does two different optimizations in the cmd/compile/internal/gc.anylit function, which is responsible for initializing the literals.

when the number of elements is less than or equal to 4, the elements of the array will be placed directly on the stack.
when the number of elements is greater than 4, the elements of the array are placed in the static area and removed at runtime.

func anylit(n *Node, var_ *Node, init *Nodes) {
	t := n.Type
	switch n.Op {
	case OSTRUCTLIT, OARRAYLIT:
		if n.List.Len() > 4 {
			...
		}

		fixedlit(inInitFunction, initKindLocalCode, n, var_, init)
	...
	}
}

When the array has less than or equal to four elements, cmd/compile/internal/gc.fixedlit takes care of converting [3]{1, 2, 3} to a more primitive statement before the function is compiled.

func fixedlit(ctxt initContext, kind initKind, n *Node, var_ *Node, init *Nodes) {
	var splitnode func(*Node) (a *Node, value *Node)
	...

	for _, r := range n.List.Slice() {
		a, value := splitnode(r)
		a = nod(OAS, a, value)
		a = typecheck(a, ctxStmt)
		switch kind {
		case initKindStatic:
			genAsStatic(a)
		case initKindLocalCode:
			a = orderStmtInPlace(a, map[string][]*Node{})
			a = walkstmt(a)
			init.Append(a)
		}
	}
}

When the number of elements in the array is less than or equal to four and the kind received by the cmd/compile/internal/gc.fixedlit function is initKindLocalCode, the above code splits the original initialization statement [3]int{1, 2, 3} into an expression declaring a variable and several assignment expressions. These expressions will complete the initialization of the array:

var arr [3]int
arr[0] = 1
arr[1] = 2
arr[2] = 3

However, if the current array has more than four elements, cmd/compile/internal/gc.anylit will first get a unique staticname and then call the cmd/compile/internal/gc.fixedlit function to initialize the elements of the array in static storage and assign temporary variables to the array :

func anylit(n *Node, var_ *Node, init *Nodes) {
	t := n.Type
	switch n.Op {
	case OSTRUCTLIT, OARRAYLIT:
		if n.List.Len() > 4 {
			vstat := staticname(t)
			vstat.Name.SetReadonly(true)

			fixedlit(inNonInitFunction, initKindStatic, n, vstat, init)

			a := nod(OAS, var_, vstat)
			a = typecheck(a, ctxStmt)
			a = walkexpr(a, init)
			init.Append(a)
			break
		}

		...
	}
}

Assuming that the code needs to initialize [5]int{1, 2, 3, 4, 5}, then we can interpret the above procedure as the following pseudo-code.

var arr [5]int
statictmp_0[0] = 1
statictmp_0[1] = 2
statictmp_0[2] = 3
statictmp_0[3] = 4
statictmp_0[4] = 5
arr = statictmp_0

To summarize, without considering escape analysis, if the number of elements in the array is less than or equal to 4, then all variables will be initialized directly on the stack, and if the array has more than 4 elements, the variables will be initialized in static storage and then copied to the stack.

Access and assignment

Whether on the stack or in static storage, arrays are a sequence of memory spaces in memory. We represent arrays by a pointer to the beginning of the array, the number of elements, and the size of the space occupied by the element type. If we do not know the number of elements in the array, an out-of-bounds access may occur; and if we do not know the size of the element types in the array, there is no way to know how many bytes of data should be taken out at a time, and whichever information is missing, we have no way of knowing what data is actually stored in this contiguous memory space: the

Out-of-bounds array access is a very serious error, and can be determined by static type checking during compilation in Go. cmd/compile/internal/gc.typecheck1 verifies that the index of the accessed array.

func typecheck1(n *Node, top int) (res *Node) {
	switch n.Op {
	case OINDEX:
		ok |= ctxExpr
		l := n.Left  // array
		r := n.Right // index
		switch n.Left.Type.Etype {
		case TSTRING, TARRAY, TSLICE:
			...
			if n.Right.Type != nil && !n.Right.Type.IsInteger() {
				yyerror("non-integer array index %v", n.Right)
				break
			}
			if !n.Bounded() && Isconst(n.Right, CTINT) {
				x := n.Right.Int64()
				if x < 0 {
					yyerror("invalid array index %v (index must be non-negative)", n.Right)
				} else if n.Left.Type.IsArray() && x >= n.Left.Type.NumElem() {
					yyerror("invalid array index %v (out of bounds for %d-element array)", n.Right, n.Left.Type.NumElem())
				}
			}
		}
	...
	}
}

the error “non-integer array index %v” is reported when accessing an array whose index is a non-integer.
the error “invalid array index %v (index must be non-negative)” is reported when the index of the accessed array is negative.
the error “invalid array index %v (out of bounds for %d-element array)” is reported when the index of the accessed array is out of bounds.

Some simple out-of-bounds errors for arrays and strings are found during compilation, e.g. accessing an array directly with an integer or constant; however, when accessing an array or string with a variable, the compiler cannot detect the error in advance, and we need the Go runtime to prevent illegal accesses.

1
2

arr[4]: invalid array index 4 (out of bounds for 3-element array)
arr[i]: panic: runtime error: index out of range [4] with length 3

Out-of-bounds operations on arrays, slices and strings found by the Go runtime are triggered by runtime.panicIndex and runtime.goPanicIndex, which trigger a runtime error and cause a crash and exit.

TEXT runtime·panicIndex(SB),NOSPLIT,$0-8
	MOVL	AX, x+0(FP)
	MOVL	CX, y+4(FP)
	JMP	runtime·goPanicIndex(SB)

func goPanicIndex(x int, y int) {
	panicCheck1(getcallerpc(), "index out of range")
	panic(boundsError{x: int64(x), signed: true, y: y, code: boundsIndex})
}

When the array access operation OINDEX successfully passes the compiler’s checks, it is converted into several SSA directives. Suppose we have Go code as shown below, compiling it in the following way will result in the ssa.html file.

package check

func outOfRange() int {
	arr := [3]int{1, 2, 3}
	i := 4
	elem := arr[i]
	return elem
}

$ GOSSAFUNC=outOfRange go build array.go
dumped SSA to ./ssa.html

The SSA code generated in the start phase is the first version of the intermediate code before the optimization, which is shown below for elem := arr[i].

In this intermediate code, we find that Go generates the instruction IsInBounds to determine the upper limit of the array and the PanicBounds instruction to crash the program if the condition is not met.

b1:
    ...
    v22 (6) = LocalAddr <*[3]int> {arr} v2 v20
    v23 (6) = IsInBounds <bool> v21 v11
If v23 → b2 b3 (likely) (6)

b2: ← b1-
    v26 (6) = PtrIndex <*int> v22 v21
    v27 (6) = Copy <mem> v20
    v28 (6) = Load <int> v26 v27 (elem[int])
    ...
Ret v30 (+7)

b3: ← b1-
    v24 (6) = Copy <mem> v20
    v25 (6) = PanicBounds <mem> [0] v21 v11 v24
Exit v25 (6)

When the array subscript is not out of bounds, the compiler will first obtain the memory address of the array and the subscript to be accessed, calculate the address of the target element using PtrIndex, and finally load the element in the pointer into memory using the Load operation.

Of course, only when the compiler is unable to make a judgment on whether the array subscript is out of bounds or not, it will add the PanicBounds instruction to the runtime to make a judgment.

b1:
    ...
    v21 (5) = LocalAddr <*[3]int> {arr} v2 v20
    v22 (5) = PtrIndex <*int> v21 v14
    v23 (5) = Load <int> v22 v20 (elem[int])
    ...

It not only finds simple out-of-bounds errors in advance during compilation and inserts function calls to check the upper limit of the array, but also ensures that no out-of-bounds occur during runtime with the inserted functions.

The array assignment and update operation a[i] = 2 also generates the SSA generation which calculates the memory address of the current element of the array during generation and then modifies the contents of the current memory address, these assignment statements are converted into SSA code as follows.

b1:
    ...
    v21 (5) = LocalAddr <*[3]int> {arr} v2 v19
    v22 (5) = PtrIndex <*int> v21 v13
    v23 (5) = Store <mem> {int} v22 v20 v19
    ...

The assignment process will first determine the address of the target array, and then get the address of the target element through PtrIndex.

Finally, the Store instruction is used to store the data into the address. From the above SSA code, we can see that the above array addressing and assignment are done at the compilation stage without any runtime involvement.

Summary

Arrays are an important data structure in Go, and understanding its implementation can help us understand the language better. Through the analysis of its implementation, we know that accessing and assigning values to arrays depends on both the compiler and the runtime, and most of its operations are converted into direct memory reads and writes during compilation. panicIndex call to prevent out-of-bounds errors from occurring.

Table of Contents