Map In Golang

Introduction

This article is mainly to learn and understand the characteristics of map by exploring the data structure and source code implementation of map in golang, containing a total of map’s model exploration, access, expansion, etc..

Map’s underlying memory model

The underlying struct that represents map in golang’s source code is hmap, which is short for hashmap.

type hmap struct {

   // The number of elements stored in the map, which is returned directly when len(map) is called in golang
   count     int
   // The status marker bit, which can be used to determine whether or not the current status is in this state by performing & operations with the defined enumeration value
   flags     uint8
   B         uint8  // 2^B denotes the number of buckets, B denotes the number of bits after the hash to group the buckets
   noverflow uint16 // Approximate number of overflow buckets
   hash0     uint32 // hash seed is generally a prime number

   buckets    unsafe.Pointer // There are 2^B buckets, but if no elements are deposited, this field may be nil
   oldbuckets unsafe.Pointer // During the expansion period, the old bucket array is placed here, and the new buckets will be twice as big as this one
   nevacuate  uintptr        // A pointer to a bucket with an address less than the current pointer has already been migrated.

   extra *mapextra // optional fields
}

B is the logarithm of the length of the buckets array, i.e., the length of the bucket array is 2^B. A bucket is essentially a pointer to a piece of memory space that points to the struct shown below.

// A bucket for a Go map.
type bmap struct {
   tophash [bucketCnt]uint8
}

But this is only the surface (src/runtime/hashmap.go) structure, which is spiced up during compilation to create a new structure dynamically.

type bmap struct {
    topbits  [8]uint8
    keys     [8]keytype
    values   [8]valuetype
    pad      uintptr        // Memory aligned usage, may not be required
    overflow uintptr        // When the 8 keys of the bucket are full
}

bmap is the underlying data structure that we often refer to as a “bucket”. A bucket can hold up to 8 keys/values, and the map uses the hash function to get the hash value to determine which bucket to assign it to, and then finds the location of the bucket based on the higher 8 bits of the hash value. The composition of the map is shown below.

The structure of map

Map’s put and get

Both put and get in a map are essentially doing one thing, and that is:

look up the current location where k/v should be stored.
get/put, so we understand how to locate the location of the key in the map we understand get and put.

Underlying code

func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
    // If the map is empty, or the number of elements is 0, return not found
    if h == nil || h.count == 0 {
        return unsafe.Pointer(&zeroVal[0]), false
    }
    // Does not support concurrent reads and writes
    if h.flags&hashWriting != 0 {
        throw("concurrent map read and map write")
    }
    //  According to the hash function to calculate the hash value, note that different types of key may use different hash functions
    hash := t.hasher(key, uintptr(h.hash0))
    // If B = 5, then the result is 11111 in binary, which returns the value of all 1's in the B bit
    m := bucketMask(h.B)
    // Locate the position in the bucket array based on the last B bits of the hash
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
    // When h.oldbuckets is not empty, the map has been expanded
    // At this point, the new buckets may not have the old ones in them yet
    // So you have to look in the old ones, otherwise the "disappearing" weirdness may happen
    if c := h.oldbuckets; c != nil {
        if !h.sameSizeGrow() {
            // means only half of the bucket before, need to divide by 2
            m >>= 1
        }
        oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
        if !evacuated(oldb) {
            b = oldb
        }
    }
    // tophash takes the value of the higher 8 bits
    top := tophash(hash)
    // After a bucket is full of 8 elements, it won't fit anymore, so a new bucket will be created and hung on the overflow pointer member of the original bucket
    // Iterate through all chained buckets of the current bucket
    for ; b != nil; b = b.overflow(t) {
        // Query on the 8 positions of the bucket
        for i := uintptr(0); i < bucketCnt; i++ {
            // If an equal tophash is found, that means it's the bucket
            if b.tophash[i] != top {
                continue
            }
            // Locate the location of the key according to the memory structure
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey {
                k = *((*unsafe.Pointer)(k))
            }
            // Verify that the key found matches
            if t.key.equal(key, k) {
                // Locate the position of v
                v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
                if t.indirectvalue {
                    v = *((*unsafe.Pointer)(v))
                }
                return v, true
            }
        }
    }

    // All buckets are not found, return zero and false
    return unsafe.Pointer(&zeroVal[0]), false
}

Addressing process

Map expansion

Like slice, map in golang first requests a small amount of memory space at initialization, and then dynamically expands it as the map is deposited. There are two types of expansion, incremental expansion and equivalent expansion (rearranging and reallocating memory). Let’s understand how the expansion is triggered.

the load factor exceeds the threshold value, the threshold value defined in the source code is 6.5.(trigger incremental expansion)
Too many buckets in overflow: when B is less than 15, that is, the total number of buckets 2^B is less than 2^15, if the number of buckets in overflow exceeds 2^B; when B >= 15, that is, the total number of buckets 2^B is greater than or equal to 2^15, if the number of buckets in overflow exceeds 2^15. 15.(trigger equivalent expansion)

The first case

incremental expansion

The second case

equivalent expansion

Orderliness of Map

First of all, the map in golang is unordered, to be precise, it is not strictly guaranteed to be ordered, from the source code above we know that the map in golang, after expansion, may move part of the key to the new memory, because in the process of expansion and moving data, the original data location is not recorded, and in the golang data structure does not save the order of the data, so this part is actually unordered after expansion.

The process of traversal is actually traversing the memory addresses sequentially, and traversing the keys in the memory addresses sequentially, but this is already out of order. But if I just have a map, and I make sure I don’t modify or delete the map, then it is reasonable to assume that no changes will occur without expansion. But because of this, GO is in the source code. But there is an interesting phenomenon that even if the map is not expanded by insertion and deletion, it is still unordered during the traversal.


objMap := make(map[string]int)
for i := 0; i < 5; i++ {
   objMap[strconv.Itoa(i)] = i
}
for i := 0 ; i < 5; i ++ {
   var valStr1, valStr2 string
   for k, v := range objMap {
      fmt.Println(k)
      fmt.Println(v)
      valStr1 += k
   }
   for k, v := range objMap {
      fmt.Println(k)
      fmt.Println(v)
      valStr2 += k
   }
   fmt.Println(valStr1 == valStr2)
   if valStr1 != valStr2 {
      fmt.Println("not equal")
   }
}
fmt.Println("end")

The result of the above code runs as follows.

result

It is easy to see that even if the map is not expanded, it is still unordered when it is traversed multiple times. This is because golang is officially designed to add random elements to randomize the order of traversing the map to prevent users from traversing it sequentially.

Relying on the map’s order for traversal is risky code and is strongly discouraged by GO’s strict syntax rules. So we must remember that maps are unordered when we use them, and not rely on their order.

Concurrency of Map

First of all, we all know that map is not a concurrency-safe data structure in golang, and when several goruotines read and write to a map at the same time, there is a concurrent write problem: fatal error: concurrent map writes. but why is map not concurrency-safe? cost and benefit.

The official answer is as follows.

Typical usage scenario: The typical usage scenario for map is that it does not require secure access from multiple goroutines.
Atypical scenarios (requiring atomic operations): map may be part of some larger data structure or an already synchronized computation.

Performance Scenario Considerations: Adding security for just a few programs that would cause all operations of map to handle mutexes would degrade the performance of most programs. Meanwhile, golang provides a concurrency-safe sync map.

    // Concurrent reads and writes are not supported
    if h.flags&hashWriting != 0 {
        throw("concurrent map read and map write")
    }

But we have questions again, why golang map concurrency conflict does not throw an error out, or panic off, but to let the program panic, choose to let the program crash crashed out. Here is the golang official considerations for the tradeoff between risk and map complexity scenarios, first of all, map in the official said clearly does not support concurrent read and write, so concurrent read and write operations on the map itself is incorrect.

Scenario 1: If map chooses to add an error return value when writing or reading, it will cause the program to not be able to use map as it does now. Additional catching and err determination is required.

Scenario 2: If the map selects panic (recoverable), then if there is a concurrent data writing scenario, it will lead into the recover, and if there is no special treatment for this scenario, it will lead to dirty data in the map, and then the program will cause unpredictable errors when using the map. In this case, it is difficult to find the root cause of the problem.

Therefore, after considering these scenarios, golang chooses to explicitly throw crash exceptions to expose the risk in advance. The problem can be clearly located. In summary, when using map, we have to strictly ensure that it is used in a single thread, if there are multi-threaded scenarios, we recommend using sync map.

Table of Contents