A long time ago, I came across an article by Brad Fitzpatrick called netaddr.IP: a new IP address type for Go. Brad is a core developer of the Go language and a founder of Tailscale, and in this article he analyzes the problems with Go’s net.IP type, his team’s solution to them, and how that solution evolved. Eventually Brad and his team open sourced the inet.af/netaddr package. I took a few glances at it and was impressed. Today I received a newsletter email saying that Go 1.18 had accepted Brad’s proposal to introduce a new package, net/netip. So I quickly found Brad’s article again and read it carefully. Not only did I come to appreciate the problems with the net.IP type and the ingenuity of the new scheme, I also gained a better understanding of memory allocation, garbage collection, and the use of the unsafe package in Go. Today, I’d like to share it with you.
What’s wrong with the net.IP type of Go?
Brad lists the “seven problems” with net.IP in his article.
- The contents are mutable. The underlying type of net.IP is []byte, so a net.IP value shares its backing array whenever it is passed around, and any function that handles it can change its contents.
- It cannot be compared directly. Because slices are not comparable, two net.IP values cannot be tested for equality with ==, nor can a net.IP be used as a map key.
- There are two address types in the standard library, net.IP and net.IPAddr. Ordinary IPv4 and IPv6 addresses are stored in net.IP, while IPv6 link-local addresses need net.IPAddr because it additionally stores the zone (the name of the link’s network interface). With two types, you have to decide which one to use, or handle both.
- It takes up a lot of memory. A slice header alone takes 24 bytes on 64-bit platforms (see Russ’s article for details). So a net.IP occupies 24 bytes of header plus 4 bytes (IPv4) or 16 bytes (IPv6) of address data. If the zone needs to be stored, net.IPAddr adds a 16-byte string header plus the interface name itself.
- Memory needs to be allocated from the heap. Every heap allocation puts extra pressure on the GC.
- It cannot distinguish between IPv4 addresses and IPv4-mapped IPv6 addresses (in the form of ::ffff:192.168.1.1) when parsing IP addresses from strings.
- It exposes implementation details to the outside world. net.IP is defined as type IP []byte, so the underlying []byte is part of the API and cannot be changed.
So what would the ideal IP type look like?
Brad has summarized a table.
Feature | Go’s net.IP |
---|---|
Immutable | ❌, slice |
Comparable | ❌, slice |
Small | ❌, 28-56 bytes |
No need to allocate memory from the heap | ❌, slice’s underlying array |
Supports IPv4 and IPv6 | ✅ |
Distinguishes between IPv4 and IPv6 | ❌, #37921 |
Supports IPv6 zones | ❌, needs the separate net.IPAddr type |
Hides implementation details from the outside world | ❌, exposes the underlying type []byte |
Interoperable with the standard library | ✅ |
What follows is a series of improvement options.
Option 1: wgcfg.IP
In April 2019, David Crawshaw submitted commit 89476f8cb5, which introduced the wgcfg.IP type: a struct that wraps a 16-byte array in a field named Addr.
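The original listing is not reproduced here. As a minimal sketch, assuming only what the text states (a 16-byte array in an exported Addr field), the type and its consequences look roughly like this; the fromIPv4 helper is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// IP is a sketch of a wgcfg.IP-style type: the address is always kept in
// 16-byte IPv6 form, and IPv4 addresses are stored as IPv4-mapped IPv6.
type IP struct {
	Addr [16]byte
}

// fromIPv4 (hypothetical helper) builds the mapped form ::ffff:a.b.c.d.
func fromIPv4(a, b, c, d byte) IP {
	var ip IP
	ip.Addr[10], ip.Addr[11] = 0xff, 0xff
	ip.Addr[12], ip.Addr[13], ip.Addr[14], ip.Addr[15] = a, b, c, d
	return ip
}

func main() {
	x, y := fromIPv4(10, 0, 0, 1), fromIPv4(10, 0, 0, 1)
	seen := map[IP]bool{x: true} // arrays are comparable, so IP works as a map key
	fmt.Println(x == y, seen[y], unsafe.Sizeof(x)) // true true 16
}
```

Because every IPv4 address is stored in mapped form, this layout has no way to tell 10.0.0.1 from ::ffff:10.0.0.1.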
It is not perfect, but it solves some of the problems, as the following table shows.
Feature | net.IP | wgcfg.IP |
---|---|---|
Immutable | ❌, slice | ✅ |
Comparable | ❌, slice | ✅ |
Small size | ❌, 28-56 bytes | ✅, 16 bytes |
No need to allocate memory from the heap | ❌ | ✅ |
Supports IPv4 and IPv6 | ✅ | ✅ |
Distinguishes between IPv4 and IPv6 | ❌ | ❌ |
Supports IPv6 zones | ❌ | ❌ |
Hides implementation details externally | ❌ | ❌ |
Interoperable with the standard library | ✅ | ❌, requires adaptation |
This solution takes up only 16 bytes and is very compact. The implementation details could be hidden simply by renaming the exported field Addr to addr. However, David’s solution still does not distinguish between IPv4 and IPv4-mapped IPv6 addresses, and it does not support storing zone information.
So there’s a second option.
Option 2: netaddr.IP with embedded interface variables
In Go, interface values can also be compared with each other, either with == or by using them as map keys. So Brad implemented version 1 of the netaddr.IP scheme.
This time, an interface value is embedded in IP. On 64-bit platforms an interface takes up 16 bytes, so the IP type here also takes up 16 bytes. That is better than the standard library, where net.IP takes up 24 bytes plus the address content. But because the interface stores a pointer, additional memory has to be allocated for the v4Addr/v6Addr/v6AddrZone value it points to. However, telling IPv4 from IPv6 and supporting IPv6 zones are both solved this time.
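A rough sketch of this design follows. Only the names IP, v4Addr, v6Addr and v6AddrZone come from the text; the ipImpl interface, its single is4 method and the two constructors are assumptions made for illustration.

```go
package main

import "fmt"

// ipImpl is an assumed unexported interface hidden inside IP.
type ipImpl interface {
	is4() bool
}

type v4Addr [4]byte
type v6Addr [16]byte

// v6AddrZone is an IPv6 address plus its zone (interface name).
// It inherits is4 from the embedded v6Addr.
type v6AddrZone struct {
	v6Addr
	zone string
}

func (v4Addr) is4() bool { return true }
func (v6Addr) is4() bool { return false }

// IP embeds the interface, so == compares both the dynamic type and the value.
type IP struct {
	ipImpl
}

// v4 and v6WithZone are hypothetical constructors. Storing the value in the
// interface is the step that usually costs a heap allocation.
func v4(a, b, c, d byte) IP { return IP{v4Addr{a, b, c, d}} }

func v6WithZone(addr [16]byte, zone string) IP {
	return IP{v6AddrZone{v6Addr(addr), zone}}
}

func main() {
	var lo [16]byte
	lo[15] = 1
	fmt.Println(v4(10, 0, 0, 1) == v4(10, 0, 0, 1))               // true
	fmt.Println(v6WithZone(lo, "eth0") == v6WithZone(lo, "eth1")) // false
	fmt.Println(v4(10, 0, 0, 1).is4(), v6WithZone(lo, "eth0").is4())
}
```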
Feature | net.IP | wgcfg.IP | Option 2 |
---|---|---|---|
Immutable | ❌, slice | ✅ | ✅ |
Comparable | ❌, slice | ✅ | ✅ |
Small | ❌, 28-56 bytes | ✅, 16 bytes | 🤷, 20-32 bytes |
No need to allocate memory from the heap | ❌ | ✅ | ❌ |
Supports IPv4 and IPv6 | ✅ | ✅ | ✅ |
Distinguish between IPv4/IPv6 | ❌ | ❌ | ✅ |
Support for IPv6 regions (zones) | ❌ | ❌ | ✅ |
Hide implementation details from the outside world | ❌ | ❌ | ✅ |
Interoperable with standard libraries | ✅ | ❌ | ❌ |
Compared to wgcfg.IP, the main problem left unresolved is heap allocation. Let’s keep going!
Option 3: 24-byte representation without heap memory allocation
The slice header of net.IP is 24 bytes long, and time.Time is also 24 bytes. So Brad decided that the new address type should be no larger than 24 bytes.
The IPv6 address itself already takes up 16 bytes, which leaves 8 bytes to hold the following information.
- Address type (v4, v6, null). At least two bits are needed.
- IPv6 zone information (aka NIC name)
The interface scheme is out because an interface value takes up 16 bytes, which is too big. A string header also takes 16 bytes, so it is out as well.
Brad came up with this layout: the 16-byte address followed by a single 8-byte field, zoneAndFamily.
The next step is to find a way to store both the address type and the zone information in the zoneAndFamily field. The question is how.
If you use one or two bits to save the address type, that leaves 62 or 63 bits. The following options can be used.
- Use the remaining 62 bits to store ASCII characters, which allows at most 8 characters. Too short.
- Number the network interfaces and store only the index. But this only works for interfaces on the local machine.
- Keep a mapping table from interface names to indexes. The Go standard library does this internally. However, this may be open to attack from outside, because such a table only grows and never shrinks. The Go standard library only tracks local interfaces, so it does not have this problem.
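To make the layout and the index-based options concrete, here is a sketch of the 24-byte struct with the family in the low two bits and a zone index in the remaining 62. The bit assignment, the constants and the helper names are assumptions for illustration; the text does not specify them.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Assumed family codes stored in the low two bits of zoneAndFamily.
const (
	familyNone uint64 = iota // zero value: no address
	familyV4
	familyV6
)

// IP is the 24-byte layout: 16 address bytes plus one 8-byte field.
type IP struct {
	addr          [16]byte
	zoneAndFamily uint64
}

// pack combines a family code with a zone index (hypothetical encoding).
func pack(family, zoneIndex uint64) uint64 { return zoneIndex<<2 | family }

func (ip IP) family() uint64    { return ip.zoneAndFamily & 3 }
func (ip IP) zoneIndex() uint64 { return ip.zoneAndFamily >> 2 }

// size reports the in-memory size of IP.
func size() uintptr { return unsafe.Sizeof(IP{}) }

func main() {
	ip := IP{zoneAndFamily: pack(familyV6, 7)}
	fmt.Println(ip.family() == familyV6, ip.zoneIndex(), size()) // true 7 24
}
```

The weakness described above is visible here: the zone index is just a number, so whatever table maps it back to a name can never know when an index is no longer in use.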
Brad was not satisfied with any of these options and came up with the pointer option: make zoneAndFamily a pointer to some type T.
For now, let’s not worry about what T actually is. Only three sentinel values need to be declared to identify the address type.
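A minimal sketch of the pointer option follows. T is a placeholder type, and the sentinel names z0, z4 and z6noz as well as the accessor methods are assumptions for illustration.

```go
package main

import "fmt"

// T is a placeholder. It must not be zero-sized, otherwise separate new(T)
// calls are allowed to return the same address.
type T struct{ _ int }

// Sentinel pointers that identify the address family.
var (
	z0    *T       // nil: the zero IP, no address at all
	z4    = new(T) // IPv4
	z6noz = new(T) // IPv6 without a zone
)

// IP stays at 24 bytes on 64-bit platforms: 16 address bytes plus one pointer.
type IP struct {
	addr          [16]byte
	zoneAndFamily *T
}

func (ip IP) IsZero() bool { return ip.zoneAndFamily == z0 }
func (ip IP) Is4() bool    { return ip.zoneAndFamily == z4 }

// Is6 is true for z6noz and for any other non-sentinel pointer, which is
// where a zone would be attached.
func (ip IP) Is6() bool { return ip.zoneAndFamily != z0 && ip.zoneAndFamily != z4 }

func main() {
	var zero IP
	v4 := IP{zoneAndFamily: z4}
	v6 := IP{zoneAndFamily: z6noz}
	fmt.Println(zero.IsZero(), v4.Is4(), v6.Is6()) // true true true
}
```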
The next step is to consider how to store the zone information so that two addresses with the same zone still compare equal.
Simply calling new for two identical strings returns two different pointers. But Brad wanted a way to always get the same pointer back for strings that have the same value. That way two strings can be checked for equality just by comparing pointers.
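The problem with freshly allocated pointers can be shown in a few lines. This is only an illustration; the IP layout with a *string zone and the helper names are hypothetical.

```go
package main

import "fmt"

// IP here stores the zone as a plain *string, purely to show the problem.
type IP struct {
	addr [16]byte
	zone *string
}

// newString allocates a fresh string variable holding s.
func newString(s string) *string {
	p := new(string)
	*p = s
	return p
}

// equalWithFreshPointers builds two IPs with the same zone text, each with
// its own allocation, and compares them with ==.
func equalWithFreshPointers(zone string) bool {
	a := IP{zone: newString(zone)}
	b := IP{zone: newString(zone)}
	return a == b // compares the pointers, not the strings they point to
}

func main() {
	fmt.Println(equalWithFreshPointers("eth0")) // false
}
```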
So a map is needed to hold all the strings. Then what is the difference between this and the index table from before? The biggest difference is that if the address stores a zone index (an integer), the map can never be cleaned up and just keeps growing. If it stores a pointer, you can use runtime.SetFinalizer to clean up the table during garbage collection. The result was the go4.org/intern package, whose core logic is as follows.
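The original listing is not reproduced here. The following is a simplified sketch of the technique, not the package’s actual source: it interns strings only, and the field s and the exact shape of Get are assumptions. The names Value, valMap, finalize and resurrected follow the discussion below.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"unsafe"
)

// Value is an interned string. The zero-size [0]func() field makes the
// struct non-comparable, so callers compare *Value pointers instead.
type Value struct {
	_           [0]func()
	s           string
	resurrected bool // guarded by mu; set when Get rebuilds the pointer from a uintptr
}

var (
	mu     sync.Mutex
	valMap = map[string]uintptr{} // string -> uintptr(*Value); the GC does not see these
)

// Get returns the single canonical *Value for s.
//
//go:nocheckptr
func Get(s string) *Value {
	mu.Lock()
	defer mu.Unlock()
	if addr, ok := valMap[s]; ok {
		v := (*Value)(unsafe.Pointer(addr)) // breaks the unsafe rules on purpose
		v.resurrected = true
		return v
	}
	v := &Value{s: s}
	runtime.SetFinalizer(v, finalize)
	valMap[s] = uintptr(unsafe.Pointer(v))
	return v
}

// finalize runs after the GC has found v unreachable.
func finalize(v *Value) {
	mu.Lock()
	defer mu.Unlock()
	if v.resurrected {
		// Get handed v out again in the meantime; try again in a later cycle.
		v.resurrected = false
		runtime.SetFinalizer(v, finalize)
		return
	}
	delete(valMap, v.s)
}

func main() {
	a, b := Get("eth0"), Get("eth0")
	fmt.Println(a == b, a == Get("eth1")) // true false
	runtime.KeepAlive(a)
}
```

This relies on the collector not moving heap objects, which is why the uintptr in the map remains a valid address for as long as the entry exists.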
There are two subtleties in the above code.
The first is that it disables comparison of Value structs. Go supports comparing structs with ==, but only if every field of the struct is comparable. Embedding a zero-size _ [0]func() field, whose element type is not comparable, makes Value itself non-comparable, so users cannot accidentally compare Value structs and have to compare *Value pointers instead. See this article for a detailed analysis.
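The effect of the blank field can be checked with the reflect package, since writing == on such a struct would not even compile. The type names plain and guarded are made up for this illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// plain is an ordinary comparable struct.
type plain struct {
	s string
}

// guarded adds a zero-size array of funcs. Func values are not comparable,
// so neither is the array, and neither is the struct.
type guarded struct {
	_ [0]func()
	s string
}

// isComparable reports whether values of v's type can be compared with ==.
func isComparable(v interface{}) bool {
	return reflect.TypeOf(v).Comparable()
}

// sameSize reports whether the guard field leaves the struct size unchanged.
func sameSize() bool {
	return unsafe.Sizeof(plain{}) == unsafe.Sizeof(guarded{})
}

func main() {
	fmt.Println(isComparable(plain{}), isComparable(guarded{}), sameSize()) // true false true
}
```

Placing the zero-size field first matters for the size: a zero-size field at the end of a struct can cause padding to be added.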
The other is the garbage-collection-aware object pool, valMap = map[key]uintptr{}. valMap stores each *Value pointer as a uintptr, which acts as a weak reference and does not keep the object alive. That is, although valMap “references” an object via a uintptr, the object can still be reclaimed by the GC once ordinary Go code no longer references it. Every *Value has a finalize function attached, and Go runs that finalizer before reclaiming the object, which delays the actual reclamation to a later GC cycle. So a *Value is never collected while it is still referenced elsewhere. Once all references are released, the GC runs finalize: if the value was handed out again in the meantime (resurrected is true), finalize resets resurrected to false and registers itself again; otherwise it removes the entry from valMap, and the memory is actually reclaimed in the next GC cycle. The full working process can be found in this article.
With the intern package, equal zone strings always map to the same pointer, so intern.Get("eth0") == intern.Get("eth0") holds.
So IP can be expressed as the 16-byte address plus a single *intern.Value pointer that carries both the address family and the zone.
The accessors to get/set the zone are then:
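A sketch of what such accessors might look like is shown below. To stay self-contained it uses a leaky, non-concurrent stand-in (zoneValue and internZone) in place of *intern.Value and intern.Get; the sentinel names and method bodies are assumptions, not the package’s actual code.

```go
package main

import "fmt"

// zoneValue and internZone are a stand-in for *intern.Value and intern.Get:
// equal strings always yield the same pointer, but entries are never freed
// and the map is not safe for concurrent use.
type zoneValue struct{ zone string }

var zones = map[string]*zoneValue{}

func internZone(s string) *zoneValue {
	if v, ok := zones[s]; ok {
		return v
	}
	v := &zoneValue{zone: s}
	zones[s] = v
	return v
}

var (
	z0    *zoneValue       // zero IP
	z4    = new(zoneValue) // IPv4
	z6noz = new(zoneValue) // IPv6 without a zone
)

type IP struct {
	addr [16]byte
	z    *zoneValue
}

func (ip IP) Is6() bool { return ip.z != z0 && ip.z != z4 }

// Zone returns the IPv6 zone, or "" if there is none.
func (ip IP) Zone() string {
	if ip.z == nil {
		return ""
	}
	return ip.z.zone // the sentinels carry an empty zone
}

// WithZone returns a copy of ip with the given zone. IPv4 and zero IPs are
// returned unchanged, and an empty zone maps back to the z6noz sentinel.
func (ip IP) WithZone(zone string) IP {
	if !ip.Is6() {
		return ip
	}
	if zone == "" {
		ip.z = z6noz
		return ip
	}
	ip.z = internZone(zone)
	return ip
}

func main() {
	v6 := IP{z: z6noz}
	a, b := v6.WithZone("eth0"), v6.WithZone("eth0")
	fmt.Println(a == b, a.Zone(), a.WithZone("") == v6) // true eth0 true
}
```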
The final result is as follows.
Feature | net.IP | netaddr.IP |
---|---|---|
Immutable | ❌, slice | ✅ |
Comparable | ❌, slice | ✅ |
Small size | ❌, 28-56 bytes | ✅, 24 bytes, fixed |
No need to allocate memory from the heap | ❌ | ✅ |
Supports IPv4 and IPv6 | ✅ | ✅ |
Distinguish between IPv4/IPv6 | ❌ | ✅ |
Support for IPv6 regions (zones) | ❌ | ✅ |
Hide implementation details externally | ❌ | ✅ |
Ability to interoperate with standard libraries | ✅ | 🤷 |
Option 4: uint64s acceleration
The new scheme does not expose the underlying details, so the internal implementation can easily be changed. So Dave Anderson optimized the [16]byte representation into a pair of uint64 values.
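As a sketch of the idea, assuming field names hi and lo and leaving out the zone pointer, the address bytes can be packed into two big-endian halves so that ordering needs at most a couple of integer comparisons instead of a byte loop.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// IP holds the 128 address bits as two big-endian halves.
type IP struct {
	hi, lo uint64
}

// fromBytes packs a 16-byte address into the two halves.
func fromBytes(b [16]byte) IP {
	return IP{
		hi: binary.BigEndian.Uint64(b[:8]),
		lo: binary.BigEndian.Uint64(b[8:]),
	}
}

// bytes unpacks the halves back into 16 bytes.
func (ip IP) bytes() [16]byte {
	var b [16]byte
	binary.BigEndian.PutUint64(b[:8], ip.hi)
	binary.BigEndian.PutUint64(b[8:], ip.lo)
	return b
}

// less orders addresses by comparing the high half first, then the low half.
func (ip IP) less(o IP) bool {
	return ip.hi < o.hi || (ip.hi == o.hi && ip.lo < o.lo)
}

func main() {
	one := fromBytes([16]byte{15: 1})        // ::1
	big := fromBytes([16]byte{0: 0xfe, 1: 0x80}) // fe80::
	fmt.Println(one.hi, one.lo, one.less(big), one.bytes() == [16]byte{15: 1})
}
```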
Option 5: uint128 type
Finally, in 318330f177, Brad replaced the uint64 pair with a custom uint128 type.
But the Go compiler had problems with how it allocated memory for this type, so in bf0e22f9f3 Brad modified the definition of uint128 again.
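The definitions from those two commits are not reproduced here. As a hedged sketch of what a custom 128-bit type makes possible, assuming a struct with hi and lo fields and hypothetical method names:

```go
package main

import (
	"fmt"
	"math/bits"
)

// uint128 is a sketch of a 128-bit unsigned integer built from two halves.
type uint128 struct {
	hi, lo uint64
}

// and returns the bitwise AND of u and m.
func (u uint128) and(m uint128) uint128 {
	return uint128{u.hi & m.hi, u.lo & m.lo}
}

// addOne returns u+1, carrying from the low half into the high half.
func (u uint128) addOne() uint128 {
	lo, carry := bits.Add64(u.lo, 1, 0)
	return uint128{u.hi + carry, lo}
}

// mask returns a value with the top n bits set, for 0 <= n <= 128,
// which is the shape of a prefix mask.
func mask(n uint) uint128 {
	if n <= 64 {
		return uint128{^uint64(0) << (64 - n), 0}
	}
	return uint128{^uint64(0), ^uint64(0) << (128 - n)}
}

func main() {
	addr := uint128{0x20010db800000000, 1}        // 2001:db8::1
	fmt.Printf("%#x\n", addr.and(mask(32)).hi)    // network part for a /32
	fmt.Println(uint128{0, ^uint64(0)}.addOne())  // carry into the high half
}
```

With arithmetic defined on one type, operations such as masking an address to its prefix become single expressions rather than loops over bytes.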
That is the full content of the article. The new net/netip package will ship with the Go 1.18 release. Look forward to it 😚.
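As a small preview, assuming Go 1.18 or later, the properties from the tables above can be checked against net/netip directly. The helper names equalAddrs and zoneOf are made up for this example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// equalAddrs parses both strings and compares the results with ==.
func equalAddrs(a, b string) bool {
	return netip.MustParseAddr(a) == netip.MustParseAddr(b)
}

// zoneOf returns the IPv6 zone of the parsed address, if any.
func zoneOf(s string) string {
	return netip.MustParseAddr(s).Zone()
}

func main() {
	a := netip.MustParseAddr("fe80::1%eth0")
	seen := map[netip.Addr]bool{a: true} // comparable, so usable as a map key
	fmt.Println(seen[netip.MustParseAddr("fe80::1%eth0")], zoneOf("fe80::1%eth0"))

	// IPv4 and IPv4-mapped IPv6 are now distinct values.
	fmt.Println(equalAddrs("192.168.1.1", "::ffff:192.168.1.1")) // false
	m := netip.MustParseAddr("::ffff:192.168.1.1")
	fmt.Println(m.Is4(), m.Is4In6(), m.Unmap().Is4()) // false true true
}
```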