Overview
I’ve been learning C through Redis recently, and I have to say that Redis code is really neatly written. This article is a comprehensive and in-depth explanation of the source code implementation of the Redis data structure, so I hope you can learn something from it.
The source code for Redis strings is placed in two files, sds.c
and sds.h
. The implementation has been stripped out into a separate library: https://github.com/antirez/sds.
The dynamic string structure of Redis is shown in the following figure.
SDS consists of two parts: header and data segment. header also contains three fields: len, alloc, and flags. len indicates the length of data. alloc indicates the length of allocated memory, and flags indicates the data type of sds.
In previous versions, the header of sds actually occupied a fixed size of 8 bytes of memory, so if all the strings stored in redis were small, the header of sds would take up a lot of memory space.
However, as the version of sds has changed, there have been some optimizations in memory usage.
- before sds 2.0 the header size was fixed int type, after 2.0 the header len and alloc types are adjusted according to the incoming character size to save memory.
- The header structure is modified with
__attribute__
to prevent the compiler from automatically aligning the memory, which reduces the amount of memory used by the compiler due to the number of padding caused by memory alignment;
There are five types of sds header defined in the current version, of which sdshdr5 is useless, so it is not drawn.
Source code analysis
Creation of sds
The sds is created with the following functions.
- sdsnewlen passes in a string of C to create a string of sds and needs to pass in the size; * sdsnewlen passes in a string of C to create a string of sds and needs to pass in the size.
- sdsnew passes in a string of C to create a string of sds, which actually calls
strlen(init)
inside to calculate the length and then calls sdsnewlen. - sdsempty will create an empty string “”.
- sdsdup calls sdsnewlen, which copies an already existing string.
So you can tell by the creation above that eventually sdsnewlen will be called to create the string, so let’s see exactly how this function is implemented.
sdsnewlen
In fact, for the creation of a string object, there are actually three things done: requesting memory, constructing the structure, and copying the string into the string memory area. Let’s look at the specific implementation of Redis.
Request Memory
|
|
Before the memory request, there are several things that need to be done.
- since sds will design the type of header based on the size passed in, the sdsReqType function needs to be called to get the header type based on initlen.
- then call sdsHdrSize to get the number of bytes occupied by the header according to the header type.
- finally call s_malloc to allocate memory according to header length and string length, here we need to add 1 because in c, the string ends with
\0
, in order to be compatible with c’s string format.
Now that we are talking about header type, let’s look at the type definition of header.
|
|
Here __attribute__
is used to prevent memory alignment, which would otherwise also waste some storage space.
After reading the above definition, it is natural to assume that Redis will determine the type of sds header generated based on the size passed in.
|
|
You can see that sdsReqType determines the character type based on the length of the string passed in.
Constructing Structs
For those who have never used C, Redis is a bit of a hack in the way it constructs structures, first by requesting a block of memory based on the size of memory needed, then by initializing the location of the pointer to the header structure, and finally by setting the value of the pointer to the header structure.
|
|
I have marked the above process clearly, it may be difficult to understand the process of constructing the header structure here by looking at the code directly, I will use a diagram below to show the location of the pointer.
String Copy
The memcpy function will copy the string byte by byte into the memory area corresponding to s.
sdscatlen string append
|
|
In this string appending method, the space check and expansion are actually encapsulated in the sdsMakeRoomFor function, which has the following logic.
- if there is space left, return it directly if there is space left.
- if there is no space left, then it needs to be expanded and by how much?
- whether the string can be appended to the original location, or whether a new memory area needs to be requested.
So I will divide the sdsMakeRoomFor function into two parts: expansion and memory request.
expansion
|
|
For scaling, the first step is to see if there is enough space, which is determined by alloc-len
, and return directly if there is space left.
Then Redis will expand the capacity depending on the size of the sds. If len+addlen
is less than 1m, then the new space is doubled; if len+addlen
is greater than 1m, then the new space is only increased by 1m.
memory request
|
|
In terms of memory requests, Redis has two cases: one is that the sds header type has not changed, so you can just call realloc to append a new memory area behind the original memory.
The other case is when the sds header type has changed, which generally means that the header is taking up more space, because realloc cannot append the memory area forward, so you can only call malloc to reapply a memory area and then copy the string to the new address via memcpy.
Summary
In this article, we learned a lot about how Redis implements strings, and what changes have been made through versioning. You can compare sds with your own familiar language’s string implementation to see what the differences are and which is better.