In everyday development StringBuilder must be used by everyone, even a lot. After all, we all know the unwritten norm that StringBuilder’s performance is higher than direct string splicing when a large number of strings need to be constructed at high frequencies, because using +
or +=
directly will generate a new String
instance, because String objects are immutable objects
, which means that each time the string This means that each time the content of a string is manipulated, a new instance of the string is created, which is very unfriendly to the large number of scenarios where string splicing is performed. Hence StringBuilder
was born. Note that this does not mean that StringBuilder can be used for all string stitching scenarios, but rather that it is used frequently
for stitching the same string object. Today we will look at the clever implementation of StringBuilder in c# and experience the way the underlying class library solves the problem.
Note that immutability here means that the content of the string object itself is immutable, but the reference to the string variable can be changed.
Simple example
Let’s take a simple example of the operation, in fact, the core operation is mainly Append method
and ToString method
, and the constructor of StringBuilder from the point of view of the source code. The first is the most commonly used way to get the results of various Append directly and then finally.
|
|
StringBuilder also supports initializing some data through the constructor, and whether or not the initialization data is passed in the constructor means different initialization logic. For example, the following operations
Because StringBuilder is a basic class library, it is simple to look at, simple to use, and everyone uses these operations all the time.
Source Code Exploration
Above we simply demonstrate the use of the StringBuilder, general similar StringBuilder or List, although I did not use the process can not pay attention to the length of the container itself has been to add elements, in fact, the internal logic of the implementation of these containers themselves contain some expansion-related logic. We mentioned above that the core of StringBuilder is mainly three operations, that is, through these three functions can present the way StringBuilder works and the principle.
- One is the
Constructor
, because the constructor contains some logic for initialization. - Next is the
Append
method, which is the core operation of StringBuilder for string splicing. - And finally, the
ToString
method, which converts the StringBuilder into a string, which is the operation we use to get the stitched string.
Let’s start with these three related methods to see the core implementation of StringBuilder, here I refer to the .net version v6.0.2
.
Start with the construction
As we mentioned above, the constructor of StringBuilder represents the initialization logic, which is roughly the default constructor, i.e. the default initialization logic and a part of the custom constructor logic, mainly the logic that determines the length of the StringBuilder container that can hold the string.
Parameterless constructors
First, let’s take a look at the implementation of the default unparametric constructor [Click to view source code 👈]
|
|
With the default parameterless constructor, we can learn two things
- The first is that the core StringBuilder container for storing strings is a
char[]
array. - The default container for
char[]
character arrays is declared to be 16, i.e. the expansion mechanism is triggered if the first StringBuilder holds more than 16 characters.
Constructors with parameters
StringBuilder has several constructors with parameters, as follows
|
|
Although there are many constructors, but most of them are calling their own overloaded methods, the core constructor with parameters is actually two, let’s look at them separately, the first is to specify the capacity of the initialization constructor [Click to view source code 👈]
|
|
The main thing is to judge and assign values to the maximum and initialized capacities, and if the initial and maximum capacities are formulated then the passed-in ones are the main ones. Next, let’s look at the main operation to initialize StringBuilder according to the specified string [Click to view source code 👈]
|
|
This initialization operation mainly intercepts the specified length of the given string and stores it in ChunkChars for initializing StringBuilder, where the initialized capacity depends on whether the length that can be intercepted is greater than the specified capacity, and the essence is to be able to store the intercepted length of the string.
Summary of Constructors
Through the logic of StringBuilder’s constructor we can see that StringBuilder is essentially stored in char[]
, the initialized length of this character array is 16, the main role of this length is the expansion mechanism, that is, the first time you need to expand the capacity is when the length of m_ChunkChars exceeds 16, at this time the original m_ChunkChars can no longer carry the string to be built when the expansion is triggered.
Core Methods
We have seen the initialization code of StringBuilder above, through the initialization operation, we can understand the data structure of StringBuilder itself, but if we want to understand the expansion mechanism of StringBuilder, we need to start from its Append method
, because only when Append has the opportunity to judge the original The length of the m_ChunkChars array is sufficient to store the appended string. There are many overloads for the Append method of StringBuilder, so we won’t list them all here, but the essence is the same. So let’s choose the most familiar and commonly used Append(string? value)
method to explain, directly find the source code location [Click to view source code 👈]
|
|
This part of the logic mainly shows the logic when the expansion condition is not reached, which is essentially appending the Append string to the m_ChunkChars
array, where m_ChunkLength
represents the length of the current m_ChunkChars
already used, and another meaning is that it represents the starting position of the next Append element is stored to the starting position of m_ChunkLength
. And the logic needed for expansion goes to the AppendHelper
method, let’s look at the implementation of the AppendHelper method [Click to view source 👈]
Here we get the value pointer passed in and call another overloaded Append method, but we can get a message from this code that this operation is non-thread safe. We continue to find another Append method [click to view source code 👈]
|
|
The source code here deals with the length of a StringBuilder. Length represents the actual length of characters stored in the current StringBuilder object, which is defined as follows
The Append method in the source code above is actually another overloaded method, only Append(string? value)
calls this logic, here you can clearly see that if the current storage block meets the storage, it is directly used. If the current storage location does not meet the storage, then the storage space will not be wasted, according to the available storage length of the current storage block to intercept the length of the string that needs to Append, into the remaining location of this storage block, the remaining characters that can not be stored will be stored in the expanded new storage block m_ChunkChars
, this practice is to not waste storage space.
This is very well thought out, even if the expansion happens, then the storage block of my current node must be filled, ensuring the maximum use of storage space.
Through the Append source code above we can naturally see that the logic of expansion is naturally in the ExpandByABlock
method [Click to view source code 👈]
|
|
This code is the core operation of expansion, through this we can clearly understand the storage nature of StringBuilder
- First of all, the data of StringBuilder is stored in
m_ChunkChars array
, but the nature of expansion is aone way chain
operation, StringBuilder itself containsm_ChunkPrevious
which points to the data saved in the last expansion. - The actual length of the expansion is
max(the remaining length of the current appended characters, min(the current StringBuilder length, 8000))
, so we can know that the maximum size of a block of m_ChunkChars is8000
.
StringBuilder also contains a method to build an instance through StringBuilder. This constructor is used to build one way chains
when expanding the capacity, and its implementation is also very simple
The purpose is to pass the various data related to the storage before the expansion to the new StringBuilder instance. Well, so far the core logic of Append is finished, let’s roughly run through the core logic of Append
Let’s roughly list it, for example
- default m_ChunkChars[16], m_ChunkOffset=0, m_ChunkPrevious=null, Length=0
- First expansion m_ChunkChars[16], m_ChunkOffset=16, m_ChunkPrevious=points to the original StringBuilder, m_ChunkLength=16
- Second expansion of m_ChunkChars[32], m_ChunkOffset=32, m_ChunkPrevious=StringBuilder of m_ChunkChars[16] before expansion, m_ChunkLength=32
- third expansion of m_ChunkChars[64], m_ChunkOffset=64, m_ChunkPrevious=StringBuilder of m_ChunkChars[64] before expansion, m_ChunkLength=64
I have tried to draw a diagram. I wonder if I can help understand the data structure of StringBuilder. The chain table structure of StringBuilder is the current node pointing to the last StringBuilder, that is, the current instance of StringBuilder before expansion
c# StringBuilder overall data structure is a one-way chain table, but each node of the chain table storage block is m_ChunkChars is
char[]
. The essence of expansion is to add a new node to the chain table, and the capacity of the new node storage block will increase with each expansion. Most of the cases encountered when using it are 16 for the first time, 16 for the second time, 32 for the third time, 64 for the fourth time, and so on.
Convert to String
From the above data structure of StringBuilder, we understand that StringBuilder is essentially a unidirectional chain table
, which contains m_ChunkPrevious
pointing to the previous StringBuilder instance, i.e. a chain table in reverse order. We finally get the result of the construction of the StringBuilder through the ToString()
method of the StringBuilder to get a final result string, next we will look at the implementation of ToString [Click to view source code 👈]
|
|
The essence of this ToString operation is an inverted-link traversal operation, each traversal gets the current StringBuilder’s m_ChunkPrevious character array
and after the data stitching is completed, it gets the last StringBuilder node of the current StringBuilder, i.e. m_ ChunkPrevious
, the end condition is m_ChunkPrevious==null
which means the node is the first node, and finally stitching into a string is returned. For example, our StringBuilder stores ``I can’t be separated from my country, wherever I go, I leave a hymn. ‘, then the traversal process for the ToString traversal StringBuilder is roughly as follows
|
|
After all, StringBuilder can only record the data of the last StringBuilder, so this is an operation that traverses the chain of StringBuilder in reverse order, each traversal is to add the data recorded in m_ChunkPrevious
until m_ChunkPrevious==null
then the traversal is completed and the result is returned directly.
c# StringBuilder class ToString is essentially a reverse-order traversal of a one-way chain, each node of the chain is a StringBuilder instance, get the storage block inside
m_ChunkChars character array
to assemble, loop through all the nodes after the results are assembled into a string to return.
Compare to java implementation
We can see that the implementation of StringBuilder on C# is essentially a chain table. So and C# language similar to the Java implementation of the idea of whether the same, let’s look at the general idea of the implementation of StringBuilder in Java how my local jdk version for 1.8.0_191
, the first is also the initialization logic
|
|
Here you can see that the logic of the initialization capacity of java is a little different from c#, the default initialization length of c# depends on the length of the initialization string that can be stored mainly, while the implementation of java is in the current length +16
length, that is, the length of this initialization 16 must be available anyway. So let’s look at the source code for the implementation of append
again.
|
|
The core is the method to expand ensureCapacityInternal, let’s simply look at its implementation
|
|
The last thing to show is the operation to get the StringBuilder result, again the toString
method, let’s look at the implementation of this logic in java.
By now the logic of StringBuilder in java is very clear to all of us.
- The real data of StringBuilder in c# is stored in
m_ChunkChars character array
, but the overall data structure isone way chain table
, while in java it is completelychar[]
character array. - The initial length of StringBuilder in c# is the length that can hold the current initialized string, while the initialized length of java is the length of the current passed string + 16.
- The expansion of StringBuilder in c# is to generate a new instance of StringBuilder with a capacity related to the length of the previous StringBuilder. java generates a new array of the length of the original
char[] array * 2 + 2
length. - The ToString implementation in c# iterates through the inverted chain to assemble a new string to return, while java initializes a new string with the
char[]
of the current StringBuilder to return.
About the c# and java StringBuilder implementation is so different, in the end which implementation is better? There is no way to evaluate this, after all, each language’s underlying class library implementation is well thought out and integrates many people’s ideas. From the owner’s point of view, the core function of StringBuilder itself lies in the construction process, so the performance of the construction process is very important, so the logic of similar array expansion and then copy is not as efficient as the way of a chain table. However, when it comes to the final ToString result, the advantage of the array is very obvious, after all, string is essentially a char[] array
.
For StringBuilder append is a frequent operation and most of the cases may be append operation many times, while ToString operation for StringBuilder is basically only once, that is when you get the result of StringBuilder construction. So the owner feels that improving the performance of append is the key.
Summary
In this article we have explained the general implementation of c# StringBuilder, and also compared the differences between c# and java regarding the implementation of StringBuilder, the main difference is that the underlying data structure implemented in c# is a unidirectional chain table
, but the data of each node is stored in char[]
, while the overall implementation of java is an array
. is an array
. This also provides us with a different way of thinking, and here we also summarize its implementation again.
- c# StringBuilder is essentially a
one way chain
operation, StringBuilder itself containsm_ChunkPrevious
pointing to the data saved in the last expansion, the essence of the expansion is to add a new node to the chain. - The actual length of the expansion is
max(the remaining length of the current appended character, min(the current StringBuilder length, 8000))
, and the capacity of the new node storage block will be increased each time the expansion is done. Most of the cases encountered when using is the first time for 16, the second time for 16, three times for 32, four times for 64 and so on. - c# StringBuilder class ToString is the essence of inverted traversal of the one-way chain table, each traversal to obtain the current StringBuilder
m_ChunkPrevious array of characters
after the completion of data stitching, and then getm_ChunkPrevious
pointed to the last StringBuilder instance, and finally the results are assembled into a string to return. - The main difference between the c# and java implementations of StringBuilder is that the overall underlying data structure of the c# implementation is a
unidirectional chain table
, but the data itself is stored inchar[]
in each StringBuilder instance, which is a bit like the redisquicklist
. The overall approach is achar[]
array of characters.