Simple Binary Encoding

I. Preface

In this article, let’s learn the SBE(Simple Binary Encoding) transfer protocol, which is the same as protobuf, which is a binary transport protocol with higher performance than protobuf, inspired by fast binary variant of FIX fix-simple-binary-encoding), and was originally intended for use in financial-grade, low-latency trading systems. SBE is also widely used as a data transfer medium in the open source software Aeron.

II. Design Principles

2.1 Copy-Free

Network and storage systems usually need to encode and decode data buffers when processing data. the principle of Copy-Free is not to use any intermediate buffers to encode and decode data. If intermediate buffers were used, performance loss would occur due to multiple copies of the data.

SBE uses direct encoding and decoding of the underlying buffer, which has the limitation that it does not support direct sending of data longer than the transmission buffer, in which case segmentation and data reassembly are required.

2.2 Native Type Mapping

Copy-Free mode gets a performance boost by encoding data directly into the native type in the underlying buffer. For example, a 64-bit integer can be encoded directly into the underlying buffer as a single x86_64 MOV instruction. If the byte order of the data (big end/small end) does not match the CPU’s, then the data can be swapped in registers using the x86 BSWAP instruction before being written to the underlying buffer.

2.3 Allocation-Free

The creation of objects leads to a reduction in CPU cache, which reduces efficiency. The objects need to be managed and released later. For Java, this process is done by the garbage collector, which reclaims memory by triggering STW (Stop The World) of varying duration (except for the new generation C4 garbage collector.) C++ is better, but introduces a locking mechanism when memory is released back to the memory pool, which also incurs performance overhead. The SBE codec uses the flyweight model to avoid allocation problems. flyweight windows encode and decode data directly on the underlying buffer, selecting the appropriate type of flyweigh via the templateId field in the message header. if the fields in the message need to be kept outside the island processing flow, they need to be copied out separately .

public class ShapeFactory {
    private static final Map<String, Shape> circleMap = new HashMap<>();
    // 享元模式，存在则直接返回对象复用，不存在则创建
    public static Shape getCircle(String color) {
        Circle circle = (Circle)circleMap.get(color);
        if(circle == null) {
            circle = new Circle(color);
            circleMap.put(color, circle);
            System.out.println("Creating circle of color : " + color);
        }
        return circle;
    }
}

2.4 Streaming Access

Modern memory subsystems have become more and more complex, and the algorithmic model of access memory largely determines performance and consistency. The best performance and most consistent latency can be achieved by using a stream-based approach to access memory addresses in ascending mode. The SBE codec encodes and decodes data based on the forward progression of positions in the underlying buffer. While some backtracking is possible, this operation is highly discouraged from a performance and latency perspective.

2.5 Word Aligned Access

Many CPU architectures exhibit significant performance problems when we do not access data at the boundary of a word. A word should start at an address that is a multiple of its size in bytes, e.g. a 64-bit integer can only start at an address whose byte address is divisible by 8, a 32-bit integer can only start at an address whose byte address is divisible by 4, and so on.

The SBE mode supports the concept of defining an offset for the start of a field in the data. It assumes that the data is encapsulated in protocol frames with 8-byte size boundaries. To achieve compactness and efficiency, message fields should be ordered by their type and size in descending order.

typedef struct _tcp_hdr
{
    unsigned short src_port; //源端口号
    unsigned short dst_port; //目的端口号
    unsigned int seq_no; //序列号
    unsigned int ack_no; //确认号
#if LITTLE_ENDIAN
    unsigned char reserved_1:4; //保留6位中的4位首部长度
    unsigned char thl:4; //tcp头部长度
    unsigned char flag:6; //6位标志
    unsigned char reseverd_2:2; //保留6位中的2位
#else
    unsigned char thl:4; //tcp头部长度
    unsigned char reserved_1:4; //保留6位中的4位首部长度
    unsigned char reseverd_2:2; //保留6位中的2位
    unsigned char flag:6; //6位标志
#endif
    unsigned short wnd_size; //16位窗口大小
    unsigned short chk_sum; //16位TCP检验和
    unsigned short urgt_p; //16为紧急指针
}tcp_hdr;

2.6 Backwards Compatibility

The message format must be backwards compatible, i.e. the older system should be able to read the newer version of the same message, and vice versa.

An extension mechanism is designed in SBE to allow the introduction of new optional fields in the data, which can be used by the new system and ignored by the old system until it is upgraded. If required fields or basic structures need to be changed, the new message type must be used, as this is no longer a semantic extension of the existing data type.

III. Tutorials for use

3.1 XML Grammar

First we have to learn how to define an SBE transport protocol, which differs from protobuf in that it uses a custom format, SBE uses XML format. We have to refer to the official documentation, so we won’t talk about it here.

3.2 Maven Plugins

After we have prepared the XML file of SBE, the next step is to generate the corresponding codec code according to the file, the official recommendation is to use sbe-all-${SBE_LIB_VERSION}.jar toolkit, and then call java -jar to generate the code, the specific process and detailed parameters can be found in the official document.

Here I introduce another way to generate code through the Maven plugin, which is much easier to use. First assume that I store the XML file in the resources directory of the project, as shown in the following figure.

Then add the plugin in Maven

<build>
    <plugins>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>1.6.0</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>java</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <includeProjectDependencies>false</includeProjectDependencies>
                <includePluginDependencies>true</includePluginDependencies>
                <mainClass>uk.co.real_logic.sbe.SbeTool</mainClass>
                <systemProperties>
                    <systemProperty>
                        <key>sbe.output.dir</key>
                        <value>${project.build.directory}/generated-sources/java</value>
                    </systemProperty>
                    <!-- Is XInclude supported for the schema -->
                    <systemProperty>
                        <key>sbe.xinclude.aware</key>
                        <value>true</value>
                    </systemProperty>
                </systemProperties>
                <arguments>
                    <argument>${project.build.resources[0].directory}/sbe-example1.xml</argument>
                    <argument>${project.build.resources[0].directory}/sbe-example2.xml</argument>
                </arguments>
                <workingDirectory>${project.build.directory}/generated-sources/java</workingDirectory>
            </configuration>
            <dependencies>
                <dependency>
                    <groupId>uk.co.real-logic</groupId>
                    <artifactId>sbe-tool</artifactId>
                    <version>1.24.0</version>
                </dependency>
            </dependencies>
        </plugin>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>build-helper-maven-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <id>add-source</id>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>add-source</goal>
                    </goals>
                    <configuration>
                        <sources>
                            <source>${project.build.directory}/generated-sources/java/</source>
                        </sources>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

Then execute the maven clean install command and the automatically generated code will be placed in the target/generated-sources/java directory.

Note: If you need to adjust the XML file location, or the location of the generated file, or the generation parameters, please adjust the specific parameters in the plugin above by yourself.

3.3 Codec

The next step is to use the generated codec class for data transmission, the difficulty is the use of its API, here I give two examples that I used when I was studying, you can debug locally against the document and understand it quickly.

The first example is from the Aerona Cookbook.

Example description: https://aeroncookbook.com/simple-binary-encoding/basic-sample/
Corresponding source code: com.github.jitwxs.sample.aeron.sbe1.SbeExample1Test

The second example is from SBE GitHub:

Example description: https://github.com/real-logic/simple-binary-encoding/wiki/Java-Users-Guide
Corresponding source code: com.github.jitwxs.sample.aeron.sbe2.ExampleUsingGeneratedStub

Table of Contents