Security vulnerabilities in JSON serialization frameworks have long been a talking point among programmers. Over the past two years in particular, fastjson has been a focused research target, and vulnerabilities have been reported more and more often. A single vulnerability is not a big deal, but the security team keeps emailing to push online applications to upgrade the dependency, which is exhausting. I believe many people have had enough and are considering replacing fastjson with another serialization framework. Sure enough, we recently had a project in which fastjson was replaced by gson, and it caused a problem in production. I am sharing the experience so that you do not run into the same problem.
Problem Description
The production logic was very simple: serialize an object into a JSON string with fastjson, then send the string in an HTTP request.
It had been working fine, but after fastjson was replaced with gson, it triggered an OOM in production.
Analysis of the memory dump showed that a message of more than 400 MB was being sent. Because the HTTP tool did not check the size of outgoing payloads, it pushed the transmission through anyway, which made the whole online service unavailable.
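A size check before sending would have stopped the oversized message at the caller. A minimal sketch, assuming a hypothetical 1 MiB limit and a hypothetical helper named checkedBody (neither comes from the original project or from any HTTP library):

```java
import java.nio.charset.StandardCharsets;

class PayloadGuard {
    // Hypothetical limit; choose a value that fits the receiving service.
    static final int MAX_BODY_BYTES = 1024 * 1024;

    /** Encodes the body as UTF-8 and rejects it when it exceeds the limit. */
    static byte[] checkedBody(String json) {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        if (body.length > MAX_BODY_BYTES) {
            throw new IllegalArgumentException(
                "request body is " + body.length + " bytes, limit is " + MAX_BODY_BYTES);
        }
        return body;
    }

    public static void main(String[] args) {
        byte[] body = checkedBody("{\"a\":\"aaaaa\"}");
        System.out.println("body accepted, bytes = " + body.length);
    }
}
```

The check runs after the string is already in memory, so it protects the network and the receiver rather than the sender's heap. It turns a silent 400 MB send into an explicit error.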
Problem Analysis
Why did the same JSON serialization cause no problems with fastjson, yet fail immediately after switching to gson? The memory dump showed that the values of many fields were duplicated. Combined with the characteristics of our business data, that let us locate the problem right away: gson has a serious weakness when serializing duplicate objects.
A simple example illustrates the problem. To simulate the characteristics of the production data, add the same object reference to a List<Foo> several times.
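A minimal sketch of that data shape, using only the JDK. Foo, its single field, and the helper names are hypothetical stand-ins, not the project's real classes:

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class SharedReferenceList {
    // Hypothetical stand-in for a business object.
    static class Foo {
        String a = "aaaaa";
    }

    /** Builds a list whose entries all point at one Foo instance. */
    static List<Foo> buildList(int size) {
        Foo shared = new Foo();
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            foos.add(shared);
        }
        return foos;
    }

    /** Counts distinct instances by identity (==), not by equals(). */
    static int distinctInstances(List<?> items) {
        Map<Object, Boolean> seen = new IdentityHashMap<>();
        for (Object item : items) {
            seen.put(item, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Foo> foos = buildList(3);
        System.out.println("entries = " + foos.size());
        System.out.println("distinct instances = " + distinctInstances(foos));
    }
}
```

The list has three entries but only one object behind them, which is exactly the case where serializers start to differ.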
Observe the printed results for gson and fastjson: gson handles duplicate objects by serializing every occurrence in full, while fastjson serializes the first occurrence and replaces each later one with the reference marker $ref.
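The two strategies can be sketched with a hand-written toy writer. This is not how gson or fastjson is implemented. Foo, the helper names, and the `$[index]` path format are assumptions chosen for illustration, and the writer does no string escaping:

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class RefMarkingSketch {
    // Hypothetical business object with a single field.
    static class Foo {
        final String a;
        Foo(String a) { this.a = a; }
    }

    static String fooJson(Foo foo) {
        return "{\"a\":\"" + foo.a + "\"}";
    }

    /** Strategy 1: write every element in full, even when it repeats. */
    static String writeExpanded(List<Foo> foos) {
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < foos.size(); i++) {
            if (i > 0) out.append(',');
            out.append(fooJson(foos.get(i)));
        }
        return out.append(']').toString();
    }

    /** Strategy 2: write an instance once, then point back to its first index. */
    static String writeWithRefs(List<Foo> foos) {
        Map<Foo, Integer> firstIndex = new IdentityHashMap<>();
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < foos.size(); i++) {
            if (i > 0) out.append(',');
            Foo foo = foos.get(i);
            Integer seenAt = firstIndex.putIfAbsent(foo, i);
            if (seenAt == null) {
                out.append(fooJson(foo));
            } else {
                out.append("{\"$ref\":\"$[").append(seenAt).append("]\"}");
            }
        }
        return out.append(']').toString();
    }

    /** Three entries that all reference the same Foo. */
    static List<Foo> sample() {
        Foo shared = new Foo("aaaaa");
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 3; i++) foos.add(shared);
        return foos;
    }

    public static void main(String[] args) {
        System.out.println(writeExpanded(sample()));
        System.out.println(writeWithRefs(sample()));
    }
}
```

With a tiny Foo the reference form is no shorter, but the expanded form grows with the size of Foo times the number of repeats, while the reference form grows only by a short marker per repeat.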
These two strategies produce a qualitative difference when the same object is repeated many times and the object itself is large, so let's compare them in that special scenario.
Compression ratio test
- Serialized object: contains a large number of attributes, to simulate online business data.
- Number of repetitions: 200. The List contains 200 references to the same object, to simulate the complex object structure in production and to amplify the difference.
- Serialization methods: gson, fastjson, Java, and Hessian2. Java and Hessian2 are added as control groups to show how each serialization framework behaves in this particular scenario.
- Main observation: the byte size each serialization method produces after compression, because it determines the size of the network transfer. Secondary observation: whether the list elements are still the same object after deserialization.
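The Java control group can be sketched with the JDK alone. This is a sketch under stated assumptions, not the original test: Foo and its 50 string fields are hypothetical, it measures raw serialized bytes without a separate compression step, and it covers only Java serialization, not gson, fastjson, or Hessian2:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class SharedRefSizeCheck {
    // Hypothetical object with many attributes.
    static class Foo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String[] fields = new String[50];
        Foo() {
            for (int i = 0; i < fields.length; i++) fields[i] = "value-" + i;
        }
    }

    /** n entries that all reference one Foo. */
    static ArrayList<Foo> repeated(int n) {
        Foo shared = new Foo();
        ArrayList<Foo> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(shared);
        return list;
    }

    /** n entries with equal content but a separate Foo instance each. */
    static ArrayList<Foo> copies(int n) {
        ArrayList<Foo> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(new Foo());
        return list;
    }

    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(value);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] shared = serialize(repeated(200));
        byte[] separate = serialize(copies(200));
        System.out.println("200 shared references: " + shared.length + " bytes");
        System.out.println("200 separate copies:   " + separate.length + " bytes");

        List<?> restored = (List<?>) deserialize(shared);
        System.out.println("still one object after round trip: "
            + (restored.get(0) == restored.get(199)));
    }
}
```

Java serialization writes a back-reference handle for an object it has already written to the stream, which is why the shared list stays small and why identity survives the round trip.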
Output results:
Conclusion analysis: Because a single serialized object is large, representing repeats as references is a good way to reduce the volume. gson does not use this optimization, so its output balloons. Even Java serialization, which is rarely favored, does much better, and Hessian2 is more striking still: its output is two orders of magnitude smaller than gson's. In addition, after deserialization gson does not restore the repeated entries to a single shared object, while the other serialization frameworks do.
Throughput Testing
Besides the size of the serialized data, the throughput of each serialization method is also of interest. A benchmark can measure it accurately.
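A rough idea of such a measurement, as a hand-rolled timing loop over Java serialization. This is only a sketch: the helper names and run counts are assumptions, and a dedicated harness such as JMH is the usual choice for numbers worth reporting, because a simple loop like this is sensitive to JIT warm-up and garbage collection:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.function.Supplier;

class ThroughputSketch {
    // Updated on every call so the JIT cannot discard the measured work.
    static volatile long sink;

    /** Runs the operation warmupRuns times untimed, then measuredRuns times timed. */
    static double opsPerSecond(Supplier<byte[]> op, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) sink += op.get().length;
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) sink += op.get().length;
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return measuredRuns * 1e9 / elapsedNanos;
    }

    static byte[] javaSerialize(Object value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(value);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> payload = new ArrayList<>();
        for (int i = 0; i < 200; i++) payload.add("value-" + i);
        double ops = opsPerSecond(() -> javaSerialize(payload), 2_000, 20_000);
        System.out.printf("java serialization: %.0f ops/s%n", ops);
    }
}
```

To compare frameworks, pass a different Supplier for each serializer and keep the payload and run counts identical.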
Throughput Report:
Is it a bit surprising? fastjson leads the way, and the throughput of the text-based serializers is an order of magnitude higher than that of the binary ones: at the level of a million per second versus 100,000 per second.
Overall Test Conclusion
- JSON that fastjson serializes with the $ref reference marker can also be deserialized correctly by gson, but I did not find any configuration that makes gson serialize repeats as references.
- fastjson, Hessian2, and Java serialization can resolve circular references; gson cannot.
- fastjson can turn off detection of circular and duplicate references by setting DisableCircularReferenceDetect.
- With gson, objects that were the same reference before serialization are no longer the same object after a serialize-then-deserialize round trip, which may inflate the number of objects in memory. fastjson, Java, and Hessian2 record reference markers, so they do not have this problem.
- In my test case, Hessian2 has a very strong serialization compression ratio, which suits scenarios where large messages are serialized for network transmission.
- In my test case, fastjson has very high throughput and lives up to the "fast" in its name, which suits scenarios that need high throughput.
- Choosing a serialization framework also means considering whether it supports circular references, circular-object optimization, enumerated types, collections, arrays, subclasses, polymorphism, inner classes, generics, and other combined scenarios, as well as whether the output is human-readable, whether it stays compatible after fields are added or removed, and other features. Overall, I recommend Hessian2 and fastjson.
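The circular-reference point can be checked for Java serialization with the JDK alone. A minimal sketch, assuming a hypothetical two-node cycle (Node and the helper names are illustrative); it says nothing about how gson, fastjson, or Hessian2 behave:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CycleRoundTrip {
    // Hypothetical node type used to build a cycle.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    /** Builds a -> b -> a and returns a. */
    static Node twoNodeCycle() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;
        return a;
    }

    /** Serializes the graph to bytes and reads it back. */
    static Node roundTrip(Node node) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node restored = roundTrip(twoNodeCycle());
        System.out.println("cycle preserved: " + (restored.next.next == restored));
    }
}
```

A serializer that expands every reference in full cannot finish on such a graph, which is why reference tracking and circular-reference support go together.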
Summary
We all know that fastjson uses some fairly hacky logic to be fast, which is also why it has more vulnerabilities. But I think coding is all about trade-offs: if a perfect framework existed, the competing frameworks would have disappeared long ago. I have not studied every serialization framework in depth. You may say jackson is better; I can only say that whichever framework solves the problems in your scenario is the right one.
Finally, be careful when you replace a serialization framework: understand the characteristics of the replacement, because problems the original framework solved may not be covered well by the new one.
Reference https://www.cnkirito.moe/serialize-practice/