TensorFlow to depart from YAML due to code execution vulnerability, recommends developers switch to JSON

TensorFlow is a Python-based machine learning and artificial intelligence project developed by Google. TensorFlow has recently dropped support for YAML in order to fix a critical code execution vulnerability.

YAML, or YAML Ain’t Markup Language, is a human-readable data serialization language used to store data and pass objects between processes and applications. Many Python applications use YAML to serialize and deserialize objects.

The vulnerability is tracked as CVE-2021-37678. The maintainers of TensorFlow and Keras, a wrapper project for TensorFlow, say that it stems from insecure parsing of YAML, which could allow an attacker to execute arbitrary code when an application deserializes a Keras model served in YAML format. Deserialisation vulnerabilities typically occur when an application reads malformed or malicious data from an untrusted source.

This YAML deserialisation vulnerability is rated at 9.3 for severity and was reported to the TensorFlow maintainers by security researcher Arjun Shibu.

The source of the vulnerability was the infamous “yaml.unsafe_load()” function in TensorFlow code.

Security researcher Arjun Shibu said, “I searched for deserialization patterns in TensorFlow for Pickle and PyYAML, and surprisingly, I found a call to the dangerous function yaml.unsafe_load().”

The “unsafe_load” function is known to allow fairly liberal deserialisation of YAML data: it resolves all tags, even those that are known to be unsafe on untrusted input. The function loads the YAML input directly without sanitizing it, which makes it possible to smuggle malicious code into the data.
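The difference between the two loaders can be sketched in a few lines. This is a minimal illustration, not TensorFlow's actual code: it assumes PyYAML 5.1 or later is installed, the helper name load_untrusted is hypothetical, and the payload is a harmless stand-in that merely calls Python's built-in len where a real attacker would name something like os.system.

```python
import yaml  # PyYAML (third-party); unsafe_load is available from PyYAML 5.1

# Harmless stand-in for an attacker's payload: the tag asks the loader to
# call a Python function (here builtins.len) while the document is parsed.
PAYLOAD = "!!python/object/apply:builtins.len [[1, 2, 3]]"


def load_untrusted(text):
    """Parse YAML from an untrusted source, refusing Python-specific tags.

    Hypothetical helper for illustration only.
    """
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ValueError("rejected unsafe or malformed YAML") from exc


# unsafe_load honours the tag, so the named function runs during parsing.
called = yaml.unsafe_load(PAYLOAD)

# safe_load only builds plain types (dict, list, str, numbers and so on).
config = load_untrusted("layers: [dense, dropout]")
```

With unsafe_load, the value of called is whatever the function named in the tag returns, which shows that parsing alone is enough to run code. Passing the same payload to load_untrusted raises ValueError, because safe_load has no constructor for Python-specific tags.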

The use of serialisation is very common in machine learning applications. Training a model can be an expensive and slow process. Therefore, developers often use pre-trained models that are already stored in YAML or other formats supported by ML libraries such as TensorFlow.

Following the disclosure of the vulnerability, the maintainers of TensorFlow decided to drop support for YAML altogether and use JSON for deserialisation. It is worth noting that TensorFlow is not the first, nor the only, project to be found to use YAML unsafe_load. The use of this function is quite common in Python projects.
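One reason JSON is a safer interchange format here is that it has no tag mechanism for naming code to run. The sketch below uses only Python's standard json module; the architecture dictionary and its field names are assumptions made for illustration and do not reflect the schema Keras actually writes.

```python
import json

# Hypothetical architecture description; the field names are illustrative,
# not the schema Keras actually produces.
architecture = {
    "name": "demo_model",
    "layers": [
        {"type": "dense", "units": 64, "activation": "relu"},
        {"type": "dense", "units": 10, "activation": "softmax"},
    ],
}

# Serialize to text that can be stored or shared.
serialized = json.dumps(architecture, indent=2)

# json.loads can only produce dict, list, str, int, float, bool and None,
# so parsing the text never calls functions named inside it.
restored = json.loads(serialized)
```

Parsing JSON safely does not make a loaded model trustworthy by itself; an application still has to validate the fields it reads before acting on them.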

The maintainers of TensorFlow have stated that CVE-2021-37678 will be fixed in the TensorFlow 2.6.0 update, and that the fix will also be backported to versions 2.5.1, 2.4.3 and 2.3.4. Since the beginning of the year, Google has fixed more than 100 security vulnerabilities in TensorFlow.
