I've heard that I should never use Java serialization (Serializable/ObjectInputStream/ObjectOutputStream) because of security. What's the problem?
5 Answers
Any time you deserialize an object by calling ObjectInputStream.readObject, you have a remote code execution vulnerability: if someone can make you deserialize the wrong bytes, they can run any command on your computer.
The exploit works by creating an object that will run evil code inside its readObject method, which is called during the process of deserialization, then serializing this object and making you deserialize it. At first glance, you'd think this could only happen if the attacker could load an evil class into your program, in which case they've already hacked your program with or without serialization. However, there are several ways to create an "evil object" using only classes from common libraries (example) and in the future one might be found in the standard library, which would work in every program.
If you happen to be familiar with CVE-2010-0840 (escaping the Java applet sandbox using "Trusted Method Chaining") the concept is very similar but the details are completely different.
The exploit occurs during the process of deserialization, inside readObject, so nothing you do after readObject returns can prevent it.
For more details, see the writeup.
You can prevent the exploit in a cumbersome way, by creating a subclass of ObjectInputStream, overriding the resolveClass method so that only the specific classes you want to be deserialized can be resolved, and using this subclass for deserialization, or by calling setObjectInputFilter with an appropriate filter before reading any objects.
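As a rough sketch of the first option, the subclass might look like the following. The names `AllowListObjectInputStream` and `AllowListDemo` are made up for illustration, and the contents of the allow-list are an assumption. Note that superclasses with their own class descriptor in the stream (here `java.lang.Number`) have to be listed too.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.ArrayList;
import java.util.Set;

/** Hypothetical helper: an ObjectInputStream that only resolves allow-listed classes. */
class AllowListObjectInputStream extends ObjectInputStream {
    private final Set<String> allowedClassNames;

    AllowListObjectInputStream(InputStream in, Set<String> allowedClassNames) throws IOException {
        super(in);
        this.allowedClassNames = allowedClassNames;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
        // Called for each class descriptor read from the stream,
        // before an instance of that class is created.
        if (!allowedClassNames.contains(desc.getName())) {
            throw new InvalidClassException(desc.getName(), "class is not on the allow-list");
        }
        return super.resolveClass(desc);
    }
}

class AllowListDemo {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object readAllowed(byte[] data, Set<String> allowed) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new AllowListObjectInputStream(new ByteArrayInputStream(data), allowed)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption for this demo: only Integer (and its superclass Number) may be read.
        Set<String> allowed = Set.of("java.lang.Integer", "java.lang.Number");

        System.out.println(readAllowed(serialize(42), allowed));
        try {
            readAllowed(serialize(new ArrayList<String>()), allowed);
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.classname);
        }
    }
}
```

With this in place, a serialized `Integer` is read normally, while a serialized `ArrayList` fails with `InvalidClassException` as soon as its class descriptor is read.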
"never" is a strong word. However, when the official documentation of a class starts with a bold security warning:
Warning: Deserialization of untrusted data is inherently dangerous and should be avoided. Untrusted data should be carefully validated according to the "Serialization and Deserialization" section of the Secure Coding Guidelines for Java SE. Serialization Filtering describes best practices for defensive use of serial filters.
(emphasis in original)
it's probably a good idea to take that warning seriously.
In a nutshell, the problem with Java Serialization is that the classes to be instantiated are determined by the serialized data. That is, the data controls which objects are created, and can instantiate any (serializable) class available to your program. In addition, classes can declare methods to be invoked upon deserialization, which allows an attacker to invoke any such method with any input. While care should be taken to harden such methods against misuse, their sheer number makes it virtually certain that at least one class in your classpath remains vulnerable, and can be exploited by the attacker.
(Yes, after the object has been instantiated and returned to your code, your code will notice that the object is of the wrong type and raise an error. However, that happens after the attacker-selected code has executed, and is therefore too late to prevent remote code execution, resource exhaustion, or whatever else the vulnerable method did.)
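A harmless way to see this ordering for yourself, assuming a made-up class `Payload` whose `readObject` merely sets a flag where a real gadget would do damage (`TooLateDemo` is likewise a hypothetical name):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for a "gadget" class: its readObject hook has a side effect. */
class Payload implements Serializable {
    private static final long serialVersionUID = 1L;
    static boolean sideEffectRan = false;

    // ObjectInputStream invokes this private hook while deserializing a Payload.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        sideEffectRan = true; // a real exploit would do its damage here
    }
}

class TooLateDemo {
    /** Returns true if the side effect had already run when the cast failed. */
    static boolean castFailedAfterSideEffect() throws IOException, ClassNotFoundException {
        Payload.sideEffectRan = false;

        // The "sender" serializes a Payload...
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Payload());
        }

        // ...while the "receiver" expects a String.
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            String expected = (String) in.readObject();
            return false; // not reached: the cast above fails
        } catch (ClassCastException e) {
            // The type check fails only after Payload.readObject has finished.
            return Payload.sideEffectRan;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("side effect ran before the cast failed: " + castFailedAfterSideEffect());
    }
}
```

The receiver does get its `ClassCastException`, but by then the hook has already executed.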
More modern serialization libraries prevent this by asking the caller for the expected class, rather than trusting the data. For instance, with a Jackson ObjectMapper, you'd write:
var person = objectMapper.readValue(json, Person.class);
That puts control of which classes are instantiated into the hands of the receiving rather than the sending program.
To mitigate the risk in cases where we can't switch to a more modern data format, ObjectInputStreams have been extended with filtering options. However, due to backward compatibility concerns, these protections are disabled by default. They are also somewhat onerous to configure. And filters just restrict which classes can be instantiated: if one of those is vulnerable, you can still be owned.
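For reference, a minimal sketch of such a filter, assuming Java 9 or later (where `ObjectInputFilter` was introduced). The class name `FilterDemo`, the limits, and the allowed classes are placeholders, not recommendations:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

class FilterDemo {
    // Pattern syntax of ObjectInputFilter.Config.createFilter:
    // resource limits first, then allowed class names, then "!*" to reject everything else.
    // The concrete values are assumptions made for this example.
    static final ObjectInputFilter FILTER = ObjectInputFilter.Config.createFilter(
            "maxdepth=5;maxrefs=1000;maxbytes=65536;java.lang.Integer;java.lang.Number;!*");

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object readFiltered(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(FILTER); // must be set before the first object is read
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readFiltered(serialize(7)));
        try {
            readFiltered(serialize(new ArrayList<String>()));
        } catch (InvalidClassException e) {
            System.out.println("rejected by filter: " + e.getMessage());
        }
    }
}
```

The same pattern syntax can also be applied JVM-wide through the `jdk.serialFilter` system property. Either way, the filter only narrows the set of classes; it does not make the allowed ones safe.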
Overall, the official recommendation that Java Serialization should not be used with untrusted data is entirely reasonable.
Now, one might argue that Java Serialization remains suitable if data comes from a trusted source. I'd disagree, because
- it's quite possible that even though the data is trusted now, it may become untrusted in the future. For instance, suppose you use Java Serialization for session persistence. Sounds harmless, right? But suppose the customer reports some weird error when reading persisted sessions. To diagnose the issue, you ask them for the session file and load it in your dev environment. Oops.
- in the event that a trusted system is compromised, Java Serialization can be used to escalate the compromise to your system
What could possibly justify these risks? The only things in favor of Java Serialization are that it is part of the JDK and easy to use. But more modern serialization libraries are also easy to use, and often have a more readable, text-based wire format such as JSON, which helps greatly with debugging. (One might expect Java Serialization's binary format to be more compact, but it also contains a lot more metadata, which can more than offset the gains from the binary format.) And in the age of Maven (or whichever dependency management tool you prefer), pulling in an additional library is not nearly as hard as it used to be.
Overall, I find the recommendation to "never use Java Serialization" quite sensible. It's one of those early Java technologies that did not anticipate the security challenges we face today.
You should never use it for long-term storage, because deserialization of stored data is very fragile, and likely to become impossible or unreliable if there is even a trivial change in the data structures. I'm a little more on the fence about short-term round trips, but it's very easy to make mistakes that will not be easily noticed. It's much better to have complete control over the serialization process so you can understand and be responsible for what's happening.
I'm going to answer the question asked in the header, rather than the security focused body. So why should we not use serialization?
If the serialization and deserialization occur within your current program run, then all is reasonably OK.
Otherwise your saved information is saved in an undocumented format that can arbitrarily be changed between Java versions.
There is no third party app that you can use to examine or update your saved data. For instance if saved as JSON then a text editor is sufficient. If saved in a SQL database, then an SQL client gives access.
The versioning system is underdeveloped. Last time I used Java, it was nearly impossible to fix major class format changes. The serialVersionUID stored as the class version simply told you there was a problem, but did nothing to help you fix class changes. Having no access to other data storage like the underlying database did not help.
I've been out of the Java development area for 10 years, so take my suggestions as a starting point rather than gospel.
An approach that is language independent: Instead of serialising and deserialising objects you can extract the data of your object and stash them into a JSON document, and later extract them from the JSON document and build your objects by hand.
JSON is very simple, and your OS will have methods to read and write it with zero vulnerabilities. It is then up to you to check the data you find before you use it, unlike Java deserialisation, where the damage is done before you can check anything.
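To illustrate the "extract the data, rebuild by hand, check before use" idea in Java: the JDK ships no JSON parser, so this sketch swaps in `DataOutputStream`/`DataInputStream` as the wire format instead of JSON. The principle is the same with any JSON library. The class `PersonData` and its validation limits are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical value class that is written and rebuilt field by field. */
final class PersonData {
    final String name;
    final int age;

    PersonData(String name, int age) {
        // All validation lives in one place and runs before the object exists.
        if (name == null || name.isEmpty() || name.length() > 100) {
            throw new IllegalArgumentException("invalid name");
        }
        if (age < 0 || age > 150) {
            throw new IllegalArgumentException("invalid age: " + age);
        }
        this.name = name;
        this.age = age;
    }

    /** Writes only the plain data; no class names end up in the output. */
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            out.writeInt(age);
        }
        return bytes.toByteArray();
    }

    /** The receiver decides which class to build; the input only supplies values. */
    static PersonData fromBytes(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String name = in.readUTF();
            int age = in.readInt();
            try {
                return new PersonData(name, age);
            } catch (IllegalArgumentException e) {
                throw new IOException("rejected input", e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        PersonData copy = fromBytes(new PersonData("Ada", 36).toBytes());
        System.out.println(copy.name + " " + copy.age);
    }
}
```

Whatever bytes arrive, the only code that runs is the reading code shown here and the `PersonData` constructor, and bad values are rejected before any object is handed to the rest of the program.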