
Tuesday, May 10, 2005

Binary Serialization The Complete Reference

Serialization can be defined as the process of storing the state of an object to a storage medium. During this process, the public and private fields of the object and the name of the class, including the assembly containing the class, are converted to a stream of bytes, which is then written to a data stream. When the object is subsequently deserialized, an exact clone of the original object is created.



When implementing a serialization mechanism in an object-oriented environment, you have to make a number of tradeoffs between ease of use and flexibility. The process can be automated to a large extent, provided you are given sufficient control over the process. For example, situations may arise where simple binary serialization is not sufficient, or there might be a specific reason to decide which fields in a class need to be serialized. The following sections examine the robust serialization mechanism provided with the .NET Framework and highlight a number of important features that allow you to customize the process to meet your needs.


Serialization Concepts

Why would you want to use serialization? The two most important reasons are to persist the state of an object to a storage medium so an exact copy can be re-created at a later stage, and to send the object by value from one application domain to another. For example, serialization is used to save session state in ASP.NET and to copy objects to the Clipboard in Windows Forms. It is also used by remoting to pass objects by value from one application domain to another.

Basic Serialization




The easiest way to make a class serializable is to mark it with the Serializable attribute as follows.



[Serializable]
public class MyObject
{
    public int n1 = 0;
    public int n2 = 0;
    public String str = null;
}



The code example below shows how an instance of this class can be serialized to a file.



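// Requires: using System.IO; using System.Runtime.Serialization;
//           using System.Runtime.Serialization.Formatters.Binary;
MyObject obj = new MyObject();
obj.n1 = 1;
obj.n2 = 24;
obj.str = "Some String";
IFormatter formatter = new BinaryFormatter();
// The using block closes the stream even if Serialize throws.
using (Stream stream = new FileStream("MyFile.bin", FileMode.Create,
    FileAccess.Write, FileShare.None))
{
    formatter.Serialize(stream, obj);
}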



This example uses a binary formatter to do the serialization. All you need to do is create an instance of the stream and the formatter you intend to use, and then call the Serialize method on the formatter. The stream and the object to serialize are provided as parameters to this call. Although not explicitly demonstrated in this example, all member variables of a class will be serialized, even variables marked as private. In this respect, binary serialization differs from the XmlSerializer class, which serializes only public fields and properties. For information on excluding member variables from binary serialization, see Selective Serialization.



Restoring the object back to its former state is just as easy. First, create a stream for reading and a formatter, and then instruct the formatter to deserialize the object. The code example below shows how this is done.



IFormatter formatter = new BinaryFormatter();
MyObject obj;
// The using block closes the stream even if Deserialize throws.
using (Stream stream = new FileStream("MyFile.bin", FileMode.Open,
    FileAccess.Read, FileShare.Read))
{
    obj = (MyObject) formatter.Deserialize(stream);
}

// Here's the proof.
Console.WriteLine("n1: {0}", obj.n1);
Console.WriteLine("n2: {0}", obj.n2);
Console.WriteLine("str: {0}", obj.str);



The BinaryFormatter used above is very efficient and produces a compact byte stream. All objects serialized with this formatter can also be deserialized with it, which makes it an ideal tool for serializing objects that will be deserialized on the .NET Framework. It is important to note that constructors are not called when an object is deserialized. This constraint is placed on deserialization for performance reasons. However, this violates some of the usual contracts the runtime makes with the object writer, and developers should ensure they understand the ramifications when marking an object as serializable.
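The point about constructors can be checked with a small experiment. The sketch below assumes the .NET Framework (where BinaryFormatter is available); the Tracked class and the ConstructorDemo helper are hypothetical names introduced only for this illustration. It counts constructor invocations in a static field, which is not part of the serialized instance state.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Tracked
{
    // Static fields belong to the type, not the instance, so they are not serialized.
    public static int ConstructorCalls = 0;

    public int Value;

    public Tracked()
    {
        ConstructorCalls++;
    }
}

public static class ConstructorDemo
{
    // Serializes the object into memory and immediately reads it back.
    public static Tracked RoundTrip(Tracked source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (Tracked)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Tracked original = new Tracked();   // the only explicit construction
        original.Value = 7;

        Tracked copy = RoundTrip(original);

        Console.WriteLine("Value: {0}", copy.Value);
        Console.WriteLine("Constructor calls: {0}", Tracked.ConstructorCalls);
    }
}
```

If the counter still reads one after the round trip, the copy was created without running the constructor, so any invariant the constructor normally establishes has to be restored by other means.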



If portability is a requirement, use the SoapFormatter instead. Simply replace the BinaryFormatter in the code above with SoapFormatter, and call Serialize and Deserialize as before. This formatter produces the following output for the example used above.



<SOAP-ENV:Envelope
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.microsoft.com/soap/encoding/clr/1.0 http://schemas.xmlsoap.org/soap/encoding/"
  xmlns:a1="http://schemas.microsoft.com/clr/assem/ToFile">

  <SOAP-ENV:Body>
    <a1:MyObject id="ref-1">
      <n1>1</n1>
      <n2>24</n2>
      <str id="ref-3">Some String</str>
    </a1:MyObject>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>



It is important to note that the Serializable attribute cannot be inherited. If you derive a new class from MyObject, the new class must be marked with the attribute as well, or it cannot be serialized. For example, when you attempt to serialize an instance of the class below, you will get a SerializationException informing you that the MyStuff type is not marked as serializable.



public class MyStuff : MyObject
{
    public int n3;
}
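The failure can be observed directly. The following sketch assumes the .NET Framework and repeats the two classes so that it is self-contained; InheritanceDemo and its TrySerialize helper are hypothetical names used only here.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class MyObject
{
    public int n1 = 0;
    public int n2 = 0;
    public String str = null;
}

// Deliberately not marked [Serializable]; the attribute on MyObject does not carry over.
public class MyStuff : MyObject
{
    public int n3;
}

public static class InheritanceDemo
{
    // Returns true if the formatter accepted the object,
    // false if it rejected it with a SerializationException.
    public static bool TrySerialize(object graph)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            try
            {
                formatter.Serialize(stream, graph);
                return true;
            }
            catch (SerializationException ex)
            {
                Console.WriteLine("Rejected: {0}", ex.Message);
                return false;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine("MyObject serialized: {0}", TrySerialize(new MyObject()));
        Console.WriteLine("MyStuff serialized: {0}", TrySerialize(new MyStuff()));
    }
}
```

Adding [Serializable] to MyStuff should be enough to make the second call succeed as well.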



Using the Serializable attribute is convenient, but it has limitations as demonstrated above. Refer to the Serialization Guidelines for information about when you should mark a class for serialization; serialization cannot be added to a class after it has been compiled.

Selective Serialization



A class often contains fields that should not be serialized. For example, assume a class stores a thread ID in a member variable. When the class is deserialized, the thread whose ID was stored when the class was serialized might no longer be running, so serializing this value does not make sense. You can prevent member variables from being serialized by marking them with the NonSerialized attribute as follows.



[Serializable]
public class MyObject
{
    public int n1;
    [NonSerialized] public int n2;
    public String str;
}



If possible, make an object that could contain security-sensitive data nonserializable. If the object must be serialized, apply the NonSerialized attribute to specific fields that store sensitive data. If you do not exclude these fields from serialization, be aware that the data they store will be exposed to any code that has permission to serialize. For more information about writing secure serialization code, see Security and Serialization.
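To see what a NonSerialized field looks like after a round trip, consider the sketch below. It assumes the .NET Framework and reuses the MyObject class from this section; SelectiveDemo is a hypothetical helper. Because the excluded field is never written to the stream, it comes back holding the default value for its type.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class MyObject
{
    public int n1;
    [NonSerialized] public int n2;   // skipped by the formatter
    public String str;
}

public static class SelectiveDemo
{
    // Serializes the object into memory and immediately reads it back.
    public static MyObject RoundTrip(MyObject source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (MyObject)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        MyObject original = new MyObject();
        original.n1 = 1;
        original.n2 = 24;
        original.str = "Some String";

        MyObject copy = RoundTrip(original);

        Console.WriteLine("n1: {0}", copy.n1);    // restored from the stream
        Console.WriteLine("n2: {0}", copy.n2);    // default(int), since n2 was not stored
        Console.WriteLine("str: {0}", copy.str);  // restored from the stream
    }
}
```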

Custom Serialization



You can customize the serialization process by implementing the ISerializable interface on an object. This is particularly useful in cases where the value of a member variable is invalid after deserialization, but you need to provide the variable with a value in order to reconstruct the full state of the object. In addition, you should not use default serialization on a class that is marked with the Serializable attribute and has declarative or imperative security at the class level or on its constructors. Instead, these classes should always implement the ISerializable interface.



Implementing ISerializable involves implementing the GetObjectData method and a special constructor that is used when the object is deserialized. The sample code below shows how to implement ISerializable on the MyObject class from a previous section.



[Serializable]
public class MyObject : ISerializable
{
    public int n1;
    public int n2;
    public String str;

    public MyObject()
    {
    }

    protected MyObject(SerializationInfo info, StreamingContext context)
    {
        n1 = info.GetInt32("i");
        n2 = info.GetInt32("j");
        str = info.GetString("k");
    }

    [SecurityPermissionAttribute(SecurityAction.Demand,
        SerializationFormatter = true)]
    public virtual void GetObjectData(SerializationInfo info,
        StreamingContext context)
    {
        info.AddValue("i", n1);
        info.AddValue("j", n2);
        info.AddValue("k", str);
    }
}



When GetObjectData is called during serialization, you are responsible for populating the SerializationInfo provided with the method call. Simply add the variables to be serialized as name/value pairs. Any text can be used as the name. You have the freedom to decide which member variables are added to the SerializationInfo, provided that sufficient data is serialized to restore the object during deserialization. Derived classes should call the GetObjectData method on the base object if the latter implements ISerializable.



Note that serialization can allow other code to see or modify object instance data that would otherwise be inaccessible. Therefore, code performing serialization requires the SecurityPermission with the SerializationFormatter flag specified. Under default policy, this permission is not given to Internet-downloaded or intranet code; only code on the local computer is granted this permission. The GetObjectData method should be explicitly protected either by demanding the SecurityPermission with the SerializationFormatter flag specified or by demanding other permissions that specifically help protect private data.



If a private field stores sensitive information, you should demand the appropriate permissions on GetObjectData to protect the data. Remember that code that has been granted SecurityPermission with the SerializationFormatter flag specified can view and modify the data stored in private fields. A malicious caller granted this SecurityPermission can view data such as hidden directory locations or granted permissions and use the data to exploit a security vulnerability on the computer. For a complete list of the security permission flags you can specify, see the SecurityPermissionFlag Enumeration.



It is important to stress that when ISerializable is added to a class you must implement both GetObjectData and the special constructor. The compiler reports an error if GetObjectData is missing, because the interface method is left unimplemented. However, because an interface cannot require a constructor, the compiler gives no indication if the constructor is absent, and an exception will be thrown when an attempt is made to deserialize a class without the constructor.



The current design was favored above a SetObjectData method to get around potential security and versioning problems. For example, a SetObjectData method must be public if it is defined as part of an interface; thus users must write code to defend against having the SetObjectData method called multiple times. Otherwise, a malicious application that calls the SetObjectData method on an object in the process of executing an operation can cause potential problems.



During deserialization, SerializationInfo is passed to the class using the constructor provided for this purpose. Any visibility constraints placed on the constructor are ignored when the object is deserialized, so you can mark the constructor as public, protected, internal, or private. However, it is best practice to make the constructor protected unless the class is sealed, in which case the constructor should be marked private. The constructor should also perform thorough input validation. To avoid misuse by malicious code, the constructor should enforce the same security checks and permissions required to obtain an instance of the class using any other constructor. If you do not follow this recommendation, malicious code can preserialize an object, obtain control with the SecurityPermission with the SerializationFormatter flag specified, and deserialize the object on a client computer, bypassing any security that would have been applied during standard instance construction using a public constructor.
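As an illustration of these recommendations, the sketch below shows a sealed class whose deserialization constructor is private and applies the same range check as the public constructor. It assumes the .NET Framework; the Percentage class, its 0 to 100 rule, and the ValidationDemo helper are hypothetical and serve only to show the shape of the pattern.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.Security.Permissions;

[Serializable]
public sealed class Percentage : ISerializable
{
    private readonly int percent;

    public Percentage(int percent)
    {
        this.percent = Validate(percent);
    }

    // Private because the class is sealed; the formatter still finds it.
    private Percentage(SerializationInfo info, StreamingContext context)
    {
        if (info == null) throw new ArgumentNullException("info");
        // Apply the same rule as the public constructor to the stored value.
        this.percent = Validate(info.GetInt32("percent"));
    }

    public int Value
    {
        get { return percent; }
    }

    // Shared check so that both construction paths enforce the same invariant.
    private static int Validate(int candidate)
    {
        if (candidate < 0 || candidate > 100)
            throw new ArgumentOutOfRangeException("candidate", "Expected a value from 0 to 100.");
        return candidate;
    }

    [SecurityPermission(SecurityAction.Demand, SerializationFormatter = true)]
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        if (info == null) throw new ArgumentNullException("info");
        info.AddValue("percent", percent);
    }
}

public static class ValidationDemo
{
    // Serializes the object into memory and immediately reads it back.
    public static Percentage RoundTrip(Percentage source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (Percentage)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Percentage copy = RoundTrip(new Percentage(42));
        Console.WriteLine("Value: {0}", copy.Value);
    }
}
```

Routing both constructors through one Validate method keeps the two paths from drifting apart when the rule changes.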



To restore the state of the object, simply retrieve the values of the variables from SerializationInfo using the names used during serialization. If the base class implements ISerializable, the base constructor should be called to allow the base object to restore its variables.



When you derive a new class from one that implements ISerializable, the derived class must implement both the constructor and the GetObjectData method if it has variables that need to be serialized. The code example below shows how this is done using the MyObject class shown previously.



[Serializable]
public class ObjectTwo : MyObject
{
    public int num;

    public ObjectTwo() : base()
    {
    }

    protected ObjectTwo(SerializationInfo si,
        StreamingContext context) : base(si, context)
    {
        num = si.GetInt32("num");
    }

    [SecurityPermissionAttribute(SecurityAction.Demand,
        SerializationFormatter = true)]
    public override void GetObjectData(SerializationInfo si,
        StreamingContext context)
    {
        base.GetObjectData(si, context);
        si.AddValue("num", num);
    }
}



Do not forget to call the base class in the deserialization constructor; if this is not done, the constructor on the base class will never be called, and the object will not be fully constructed after deserialization.



Objects are reconstructed from the inside out, and calling methods during deserialization can have undesirable side effects, because the methods called might refer to object references that have not been deserialized by the time the call is made. If the class being deserialized implements the IDeserializationCallback interface, the OnDeserialization method is automatically called when the entire object graph has been deserialized. At this point, all the child objects referenced have been fully restored. A hash table is a typical example of a class that is difficult to deserialize without using the callback described above. It is easy to retrieve the key/value pairs during deserialization, but adding these objects back to the hash table can cause problems, because there is no guarantee that classes that derive from the hash table have been deserialized. Calling methods on a hash table at this stage is therefore not advisable.
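A minimal sketch of the callback follows, assuming the .NET Framework. The Circle class and CallbackDemo helper are hypothetical. The class keeps a derived value in a NonSerialized field and recomputes it in OnDeserialization, after the formatter has finished restoring the graph, rather than trying to compute it while deserialization is still in progress.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Circle : IDeserializationCallback
{
    public double Radius;

    // Derived from Radius, so it is recomputed instead of being stored.
    [NonSerialized] private double area;

    public Circle(double radius)
    {
        Radius = radius;
        area = Math.PI * radius * radius;
    }

    public double Area
    {
        get { return area; }
    }

    // Called by the formatter once the entire object graph has been deserialized.
    public void OnDeserialization(object sender)
    {
        area = Math.PI * Radius * Radius;
    }
}

public static class CallbackDemo
{
    // Serializes the object into memory and immediately reads it back.
    public static Circle RoundTrip(Circle source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (Circle)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Circle copy = RoundTrip(new Circle(2.0));
        Console.WriteLine("Radius: {0}", copy.Radius);
        Console.WriteLine("Area: {0}", copy.Area);
    }
}
```

Without the callback, Area would come back as zero, because neither the constructor nor the NonSerialized field takes part in deserialization.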

Steps in the Serialization Process



When the Serialize method is called on a formatter, object serialization proceeds according to the following sequence of rules:


  • A check is made to determine whether the formatter has a surrogate selector. If the formatter does, check whether the surrogate selector handles objects of the given type. If the selector handles the object type, ISerializationSurrogate.GetObjectData is called on the surrogate that the selector returns.


  • If there is no surrogate selector or if it does not handle the object type, a check is made to determine whether the object is marked with the Serializable attribute. If the object is not, a SerializationException is thrown.


  • If the object is marked appropriately, check whether the object implements the ISerializable interface. If the object does, GetObjectData is called on the object.


  • If the object does not implement ISerializable, the default serialization policy is used, serializing all fields not marked as NonSerialized.
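The first rule in the list can be illustrated with a surrogate. The sketch below assumes the .NET Framework; Point, PointSurrogate, and SurrogateDemo are hypothetical names. Point is intentionally left without the Serializable attribute, so it can only be serialized because the selector supplies a surrogate before the attribute check is reached.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Not marked [Serializable]: the default path would reject this type.
public class Point
{
    public int X;
    public int Y;
}

// Reads and writes Point state on behalf of the type itself.
public class PointSurrogate : ISerializationSurrogate
{
    public void GetObjectData(object obj, SerializationInfo info,
        StreamingContext context)
    {
        Point p = (Point)obj;
        info.AddValue("x", p.X);
        info.AddValue("y", p.Y);
    }

    public object SetObjectData(object obj, SerializationInfo info,
        StreamingContext context, ISurrogateSelector selector)
    {
        // obj is an uninitialized Point supplied by the formatter; fill it in place.
        Point p = (Point)obj;
        p.X = info.GetInt32("x");
        p.Y = info.GetInt32("y");
        return p;
    }
}

public static class SurrogateDemo
{
    // Serializes and deserializes a Point using a formatter that has a surrogate selector.
    public static Point RoundTrip(Point source)
    {
        SurrogateSelector selector = new SurrogateSelector();
        selector.AddSurrogate(typeof(Point),
            new StreamingContext(StreamingContextStates.All),
            new PointSurrogate());

        BinaryFormatter formatter = new BinaryFormatter();
        formatter.SurrogateSelector = selector;

        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (Point)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Point original = new Point();
        original.X = 3;
        original.Y = 4;

        Point copy = RoundTrip(original);
        Console.WriteLine("X: {0}, Y: {1}", copy.X, copy.Y);
    }
}
```

A surrogate is mainly useful for types you cannot modify, since it moves the serialization logic out of the class entirely.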

Versioning



The .NET Framework provides support for versioning and side-by-side execution, and all classes will work across versions if the interfaces of the classes remain the same. Because serialization deals with member variables and not interfaces, be cautious when adding member variables to, or removing them from, classes that will be serialized across versions. This is especially true for classes that do not implement the ISerializable interface. Any change of state of the current version, such as the addition of member variables, changing the types of variables, or changing their names, will mean that existing objects of the same type cannot be successfully deserialized if they were serialized with a previous version.



If the state of an object needs to change between versions, class authors have two choices:


  • Implement ISerializable. This allows you to take precise control of the serialization and deserialization process, allowing future state to be added and interpreted correctly during deserialization.


  • Mark nonessential member variables with the NonSerialized attribute. This option should only be used when you expect minor changes between different versions of a class. For example, when a new variable has been added to a later version of a class, the variable can be marked as NonSerialized to ensure the class remains compatible with previous versions.
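The first option might look like the following sketch, which assumes the .NET Framework. The Customer class, the Email field added in a supposed second version, and the WriteLikeVersion1 switch are all hypothetical; the switch exists only so that a single program can imitate a stream written by the older version. The deserialization constructor enumerates the entries in SerializationInfo and falls back to a default when the newer entry is absent.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Customer : ISerializable
{
    // Hypothetical switch: when true, GetObjectData omits Email,
    // imitating data written by the earlier version of the class.
    public static bool WriteLikeVersion1 = false;

    public string Name;
    public string Email;   // assumed to have been added in version 2

    public Customer()
    {
    }

    protected Customer(SerializationInfo info, StreamingContext context)
    {
        Name = info.GetString("Name");

        // Default used when the stream predates the Email field.
        Email = "unknown";
        foreach (SerializationEntry entry in info)
        {
            if (entry.Name == "Email")
            {
                Email = (string)entry.Value;
            }
        }
    }

    public virtual void GetObjectData(SerializationInfo info,
        StreamingContext context)
    {
        info.AddValue("Name", Name);
        if (!WriteLikeVersion1)
        {
            info.AddValue("Email", Email);
        }
    }
}

public static class VersioningDemo
{
    // Serializes with or without the Email entry, then deserializes.
    public static Customer RoundTrip(Customer source, bool writeLikeVersion1)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            Customer.WriteLikeVersion1 = writeLikeVersion1;
            formatter.Serialize(stream, source);
            Customer.WriteLikeVersion1 = false;

            stream.Position = 0;
            return (Customer)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Customer c = new Customer();
        c.Name = "Ada";
        c.Email = "ada@example.com";

        Console.WriteLine(RoundTrip(c, false).Email);  // entry present in the stream
        Console.WriteLine(RoundTrip(c, true).Email);   // entry missing, default applied
    }
}
```

Enumerating the entries avoids the SerializationException that GetString raises when a requested name is missing from the stream.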

Serialization Guidelines


You should consider serialization when designing new classes, because a class cannot be made serializable after it has been compiled. Some questions to ask are: Will this class need to be sent across application domains? Will this class ever be used with remoting? What will users do with this class — might they derive a new class from mine that needs to be serialized? When in doubt, mark the class as serializable. It is probably better to mark all classes as serializable unless any of the following are true:


  • The class will never cross an application domain. If serialization is not required and the class needs to cross an application domain, derive the class from MarshalByRefObject.


  • The class stores special pointers that are only applicable to the current instance of the class. If a class contains unmanaged memory or file handles, for example, ensure these fields are marked as NonSerialized, or don't serialize the class at all.


  • Class data members contain sensitive information. In this case, it is advisable to mark the class as serializable, but to mark the individual data members that contain sensitive information as NonSerialized. Another alternative is to implement the ISerializable interface and serialize only the required fields.



Be aware of the security implications of marking a class as serializable. A Link Demand or an Inheritance Demand for a CodeAccessPermission on a class or class constructor can be bypassed by default or custom serialization that implements a corresponding demand for the same CodeAccessPermission. If a class has a Link Demand for a permission, the runtime checks only the immediate caller to verify that the caller has been granted the permission. The .NET Framework class library code is signed with the Microsoft strong name and is always granted full trust. Any code can use code that is granted full trust to bypass link-time security checks. For example, in the case of serialization, malicious code that does not have the required serialization permission can call one of the fully trusted .NET Framework formatters, such as BinaryFormatter, and bypass the link-demand check for the permission.