Tuesday, June 07, 2005

Improve Serialization Performance

Overview



Serialization is used to persist the state of an object so that the object can be saved and then regenerated later. ASP.NET uses serialization to save objects in session state. Serialization is also used when an object is passed across a remoting boundary, such as an application domain, process, or computer. Finally, serialization is used if parameters are passed to and from Web services.



The .NET Framework provides two serialization mechanisms:


  • ASP.NET Web services use the XmlSerializer class to perform serialization.


  • .NET remoting uses the two classes that implement IFormatter: the BinaryFormatter and the SoapFormatter. To support serialization by a formatter object, a type must be marked with the Serializable attribute.



Serialization performance is an important consideration for .NET applications because serialization is used frequently. There are a number of techniques that you can use to improve performance. These are described in this How To.



What You Must Know



If you plan to use serialization, you should know the following:


  • Consider the data contract between client and server, and ensure that your interface is designed with efficiency of remote access in mind. For example, avoid chatty interfaces, and, where necessary, implement a data façade to wrap existing chatty interfaces and reduce round trips.


  • The XmlSerializer used by Web services serializes both the public fields and properties of a class.


  • The BinaryFormatter and SoapFormatter classes used by .NET remoting require that you serialize all of the fields of a class, including those marked as private, whenever you pass an object by value to a remote method call.


  • The XmlSerializer provides faster serialization of DataSet objects than the BinaryFormatter and SoapFormatter because it does not serialize private data. DataSet objects maintain collections of internal properties to supply functionality, such as DataViews and XML Diffgrams which can be expensive to serialize.


  • Any type can be serialized by the XmlSerializer class, provided that it has a public parameterless (default) constructor and at least one public member that can be serialized, and it does not have declarative security. Types that include member variables that cannot be handled by XmlSerializer, such as Hashtable, are not serialized.


  • The BinaryFormatter produces a more compact byte stream than SoapFormatter. SoapFormatter is generally used for cross-platform interoperability.


  • When you use the Serializable attribute, .NET run-time serialization uses reflection to identify the data that should be serialized. All nontransient fields are serialized, including public, private, protected, and internal fields. XML serialization uses reflection to generate special classes to perform the serialization.


  • The ISerializable interface allows you to explicitly control how data is serialized.


  • Binary serialization usually outperforms XML serialization because its output is more compact.


  • XML serialization cannot serialize classes such as Hashtable and ListDictionary that implement IDictionary. If you need to serialize objects that implement IDictionary, you must implement your own custom serialization functionality.


  • You should avoid serializing security-sensitive data by annotating sensitive fields with the NonSerialized or XmlIgnore attributes as described in "Using the NonSerialized or XmlIgnore Attributes," later in this How To.
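The difference between the two mechanisms described in the list above can be seen with a small sketch. The Account type and the SerializerComparison helper are hypothetical names introduced only for this illustration, and the code assumes the .NET Framework versions this article targets, where BinaryFormatter is available.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using System.Xml.Serialization;

// Hypothetical type used only for this illustration.
[Serializable]
public class Account
{
    public string Owner = "Contoso";   // public field: written by both serializers
    private int balance = 100;         // private field: written only by the formatters

    public int GetBalance() { return balance; }
}

public static class SerializerComparison
{
    // XmlSerializer output: public fields and properties only.
    public static string ToXml(Account account)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Account));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, account);
            return writer.ToString();
        }
    }

    // BinaryFormatter output: every field that is not marked NonSerialized.
    public static byte[] ToBinary(Account account)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, account);
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        Account account = new Account();
        Console.WriteLine(ToXml(account));            // XML contains Owner but not balance
        Console.WriteLine(ToBinary(account).Length);  // size of the binary stream in bytes
    }
}
```

Comparing the two outputs for your own types is a quick way to check which members each serializer actually sends.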



Improving Serialization Performance



There are multiple ways that you can improve run-time serialization performance. For example, you can reduce the size of the serialized data stream by instructing the run-time serializers to ignore specific fields within your class. Another way to improve performance is to implement the ISerializable interface to gain explicit control over the serialization (and deserialization) process.



Using the NonSerialized or XmlIgnore Attributes



You can use attributes to prevent specific fields in your class from being serialized. This reduces the size of the output stream and reduces serialization processing overhead. This technique is also useful to prevent security-sensitive data from being serialized.



There are two attributes: NonSerialized and XmlIgnore. The one you should use depends on the serializer that you are using.


  • The SoapFormatter and BinaryFormatter classes used by .NET remoting recognize the NonSerialized attribute.


  • The XmlSerializer class used by Web services recognizes the XmlIgnore attribute.



The following code fragment shows the XmlIgnore attribute.



using System;
using System.Xml.Serialization;

[Serializable]
public class Employee
{
    public string FirstName;
    [XmlIgnore]                 // excluded from XmlSerializer output
    public string MiddleName;
    public string LastName;
}
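For the formatter-based serializers, the equivalent attribute is NonSerialized. The following is a minimal sketch, assuming the .NET Framework BinaryFormatter; the Credentials type and the NonSerializedDemo helper are hypothetical names that do not come from the original article.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical type used only for this illustration.
[Serializable]
public class Credentials
{
    public string UserName;
    [NonSerialized]
    public string Password;   // skipped by BinaryFormatter and SoapFormatter
}

public static class NonSerializedDemo
{
    // Serializes the object to memory and reads it back.
    public static Credentials RoundTrip(Credentials source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (Credentials)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Credentials original = new Credentials();
        original.UserName = "alice";
        original.Password = "secret";

        Credentials copy = RoundTrip(original);
        Console.WriteLine(copy.UserName);          // alice
        Console.WriteLine(copy.Password == null);  // True: the field was never written
    }
}
```

Note that NonSerialized applies to fields only, so a value exposed through a property is excluded by marking its backing field.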



Using ISerializable for Explicit Control



The ISerializable interface gives you explicit control over how your class is serialized. However, you should only implement this interface as a last resort. If you take this approach, your type cannot benefit from new formatters provided by future versions of the .NET Framework or from improvements to the framework-provided serialization.



Note In general, you should avoid implementing ISerializable for the following reasons:


  • It requires derived classes to implement ISerializable to participate in serialization.


  • It requires that you implement both the deserialization constructor and GetObjectData.


  • It limits the type from taking advantage of future features and performance improvements.



Implementing ISerializable



The ISerializable interface contains a single method, GetObjectData, which you use to specify precisely which data should be serialized.



public interface ISerializable
{
    void GetObjectData(SerializationInfo info, StreamingContext context);
}



The following code shows a simple implementation of the GetObjectData method. Data is retrieved from the current object instance and stored in the SerializationInfo object.



public void GetObjectData(SerializationInfo info, StreamingContext context)
{
    info.AddValue("id", ID);
    info.AddValue("firstName", firstName);
    ...
    info.AddValue("zip", zip);
}



When you implement ISerializable, you must also create a new constructor that accepts SerializationInfo and StreamingContext parameters. This constructor is called by the .NET runtime to de-serialize your object. In the constructor, you read data out of the supplied SerializationInfo object and store the data in the current object instance, as shown in this example.



[Serializable]
public class CustomerInterface : ISerializable
{
    protected CustomerInterface(SerializationInfo info, StreamingContext context)
    {
        ID = info.GetInt32("id");
        firstName = info.GetString("firstName");
        ...
        zip = info.GetString("zip");
    }
    ...
}



Serializing Base Class Members



When you implement ISerializable, be sure to serialize base class members. If the base class also implements ISerializable, you can call the base class's GetObjectData. If the base class does not implement ISerializable, you need to store each required value.
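The case where the base class also implements ISerializable might look like the following sketch. Person, Manager, and BaseClassDemo are hypothetical names chosen for illustration, and the round trip assumes the .NET Framework BinaryFormatter.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical base class that controls its own serialization.
[Serializable]
public class Person : ISerializable
{
    public string Name;

    public Person() { }

    protected Person(SerializationInfo info, StreamingContext context)
    {
        Name = info.GetString("name");
    }

    public virtual void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("name", Name);
    }
}

// Hypothetical derived class that adds one field of its own.
[Serializable]
public class Manager : Person
{
    public int Level;

    public Manager() { }

    protected Manager(SerializationInfo info, StreamingContext context)
        : base(info, context)                // restore the base class state first
    {
        Level = info.GetInt32("level");
    }

    public override void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        base.GetObjectData(info, context);   // store the base class members
        info.AddValue("level", Level);
    }
}

public static class BaseClassDemo
{
    public static Manager RoundTrip(Manager source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (Manager)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Manager manager = new Manager();
        manager.Name = "Ann";
        manager.Level = 3;

        Manager copy = RoundTrip(manager);
        Console.WriteLine(copy.Name + " " + copy.Level);   // Ann 3
    }
}
```

Declaring GetObjectData as virtual in the base class is the design choice that lets each derived class append its own values without repeating the base class logic.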



Versioning Considerations



If you add, remove, or rename the member variables of a class that you have previously serialized, existing persisted objects cannot be successfully de-serialized. This is especially true for classes that implement ISerializable and just call GetValue (or a typed variant such as GetString) in the deserialization constructor. In this case, a SerializationException is generated if the value you request is not present in the serialized stream.



One way to address this issue is to use a SerializationInfoEnumerator to walk through the items in the SerializationInfo object, and then use a switch to set values. With this approach, you only restore those fields that are present in the serialized stream and you can manually initialize any missing fields.
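The enumerator approach described above can be sketched as follows. VersionedCustomer, its Restore helper, and the "unknown" default are assumptions made for this illustration; the demo builds a SerializationInfo by hand to stand in for a stream that was written before the zip field existed.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical type used only for this illustration.
[Serializable]
public class VersionedCustomer : ISerializable
{
    public int Id;
    public string FirstName;
    public string Zip;

    public VersionedCustomer() { }

    protected VersionedCustomer(SerializationInfo info, StreamingContext context)
    {
        // Default for a field that might be missing from an older stream.
        Zip = "unknown";

        SerializationInfoEnumerator entries = info.GetEnumerator();
        while (entries.MoveNext())
        {
            switch (entries.Name)
            {
                case "id":
                    Id = Convert.ToInt32(entries.Value);
                    break;
                case "firstName":
                    FirstName = (string)entries.Value;
                    break;
                case "zip":
                    Zip = (string)entries.Value;
                    break;
                // Names that are not listed are ignored, so removed fields do not break loading.
            }
        }
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("id", Id);
        info.AddValue("firstName", FirstName);
        info.AddValue("zip", Zip);
    }

    // Demo helper: runs the deserialization constructor directly.
    public static VersionedCustomer Restore(SerializationInfo info)
    {
        return new VersionedCustomer(info, new StreamingContext(StreamingContextStates.All));
    }
}

public static class VersioningDemo
{
    public static void Main()
    {
        // Simulate data saved by an older version of the class, without "zip".
        SerializationInfo oldData =
            new SerializationInfo(typeof(VersionedCustomer), new FormatterConverter());
        oldData.AddValue("id", 7);
        oldData.AddValue("firstName", "Ann");

        VersionedCustomer customer = VersionedCustomer.Restore(oldData);
        Console.WriteLine(customer.Id);    // 7
        Console.WriteLine(customer.Zip);   // unknown
    }
}
```

Calling info.GetString("zip") on the same data would throw instead, which is the failure the enumerator avoids.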



Improving DataSet Serialization



Many applications pass DataSet objects between remote tiers, although doing so incurs a significant serialization overhead and can cause your application to not meet its performance goals.



DataSets are complex objects with a hierarchy of child objects, and as a result, serializing a DataSet is a processor-intensive operation. Also, DataSet objects are serialized as XML even if you use the binary formatter. This means that the output stream is not compact.



There are a number of techniques that you can use to improve DataSet serialization performance.



Using Column Name Aliasing



You can try aliasing long column names with shorter names to reduce the size of the serialized data. The following example shows how you can use aliases for column names by using the AS keyword in your SQL.



// myConnection (a SqlConnection), byteData (a byte[] buffer), and iBinForm
// (a formatter such as BinaryFormatter) are assumed to be declared elsewhere.
DataSet objDataset = new DataSet("Customers");
SqlDataAdapter myAdapter = new SqlDataAdapter(
    "SELECT CustomerId AS C, CompanyName AS D, ContactName AS E, ContactTitle AS F FROM Customers",
    myConnection);
myAdapter.Fill(objDataset);
Stream serializationStream = new MemoryStream(byteData, 0, byteData.Length, true, true);
serializationStream.Position = 0;
iBinForm.Serialize(serializationStream, objDataset);



Avoiding Serializing Multiple Versions of the Same Data



As soon as you make changes to the data in a DataSet, you begin to maintain multiple copies of the data: the DataSet keeps the original data along with the changed values. If you do not need to serialize both new and old values, call AcceptChanges before you serialize a DataSet to reset the internal buffers. Depending upon the amount of data held in the DataSet and the number of changes you make, this can significantly reduce the amount of data serialized. This approach is shown in the following code example.



// load some data into the dataset
customers.Fill(northwind, "Customers");
orders.Fill(northwind, "Orders");
// ... modify the data
// accept the changes made and flush the internal buffers
northwind.AcceptChanges();
// ... serialize the dataset
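To get a feel for how much the retained original values add, the following sketch compares the length of a DataSet's DiffGram before and after AcceptChanges. The DiffGram is used here only as a convenient stand-in for the serialized form, and the table contents and the AcceptChangesDemo helper are made up for the illustration.

```csharp
using System;
using System.Data;
using System.IO;

public static class AcceptChangesDemo
{
    // Length of the DiffGram, which carries both current and original values.
    public static int DiffGramLength(DataSet data)
    {
        using (StringWriter writer = new StringWriter())
        {
            data.WriteXml(writer, XmlWriteMode.DiffGram);
            return writer.ToString().Length;
        }
    }

    // Builds a small DataSet and edits every row so that originals are retained.
    public static DataSet BuildModifiedSample()
    {
        DataSet data = new DataSet("Northwind");
        DataTable customers = data.Tables.Add("Customers");
        customers.Columns.Add("CompanyName", typeof(string));
        for (int i = 0; i < 100; i++)
        {
            customers.Rows.Add("Company " + i);
        }
        data.AcceptChanges();   // rows now count as unchanged

        foreach (DataRow row in customers.Rows)
        {
            row["CompanyName"] = row["CompanyName"] + " (edited)";   // original value is kept too
        }
        return data;
    }

    public static void Main()
    {
        DataSet data = BuildModifiedSample();
        int before = DiffGramLength(data);
        data.AcceptChanges();   // discard the original values
        int after = DiffGramLength(data);
        Console.WriteLine("Before AcceptChanges: " + before);
        Console.WriteLine("After AcceptChanges:  " + after);
    }
}
```

Only call AcceptChanges when the receiver does not need the change history, because the row states used to generate updates are reset as well.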



Reducing the Number of DataTables Serialized



If you don't need to send all of the DataTables contained in a DataSet, consider copying the DataTables you need to send into a separate DataSet. This will reduce the amount of data serialized by reducing the DataTables processed and by initializing the change buffers that are used by the DataView.



customers.Fill(northwind, "Customers");
orders.Fill(northwind, "Orders");
// ... use or modify some data
DataSet subset = new DataSet();
// copy just the Customers DataTable
subset.Tables.Add(northwind.Tables["Customers"].Copy());
// ... serialize the subset DataSet



Overriding DataSet for Binary Serialization



By default, DataSets are serialized as XML even if you use the BinaryFormatter. This leads to large serialization data streams. To produce a more compact output format, you can consider deriving your own class from DataSet and implementing your own serialization.
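Writing a derived DataSet with custom serialization is a sizable task, so the sketch below shows a different, simpler technique rather than the one described above: it assumes .NET Framework 2.0 or later, where DataSet exposes a RemotingFormat property that switches the formatter output to a binary layout. The sample data and the DataSetBinaryDemo helper are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class DataSetBinaryDemo
{
    public static DataSet BuildSample()
    {
        DataSet data = new DataSet("Northwind");
        DataTable customers = data.Tables.Add("Customers");
        customers.Columns.Add("CompanyName", typeof(string));
        for (int i = 0; i < 1000; i++)
        {
            customers.Rows.Add("Company " + i);
        }
        data.AcceptChanges();
        return data;
    }

    // Serializes with BinaryFormatter using the requested DataSet layout.
    public static byte[] Serialize(DataSet data, SerializationFormat format)
    {
        data.RemotingFormat = format;   // Xml is the default; Binary requires 2.0 or later
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, data);
            return stream.ToArray();
        }
    }

    public static DataSet Deserialize(byte[] payload)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream(payload))
        {
            return (DataSet)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        DataSet data = BuildSample();
        Console.WriteLine("Xml layout:    " + Serialize(data, SerializationFormat.Xml).Length);
        Console.WriteLine("Binary layout: " + Serialize(data, SerializationFormat.Binary).Length);
    }
}
```

How much the binary layout saves depends on the amount and shape of the data, so measure with representative DataSets before you rely on it.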



Web Services Serialization Considerations



To reduce the size of serialized data sent to and from Web services you can consider a number of compression techniques to compress the data streams. You can achieve other optimizations by efficiently initializing the XmlSerializer class and by using XmlIgnore. Consider the following approaches:


  • Compressing the serialized data


  • Initializing XmlSerializer by calling FromTypes on startup


  • Using the XmlIgnore attribute



Compressing the Serialized Data



There are a number of ways that you can compress the serialized data passed to and from Web services:


  • Implement SoapExtensions on both server and client side to compress and decompress the data.


  • Implement an HttpModule to compress the response, for example by using gzip compression, and then unzip the data on the client in the proxy. To do so, you need to override the GetWebRequest and the GetWebResponse methods for the Web service client proxy as shown here.
    // Override GetWebRequest in the Web service proxy to advertise compression support.
    protected override WebRequest GetWebRequest(Uri uri)
    {
        WebRequest request = base.GetWebRequest(uri);
        request.Headers.Add("Accept-Encoding", "gzip, deflate");
        return request;
    }
    // Override GetWebResponse in the Web service proxy.
    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        // decompress the response from the Web service here
        return response;
    }



  • Use the HTTP compression features in Internet Information Services (IIS) 5.0, and then decompress the response within the client-side proxy by using a utility that understands IIS 5.0 compression. Once again, you need to override the GetWebRequest and the GetWebResponse methods for the Web service client proxy.
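Whichever of the options above you choose, the compression step itself can be small. The following sketch shows gzip compression and decompression of an already serialized payload; it assumes the GZipStream class from System.IO.Compression (.NET Framework 2.0 or later), and GzipHelper is a hypothetical helper name rather than part of any proxy or SoapExtension API.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipHelper
{
    // Compresses a serialized payload with gzip.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }   // disposing the GZipStream flushes the final compressed block
            return output.ToArray();
        }
    }

    // Restores the original bytes from a gzip payload.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // Repetitive XML, similar in spirit to a serialized Web service response.
        StringBuilder xml = new StringBuilder("<Customers>");
        for (int i = 0; i < 500; i++)
        {
            xml.Append("<Customer><CompanyName>Company ").Append(i).Append("</CompanyName></Customer>");
        }
        xml.Append("</Customers>");

        byte[] original = Encoding.UTF8.GetBytes(xml.ToString());
        byte[] compressed = Compress(original);
        Console.WriteLine(original.Length + " bytes before, " + compressed.Length + " bytes after");
    }
}
```

Compression trades processor time for bandwidth, so it tends to pay off for large, repetitive payloads on slow links and can hurt for very small messages.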



Initializing XmlSerializer by Calling FromTypes on Startup



The first time XmlSerializer encounters a type, it generates code to perform serialization and then it caches that code for later use. However, if you call the FromTypes static method on the XmlSerializer, it forces XmlSerializer to immediately generate and cache the required code for the types you plan to serialize. This reduces the time taken to serialize a specific type for the first time. The following example shows this approach.



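static void OnApplicationStart()
{
    // customer and order are the types you plan to serialize; typeof is checked
    // at compile time, whereas Type.GetType returns null for an unresolved name.
    Type[] myTypes = new Type[] { typeof(customer), typeof(order) };
    XmlSerializer.FromTypes(myTypes);
}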
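FromTypes returns an array of XmlSerializer instances in the same order as the types passed in, so one option is to keep those instances and reuse them for later calls. The sketch below illustrates that idea; CustomerRecord, OrderRecord, and SerializerCache are hypothetical names, and the cache is assumed to be filled once at startup before any concurrent use.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical types used only for this illustration.
public class CustomerRecord { public string Name; }
public class OrderRecord { public int Number; }

public static class SerializerCache
{
    private static readonly Dictionary<Type, XmlSerializer> serializers =
        new Dictionary<Type, XmlSerializer>();

    // Call once at application startup with every type you plan to serialize.
    public static void Warm(params Type[] types)
    {
        XmlSerializer[] created = XmlSerializer.FromTypes(types);
        for (int i = 0; i < types.Length; i++)
        {
            serializers[types[i]] = created[i];   // results are in the same order as the input
        }
    }

    // Serializes with the serializer that was generated during Warm.
    public static string ToXml(object value)
    {
        XmlSerializer serializer = serializers[value.GetType()];
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Warm(typeof(CustomerRecord), typeof(OrderRecord));

        CustomerRecord customer = new CustomerRecord();
        customer.Name = "Contoso";
        Console.WriteLine(ToXml(customer));
    }
}
```

Holding on to the returned instances makes the reuse explicit instead of depending on how a particular framework version caches generated serialization code.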



Using the XmlIgnore Attribute



Consider using the XmlIgnore attribute, as described earlier, to prevent any field that you do not need to serialize from being included in the output stream.



Remoting Serialization Considerations



The .NET remoting infrastructure uses formatters that implement the IFormatter interface to perform serialization. The two formatters provided by the .NET Framework are SoapFormatter and BinaryFormatter, although you can implement your own. When you use .NET remoting, all nontransient fields are serialized. This includes private, protected, and internal fields.



Using the NonSerialized Attribute



To optimize performance and security, consider using the NonSerialized attribute as described previously to prevent unnecessary or security-sensitive fields from being serialized.



DataSets and Remoting



If your application uses DataSets and you experience serialization performance issues, consider implementing a serialization wrapper class. By implementing a serialization wrapper class, you can reduce the transient memory allocations that remoting typically performs. For an explanation of the issue and a sample, see Microsoft Knowledge Base article 829740, "Improving DataSet Serialization and Remoting Performance," at http://support.microsoft.com/default.aspx?scid=kb;en-us;829740.

Source : http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/scalenethowto01.asp