
Tuesday, May 31, 2005

Nine Solutions For The Problem Of .NET Serialization

Introduction

I recently ran into an article entitled Nine reasons not to use Serialization, and decided to check it out and see what this was all about.

I found that the arguments posted by the author were a little inaccurate. Not because there aren't any problems with .NET Serialization, but because the examples given seemed somewhat misguided. I recently received messages admonishing my initial attack on the author as rude, and, being the good man that I am, I decided to apologize and rewrite my article focusing entirely on the points. Before I begin, though, I would like to say something regarding serialization in general. I can only hope that the reader realizes that although the concept of serialization is addressed in .NET, it is not a unique facet of the technology. The pattern of persisting state is as ancient as Boolean gate feedback; indeed, it is the very premise of memory. As such, there are certain well-known issues associated with serialization, but they do not negate the well-known benefits the pattern provides. So, point for point, here we go.


Well, points 1, 2, 3 and 4 are essentially the same problem. Fortunately, it's one that's plaguing the entire software development and information technology community, so several steps have already been taken towards addressing it. The chief solution is design. I'm not going to say that a day when software will provide all the answers for us will never come, but I think we all know, author of the aforementioned article included, that that day is not today, nor is it tomorrow, nor ... well, you get the idea. One simple solution to the problem is to use structural patterns, more specifically the proxy pattern. The indirection this affords you will more than likely compensate for the alleged lack of adaptability in .NET Serialization; a quick sketch of what I mean follows.
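
To make this concrete, here is a minimal sketch of the kind of indirection I mean. None of it is part of the framework; ISerializer and XmlSerializerProxy are names I'm inventing for illustration, and a real system would flesh them out to taste.

    // Application code depends only on the ISerializer interface; the proxy
    // decides which concrete mechanism (here, XmlSerializer) does the work.
    // If the format or framework changes later, only the proxy changes.
    using System;
    using System.IO;
    using System.Xml.Serialization;

    public interface ISerializer
    {
        string Serialize(object graph);
        object Deserialize(string data, Type type);
    }

    public class XmlSerializerProxy : ISerializer
    {
        public string Serialize(object graph)
        {
            XmlSerializer serializer = new XmlSerializer(graph.GetType());
            StringWriter writer = new StringWriter();
            serializer.Serialize(writer, graph);
            return writer.ToString();
        }

        public object Deserialize(string data, Type type)
        {
            XmlSerializer serializer = new XmlSerializer(type);
            return serializer.Deserialize(new StringReader(data));
        }
    }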


Point 5 accuses XML Serialization of not being secure. Well, on its face, I guess that's true. But if we passed user names and passwords un-encoded in GET requests between web pages, that wouldn't make ASP.NET insecure; it would just mean we misused it. Moreover, the public visibility requirement the author highlighted in XML Serialization is by design. If you think about it, there really isn't much of a point to serializing an entity to XML, a format meant for interchange, if it is only ever consumed by the system that creates it and then hidden away. By default, it should be publicly visible, or you have no business putting it in XML. That being said, there are countless imaginative solutions to the security issue the author raises. A simple approach would be to do the following (a rough sketch in code follows these steps):



  1. Don't write to a file; write to a StringWriter.
  2. Convert the resulting string to a byte array using an Encoding object.
  3. New-school way: wrap the byte array in a MemoryStream and encrypt the stream.
     Old-school way: loop through the byte array and XOR the values with something.
  4. Send the encrypted stream to the WebService as a byte array. Obviously, if you go the old-school route, you'll just be sending the XORed byte array.


P.S. - Chances are the WebService recipient requires a token to validate the request. Use that to generate your keys.
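
For the new-school route, here is a rough sketch of steps 1 through 4 in code. SecureXmlSender is just a name for the example, and the key and IV are assumed to come from wherever you derive them (that token, say); this is one way to wire the pieces together, not the only one.

    using System;
    using System.IO;
    using System.Security.Cryptography;
    using System.Text;
    using System.Xml.Serialization;

    public class SecureXmlSender
    {
        // key must be a legal Rijndael key size (16, 24 or 32 bytes); iv is 16 bytes.
        public static byte[] SerializeAndEncrypt(object graph, byte[] key, byte[] iv)
        {
            // Step 1: serialize to a StringWriter instead of a file.
            XmlSerializer serializer = new XmlSerializer(graph.GetType());
            StringWriter writer = new StringWriter();
            serializer.Serialize(writer, graph);

            // Step 2: convert the string to a byte array with an Encoding object.
            byte[] plainBytes = Encoding.UTF8.GetBytes(writer.ToString());

            // Step 3: encrypt the bytes through a CryptoStream over a MemoryStream.
            RijndaelManaged cipher = new RijndaelManaged();
            MemoryStream output = new MemoryStream();
            CryptoStream cryptoStream = new CryptoStream(
                output, cipher.CreateEncryptor(key, iv), CryptoStreamMode.Write);
            cryptoStream.Write(plainBytes, 0, plainBytes.Length);
            cryptoStream.FlushFinalBlock();

            // Step 4: this byte array is what you hand to the WebService proxy.
            return output.ToArray();
        }
    }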


Point 6 is kinda unfair because it focuses on one form of serialization. I would simply say to this: write a more efficient serializer for your specific needs. SDK development is always geared towards common functionality across broad ranges of scenarios, and that usually hampers performance because of implementation trade-offs. What you're supposed to do when you run into a situation that a framework cannot solve for you is write an adapter (see design patterns). You then judge the flexibility of a framework not on its shortcomings, but on how much you, as an end-user of the SDK, can plug into its architecture. So if you're so concerned with space that neither XML nor binary serialization works for you, write a HuffmanSerializer that utilizes the compression algorithms of its namesake (sketched below).

Seriously though, another quick and easy solution to the problem would be to use smart serialization. The concept is basic and has been applied numerous times in all forms of critical decision software (that macro pattern includes, but is not limited to, application servers, databases, web servers, etc.). Here's the premise: create a custom serialization standard (yikes! I said it, but bear with me). Next, create a toolkit that abstracts the native interaction mechanisms away from any application you might produce. As far as any application you might create is concerned, this toolkit is a black box. Then write your applications on top of this toolkit. Finally, standardize, standardize, standardize.
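
For the curious, here is roughly what that compressing serializer could look like. CompressingSerializer is a made-up name, and rather than hand-rolling the algorithm I'm leaning on the framework's DeflateStream (DEFLATE is LZ77 plus Huffman coding, so it is the closest built-in thing to the namesake).

    using System.IO;
    using System.IO.Compression;
    using System.Runtime.Serialization.Formatters.Binary;

    public class CompressingSerializer
    {
        public static byte[] Serialize(object graph)
        {
            // Binary-serialize the graph straight into a DeflateStream.
            MemoryStream buffer = new MemoryStream();
            DeflateStream deflate = new DeflateStream(buffer, CompressionMode.Compress);
            new BinaryFormatter().Serialize(deflate, graph);
            deflate.Close();                    // flushes the compressed tail
            return buffer.ToArray();            // ToArray works even after Close()
        }

        public static object Deserialize(byte[] data)
        {
            // Inflate fully into memory first, then let the formatter read it back.
            DeflateStream inflate = new DeflateStream(
                new MemoryStream(data), CompressionMode.Decompress);
            MemoryStream plain = new MemoryStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = inflate.Read(chunk, 0, chunk.Length)) > 0)
            {
                plain.Write(chunk, 0, read);
            }
            plain.Position = 0;
            return new BinaryFormatter().Deserialize(plain);
        }
    }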


Points 7 & 8: if this sounds familiar, it's because it's standard industry practice. I could not have planned it better, since this leads me right back to my response to point 6. You don't really want to know what's under the hood of .NET, and in reality you shouldn't need to.


As to points 8 & 9: the benchmark used by the esteemed author is really not very meaningful. It does not in any way present a reasonable premise; we should always regard testing as a function of feasibility, in a linear f(x) kind of way, and the proposed test is simply incorrect. Of course the run-time overhead of deserializing 100,000 objects will be high, just as the network overhead of creating 100,000 distinct connections to a database to query 100,000 rows and return the results as a dataset would be. I doubt there is any hardware out there that would not register a spike at that algorithm! It's kind of like a Big-O issue: if you do not design a sort algorithm well, it won't perform well, even if you don't notice it. Now I don't want to sound insulting. I am by no means Neo, but this is just the kind of thing that separates the CS majors from the '21 days' majors. It's easy to see that both example patterns are absurdly inefficient. Even if you run them on a 64-bit processor with 100 gigs of memory, the problem will still exist. I see this all the time in the industry: computers and software are so powerful these days that you can make obvious design and implementation mistakes and not even notice. In the example for point 8, a flyweight pattern should be used; the inefficiency has little to do with .NET, much less Serialization.

Take a trip down memory lane with me to CS100 and you'll remember: "When inheritance breaks encapsulation, use delegation or composition." (And indeed, when does inheritance not break encapsulation?) The issues raised in point 9 are consistent with problems 'inherent' in inheritance: at least some part of the sub-class is defined in the parent class. You could employ delegation, or design the solution around composition. Restricting your object interactions to well-designed interfaces will produce fewer implementation dependencies; a toy example of that composition is sketched below.
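
To put the CS100 mantra into code, here is a toy example of delegation over inheritance. Customer, CustomerPresenter, and ICustomerView are all invented names; the point is only that the presenter composes the persisted entity instead of inheriting from it, so the serializable class stays a plain data carrier.

    using System;

    public interface ICustomerView
    {
        string DisplayName { get; }
    }

    [Serializable]
    public class Customer                       // the persisted entity stays dumb
    {
        public string FirstName;
        public string LastName;
    }

    public class CustomerPresenter : ICustomerView
    {
        private Customer customer;              // delegation instead of inheritance

        public CustomerPresenter(Customer customer)
        {
            this.customer = customer;
        }

        public string DisplayName
        {
            get { return customer.LastName + ", " + customer.FirstName; }
        }
    }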