Pages

Wednesday, April 20, 2005

Sorting in the .Custom Collection in .NET Framework

In .NET, sorting arrays or collections of primitive types is a relatively simple task. Most developers understand that for a collection of numbers or strings, the runtime will know how to compare these values. However, some developers are left wondering how to sort complex objects. For example, what if you wanted to sort an array of FileInfo objects by size rather than name?
After witnessing one fellow developer attempt to code his own BubbleSort algorithm because he could not find another way to sort an array of his custom objects, I felt this article would help spread the word about the rich sorting capabilities supported by .NET.
Sorting Arrays and ArrayLists
Sorting an array composed of primitive data types such as strings and integers is straightforward. The Array class contains a static method called Sort. This method takes in an existing array object and sorts it. The following code demonstrates how simple it can be:
Dim aryBeatles() As String = {"John", "Paul", "George", "Ringo"}
Array.Sort(aryBeatles)
' New order of the aryBeatles array:
'
' aryBeatles
' ----------
' George
' John
' Paul
' Ringo
Sorting an ArrayList is equally easy but somewhat different. The ArrayList is based on an Array. It internally manages an array while exposing properties and methods to make working with them easier. Taking from the example above, here is a demonstration of sorting an ArrayList:
'Base the arraylist on the aryBeatles array.
Dim lstBeatles As New ArrayList(aryBeatles)
lstBeatles.Sort()
Aside from the syntactical differences, the ArrayList still internally relies on the Array.Sort method to sort its contents. The Array.Sort method is the core of all sorting in. NET. Because of this, for the rest of the article I will use arrays to demonstrate the sorting capabilities of the .NET Framework.
Key-value Pair Sorting
One of the features of sorting in the .NET framework is the ability to sort one array based on the values of another array of equal length. This is called key-value sorting. Basically, you have an array of keys and an array of values. The key value in the key array at a given index is used to look up the value in the value array at the same index.
As you can see, it sorts the key array and changes the index in the value array so that their mapping stays consistent. This type of sorting is useful if you implement your own hashtable or dictionary collection, and the underlying collections are a pair of arrays corresponding to the keys and values.
Sorting a Portion of a Collection
Sometimes you may want to sort only a portion of a collection. Figure 1 illustrates partial sorting. Typically, you could accomplish this by copying the range of values you want sorted into a temporary array, sort it, and copy them back. However, in .NET you can accomplish this in one line of code:
Dim aryNumbers() As Integer = {13, 4, 7, 10, 2, 8, 17, 5, 1, 11}
' Using the Array.Sort(array, index, length) method call to sort
' only elements 2-6.
Array.Sort(aryStateKeys, 2, 5)
' New order of the aryNumbers after partial sorting:
' aryNumbers' ----------
' 0: 13
' 1: 4
' 2: 2
' 3: 7
' 4: 8
' 5: 10
' 6: 17
' 7: 5
' 8: 1
' 9: 11
Automatic Sorting Using a SortedList
The .NET Framework contains a collection called a SortedList. Instead of calling sort when needed to sort the collection, the collection stores all its values already presorted. As you add new values to the collection, it internally finds the appropriate spot to insert the object. This class is useful when you need the objects contained within the collection to be pre-sorted at all times.
The SortedList acts more like a Hashtable rather than an ArrayList. The reason for this is that the collection stores key-value pairs rather than the values themselves. Because the Hashtable and DictionaryBase classes do not have sorting built-in, it may be more suitable to use SortedList when you require the keys to be sorted.
This code shows how to use a SortedList
Dim slsStates As New SortedList
slsStates.Add("CA", "California")
slsStates.Add("AZ", "Arizona")
slsStates.Add("NV", "Nevada")
' slsStates order:
' Keys Values
' ---------------
' AZ Arizona
' CA California
' NV Nevada
Curiously, Microsoft left out sorting in Hashtables and other dictionary objects. At the time of writing, I was unable to confirm whether or not that sorting capability will be added in .NET 2.0. Neglecting to add this capability prolongs developer inconvenience and will force us to create an ad hoc workaround.
Customized Sorting with IComparer
Another way to sort objects is to use a comparer class. A comparer class is one that implements the IComparer interface. This is not to be confused with IComparable, which has a different use.
When you use a comparer class to sort objects, you are overriding the default comparing method defined by the object's CompareTo() method. As a matter of fact, if you use a comparer class to sort a list of objects, the class definition for the object doesn't even need to implement IComparable. Thus, the Sort() methods defined in Array and ArrayList requires that all objects being sorted either implement the IComparable interface, or a class implementing IComparer be passed. The .NET Framework defines several comparer classes for you to use. One of them, the CaseInsensitiveComparer class, allows you to sort strings by ignoring their casing. So in this case, the .NET framework ignores the String class' CompareTo method, and instead uses the rules defined in the CaseInsensitiveComparer class. The following code illustrates this feature:
Dim aryLastNames() As String = {"sMiTh, ZULU", "smith, john&", "SMITH, TerrY"}
Array.Sort(aryLastNames, New CaseInsensitiveComparer)
' aryLastNames order:
' smith, john
' SMITH, TerrY
' sMiTh, ZULU
In the opening paragraphs I talked about sorting objects like System.IO.FileInfo. FileInfo does not implement IComparable, so using a comparer class is absolutely necessary. In any case, defining your own comparer class for FileInfo objects allows for maxmium flexibility when determining how they should be sorted. For example, a program that displays files on disk to the user and allows the user to sort these files by name, extension, length, and whatever else (just like Windows Explorer) would take advantage of a comparer class to implement this functionality. The download for this article provides a demonstration of using IComparer sort an array of FileInfo objects.
IComparable vs. IComparer
Because of the way they're named, these two interfaces are oftentimes confused. But after exploring their uses, the differences should be clear. The following table shows a side-by-side comparison between the two.
IComparable
  • Comparing method must be defined within the class for which it is used.
  • Can provide only one method for which to compare objects of the class.
  • Because it is defined within the class, it can access private variables on which to sort.
  • If both interfaces are defined for sorting, this one is not used.
  • Provides a default comparing method for the class in which it is defined.

IComparer

  • Comparing method is defined in a separate class.
  • Many classes can be defined, each classthat can contain a distinct comparing method.
  • Can only sort on public properties and fields.
  • If both interfaces are defined for sorting, this one is used.
  • Provides an alternative sorting method for the class for which it is used.

Table 1. IComparable vs. IComparer: This table breaks down the differences between these two interfaces.

Basically, IComparable is used to define a default comparing method, and IComparer is used to override or use in the absence of the default comparing method.

Comparing Objects with Operator Overloading

Operator overloading is the practice of defining a use for an operator, such as the plus sign + or equality sign =, for the purpose of comparing objects. Although only loosely related to sorting, operator overloading is a nice, powerful feature traditionally available in languages such as C++. Currently in .NET, operator overloading is a feature only supported in C#. However, in .NET 2.0 (code-named Whidbey), operator overloading will be introduced in VB.NET.

A good example of an object that uses overloaded operators is the String class. When you compare two strings using the equal sign = (== in C#), you are implicitly using an overloaded operator. Considering the Recipe class explained before, you can easily extend what has already been defined to includes operators for equality, less than, and greater than operations. Listing 2 shows the Recipe class modified from above to include these operator definitions.

Making Objects Comparable Using IComparable

Implicitly, all primitive types implement an interface called IComparable. This interface defines one method, CompareTo(), that allows a comparison between itself and another instance of the same type. This means that all basic data types, such as integers, strings, and even Booleans have a CompareTo method on it.

In most cases, your class will be sorted based upon one of its properties, which will most likely be of a primitive type. This makes implementing the IComparable interface even easier.

Sorting Multi-dimensional Arrays

The developer community is abuzz with discussion of sorting multidimensional arrays. Strangely enough, the .NET Framework doesn't support the sorting of multidimensional arrays. As a matter of fact, if you try to pass a multidimensional array into Array.Sort method you will get an exception. This leaves many developers confused on how to deal with sorting arrays of more than one dimension. Fortunately, there is a workaround.

The way to approach this problem is to turn a multidimensional array into a jagged array. A jagged array is basically an array of arrays. In other words, the array's element type is also of type array. The key advantage of jagged arrays over multidimensional arrays is that jagged arrays can dynamically scale to infinite dimensions, whereas multidimensional arrays are fixed in their rank, or number of dimensions. And as previously stated, jagged arrays can be sorted.

Different methods of sorting a multidimensional array produce different resuts. The first method is called full sort. This method sorts all of the dimensions. The second method performs a surface sort, or sort on only the first dimension, using the second and subsequent dimensions as tie breakers. The third method sorts the entire array as if it were a one-dimensional array. This method is called flat sorting.

Flat Sorting

When sorting the array in either a structured or flattened manner, the final thing to consider is its dimensional direction. The previous discussion assumed top-down sorting—as if you were starting from the first dimension and working your way down to the last one. However, it's feasible to sort from the bottom-up, or working from the last dimension to the first. Figure 2 illustrates the six possible ways that a multidimensional array can be sorted. These ways are described in Table 2.

Full Sort : From the starting dimension to the end, each dimension is sorted and the restructuring of the elements of that dimension affects the sorting of the subsequent dimensions.

Surface Sort : From the starting dimension to the end, each element is compares, and in the event of a tie, the next element in the dimension serves as the tie-breaker.

Flat Sort : The array is treated as a one-dimensional array for sorting.