Tuesday, October 12, 2010

Order zen

Pretty interesting question (and a good one for interview as well).

An Enumerable class contains two sets of methods: OrderBy() and ThenBy(). The first set sorts the elements of a sequence, second one performs a subsequent ordering of the elements in a sequence. If you have a sequence of names as {FirstName, LastName} and what to sort it by first name and then by last name you probably will not write as following:
var sorted = names.OrderBy(x=>x.FirstName).OrderBy(x=>x.LastName)

This is obviously wrong because you will firstly sort the sequence by FirstName and then resort it by LastName. To do the requested thing, you'll write:
var sorted = names.OrderBy(x=>x.FirstName).ThenBy(x=>x.LastName)

And here is the question - how ThenBy() is implemented (or how would you implement ThenBy() method)?

The task is not so simple as it looks like on a first view. To complete it, next statements should be taken into account:
  1. You can not call ThenBy() on IEnumerable<T>, but only after OrderBy().
  2. ThenBy() could be chained.
  3. OrderBy() and ThenBy() do not sort a sequence immediately. Instead, according to LINQ paradigm, both methods returns a sequence ready to be sorted after starting enumeration (i.e, after first MoveNext() on a enumerator of the prepared sequence).
  4. Obviously, sorting routine should be performed only once.
These observations are rather simple, but without them solution could be ugly enough.
Now, let's take these observation into action.
  1. To achieve this, OrderBy() should return not IEnumerable<T>, but a derived interface. Let's name it IOrderedEnumerable<T>.
  2. ThenBy() should also return IOrderedEnumerable<T>
  3. We will sort the sequence only when GetEnumerator() is requested.
  4. The major tip. ThenBy() actually changes the comparison used in sorting routine. So we just need to provide a mechanism that allows to change the comparison in IOrderedEnumerable<T>
Ok, let's start with defining IOrderedEnumerable<T> and OrderBy() and ThenBy() signatures:
public interface IOrderedEnumerable<T> : IEnumerable<T>
{
}

static class Enumerable
{
 public static IOrderedEnumerable<T> OrderBy<T, TKey>(this IEnumerable<T> sequence, Func<T, TKey> selector)
 {
  return sequence.OrderBy(selector, (x, y) => Comparer<TKey>.Default.Compare(x, y));
 }

 public static IOrderedEnumerable<T> OrderBy<T, TKey>(this IEnumerable<T> sequence, Func<T, TKey> selector, Comparison<TKey> comparison)
 {
  // To be defined 
 }

 public static IOrderedEnumerable<T> ThenBy<T, TKey>(this IOrderedEnumerable<T> sequence, Func<T, TKey> selector)
 {
  return sequence.ThenBy(selector, (x, y) => Comparer<TKey>.Default.Compare(x, y));
 } 

public static IOrderedEnumerable<T> ThenBy<T, TKey>(this IOrderedEnumerable<T> sequence, Func<T, TKey> selector, Comparison<TKey> comparison) 
 {
  // To be defined 
 }

 public static Comparison<T1> Wrap<T1, T2>(this Comparison<T2> comparison, Func<T1, T2> wrapper)
 {
  return (x, y) => (comparison(wrapper(x), wrapper(y)));
 }
}

Also I've defined a helper method Wrap() that simply turn one comparison to another using specified wrapper.

Let's go deeper and define line 14 by creating a new class Ordered<T>:
internal class Ordered<T> : IOrderedEnumerable<T>
{
    private readonly IEnumerable<T> _sequence;
    private readonly Comparison<T> _comparison;

    public Ordered(IEnumerable<T> sequence, Comparison<T> comparison)
    {
        _sequence = sequence;
        _comparison = comparison;
    }

    public IEnumerator<T> GetEnumerator()
    {
        List<T> sorted = new List<T>(_sequence);
        sorted.Sort(_comparison);
        return sorted.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Pretty straightforward.

Now we need to accomplish 4th action. Now you can see that we have a good gap to do this - all we need is to substitute a comparison with slightly different one, which will compare items using nested comparison, if an outer comparison returns zero. But to do this we need inner comparison of a Ordered class. So the natural way is to delegate this code to the Ordered class itself and call it with interface method (as well we have interface in ThenBy() call). So, let's add a method to IOrderedEnumerable<T>:

public interface IOrderedEnumerable<T> : IEnumerable<T>
{
    IOrderedEnumerable<T> OrderThenBy<TKey>(Func<T, TKey> selector, Comparison<TKey> comparison);
}
 
And finally let's implement it in Ordered<T> as defined above.

public IOrderedEnumerable<T> OrderThenBy<TKey>(Func<T, TKey> selector, Comparison<TKey> comparison)
{
    Comparison<T> innerComprison = comparison.Wrap(selector);

    Comparison<T> nestedComparison = (x, y) =>
                                         {
                                             int compare = _comparison(x, y);
                                             return compare == 0 ? innerComprison(x, y) : compare;
                                         };
    return new Ordered<T>(_sequence, nestedComparison);
}
 
The last line code is inside ThenBy() method:

public static IOrderedEnumerable<T> ThenBy<T, TKey>(this IOrderedEnumerable<T> sequence, Func<T, TKey> selector, Comparison<TKey> comparison)
{
    return sequence.OrderThenBy(selector, comparison);
} 

Our solution is close enough for the one from .NET source and satisfies all our observations.

2 comments:

  1. If you read the documentation of the GroupBy carefully, then you realize you can trick the OrderBy. The implementation of OrderBy is by merge sort, which is a stable sorting algorithm.
    Hence you can write:
    var sorted = names.OrderBy(x=>x.LastName).OrderBy(x=>x.FirstName)
    To achieve:
    var sorted = names.OrderBy(x=>x.FirstName).ThenBy(x=>x.LastName)

    Nice posts btw :-)

    ReplyDelete