Friday, November 19, 2010

Commutativity and LINQ

Yet another stupid questions. Imagine, you have defined a class:

class Entity{
 public int Id { get; set; }
 public int Priority { get; set; }

And also you have some sequence of Entity objects (i.e. IEnumerable entities). You need to group entities by its Id and select only entities with priority greater than 5.

As far as "and" is commutative and according to the task the following two blocks of code should return the same result

var group1 = entities.Where(e=>e.Priority > 5).GroupBy(e=>e.Id);
var group2 = entities.GroupBy(e=>e.Id).Select(g=>g.Where(e=>e.Priority > 5));

But this is wrong. There are two different sets of input data that will produce to different results (by two different reasons). Could you find them out?

First set and first reason, more dangerous, is that this two operations - grouping and filtering - is actually are not commutative. Consider the following entities definition:
var entities = new[] 
       new Entity{Id = 1, Priority = 1}
Now let's track down both cases:
  1. Firstly, the entity will be filtered out, then grouping will be perfomed on empty enumerable - as a result we'll have an empty set of groups
  2. The entities collection will be grouped by Id - as a result we'll get a one group. Then the group values will be filtered out - and finally we'll get one group with empty set inside the group.
Second set and reason is less dangerous. The cause is that GroupBy preserve the order of grouping field. Consider the following set:
var entities = new[] 
       new Entity{Id = 1, Priority = 6}
       new Entity{Id = 2, Priority = 6}
       new Entity{Id = 2, Priority = 1}
       new Entity{Id = 1, Priority = 1}
  1. Where 
    will filter out first two entities and
    will create two groups in order - 2,1.
  2. GroupBy
    will create two groups in order 1,2, then
    will filter out fist items in both groups.