Friday, 28 January 2011

Mocks and dependencies

On rationalizing class dependencies -- John Sonmez is exploring a number of these ideas at the moment, and is building a useful list of patterns for dealing with common scenarios.

Thursday, 27 January 2011

First, check your dependencies; avoid breaking encapsulation

Recently I was bogged down attempting to mock out half the world when testing a method, only to find that a little thought would have made most of that mocking unnecessary.

You don’t have to mock to isolate – sometimes you just need to understand your dependencies better.

This example is a case in point. I went down a couple of blind alleys because I didn’t consider the coupling between the method under test and other classes. It’s this one:

public void MethodUnderTest()
{
    UtilityClassProvider.Get<UtilityClass>().SomeUsefulMethod();
}

In this abstracted example the problem’s pretty easy to spot. I started by mocking a UtilityClassProvider and a UtilityClass, which was all rather more involved than necessary: UtilityClass was sealed and couldn’t be unsealed, Rhino Mocks complains if you attempt to mock a sealed class, and I couldn’t extract an interface and use that instead, as UtilityClassProvider.Get<T>() needs a concrete, instantiatable type to create – and so on.

But then -- UtilityClassProvider looks like it’s meant to be a builder. Refactoring it into one would standardise the code a bit, and would probably be worthwhile – even if it meant the refactoring started to affect more than the class under test.
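For illustration, a builder version might look something like this – purely a hypothetical sketch of the direction, not the production code (UtilityClassBuilder, WithName and the Name parameter are all invented here):

```csharp
// Hypothetical sketch only -- not the production UtilityClassProvider.
// A builder accumulates configuration and does the instantiation in
// one standard place.
public class UtilityClass
{
    public string Name { get; private set; }

    public UtilityClass(string name)
    {
        Name = name;
    }

    public void SomeUsefulMethod()
    {
    }
}

public class UtilityClassBuilder
{
    private string _name = "default";

    // Fluent configuration: each With* method returns the builder
    public UtilityClassBuilder WithName(string name)
    {
        _name = name;
        return this;
    }

    public UtilityClass Build()
    {
        return new UtilityClass(_name);
    }
}
```

Usage would be the usual chained form: new UtilityClassBuilder().WithName("reports").Build().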

Probably best to limit the refactoring to begin with.

I looked at it and realised that this method didn’t depend on UtilityClassProvider at all. At worst, it relies on a UtilityClass, and at best only on the effects of SomeUsefulMethod. The encapsulation-breaking of UtilityClassProvider.Get<UtilityClass>().SomeUsefulMethod() is simply ugly, and led me up the garden path.

So, in this case I extracted an IUtilityClass interface from UtilityClass, and exposed a public IUtilityClass property on the class under test. This shifts the responsibility for creating (and injecting) the IUtilityClass instance to the calling code.

public sealed class UtilityClass : IUtilityClass
{
    public void SomeUsefulMethod()
    {
    }
}

public interface IUtilityClass
{
    void SomeUsefulMethod();
}

public class ClassUnderTest
{
    public IUtilityClass UtilityClass { get; set; }

    public void MethodUnderTest()
    {
        UtilityClass.SomeUsefulMethod();
    }
}

I could have gone a step further and removed the dependency on UtilityClass, and just extracted a method that proxied SomeUsefulMethod instead – but it would have required a call to an IUtilityClass anyway, and going this far removes the more awkward problems associated with the dependency on UtilityClassProvider.
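That alternative would have looked something like this (a sketch; DoSomethingUseful is my name for the extracted proxy method):

```csharp
public interface IUtilityClass
{
    void SomeUsefulMethod();
}

public class ClassUnderTest
{
    public IUtilityClass UtilityClass { get; set; }

    public void MethodUnderTest()
    {
        DoSomethingUseful();
    }

    // Extracted proxy: still calls through to the IUtilityClass, but
    // being virtual it gives a test subclass somewhere to override --
    // the classic extract-and-override seam, no mocking framework needed
    protected virtual void DoSomethingUseful()
    {
        UtilityClass.SomeUsefulMethod();
    }
}
```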

With either solution I’ve removed direct dependencies from the method under test, meaning that the method is a) simpler and b) easier to isolate. The extracted method(s) represent logic that might be better provided by another object or structure – but that’s another step. I might resolve a UtilityClassProvider from a container like Unity, for example.

Finally, now I can test the method either with a mock IUtilityClass or an actual UtilityClass, if the class is amenable to it (in this simple example it is; in real life it wasn’t, so mocking helped). I can then inject the mocked IUtilityClass, call MethodUnderTest and voila.

[TestFixture]
public class Tests
{
    [Test]
    public void Test()
    {
        var mockUtilityClass = MockRepository
            .GenerateMock<IUtilityClass>();
        mockUtilityClass
            .Expect(o => o.SomeUsefulMethod());
        
        var cut = new ClassUnderTest();
        cut.UtilityClass = mockUtilityClass;

        cut.MethodUnderTest();

        mockUtilityClass.VerifyAllExpectations();
    }
}

So, the moral is: if you find yourself refactoring half the world just so you can then mock it – check your dependencies first, as you might not be dependent on half of what you think you are.

Mocking dependencies

Given the following UtilityClassProvider and UtilityClass classes, what are suitable ways of mocking out the call to SomeUsefulMethod() and isolating the MethodUnderTest()? Wrinkle: UtilityClass needs to stay sealed.

public class UtilityClassProvider
{
    public T Get<T>() where T : new()
    {
        return new T();
    }
}

public sealed class UtilityClass
{
    public void SomeUsefulMethod()
    {
    }
}

public class ClassUnderTest
{
    public UtilityClassProvider UtilityClassProvider { get; set; }

    public void MethodUnderTest()
    {
        UtilityClassProvider.Get<UtilityClass>().SomeUsefulMethod();
    }
}

[TestFixture]
public class Tests
{
    // Fails: Rhino Mocks can't mock a sealed class -- and Get<T>() and
    // SomeUsefulMethod() aren't virtual, so they can't be intercepted anyway
    [Test]
    public void Test()
    {
        var mockUtilityClass = MockRepository
            .GenerateMock<UtilityClass>();
        mockUtilityClass
            .Expect(o => o.SomeUsefulMethod());
        var mockUtilityClassProvider = MockRepository
            .GenerateMock<UtilityClassProvider>();
        mockUtilityClassProvider
            .Expect(o => o.Get<UtilityClass>())
            .Return(mockUtilityClass);

        var classUnderTest = new ClassUnderTest() { 
            UtilityClassProvider = mockUtilityClassProvider };
        classUnderTest.MethodUnderTest();

        mockUtilityClassProvider.VerifyAllExpectations();
        mockUtilityClass.VerifyAllExpectations();
    }
}

Unsealing UtilityClass would make mocking easier – but what are the alternatives?
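One alternative that keeps UtilityClass sealed: wrap it in an adapter that implements a mockable interface, and make ClassUnderTest depend on that interface instead. (A sketch – IUtilityService and UtilityClassAdapter are names I’ve invented for illustration.)

```csharp
public sealed class UtilityClass
{
    public void SomeUsefulMethod()
    {
    }
}

// Mockable seam: the interface can be mocked even though the class can't
public interface IUtilityService
{
    void SomeUsefulMethod();
}

// Thin adapter used in production; it just forwards to the sealed class
public class UtilityClassAdapter : IUtilityService
{
    private readonly UtilityClass _inner = new UtilityClass();

    public void SomeUsefulMethod()
    {
        _inner.SomeUsefulMethod();
    }
}

public class ClassUnderTest
{
    public IUtilityService UtilityService { get; set; }

    public void MethodUnderTest()
    {
        UtilityService.SomeUsefulMethod();
    }
}
```

Tests then mock (or hand-roll a fake of) IUtilityService and never touch the sealed class at all.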

Friday, 21 January 2011

A simple paged IList, arrays of doubles and the LOH

Turns out large (1000+ element) arrays of doubles get put on the Large Object Heap, rather than living out their days on the normal heap. It’s an optimization by Microsoft to keep the arrays aligned to 8-byte boundaries (as the LOH is collected but not compacted) – one that applies on the 32-bit CLR, where arrays wouldn’t otherwise get that alignment.

This leads to some exciting behaviour if you’re using a List<double> that grows past 1000 doubles. Especially if you’re not thinking about it and just allowing it to expand: each time capacity runs out, the List allocates a new, doubled backing array, and every one of those arrays from 1000 elements up lands on the LOH. (And it’s 1000 that Microsoft chose, incidentally, not 1024.)
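A rough way to see where things land – treat this as a probe rather than gospel, since the behaviour is runtime-specific: objects allocated on the LOH report as generation 2 straight away, and the double-array special case only applies to the 32-bit CLR.

```csharp
using System;

class LohProbe
{
    static void Main()
    {
        // Anything over ~85,000 bytes goes to the LOH on any runtime,
        // and LOH objects report as generation 2 immediately
        var big = new byte[100000];
        Console.WriteLine(GC.GetGeneration(big));       // 2

        // The double-array special case: on the 32-bit CLR this lands
        // on the LOH (generation 2); on 64-bit, where arrays are
        // already 8-byte aligned, it stays in generation 0
        var doubles = new double[1000];
        Console.WriteLine(GC.GetGeneration(doubles));
    }
}
```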

A colleague hit this recently in our production code. In our specific case the right thing to do was stop using so many large arrays of doubles, as they weren’t needed.

Alternatively, one way to avoid fragmentation and LOH usage would be to preallocate large arrays or Lists of doubles – if you know how large an array you’re going to need.
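For instance (a minimal sketch): giving List<double> its capacity up front means the backing array is allocated once, rather than re-created and copied at every capacity doubling.

```csharp
using System;
using System.Collections.Generic;

class Preallocate
{
    static void Main()
    {
        const int n = 200000;

        // Capacity fixed up front: one backing array, no grow-and-copy
        var samples = new List<double>(n);
        for (int i = 0; i < n; i++)
        {
            samples.Add(i * 0.5);
        }

        Console.WriteLine(samples.Count);     // 200000
        Console.WriteLine(samples.Capacity);  // 200000 -- never regrown
    }
}
```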

Another way would be to create a paged IList implementation – one that keeps its contents in a set of arrays sized just under the threshold to avoid LOH use.

Yes, it subverts Microsoft’s performance enhancement, and yes, I’m sure they know what they’re doing. I’m not sure what they’re doing though, so if nothing else it means I can run some comparisons, iterating through a list of doubles, for example.

Conveniently it also makes another nice workout – some code below, using a List<T[]> as the underlying page collection. I've not implemented Remove, RemoveAt and Insert methods as I didn't immediately need them for the tests I had in mind.

public class PagedList<T> : IList<T>
{
 private int _pageSize = 100;

 private int _currentPage = 0;

 private int _currentPageHighwatermark = 0;

 private List<T[]> _underlyingList;

 public PagedList(int pageSize)
 {
  _pageSize = pageSize;
  _underlyingList = new List<T[]>();
  _underlyingList.Add(new T[_pageSize]);
  _currentPage = 0;
  // NB: a constructor accepting an existing page list would need to
  // recalculate _currentPage and _currentPageHighwatermark instead
  // of assuming a single empty page
 }

 /// <returns>Index of element, -1 if not found</returns>
 public int IndexOf(T item)
 {
  // EqualityComparer handles nulls in unfilled slots and avoids boxing
  // value types; stopping at Count means the default-filled tail of
  // the last page can't produce a false match
  int index = 0;
  foreach (var page in _underlyingList)
  {
   foreach (var t in page)
   {
    if (index >= Count)
    {
     return -1;
    }
    if (EqualityComparer<T>.Default.Equals(t, item))
    {
     return index;
    }
    index++;
   }
  }
  return -1;
 }

 public void Insert(int index, T item)
 {
  throw new NotImplementedException();
 }

 public void RemoveAt(int index)
 {
  throw new NotImplementedException();
 }

 public T this[int index]
 {
  get
  {
   if (index < 0 || index >= Count)
   {
    throw new IndexOutOfRangeException("Index was out of bounds");
   }
   return _underlyingList[index / _pageSize][index % _pageSize]; // Consider what to do about nulls here
  }
  set
  {
   if (index < 0 || index >= Count)
   {
    throw new IndexOutOfRangeException("Index was out of bounds");
   }
   _underlyingList[index / _pageSize][index % _pageSize] = value;
  }
 }

 public void Add(T item)
 {
  if (_currentPageHighwatermark == _pageSize)
  {
   _underlyingList.Add(new T[_pageSize]);
   _currentPage++;
   _currentPageHighwatermark = 0;
  }
  _underlyingList[_currentPage][_currentPageHighwatermark] = item;
  _currentPageHighwatermark++;
 }

 public void Clear()
 {
  _underlyingList.Clear();
  _underlyingList.Add(new T[_pageSize]);
  _currentPage = 0;
  _currentPageHighwatermark = 0;
 }

 public bool Contains(T item)
 {
  // Delegates to IndexOf so the two agree on equality semantics
  return IndexOf(item) != -1;
 }

        public void CopyTo(T[] array, int arrayIndex)
        {
            // Copies every element into array starting at arrayIndex,
            // per the ICollection<T> contract
            if (array == null)
            {
                throw new ArgumentNullException("array");
            }
            if (arrayIndex < 0 || arrayIndex + Count > array.Length)
            {
                throw new ArgumentException("Target array too small");
            }
            for (int i = 0; i < Count; i++)
            {
                array[arrayIndex + i] = this[i];
            }
        }

 public int Count
 {
  get { return (_currentPage*_pageSize) + _currentPageHighwatermark; }
 }

 public bool IsReadOnly
 {
  get { return false; } // writable; only Remove/Insert are unimplemented
 }

 public bool Remove(T item)
 {
  throw new NotImplementedException();
 }

 public IEnumerator<T> GetEnumerator()
 {
  return new PagedListEnumerator(this);
 }

 // Reuses the outer T rather than redeclaring a type parameter, which
 // would shadow it (compiler warning CS0693)
 private class PagedListEnumerator : IEnumerator<T>
 {
  private PagedList<T> _underlyingPagedList;
  private int _cursor;

  public PagedListEnumerator(PagedList<T> list)
  {
   _underlyingPagedList = list;
   _cursor = -1;
  }

  public void Dispose()
  {
  }

  public bool MoveNext()
  {
   if (_cursor < _underlyingPagedList.Count)
    _cursor++;

   return (!(_cursor == _underlyingPagedList.Count));
  }

  public void Reset()
  {
   _cursor = -1;
  }

  public T Current
  {
   get { return _underlyingPagedList[_cursor]; }   // Handle fail when out of bounds
  }

  object IEnumerator.Current
  {
   get { return Current; }
  }
 }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
}
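Something like this standalone sketch is the comparison I had in mind – it mimics PagedList's internal layout (a List<double[]> of just-under-threshold pages) without depending on the class itself, and it's illustrative rather than a rigorous benchmark:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class PagedIterationSketch
{
    static void Main()
    {
        const int n = 1000000;
        const int pageSize = 999;   // just under the 1000-double threshold

        var flat = new List<double>();
        var pages = new List<double[]> { new double[pageSize] };
        int highwater = 0;
        for (int i = 0; i < n; i++)
        {
            flat.Add(i);
            if (highwater == pageSize)
            {
                pages.Add(new double[pageSize]);
                highwater = 0;
            }
            pages[pages.Count - 1][highwater++] = i;
        }

        var sw = Stopwatch.StartNew();
        double flatSum = 0;
        foreach (var d in flat) { flatSum += d; }
        Console.WriteLine("flat:  " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        double pagedSum = 0;   // unfilled tail slots are 0, so sums match
        foreach (var page in pages)
        {
            foreach (var d in page) { pagedSum += d; }
        }
        Console.WriteLine("paged: " + sw.ElapsedMilliseconds + " ms");

        Console.WriteLine(flatSum == pagedSum);   // True
    }
}
```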

Thursday, 20 January 2011

C# BiDirectional Lookup

This seems to happen regularly: I start using a Dictionary to group data, only to find that at some stage I need to look up from value to key again. As .NET has no bidirectional map, dictionary or lookup, handling this is always cumbersome, like trying to find out which kitchen junk drawer contains the screwdriver you need.

In my case I was building groups of financial transactions by creating a map keyed on group name. I matched each new transaction to each existing transaction to work out which was the closest match, based on some criteria, then added the new transaction to the same group as the closest match.

The problem here is that I needed to do a reverse lookup – also traversing each element of the group map to work out which group the closest match was in. It works, but it feels wrong.

One solution is to maintain both forward and reverse lookup structures – which informed the bidirectional lookup workout presented below. That felt a little better – at least I was no longer completely abusing the dictionary every time I wanted to add something.

A little thought, though, showed that if I want to work out which group a new transaction should be in, then I should compare each transaction to some group criteria (not trade criteria) and add it to the closest matching group. As there’ll be fewer groups than trades, this should be quicker – and no reverse lookup needed.

In any event: just because a bidirectional lookup might be wrong in this case, doesn’t mean it’s always so, and working a simple one up isn’t an enormous task. And it makes a nice workout for the list.

Also: this is certainly not the only way of doing a bidirectional lookup; it really does depend on what you need. For a set of {first, second} pairs, I wanted to allow a second to map to multiple firsts, and vice versa.

Bit of a mix of styles as it's the result of a workout.

public class BiDiLookup<TFirst, TSecond>
{
    private IDictionary<TFirst, List<TSecond>> Forward
    {
        get;
        set;
    }

    private IDictionary<TSecond, List<TFirst>> Backward
    {
        get;
        set;
    }

    private BiDiLookup()
    {
    }

    public BiDiLookup(
                IDictionary<TFirst, List<TSecond>> forward, 
                IDictionary<TSecond, List<TFirst>> backward)
    {
        Forward = forward;
        Backward = backward;
    }

    public void Add(TFirst first, TSecond second)
    {
        // Consider what to do when adding an already-existing pair ..
        // Nice trick here courtesy Jon Skeet to reduce the number 
        // of lookups -- see also GetByFirst etc for comparison
        List<TSecond> secondList;
        if (!Forward.TryGetValue(first, out secondList))
        {
            secondList = new List<TSecond>();
            Forward[first] = secondList;
        }
        List<TFirst> firstList;
        if (!Backward.TryGetValue(second, out firstList))
        {
            firstList = new List<TFirst>();
            Backward[second] = firstList;
        }
        secondList.Add(second);
        firstList.Add(first);
    }

    // Shared empty results -- treat these as read-only, since callers
    // could otherwise mutate them
    private static readonly List<TFirst> EmptyFirstList = new List<TFirst>();
    private static readonly List<TSecond> EmptySecondList = new List<TSecond>();

    public List<TSecond> GetByFirst(TFirst first)
    {
        // Consider what to do if first doesn't exist yet -- return an empty list? return null?
        if (Forward.ContainsKey(first))
        {
            return Forward[first];
        }
        return EmptySecondList;
    }

    public List<TFirst> GetBySecond(TSecond second)
    {
        if (Backward.ContainsKey(second))
        {
            return Backward[second];
        }
        return EmptyFirstList;
    }

    public void Remove(TFirst first, TSecond second)
    {
        // NB as written this removes every pairing that involves first or
        // second, not just the (first, second) pair -- consider whether a
        // narrower pair-only Remove is what's wanted
        List<TSecond> seconds = Forward[first];
        List<TFirst> firsts = Backward[second];
        Forward.Remove(first);
        foreach (var s in seconds)
        {
            Backward[s].Remove(first);
            if (Backward[s].Count == 0)
            {
                Backward.Remove(s);
            }
        }
        Backward.Remove(second);
        foreach (var f in firsts)
        {
            Forward[f].Remove(second);
            if (Forward[f].Count == 0)
            {
                Forward.Remove(f);
            }
        }
    }
}
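The underlying bookkeeping, in miniature – this standalone fragment does by hand what BiDiLookup's Add wraps up, and shows the reverse lookup that motivated it (the group/transaction names are invented):

```csharp
using System;
using System.Collections.Generic;

class BiDiSketch
{
    static void Main()
    {
        var forward = new Dictionary<string, List<string>>();
        var backward = new Dictionary<string, List<string>>();

        Action<string, string> add = (group, txn) =>
        {
            // every Add writes both maps, so the reverse lookup is a
            // single dictionary hit rather than a full traversal
            if (!forward.ContainsKey(group)) { forward[group] = new List<string>(); }
            if (!backward.ContainsKey(txn)) { backward[txn] = new List<string>(); }
            forward[group].Add(txn);
            backward[txn].Add(group);
        };

        add("groupA", "txn1");
        add("groupA", "txn2");
        add("groupB", "txn1");   // txn1 now maps back to two groups

        Console.WriteLine(string.Join(",", forward["groupA"]));   // txn1,txn2
        Console.WriteLine(string.Join(",", backward["txn1"]));    // groupA,groupB
    }
}
```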

IDictionary, injection and covariance (followup)

Jon Skeet suggests an alternative which I like, as it avoids polluting the generics with information about the underlying/internal data structure: inject a delegate to handle list instantiation when it’s needed:

public class Lookup<A, B>
{
    public Lookup(
            IDictionary<A, IList<B>> underlyingDict,
            Func<IList<B>> listCreator)
    {
        // client passes in a delegate that Lookup can use
        // to instantiate the desired list type ...
    }

    // ... remaining code
}
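Fleshed out, the delegate-injected version might look like this (my sketch of the suggestion, not Jon's code):

```csharp
using System;
using System.Collections.Generic;

public class Lookup<A, B>
{
    private readonly IDictionary<A, IList<B>> _underlyingDict;
    private readonly Func<IList<B>> _listCreator;

    public Lookup(
            IDictionary<A, IList<B>> underlyingDict,
            Func<IList<B>> listCreator)
    {
        _underlyingDict = underlyingDict;
        _listCreator = listCreator;
    }

    public void Add(A key, B value)
    {
        IList<B> list;
        if (!_underlyingDict.TryGetValue(key, out list))
        {
            // the client-supplied delegate decides the concrete list type
            list = _listCreator();
            _underlyingDict[key] = list;
        }
        list.Add(value);
    }

    public IList<B> Get(A key)
    {
        return _underlyingDict[key];
    }
}
```

Construction then reads naturally, with the caller keeping full control of the list implementation: new Lookup<string, string>(new Dictionary<string, IList<string>>(), () => new List<string>()).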

Monday, 17 January 2011

IDictionary, injection and covariance

I have a class named Lookup, and want to inject a Dictionary that maps a key to a list of values. Easy enough – just:

public class Lookup<T1, T2>
{
    public Lookup(IDictionary<T1, IList<T2>> underlyingDict)
    {
        // use ctor to inject a dictionary
    }

    public void Add(T1 key, T2 value)
    {
        // add pair to an underlying collection
    }
}

However, this means that I have to give Lookup control over the type of list used. I can't create the mapping I want and then inject it for two reasons:

First, because the IDictionary interface isn't covariant I can’t just create an instance of the closed type that I want – this doesn’t compile:

public class Program
{
    public static void Main(string[] args)
    {
        // Doesn't compile: IDictionary<TKey, TValue> isn't covariant
        // in TValue, so a Dictionary<string, List<string>> is not an
        // IDictionary<string, IList<string>>
        IDictionary<string, IList<string>> dict =
            new Dictionary<string, List<string>>();

        Lookup<string, string> c = new Lookup<string, string>(dict);
    }
}
Note that I could pass in an instance of Dictionary<string, IList<string>> instead – but then I give Lookup control over the implementation of IList used: when a new key is added to the _underlyingDict, it's Lookup that would have to pick and instantiate the list class.

Second -- even if I could inject as I wanted, Lookup<T1, T2> would need to instantiate a new list for each new key added -- and it doesn't have any information about the implementation of IList<T2> I used, so it wouldn't know how to create it.

So; to keep control over the implementation used I needed to provide Lookup with some information about the IList implementation I’d chosen. I did it with constrained generics:

public class Lookup<T1, T2, T3> where T2 : IList<T3>, new()
{
    private IDictionary<T1, T2> _underlyingDict;

    public Lookup(IDictionary<T1, T2> dict)
    {
        _underlyingDict = dict;
    }

    public T2 Get(T1 key)
    {
        return _underlyingDict[key];
    }

    public void Add(T1 key, T3 value)
    {
        if (!_underlyingDict.ContainsKey(key))
        {
            _underlyingDict[key] = new T2();
        }
        _underlyingDict[key].Add(value);
    }
}

Here I've individually specified the key type to use (T1), the List type (T2), and the List item or value type (T3). The constraint T2 : IList<T3> ensures that I’m getting something with the right interface, and the constraint new() is needed as Lookup is going to have to instantiate it. (Without it, the compiler helpfully coughs up: "Cannot create an instance of the variable type 'T2' because it does not have the new() constraint"). To show it in action we can add and retrieve a key-value pair:

class Program
{
    static void Main(string[] args)
    {
        Dictionary<string, List<string>> map = 
            new Dictionary<string, List<string>>();

        Lookup<string, List<string>, string> lookup = 
            new Lookup<string, List<string>, string>(map);

        lookup.Add("foo", "bar");
        Console.WriteLine(lookup.Get("foo")[0]);

        Console.ReadKey();
    }
}

It appears to work, but it makes the definition of Lookup rather ugly.