Tuesday 20 September 2011

Hitting the filesystem; mock it out, or abstract as a Store?

In legacy code it's entirely possible that code directly touches the filesystem, and there's no seam at which to modify behaviour. Assuming you want to get the code under test then you need to decide whether you’re going to abstract out the filesystem, or alternatively introduce an abstraction to isolate filesystem operations. Typically I'd look to isolate the filesystem at high level with some sort of Store that abstracts access to underlying persistence tech.

Over a long period either approach might lead to code with both solutions present.

Chris Oldwood and I have been discussing whether to start by introducing a Store, or with abstracting out the filesystem. To me the decision should be based on limiting the number of changes required to meet your goal; and limiting the number of goals.

Introducing a Store meets three goals: one, to abstract access to the technology in which you persist entities; two, to simplify the existing code; and three, to give you a seam along which to inject new behaviour.

On the other hand abstracting out the filesystem really only provides a seam.

It’s a fine point of difference. In principle both introduce a layer of abstraction under which you could swap new technologies. However, to use a Store you change the semantics of the interface, which involves changes in the existing code. If you abstract access to the filesystem then the existing code can remain unchanged, and filesystem semantics don’t map well to other technologies.

So using a Store involves more changes to existing code, and is also meeting more goals than you might immediately need – do you need to abstract the technology used at this point in time?

Take this code for example:

using System.IO;

...

public ITrade DecorateTrade(string rootfolder, string decorationType, ITrade trade)
{
    ITrade decoratedTrade;
    var file = String.Format("{0}.Decoration.dat", decorationType);
    file = Path.Combine(rootFolder, file);

    if(File.Exists(file))
    {
        decoratedTrade = Trade.LoadFromFile(file, trade);
    }
    
    return decoratedTrade
}

Here DecorateTrade knows a lot about the filesystem – how to create a Path, what a file is and how to check that a file exists, and that it needs a filename to load from. We want to test this code but are wary because it touches the filesystem.

We can refactor by introducing a Store that abstracts access to the underlying entities. An example would be:

public class DecorationFileStore : IDecorationStore
{
    public DecorationStore(string rootFolder)
    {
        _rootFolder = rootFolder;
    }

    // Interface method
    public ITrade Decorate(string decoration, ITrade trade) { ... }

    private bool DecorationExists(string decorationType) { ... }

    private string _rootFolder;
}

This tends to force some changes on the existing code, however, which becomes:

//using System.IO; // no longer needed

...

public ITrade DecorateTrade(string rootfolder, string decorationType, ITrade trade)
{
    var store = new DecorationFileStore(rootFolder);

    return store.Decorate(decorationType, trade);
}

As you can see the original code must change to meet the new semantics of the Store interface. It's not bad -- it's just that as well as creating a new class, you've changed the existing code. The new code is actually much simpler; in some sense we’ve swapped local complexity for coarser type complexity. It could be simpler still, as the Store would become a member of the class, and wouldn't need to be instantiated as a local instance. Still, if your intent is to characterize the behaviour of the class you’ll want to make as few changes as possible.

Swapping local complexity for type system complexity like this might also lead to a proliferation of Store types, each operating as an abstraction of the filesystem, which may be unnecessarily complicated.

Alternatively then we could abstract the filesystem operations. Take these two new classes:

namespace IO.Abstractions
{
    public class File : IFile
    {
        public bool Exists(string path)
        {
            return System.IO.File.Exists(path);
        }    
    }

    public class Path
    {
        public string Combine(string first, string second)
        {
            return System.IO.Path.Combine(first, second);
        }
    }
}

These are intended to be direct passthroughs to the filesystem. By decorating the original class with new properties,

public IFile File { get; set; }

public Path Path { get; set; }

and by replacing references to System.IO with IO.Abstractions, we introduce a seam in the existing code without having to make any changes in that code. Instead of calling the static methods on System.IO classes, we call instance methods on IO.Abstractions classes through local properties. This means that we can now write tests for the system without changing the existing code.

// using System.IO; // no longer needed
using IO.Abstractions;

...

public IFile File { get; set; }

public Path Path { get; set; }

public ITrade DecorateTrade(string rootfolder, string decorationType, ITrade trade)
{
    ITrade decoratedTrade;
    var file = String.Format("{0}.Decoration.dat", decorationType);
    file = Path.Combine(rootFolder, file);

    if(File.Exists(file))
    {
        decoratedTrade = Trade.LoadFromFile(file, trade);
    }
    
    return decoratedTrade
}

That is, almost -- the story's not quite finished. We still have a call on Trade.LoadFromFile(...) which hits the filesystem -- and we don't have any seam within Trade to use to isolate the filesystem ops. In theory Trade probably shouldn't know about the filesystem at all, but typically we want to minimize the number of changes needed to get under test.

In this case I tend to heal around this dependency. There's a number of ways to do it -- extracting the access as a public method and inheriting to create a test proxy class works, if your class isn't sealed or static. If it is sealed then you can introduce a seam by wrapping the access in another type, or by publishing a delegate that the existing code will call; either gives you a seam at which point you can override behaviour.

No comments: