Friday 15 February 2013

Logging and context in distributed systems

Garry Shutler posted about logging at the end of last year; at the same time I was thinking about some structures that my friend Chris Oldwood had put in place to enrich our logging with contextual information. Gary’s intent was to demonstrate that good logging is a vital part of a distributed system; his example mechanism for relating log messages was necessarily simple, and I’d like to expand it.

“Given sufficient detail in your logs,” Garry says, “you no longer have to theorise or piece together how something happened. Your logs will lay it out for you, plain as day.

Anyone with production experience of a distributed system is likely to agree. If you can identify and tie together related log entries over thread and process boundaries then working out what actually happened becomes possible. An important first step is to have process and thread ids in every log message; the alternative is chaos.

Moreover, it’s not just about stack traces; to replicate a problem you often need to understand the path to failure, and that’s where contextual information helps.

Enriching logs with contextual information

Garry’s simplified example has him manually inserting a username into each log entry, allowing him to tag those entries caused by each user’s actions. This is a not so practical in larger-scale system; in such systems we want contextual information to be tagged automatically onto log messages, and we want to be able to arbitrarily extend that contextual information as we see fit. We can do that with a Diagnostic Context, described in Pattern Languages of Program Design Vol. 3.

Diagnostic contexts are essentially maps associated with a logging subsystem. Developers can place whatever they want into the diagnostic context, and the logging subsystem picks it up and tags log messages with it.

At coarsest grain we want a key piece of information carried from one thread to another – like a session id – that allows you to tie related messages together. At finer grain we might want to signal transition through system layers; or the creation of important contextual information like authorisation for a specific user; or the start and end of significant independent workflows.

As crossing boundaries generally implies marshalling this map in some way, especially in service-oriented architectures, then using a single piece of tie information on either side of the boundary is more efficient than simply reconstituting the entire map.

Log4J has both Nested and MappedDiagnosticContexts. Nested contexts are not map-based but instead allow you to push and pop new context strings onto and from a stack. Mapped contexts do not give you this stack-like behaviour – a push of a new value for an already-defined parameter overrides it. In both cases the context can be inherited from a parent thread but you must clone the entire current context and use the clone to create a new context for the new thread.

Log4Net has a GlobalContext which has stack semantics, and allows the attribution of values to named parameters. It similarly makes use of the Dispose pattern and using { } syntax to make management of contexts simpler.

The team I’m currently with has a home-grown Diagnostic context called Causality which bears a deal of similarity to log4net’s GlobalContext.

A simple C# Diagnostic Context

The team I’m currently with has a diagnostic context called Causality. Causality is plumbed into our logging subsystem. If you’re using an off-the-shelf logger that doesn’t support diagnostic contexts you can just decorate your logger invocations with methods that append information from Causality.

Causality is essentially a thread static map. For convenience it's structured as a set of transactions so that attributes can be updated and reverted rather than just overwritten. A single Transaction represents a change to a single Attribute-Value pair, and the Dispose pattern is hijacked to provide some syntactic sugar:

private sealed class Transaction
{
    public Transaction(Attribute attribute, Value @value)
    {
        // instantiate _attributes map if needed and add key-value pair ...
        // if this is an update of an existing Attribute then store the old
    }

    public Dispose()
    {
        // rollback the attribute if it was modified ...
        // or just remove it if it was new
    }

    [ThreadStatic]
    private readonly static Dictionary<string, string> _attributes;
}

As _attributes is private readonly ThreadStatic it’s safe to use without locking in concurrent operations.

A CompoundTransaction extends this, representing a stack of Transactions that is unwound on Dispose. The CompoundTransaction just provides a simple way of updating the context with a set of Attribute-Value pairs.

Labelling the context -- or appending/updating an Attribute -- is then the creation of a Transaction:

public static IDisposable Label(Attribute attribute, Value @value)
{
    // instantiate a transaction ...

    return transaction;
}

which would be used within a using block:

Log.Info("I'm about to do something ...");
using(Causality.Label(Category.System, "User", "timbarrass")
{
    Log.Info("I'm doing something!"); 
}

Log.Info("I've done something.");

Causality also provides a Format or ToString method that flattens the map to a string representation. Assuming that Log.Info or the logging subsystem uses this method, the result of the code above might look like:

2013-01-22 07:34 INFO 3432 1 I'm about to do something ...
2013-01-22 07:34 INFO 3432 1 I'm doing something! [User=timbarrass]
2013-01-22 07:34 INFO 3432 1 I've done something.

The Transaction holds attributes at thread static scope, which means that the attributes won’t be available when you cross a thread or other boundary. On the other side of the boundary you need to reconstitute Causality with a key piece of information that can tie the two threads together. On the first thread:

Causality.IdentifyPrimaryCausality("Request");

using(Causality.Label("User", "timbarrass"))
using(Causality.Label("Request", "31415))
{
    Attributes causality = Causality.AcquirePrimaryCausality();

    Log.Info("Creating worker thread.");

    var worker = new Worker();
    ThreadPool.Start(worker, causality);

    Log.Info("Worker started.");
}

and on the other side of the thread boundary:

public class Worker
{
    public void Start(object state)
    {
        Causality.InjectPrimaryCausality(state as Attributes);

        using(Causality.Label("Component", "Worker")
        {
            Log.Info("Starting work.");
        }
    }
}

The logs here might look like:

2013-01-22 07:34 INFO 3432 1 Creating worker thread. [User=timbarrass;Request=31415]
2013-01-22 07:34 INFO 3432 2 Starting work. [Request=31415]
2013-01-22 07:34 INFO 3432 1 Worker started, [User=timbarrass;Request=31415]