Thursday, 24 February 2011

A simple paged IList, arrays of doubles and the LOH, part 3

In these posts I looked at creating a paged IList after finding that large arrays of doubles are stored on the LOH rather than the normal heap, and gave a simple first-cut implementation. Here I thought I’d compare the add-item behaviour with a comparable List<double>.

just-adding-amortized-perf-results-1

There’s a lot of data here but there’s a couple of things I want to draw out.

Where the page size is large compared to the array size, performance is comparable to List<double>. Conversely, small page sizes are really not suited to large arrays, even though the PagedList has an advantage in that it doesn’t need to recopy all the data, as a List would.

I want is for this chart to be flat, with similar performance for all PageList total sizes. I want to knock out that big peak to the left hand side and make it perform more consistently.

One way to handle this is to follow List’s lead an allow the amount of capacity added to vary, doubling in size every time it fills, achieving additions in constant amortized time. PagedList can do the same by doubling the pageSize every time the last page is filled.

The fundamental change comes in the AddPage method, where I double the size of _pageSize after adding each new page.

private void AddPage()
{
    var t = new T[_pageSize];
    _underlyingList.Add(t);
    _currentPage++;
    _currentPageHighwatermark = 0;
    CurrentPage = t;
    _pageSize *= 2;
}

A new constructor lets the PagedList be instantiated with a default page size of 4.

private const int DefaultPageSize = 4;
private int _pageSize = DefaultPageSize;

public PagedList(int initialPageSize)
{
    _pageSize = initialPageSize;
    _underlyingList = list;
    _underlyingList.Add(new T[_pageSize]);     
    _currentPage = _underlyingList.Count - 1;  
    CurrentPage = _underlyingList[_currentPage];
}

Various consequences then ripple through the code, leading to changes in Count() and the indexer this[], which needs to map a global index to a page location and page index. I’ve left this as a pretty simple implementation for now:

private PageLocation PageLocationForIndex(int index)
{
    var l = new PageLocation();
    l.desiredPage = 0;

    int lower = 0;
    int upper = 0;
    foreach(var page in _underlyingList)
    {
        upper = lower + page.Length;
        if (index >= lower && index < upper)
        {
            break;
        }
        l.desiredPage++;
        lower = upper;
    }

    l.desiredPageIndex = lower == 0 ? index : index % lower;

    return l;
}

where PageLocation is simple struct holding ints desiredPage and desiredPageIndex.

Looking at the performance then:

just-adding-amortized-perf-results-final-debug-release

That’s significantly better. Even in debug the peak is knocked down from ~10 to ~2 times, and in a release build it’s more comparable to List<double>.

However, I’m a little concerned that the relative times are increasing with array size for the PagedList<double>, which might indicate some O(n) or worse behaviour that needs looking at.

Also, I’m no longer meeting my main requirement; the PagedList<double> will, at some stage, store a double array larger than 1000 elements. I could fix that by capping – but I’ll look at that in a later post.

Wednesday, 16 February 2011

A simple paged IList, arrays of doubles and the LOH part 2

So, first of all it might be interesting to check that large double arrays are put on the LOH, in case the internet lies. Bring on CLR Profiler to check allocations.

Tangentially: >=1000? Not >1024?

So, if I allocate a number of double arrays of size = 100 everything allocated on the normal heap (in these CLR Profiler diagrams, System.Double[] allocations are in red):small-double-arrays-objects-by-address

… and with array size = 1000 everything is allocated on the Large Object Heap.1000-double-arrays-objects-by-address

Finally, I can show that PagedList<double>s of size 1000 but a page size of 100 use space on the normal heap, not on the LOH:paged-100pagesize-1000arraysize-oba

Not sure about all those System.Double[][] – that might be something to look into; but with the allocations briefly confirmed, I want to do some very brief performance comparisons in the next post.