Wednesday, 1 December 2010

GPS route correlations

I’ve been running for a while – mostly toward a better 10km this year – and have been playing with a few GPS apps while doing it (Buddy Runner for Android was good, but now I have a Garmin 405 running watch which is working out nicely).

However – one thing the software I’ve used hasn’t seemed to do is correlate routes that I run regularly. What I’d like to do is compare progress on the same route; one way of doing this would be to tag each route as I stored it. Another way would be for the software to recognize that it was likely to be the same route and automatically group it with its peers. Bonus points for asking me whether it really was the same route an suggesting a link, like Android does with your contact list.

A usable algorithm to correlate these routes isn’t hard, if we don’t worry about being too accurate: the gps route data is XML, and we can build a repository of route data quite easily, especially as some helpful types have already trodden that path. After that, finding matching routes, for me, goes as:

  1. Compare new route with each existing route.
  2. For each point on the new route, determine the closest point in the existing route being compared.
  3. Sum the closest distances for each point in the new route and the existing route.*
  4. If the total distance difference is less than some fraction of the total existing route length, and the total distance of each is the same within some tolerance, suggest it as a matching route. Or use to build a sorted list of matching existing routes.

* The final measure calculated depends on the number of points in the route; we can simplify things by assuming that for any given route there’ll be a similar number of points, which may be reasonable if you use the same piece of equipment every time you run. Alternatively we could average over deltas for each original route point.

So, first of all we need to parse and load the GPX XML. I worked up a static class that can load GPX and create a list of waypoints. I’m exporting the GPX from SportTracks, which means that for each run I have a single track, with a single track segment, with a set of track points. This means the code is a little simpler, at the expense of portability:

        public static Track Load(string gpxFilePath)
            Track track = new Track();

            XNamespace gpx = XNamespace.Get("");

            var doc = XDocument.Load(gpxFilePath);

            var tracks = from trk in doc.Descendants(gpx + "trk")
                         select new
                             Name = (string)trk.Element(gpx + "name"),
                             Points = (
                                  from trackpoint in trk.Descendants(gpx + "trkpt")
                                  select new
                                      Latitude = (string)trackpoint.Attribute("lat").Value,
                                      Longitude = (string)trackpoint.Attribute("lon").Value,
                                      Elevation = (string)trackpoint.Element(gpx + "ele").Value,
                                      Time = (string)trackpoint.Element(gpx + "time").Value

            string name = (string)tracks.First().Name;
            foreach (var point in tracks.First().Points)
                track.Add(new Waypoint
                    Longitude = Convert.ToDouble(point.Longitude),
                    Latitude = Convert.ToDouble(point.Latitude),
                    Elevation = Convert.ToDouble(point.Elevation),
                    Time = DateTime.Parse(point.Time)

            return track;

So from this we can build a list of routes. Next we need to implement the algorithm above … however, we have a slight problem in that we have locations in Longitude and Latitude, and we need a way of working out the distance between them points. We can do that by working out the great circle distance between the two using a quick calculation – like this with the Haversine formula:

        /// Calculates the great circle distance between two waypoints in km using the
        /// Haversine formula (e.g.
        /// First waypoint
        /// Second waypoint
        /// Distance in km
        private double GreatCircleDistance(Waypoint first, Waypoint second)
            double lon1 = toRadians(first.Longitude);
            double lat1 = toRadians(first.Latitude);
            double lon2 = toRadians(second.Longitude);
            double lat2 = toRadians(second.Latitude);

            double R = 6378.1;
            double dlon = lon2 - lon1;
            double dlat = lat2 - lat1;
            double a = Math.Pow(Math.Sin(dlat / 2.0), 2.0)
                + Math.Cos(lat1) * Math.Cos(lat2) * Math.Pow(Math.Sin(dlon / 2.0), 2.0);
            double c = 2.0 * Math.Asin(Math.Min(1, Math.Sqrt(a)));
            return R * c;

        private double toRadians(double degrees)
            return (degrees * Math.PI / 180.0);

Now we can implement a method for Track that calculates the total track distance:

        /// Returns total distance in km.
        public double TotalDistance()
            double distance = 0.0;
            for (int i = 1; i < _points.Count; i++)
                distance += GreatCircleDistance(_points[i - 1], _points[i]);
            return distance;

And we can finally implement the main part of the algorithm above, which calculates the total minimum distance between each point on one route with each point on the other. I set up a list of tracks, and treat the first one as the new track:

            foreach (var track in tracks)
                double totalDiff = 0.0;
                foreach (var newPoint in tracks[0])
                    double min = Double.MaxValue;
                    foreach (var existingPoint in track)
                        double diff = Track.GreatCircleDistance(newPoint, existingPoint); 
                        if (diff < min)
                            min = diff;
                    totalDiff += min;

The first thing I spot is that the total differences are a lot larger for similar routes than I assumed they’d be. For example, I tested with 3 routes – two over the same ~4 mile loop, and one ~13 mile out and back. The total diff between the first and each came back as:

Route 1 distance, km: 6.10723439167184
Route 2 distance, km: 6.42205099887688
Route 3 distance, km: 21.1510356703256
Total diff between route 1 and route 1: 0
Total diff between route 1 and route 2: 2.78080312304104
Total diff between route 1 and route 3: 32150.8929484652

Clearly the difference between route 1 and route 1 should be zero, but I didn’t expect the distance between two “identical” routes – 1 and 2 – to be of similar order to the total route distance. However, it just means that the threshold between “similar” and “different” needs adjustment – or that we just provide a list of possible matches in metric-sorted order.

Next step – take a look at performance, as there’s almost certainly some improvements to be made. After that, maybe create a module for something like SportTracks that hooks when you enter a new route, compares with existing routes and prompts you to link the new one with an existing group, if you like.