I’m aware that this is my first post here in an incredibly long time. There’s been a lot of stuff I have wanted to blog about, but also a lot of other stuff I’ve been busy with, so apologies for that. This is a blog that I have been planning for a long time and I think is super important. But to the surprise of no-one, it turns out to be way more work than I anticipated. For that reason, it will be broken up into parts (else I’ll never get around to releasing it). I’ll keep updating this page with the new data as it becomes available.
I think by now, everyone knows how much I love working with Location data, especially on iPhones. I have done a few blog posts before and several presentations about this very topic. I am still continually asked questions about how reliable the data is and specific questions about visits and frequent locations. This blog post aims to answer all these questions. One at a time.
The Setup
I wanted to provide real world data to show exactly how good, or how bad, location can be. So to do this, I gathered a few of my test phones, each different hardware, different OS, different settings and different apps, and sat them all on the passenger seat of my car.
All devices were locked. Kinda.
I did have one device, effectively taped to my center console, recording my journey for comparison reasons. This device was obviously unlocked.
I took a drive for about an hour, and this journey was planned to encompass rural, semi-rural and urban areas.
Once I was back home, I extracted all devices and processed all the different location sources. Some had virtually no data, but some were packed with the best data you could hope for. The locations were mapped out and a video was created comparing the mapped data against the video that had been recorded from inside the car. This makes it possible to compare not only the accuracy of the location, but the speed too.
This is the bit that is taking time.
So, starting today, I will post a new device whenever I get the chance.
How I interpret Location Data
One thing to bear in mind when talking about all these devices though, is something that I find myself saying a lot.
The coordinates found on the device ARE NOT THE LOCATION OF THE DEVICE. The coordinates are ONLY the centre of the circle that can be drawn using the accuracy information as its radius.
This is a minor difference when talking about an accuracy like 5m. In these circumstances, then for all intents and purposes, the coordinates are the location of the device.
But when the accuracy can be anywhere between 5m and 25km, then suddenly it becomes clear that the coordinates cannot necessarily be trusted as the device location.
To demonstrate this better, let’s look at this data from Cellebrite’s 2021 CTF dataset “Beth”. I’m focusing on the 2 records highlighted below that were recorded 2 seconds apart:
There is over 2km (1.28 miles) between these locations; an impossible distance to cover in just 2 seconds, and so it’s easy to say that one of them “must be wrong”. But which one? And if one of them is wrong, how can we trust either? This is where the Horizontal Accuracy information becomes super important.
One of these records has an accuracy of 3000m, one has an accuracy of 65m. If we draw that information on the map too and remember that the device can be anywhere within the radius, we see that actually, both records are correct.
The difference is that one record is correct with low accuracy and the other is correct with high accuracy.
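To put some numbers on that idea, here is a minimal sketch of the two checks involved: how fast the device would have had to move if the coordinates were taken literally, and whether a single device position could satisfy both accuracy circles. The coordinates are made up (two points roughly 2km apart), not the values from the "Beth" dataset, and the record layout and function names are my own for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def circles_overlap(rec_a, rec_b):
    """True if one device position could sit inside both accuracy circles."""
    d = haversine_m(rec_a["lat"], rec_a["lon"], rec_b["lat"], rec_b["lon"])
    return d <= rec_a["accuracy_m"] + rec_b["accuracy_m"]


# Hypothetical records about 2 km apart, recorded 2 seconds apart.
low_accuracy = {"lat": 51.5000, "lon": -0.1000, "accuracy_m": 3000}
high_accuracy = {"lat": 51.5180, "lon": -0.1000, "accuracy_m": 65}

distance = haversine_m(low_accuracy["lat"], low_accuracy["lon"],
                       high_accuracy["lat"], high_accuracy["lon"])

# Speed needed if both coordinates were literal device positions.
implied_speed = distance / 2  # metres per second

# With the accuracy circles drawn, the two records do not contradict each other.
compatible = circles_overlap(low_accuracy, high_accuracy)

# The same two coordinates would contradict each other if both claimed 65 m.
both_precise = circles_overlap({**low_accuracy, "accuracy_m": 65}, high_accuracy)

print(f"{distance:.0f} m apart, {implied_speed:.0f} m/s implied, compatible={compatible}")
```

Treating the coordinates as points gives an absurd speed, while treating them as circles shows there is no conflict at all. Overlapping circles only mean the records are compatible; it is still the smaller circle that tells you where the device most likely was.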
In most cases, I personally ignore any location data with an accuracy of more than 200m. Sometimes it may be useful to know, but most of the time I just find it creates noise as I can show next.
This is all location data over a 24hr period, shown without accuracy information. Each gold dot is a coordinate found in the ZRTCLLOCATIONMO table.
How can you possibly know what is accurate and what isn’t? Where did the device actually go?
Now here it is, with Horizontal Accuracy information overlaid. Now we can start to see how inaccurate some of this data actually is.
And finally, here is the data with all records with an accuracy of more than 200m removed.
How much more useful is that? It gives a much clearer picture of where the device was and makes it easier to evidence.
The other data wasn’t wrong, it just wasn’t especially accurate.
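If you want to apply the same filter yourself, something along these lines should work as a starting point. Treat it as a sketch under assumptions: the column names (ZTIMESTAMP, ZLATITUDE, ZLONGITUDE, ZHORIZONTALACCURACY) and the timestamp format (seconds since 1 January 2001 UTC) are what I would expect in this table, but verify them against your own extraction. The demo runs against a small in-memory table with made-up rows rather than a real cache.sqlite.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are stored as seconds since 2001-01-01 00:00:00 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
MAX_ACCURACY_M = 200  # my personal cut-off; adjust to suit the case

# Assumption: these column names match the table in your extraction.
QUERY = """
SELECT ZTIMESTAMP, ZLATITUDE, ZLONGITUDE, ZHORIZONTALACCURACY
FROM ZRTCLLOCATIONMO
WHERE ZHORIZONTALACCURACY <= ?
ORDER BY ZTIMESTAMP
"""


def load_accurate_records(conn, max_accuracy_m=MAX_ACCURACY_M):
    """Return records whose accuracy radius is at or below the cut-off."""
    rows = conn.execute(QUERY, (max_accuracy_m,)).fetchall()
    return [
        {
            "time": APPLE_EPOCH + timedelta(seconds=ts),
            "lat": lat,
            "lon": lon,
            "accuracy_m": acc,
        }
        for ts, lat, lon, acc in rows
    ]


# Demo data: an in-memory stand-in for the real table, with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ZRTCLLOCATIONMO ("
    "ZTIMESTAMP REAL, ZLATITUDE REAL, ZLONGITUDE REAL, ZHORIZONTALACCURACY REAL)"
)
conn.executemany(
    "INSERT INTO ZRTCLLOCATIONMO VALUES (?, ?, ?, ?)",
    [
        (630000000.0, 51.5000, -0.1000, 5.0),
        (630000001.0, 51.5001, -0.1000, 65.0),
        (630000002.0, 51.5200, -0.1300, 3000.0),
    ],
)

kept = load_accurate_records(conn)
for record in kept:
    print(record["time"].isoformat(), record["lat"], record["lon"], record["accuracy_m"])
```

Filtering in the query rather than afterwards keeps the low accuracy records in the database untouched, so they are still there if you later decide they are useful.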
Of course, there are still times that the accuracy information is a little overconfident, reporting an accuracy of 20m when in reality, it should be more like 40m for example.
And there are times when the location is wildly inaccurate. Typically, this is a throwback to an earlier record and I expect to cover this in detail when it happens.
My rule of thumb is to look for patterns of records that make sense and not trust singular records.
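That rule of thumb can be roughed out in code. The sketch below flags a record only when it cannot be reached from either neighbour at a plausible speed, while the neighbours are perfectly consistent with each other. Everything here is illustrative: the record layout, the 70 m/s speed limit and the sample track are my own assumptions, not values taken from any device.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def min_speed_mps(a, b):
    """Slowest speed that could explain both records, allowing for accuracy."""
    d = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    # The device may be anywhere inside each circle, so only the gap
    # between the two circles has to be travelled.
    gap = max(0.0, d - a["accuracy_m"] - b["accuracy_m"])
    dt = abs(b["t"] - a["t"])
    if dt == 0:
        return 0.0 if gap == 0 else math.inf
    return gap / dt


def flag_isolated_jumps(records, max_speed_mps=70.0):
    """Indexes of single records that break an otherwise consistent pattern."""
    flagged = []
    for i in range(1, len(records) - 1):
        prev, cur, nxt = records[i - 1], records[i], records[i + 1]
        if (
            min_speed_mps(prev, cur) > max_speed_mps
            and min_speed_mps(cur, nxt) > max_speed_mps
            and min_speed_mps(prev, nxt) <= max_speed_mps
        ):
            flagged.append(i)
    return flagged


# Hypothetical track heading south, one record per second, with one stray record.
track = [
    {"t": 0, "lat": 51.5000, "lon": -0.1, "accuracy_m": 5},
    {"t": 1, "lat": 51.4999, "lon": -0.1, "accuracy_m": 5},
    {"t": 2, "lat": 51.4950, "lon": -0.1, "accuracy_m": 5},  # sudden jump south
    {"t": 3, "lat": 51.4997, "lon": -0.1, "accuracy_m": 5},
    {"t": 4, "lat": 51.4996, "lon": -0.1, "accuracy_m": 5},
]

flagged = flag_isolated_jumps(track)
print(flagged)
```

A check like this only tells you which records deserve a closer look. It is not a substitute for actually looking at the pattern on a map.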
All of the videos that are (or will be) included follow the same layout, which is based on a flipbook video created by my tool ArtEx. The biggest difference is that I have replaced the zoomed out map with the video recorded from inside my car.
So the left side of the screen will show the video. This includes the view from the windshield, the time and a close-up on the speedo.
The right side of the screen is a map showing the location information from the device. Typically, the location will be shown as a blue marker with a light blue circle. The marker is the coordinates and the circle is the accuracy radius.
In cases where the location information is accurate (or close enough), that icon is all that is shown.
In cases where the location information is not very accurate, I have added an additional marker in RED to show the actual location at the time.
In cases where the actual location is not visible on the map, there will be a note indicating such. Additionally, there may be other notes where appropriate to indicate a change of road layout for example. This may mean the location information is actually correct, even though it looks like I am off-roading. It’s simply that the map hasn’t yet been updated.
Other information includes the speed, heading and accuracy as per the record.
The first device I looked at was an iPhone 8 running iOS 14.3.
Analysis
Between the times in the video, there were 3739 records made in the ZRTCLLOCATIONMO table of cache.sqlite; almost one every second, which was interesting considering the device was locked. The accuracy values recorded showed an overwhelming level of “accurate” records, with 5 or 10 metres being by far the most popular.
Less than or equal to 10m accuracy
Between 10m and 25m
Between 25m and 65m
Between 65m and 200m
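For anyone wanting to produce the same kind of breakdown from their own extraction, here is a minimal sketch of the bucketing. The sample accuracy values are invented, and treating each upper bound as inclusive is my assumption about where the boundaries fall.

```python
from collections import Counter

# Upper bound (inclusive, by assumption) and label for each bucket.
BUCKETS = [
    (10, "<= 10m"),
    (25, "10m to 25m"),
    (65, "25m to 65m"),
    (200, "65m to 200m"),
]


def bucket_for(accuracy_m):
    """Return the label of the first bucket the accuracy value fits in."""
    for upper, label in BUCKETS:
        if accuracy_m <= upper:
            return label
    return "> 200m"


def count_by_bucket(accuracies):
    """Count how many accuracy values fall in each bucket."""
    return Counter(bucket_for(a) for a in accuracies)


# Invented sample values, just to exercise every bucket.
sample = [5, 5, 10, 10.5, 25, 35, 65, 165, 3000]
counts = count_by_bucket(sample)

for _, label in BUCKETS:
    print(f"{label}: {counts[label]}")
print(f"> 200m: {counts['> 200m']}")
```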
To take a look at the overview of the journey, it really couldn’t get much better than this.
Thousands of gold dots can be seen, which are locations with a really small accuracy radius. They are so tightly packed that they appear as a solid line showing the path taken. I have highlighted the handful of lower accuracy records, but even then, most are correct. A relatively small number of records are “wrong”, but even these appear to fall into the “over-confident” bucket. For example, quite early on in the video, at 13:04:11, there is a record that slightly breaks the pattern of travel.
We see numerous records all with 5m accuracy as the device travels south. Then suddenly, the device jumps further south for a single record before going back and continuing where it left off a second ago.
This map shows the actual location of the device vs the location recorded. We see the accuracy on this record degrades from 5m to 35m, but the record is still technically wrong. If the accuracy had been 60m (the larger blue circle), it would have been correct. This is what I mean when I say it is over-confident.
Another example occurs at 13:04:56.
The red arrow was the actual location at the time, so the record is “wrong”. If the accuracy had been 400m instead of 165m, it would be correct. 165m to 400m is a fairly substantial distance and I don’t want to downplay that. But it’s still not completely wrong.
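The "over-confident" test in both examples boils down to one comparison: is the distance between the recorded coordinates and the true position bigger than the reported accuracy? Here is a minimal sketch of that check. The coordinates are hypothetical (about 55m apart) and the function and field names are mine, so it mirrors the shape of the 35m example rather than reproducing it.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def assess(recorded, actual):
    """Compare a recorded fix against a known true position."""
    error_m = haversine_m(recorded["lat"], recorded["lon"], actual["lat"], actual["lon"])
    return {
        "error_m": error_m,
        # Correct if the true position is inside the accuracy circle.
        "within_radius": error_m <= recorded["accuracy_m"],
        # How much bigger the radius needed to be for the record to be correct.
        "shortfall_m": max(0.0, error_m - recorded["accuracy_m"]),
    }


# Hypothetical record claiming 35 m accuracy, with the true position ~55 m away.
recorded = {"lat": 51.50000, "lon": -0.1, "accuracy_m": 35}
actual = {"lat": 51.50050, "lon": -0.1}

result = assess(recorded, actual)
print(result)

# The same coordinates with a 60 m radius would have been correct.
relaxed = assess({**recorded, "accuracy_m": 60}, actual)
print(relaxed["within_radius"])
```

The shortfall is a handy way to describe these records: a small shortfall is over-confidence, a huge one is a record that needs explaining.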
It’s easy in the cases above to see that these are erroneous records. The pattern alone, where the record jumps around, makes it stand out. But if this was the only record you had, could you trust it? Maybe? Maybe one record is enough to believe that the device was in that approximate location, but I would still hope to verify using some other form of evidence too. The more records there are, the more trustworthy the location is.
I went over all the records carefully and highlighted the ones that are incorrect/over-confident. Note that I ignore the records here which are low accuracy but correct and focus only on the ones that are incorrect.
16 records. 16 records out of 3739 (about 0.43%) are technically wrong. But none of these are too far away.
Cache_encryptedB
I’ve previously addressed how I don’t like this source for identifying locations, but here is a good example of why. I’ll just show an overview here.
Again we can see the gold line showing the path taken according to ZRTCLLOCATIONMO.
I’ve now added the red dots to show wifi networks that exist in cache_encryptedB. It doesn’t look bad, does it? Despite missing a huge part of the journey, the records that are there are at least locations I visited.
My problem, however, is that all 803 of those records have the same timestamp (within 2 minutes) and that the timestamp is over an hour late. To summarize, these are typically locations that are close to where the device has been at some point. Sometimes the timestamp will be accurate. Sometimes the timestamp will be weeks out. And there is no way to tell the difference between the two. It’s a fairly similar story with cell towers, only worse.
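A quick way to spot this behaviour in a set of records is to look at how tightly the timestamps cluster. The sketch below is only a rough heuristic with thresholds I picked for illustration (a 120 second span and at least 50 records), and the timestamps are invented. It cannot tell you whether a timestamp is right, only that a large group of records was stamped at practically the same moment.

```python
def timestamp_span_seconds(timestamps):
    """Seconds between the earliest and latest timestamp in the set."""
    return max(timestamps) - min(timestamps)


def looks_batch_stamped(timestamps, max_span_s=120, min_records=50):
    """True if many records share (almost) one timestamp.

    Both thresholds are illustrative assumptions, not documented values.
    """
    if len(timestamps) < min_records:
        return False
    return timestamp_span_seconds(timestamps) <= max_span_s


# Invented example: 803 records all stamped inside a 100 second window.
harvested = [630003600 + (i % 100) for i in range(803)]

# Invented example: one record per second across a drive of about an hour.
drive = [630000000 + i for i in range(3739)]

harvested_flag = looks_batch_stamped(harvested)
drive_flag = looks_batch_stamped(drive)
print(harvested_flag, drive_flag)
```

If a set of records is flagged like this, I would treat the timestamp as the time the batch was written, not the time the device was at each location.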
This map shows orange dots for all the cell tower records in cache_encryptedB. Notice how there are records fairly far south-west, nowhere near where the device was.
Well, here is where I usually tie everything up. But not today.