What's the best way to mark GPS drift: stationary or bogus?

I sometimes stay at locations experiencing some GPS drift. By that I mean when I am totally stationary, the phone thinks I am moving to another building nearby, and then soon afterwards moving back. In the Arc app sometimes that displays as a car or walking segment to some place and back.

What’s the best way to clean this up? I see two approaches: one is to simply mark the segments as stationary, another is to mark the location as bogus. What’s the difference between them? If I keep marking segments as stationary, would that eventually expand the place radius (the orange circle) and make it bigger? If I use the bogus feature would it not affect that orange circle?

What if I use the “cleanup” feature, which says it will convert all samples within the place radius to stationary? Does it use the orange circle to determine whether to mark samples as stationary or bogus, without expanding the orange circle itself?


Excellent questions!

The basic rule of thumb I use is: if it’s less than about 100 metres from the real location, mark it stationary; if it’s more than about 100 metres away, mark it bogus.

Yep, exactly. If the drifting data is too far away and you mark it as stationary, it will firstly expand that Visit’s radius out to a possibly absurd size, so that the Visit incorrectly consumes some trip details on either side. Secondly, it will end up expanding the Place’s orange radius, which can mess up auto place assignments on subsequent visits.

Hence my vague 100 metres rule. (It’s a rule I tend to break sometimes when the mood takes me. If I feel like a specific Visit’s radius could be / should be cleaner and smaller, I’ll edge out even more samples as bogus, to clean it up further.)
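As a rough illustration of the 100 metres rule of thumb, here is a minimal Python sketch. It is not Arc's code: the function names, the coordinate tuples, and the hard 100 m cutoff are all assumptions made for the example, and the rule as described above is deliberately fuzzy.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
DRIFT_THRESHOLD_M = 100.0  # the "about 100 metres" rule of thumb (assumed hard cutoff)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def suggest_label(real_location, drifted_sample, threshold_m=DRIFT_THRESHOLD_M):
    """Hypothetical helper: suggest how to mark a drifted sample.

    Both arguments are (lat, lon) tuples. Samples close to the real
    location are suggested as "stationary", far ones as "bogus".
    """
    distance = haversine_m(*real_location, *drifted_sample)
    return "stationary" if distance < threshold_m else "bogus"
```

For example, a sample roughly 55 metres from the real location would come back as "stationary", while one a couple of hundred metres away would come back as "bogus".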

Yep, exactly. Samples marked as bogus are excluded from any calculation involving location data. That means Visit radius/centre calculations, Place radius/centre calculations, and also Trip item durations and distances. (So for example if a trip item had a patch of very messed up data along the way, you can mark it as bogus and the distance/duration calculations will skip over those samples, treating them as not existing).
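To make the "treating them as not existing" idea concrete, here is a small Python sketch of a trip distance/duration calculation that skips bogus samples. The `Sample` class, its fields, and `trip_stats` are hypothetical names invented for this example; Arc's real calculations may differ in detail.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass
class Sample:
    """Hypothetical location sample; not Arc's actual data model."""
    lat: float
    lon: float
    timestamp: float  # seconds
    bogus: bool = False


def haversine_m(a: Sample, b: Sample) -> float:
    """Great-circle distance in metres between two samples."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def trip_stats(samples):
    """Return (distance_m, duration_s), ignoring samples marked bogus."""
    kept = [s for s in samples if not s.bogus]
    if len(kept) < 2:
        return 0.0, 0.0
    # Distance is summed between consecutive *kept* samples, so a bogus
    # spike in the middle is simply bridged over.
    distance = sum(haversine_m(a, b) for a, b in zip(kept, kept[1:]))
    duration = kept[-1].timestamp - kept[0].timestamp
    return distance, duration
```

With a single wild sample in the middle of a short trip, marking that sample bogus drops the distance from a couple of kilometres back down to the roughly 111 metres between the two good samples.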

Yep. So in theory if you only ever used the “clean up” feature, never doing any manual stationary/bogus cleanup, the Place radius would bias smaller over time. Although in practice it’s not enough of an effect to be a concern.

Aside: The “clean up” feature doesn’t quite do what it says it does. It uses the orange circle plus 2 standard deviations. Place and Visit radiuses are calculated as average sample distance from weighted centre, but then also stored alongside the standard deviation, to allow for different radiuses for different purposes. Like, some overlap tests work best with mean+1SD, some with mean+2SD, etc.

Place orange circles are actually mean+1SD, and “clean up” worked out best in testing with mean+3SD. Oh and Visit circles are mean+2SD. The different radiuses look intuitively more “correct” in the UI 🤷🏻‍♂️
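The mean-plus-SD idea can be sketched in a few lines of Python. This is only an illustration under several assumptions: points are in a flat local grid measured in metres, the centre weighting is a plain optional weight per sample, the population standard deviation is used, and every name here (`Radius`, `compute_radius`, `within_cleanup_range`) is made up for the example.

```python
import math
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class Radius:
    """Mean sample distance from the centre, stored with its standard deviation."""
    mean: float
    sd: float

    def with_sd(self, k: float) -> float:
        """Radius of mean + k standard deviations."""
        return self.mean + k * self.sd


def compute_radius(points, weights=None):
    """Return (centre, Radius) for (x, y) points in metres.

    Weights only affect the centre here (an assumption for this sketch).
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    cy = sum(w * y for w, (_, y) in zip(weights, points)) / total
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    return (cx, cy), Radius(fmean(dists), pstdev(dists))


def within_cleanup_range(point, centre, place_radius: Radius) -> bool:
    """True if a point falls inside the Place's mean + 3SD radius."""
    distance = math.hypot(point[0] - centre[0], point[1] - centre[1])
    return distance <= place_radius.with_sd(3)
```

With samples at distances of 10, 10, 20 and 20 metres from the centre, the mean is 15 and the SD is 5, so the Place circle (mean+1SD) would be 20 metres, a Visit circle (mean+2SD) 25 metres, and the "clean up" range (mean+3SD) 30 metres.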

In the next Arc update there’ll also be an option to mark entire items as bogus (with some caveats), instead of having to go into the individual segments of an item and mark each segment bogus. This makes it faster to clean up overnight drifts that broke out of the Visit’s gravity well and ended up becoming separate drifting timeline items.

Wow thanks for the answer!


Quick follow-up: when I see bogus in the main timeline (see attached photo), does it mean there are segments within it that are not marked bogus? So I assume once the next version of Arc comes out I can try marking it as bogus again and it will then disappear?

Yep, exactly. Well there’s one more detail. The classifier’s auto detection of bogus isn’t good enough to be trustworthy (bogus location data is just really hard to auto detect), so I’m treating auto detected and manually confirmed bogus samples differently.

The exact details are scattered in different places in the code, so I can’t find or remember all of them right now, but I think roughly it’s this: if a sample is only auto bogus, it will still be excluded from distances, durations, radiuses, etc, the same as manual bogus. But for an entire item to be considered bogus, and thus be merged out of existence by the timeline processing, it has to consist of 100% confirmed bogus samples.

Otherwise the classifier could potentially incorrectly mark samples as bogus, then the timeline processing engine could incorrectly merge away entire items, without us seeing the mistake unless we happened to go into the Individual Segments view of the consuming item. Correct data could just disappear, in an unintuitive and unexplained way.
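A tiny Python sketch of that distinction, going only by the rough description above. The `BogusState` enum, the `Sample` class, and both function names are hypothetical, and the real logic is described as being scattered across the codebase, so treat this as a mental model rather than the actual rules.

```python
from dataclasses import dataclass
from enum import Enum


class BogusState(Enum):
    NONE = "none"            # normal sample
    AUTO = "auto"            # classifier guessed bogus, not confirmed
    CONFIRMED = "confirmed"  # manually marked bogus by the user


@dataclass
class Sample:
    """Hypothetical sample carrying only its bogus state."""
    bogus: BogusState = BogusState.NONE


def usable_for_calculations(sample: Sample) -> bool:
    """Auto and confirmed bogus are both excluded from distances, radiuses, etc."""
    return sample.bogus is BogusState.NONE


def item_is_bogus(samples) -> bool:
    """An item counts as bogus only if every sample is confirmed bogus.

    An empty item is treated as not bogus (an assumption for this sketch).
    """
    return bool(samples) and all(s.bogus is BogusState.CONFIRMED for s in samples)
```

So an item with even one auto-only bogus sample would stay visible in the timeline until that sample is manually confirmed, which matches the safeguard described above.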

You can spot the distinction between auto and manual bogus in the Individual Segments view - manually confirmed bogus segments will be coloured red, while auto bogus segments will be the same black text as the other segment rows.

I’m really not happy with the bogus system in general. It achieves its goal as a way to manually mark data as nonsense and remove it from consideration, but it fails to reach any reasonable level of accuracy for the purpose of automatically detecting such nonsense data and cleaning it up without our help. It continues to be an unsolved problem, and probably the one I spend the most time thinking about!

Wow, thanks for the detailed answer. I understand it’s a difficult problem to solve, but at least I now have a much better mental model of what’s going on. Thanks!
