Accuracy has absolutely tanked since the latest update

This app used to be so accurate. Since the latest update with its promise of a major upgrade in accuracy, accuracy for me has actually tanked. It now identifies any walking activity as driving, and if I stop somewhere for less than five minutes, it doesn’t register as a location, and even trying to edit the walk into chunks to manually tag a location no longer works. And I follow all the developer recommendations, such as always making sure the app is actively running. I’m on an iPhone 14 Pro Max running iOS 16.1.2

(Incidentally, this is definitely an Arc-specific issue. Gyroscope, despite not even actively running, has been correctly logging the locations and activities that Arc now ignores or misidentifies.)

Hi @rolandsmash!

This was explained in the release notes for the update. The migration to the new classifier will take a day or so to adjust, during which time accuracy will be temporarily worse. Check again in another day or two, and you will see accuracy steadily improving to match and then surpass the previous accuracy.

For me, it was the opposite: after 1 day the accuracy was so much better. I didn’t need to correct anything in the last 2 days! Great work :muscle:


Unfortunately, I’ve been having accuracy problems with Arc since I first installed it. I gave it a while to adjust and learn, especially since the answer to other peoples’ reports of similar issues was that the classifier was continuing to evolve and get more accurate. There was a brief period (I think with the 3.6.0 update?) where I noticed improvements, but the recent updates have brought back my problems and there’s been no improvement or seemingly no learning happening for Arc on my device.

In my case, almost everything I do ends up being
classified simply as “Transport”. This includes walks, drives, or even when I’m in a single location but occasionally moving between rooms (as long as it isn’t home or some other remembered location). I have to go into Arc at the end of each day to manually correct almost every single thing I do, and it’s honestly a huge drag for what was billed to me as an automatic app that learns to classify your movements and locations.

Hi @davidcelis!

Could you please install Arc Mini, and have a look at the “System Debug Info” view. If you could post a screenshot of the top and the bottom of that view in this thread, that will help us to debug what’s going wrong on your phone.

The key details I’ll be looking for are “CD2 models pending update” at the top, which ideally should be only 1 or 2, and “coreMLModelUpdates” at the bottom, which should have completed within the last few days, and not be more than a few days overdue.

Oh, also it’ll be useful to make sure “Background App Refresh” in iOS Settings → General is turned on for Arc. If that’s turned off, Arc will never be able to run its daily scheduled tasks, such as updating the activity type models.

Low Power Mode will also stop the daily scheduled tasks from running.
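To illustrate, the gating described above amounts to a simple conjunction. Here’s a hypothetical Swift sketch of how an app might decide whether its daily tasks can run; the function name and parameters are invented for illustration, not Arc’s actual code:

```swift
// Hypothetical helper: scheduled maintenance (e.g. daily activity type
// model updates) can only run when iOS allows background work.
func canRunScheduledTasks(backgroundRefreshEnabled: Bool,
                          lowPowerModeEnabled: Bool) -> Bool {
    // With Background App Refresh off, iOS never wakes the app for its tasks.
    guard backgroundRefreshEnabled else { return false }
    // Low Power Mode also suspends discretionary background work.
    return !lowPowerModeEnabled
}
```

On a real device the inputs would come from system APIs such as `UIApplication.shared.backgroundRefreshStatus` and `ProcessInfo.processInfo.isLowPowerModeEnabled`.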

Yup, background refresh is definitely on and has been since I installed Arc. Here are those screenshots from Arc Mini, though!

Well, those screenshots look perfect. Which means the classifiers should be working very well. So I don’t know what’s going wrong.

Perhaps next time Arc gets the type wrong (or leaves the type as “transport”) for a trip, if you could screenshot the Confirm or Type Change view for it, showing the list of types with their scores.


Oh, also, in Arc Mini, when you’re in an area of town where Arc produces poor activity type classification results, open the “Classifier Debug Info” and screenshot that.

The top section of that view will be most useful for debugging what’s going wrong. Though the “Most Recent Sample” section might give you some insights if you watch it while traveling, to see what activity types the classifiers are choosing in real time. The correct match won’t stay at the top all the time - the raw data is noisy - but as long as the correct type is at the top of the list most of the time, the resulting Timeline Item will have the correct type assigned in the timeline view.
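The “correct type at the top most of the time” behaviour is essentially a majority vote over noisy per-sample guesses. A toy Swift sketch of the idea (not Arc’s real aggregation logic; names are invented):

```swift
// Toy sketch: each recorded sample carries the classifier's best guess.
// The timeline item takes the most frequent type, so occasional noisy
// misclassifications (e.g. "car" during a walk) get outvoted.
func dominantType(of sampleTypes: [String]) -> String? {
    var counts: [String: Int] = [:]
    for type in sampleTypes { counts[type, default: 0] += 1 }
    return counts.max { $0.value < $1.value }?.key
}
```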

I find that with moving it’s not so much the accuracy; it’s the classification of stationary that seems wrong

Like, it seems to accurately identify that I’m at a location, but then when I’m actually at the location it says I’m never stationary

Like in the first example every data point is identified as car

And then on the second, it’s identified I’m in a place, but the main thing is walking (I’m typing away in bed so definitely not walking)

So I’ve had to manually reclassify almost all place data to stationary, but it doesn’t seem to be doing much

Like, I understand you want more data within a location, but honestly I’d prefer a button that would allow me to reclassify all points within a place as stationary. The old classifier did that, but this one seems to always want to put a movement type on every data point

The new classifier is capable of more accurate sample classification inside Visits than the old classifier, but it seems to need much more encouragement to recognise genuine stationary samples inside Visits.

The old classifier basically gave up completely, treating everything inside Visits during Sleep Mode as stationary. Which made for less cleanup work (most of the time), but meant it was never going to be able to recognise any of that data as anything other than stationary, even if you were walking around inside.

I’m still mulling on this one. The new classifier does learn, if you correct those segments inside Visits to stationary. But it’s a lot more work for not much gain, given most samples inside Visits genuinely should be classified as stationary anyway.

I haven’t done any fine-tuning of the new classifier yet - it’s fairly stock CoreML Boosted Tree. But I don’t think I will be able to tune it for this specific case of stationary data inside Visits, without adding a new model feature. (“Model features” being things like speed, location, accelerometer data, direction, time of day, etc).

One model feature I’m considering adding is “time since arrival”, meaning how much time after arriving inside a Visit the sample was recorded. (For samples outside of visits it would be 0). That would hopefully allow the classifier to better learn “when inside for more than a short period of time, it’s probably stationary”.
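As a sketch of how such a feature might be computed (hypothetical names, not Arc’s code; samples outside a Visit get 0, as described above):

```swift
import Foundation

// Hypothetical "time since arrival" model feature: seconds between a
// Visit's arrival time and the sample's timestamp. Samples recorded
// outside any Visit get 0, so the feature is defined for every sample.
func timeSinceArrival(sampleDate: Date, visitArrivalDate: Date?) -> TimeInterval {
    guard let arrival = visitArrivalDate else { return 0 }
    return max(0, sampleDate.timeIntervalSince(arrival))
}
```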

Also, since the change to the new type of GPS location, I find that transient signals in subways don’t get recorded anymore. They don’t have enough data quality, which makes the metro tracks unreliable. If there’s any chance of a data quality toggle in advanced settings or something, that’d be appreciated

I had to mark this segment as bogus because, even though I had location data, it never had sufficient data quality to make it into Arc

Google, which does something similar, recorded more of the transient location data

(In this case it comes from cell reception at stations, but no actual GPS fix)

Arc doesn’t discard any location data, and its requested accuracy levels haven’t been changed in years.

What’s more likely is iOS is delivering different levels of location data accuracy to different apps, even when the apps are running at the same time. This is a problem that started several years back, and may be an iOS bug. (I certainly can’t think of any good reason why iOS would want to do it on purpose).

Hmm, I see. I thought it might be because of the smoothing, but I misremembered; it seems you reduced that in 3.6

The data points are from wifi triangulation on the metro’s wifi network, and I did see the purple dot show up properly in Arc because I purposely got off at a station to try to get a location lock for more than a minute, but it didn’t seem to record until I had solid mobile signal. It might be that, since there was no data for more than a minute, the app just hadn’t fully woken up yet or something.

Anyway it is an edge case so nothing to worry about

Sure, here are a few examples from today:

This is one that should obviously be a drive (21 miles in 29 minutes) but it was detected as stationary and the top option for correction is walking, somehow.

Here’s one that came through as Stationary, despite measuring 14mph. Mind-bogglingly, the top correction is airplane. We were still driving, but a bit slower after we turned into this cross street. Things being misclassified as “Stationary” despite being in motion is also super common for me, with actual stationary activity (like a visit) coming in as “Transport”.

Same as before: registered as stationary with Airplane as the top correction. At this point, we’d gotten out of the car and started walking (2.3mph as you can see)

This last one is interesting because it was treated a little differently in Arc vs. Arc Mini. This was a short walk around the block with my dog, but it got registered as stationary in both spots. Walking was the top correction and, in this case, I can understand why it came through as stationary (it was a slow walk). But even when I have a reasonable pace, this will happen to me. Also, Arc Mini was interesting in this last one because you can actually see the route around these few blocks, but Arc just shows a stationary dot in a purple circle and not the route I walked.

I don’t understand why I keep having to repeatedly correct almost every activity from transport to the real movement, or from transport to stationary, or from stationary to some motion-based activity. Do visits come in as Transport because changing rooms is detected as movement? But even then, why do things that are at least correctly classified as a motion all get lumped into “Transport” for me instead of more intelligently classifying into walking or whatever else I’m doing?

Hmm. I also don’t know. Given the debug view says your CD2 models are up to date, there’s no obvious reason why it would be making these mistakes. The kinds of mistakes in your screenshots are the kind that would happen if the CD2 models haven’t been built at all yet. If the CD2 model for a specific region (each CD2 model is about the size of a city neighbourhood) has even just one day of confirmed/corrected data, it would achieve results many times better than what you’re seeing in those screenshots. So … something very weird is going on.

The “transport” type is a special case, and isn’t so much an “activity type” as a way for the classifier to say “I really don’t have a clue - I don’t have enough information”. When the classifier doesn’t know enough about the data to give even a reasonable guess, it will say “transport”. Which is what makes me think the CD2 models are somehow missing, even though the debug view says they’re all up to date.
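In effect, “transport” behaves like a confidence-threshold fallback. A toy Swift sketch of that idea (invented names and threshold value, not Arc’s actual logic):

```swift
// Toy sketch: if no concrete activity type reaches a minimum confidence,
// fall back to the generic "transport" bucket, i.e. "I don't know".
func resolvedType(scores: [String: Double], minimumScore: Double = 0.2) -> String {
    guard let best = scores.max(by: { $0.value < $1.value }),
          best.value >= minimumScore else {
        return "transport" // the classifier has no confident guess
    }
    return best.key
}
```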

Are these areas you’ve spent time in before? Or are you travelling, and these are new regions? Perhaps if you’re travelling to new areas, that could explain it.

Though in that case, Arc falls back to using city scale models and even global scale models, which are still able to classify the basic types of walking, running, car, etc with greater accuracy than you’re seeing. These days, with the level of knowledge the “shared models” have (the ones that Arc falls back on when it doesn’t have enough data from your own confirms/corrects) there shouldn’t really be anywhere in the world that produces classification results as bad as in your screenshots. So I’m perplexed… I’ll keep thinking on it, and maybe will have an idea…

Oh, there’s one more thing to check: The Classifier Debug Info in Arc Mini.

The top section of that view lists the classifiers Arc (or Arc Mini - they share the same database and classifiers) is using right at that moment, for that specific geographic area.

The first row should be a CD2 classifier, then GD2, GD1, and GD0. The CD2 is the model built from your own confirms/corrects. The GD models are the “shared models” provided by the server, which are used to fill in the gaps when your CD2 model doesn’t have enough data yet.

GD2 is city neighbourhood size (the same regions as CD2). GD1 is city size, and GD0 is the global model.

On the right of each row, the first number is the number of samples used to build that model. A decent CD2 should have a few thousand samples, but even a few hundred should produce better results than you’re seeing (which is what’s got me so perplexed - the results you’re seeing imply almost no data at all, and certainly not models built from a bunch of confirms/corrects).

The GD2 model should ideally have tens of thousands of samples. The one for my local neighbourhood is almost 200,000. The GD1 should have a number in the hundreds of thousands or millions. The GD0 will likely say over 500 million.
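The fallback order described above can be pictured as a cascade that uses the most local model with enough training data. A simplified Swift sketch (invented type names and threshold, not Arc’s implementation):

```swift
struct ClassifierModel {
    let name: String        // e.g. "CD2", "GD2", "GD1", "GD0"
    let sampleCount: Int    // samples the model was built from
}

// Simplified sketch of the fallback: prefer the most personal/local model,
// but only if it was built from enough samples to be trustworthy;
// otherwise fall all the way back to the global model.
func activeModel(cascade: [ClassifierModel], minimumSamples: Int = 100) -> ClassifierModel? {
    cascade.first { $0.sampleCount >= minimumSamples } ?? cascade.last
}
```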

Actually, if the GD0 doesn’t have a number around 540 million, then that would suggest a possible cause of part of the problem. I discovered a very strange bug a few months back, that suggests database corruption in the database table that stores the GD models on the phone. Data added to that table was appearing to be saved correctly, but then attempts to fetch it out again returned nothing, or only partial results. Nothing has changed with that code in Arc for years, so to my eyes it suggests a bug in SQLite (the standard database system used by pretty much all iPhone apps, provided by iOS).

I haven’t looked further into that bug yet, because the new CD2 (Core ML model) system effectively replaces the old GD model system, and is stored in a separate database table that isn’t exhibiting the problem. But if for some reason the CD2 models on your phone are also flaking out, that’ll be putting pressure on the GD models, and if the database on your phone also has this weird corruption problem, then that would explain why the classifier results are failing so dismally.

Anyway, have a look at that Classifier Debug Info view in Arc Mini when you’re in one of the areas where you’re getting terrible classifier results. The classifiers view shows you the classifiers being used right at that moment, for the area you’re in now, so it’s not useful for looking at when you’re in another part of town, only when you’re in the trouble zone.

If the numbers for the CD2, GD2, etc are very low, that will point us in the direction of what’s going wrong.

Yeah, theoretically the reduced smoothing should improve the situation for underground train trips.

I’ve actually had it on my todos for years now to spend more time on improving the recording of underground train trips. But in the last year or so while doing a little bit of testing on underground trains I noticed that Arc’s recording debug view was showing no new incoming CLLocations, even though another location data app on the phone was showing an updated current location. Which fuelled my suspicion (also based on some other testing experiences) that iOS at some stage began providing different location data accuracy and frequency to different apps even while they’re both actively monitoring location at the same time. A bit of a “wtf” moment.

I suspect it’s also possible that even separate CLLocationManager instances within the same app might be getting given different accuracy and frequency. For example the sky blue “current location” dot on Arc’s map view is provided by the Mapbox framework, and uses its own CLLocationManager, separate from the one Arc is using for recording. My suspicion is that that CLLocationManager sometimes receives different data from what Arc’s one is being given. Still yet to be proven, but it would fit the theory - iOS at one point changed to treat separate CLLocationManagers differently, even when in the same app, and even if they have the same settings.

Anyway, I might get a chance to do more testing on this soon. I’m moving to Tokyo later this month, so will be getting the opportunity to use a lot of underground trains. Bangkok (where I live now) uses mostly elevated train lines in the city.

Aside: This is a handy little map detail for telling the difference between what the phone is providing as the real time current location, versus what Arc is determining as the current Visit or stationary period on the map. The sky blue dot and circle are from Mapbox, and indicate real time current location (with the circle radius indicating current accuracy level), while Arc’s current Visit or stationary segment is shown as a purple circle.

Arc’s purple circle should be relatively stable in position and size, reflecting Arc’s filtering, smoothing, and processing of samples into a sensible Current Visit, while Mapbox’s light blue dot and circle will likely be erratic and unpredictable, reflecting the noisiness of the raw location data.
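The difference between the jumpy raw dot and the stable purple circle comes down to smoothing. A minimal exponential moving average sketch in Swift (a toy illustration of the idea, not Arc’s actual filter):

```swift
// Toy exponential moving average over a stream of (lat, lon) fixes.
// Raw fixes jump around with GPS noise; the smoothed position moves
// slowly and stays near the true centre, like Arc's purple circle.
struct LocationSmoother {
    var smoothed: (lat: Double, lon: Double)?
    let alpha: Double // 0...1; lower = smoother, slower to react

    mutating func add(lat: Double, lon: Double) -> (lat: Double, lon: Double) {
        guard let prev = smoothed else {
            smoothed = (lat, lon) // first fix seeds the average
            return (lat, lon)
        }
        let next = (lat: prev.lat + alpha * (lat - prev.lat),
                    lon: prev.lon + alpha * (lon - prev.lon))
        smoothed = next
        return next
    }
}
```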

It’s been more than a month. Significant place stops are still being ignored, even when attempting to drill down into segments. Every kind of movement is just generically identified as “transport,” although a few times walking was incorrectly mislabeled “taxi.” This app used to be so reliable. I keep the app actively running at all times. I have zero idea why accuracy is such a disaster now.

@rolandsmash I also have no idea why it’s not working well for you, and will need more information before I can take a guess and find a resolution.

  1. Make sure that Arc has full location permissions in your phone’s Settings → Privacy → Location. Note that “Precise Location” must be turned on, and location access set to “Always”.

  2. Install Arc Mini to get access to the detailed debug views.

  3. Please post screenshots of the “System Debug Info” view in Arc Mini, and also the “Classifier Debug Info” view.

Those will contain debug information necessary to determine the nature of the problem.