"Messy" days / weeks which don't backup; take long to clean up

Hi,
I’ve been keeping an eye on the new backups as they roll into the iCloud Drive, especially the weeklies as they are easier to keep track of. ARC hasn’t yet completed the first backup.

I noticed large numbers of weeks missing in the directory. And when I navigate to those periods in the ARC GUI, weekly view crashes the app. Daily view then reveals one or several days which contain some junk (I seldom have the time to clear up the mis-categorisations, and usually I don’t really care that much). I often end up with days which show “Processing 2200 timeline items!”, or some similarly ridiculously high number. If I force ARC to stay open, it eventually chews its way through until there are none left, but it takes a whole day or night or so. Some items rush through at 1 per second, while others take a minute each to process. Is this normal? I mean both the high number of items, and the time it takes to process them. Is there maybe a way to manually “nuke” the entries for a day so that all that processing can be avoided? When I know that a day was probably unremarkable, I don’t care so much about creating a gap in the recordings.

If I go the aforementioned route of forcing ARC to work its way through the individual days, then eventually the weekly backup drops into place too. But it’s more work than I would like to put in, as it fully occupies the phone and needs a lot of babysitting (for example to restart the app when it terminates, etc).

Yeah that’ll definitely be the problem - those days with disastrously messy data.

I hesitate to say “normal”, because it’s absolutely not intended nor desirable, but these mega messy days do happen, for several reasons. One cause is when the phone starts sending nonsense location data to the app, for several hours. This is the case that often requires a phone reboot to snap the phone out of its weird state. That cause unfortunately is out of our control.

Another cause is … well, if I could describe it accurately I’d be able to fix it! But basically something happens with the data processing that might be due to a bug in Arc, which results in hundreds (or even thousands) of samples getting orphaned (ie no longer attached to a parent TimelineItem). The processing engine can then recognise that problem and clean it up, but as you’ve seen, it can take a long time.
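To make "orphaned" concrete, here is a minimal sketch of what detecting such samples could look like. It is not Arc's or LocoKit's actual code (those are written in Swift); the `Sample` type, its fields and `find_orphans` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Sample:
    """Hypothetical stand-in for a recorded location sample."""
    sample_id: str
    timeline_item_id: Optional[str]  # parent item, or None if detached


def find_orphans(samples: Iterable[Sample], item_ids: Iterable[str]) -> List[Sample]:
    """Return samples whose parent timeline item is missing or unknown."""
    known = set(item_ids)
    return [
        s for s in samples
        if s.timeline_item_id is None or s.timeline_item_id not in known
    ]


samples = [
    Sample("s1", "item-a"),
    Sample("s2", None),        # detached from any item
    Sample("s3", "item-gone"), # parent no longer exists
]
orphans = find_orphans(samples, ["item-a", "item-b"])
```

With thousands of such samples on a single day, each one has to be re-attached or merged into an item, which fits the long cleanup times described above.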

The two cases often look identical, so it’s hard to debug (and I still haven’t found a convincing cause for the latter described bug). But the good news is that I’m seeing effectively none of either of these cases happening anymore, in both recent iOS versions (newer iOS 14 releases) and recent Arc versions. So it’s possible that either the iOS problem has been solved, or the Arc problem has been solved (by me changing something else, and not realising that it also fixed this mystery bug), or both. But yeah, good news is it looks like it might no longer be a problem, and it’s only a problem of messy older data needing cleaning up now.

Aside: If it seems like it’s taking a minute to process just one, it’s likely that it’s actually processing tens or hundreds of small hidden items. The “Thinking…” sections of the timeline view can be hiding one or more small items, and even tens or hundreds of items. So it could be churning through a large “Thinking” block, full of a massive number of items. That cleanup won’t show up in the timeline view until it gets all the way through cleaning up those little items.

No there’s no way to do this. I think it’s probably easiest to just ignore these mega messy days, unless you’re in the mood to go back and clean one of them up. Though the only concern there is if the messy day is causing that week to not get backed up, as you describe. That seems quite unexpected to me, so I’m still pondering it.

How many weeks are you seeing with this problem?

Hi Matt,
thanks for the detailed breakdown of the individual (possible) parts of the problem.

No there’s no way to do this. I think it’s probably easiest to just ignore these mega messy days, unless you’re in the mood to go back and clean one of them up. Though the only concern there is if the messy day is causing that week to not get backed up, as you describe. That seems quite unexpected to me, so I’m still pondering it.

How many weeks are you seeing with this problem?

Unfortunately quite a lot. For the last 3 years (going back to approximately September 2018), the last time I checked it was about 65 weeks missing, i.e. more than a third. Might be that some of those are completely empty (== genuine data gap), but every case I’ve checked so far has at least some data for the week in question. Before Sept 2018 I haven’t yet counted the gaps.
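For anyone wanting to count the gaps without scrolling through the folder by hand, a small script along these lines could do it. This is only a sketch: it assumes the weekly files are named by ISO week, e.g. `2021-W34.json.gz`, which is an assumption about the naming rather than something confirmed in this thread, and `missing_weeks` / `weeks_between` are made-up helper names.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List, Tuple

# Assumed naming pattern for weekly backup files, e.g. "2021-W34.json.gz".
WEEK_FILE = re.compile(r"^(\d{4})-W(\d{2})\.json(\.gz)?$")


def weeks_between(start: date, end: date) -> List[Tuple[int, int]]:
    """Return every ISO (year, week) pair touched by the range start..end."""
    weeks: List[Tuple[int, int]] = []
    day = start
    while day <= end:
        iso = day.isocalendar()
        key = (iso[0], iso[1])
        if not weeks or weeks[-1] != key:
            weeks.append(key)
        day += timedelta(days=1)
    return weeks


def missing_weeks(filenames: Iterable[str], start: date, end: date) -> List[Tuple[int, int]]:
    """List the ISO weeks in start..end that have no matching backup file."""
    present = set()
    for name in filenames:
        match = WEEK_FILE.match(name)
        if match:
            present.add((int(match.group(1)), int(match.group(2))))
    return [week for week in weeks_between(start, end) if week not in present]


names = ["2021-W01.json.gz", "2021-W03.json.gz", "notes.txt"]
gaps = missing_weeks(names, date(2021, 1, 4), date(2021, 1, 24))
```

In practice the `names` list would come from `os.listdir()` on wherever the LocomotionSample folder is synced locally. Note that a missing file only shows that the week has no backup yet, not whether the week contains any data.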

Ultimately I don’t care too much about tidying the backlog, but I would like the first backup to complete at some point.

Another cause is … well, if I could describe it accurately I’d be able to fix it! But basically something happens with the data processing that might be due to a bug in Arc, which results in hundreds (or even thousands) of samples getting orphaned (ie no longer attached to a parent TimelineItem). The processing engine can then recognise that problem and clean it up, but as you’ve seen, it can take a long time.

The two cases often look identical, so it’s hard to debug (and I still haven’t found a convincing cause for the latter described bug). <…>

Now that you mention this: I frequently see ARC re-processing days which it only processed very recently – applies to seemingly “normal / non-messy days” too. For example, right now, if I step to a random day in August '21, it first goes to “processing timeline items”. And sometimes (not always) it then shows me “unconfirmed items”, even if I’ve previously gone through the process of confirming+editing all items for that day. So I do wonder whether my phone is showing that old (potential) bug you describe. Or maybe, this is again something completely different…

Yeah exactly. Having that first backup completed will be good peace of mind. And if there’s important data in the weeks that’re getting held back by messy days, that’s a concern.

I’m still pondering what I can do about it / what’s causing it. But in the meantime, you might be able to get away with just swiping through the messy days and leaving them for a few minutes of the processing engine churning through them. If for example there’s 1000 nonsense items, and the processing engine gets through 200 of them, that might be enough to get over whatever hurdle is holding back the backups for that week.

Certainly not ideal, but better than waiting for every messy day to finish cleaning up potentially thousands of nonsense items…

That re-processing is often just innocuous little bits of processing, not the big jobs. Things like checking whether the visits with unconfirmed places still have the best match place assigned. For example, if a visit was auto assigned the “Local Mall” place, but since then there have been more visits inside the mall and the models for individual shops are now more fleshed out, the best match might now be “Convenience Store” or “Ice Cream Shop” or whatever, i.e. a more precise / accurate match instead of the more general first attempt best match that was assigned on the day.

It’ll only do that kind of quick reprocessing for unconfirmed items though. If you’d confirmed the visit’s place, then that visit won’t be reprocessed / best match assignment checked.

Likewise the same with auto assigned activity types for trips. If an unconfirmed trip was auto assigned “car”, but with more up to date ActivityType models the best match is now “bus” (because it’s become obvious that that route is your daily bus route), that’ll get picked up in the reprocessing.

Typically those little bits of reprocessing are sub-second, and the “processing timeline items” will flash then disappear.

Items can show up as unconfirmed again if the previously auto assigned best match looked like a really good match at the time, but after going back a week/month later, with more up to date ML models, it now looks more iffy. Like that car vs bus example, where previously the models said “yep, definitely car”, but now say “uh, actually now it looks more like bus, based on what we’ve learnt, but the confidence isn’t high enough, so we’d better ask”.
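The re-check logic described in the last few paragraphs can be sketched roughly as follows. This is an illustration only, not Arc's real implementation: the `Trip` type, the `recheck` function and the 0.8 threshold are all hypothetical, and the real classifier scores will look different.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Trip:
    """Hypothetical stand-in for a trip with an assigned activity type."""
    activity_type: str
    confirmed: bool
    needs_confirmation: bool = False


def recheck(trip: Trip, scores: Dict[str, float], ask_below: float = 0.8) -> Trip:
    """Re-evaluate an unconfirmed trip against up-to-date model scores."""
    if trip.confirmed:
        return trip  # user-confirmed items are never reprocessed

    best, confidence = max(scores.items(), key=lambda kv: kv[1])
    if best == trip.activity_type:
        return trip  # current assignment is still the best match

    if confidence >= ask_below:
        trip.activity_type = best       # confident enough: quietly reassign
    else:
        trip.needs_confirmation = True  # not confident: ask the user
    return trip


# "yep, definitely bus" versus "looks more like bus, but better ask"
reassigned = recheck(Trip("car", confirmed=False), {"car": 0.1, "bus": 0.9})
asked = recheck(Trip("car", confirmed=False), {"car": 0.4, "bus": 0.6})
untouched = recheck(Trip("car", confirmed=True), {"car": 0.01, "bus": 0.99})
```

The point of the threshold is that a changed best match alone is not enough to silently rewrite the timeline; only a confident change is applied, and anything in between is surfaced as an unconfirmed item.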

Yeah exactly. Having that first backup completed will be good peace of mind. And if there’s important data in the weeks that’re getting held back by messy days, that’s a concern.

Sure, good peace of mind … and if I understand correctly, some decent space savings on my iCloud Drive, given that the old-style backup is still taking space!

I’m still pondering what I can do about it / what’s causing it. But in the meantime, you might be able to get away with just swiping through the messy days and leaving them for a few minutes of the processing engine churning through them. <…>

And for now, it’s what I’m trying, but it really is super tedious due to the frequent freezes, so even getting to the point where ARC starts working on some old data takes 5-10 minutes. So I’m really looking forward to the version which no longer freezes. At current speed, on a normal day, I’m able to force the backup of most “non-backed-up” sections at a rate of 1 week per day.

I can also see that while ARC is running, every now and again another weekly file is written – but for the most part, those are merely updates or rewrites of weeks which are already backed up (since the number of files within LocomotionSample doesn’t change). Is the backup getting its priorities mixed up if it’s frequently redoing weeks?

If there’s any way I can help locate the problem – e.g. if it’s possible to provide some of my data which just won’t back up automatically? – let me know.

Thanks for the additional information on re-processing.

Oh good point! I’d forgotten about that - it’s so long since my own new backups finished and cleared out the old ones. Yeah, once that new backup finally finishes, there’ll be a lovely purge of the old CloudKit backups, freeing up massive space.

The new update is already in the App Store review queue, in “Waiting for Review” status. Usually that only takes a day, so hopefully it’ll be “In Review” and then “Pending Developer Release” before the end of the day. Fingers crossed.

Aside: Arc Mini finally went live this morning! Though that won’t help with the problem you’re currently having. But for people who get frequent data gaps, it’ll offer some extra peace of mind, and hopefully solve that problem of frequent iOS terminations for them. It also has a couple of iOS Home Screen widgets - I use the “Current Item” widget to give me peace of mind that both apps are still alive, and also for little insights like how many minutes/hours I’ve been at the current place.

The Arc App update is now live on the store. Yay!

It doesn’t solve all UI freezing - there appears to be another remaining cause. But it does solve the vast majority of freezing, so it’s much faster and easier to use when doing lots of editing.