What do the classifier numbers mean?

Sorry for spamming with so many posts in one day; I’m just very interested in how Arc works on the inside :sweat_smile:

In Arc Mini’s classifier debug menu there are a bunch of different numbers, and I was wondering what some of them mean. The sample counts and the coordinates are pretty obvious, but then there are the A (accuracy?) and C (completeness?) parameters, which I couldn’t find anything about on this forum.

Then there’s also the recent sample classifier data, where (I assume) the number in parentheses is the probability (0.00-1.00) of the sample being of the given type, but what does the other number mean? Is it the raw value the neural network outputs for that type?

Oh, and while we’re here, what’s the difference between the CD2 and UD2 models? CD2 and GD2-0 are pretty understandable (and covered in other posts), but I couldn’t find any description for UD2 models.

And one last thing: is it possible to see which classifier was responsible for the last sample (or, even better, for any given sample in the timeline)? Not sure how practical it is, but knowing where my personal model fails and falls back to the global ones would be a nice way to see why some samples get classified correctly and others don’t, even when they’re pretty close together (usually on the border between CD2 models). It would also give some insight into how good the global models’ local knowledge actually is and how much you can rely on it.

No worries! I enjoy talking about this stuff. It helps me to think through problems better.

Yep, you got it!

A = Accuracy = uh… I actually forget how it’s calculated, for either the CD or GD models. But yeah, accuracy.

C = Completeness = how close the sample count is to the arbitrary “needed samples” amount I established for each model depth years ago. The composite classifier tree uses that completeness as a weighting when combining the results of the discrete classifiers in the tree (i.e. the CD2, GD2, GD1, and GD0).
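A minimal sketch of what the completeness value could look like, assuming it is simply the sample count as a fraction of the per-depth “needed samples” target, capped at 1.0. The exact formula and the real target values aren’t given in this thread, so the function name and `NEEDED_SAMPLES_EXAMPLE` are hypothetical.

```python
def completeness(sample_count: int, needed_samples: int) -> float:
    """Fraction of the 'needed samples' target reached, capped at 1.0.

    Hypothetical formula: the post only says completeness measures how close
    the sample count is to a per-depth target.
    """
    if needed_samples <= 0:
        raise ValueError("needed_samples must be positive")
    return min(1.0, sample_count / needed_samples)


# Hypothetical target, for illustration only.
NEEDED_SAMPLES_EXAMPLE = 50_000
print(completeness(45_000, NEEDED_SAMPLES_EXAMPLE))  # 0.9
print(completeness(80_000, NEEDED_SAMPLES_EXAMPLE))  # 1.0
```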

The first number is the raw score/probability that the classifier(s) returned for that type. The second one, rounded to two decimal places, is the “normalised” value, and it’s the one used in the UI on the Type Change views/lists. By “normalised” I just mean it’s that type’s percentage of the total of all the scores/probabilities added up.

So for example, right now I’m seeing Stationary at the top with 0.887598 [0.92], so the classifier(s) have given Stationary a score of 0.887, which is 92% of the sum of all the scores. That second value is just much easier to read and reason about, which makes it suitable for showing in the main UI.
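The normalisation described above is each score divided by the sum of all the scores. A quick sketch; only the Stationary score comes from the example above, and the other two raw scores are made up so that Stationary lands on 0.92.

```python
def normalise(scores: dict[str, float]) -> dict[str, float]:
    """Express each raw score as a fraction of the sum of all scores."""
    total = sum(scores.values())
    if total == 0:
        return {activity: 0.0 for activity in scores}
    return {activity: score / total for activity, score in scores.items()}


# Stationary's raw score is from the post; walking and car are invented values.
raw = {"stationary": 0.887598, "walking": 0.05, "car": 0.027}
print({k: round(v, 2) for k, v in normalise(raw).items()})
# {'stationary': 0.92, 'walking': 0.05, 'car': 0.03}
```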

CD = Core ML models; GD = server-provided “shared” models, built with my old custom-built Naive Bayes system; and UD = on-device built models, also built with that old custom Naive Bayes.

The UD models aren’t actually used anymore for classifying samples. They’re destined for the scrapheap. Same for the GD models actually! The CD2 models have entirely replaced the UD2 models, and the GD models are on their last legs too. I’m going to entirely ditch that old system once I get the time, including the entire server side component, and replace it with CD0 (and maybe CD1) models built and maintained on-device.

Core ML’s gradient boosted trees do a much better job than my old hacked-together Naive Bayes stuff, so I think with a bit of work I can get better results with entirely on-device built models.

Kind of, yep. Basically if the CD2 has a C/Completeness score of 1.0 it’ll be the only model used. The composite classifier will weight it at 100% and not even bother asking the other models for their opinions. So for anywhere that you travel frequently, it’s likely that the CD2 is doing the entire job, and the other models are just hanging around doing nothing.

So yeah, that C/Completeness score is the weighting used. And as soon as the tree hits a model with completeness = 1.0, it stops looking beyond there. It works through them in this order: CD2 → GD2 → GD1 → GD0.

So for example if the CD2 is at C0.90 but the GD2 is at C1.00, the CD2 will be weighted at 90% and the GD2 takes up the remaining 10% of the weight, and the GD1 and GD0 don’t get asked.
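A minimal sketch of the weighting walk described above, assuming the rule generalises as “each model takes its completeness share of whatever weight is still unassigned”. That reproduces the C0.90/C1.00 example, but the general rule, the `DiscreteModel` type, and the choice to let the last model absorb any leftover weight are assumptions, not Arc’s actual code. Returning the per-model weights is also one way to see which classifiers were responsible for a given result.

```python
from dataclasses import dataclass


@dataclass
class DiscreteModel:
    name: str                  # e.g. "CD2", "GD2", "GD1", "GD0"
    completeness: float        # the C value, 0.0 to 1.0
    scores: dict[str, float]   # raw per-type scores this model would return


def composite_scores(models: list[DiscreteModel]) -> tuple[dict[str, float], dict[str, float]]:
    """Walk the models in tree order and combine their scores by weight.

    Returns (combined scores, weight given to each model that was asked).
    """
    remaining = 1.0
    weights: dict[str, float] = {}
    combined: dict[str, float] = {}
    for i, model in enumerate(models):
        is_last = i == len(models) - 1
        # Each model takes its completeness share of the unassigned weight;
        # the last-resort model (assumption) absorbs whatever is left.
        weight = remaining if is_last else remaining * model.completeness
        weights[model.name] = weight
        for activity, score in model.scores.items():
            combined[activity] = combined.get(activity, 0.0) + weight * score
        remaining -= weight
        if model.completeness >= 1.0:
            break  # a complete model ends the walk; later models aren't asked
    return combined, weights


# Hypothetical scores; the completeness values mirror the example above.
models = [
    DiscreteModel("CD2", 0.90, {"car": 0.8, "train": 0.1}),
    DiscreteModel("GD2", 1.00, {"car": 0.3, "train": 0.6}),
    DiscreteModel("GD1", 0.70, {"car": 0.5, "train": 0.4}),  # never asked
    DiscreteModel("GD0", 1.00, {"car": 0.6, "train": 0.2}),  # never asked
]
combined, weights = composite_scores(models)
print({k: round(v, 2) for k, v in weights.items()})   # {'CD2': 0.9, 'GD2': 0.1}
print({k: round(v, 2) for k, v in combined.items()})  # {'car': 0.75, 'train': 0.15}
```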

The GD0 (the singular model that covers the whole world) only ever gets asked its opinion if none of the models before it are “complete”, so it’s really the last resort. And it’s also pretty useless for anything beyond walking, running, cycling, car, and airplane. Going beyond that really needs neighbourhood-level contextual information, which is where the CD2/GD2 models excel.

Ah, got it, thanks for the explanation! It all makes sense now :blush:

Have I understood you correctly that you’re planning to remove the “public” GD* models completely? I feel like even though they’re pretty inaccurate they offer a lot of help for “kickstarting” the CD model on the common routes in big cities (i.e. while traveling), and removing them would remove the local knowledge that the personal models may never get*.

I am all for adding the CD0 and CD1 models to the mix (especially with CoreML being much better at its job), but is there any reason to not keep the GD models as a last resort? Maybe use them just for a couple of days until the CD models get an opportunity to re-train for the new place.

*For example, when you spend just one or two days in a new city and move on, at which point you have a lot of very noisy data to sift through. Also, people usually have different habits while traveling, adapting to the city they’re in, which the local models know nothing about, while the shared ones know those patterns at least somewhat better.

A couple of reasons. The first is that at the moment they’re semi-broken (although I’m not sure why), and doing a really shit job - much shitter than they used to do. They also cost money (the Heroku server costs are around 200 USD a month). So it’s a matter of either fix them or ditch them, but either way, don’t keep spending money on something that’s barely even working :joy:

My thinking is the CD0 would likely be just as good/accurate as the current GD0, because at that global/world level you’re really only getting accuracy based on accelerometer, pedometer, and altitude (which at GD0 scale is mostly only useful for airplanes anyway). The location coordinates are bucketed in the GD0 to such a large scale that at best they’re probably only going to distinguish land from sea.

The GD0 is great for walking, running, cycling, car, and airplane, but for everything else it’s next to useless. So an on-device built CD0 could probably fully replace it in terms of accuracy, and potentially even do better, given that Core ML’s gradient boosted tree algorithm is much better than my dodgy old Naive Bayes.

For the CD1, I’m not sure if there’d be any point. I’d have to think more on that. My current gut feeling is that if the GD models were ditched completely, the replacement system would be purely CD2 and CD0. But maybe the CD1 would be worth it. Not sure.

I guess the big question is whether the GD2 is worth keeping. That’s the neighbourhood level data, which gets hopefully decent results in a new area when the CD2 is either entirely missing or extremely slim on data. If the GD2s are working correctly, they should do an adequate job of distinguishing trains, cars, boats/ferries, etc, in a new area that the user’s data hasn’t seen before. And that’s something that a CD0 or CD1 couldn’t really replace.

So yeah, that would be a likely drop in accuracy for those first few days / weeks in a new area. Well, assuming the GD2s were working correctly, which currently they’re not. Heh.

What I’d ideally like to do is replace my old Naive Bayes algorithm with server-built Core ML models. So the GD2 models would be much more accurate. But to do that… yeah, that would be difficult.

Right now the GD2 models are built server side by merging provided UD2 models. So they’re not built using any user’s raw data, only a fingerprint/model of the user’s data. Which provides an extra layer of privacy / data security. The app isn’t sending anyone’s raw data to the server. But to build Core ML models server side it would likely be necessary to send actual raw user data to the server. Which falls into the bucket of “fuck no; don’t want to do that”.
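Why merging can work without raw data: if a Naive Bayes model is stored as counts (class counts plus per-class feature-value counts), then merging two models is just adding their counts, and the result is identical to a model trained on both users’ samples together. The sketch below assumes that count-based representation; the `CountNaiveBayes` class and the feature names are hypothetical, and Arc’s actual UD2/GD2 format isn’t described here.

```python
from collections import Counter


class CountNaiveBayes:
    """Toy categorical Naive Bayes that keeps only counts, never raw samples (hypothetical)."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()    # activity -> number of samples
        self.feature_counts: Counter = Counter()  # (activity, feature, value) -> count

    def train(self, samples) -> None:
        for activity, features in samples:
            self.class_counts[activity] += 1
            for name, value in features.items():
                self.feature_counts[(activity, name, value)] += 1

    def merge(self, other: "CountNaiveBayes") -> "CountNaiveBayes":
        # Counts are additive, so the merged model equals one trained on both sample sets.
        merged = CountNaiveBayes()
        merged.class_counts = self.class_counts + other.class_counts
        merged.feature_counts = self.feature_counts + other.feature_counts
        return merged


user_a = [("walking", {"speed": "slow", "bucket": "b1"}),
          ("car", {"speed": "fast", "bucket": "b1"})]
user_b = [("train", {"speed": "fast", "bucket": "b2"}),
          ("car", {"speed": "fast", "bucket": "b1"})]

model_a, model_b = CountNaiveBayes(), CountNaiveBayes()
model_a.train(user_a)
model_b.train(user_b)
shared = model_a.merge(model_b)    # built from the two models, not from raw samples
print(shared.class_counts["car"])  # 2
```

Note that the counts themselves still describe where each user has been, which is why privacy remains a concern even without raw samples.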

To do that, Arc would have to implement things like Differential Privacy. But location data is extremely difficult to anonymise, even if you strip out timestamps and send/store it unordered. It’s still possible to reverse engineer individual users from even “anonymised” location data. So yeah, the Differential Privacy logic would have to be extremely sturdy and well thought out.
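For reference, the most basic differential privacy building block is the Laplace mechanism: add Laplace noise with scale `sensitivity / epsilon` to each count before it leaves the device. This is a generic textbook sketch, not something Arc does, and it also shows the hard part: `sensitivity` must bound the total (L1) change a single user can make across all the counts, and one user can contribute many samples to the same places, so realistic sensitivities (and therefore the noise) get large.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatise_counts(counts: dict[str, float], epsilon: float,
                     sensitivity: float = 1.0,
                     seed: Optional[int] = None) -> dict[str, float]:
    """Apply the Laplace mechanism to a histogram of counts."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    # Clamping to zero is post-processing, so it doesn't weaken the privacy guarantee.
    return {key: max(0.0, value + laplace_noise(scale, rng))
            for key, value in counts.items()}


# Hypothetical per-bucket counts.
noisy = privatise_counts({"bucket_a/train": 12, "bucket_b/car": 40},
                         epsilon=0.5, sensitivity=1.0, seed=42)
```

A smaller `epsilon` means stronger privacy and more noise; there is no setting that gives both exact counts and a meaningful guarantee.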

Oh, I see, that makes sense now.

Then I guess in the short term it makes the most sense to add the CD0 and CD1 (city-scale knowledge still helps in big cities IMO, e.g. when you’re in a new neighbourhood the CD1 model would know more about the city and its usual transport than the CD0, but I guess the best way to find out is to try both options and see what works best), and to use the UD2 models for the server-side ones (assuming they’re fixable and that’s not going to take too much time).

And in the long term… yeah, I don’t see easy solutions for that either, that sucks… Apart from maybe just running a bunch of personal CoreML models in parallel and weighting their results, which… yeah, no. At this point I’m starting to see your point in just removing the server-side models outright, though I’d still leave the GD2 (even if they’re not CoreML, just as a last resort for new places). Though for me the corrections and global models were the coolest parts of Arc, knowing that what I’m doing doesn’t only help me, but also other people (even if just in theory), so it’d be sad to see them go :smiling_face_with_tear:.

Also, just got a weird idea, probably not applicable here but maybe it’ll help inspire something better idk. What if people could share their models? So when you go to a new place, you can look through a list of local models uploaded by other people (probably with a list of 3 or so most common movement types and an amount of samples total) and take the one that suits you the most? Kinda like the controller preset sharing in Steam. Bonus points for making an ML [meta-]model choose the right model for you. This is definitely much more complicated than it should be and not thought through at all, but I’ll leave it here anyways just in case :sweat_smile:

Yeah that’s always been my hunch / experience too. Though to understand better whether the CD1s would work as well as the GD1s I’d need to better understand what the boosted decision tree algorithm is doing internally with coordinates.

In the GD models I know how it works: it buckets them into fixed bucket sizes, with the buckets being sized based on the coordinate ranges of the next level down. i.e. GD0 coordinate buckets are the size of GD1 models, GD1 buckets are the size of GD2 models, and GD2 buckets are … well, they’re the exception - it’s something like 10x10 metres or some such (depending on latitude).
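A rough sketch of fixed-size coordinate bucketing. Only the “something like 10x10 metres, depending on latitude” detail for GD2 comes from the post; the metres-per-degree constant is the usual approximation, and the function name and row/column scheme are assumptions, not the GD implementation.

```python
import math

# Approximate length of one degree of latitude, in metres.
METRES_PER_DEGREE_LAT = 111_320.0


def coordinate_bucket(lat: float, lon: float, size_m: float) -> tuple[int, int]:
    """Map a coordinate to a (row, col) bucket roughly size_m x size_m metres."""
    lat_step = size_m / METRES_PER_DEGREE_LAT
    row = math.floor(lat / lat_step)
    # Degrees of longitude shrink with cos(latitude), so widen the longitude step
    # using the row's centre latitude; every point in a row then shares one step.
    row_centre_lat = (row + 0.5) * lat_step
    cos_lat = max(math.cos(math.radians(row_centre_lat)), 1e-6)  # avoid /0 at the poles
    lon_step = size_m / (METRES_PER_DEGREE_LAT * cos_lat)
    col = math.floor(lon / lon_step)
    return row, col


# Two points about 1.1 km apart (north-south) land in different 10 m buckets.
print(coordinate_bucket(51.50, -0.12, 10.0) != coordinate_bucket(51.51, -0.12, 10.0))  # True
```

The same function with a much larger `size_m` would give coarser, GD0/GD1-style buckets; what those sizes actually are isn’t stated here.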

In the GD models that means a GD1 model’s buckets can at least inform the classifier whether an activity type should exist at all within a coordinate bucket. Like, if one city has trains and the next city over has no trains, the buckets will clearly indicate that, so the classifier results won’t (or at least shouldn’t) contain any train results in the no-trains city, and should contain higher scoring results for trains in the has-trains city.
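A toy illustration of how a bucket feature lets a Naive Bayes classifier rule out an activity type: if no train samples were ever seen in a bucket, P(bucket | train) is zero, so train scores zero there. With additive (Laplace) smoothing the score becomes small rather than exactly zero. The single-feature setup, the `alpha` parameter, and the sample data are all hypothetical; whether Arc’s Naive Bayes smooths its counts isn’t stated.

```python
from collections import Counter


def bucket_scores(samples: list[tuple[str, str]], bucket: str,
                  alpha: float = 0.0) -> dict[str, float]:
    """Naive Bayes scores using only the coordinate-bucket feature.

    samples: (activity, bucket) pairs. alpha: additive smoothing (0 = none).
    """
    activity_counts = Counter(activity for activity, _ in samples)
    pair_counts = Counter(samples)
    n_buckets = len({b for _, b in samples})
    total = len(samples)
    scores = {}
    for activity, n in activity_counts.items():
        prior = n / total
        # P(bucket | activity), optionally smoothed over the known buckets
        likelihood = (pair_counts[(activity, bucket)] + alpha) / (n + alpha * n_buckets)
        scores[activity] = prior * likelihood
    return scores


# Hypothetical data: bucket_A is in a city with trains, bucket_B in one without.
samples = [("car", "bucket_A"), ("car", "bucket_A"), ("train", "bucket_A"),
           ("train", "bucket_A"), ("car", "bucket_B"), ("car", "bucket_B"),
           ("car", "bucket_B")]
print(bucket_scores(samples, "bucket_B")["train"])                 # 0.0
print(bucket_scores(samples, "bucket_B", alpha=1.0)["train"] > 0)  # True
```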

For CD1 models… I really don’t have any good guess how the model would codify that data. I don’t know the algorithm well enough. Would it work like “these given coordinates are closer to coordinates that I’ve got lots of train for; are further away from coordinates I’ve got boat for”? I need to do more reading on decision trees!
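A small illustration of the basic operation inside decision trees: each split compares a single feature against a threshold, chosen by how well it separates the labels (Gini impurity below). Applied to latitude and longitude, repeated splits carve the map into axis-aligned rectangles of learned, variable size, which is closer to adaptive buckets than to “distance from known train coordinates”. This is a generic sketch, not Core ML’s implementation; gradient boosted trees pick splits using gradient-based gain rather than Gini, but as far as I understand their splits are still single-feature thresholds.

```python
def gini(labels: list[str]) -> float:
    """Gini impurity: 0.0 when all labels are the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: dict[str, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values: list[float], labels: list[str]):
    """Find the threshold on one feature (e.g. latitude) that best separates the labels.

    Returns (threshold, weighted impurity), or (None, inf) if no split is possible.
    """
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # can't split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best[0]:
            best = (impurity, threshold)
    return best[1], best[0]


# Hypothetical latitudes: trains seen further south, boats further north.
lats = [51.50, 51.51, 51.52, 51.60, 51.61]
labels = ["train", "train", "train", "boat", "boat"]
threshold, impurity = best_threshold(lats, labels)
print(round(threshold, 2), impurity)  # 51.56 0.0
```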

Yeah me too. Though over the years I’ve also become increasingly enamoured by the idea of apps that require no server side involvement, especially when it comes to location data, given the extreme sensitivity of location data, and the extreme difficulty of properly anonymising location data. But yeah, it’d be sad to lose that cool collective effort aspect of things.