The catch is it doesn’t know you moved 900 miles in a plane. It likely only has data for some slow-moving segments before takeoff, then a long gap with no data for several hours, then some slow-moving segments again after landing. The data it does have (perhaps only a few minutes’ worth) most closely matches the speed and accelerometer patterns of cycling.
Aside: For me, if it’s an airport I’ve flown in or out of before, it’ll often pick up those segments as bus, due to previously confirmed bus trips between the airplane and the terminal. A taxiing airplane can produce almost identical data to an airport shuttle bus, so the only way the model ends up differentiating them is if the location data conclusively puts the trip on the runway, where typically only airplanes go.
It will learn that, so long as it is told. It doesn’t have the same basic intuitions as us - it’s merely looking at the recorded data, in discrete 6-60 second samples, and matching that to the patterns seen from previously confirmed/corrected samples.
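To make that a bit more concrete, here’s a rough sketch of what “matching samples to patterns” amounts to. This isn’t Arc’s actual code; the type names, properties, and the toy nearest-match logic are all invented for illustration, and the real model is a trained Core ML classifier rather than anything this literal.

```swift
// Illustration only - these types and this matching logic are invented,
// not Arc's real implementation.
enum ActivityType: String {
    case stationary, walking, cycling, car, bus, airplane
}

// What the classifier "sees" for one 6-60 second sample: just numbers,
// with no context about flight bookings or common sense.
struct SampleFeatures {
    var speed: Double          // metres per second
    var accelMagnitude: Double // rough level of accelerometer movement
    var stepRate: Double       // steps per second, from the pedometer
}

// Previously confirmed/corrected samples act as the labelled examples.
struct ConfirmedSample {
    var features: SampleFeatures
    var type: ActivityType
}

// Toy "classifier": pick the type of whichever confirmed sample feels
// numerically closest. A real model generalises instead of memorising,
// but the intuition is the same: match raw numbers against past feedback.
func bestMatch(for sample: SampleFeatures, in confirmed: [ConfirmedSample]) -> ActivityType? {
    func distance(_ a: SampleFeatures, _ b: SampleFeatures) -> Double {
        let ds = a.speed - b.speed
        let da = a.accelMagnitude - b.accelMagnitude
        let dp = a.stepRate - b.stepRate
        return (ds * ds + da * da + dp * dp).squareRoot()
    }
    return confirmed
        .min { distance(sample, $0.features) < distance(sample, $1.features) }
        .map { $0.type }
}
```

Seen that way, a few minutes of slow taxiing with gentle accelerometer noise really can sit closer to previous “cycling” samples than to anything else the model has been shown.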
The reason it might pick “car” when you’re stationary is that travel by car is predominantly very slow or stationary for extended periods, with little to no accelerometer movement. Cars spend the vast majority of their travel time stuck in traffic. (When you look at the raw data of car trips, it’s really quite depressing!)
When you correct a segment to “stationary” you are telling the classifier “actually, at this location I’m much more likely to be stationary than in a car”. The next time you’re at that location with little to no movement, it’ll be much more likely to classify those samples as stationary instead of car.
The classifier also knows to choose stationary more often inside Visits. So sometimes it will first choose a moving type (eg car), then once there’s enough data that the processing engine decides you’re actually inside a Visit, it can reassess the samples inside that Visit and conclude that the best match is now stationary.
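If it helps, here’s a similarly made-up sketch of those two effects: your corrections becoming location-specific training data, and being inside a Visit nudging low-movement samples toward stationary. The scoring formula and numbers are invented purely to show the shape of the idea, not how Arc actually weighs things.

```swift
// Illustration only - invented types and an invented scoring formula.
enum ActivityType {
    case stationary, walking, car
}

// Each correction effectively records "at this spot, I said it was X".
struct Correction {
    var latitude: Double
    var longitude: Double
    var correctedType: ActivityType
}

// Score how "stationary-ish" a low-movement sample looks, given nearby
// corrections. Past stationary corrections at this location pull the
// score up, and being inside a Visit adds a further (hypothetical) bump,
// which is why samples can flip from car to stationary once the engine
// decides you're in a Visit.
func stationaryScore(nearbyCorrections: [Correction], insideVisit: Bool) -> Double {
    guard !nearbyCorrections.isEmpty else { return insideVisit ? 0.5 : 0.0 }
    let stationaryVotes = nearbyCorrections.filter { $0.correctedType == .stationary }.count
    var score = Double(stationaryVotes) / Double(nearbyCorrections.count)
    if insideVisit { score += 0.2 } // illustrative bonus only
    return min(score, 1.0)
}
```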
That comes as a surprise to me, because my signature reply style is excessively long, overly detailed explanations!
Generally speaking, the more accurately you manually classify the individual samples, the higher the accuracy you will get back from the classifier. However, inside Visits I personally prefer to correct almost all of it to stationary (with the exception of large places like malls or airports), because I’m not massively interested in picking up all the details of me walking around my home, and prefer the simplicity of it all being classified as stationary.
For things like walking versus car, my preference is to be picky about getting those details correct, because I don’t want the classifier thinking I took a taxi into my bedroom. I want it to know I got out of the taxi at the road, then walked into my building. If I classified the whole thing as car/taxi, the classifier would learn that, and end up thinking that I drive cars into my bedroom. That would be a bit silly.
If you haven’t already, I recommend installing Arc Mini, in this case for the purpose of looking at the “Classifier Debug Info” view. The first line you will see in there is the CD2 model, which is the model built from your confirms/corrects within the roughly neighbourhood-sized area you’re currently in. The last number on that line is the accuracy percentage it has achieved at correctly classifying samples.
For mine right now, it says “A0.96”, which means it’s classified 96% of samples correctly, according to my feedback (ie confirms/corrects). So that’s pretty good!
The second-to-last number (in my case “C0.30”) is the “completeness” percentage. So in my case it’s saying that the CD2 model for the neighbourhood I’m currently in is about 30% “complete”. That’s quite an arbitrary measure, based on a minimum number of training samples that tends to achieve decent results, but it’s useful at a glance for knowing whether the model for the current neighbourhood is well fleshed out or not. (Mine is low because I’ve only been in this neighbourhood about a month, and haven’t really gone anywhere besides a short walk to the beach and back each day.)
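To illustrate what those two numbers describe (not how Arc actually computes them; the “target” sample count below is invented):

```swift
import Foundation

// Illustration only. "A" is how often the model's answers agreed with my
// confirms/corrects; "C" is how far the model is toward some target amount
// of training data. The target of 1,500 samples here is made up.
struct ModelStats {
    var correctlyClassified: Int
    var totalClassified: Int
    var trainingSampleCount: Int

    var accuracy: Double {     // the "A0.96" style number
        guard totalClassified > 0 else { return 0 }
        return Double(correctlyClassified) / Double(totalClassified)
    }

    var completeness: Double { // the "C0.30" style number
        let assumedTarget = 1_500.0 // invented threshold for a "complete" model
        return min(Double(trainingSampleCount) / assumedTarget, 1.0)
    }
}

let stats = ModelStats(correctlyClassified: 960, totalClassified: 1_000, trainingSampleCount: 450)
print(String(format: "C%.2f A%.2f", stats.completeness, stats.accuracy)) // "C0.30 A0.96"
```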
In that Classifier Debug Info view you can also see the classifier’s results for the most recent sample, ordered from highest-scoring type to lowest. Watching that live update while walking or driving (well, someone else driving) can give you insight into what the classifier is “thinking” at any single moment in time.
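For a rough idea of the kind of output that part of the view shows (again, the scores and type names here are invented, not real output):

```swift
import Foundation

// Illustration only: one score per activity type for the latest sample,
// printed best-first, the way the debug view orders them.
let latestSampleScores: [String: Double] = [
    "walking": 0.71,
    "stationary": 0.18,
    "cycling": 0.06,
    "car": 0.04,
    "bus": 0.01,
]

for (type, score) in latestSampleScores.sorted(by: { $0.value > $1.value }) {
    print(String(format: "%@  %.2f", type, score))
}
```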
Though in a more general sense, I think the answer to your question is that these ML (machine learning) models don’t think the same way we do, and don’t have the same information available to them as we do. We can see everything around us, and various conclusions are patently obvious, but the models are working with limited data (lat/long coordinates, accelerometer data, pedometer data, and a handful of other details) and are feeling around those data points in the dark, thinking “what does this feel most like?” They’re bound to make dumb decisions sometimes - even ones that look utterly absurd to us.
But ultimately the current Core ML-based system used in Arc is achieving very high accuracy overall (eg consistently over 95%). It just sometimes doesn’t feel that way. When the model makes a really dumb decision, that stands out like a sore thumb, while all the correct decisions it makes throughout the other 95% of the day don’t stand out at all. We notice things when they go wrong, but not so much when they go right.
So what you’re left with then is largely not a technology problem but a UX (user experience) problem. Just because the technology is getting the right answer almost all of the time doesn’t mean it feels that way, or that it’s creating the ideal user experience. But that’s a whole other massive topic, which I won’t go into right now! I’ve already rambled on long enough.