You’re welcome, although it occurs to me that maybe I’m not understanding your exact issue — is it accuracy or is it speed? Accuracy would be the percentage of correct word matches without any regard for how long it takes. Speed would be how fast the recognition is returned without any regard for how correct it is.
In my experience, with such a small language model combined with OpenEars .91, recognition on the device is basically instantaneous, maybe with a little bit of lag on an iPhone 3G or first-gen iPod. But before recognition begins, silence has to be detected, and the amount of silence that is detected in order to start recognition is a fixed amount that is set in OpenEarsConfig.h. So that amount of time is non-negotiable because it functions as the signal that the user is done saying their word or phrase. If it feels slow, maybe it is due to this obligatory silence, or maybe there is another issue. You can time how fast the actual recognition is performed by using the new delegate methods of OpenEarsEventsObserver for .91.
Hmm, I don’t think there is a reliable way to distinguish between the voices just using OpenEars. Even if there were, it would break down if you had two users with very similar timbres and accents unless it were very granular. It sounds like you’re interested in acoustic fingerprinting, but from this discussion it looks like a bit of a no-go:
This might be a case for just using some kind of interface/logic solution such as having each user tap a button when they want to speak, or log in at the beginning of a session so you know who they are.