Sincere thanks for the detailed follow-up.
My app is trying to be a “bad word detector/trainer.” The idea is that the user chooses a session time (e.g. 2 mins) to practice speaking. As they talk, my app detects each time they say a “bad word” and displays visual feedback (e.g. an unhappy face) and a score (e.g. “You said DAMN 3 times so far”). The app continues detecting and reporting as they talk throughout the session. If they say two “bad words” in succession, I’d like to get two responses from OE (e.g. “DAMN”, “DARN”).
My current app kinda works, but it often misses “bad words” or combines multiple bad words (said in sequence) into a single response (e.g. “DARN BUMMER” instead of “DARN” and “BUMMER”).
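For the combined responses, here is a rough sketch of a workaround I could do on my side: split each response into individual words and score them one at a time. It is written in Python only to keep it short, and it assumes the recognizer hands back each response as one space-separated string. The names `BAD_WORDS`, `split_hypothesis`, and `SessionScore` are made up for illustration and are not part of OE.

```python
from collections import Counter

# Hypothetical word list; the real app would use its own vocabulary.
BAD_WORDS = {"DAMN", "DARN", "BUMMER"}


def split_hypothesis(hypothesis, bad_words=BAD_WORDS):
    """Return every bad word found in one response, in spoken order."""
    # Assumption: the response is a single space-separated string.
    return [word for word in hypothesis.upper().split() if word in bad_words]


class SessionScore:
    """Running tally of bad words for one practice session."""

    def __init__(self):
        self.counts = Counter()

    def record(self, hypothesis):
        """Count each bad word separately and return one message per word."""
        messages = []
        for word in split_hypothesis(hypothesis):
            self.counts[word] += 1
            n = self.counts[word]
            messages.append(f"You said {word} {n} time{'s' if n != 1 else ''} so far")
        return messages


if __name__ == "__main__":
    score = SessionScore()
    for line in score.record("DARN BUMMER"):
        print(line)
```

This only deals with the combining problem. It does nothing for words that are missed entirely, which is the part I still need advice on.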
I’m wondering if I would benefit from your suggestion of recording .WAV (even though I don’t understand why that helps), or if RapidEars is the right choice.