What I would like to know is the order in which the delegates are called.
Good question. The order depends on what is happening and what you are doing. There isn’t a fixed sequence to expect so much as particular events that each result in a particular delegate callback. The basic pattern you will see is the start event, then a large number of live speech updates (as you mention), followed by a finalized speech event.
However, pocketsphinxDidStopListening doesn’t appear to be called. Should it not be called at some point before pocketsphinxDidStartListening is called? Or should pocketsphinxDidStartListening not be called except for the very first time?
I think this is an example of flawed naming on my part: pocketsphinxDidStartListening and pocketsphinxDidStopListening are not actually counterparts. pocketsphinxDidStartListening is called whenever the listening loop is entered, while pocketsphinxDidStopListening is called only when the recognition engine is finally turned off. So seeing pocketsphinxDidStartListening more than once without a pocketsphinxDidStopListening in between is expected.
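To make the ordering concrete, here is a toy model of the callback sequence as described in this thread. It is not OpenEars or RapidEars code and calls nothing in either library. The function `simulate_session` and the name `liveSpeechUpdate` are stand-ins made up for illustration, and the sketch assumes the start callback fires each time the listening loop is re-entered.

```python
# Toy model of the callback order described in this thread.
# This is NOT OpenEars/RapidEars code; it only records names in order.

def simulate_session(utterances):
    """Return callback names in the order described above.

    `utterances` is a list of lists; each inner list holds the partial
    hypotheses produced while one utterance is being spoken.
    """
    trace = []
    for partials in utterances:
        # Entering the listening loop fires the start callback each time.
        trace.append("pocketsphinxDidStartListening")
        # Many live updates arrive while the user is still speaking.
        # "liveSpeechUpdate" is a hypothetical stand-in name.
        for _ in partials:
            trace.append("liveSpeechUpdate")
        # A pause triggers the finalized, higher-accuracy pass.
        trace.append("rapidEarsDidDetectFinishedSpeechAsWordArray")
    # The stop callback fires once, when the engine is shut down for good.
    trace.append("pocketsphinxDidStopListening")
    return trace


trace = simulate_session([["go", "go left"], ["stop"]])
for name in trace:
    print(name)
```

In this model the start callback appears once per utterance, while the stop callback appears only once, at the very end of the trace.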
What causes rapidEarsDidDetectFinishedSpeechAsWordArray to be called? Does it still work on the second of silence?
Correct, there are lots of attempts to recognize during the speech, and then once there is a pause there is a finalized, higher-accuracy attempt that is very similar to the default recognition behavior of OpenEars. If you are only interested in the live speech, it can be turned off (and should be, to save a few cycles). I left the option in so that you aren’t excluded from the old-style pause-based recognition if you choose RapidEars. You can turn it off by setting this:
Just to confirm, pocketsphinxDidReceiveHypothesis should no longer be called?
What sort of delay, if any, will be caused when it’s switching between these states? I’m mainly interested in finding out whether any words will be lost if they are said between rapidEarsDidDetectFinishedSpeechAsWordArray and pocketsphinxDidStartListening being called. How is the recognition loop affected? Should I make the user wait before continuing to speak?
Just like with OpenEars, the engine is not taking in new audio while it is performing that pause-based finalized recognition. If you tell it to stop finalizing, the expected behavior is that there are no gaps in listening (let me know if that isn’t the case). But there shouldn’t be a delay between returning the hypothesis and going back to listening.