I hear what you’re saying, but it is by design. Live mode doesn’t wait for a pause in order to derive state, so when the engine is operating only in live mode, it uses its own logic to decide when to call an utterance over, so that continuous recognition can proceed without noticeable pauses or skips. secondsOfSilenceToDetect and rapidEarsDidDetectEndOfSpeech both refer to pause detection, which isn’t a feature of live mode. I agree that it could be documented better.
I think that if I wanted to use live-mode hypotheses together with non-live-mode utterance logic, I’d probably turn finalize on and just ignore its hypothesis output. The overhead isn’t that heavy.
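
A minimal sketch of that idea, to show the shape of the logic rather than real integration code. Everything here is hypothetical: UtteranceTracker and its three on_... methods are made-up stand-ins for whatever callbacks your delegate actually receives, not the RapidEars API. The assumption is that with finalize on you get live hypotheses, a finished hypothesis, and an end-of-speech event from pause detection.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UtteranceTracker:
    """Hypothetical stand-in for a recognition delegate (not the RapidEars API)."""

    live_hypotheses: List[str] = field(default_factory=list)
    utterances: List[str] = field(default_factory=list)

    def on_live_hypothesis(self, text: str) -> None:
        # Live mode: record each in-progress hypothesis for the current utterance.
        self.live_hypotheses.append(text)

    def on_final_hypothesis(self, text: str) -> None:
        # Finalize is on only so that pause detection runs;
        # its hypothesis text is deliberately ignored.
        pass

    def on_end_of_speech(self) -> None:
        # Pause detected: close the utterance using the latest live hypothesis.
        if self.live_hypotheses:
            self.utterances.append(self.live_hypotheses[-1])
        self.live_hypotheses.clear()


tracker = UtteranceTracker()
tracker.on_live_hypothesis("turn")
tracker.on_live_hypothesis("turn left")
tracker.on_final_hypothesis("turn lift")  # ignored on purpose
tracker.on_end_of_speech()
print(tracker.utterances)  # ['turn left']
```

Because the finished hypothesis is dropped, the sketch doesn’t care whether it arrives before or after the end-of-speech event; only the live hypotheses and the pause boundary matter.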