Soft-start Engine

    #1015471
    jbsilb
    Participant

    I’m having one challenge that I’d love to get feedback from the community on:

    When I start listening, there’s a bit of a lag, so I typically prefer to start the system before any speech input is actually required. Unfortunately, that means it jumps straight into recognition mode, which can trigger handling code that isn’t useful yet.

    I’ve solved this in the past by gating on a boolean, but that feels inefficient, and it still frequently leads to words getting queued in the hypothesis and to the first recognition being error-prone.
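
    For context, here’s roughly what that gating looks like, as a minimal sketch assuming the OpenEars 1.x OpenEarsEventsObserver delegate API (speechInputWanted and handleCommand: are just my own placeholder names):

        #import <UIKit/UIKit.h>
        #import <OpenEars/OpenEarsEventsObserver.h>

        @interface CarViewController : UIViewController <OpenEarsEventsObserverDelegate>
        @property (nonatomic, assign) BOOL speechInputWanted; // NO until input is actually needed
        @end

        @implementation CarViewController

        // OpenEars delivers every hypothesis here; hypotheses produced before
        // we actually want speech input are dropped by the boolean gate.
        - (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                                 recognitionScore:(NSString *)recognitionScore
                                      utteranceID:(NSString *)utteranceID {
            if (!self.speechInputWanted) {
                return; // Early recognitions from the pre-started engine end up here.
            }
            [self handleCommand:hypothesis];
        }

        - (void) handleCommand:(NSString *)command {
            // App-specific command handling goes here (placeholder).
        }

        @end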

    Is there a way to “soft start” the engine so that there isn’t a lag between the first request and the “ready” state, without starting the actual recognition process?

    Thanks!

    #1015472
    Halle Winkler
    Politepix

    Welcome,

    This is not actually advisable, because the lag is the voice activity detection checking the noise levels in the room and calibrating itself to distinguish between silence and speech in the current conditions before the user starts speaking. If this is done at some arbitrary time well before the user talks, the calibration isn’t being performed for the environment that exists while the user is actually speaking, which will lead to error-prone recognition.
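
    If it helps to see the phases, the events observer reports them as they happen. A quick sketch, assuming the 1.x OpenEarsEventsObserver delegate callbacks (exact method names can differ between versions):

        // Delegate callbacks fired around engine startup.
        - (void) pocketsphinxDidStartCalibration {
            NSLog(@"Calibrating to the current noise levels..."); // This is the lag you're seeing.
        }

        - (void) pocketsphinxDidCompleteCalibration {
            NSLog(@"Calibration complete.");
        }

        - (void) pocketsphinxDidStartListening {
            NSLog(@"Listening; it's now reasonable to prompt the user to speak.");
        }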

    #1015473
    jbsilb
    Participant

    Hi,

    Thanks for the suggestion. Strictly speaking, though, it isn’t an arbitrary time; it’s usually 15-20 seconds before the first voice input is required (in the car).

    What’s the recommended audio environment for calibration? Only ambient noise?

    We’d like to make sure users have some sort of cue so that they might turn off the radio, etc., if that aids in calibration.

    #1015474
    Halle Winkler
    Politepix

    Yup, for speech recognition the optimal environment is always as quiet as possible, since background noise will either occlude the speech or cause an attempt to recognize it. So if the users are in the car and just using the built-in phone mic, it’s a good suggestion for them to turn off the radio. The important thing about calibration is that it’s done in an environment that matches the speech environment. That means that if the user is going to talk over the radio even after you suggest they not do that, you want the radio on during calibration, because silence in that case means “the user isn’t talking, but there is quieter radio noise running in the background”.

    #1015475
    jbsilb
    Participant

    One other thing we noticed: every time the language model changes, the system starts listening and recognizing. Is that correct?

    #1015476
    Halle Winkler
    Politepix

    Sort of. Calibration, listening, and language model switching are all things that happen when the engine is started, so none of them is responsible for starting it. Switching language models is something you can do while listening is in progress, so the impression that it starts listening comes from the context in which you are preventing entry into the listening loop.

    I think what you’re seeing is that the overall listening method is recursive, so events which return it to the top of the loop will end-run your method of preventing recognition. I think the startup time is just a second or so; are you seeing significantly longer waits to start?
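
    For reference, a model switch during listening looks something like this. This is a sketch assuming the 1.x PocketsphinxController API; newLanguageModelPath and newDictionaryPath stand in for the paths to your own generated files:

        // Swap the active language model without stopping the listening loop;
        // no stopListening/startListening cycle (and no recalibration) occurs.
        [self.pocketsphinxController changeLanguageModelToFile:newLanguageModelPath
                                                withDictionary:newDictionaryPath];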
