Retrieve ASCII phonemes and bypass local vocab matching


Viewing 5 posts - 1 through 5 (of 5 total)

  • #1021041
    box
    Participant

    The big picture and why I want to accomplish this…
    My app doesn’t use a vocabulary the way you normally would for recognition. I have an SQL database with thousands (potentially millions) of sentences and words that are all UGC (user-generated content), constantly being added, changed, and deleted. I need pocketsphinx on the user’s mobile device to take the ps_decoder_t lattice data it builds and convert it to ASCII phonemes that I can send to my server to search for a match in the database.
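
    To illustrate the server side of this idea, here is a minimal sketch in Python. The phoneme format (space-separated ARPAbet symbols) and the matching-by-edit-distance strategy are my assumptions for illustration, not anything OpenEars or pocketsphinx provides:

```python
# Sketch of server-side phoneme matching. The phoneme format
# (space-separated ARPAbet tokens, e.g. "HH AH L OW") is an assumption.

def edit_distance(a, b):
    """Levenshtein distance computed over phoneme tokens, not characters."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (pa != pb)))    # substitution
        prev = cur
    return prev[-1]

def best_match(heard, pronunciations):
    """Return the stored pronunciation closest to what was heard."""
    heard_toks = heard.split()
    return min(pronunciations,
               key=lambda p: edit_distance(heard_toks, p.split()))

# Phonemes decoded on the device vs. stored UGC pronunciations.
stored = ["HH AH L OW", "G UH D B AY", "HH AW AA R Y UW"]
print(best_match("HH AH L OW W", stored))  # closest entry: "HH AH L OW"
```

    Token-level (rather than character-level) distance matters here because a single misheard phoneme should count as one error, however it is spelled.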

    I’ve dug pretty deep into the pocketsphinx and OpenEars code to try to find a way to convert the phonemes and pass them to OpenEars. I think it’s over my head… After compiling the modified pocketsphinx, I’d have an OpenEars delegate observer be notified like this on iOS:

    - (void)pocketsphinxDidCreatePhonemes:(NSString *)phonemes

    Any help or advice to make this happen will be GREATLY appreciated. I’m a decent developer but I don’t know as much about speech recognition as I should, so please be gentle.

    Also, here are some pointers so you don’t have to mention them:
    – I know phonemes aren’t directly exposed by pocketsphinx as-is. I’ve worked with them in Sphinx4, though.
    – I know OpenEars doesn’t do this now and won’t in the future.
    – Besides pocketsphinx, I’m currently using paid online speech services such as NDEV, AT&T, and iSpeech. They work fine (especially NDEV Nuance Mobile), but they all take far too long to return a recognition, because they all require sending recorded audio files over the internet. Then, once the result finally arrives, I send it to my database to be matched. It takes way too long. Plus they’re expensive, proprietary, and not open source.

    Thanks!!
    -Joseph

    #1021046
    Halle Winkler
    Politepix

    Welcome,

    I honestly don’t think this is possible as a simple change to Pocketsphinx. This isn’t an advertised feature of Rejecto, and as such it could encounter implementation changes later; however, you could try doing this with the current version of Rejecto. The further problem, though, is that phoneme-based recognition is inaccurate.

    #1021062
    box
    Participant

    Thanks for your response!

    Two questions…

    1) Do you think it would be possible to convert Pocketsphinx’s current decoder output into something that could be sent to my server/database? My problem is that I don’t fully understand the data that Pocketsphinx uses to find a hypothesis.

    2) I do know that phoneme recognition is inaccurate, but I plan to make up for it on the database query side. Each record in the DB has multiple phoneme-based pronunciations, and MySQL has very fast searching capabilities. Would it be possible for you to show me or explain how to do this with Rejecto?
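
    To sketch what I mean by multiple pronunciations per record: here is an illustration in Python using SQLite as a stand-in for MySQL. The schema, table names, and phoneme format are hypothetical, just to show the indexed exact-match idea:

```python
# Each sentence stores several phoneme-based pronunciation variants;
# an indexed exact match on any variant finds the record. SQLite is a
# stand-in for MySQL here, and the schema is hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sentences (id INTEGER, text TEXT)")
db.execute("CREATE TABLE pronunciations (sentence_id INTEGER, phonemes TEXT)")
db.execute("CREATE INDEX idx_ph ON pronunciations (phonemes)")

# One sentence, two plausible pronunciation variants.
db.execute("INSERT INTO sentences VALUES (1, 'hello')")
db.executemany("INSERT INTO pronunciations VALUES (?, ?)",
               [(1, "HH AH L OW"), (1, "HH EH L OW")])

def lookup(heard):
    """Exact indexed match against any stored pronunciation variant."""
    row = db.execute("""SELECT s.text FROM sentences s
                        JOIN pronunciations p ON p.sentence_id = s.id
                        WHERE p.phonemes = ?""", (heard,)).fetchone()
    return row[0] if row else None

print(lookup("HH EH L OW"))  # matches via the second variant: hello
```

    An exact indexed lookup stays fast at millions of rows; fuzzy matching would only be needed as a fallback when no variant matches exactly.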

    Thanks!
    -Joe

    #1021063
    box
    Participant

    Are you referring to using Rejecto with this method? I know it was done using Sphinx3, but I’m open to trying that over rewriting some Pocketsphinx code…

    #1021081
    Halle Winkler
    Politepix

    1) Do you think it would be possible to convert Pocketsphinx’s current decoder output into something that could be sent to my server/database? My problem is that I don’t fully understand the data that Pocketsphinx uses to find a hypothesis.

    You can use SaveThatWave to save utterances as WAV files and send them to a cloud decoder if you like. It should also be possible to turn off local recognition by setting self.pocketsphinxController.processSpeechLocally = FALSE (I’m slightly hesitant on that one because I just noticed that I left it out of my testbed so I’m not 100% on whether it’s currently working, but I would expect it to).
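
    For the server half of that workflow, a minimal sketch in Python of packaging a saved WAV for upload (the endpoint URL is hypothetical; SaveThatWave would produce the file on the device side):

```python
# Sketch of shipping a saved utterance WAV to a server-side decoder.
# The endpoint URL is a placeholder, not a real service.
import urllib.request

def build_request(wav_path, url="http://example.com/decode"):
    """Package a saved WAV file as an HTTP POST request."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    return urllib.request.Request(
        url, data=audio, headers={"Content-Type": "audio/wav"})

def send(req):
    """Fire the request; the response body would hold the hypothesis."""
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```
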

    2) I do know that phoneme recognition is inaccurate, but I plan to make up for it on the database query side. Each record in the DB has multiple phoneme-based pronunciations, and MySQL has very fast searching capabilities. Would it be possible for you to show me or explain how to do this with Rejecto?

    I don’t think the underlying issue with phoneme-based recognition is that there isn’t enough data about the phonemes in words, but that there isn’t any context anymore. Still, I think it’s reasonable to experiment and find out how it works in your own implementation. You can receive the actual Rejecto phonemes that were recognized if you set Rejecto’s setter method - (void)deliverRejectedSpeechInHypotheses:(BOOL)trueorfalse; to TRUE when initially setting up the LanguageModelGenerator.
