It sounds to me like OpenEars or RapidEars might work well for your needs. You can try them both and see whether either one is a fit for you. Both should be pretty fast since they process the speech right on the device, but RapidEars is the faster of the two since it can perform recognition while the speech is still in progress.
OpenEars is free to use in your App Store app, while RapidEars is not free. You can read more about RapidEars’ pricing and other info at its page: https://www.politepix.com/rapidears or the shop page: https://www.politepix.com/shop
The main OpenEars page is here: https://www.politepix.com/openears
Both OpenEars and RapidEars now support English and Spanish. You can dynamically create new language models (vocabularies) in English or in Spanish whenever you want using OpenEars’ LanguageModelGenerator class. There is no way to train recognition to a particular user’s voice, but you can always change the vocabulary programmatically.
I hope this is helpful,