Got it. This is getting a bit beyond the scope of an ASR tool and into the territory of text analysis. As you pursue a goal like this, many questions arise about what counts as “good handling” of advanced cases, and they multiply across the different requirements of language models versus grammars versus RuleORama grammars, plus behavior that may or may not be equally likely across languages. It is also something that can usually be judged by visually inspecting your own grammar.
In an individual app (as opposed to a framework like OpenEars) you can probably restrict the range of likely inputs much more, which makes this simpler to implement for your specific case. If there is no opportunity to simply review the grammar at creation time and check whether it contains similar-sounding words, you can look at the file LanguageModelGeneratorLookupList.text in the acoustic model for the language you’re using and, for instance, load it into a data structure such as an NSDictionary, so you can evaluate word closeness according to the needs of your application.
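As a rough sketch of that idea in Swift: the code below assumes the lookup list uses tab-separated lines of the form "WORD&lt;TAB&gt;PHONE PHONE ..." (an assumption about the file layout, not something guaranteed here), loads it into a dictionary, and uses Levenshtein edit distance over the phoneme sequences as a crude proxy for how confusable two grammar words are. The sample entries and phoneme strings are made up for illustration.

```swift
import Foundation

// Parse lookup-list text of the assumed form "WORD<TAB>PHONE PHONE ..."
// into a word -> phoneme-array dictionary.
func loadLookupList(_ contents: String) -> [String: [String]] {
    var pronunciations: [String: [String]] = [:]
    for line in contents.split(separator: "\n") {
        let parts = line.split(separator: "\t", maxSplits: 1)
        guard parts.count == 2 else { continue }
        pronunciations[String(parts[0])] =
            parts[1].split(separator: " ").map(String.init)
    }
    return pronunciations
}

// Levenshtein distance between two phoneme sequences: a small distance
// relative to the word length suggests the words may be easy to confuse.
func editDistance(_ a: [String], _ b: [String]) -> Int {
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }
    var prev = Array(0...b.count)
    for i in 1...a.count {
        var curr = [i] + Array(repeating: 0, count: b.count)
        for j in 1...b.count {
            let cost = a[i - 1] == b[j - 1] ? 0 : 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        }
        prev = curr
    }
    return prev[b.count]
}

// Example with made-up entries in the assumed format:
let sample = "GO\tG OW\nNO\tN OW\nSTOP\tS T AA P"
let lookup = loadLookupList(sample)
if let go = lookup["GO"], let no = lookup["NO"] {
    // Distance 1 out of 2 phonemes: likely confusable in a grammar.
    print(editDistance(go, no))
}
```

In a real app you would read the file contents from the acoustic model bundle and pick a distance threshold that suits your grammar size and vocabulary.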