I just started working with OpenEars and am very impressed! It was simple to train and to get up and running.
I want to do something similar to what Marco described, and have been digging into the AVFoundation interfaces a little. Would it be possible to subclass AVCaptureDeviceInput, intercept the audio coming from the AVCaptureInput class (see the input property of AVCaptureInputPort), run the STT processing on it, and then pass the stream on?
We could then replace the input passed to the session's addInput: call with an instance of the new subclass: [session addInput:<newSubclass>]