Is there any functionality to measure syllable nuclei?
No, but this wouldn’t tell you how many words the user had said, just how many syllables. You can currently get a middling-good attempt at phoneme transcription out of Rejecto, but again, knowing parts of words doesn’t tell you anything about how many words were said unless you already know the words. In that case you should just create a language model or grammar using those words and then count utterances of them directly.
Can you simply do a speech-to-text and then just do a total count on all of the words?
Yes, I think I would do it this way using a custom language model, but it would require knowing the needed vocabulary in advance.
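
As a rough illustration of the counting step only, here is a minimal sketch in Python. It assumes the recognizer has already handed back its hypothesis as a plain string of space-separated words; the function name `count_words` and the sample vocabulary are made up for the example and are not part of OpenEars or Rejecto.

```python
from collections import Counter

def count_words(hypothesis, vocabulary):
    """Count the words in a recognized hypothesis that belong to a known vocabulary.

    `hypothesis` is assumed to be a space-separated string of recognized words;
    `vocabulary` is the word list the language model was built from.
    """
    # Normalize case so "go" and "GO" are treated as the same word.
    vocab = {word.upper() for word in vocabulary}
    tokens = hypothesis.upper().split()
    # Ignore any token that is not in the known vocabulary.
    per_word = Counter(token for token in tokens if token in vocab)
    return sum(per_word.values()), per_word

# Hypothetical hypothesis string and vocabulary, for illustration only.
total, per_word = count_words("GO LEFT GO RIGHT STOP", ["go", "left", "right", "stop"])
print(total)
print(per_word["GO"])
```

Anything the recognizer returns that is outside the vocabulary is skipped, so the total only reflects words the language model was built to recognize.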