Yes, I am very confident. I did it as you said: same track, same segment, same volume. I am afraid the auto-detection you describe does not work.
I agree with you that it is logical to confuse music with speech when they are at the same level. However, after a few failed recognition attempts, it should realise that it is hearing noise and not speech, recalibrate, and snap out of it. Right now it stays in the “detecting speech – detected silence” loop, which makes it practically unusable.
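To illustrate what I mean by recalibrating, here is a rough sketch in Python. It is only my assumption of how such logic could look, not how your app actually works: the class name, the starting threshold, the failure count and the margin are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveSpeechGate:
    """Hypothetical level gate that raises its threshold after repeated false triggers."""

    threshold: float = 0.1   # assumed level above which audio is treated as speech
    max_failures: int = 3    # assumed number of failed attempts before recalibrating
    margin: float = 1.5      # assumed safety factor applied to the observed noise level
    failures: int = 0        # consecutive triggers that produced no recognised speech
    noise_peak: float = 0.0  # loudest level seen during those failed triggers

    def is_speech(self, level: float) -> bool:
        """Return True if the input level is above the current threshold."""
        return level > self.threshold

    def report_result(self, level: float, recognised: bool) -> None:
        """Record whether a triggered segment at `level` yielded recognised speech."""
        if recognised:
            # Real speech was found, so forget earlier failures.
            self.failures = 0
            self.noise_peak = 0.0
            return
        self.failures += 1
        self.noise_peak = max(self.noise_peak, level)
        if self.failures >= self.max_failures:
            # Treat the loudest failed segment as background noise
            # and move the threshold above it.
            self.threshold = self.noise_peak * self.margin
            self.failures = 0
            self.noise_peak = 0.0


gate = AdaptiveSpeechGate()
music_level = 0.5
for _ in range(3):
    # Music keeps triggering the gate, but nothing is ever recognised.
    gate.report_result(music_level, recognised=False)
print(gate.is_speech(music_level))  # the music no longer triggers detection
```

With something like this, the music would trigger detection a few times and then be ignored, while anything clearly louder than the music would still get through.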
Just in case: I am always referring to your latest beta.