In each and every case I'm familiar with, streaming means "send the whole audio thus far to the inference engine, inference it, and send back the transcription"
I have a Flutter library that does the same flow as this (though via ONNX, so I can cover all platforms), and Whisper + Silero is ~identical to the interfaces I used at Google.
If the idea is streaming is when each audio byte is only sent once to the server, there's still an audio buffer accumulated -- its just on the server.