I get "[01:17.000 --> 01:32.000] Translated by Releska" when using the translate-to-English task. That entire part of the song is instrumental. This line does not appear at all in the original transcription, and it only shows up with the Opus-format rip.
It shows up in the YouTube rip in format 251 (Opus), but not in format 140 (AAC from YouTube) or in the FLAC rip. All three give different results.
The translation output seems to be tied to bitrate and format. The same song gets converted to different words, with the only difference being the bitrate and format. However, converting my own rip with the same parameters as YouTube (Opus @140 and then @130) didn't reproduce this error.
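A minimal sketch of how the per-format outputs could be compared, to spot lines that show up in only one rip. The helper names `translate_segments` and `unique_lines` and the file names in the usage comment are hypothetical; the `whisper.load_model` and `model.transcribe(..., task="translate")` calls assume the openai-whisper Python package is installed.

```python
from typing import Dict, List, Tuple

# (start seconds, end seconds, text)
Segment = Tuple[float, float, str]


def translate_segments(path: str, model_name: str = "medium") -> List[Segment]:
    """Run Whisper's translate task on one file and return its segments."""
    import whisper  # imported lazily so unique_lines works without Whisper installed

    model = whisper.load_model(model_name)
    result = model.transcribe(path, task="translate")
    return [(s["start"], s["end"], s["text"].strip()) for s in result["segments"]]


def unique_lines(runs: Dict[str, List[Segment]]) -> Dict[str, List[Segment]]:
    """For each run, keep the segments whose text appears in no other run."""
    unique: Dict[str, List[Segment]] = {}
    for name, segments in runs.items():
        # Collect the (case-insensitive) texts produced by every other run.
        others = {
            text.casefold()
            for other_name, other_segments in runs.items()
            if other_name != name
            for _, _, text in other_segments
        }
        unique[name] = [seg for seg in segments if seg[2].casefold() not in others]
    return unique


# Hypothetical usage (file names are placeholders, not the actual rips):
# runs = {name: translate_segments(path) for name, path in
#         {"opus": "song.opus", "aac": "song.m4a", "flac": "song.flac"}.items()}
# for name, segments in unique_lines(runs).items():
#     print(name, segments)
```

Because the three rips already produce differently worded translations, exact text matching will flag a lot of ordinary lines too. It is only meant to narrow down where to look for things like the credit line.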
The model hung for a solid extra minute at the end when translating to English: the last 90 or so seconds of the song took 60 seconds of real time, while the entire rest took about 90. I did not observe the same behavior with the transcribe task.
Some of the English words are incorrect, but that was expected. The first Japanese "mistake" I found was "全ては二人の" instead of "すべては ふたりの", the former being what Whisper wrote. A single random word, "hey", was transcribed/translated to English even though it is just the singer elongating the 園 while singing 楽園: "落ちてゆく 二人で繋がれた二人のラグ HEY" instead of "落ちていく 鎖でつながれた 二人の楽園".
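A rough way to put a number on how far a Whisper line is from the official subtitle line, as a sketch only. The helper names `normalize` and `similarity` are made up for illustration. Note that this compares characters after removing whitespace, so a kanji spelling versus a kana spelling of the same word (全て vs すべて) still counts as a difference; treating those as equal would need a reading converter, which is not attempted here.

```python
import difflib
import unicodedata


def normalize(line: str) -> str:
    """Apply NFKC normalization, casefold, and drop all whitespace."""
    folded = unicodedata.normalize("NFKC", line).casefold()
    return "".join(ch for ch in folded if not ch.isspace())


def similarity(hypothesis: str, reference: str) -> float:
    """Character-level similarity in [0, 1] between two lyric lines."""
    matcher = difflib.SequenceMatcher(None, normalize(hypothesis), normalize(reference))
    return matcher.ratio()


whisper_line = "落ちてゆく 二人で繋がれた二人のラグ HEY"
official_line = "落ちていく 鎖でつながれた 二人の楽園"
print(f"similarity: {similarity(whisper_line, official_line):.2f}")
```

A low score on a line is only a hint to go and look at it; it does not tell apart a harmless spelling difference from a real mishearing.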
For reference, I am comparing against the official subtitles released with the YouTube video.
It's a complex song with both Japanese and English lyrics. The original transcription took about 20 seconds of real time to output the first line and 130 seconds for the whole song. It seems to show results in 20-second window increments, but this seems to depend on what it considers audio and what it throws away.
On my computer I wasn't able to use the large model because I ran out of VRAM. I have 8 GB and am not sure how much more it would require, so I ran it with medium.
The song is False Sympathy by Mondo Grosso. The MV is suggestive, in case that matters. I grabbed a fresh audio rip from YouTube because I didn't want to take the CD out of its case.
https://www.youtube.com/watch?v=B6Y-WsgpzlQ
It translates this version differently from the director's cut version. I ripped both as Opus.
There is something weird about how it handles the Opus-encoded version, as I get the same "Translated by Releska" in a WAV version transcoded from the Opus file.
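For anyone who wants to try the same WAV check, here is a sketch of the transcode step. The exact ffmpeg parameters used for the WAV above are not stated in this post, so the 16 kHz mono PCM settings are an assumption, picked because Whisper resamples its input to 16 kHz mono anyway. The helper names and file names are placeholders, and `ffmpeg` must be on the PATH.

```python
import subprocess
from typing import List


def wav_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that decodes src to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                # overwrite dst if it already exists
        "-i", src,           # input file, e.g. the Opus rip
        "-ac", "1",          # downmix to mono (assumed setting)
        "-ar", "16000",      # resample to 16 kHz (assumed setting)
        "-c:a", "pcm_s16le", # uncompressed 16-bit PCM
        dst,
    ]


def transcode(src: str, dst: str) -> None:
    """Run the ffmpeg command and raise if ffmpeg exits with an error."""
    subprocess.run(wav_command(src, dst), check=True)


# Hypothetical usage (placeholder file names):
# transcode("song.opus", "song_from_opus.wav")
```

If the credit line still shows up in the WAV, that points at the decoded audio content of the Opus rip rather than at the container or codec handling.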