That's certainly something they can do; they just don't do it.
Compare movies on VHS tapes, which all roll ads before actually playing the movie.
- When you fast-forward VHS tapes you still see the adverts. When you seek through a digital video file you don't.
- It's harder to do targeted advertising (since you no longer know whether the downloader is also the viewer).
- Adverts baked into a video file can't be swapped out later for newer adverts after the user has downloaded the file.
- There's no way of having users "click" those adverts, let alone report back to Google that they've engaged with them.
- Some of the points above might also have a knock-on effect on how advertising is charged, which would mean Google would need another pricing model.
...and all of this is before you even get to the technical issue of each video being stored in several different bitrates and formats. You could probably generate a new file on the fly, but that would be extremely computationally expensive, so they'd rack up higher hosting costs as well as lose money on advertising.
Honestly, I can't blame Google for not supporting a download option.
The adverts are separate files from the videos, so you need to either merge them ahead of time or splice them in dynamically in real time.
Ahead of time: Now you need not only a video asset for each bitrate and video format, but that number multiplied by the number of adverts running on any given day. That could mean thousands of files per video, maybe even hundreds of thousands once you take targeted adverts into account. And they'd need to rebuild their entire catalogue whenever an ad campaign ends. Clearly that isn't going to work long term.
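To make the multiplication concrete, here is a back-of-the-envelope sketch. Every number in it (renditions, codecs, ads in rotation, audience segments) is a made-up assumption for illustration, not a real YouTube figure:

```python
# Hypothetical figures, chosen only to illustrate the multiplication.
RENDITIONS = 6            # e.g. 144p..1080p (assumed)
CODECS = 2                # e.g. H.264 and VP9 (assumed)
ADS_IN_ROTATION = 100     # ads running on a given day (assumed)
AUDIENCE_SEGMENTS = 100   # targeting buckets per ad (assumed)


def baked_variants(renditions, codecs, ads, segments=1):
    """Number of distinct files needed for ONE video if every ad (and every
    targeting bucket) is merged into every rendition ahead of time."""
    return renditions * codecs * ads * segments


untargeted = baked_variants(RENDITIONS, CODECS, ADS_IN_ROTATION)
targeted = baked_variants(RENDITIONS, CODECS, ADS_IN_ROTATION, AUDIENCE_SEGMENTS)
print(untargeted)  # 1200
print(targeted)    # 120000
```

And that is per video, so the whole set has to be regenerated for every video in the catalogue whenever the ad rotation changes.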
Real time: Option 2 is to stream the advert and follow it immediately with the video content on the same HLS stream. This is much more achievable than option 1, but you are then running the streams like a live TV service, dynamically splicing content into existing streams. It isn't difficult to do per stream, but it is extra processing compared with the existing setup. The real problem is scale. A TV broadcaster might do this for a dozen to a few hundred channels, each with advertising contracts worth anywhere from thousands to hundreds of thousands. YouTube has millions of streams, each of which earns pennies from advertising (and would earn even less for downloaded videos, because of the change in advertising model I described before). So they'd have to pay more and earn less. The financials simply wouldn't stack up.
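A minimal sketch of what "splicing on the same HLS stream" can look like in its simplest form: a media playlist generated per request that lists the ad's segments, then an `#EXT-X-DISCONTINUITY` tag (a real HLS tag telling the player that timestamps and encoding parameters may change), then the content's segments. The segment URIs and durations are hypothetical, and a production ad-insertion service does far more than this (ad decisioning, tracking, live windows). The scale problem described above is that something like this has to run for every viewer's stream:

```python
import math


def spliced_media_playlist(ad_segments, content_segments):
    """Build an HLS media playlist that plays the ad segments, then the
    content segments. Each segment is a (uri, duration_seconds) tuple."""
    all_segments = list(ad_segments) + list(content_segments)
    # TARGETDURATION must be an integer no smaller than any segment duration.
    target = math.ceil(max(dur for _, dur in all_segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows fractional EXTINF durations
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for uri, dur in ad_segments:
        lines.append(f"#EXTINF:{dur:.3f},")
        lines.append(uri)
    if ad_segments:
        # Ad and content were encoded separately, so timestamps restart here.
        lines.append("#EXT-X-DISCONTINUITY")
    for uri, dur in content_segments:
        lines.append(f"#EXTINF:{dur:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


# Hypothetical segment URIs for one viewer's request.
print(spliced_media_playlist(
    [("ads/campaign42/seg0.ts", 6.0), ("ads/campaign42/seg1.ts", 4.0)],
    [("video/abc123/720p/seg0.ts", 6.0), ("video/abc123/720p/seg1.ts", 5.5)],
))
```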
Before you comment that YouTube and Twitch offer live streaming services built around the same advertising models: yes, they do, but they also run on the same stack as pre-recorded videos, because those streams don't need to be spliced by YouTube or Twitch. Any editing is done by the content creators before the stream hits YouTube, and YouTube can show adverts before or even between videos by injecting them in the browser (i.e. the splicing is managed on the front end rather than on Google's servers). None of this is doable with a "download video" button.
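To illustrate what "splicing on the front end" means, here is a hypothetical player-side helper (not YouTube's actual player logic). The video file never contains the ads; the player keeps a list of ad cue points and, whenever playback advances or the user seeks forward, decides which breaks it must fetch and play first. This is also why seeking can't skip ads the way it skips ads baked into a file:

```python
from dataclasses import dataclass, field


@dataclass
class AdBreakScheduler:
    """Hypothetical player-side helper that decides which ad breaks to play
    when playback moves from one position to another."""
    cue_points: list[float]                      # ad break positions, seconds
    played: set[float] = field(default_factory=set)

    def breaks_to_play(self, start: float, end: float) -> list[float]:
        """Return unplayed cue points in [start, end]. This covers normal
        playback ticks as well as a forward seek that jumps past a break."""
        due = [c for c in sorted(self.cue_points)
               if start <= c <= end and c not in self.played]
        self.played.update(due)
        return due


scheduler = AdBreakScheduler([0.0, 300.0, 600.0])
print(scheduler.breaks_to_play(0.0, 0.5))    # pre-roll: [0.0]
print(scheduler.breaks_to_play(0.5, 650.0))  # seek past two breaks: [300.0, 600.0]
```

This sketch plays every break the user skipped over; a real player might choose to play only the most recent one.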
But I grant it's technically possible to do with a single MP4 (MPEG-4) file, just not trivial. You would need to prepend the advertisement to the video bitstream, re-encoded with the same parameters (width, height, frame rate, intra-picture interval and so on), then do the same for the audio, while keeping video/audio synchronization intact (MPEG-4 edit lists are not well supported by players, ffmpeg included), and preferably without re-encoding the actual video. Inserting the ad in the middle of the file is a bit more complicated.
It's possible to do, but I don't think it's something that any off-the-shelf tools actually do.
Otherwise, this is the same problem print newspapers have: how many readers per copy, seeing which adverts, how many times? This can be modelled or assessed fairly readily (e.g. with reader response codes or instrumented URLs).
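As a rough sketch of the instrumented-URL idea: each ad placement gets its own link (or short code shown in the ad), so visits can be attributed back to it even when the ad was watched offline. The `utm_source`/`utm_campaign` parameters follow the common UTM convention; the `ref` parameter, the helper name and the example URL are hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def instrumented_url(base_url, campaign, placement, code):
    """Append tracking parameters to an advertiser's landing page so visits
    can be attributed to one ad placement. Assumes base_url has no #fragment."""
    params = {"utm_source": placement, "utm_campaign": campaign, "ref": code}
    separator = "&" if urlsplit(base_url).query else "?"
    return base_url + separator + urlencode(params)


url = instrumented_url("https://example.com/offer", "spring-sale", "video-1234", "R42")
print(url)
# The advertiser's server can later read the code back out of incoming requests:
print(parse_qs(urlsplit(url).query)["ref"])  # ['R42']
```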