I don’t know how YouTube does it, but decoding a video, say with libavcodec (FFmpeg), without GPU acceleration is pretty demanding. They could do it on their servers and send you the stream, but then again they save a lot of money by not doing that.
But I agree it shouldn’t use so many resources when nothing is happening; the web has become very bloated.
True, but in the end it’s FFmpeg doing the work for both (at least on a Linux system).