
I've been learning about threads and processes, and I understand that threads allow you to write concurrent code: execution switches very quickly from one thread to another, giving the impression that multiple programs are running in parallel (though not truly in parallel; for that we would need real parallelism across multiple CPU cores).

My machine has a dual-core Intel CPU, and each core is hyper-threaded, so I have 2 physical CPU cores and 4 logical CPU cores in total. I did a little experiment to see if I could simultaneously run 5 (> 4) YouTube videos in my web browser (Chrome), each in a separate window. And it works, but I want to know how, exactly.
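As a side note, the logical core count described above can be checked from Python's standard library (a minimal sketch; `os.cpu_count()` reports logical cores only, and the third-party `psutil` package's `psutil.cpu_count(logical=False)` would be one way to get the physical count):

```python
import os

# os.cpu_count() reports *logical* cores (hyper-threads included).
# On a dual-core hyper-threaded CPU like the one described, this is 4.
logical = os.cpu_count()
print(f"Logical CPU cores: {logical}")
```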

Since I don't have 5 physical CPUs (or even 5 logical cores), they cannot all be running truly in parallel, so how are the videos seemingly running simultaneously? I say "seemingly" because, at least as far as my ears can differentiate, I don't notice short intermittent breaks where each video stops, allowing the other videos to stream (even if I reduce the playback speed of every video to 0.25 times the original speed). This would be the case if we had 5 threads, one for each window, switching back and forth quickly enough to give the impression of simultaneity. (I can, however, totally imagine that the switching is fast enough for my ears to be unable to distinguish between them.)

To summarize, is each window running a separate thread, where the threads are switching quicker than I can differentiate them?

2 Answers


is each window running a separate thread, where the threads are switching quicker than I can differentiate them?

Yep, pretty much this. If you look at Task Manager or a similar tool, you will see thousands of threads. They each get enough done, and switch fast enough, that you don't notice.
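The time-slicing can be seen directly in a toy sketch (all names here are made up for illustration): start more worker threads than you have cores, and the OS scheduler interleaves them so every one of them still finishes.

```python
import threading

progress = {}  # video_id -> iterations completed

def fake_video_worker(video_id, iterations=100_000):
    # Simulates a small slice of per-video work; the OS scheduler
    # interleaves these threads across the available cores.
    count = 0
    for _ in range(iterations):
        count += 1
    progress[video_id] = count

# Start 5 worker threads, like 5 videos on a machine with only 4 logical cores.
threads = [threading.Thread(target=fake_video_worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(progress)  # every "video" completed, despite fewer cores than threads
```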

In addition, some threads will block (sleep) under some conditions, such as waiting for a disk read, a web request, or some other really slow operation. That leaves them essentially suspended, so other threads can use the CPU.
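A quick sketch of why blocking matters (using `time.sleep` as a stand-in for a slow disk or network call): ten threads that each "wait" for half a second all finish in roughly half a second of wall time, not five seconds, because a blocked thread occupies no core.

```python
import threading
import time

def blocking_worker():
    # time.sleep stands in for a blocking disk or network call:
    # the thread is suspended and uses no CPU while it waits.
    time.sleep(0.5)

start = time.perf_counter()
threads = [threading.Thread(target=blocking_worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# ~0.5 s total, not 10 * 0.5 = 5 s: the waits overlap.
print(f"elapsed: {elapsed:.2f} s")
```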

Your test is also very naïve. If you're watching a video in a browser, there are probably dozens, if not hundreds, of threads handling the browser itself, I/O, GIF animations and other page elements, with your GPU and NIC acting as co-processors.

This is an extremely complicated (yet fascinating) topic.

LoztInSpace

While video is "reasonably realtime" (there needs to be a certain throughput per unit of time, and output needs to be well synchronized), it is far from "hard realtime" requirements (where "actually all at once" could matter).

SMP (multicore CPUs, hyper-threading, ...) augments the multitasking capabilities of the OS; it does not replace them. There were multitasking OSes, and video players that worked despite some background activity, back in the era when almost every PC had only a single, single-threaded CPU core.

There are plenty of opportunities to decode the video streams in advance, buffer the results, and then display them just when they are needed (which happens 24 to 60 times a second, depending on the video material). Moving a whole screen's worth of image data to framebuffer memory (the part of the graphics card where images go before being sent to the screen) 15 to 60 times a second would not be an impossible feat for modern (last-15-years modern) computer architectures, though yes, it would still create a lot of load. But it is not even needed: hardware decoding for common video formats has been part of GPUs for a similar timespan (so you are actually getting a kind of asymmetric multiprocessing). All the CPU needs to care about is receiving the (compressed!) undecoded video stream from the network, dressing it up a bit, and handing it over to the GPU.

So, CPU cores have plenty of time in between to handle another video stream.
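The decode-ahead buffering described above is essentially a producer/consumer pattern. Here is a minimal sketch (all names invented; the "frames" are just strings, the display interval is sped up for the demo): a decoder thread fills a bounded buffer ahead of time, while a display thread drains it on a fixed schedule.

```python
import queue
import threading
import time

FRAME_COUNT = 30
buffer = queue.Queue(maxsize=10)  # decoded-frame buffer (decode-ahead)

def decoder():
    # Producer: "decodes" frames ahead of time and parks them in the buffer.
    for i in range(FRAME_COUNT):
        frame = f"frame-{i}"   # stand-in for decoded pixel data
        buffer.put(frame)      # blocks if the buffer is full
    buffer.put(None)           # sentinel: end of stream

displayed = []

def display(fps=120):
    # Consumer: pulls one frame per display interval.
    interval = 1.0 / fps
    while True:
        frame = buffer.get()
        if frame is None:
            break
        displayed.append(frame)
        time.sleep(interval)   # wait for the next "vsync"

t1 = threading.Thread(target=decoder)
t2 = threading.Thread(target=display)
t1.start(); t2.start()
t1.join(); t2.join()

print(len(displayed))  # all 30 frames shown, in order
```

Because the decoder stays well ahead of the display, it spends most of its time blocked on the full buffer, which is exactly the idle time a core can spend on another video stream.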

If you wanted to take this even further (you might find this in dedicated video devices, but probably NOT in general-purpose PCs), you could strive for a so-called "zero copy" or close-to-zero-copy architecture; it requires the hardware components to be very well matched, though. In that case, the network card would be coordinated to DMA-transfer the received undecoded stream to some memory location, and the GPU to pick it up from there, also via DMA... or even DIRECTLY from the network card's buffer.
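Real DMA happens in hardware, but a loose user-space analogy for "zero copy" is Python's `memoryview` (all the names below are invented for illustration): one component reads another component's buffer through a view, with no bytes ever copied.

```python
# A loose analogy for "zero copy": a memoryview lets one component read
# another component's buffer directly, without copying any bytes.
nic_buffer = bytearray(16)       # stand-in for a NIC receive buffer

view = memoryview(nic_buffer)    # no data is copied here
payload = view[4:12]             # still no copy: just a window into the buffer

nic_buffer[4:12] = b"VIDEODAT"   # the "NIC" writes new data in place
print(bytes(payload))            # the "GPU" side sees it with no copy step
```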