While working on an MP3 decoder/encoder system that can decode and encode MP3 data in parallel, I've had the chance to test it with a plethora of worker threads (my main server has 64 CPUs).
Smaller Files
Along the way, I've had many surprises.
First of all, some jobs do not make use of much more than 8 CPUs, even though I have 64. That is, whether I run with 64 CPUs or 8, the task finishes in about the same amount of time. However, when using just 8 CPUs, I can run the command 8 times in parallel and therefore process 8 files ...
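To make the trade-off concrete, here is a minimal Python sketch of that scheduling idea: cap each job at 8 threads and run as many jobs side by side as the CPU count allows. The tool name `mp3-tool` and its `--threads` flag are hypothetical placeholders, not the actual command, and the constants simply restate the numbers above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

TOTAL_CPUS = 64        # from the text: the server has 64 CPUs
THREADS_PER_JOB = 8    # from the text: one job stops scaling at around 8 CPUs


def concurrent_jobs(total_cpus, threads_per_job):
    """Number of jobs that fit side by side without oversubscribing the CPUs."""
    return max(1, total_cpus // threads_per_job)


def build_command(path, threads):
    # Hypothetical tool name and flag, used for illustration only.
    return ["mp3-tool", "--threads", str(threads), path]


def run_all(paths, total_cpus=TOTAL_CPUS, threads_per_job=THREADS_PER_JOB,
            runner=subprocess.run):
    """Run one command per file, with a bounded number of jobs in flight."""
    slots = concurrent_jobs(total_cpus, threads_per_job)
    # The pool threads only wait on child processes, so they cost almost no CPU.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # pool.map returns results in the same order as the input paths.
        return list(pool.map(
            lambda p: runner(build_command(p, threads_per_job)), paths))
```

The `runner` parameter exists only so the scheduling logic can be tried without the real tool, for example `run_all(files, runner=print)` to see which commands would be launched.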