This is a great example of why neither of the simplistic approaches to parallelization ("everything's a future" or "let the programmer decide") will ultimately prevail, and why something akin to run-time optimization (à la HotSpot) will have to be used instead.
Like many of us, I have tried parallelizing various processing-intensive tasks and found that, most of the time, my efforts to chunk and parallelize them just added overhead and led to worse performance. Even when using thread pools to mitigate the cost of thread creation, the cost of context switching, plus the synchronization ultimately needed to assemble the final result of the computation, still dragged overall performance down.
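For anyone who wants to reproduce this kind of experiment, here is a minimal sketch, not my original code: the class name `ChunkedSum`, the summing workload, and the chunk count are all made up for illustration. It runs the same computation once as a plain loop and once split across a fixed thread pool, so the two timings can be compared on your own machine. Numbers from a single un-warmed run like this are only indicative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: compares a plain loop with a chunked, pooled version.
class ChunkedSum {

    // Baseline: a plain single-threaded loop.
    static long sequentialSum(int[] data) {
        long total = 0;
        for (int value : data) {
            total += value;
        }
        return total;
    }

    // Same computation, split into at most `chunks` slices run on a fixed thread pool.
    static long chunkedSum(int[] data, int chunks)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            int chunkSize = (data.length + chunks - 1) / chunks; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, data.length);
                partials.add(pool.submit(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += data[i];
                    }
                    return partial;
                }));
            }
            // Building the final state: each get() blocks until its slice is done.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[10_000_000];
        Arrays.fill(data, 1);

        long t0 = System.nanoTime();
        long sequential = sequentialSum(data);
        long t1 = System.nanoTime();
        long chunked = chunkedSum(data, 4);
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d in %d us%n", sequential, (t1 - t0) / 1_000);
        System.out.printf("chunked:    %d in %d us%n", chunked, (t2 - t1) / 1_000);
    }
}
```

Note that the pool is created inside `chunkedSum`, so its start-up cost is part of the measurement; reusing a long-lived pool removes that cost but not the hand-off and `get()` synchronization.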
In subtler attempts, like piping XSL transformations instead of chaining them, the results were sensitive to the amount of data processed (the more, the better) and to how the stylesheets behaved (one that starts emitting output early performs better as part of a flow). Hence the context itself mattered greatly to the result.
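To make the chaining versus piping distinction concrete, here is a rough sketch using the standard JAXP API. The class and method names (`XslPipeline`, `chained`, `piped`) and the inline stylesheets are made up for illustration. `chained` fully serializes the output of the first stylesheet before the second one starts; `piped` connects two `TransformerHandler`s so that the first stage hands SAX events directly to the second. Whether the second stage really starts working before the first has finished depends on the XSLT processor, so treat this as the wiring only, not as a guarantee of overlap.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: two ways of applying two stylesheets in sequence.
class XslPipeline {

    // Chaining: run stage 1 to completion, then feed its serialized output to stage 2.
    static String chained(String xml, String xsl1, String xsl2) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        StringWriter intermediate = new StringWriter();
        factory.newTransformer(new StreamSource(new StringReader(xsl1)))
               .transform(new StreamSource(new StringReader(xml)), new StreamResult(intermediate));
        StringWriter out = new StringWriter();
        factory.newTransformer(new StreamSource(new StringReader(xsl2)))
               .transform(new StreamSource(new StringReader(intermediate.toString())),
                          new StreamResult(out));
        return out.toString();
    }

    // Piping: stage 1 emits SAX events straight into stage 2, with no intermediate document.
    static String piped(String xml, String xsl1, String xsl2) throws TransformerException {
        // Assumes the configured factory supports SAX handlers (the JDK default does).
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler first =
                factory.newTransformerHandler(new StreamSource(new StringReader(xsl1)));
        TransformerHandler second =
                factory.newTransformerHandler(new StreamSource(new StringReader(xsl2)));
        StringWriter out = new StringWriter();
        first.setResult(new SAXResult(second));
        second.setResult(new StreamResult(out));
        // An identity transformer parses the input and pushes its events into stage 1.
        factory.newTransformer()
               .transform(new StreamSource(new StringReader(xml)), new SAXResult(first));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<a><b>hi</b></a>";
        String xsl1 = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:template match=\"/\"><x><xsl:value-of select=\"/a/b\"/></x></xsl:template>"
                + "</xsl:stylesheet>";
        String xsl2 = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\"><xsl:value-of select=\"/x\"/><xsl:text>!</xsl:text></xsl:template>"
                + "</xsl:stylesheet>";
        System.out.println(chained(xml, xsl1, xsl2));
        System.out.println(piped(xml, xsl1, xsl2));
    }
}
```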
All in all, this led me to the following conclusions as far as parallelization and concurrency are concerned:
- Let us write code that is correct with regard to thread safety,
- Let us write efficient code as if only one thread were available,
- Let us write readable code and avoid "clever" programming.
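As one possible reading of these three guidelines, here is a small sketch; the `WordCount` class and its word-counting task are invented for illustration. The method touches no shared mutable state, so it is thread-safe by construction, and it is written as straightforward single-threaded code. Because it takes a `Stream`, the caller can later hand it a parallel stream without touching the method itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class: plain, side-effect-free code that is easy to parallelize later.
class WordCount {

    // No fields, no shared mutable state: the input is only read,
    // and the result map is built by the collector.
    static Map<String, Long> count(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "that is the question");
        System.out.println(count(lines.stream()));         // single-threaded
        System.out.println(count(lines.parallelStream())); // same method, parallel source
    }
}
```

Whether the parallel call is actually faster is exactly the context-dependent question discussed above; the point is only that the switch is a one-line change at the call site.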
When Larry's vision of automated run-time parallelization becomes reality, such code will certainly fly or, if not, will be easy to refactor so that it does. And if you think this idea of adaptive optimization is far-fetched, read about out-of-order processors and Java HotSpot optimization: today we take these for granted, but a few decades ago they were sci-fi.