The Stream API was designed to make it easy to write computations in a way that was abstracted away from how they would be executed, making it easy to switch between sequential and parallel. However, just because it's easy doesn't mean it's always a good idea, and in fact, it is a mistake to drop .parallel() all over the place simply because you can.
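To make the "easy to switch" point concrete, here is a minimal sketch (class and method names are mine, for illustration) where the sequential and parallel pipelines differ only by the .parallel() call:

```java
import java.util.stream.IntStream;

public class SumExample {
    // Sequential pipeline: sums 1..n on the calling thread.
    static long sumSequential(int n) {
        return IntStream.rangeClosed(1, n).asLongStream().sum();
    }

    // Identical pipeline, but marked parallel; the runtime may split the
    // range across worker threads. Same answer, different execution strategy.
    static long sumParallel(int n) {
        return IntStream.rangeClosed(1, n).asLongStream().parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumSequential(1000)); // 500500
        System.out.println(sumParallel(1000));   // 500500 -- same result
    }
}
```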
First of all, note that parallelism offers no benefit other than the possibility of faster execution when multiple cores are available. A parallel execution will always involve more work than a sequential one, because in addition to solving the problem, it must also perform dispatching and coordination of the sub-tasks. The hope is that you will get to the answer faster by farming the work out to multiple processors; whether this actually happens depends on many things, including the size of the data set, how much computation you are doing per element, the nature of the computation (in particular, does the processing of one element interact with the processing of others?), the number of processors available, and the number of other tasks competing for those processors.
Also, note that parallelism often exposes nondeterminism in a computation that sequential implementations conceal; sometimes this doesn't matter, or it can be mitigated by constraining the operations involved (e.g., reduction operators must be stateless and associative).
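The associativity requirement is easy to see with a small sketch (names are mine): summing with `Integer::sum` is associative and stateless, so a parallel reduce always gives the same answer; subtraction is not associative, so a parallel reduce over it is unspecified, while the sequential result is still well defined.

```java
import java.util.List;

public class AssocExample {
    static final List<Integer> NUMS = List.of(1, 2, 3, 4, 5, 6, 7, 8);

    // Associative, stateless reduction: safe to run in parallel,
    // same answer on every run regardless of how the input is split.
    static int parallelSum() {
        return NUMS.parallelStream().reduce(0, Integer::sum);
    }

    // Subtraction is NOT associative: (a - b) - c != a - (b - c).
    // A parallel reduce with it may split the input at arbitrary points
    // and produce a different answer each run; sequentially it is
    // simply the left fold ((((0-1)-2)-3)... = -36.
    static int sequentialDifference() {
        return NUMS.stream().reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        System.out.println(parallelSum());          // 36
        System.out.println(sequentialDifference()); // -36 (sequential only)
    }
}
```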
In reality, sometimes parallelism will speed up your computation, sometimes it won't, and sometimes it will even slow it down. It is best to develop first using sequential execution and then apply parallelism where (A) you know there is an actual benefit to increased performance and (B) the increased performance will actually materialize. (A) is a business problem, not a technical one. If you are a performance expert, you will usually be able to look at the code and judge (B), but the smart path is to measure. (And don't even bother until you're convinced of (A); if the code is already fast enough, you have better uses for your brain cycles.)
The simplest performance model for parallelism is the "NQ" model, where N is the number of elements and Q is the computation per element. In general, the NQ product needs to cross some threshold before you start gaining a performance advantage. For a low-Q problem such as "adding the numbers from 1 to N", you will generally see break-even somewhere between N = 1000 and N = 10000. With higher-Q problems, you will see break-even at lower thresholds.
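One way to get a feel for the NQ trade-off is to time the same pipeline both ways while varying per-element cost. Below is a rough sketch, not a rigorous benchmark (for that, use a proper harness); the `heavy()` function is a hypothetical stand-in for high-Q per-element work, and the timings it prints will vary by machine and core count:

```java
import java.util.stream.LongStream;

public class NQSketch {
    // Hypothetical high-Q work per element: an iterated mixing step,
    // purely illustrative (stateless, so safe in parallel).
    static long heavy(long x) {
        long h = x;
        for (int i = 0; i < 10_000; i++) {
            h = h * 6364136223846793005L + 1442695040888963407L;
        }
        return h;
    }

    static long run(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (parallel) {
            s = s.parallel();
        }
        return s.map(NQSketch::heavy).sum();
    }

    public static void main(String[] args) {
        long n = 100_000;
        long t0 = System.nanoTime();
        long a = run(n, false);
        long t1 = System.nanoTime();
        long b = run(n, true);
        long t2 = System.nanoTime();
        System.out.printf("sequential: %d ms, parallel: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        // The answer is identical either way; only the cost differs,
        // and which is cheaper depends on N, Q, and available cores.
        System.out.println(a == b);
    }
}
```

Note that naive wall-clock timing like this is noisy (JIT warm-up, GC); it is only meant to show the shape of the experiment, not to produce publishable numbers.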
But reality is more complicated than any simple model. So until you have gained some experience, first identify when sequential processing is actually costing you something, and only then consider whether parallelism will help.