OLTP query cost.
OLTP systems are typically expected to handle high concurrency, so schedulers (CPUs) need to stay available for worker threads. The default cost threshold for parallelism of 5 is admittedly very low, but the goal in OLTP is to keep query cost as low as possible, and staying below 5 is a good challenge in complex systems.
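If you do decide to raise that threshold, it is an ordinary sp_configure setting. A minimal sketch, where 50 is only an illustrative value and not a recommendation:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;  -- default is 5
RECONFIGURE;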
When a query's cost is higher than the threshold, I am not totally sure that giving it a degree of parallelism spanning every scheduler on every socket is a good idea. Sure enough, queries above the threshold will start using parallelism, but at the cost of worker threads for other concurrent requests.
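If you want to cap parallelism below "all schedulers", that can be done instance-wide or per query. A rough sketch; the values 4 and 2, and the dbo.Orders table, are just placeholders:

-- Instance-wide cap
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

-- Per-query cap via a hint (hypothetical table and columns)
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = 42
OPTION (MAXDOP 2);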
This was a long way of saying that the real goal should be to reduce query cost.
When a costly query does go parallel, the optimizer splits the data into as many sets as MaxDOP allows. That split is based on estimates and is never 100% accurate, so the smaller datasets usually are not equal. Each runs on its own worker thread, and a worker thread handling a smaller portion of the data does not necessarily run faster. In fact, the optimizer spends precious time deciding how to split the data, and then the same plan branch runs MaxDOP times to pull it. You just hope you have good covering indexes (with INCLUDE columns) to avoid Key Lookups.
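For example, a covering index along these lines (dbo.Orders and its columns are hypothetical) lets the query be answered from the index alone instead of doing a Key Lookup per row:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId)
INCLUDE (OrderDate, TotalAmount);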
After all the data has been collected by the MaxDOP threads, those smaller datasets have to be put back together, which is another resource-consuming step. Moreover, since not all threads retrieve the same row counts, some threads usually finish sooner than others, and the finished threads then wait for the rest to complete before the data can be merged into one dataset.
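You can actually watch that skew while a parallel query is running. A sketch using sys.dm_exec_query_profiles; note it only returns rows when query profiling is enabled for the session (for example SET STATISTICS PROFILE ON, or lightweight profiling on newer versions), and session_id 57 is a placeholder:

SELECT node_id,
       physical_operator_name,
       thread_id,
       row_count,
       estimate_row_count
FROM sys.dm_exec_query_profiles
WHERE session_id = 57        -- placeholder: the session you are watching
ORDER BY node_id, thread_id;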
That is where you see CXPACKET-type waits.
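A quick way to see how much of that waiting has accumulated on the instance (CXCONSUMER only exists on newer builds, so the filter simply returns nothing for it on older ones):

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('CXPACKET', 'CXCONSUMER');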
So why does SQL Server even have parallelism, then?
The best consumers of parallelism are data warehouses. In OLAP systems, data is bulk loaded from other systems into fact tables or, if you follow the Inmon approach, into other semi-normalized structures. Either way, the data is coming in and needs to be loaded into the data warehouse tables.
That is where parallelism comes in handy: it splits the bulk data across MaxDOP threads and populates the destination tables in parallel.
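A sketch of what that can look like on SQL Server 2016 and later, where the TABLOCK hint is what allows the INSERT itself to run in parallel; dbo.FactSales and stg.Sales are hypothetical tables:

INSERT INTO dbo.FactSales WITH (TABLOCK)
       (DateKey, ProductKey, Quantity, Amount)
SELECT DateKey, ProductKey, Quantity, Amount
FROM stg.Sales;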