Enable multi-threaded assembly
This MR implements multi-threaded assembly of residuals and Jacobians.
- Depending on how much work is done in the assembly compared to the linear solver, these changes may give a considerable speed-up.
- Multi-threaded or sequential assembly can be chosen at run-time.
- The synchronization mechanism (entity locks or coloring) can also be chosen at run-time.
- Ideally, METIS is installed; otherwise we fall back to a very simple partitioning of the grid. The number of patches in the partition is also a new run-time parameter.
- This needs a quasi-experimental branch from pdelab and the dune-assembler port.
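
To illustrate the two synchronization options listed above, here is a minimal standalone sketch on a toy 1D mesh. It is not the PDELab or dune-assembler API: `Patch`, `simple_partition`, `assemble_patch`, `assemble_colored`, `assemble_locked` and `assemble_sequential` are hypothetical names, the local contribution is made up, and spawning one `std::thread` per patch is a simplification (a real implementation would more likely use a thread pool). With coloring, patches of the same color share no nodes and can write to the residual without locks; with entity locks, all patches run concurrently and each node is guarded by its own mutex.

```cpp
#include <algorithm>
#include <cstddef>
#include <initializer_list>
#include <mutex>
#include <thread>
#include <vector>

// Toy 1D mesh (assumption): element e touches nodes e and e+1.
struct Patch { std::size_t begin, end; };  // half-open range of elements

// Very simple partition into contiguous blocks; assumes n_patches >= 1.
std::vector<Patch> simple_partition(std::size_t n_elements, std::size_t n_patches) {
  std::vector<Patch> patches;
  const std::size_t chunk = (n_elements + n_patches - 1) / n_patches;
  for (std::size_t b = 0; b < n_elements; b += chunk)
    patches.push_back({b, std::min(b + chunk, n_elements)});
  return patches;
}

// Unsynchronized accumulation of a made-up local contribution.
void assemble_patch(const Patch& p, std::vector<double>& residual) {
  for (std::size_t e = p.begin; e < p.end; ++e) {
    residual[e]     += 0.5 * (e + 1);
    residual[e + 1] += 0.5 * (e + 1);
  }
}

// Sequential reference: one patch covering the whole mesh.
std::vector<double> assemble_sequential(std::size_t n_elements) {
  std::vector<double> residual(n_elements + 1, 0.0);
  assemble_patch({0, n_elements}, residual);
  return residual;
}

// Coloring: in 1D, neighboring patches share one node, so even and odd
// patches form two colors. Patches of one color run concurrently without
// locks; the join acts as a barrier between colors.
std::vector<double> assemble_colored(std::size_t n_elements, std::size_t n_patches) {
  std::vector<double> residual(n_elements + 1, 0.0);
  const auto patches = simple_partition(n_elements, n_patches);
  for (std::size_t color = 0; color < 2; ++color) {
    std::vector<std::thread> workers;
    for (std::size_t k = color; k < patches.size(); k += 2)
      workers.emplace_back([&patches, &residual, k] { assemble_patch(patches[k], residual); });
    for (auto& w : workers) w.join();
  }
  return residual;
}

// Entity locks: all patches run concurrently; every write to a node is
// protected by that node's mutex.
std::vector<double> assemble_locked(std::size_t n_elements, std::size_t n_patches) {
  std::vector<double> residual(n_elements + 1, 0.0);
  std::vector<std::mutex> node_locks(n_elements + 1);
  std::vector<std::thread> workers;
  for (const Patch& p : simple_partition(n_elements, n_patches))
    workers.emplace_back([&residual, &node_locks, p] {
      for (std::size_t e = p.begin; e < p.end; ++e)
        for (std::size_t node : {e, e + 1}) {
          std::lock_guard<std::mutex> guard(node_locks[node]);
          residual[node] += 0.5 * (e + 1);
        }
    });
  for (auto& w : workers) w.join();
  return residual;
}
```

Calling `assemble_colored(1000, 8)` or `assemble_locked(1000, 8)` should give the same residual as `assemble_sequential(1000)`. The trade-off in this sketch is that coloring avoids lock overhead but adds a barrier per color, while entity locks keep all threads busy at the cost of one lock acquisition per write.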