Simplify implementation of local Laplace assembler
While the previous implementation tries hard to optimize several
aspects (e.g. exploiting symmetry), I was surprised that it performs
significantly worse than the simple, non-optimized assembler from the
dune-functions Poisson example.
In my tests it turned out that removing some optimizations
(e.g. not exploiting symmetry) improves performance by 15%-28%
(15% for PQ2 on a 2d mixed UGGrid and 28% for Q2 on a 3d cube UGGrid).
The test I used is the benchmark example from
https://git.imp.fu-berlin.de/agnumpde/dune-fufem-forms.
BTW: Using integrate(dot(grad(u),grad(v))) from the latter still
performs a bit better (14%) in 2d and significantly better (40%) in 3d.
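
For illustration, the two loop structures compared above can be sketched
as follows. This is a minimal standalone sketch, not the actual assembler
code: the names Gradient, LocalMatrix, addFull and addSymmetric are
hypothetical, the gradients are assumed to be already transformed to the
element, and weight is assumed to hold the quadrature weight times the
integration element.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only (not dune-fufem types).
using Gradient = std::array<double, 2>;
using LocalMatrix = std::vector<std::vector<double>>;

inline double dot(const Gradient& a, const Gradient& b)
{
  return a[0]*b[0] + a[1]*b[1];
}

// Simple variant: visit every (i,j) pair, one write per entry.
inline void addFull(LocalMatrix& A, const std::vector<Gradient>& grad, double weight)
{
  const std::size_t n = grad.size();
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      A[i][j] += weight * dot(grad[i], grad[j]);
}

// Symmetry-exploiting variant: compute each off-diagonal product once
// and write it to both (i,j) and (j,i).
inline void addSymmetric(LocalMatrix& A, const std::vector<Gradient>& grad, double weight)
{
  const std::size_t n = grad.size();
  for (std::size_t i = 0; i < n; ++i)
  {
    for (std::size_t j = 0; j < i; ++j)
    {
      const double z = weight * dot(grad[i], grad[j]);
      A[i][j] += z;
      A[j][i] += z;
    }
    A[i][i] += weight * dot(grad[i], grad[i]);
  }
}
```

Both functions add the same contribution for one quadrature point; they
differ only in how many products are evaluated and in the memory access
pattern. The sketch makes no claim about which variant is faster; the
timings above come from the benchmark, not from this code.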