WIP: Bugfix vectorization strategy and improved logging
Logging
This merge request improves logging in the vectorization strategies by adding a second stringify variant that shows at a glance which vectorization strategy was used. It lists the input keys of the scalar sum factorization kernels (as numbers 1, 2, ...), padding (as '0') and splitting (as 's').
['1', 's', '2', 's']: Two inputs that are broadcast into the lower and
upper part, each with a splitting of 2.
['1', '1', '1', '0']: Three sum factorization kernels with the same
input are combined, with padding at the end.
['1', 's', 's', '0']: One scalar kernel with a splitting of 3 and padding
at the end.
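To make the notation concrete, here is a minimal sketch of how such a list could be produced. It is not the code from this merge request: the function name `stringify_strategy`, the `(input_key, lanes)` description of a kernel and the default width of 4 are assumptions made for illustration only.

```python
def stringify_strategy(kernels, width=4):
    """Return the compact list form of a vectorization strategy.

    kernels: list of (input_key, lanes) pairs in lane order (hypothetical
             format; lanes > 1 means the kernel is split over several lanes).
    width:   number of SIMD lanes (4 assumed here).
    """
    numbers = {}  # maps each distinct input key to 1, 2, ...
    lanes_out = []
    for input_key, lanes in kernels:
        number = numbers.setdefault(input_key, len(numbers) + 1)
        lanes_out.append(str(number))
        # Every additional lane used by the same kernel is marked as a split.
        lanes_out.extend("s" for _ in range(lanes - 1))
    if len(lanes_out) > width:
        raise ValueError("kernels do not fit into the vector width")
    # Unused lanes at the end are padding.
    lanes_out.extend("0" for _ in range(width - len(lanes_out)))
    return lanes_out


print(stringify_strategy([("u", 2), ("v", 2)]))
print(stringify_strategy([("u", 1), ("u", 1), ("u", 1)]))
print(stringify_strategy([("u", 3)]))
```

Under these assumptions the three calls reproduce the three examples above, in the same order.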
Small Vectorization Bugfix
Consider the example of AVX-2 and three scalar sum factorization kernels with the inputs ['1', '2', '2']. It was not possible to vectorize this, as we don't support padding in the middle. This merge request adds a quick fix that sorts the scalar kernels according to the number of input keys before they are vectorized. In this case the kernels are reordered to ['2', '2', '1'], and vectorization can be applied. This is of course not a real fix, but it makes it less likely to run into this issue.
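The reordering can be sketched as follows. This is only one reading of the quick fix, assuming that "number of input keys" means how many kernels share a given input, so that the largest group comes first; the helper name `sort_by_input_frequency` is hypothetical and does not come from the code base.

```python
from collections import Counter


def sort_by_input_frequency(input_keys):
    """Order kernels so that inputs shared by the most kernels come first.

    Assumption: kernels are represented by their input key only. Ties are
    broken by the key itself so that kernels with equal inputs stay adjacent.
    """
    counts = Counter(input_keys)
    return sorted(input_keys, key=lambda key: (-counts[key], key))


print(sort_by_input_frequency(["1", "2", "2"]))  # ['2', '2', '1']
```

With this ordering the single remaining kernel ends up last, so any padding lands at the end rather than in the middle.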