Hi guys, I'm having trouble understanding what things I should do to optimize my matrix multiplication algorithm.
Here's the code:
    data Matrix e cols rows where
      Empty :: Matrix e Void Void
      One   :: e -> Matrix e () ()
      Junc  :: Matrix e a rows -> Matrix e b rows -> Matrix e (Either a b) rows
      Split :: Matrix e cols a -> Matrix e cols b -> Matrix e cols (Either a b)

    comp :: (Num e) => Matrix e cr rows -> Matrix e cols cr -> Matrix e cols rows
    comp Empty Empty             = Empty
    comp (One a) (One b)         = One (a * b)
    comp (Junc a b) (Split c d)  = comp a c + comp b d          -- Divide-and-conquer law
    comp (Split a b) c           = Split (comp a c) (comp b c)  -- Split fusion law
    comp c (Junc a b)            = Junc (comp c a) (comp c b)   -- Junc fusion law
This is for my LAoP matrix library. I'm doing some benchmarks and I think it can/should go faster even though it's O(n³) and cache-oblivious. Is there any black magic that can be done?
I'm evaluating the result to normal form (NF). Evaluating only to WHNF is fast because it's a lazy structure.
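To make the WHNF/NF distinction concrete, here is a minimal sketch. The `Tree` type and `forceTree` are hypothetical stand-ins for a lazy matrix structure, not part of the library: evaluating to WHNF only demands the outermost constructor, while evaluating to NF visits every leaf.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical lazy tree, standing in for a lazy matrix structure.
data Tree = Leaf Int | Node Tree Tree

-- Fully evaluate a tree (normal form) by demanding every leaf.
forceTree :: Tree -> ()
forceTree (Leaf n)   = n `seq` ()
forceTree (Node l r) = forceTree l `seq` forceTree r

main :: IO ()
main = do
  let t = Node (Leaf 1) (Leaf (error "not evaluated yet"))
  -- WHNF: only the outer Node constructor is demanded, so this succeeds.
  _ <- evaluate t
  putStrLn "WHNF ok"
  -- NF: every leaf is demanded, so the error hidden in the second leaf surfaces.
  r <- try (evaluate (forceTree t)) :: IO (Either SomeException ())
  putStrLn (either (const "NF hit the unevaluated leaf") (const "NF ok") r)
```

This is why a WHNF benchmark of a lazy structure can look very fast: most of the work may simply not have been done yet.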
I'm also compiling with the following flags:
How fast is it when constructed with StrictData?
Should I use StrictData in the module where the data type is defined or in the module where the benchmarks are?
I used it in the module where the data type is defined and got a significant speedup.
However, the WHNF benchmarks got a little slower, but they are still good.
Weirdly enough, NF was faster than WHNF in some cases.
Can you help me understand better what's going on?
I still need to tweak the benchmarks a little bit
StrictData implicitly puts a strictness annotation on every field. That means that when a value of the datatype is constructed, the values put into its fields are forced to WHNF; if those values have strict fields too, evaluation continues recursively.
Cool, and is this the only thing I can do to optimise the performance?
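As a small illustration of the StrictData behaviour described above, the sketch below uses two hypothetical types, `StrictBox` and `LazyBox`, which are not from the library. With the extension on, a plain field is forced when the constructor is applied, while a field marked with `~` stays lazy.

```haskell
{-# LANGUAGE StrictData #-}
import Control.Exception (SomeException, evaluate, try)

-- With StrictData on, this field is strict, as if declared `StrictBox !Int`.
data StrictBox = StrictBox Int

-- A leading `~` opts this one field back into laziness.
data LazyBox = LazyBox ~Int

-- True if forcing the value to WHNF raises an exception,
-- i.e. if construction had to evaluate the (failing) field.
forcesField :: a -> IO Bool
forcesField x = do
  r <- try (evaluate (x `seq` ())) :: IO (Either SomeException ())
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  strict <- forcesField (StrictBox (error "boom"))
  lazy   <- forcesField (LazyBox (error "boom"))
  putStrLn ("StrictBox forces its field: " ++ show strict)
  putStrLn ("LazyBox forces its field: " ++ show lazy)
```

The extension only affects data types declared in the module where it is enabled, which fits the observation that turning it on in the module defining the data type is what changed the benchmark numbers.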
Is GHC smart enough to use all cores when possible?
Should I be using other flags? I'm not very experienced in compiler flags
I found this, it may be useful: https://wiki.haskell.org/Performance/GHC
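On the question of using all cores: GHC does not parallelise pure code on its own. The program has to be built with the threaded runtime (for example `-threaded -rtsopts`), run with `+RTS -N`, and the parallelism has to be introduced explicitly. The sketch below is only an illustration of that idea using `par` and `pseq` from `GHC.Conc`. `parSum` is a hypothetical helper, not something from the matrix library, and whether the same pattern pays off for the two recursive `comp` calls in the divide-and-conquer case would need benchmarking.

```haskell
import GHC.Conc (par, pseq)

-- Sum a list in two halves, sparking the first half so that another
-- core may evaluate it while this thread evaluates the second half.
parSum :: [Int] -> Int
parSum xs = a `par` (b `pseq` (a + b))
  where
    (ys, zs) = splitAt (length xs `div` 2) xs
    a = sum ys
    b = sum zs

main :: IO ()
main = print (parSum [1 .. 1000000])
```

Without `-threaded` and `+RTS -N` the spark is never picked up by another core, so the program still runs correctly but sequentially.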