Efficient MMM - Haskell

Welcome to the Functional Programming Zulip Chat Archive. You can join the chat here.

Bolt

Hi guys, I'm having trouble understanding what things I should do to optimize my matrix multiplication algorithm.

Here's the code:

data Matrix e cols rows where
  Empty :: Matrix e Void Void
  One :: e -> Matrix e () ()
  Junc :: Matrix e a rows -> Matrix e b rows -> Matrix e (Either a b) rows
  Split :: Matrix e cols a -> Matrix e cols b -> Matrix e cols (Either a b)

comp :: (Num e) => Matrix e cr rows -> Matrix e cols cr -> Matrix e cols rows
comp Empty Empty            = Empty
comp (One a) (One b)        = One (a * b)
comp (Junc a b) (Split c d) = comp a c + comp b d         -- Divide-and-conquer law
comp (Split a b) c          = Split (comp a c) (comp b c) -- Split fusion law
comp c (Junc a b)           = Junc (comp c a) (comp c b)  -- Junc fusion law

This is for my LAoP matrix library. I'm doing some benchmarks and I think it can/should go faster even though it's O(n³) and cache-oblivious. Is there any black magic that can be done?

Bolt

I'm evaluating the result to its normal form. WHNF is fast as it's a lazy structure

Bolt

I'm also compiling with the following flags:

    - -threaded
    - -rtsopts
    - -with-rtsopts=-N
    - -O2
TheMatten

How fast it is when constructed with StrictData?

Bolt

Should I use StrictData on the module where the data type is defined or the module where the benchmarks are?

Bolt

I used in the module where the data type is defined and it had significant speed up

Bolt

However the WHNF benchmarks got a little slow but still good

Bolt

Weirdly enough the NF was faster than the WHNF in some cases

Bolt

Can you help me understand better what's going on?

Bolt

I still need to tweak the benchmarks a little bit

TheMatten

StrictData puts strictness annotations implicitly on every field - that means that during construction of datatype, values put into it are forced to WHNF - if they're strict in fields too, evaluation continues recursively

Bolt

Cool, and is this the only thing I can do to optimise the performance?

Bolt

Is GHC Smart enough to use all cores when there's possibility?

Bolt

Should I be using other flags? I'm not very experienced in compiler flags