Efficient MMM - Haskell

2020-02-08 11:08:02

Hi guys, I'm having trouble understanding what I should do to optimize my matrix multiplication algorithm.

Here's the code:

{-# LANGUAGE GADTs #-}

import Data.Void (Void)

-- Matrix indexed by the types of its columns and rows.
data Matrix e cols rows where
  Empty :: Matrix e Void Void
  One :: e -> Matrix e () ()
  Junc :: Matrix e a rows -> Matrix e b rows -> Matrix e (Either a b) rows
  Split :: Matrix e cols a -> Matrix e cols b -> Matrix e cols (Either a b)

-- Matrix composition (multiplication).
-- Note: (+) below relies on a Num instance for Matrix, which is not shown here.
comp :: (Num e) => Matrix e cr rows -> Matrix e cols cr -> Matrix e cols rows
comp Empty Empty            = Empty
comp (One a) (One b)        = One (a * b)
comp (Junc a b) (Split c d) = comp a c + comp b d         -- Divide-and-conquer law
comp (Split a b) c          = Split (comp a c) (comp b c) -- Split fusion law
comp c (Junc a b)           = Junc (comp c a) (comp c b)  -- Junc fusion law

This is for my LAoP matrix library. I'm doing some benchmarks and I think it can/should go faster even though it's O(n³) and cache-oblivious. Is there any black magic that can be done?

Bolt

2020-02-08 11:10:19

I'm evaluating the result to its normal form. WHNF is fast, as it's a lazy structure.
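
To illustrate the WHNF vs. normal form difference, here is a minimal sketch. The `Tree` type and the `toWHNF`, `toNF` and `partial` names are made up for the example and only stand in for the matrix type; they are not part of the library.

```haskell
-- A small lazy binary tree standing in for the matrix type (hypothetical).
data Tree = Leaf Int | Node Tree Tree

-- Forces only the outermost constructor (weak head normal form).
toWHNF :: Tree -> ()
toWHNF t = t `seq` ()

-- Forces the whole structure (normal form) by walking every node.
toNF :: Tree -> ()
toNF (Leaf n)   = n `seq` ()
toNF (Node l r) = toNF l `seq` toNF r

-- The right subtree is never evaluated when only WHNF is requested.
partial :: Tree
partial = Node (Leaf 1) undefined
```

Here `toWHNF partial` returns `()` because only the top `Node` is inspected, while `toNF partial` would hit the `undefined`. A WHNF benchmark on a lazy structure therefore measures very little of the actual work.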

Bolt

2020-02-08 11:14:15

I'm also compiling with the following flags:

    - -threaded
    - -rtsopts
    - -with-rtsopts=-N
    - -O2

TheMatten

2020-02-08 11:38:55

How fast is it when constructed with StrictData?

Bolt

2020-02-08 11:46:07

Should I use StrictData on the module where the data type is defined or the module where the benchmarks are?

Bolt

2020-02-08 11:51:25

I used it in the module where the data type is defined, and it gave a significant speedup

Bolt

2020-02-08 11:52:43

However, the WHNF benchmarks got a little slower, but they're still good

Bolt

2020-02-08 11:53:13

Weirdly enough, the NF benchmarks were faster than the WHNF ones in some cases

Bolt

2020-02-08 11:54:11

Can you help me understand better what's going on?

Bolt

2020-02-08 12:12:49

I still need to tweak the benchmarks a little bit

TheMatten

2020-02-08 13:11:53

StrictData implicitly puts a strictness annotation on every field of the data types defined in that module. That means that when a value is constructed, whatever is put into its fields is forced to WHNF. If those values have strict fields too, evaluation continues recursively.
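
A small sketch of that behaviour, using explicit `!` annotations, which is what StrictData would add implicitly. The `LazyPair`, `StrictPair` and `throwsWhenForced` names are hypothetical and exist only for this example.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Lazy fields: arguments are stored as unevaluated thunks.
data LazyPair = LazyPair Int Int

-- Strict fields: arguments are forced to WHNF when the constructor is evaluated.
data StrictPair = StrictPair !Int !Int

-- True if forcing the value to WHNF raises an exception.
throwsWhenForced :: a -> IO Bool
throwsWhenForced x = do
  r <- try (evaluate x)
  case r of
    Left e  -> const (pure True) (e :: SomeException)
    Right _ -> pure False
```

Forcing `LazyPair 1 undefined` to WHNF succeeds because the second field is never touched, whereas forcing `StrictPair 1 undefined` raises, because building the constructor evaluates both fields.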

Bolt

2020-02-08 13:32:57

Cool, and is this the only thing I can do to optimise the performance?

Bolt

2020-02-08 13:34:07

Is GHC smart enough to use all cores when it's possible?
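
For context on the cores question: as far as I know, GHC does not parallelise pure code on its own. `-threaded` and `-N` only make several cores available to the runtime, and parallel work has to be requested explicitly. A minimal sketch using `par` and `pseq` from `GHC.Conc` in base follows; `parSum` is a made-up example and not part of the matrix library.

```haskell
import GHC.Conc (par, pseq)

-- Sum a list in two halves, sparking the first half so the runtime
-- may evaluate it on another core while this thread sums the second half.
parSum :: [Int] -> Int
parSum xs = left `par` (right `pseq` (left + right))
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    left     = sum as
    right    = sum bs
```

The result is the same as `sum xs` whether or not the spark actually runs in parallel. The same idea could in principle be applied to the two recursive calls in the divide-and-conquer case, though whether that pays off would need benchmarking.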

Bolt

2020-02-08 13:53:14

Should I be using other flags? I'm not very experienced with compiler flags

Bolt

2020-02-08 13:59:06

I found this, it could be useful: https://wiki.haskell.org/Performance/GHC