[source]

compiler/stranal/WwLib.hs

Note [Always do CPR w/w]

[note link]

At one time we refrained from doing CPR w/w for thunks, on the grounds that we might duplicate work. But that is already handled by the demand analyser, which doesn’t give the CPR proprety if w/w might waste work: see Note [CPR for thunks] in DmdAnal.

And if something has been given the CPR property and we don’t w/w, it’s a disaster, because then the enclosing function might say it has the CPR property, but now doesn’t and there a cascade of disaster. A good example is #5920.

Note [Limit w/w arity]

[note link]

Guard against high worker arity as it generates a lot of stack traffic. A simplified example is #11565#comment:6

Current strategy is very simple: don’t perform w/w transformation at all if the result produces a wrapper with arity higher than -fmax-worker-args=.

It is a bit all or nothing, consider

f (x,y) (a,b,c,d,e ... , z) = rhs

Currently we will remove all w/w ness entirely. But actually we could w/w on the (x,y) pair… it’s the huge product that is the problem.

Could we instead refrain from w/w on an arg-by-arg basis? Yes, that’d solve f. But we can get a lot of args from deeply-nested products:

g (a, (b, (c, (d, ...)))) = rhs

This is harder to spot on an arg-by-arg basis. Previously mkWwStr was given some “fuel” saying how many arguments it could add; when we ran out of fuel it would stop w/wing. Still not very clever because it had a left-right bias.

Note [Protecting the last value argument]

[note link]

If the user writes (_ -> E), they might be intentionally disallowing the sharing of E. Since absence analysis and worker-wrapper are keen to remove such unused arguments, we add in a void argument to prevent the function from becoming a thunk.

The user can avoid adding the void argument with the -ffun-to-thunk flag. However, this can create sharing, which may be bad in two ways. 1) It can create a space leak. 2) It can prevent inlining under a lambda. If w/w removes the last argument from a function f, then f now looks like a thunk, and so f can’t be inlined under a lambda.

Note [Join points and beta-redexes]

[note link]

Originally, the worker would invoke the original function by calling it with arguments, thus producing a beta-redex for the simplifier to munch away:

\x y z -> e => (\x y z -> e) wx wy wz

Now that we have special rules about join points, however, this is Not Good if the original function is itself a join point, as then it may contain invocations of other join points:

join j1 x = ...
join j2 y = if y == 0 then 0 else j1 y
=>
join j1 x = ...
join $wj2 y# = let wy = I# y# in (\y -> if y == 0 then 0 else jump j1 y) wy
join j2 y = case y of I# y# -> jump $wj2 y#

There can’t be an intervening lambda between a join point’s declaration and its occurrences, so $wj2 here is wrong. But of course, this is easy enough to fix:

… let join $wj2 y# = let wy = I# y# in let y = wy in if y == 0 then 0 else j1 y …

Hence we simply do the beta-reduction here. (This would be harder if we had to worry about hygiene, but luckily wy is freshly generated.)

Note [Join points returning functions]

[note link]

It is crucial that the arity of a join point depends on its callers, not its own syntax. What this means is that a join point can have “extra lambdas”:

f :: Int -> Int -> (Int, Int) -> Int f x y = join j (z, w) = (u, v) -> …

in jump j (x, y)

Typically this happens with functions that are seen as computing functions, rather than being curried. (The real-life example was GraphOps.addConflicts.)

When we create the wrapper, it must be in “eta-contracted” form so that the jump has the right number of arguments:

f x y = join $wj z’ w’ = u’ v’ -> let {z = z’; w = w’; u = u’; v = v’} in …
j (z, w) = jump $wj z w

(See Note [Join points and beta-redexes] for where the lets come from.) If j were a function, we would instead say

f x y = let $wj = z’ w’ u’ v’ -> let {z = z’; w = w’; u = u’; v = v’} in …
j (z, w) (u, v) = $wj z w u v

Notice that the worker ends up with the same lambdas; it’s only the wrapper we have to be concerned about.

FIXME Currently the functionality to produce “eta-contracted” wrappers is unimplemented; we simply give up.

Note [Freshen WW arguments]

[note link]

Wen we do a worker/wrapper split, we must not in-scope names as the arguments of the worker, else we’ll get name capture. E.g.

– y1 is in scope from further out f x = ..y1..

If we accidentally choose y1 as a worker argument disaster results:

fww y1 y2 = let x = (y1,y2) in ...y1...

To avoid this:

  • We use a fresh unique for both type-variable and term-variable binders Originally we lacked this freshness for type variables, and that led to the very obscure #12562. (A type variable in the worker shadowed an outer term-variable binding.)
  • Because of this cloning we have to substitute in the type/kind of the new binders. That’s why we carry the TCvSubst through mkWWargs.
So we need a decent in-scope set, just in case that type/kind
itself has foralls.  We get this from the free vars of the RHS of the
function since those are the only variables that might be captured.
It's a lazy thunk, which will only be poked if the type/kind has a forall.
Another tricky case was when f :: forall a. a -> forall a. a->a
(i.e. with shadowing), and then the worker used the same 'a' twice.

Note [Unpacking arguments with product and polymorphic demands]

[note link]

The argument is unpacked in a case if it has a product type and has a strict and used demand put on it. I.e., arguments, with demands such as the following ones:

<S,U(U, L)>
<S(L,S),U>

will be unpacked, but

<S,U> or <B,U>

will not, because the pieces aren’t used. This is quite important otherwise we end up unpacking massive tuples passed to the bottoming function. Example:

f :: ((Int,Int) -> String) -> (Int,Int) -> a
f g pr = error (g pr)
main = print (f fst (1, error "no"))

Does ‘main’ print “error 1” or “error no”? We don’t really want ‘f’ to unbox its second argument. This actually happened in GHC’s onwn source code, in Packages.applyPackageFlag, which ended up un-boxing the enormous DynFlags tuple, and being strict in the as-yet-un-filled-in pkgState files.

Note [How to do the worker/wrapper split]

[note link]

The worker-wrapper transformation, mkWWstr_one, takes into account several possibilities to decide if the function is worthy for splitting:

  1. If an argument is absent, it would be silly to pass it to the worker. Hence the isAbsDmd case. This case must come first because a demand like <S,A> or <B,A> is possible. E.g. <B,A> comes from a function like

    f x = error “urk”

    and <S,A> can come from Note [Add demands for strict constructors]

  2. If the argument is evaluated strictly, and we can split the product demand (splitProdDmd_maybe), then unbox it and w/w its pieces. For example

    f :: (Int, Int) -> Int f p = (case p of (a,b) -> a) + 1

is split to
f :: (Int, Int) -> Int f p = case p of (a,b) -> $wf a
$wf :: Int -> Int
$wf a = a + 1
and

g :: Bool -> (Int, Int) -> Int g c p = case p of (a,b) ->

if c then a else b
is split to
g c p = case p of (a,b) -> $gw c a b $gw c a b = if c then a else b
2a But do /not/ split if the components are not used; that is, the
usage is just ‘Used’ rather than ‘UProd’. In this case splitProdDmd_maybe returns Nothing. Otherwise we risk decomposing a massive tuple which is barely used. Example:
f :: ((Int,Int) -> String) -> (Int,Int) -> a
f g pr = error (g pr)
main = print (f fst (1, error "no"))
Here, f does not take 'pr' apart, and it's stupid to do so.
Imagine that it had millions of fields. This actually happened
in GHC itself where the tuple was DynFlags
  1. A plain ‘seqDmd’, which is head-strict with usage UHead, can’t be split by splitProdDmd_maybe. But we want it to behave just like U(AAAA) for suitable number of absent demands. So we have a special case for it, with arity coming from the data constructor.

Note [Worker-wrapper for bottoming functions]

[note link]

We used not to split if the result is bottom. [Justification: there’s no efficiency to be gained.]

But it’s sometimes bad not to make a wrapper. Consider
fw = x# -> let x = I# x# in case e of
p1 -> error_fn x p2 -> error_fn x p3 -> the real stuff

The re-boxing code won’t go away unless error_fn gets a wrapper too. [We don’t do reboxing now, but in general it’s better to pass an unboxed thing to f, and have it reboxed in the error cases….]

Note [Add demands for strict constructors]

[note link]

Consider this program (due to Roman):

data X a = X !a
foo :: X Int -> Int -> Int
foo (X a) n = go 0
 where
   go i | i < n     = a + go (i+1)
        | otherwise = 0

We want the worker for ‘foo’ too look like this:

$wfoo :: Int# -> Int# -> Int#

with the first argument unboxed, so that it is not eval’d each time around the ‘go’ loop (which would otherwise happen, since ‘foo’ is not strict in ‘a’). It is sound for the wrapper to pass an unboxed arg because X is strict, so its argument must be evaluated. And if we don’t pass an unboxed argument, we can’t even repair it by adding a seq thus:

foo (X a) n = a `seq` go 0

because the seq is discarded (very early) since X is strict!

So here’s what we do

  • We leave the demand-analysis alone. The demand on ‘a’ in the definition of ‘foo’ is <L, U(U)>; the strictness info is Lazy because foo’s body may or may not evaluate ‘a’; but the usage info says that ‘a’ is unpacked and its content is used.
  • During worker/wrapper, if we unpack a strict constructor (as we do for ‘foo’), we use ‘addDataConStrictness’ to bump up the strictness on the strict arguments of the data constructor.
  • That in turn means that, if the usage info supports doing so (i.e. splitProdDmd_maybe returns Just), we will unpack that argument – even though the original demand (e.g. on ‘a’) was lazy.
  • What does “bump up the strictness” mean? Just add a head-strict demand to the strictness! Even for a demand like <L,A> we can safely turn it into <S,A>; remember case (1) of Note [How to do the worker/wrapper split].

The net effect is that the w/w transformation is more aggressive about unpacking the strict arguments of a data constructor, when that eagerness is supported by the usage info.

There is the usual danger of reboxing, which as usual we ignore. But if X is monomorphic, and has an UNPACK pragma, then this optimisation is even more important. We don’t want the wrapper to rebox an unboxed argument, and pass an Int to $wfoo!

This works in nested situations like

data family Bar a
data instance Bar (a, b) = BarPair !(Bar a) !(Bar b)
newtype instance Bar Int = Bar Int
foo :: Bar ((Int, Int), Int) -> Int -> Int
foo f k = case f of BarPair x y ->
          case burble of
             True -> case x of
                       BarPair p q -> ...
             False -> ...
The extra eagerness lets us produce a worker of type:
$wfoo :: Int# -> Int# -> Int# -> Int -> Int $wfoo p# q# y# = …

even though the case x is only lazily evaluated.

——— Historical note ———— We used to add data-con strictness demands when demand analysing case expression. However, it was noticed in #15696 that this misses some cases. For instance, consider the program (from T10482)

data family Bar a
data instance Bar (a, b) = BarPair !(Bar a) !(Bar b)
newtype instance Bar Int = Bar Int
foo :: Bar ((Int, Int), Int) -> Int -> Int
foo f k =
  case f of
    BarPair x y -> case burble of
                      True -> case x of
                                BarPair p q -> ...
                      False -> ...

We really should be able to assume that p is already evaluated since it came from a strict field of BarPair. This strictness would allow us to produce a worker of type:

$wfoo :: Int# -> Int# -> Int# -> Int -> Int
$wfoo p# q# y# = ...

even though the case x is only lazily evaluated

Indeed before we fixed #15696 this would happen since we would float the inner case x through the case burble to get:

foo f k =
  case f of
    BarPair x y -> case x of
                      BarPair p q -> case burble of
                                      True -> ...
                                      False -> ...

However, after fixing #15696 this could no longer happen (for the reasons discussed in ticket:15696#comment:76). This means that the demand placed on f would then be significantly weaker (since the False branch of the case on burble is not strict in p or q).

Consequently, we now instead account for data-con strictness in mkWWstr_one, applying the strictness demands to the final result of DmdAnal. The result is that we get the strict demand signature we wanted even if we can’t float the case on x up through the case on burble.

Note [mkWWstr and unsafeCoerce]

[note link]

By using unsafeCoerce, it is possible to make the number of demands fail to match the number of constructor arguments; this happened in #8037. If so, the worker/wrapper split doesn’t work right and we get a Core Lint bug. The fix here is simply to decline to do w/w if that happens.

Note [Record evaluated-ness in worker/wrapper]

[note link]

Suppose we have

data T = MkT !Int Int
f :: T -> T
f x = e

and f’s is strict, and has the CPR property. The we are going to generate this w/w split

f x = case x of
        MkT x1 x2 -> case $wf x1 x2 of
                        (# r1, r2 #) -> MkT r1 r2
$wfw x1 x2 = let x = MkT x1 x2 in
             case e of
               MkT r1 r2 -> (# r1, r2 #)

Note that

  • In the worker $wf, inside ‘e’ we can be sure that x1 will be evaluated (it came from unpacking the argument MkT. But that’s no immediately apparent in $wf
  • In the wrapper ‘f’, which we’ll inline at call sites, we can be sure that ‘r1’ has been evaluated (because it came from unpacking the result MkT. But that is not immediately apparent from the wrapper code.

Missing these facts isn’t unsound, but it loses possible future opportunities for optimisation.

Solution: use setCaseBndrEvald when creating
  1. The arg binders x1,x2 in mkWstr_one
    See #13077, test T13077
  2. The result binders r1,r2 in mkWWcpr_help
    See Trace #13077, test T13077a And #13027 comment:20, item (4)

to record that the relevant binder is evaluated.

Note [Do not unpack class dictionaries]

[note link]

If we have
f :: Ord a => [a] -> Int -> a {-# INLINABLE f #-}

and we worker/wrapper f, we’ll get a worker with an INLINABLE pragma (see Note [Worker-wrapper for INLINABLE functions] in WorkWrap), which can still be specialised by the type-class specialiser, something like

fw :: Ord a => [a] -> Int# -> a
BUT if f is strict in the Ord dictionary, we might unpack it, to get
fw :: (a->a->Bool) -> [a] -> Int# -> a

and the type-class specialiser can’t specialise that. An example is #6056.

But in any other situation a dictionary is just an ordinary value, and can be unpacked. So we track the INLINABLE pragma, and switch off the unpacking in mkWWstr_one (see the isClassPred test).

Historical note: #14955 describes how I got this fix wrong the first time.

Note [non-algebraic or open body type warning]

[note link]

There are a few cases where the W/W transformation is told that something returns a constructor, but the type at hand doesn’t really match this. One real-world example involves unsafeCoerce:

foo = IO a foo = unsafeCoerce c_exit foreign import ccall “c_exit” c_exit :: IO ()

Here CPR will tell you that foo returns a () constructor for sure, but trying to create a worker/wrapper for type a obviously fails. (This was a real example until ee8e792 in libraries/base.)

It does not seem feasible to avoid all such cases already in the analyser (and after all, the analysis is not really wrong), so we simply do nothing here in mkWWcpr. But we still want to emit warning with -DDEBUG, to hopefully catch other cases where something went avoidably wrong.

Note [Profiling and unpacking]

[note link]

If the original function looked like
f = x -> {-# SCC “foo” #-} E
then we want the CPR’d worker to look like
x -> {-# SCC “foo” #-} (case E of I# x -> x)
and definitely not
x -> case ({-# SCC “foo” #-} E) of I# x -> x)

This transform doesn’t move work or allocation from one cost centre to another.

Later [SDM]: presumably this is because we want the simplifier to eliminate the case, and the scc would get in the way? I’m ok with including the case itself in the cost centre, since it is morally part of the function (post transformation) anyway.

Note [Absent errors]

[note link]

We make a new binding for Ids that are marked absent, thus
let x = absentError “x :: Int”

The idea is that this binding will never be used; but if it buggily is used we’ll get a runtime error message.

Coping with absence for unlifted types is important; see, for example, #4306 and #15627. In the UnliftedRep case, we can use LitRubbish, which we need to apply to the required type. For the unlifted types of singleton kind like Float#, Addr#, etc. we also find a suitable literal, using Literal.absentLiteralOf. We don’t have literals for every primitive type, so the function is partial.

Note: I did try the experiment of using an error thunk for unlifted things too, relying on the simplifier to drop it as dead code. But this is fragile

  • It fails when profiling is on, which disables various optimisations

  • It fails when reboxing happens. E.g.

    data T = MkT Int Int# f p@(MkT a _) = …g p….

    where g is /lazy/ in ‘p’, but only uses the first component. Then ‘f’ is /strict/ in ‘p’, and only uses the first component. So we only pass that component to the worker for ‘f’, which reconstructs ‘p’ to pass it to ‘g’. Alas we can’t say

    …f (MkT a (absentError Int# “blah”))…

    bacause MkT is strict in its Int# argument, so we get an absentError exception when we shouldn’t. Very annoying!

So absentError is only used for lifted types.