
compiler/basicTypes/Demand.hs

Note [Exceptions and strictness]


We used to be smart about catching exceptions, but we aren’t anymore. See #14998 for the way it’s resolved at the moment.

Here’s a historic breakdown:

Apparently, exception handling prim-ops didn’t use to have any special strictness signatures, thus defaulting to topSig, which assumes they use their arguments lazily. Joachim was the first to realise that we could provide richer information. Thus, in 0558911f91c (Dec 13), he added signatures to primops.txt.pp indicating that functions like catch# and catchRetry# call their argument, which is useful information for usage analysis. Still with a ‘Lazy’ strictness demand (i.e. ‘lazyApply1Dmd’), though, and the world was fine.

In 7c0fff4 (July 15), Simon argued that giving catch# et al. a ‘strictApply1Dmd’ leads to substantial performance gains. That was at the cost of correctness, as #10712 proved. So, back to ‘lazyApply1Dmd’ in 28638dfe79e (Dec 15).

Motivated to reproduce the gains of 7c0fff4 without the breakage of #10712, Ben opened #11222. Simon made the demand analyser “understand catch” in 9915b656 (Jan 16) by adding a new ‘catchArgDmd’, which basically said to call its argument strictly, but also swallow any thrown exceptions in ‘postProcessDmdResult’. This was realized by extending the ‘Str’ constructor of ‘ArgStr’ with an ‘ExnStr’ field, indicating that it catches the exception, and adding a ‘ThrowsExn’ constructor to the ‘Termination’ lattice as an element between ‘Dunno’ and ‘Diverges’. Then along came #11555 and finally #13330, so we had to revert to ‘lazyApply1Dmd’ again in 701256df88c (Mar 17).

This left the other variants like ‘catchRetry#’ having ‘catchArgDmd’, which is where #14998 picked up. Item 1 was concerned with measuring the impact of also making catchRetry# and catchSTM# have ‘lazyApply1Dmd’. The result was that there was none. We removed the last usages of ‘catchArgDmd’ in 00b8ecb7 (Apr 18). There was a lot of dead code resulting from that change, that we removed in ef6b283 (Jan 19): We got rid of ‘ThrowsExn’ and ‘ExnStr’ again and removed any code that was dealing with the peculiarities.

Where did the speed-ups vanish to? In #14998, item 3 established that turning ‘catch#’ strict in its first argument didn’t bring back any of the alleged performance benefits. Item 2 of that ticket finally found out that it was entirely due to ‘catchException’s new (since #11555) definition, which was simply

catchException !io handler = catch io handler

While ‘catchException’ is arguably the saner semantics for ‘catch’, it is an internal helper function in “GHC.IO”. Its use in “GHC.IO.Handle.Internals.do_operation” made for the huge allocation differences: Remove the bang and you find the regressions we originally wanted to avoid with ‘catchArgDmd’. See also #exceptions_and_strictness# in “GHC.IO”.

So history keeps telling us that the only possibly correct strictness annotation for the first argument of ‘catch#’ is ‘lazyApply1Dmd’, because ‘catch#’ really is not strict in its argument: Just try this in GHCi

:set -XScopedTypeVariables
import Control.Exception
catch undefined (\(_ :: SomeException) -> putStrLn "you'll see this")

Any analysis that assumes otherwise will be broken in some way or another (beyond -fno-pedantic-bottoms).
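The GHCi session above corresponds to this standalone sketch (names like ‘demo’ are illustrative): it only works because ‘catch#’ is lazy in its first argument, so the bottoming action is forced with the handler already installed.

```haskell
import Control.Exception

-- A bottoming IO action protected by 'catch'. Forcing 'undefined'
-- happens during execution, inside the scope of the handler.
demo :: IO String
demo = catch undefined handler
  where
    -- The handler intercepts whatever imprecise exception is thrown.
    handler :: SomeException -> IO String
    handler _ = pure "you'll see this"

main :: IO ()
main = demo >>= putStrLn
```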

Note [Demand on case-alternative binders]


The demand on a binder in a case alternative comes
  (a) From the demand on the binder itself
  (b) From the demand on the case binder

Forgetting (b) led directly to #10148.

Example. Source code:
f x@(p,_) = if p then foo x else True
foo (p,True) = True
foo (p,q)    = foo (q,p)
After strictness analysis:
f = \ (x_an1 [Dmd=<S(SL),1*U(U,1*U)>] :: (Bool, Bool)) ->
    case x_an1 of wild_X7 [Dmd=<L,1*U(1*U,1*U)>]
      { (p_an2 [Dmd=<S,1*U>], ds_dnz [Dmd=<L,A>]) ->
        case p_an2 of _ {
          False -> GHC.Types.True;
          True  -> foo wild_X7 } }

It’s true that ds_dnz is itself absent, but the use of wild_X7 means that it is very much alive and demanded. See #10148 for how the consequences play out.

This is needed even for non-product types, in case the case-binder is used but the components of the case alternative are not.
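The example above can be run to make the point observable (a minimal sketch): even though ‘f’ never names the pair’s second component itself, that component is demanded through the case binder whenever p is True.

```haskell
f :: (Bool, Bool) -> Bool
f x@(p, _) = if p then foo x else True

foo :: (Bool, Bool) -> Bool
foo (p, True) = True
foo (p, q)    = foo (q, p)

-- f (False, undefined) == True : here the second component really is absent.
-- f (True,  undefined) throws  : the demand flows through the case binder
--                                into foo's pattern match.
```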

Note [Don’t optimise UProd(Used) to Used]


These two UseDmds:
UProd [Used, Used] and Used

are semantically equivalent, but we do not turn the former into the latter, for a regrettably subtle reason. Suppose we did. Then

f (x,y) = (y,x)
would get
StrDmd = Str  = SProd [Lazy, Lazy]
UseDmd = Used = UProd [Used, Used]

But the joint demand <Str, Used> doesn’t convey any clue that there is a product involved, and so the worthSplittingFun will not fire. (We’d need to use the type as well to make it fire.) Moreover, consider

g h p@(_,_) = h p

This too would get <Str, Used>, but this time there really isn’t any point in w/w since the components of the pair are not used at all.

So the solution is: don’t aggressively collapse UProd [Used,Used] to Used; instead leave it as-is. In effect we are using the UseDmd to do a little bit of boxity analysis. Not very nice.

Note [Used should win]


Both in lubUse and bothUse we want (Used `both` UProd us) to be Used. Why? Because Used carries the implication that the whole thing is used, box and all, so we don’t want to w/w it. If we use it both boxed and unboxed, then we are definitely using the box, and so we are quite likely to pay a reboxing cost. So we make Used win here.

Example is in the Buffer argument of GHC.IO.Handle.Internals.writeCharBuffer

Baseline: (A) Not making Used win (UProd wins)
Compare with: (B) making Used win for lub and both

           Min          -0.3%     -5.6%    -10.7%    -11.0%    -33.3%
           Max          +0.3%    +45.6%    +11.5%    +11.5%     +6.9%
Geometric Mean          -0.0%     +0.5%     +0.3%     +0.2%     -0.8%

Baseline: (B) Making Used win for both lub and both
Compare with: (C) making Used win for both, but UProd win for lub

           Min          -0.1%     -0.3%     -7.9%     -8.0%     -6.5%
           Max          +0.1%     +1.0%    +21.0%    +21.0%     +0.5%
Geometric Mean          +0.0%     +0.0%     -0.0%     -0.1%     -0.1%

If a demand is used multiple times (i.e. reused), then any use-once mentioned there that is not protected by a UCall can happen many times.

Note [Strict demands]


isStrictDmd returns True only for demands that are
both strict and used

In particular, it is False for <HyperStr, Abs>, which can and does arise in, say (#7319)

f x = raise# <some exception>

Then ‘x’ is not used, so f gets strictness <HyperStr,Abs> -> . Now the w/w generates

fx = let x <HyperStr,Abs> = absentError "unused"
     in raise <some exception>

At this point we really don’t want to convert to

fx = case absentError "unused" of x -> raise <some exception>

Since the program is going to diverge, this swaps one error for another, but it’s really a bad idea to ever evaluate an absent argument. In #7319 we get

T7319.exe: Oops! Entered absent arg w_s1Hd{v} [lid] [base:GHC.Base.String{tc 36u}]

Note [Dealing with call demands]


Call demands are constructed and deconstructed coherently for strictness and absence. For instance, the strictness signature for the following function

f :: (Int -> (Int, Int)) -> (Int, Bool)
f g = (snd (g 3), True)

should be: <L,C(U(AU))>m
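The U(AU) part of this signature is directly observable (a small sketch): only the second component of g’s result is ever forced, so a bottoming first component is harmless.

```haskell
f :: (Int -> (Int, Int)) -> (Int, Bool)
f g = (snd (g 3), True)

-- The A in C(U(AU)) in action: the first component is never forced,
-- so this evaluates cleanly even though fst would crash.
check :: (Int, Bool)
check = f (\n -> (error "first component never forced", n + 1))
-- check == (4, True)
```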

Note [Trimming a demand to a type]


Consider this:

f :: a -> Bool
f x = case ... of
        A g1 -> case (x |> g1) of (p,q) -> ...
        B    -> error "urk"

where A,B are the constructors of a GADT. We’ll get a U(U,U) demand on x from the A branch, but that’s a stupid demand for x itself, which has type ‘a’. Indeed we get ASSERTs going off (notably in splitUseProdDmd, #8569).

Bottom line: we really don’t want to have a binder whose demand is more deeply-nested than its type. There are various ways to tackle this. When processing (x |> g1), we could “trim” the incoming demand U(U,U) to match x’s type. But I’m currently doing so just at the moment when we pin a demand on a binder, in DmdAnal.findBndrDmd.
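A concrete, compilable instance of the sketch above (the type and constructor names are illustrative, not from GHC): matching on the GADT constructor refines x’s type to (Int, Int), so the A branch places a U(U,U) demand on a binder whose declared type is just ‘a’.

```haskell
{-# LANGUAGE GADTs #-}

-- A small GADT whose first constructor carries an equality a ~ (Int, Int).
data T a where
  A :: T (Int, Int)
  B :: T Char

f :: T a -> a -> Int
f t x = case t of
          -- Here the match on A supplies the coercion (the "x |> g1" of
          -- the note), so x can be scrutinised as a pair.
          A -> case x of (p, q) -> p + q
          B -> error "urk"
```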

Note [Threshold demands]


Threshold usage demand is generated to figure out if cardinality-instrumented demands of a binding’s free variables should be unleashed. See also [Aggregated demand for cardinality].

Note [Replicating polymorphic demands]


Some demands can be considered polymorphic. Generally, this applies to such beasts as tops, bottoms, as well as Head-Used and Head-strict demands. For instance,

S ~ S(L, …, L)

Also, when top or bottom occurs as a result demand, it can in fact be expanded to saturate a callee’s arity.

Note [defaultDmd and resTypeArgDmd]


These functions are similar: They express the demand on something not explicitly mentioned in the environment resp. the argument list. Yet they are different:

  • Variables not mentioned in the free variables environment are definitely unused, so we can use absDmd there.
  • Further arguments can be used, of course. Hence topDmd is used.

Note [Nature of result demand]


A DmdResult contains information about termination (currently distinguishing definite divergence and no information; it is possible to include definite convergence here), and CPR information about the result.

The semantics of this depends on whether we are looking at a DmdType, i.e. the demand put on by an expression _under a specific incoming demand_ on its environment, or at a StrictSig describing a demand transformer.

For a
  • DmdType, the termination information is true given the demand it was generated with, while for
  • a StrictSig it holds after applying enough arguments.

The CPR information, though, is valid only after the number of arguments mentioned in the type has been given. Therefore, when forgetting the demand on arguments, as in dmdAnalRhs, this needs to be considered (via removeDmdTyArgs).

Consider
b2 x y = x `seq` y `seq` error (show x)
this has a strictness signature of
<S><S>b

meaning that “b2 `seq` ()” and “b2 1 `seq` ()” might well terminate, but for “b2 1 2 `seq` ()” we get definite divergence.

For comparison,
b1 x = x `seq` error (show x)
has a strictness signature of
<S>b

and “b1 `seq` ()” is known to terminate.

Now consider a function h with signature “<C(S)>”, and the expression
e1 = h b1

now h puts a demand of <C(S)> onto its argument, and the demand transformer turns it into

<S>b

Now the DmdResult “b” does apply to us, even though “b1 `seq` ()” does not diverge, and nothing is syntactically passed to b1.
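The b2 behaviour is directly observable (a minimal sketch): forcing an underapplied call yields an ordinary partial application and terminates, while the saturated call raises the error, matching the signature <S><S>b.

```haskell
import Control.Exception

b2 :: Int -> Int -> Int
b2 x y = x `seq` y `seq` error (show x)

main :: IO ()
main = do
  -- Underapplied: 'b2 1' is a partial application, so forcing it
  -- with seq terminates without touching the error.
  b2 1 `seq` putStrLn "b2 1 `seq` () terminates"
  -- Saturated: the 'b' (divergence) part of the signature now applies.
  r <- try (evaluate (b2 1 2)) :: IO (Either ErrorCall Int)
  case r of
    Left _  -> putStrLn "b2 1 2 diverges (throws)"
    Right _ -> putStrLn "unexpected"
```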

Note [Asymmetry of ‘both’ for DmdType and DmdResult]


‘both’ for DmdTypes is asymmetrical, because there is only one result! For example, given (e1 e2), we get a DmdType dt1 for e1, use its arg demand to analyse e2 giving dt2, and then do (dt1 bothType dt2). Similarly with

case e of { p -> rhs }

we get dt_scrut from the scrutinee and dt_rhs from the RHS, and then compute (dt_rhs bothType dt_scrut).

We
  1. combine the information on the free variables,
  2. take the demand on arguments from the first argument
  3. combine the termination results, but
  4. take CPR info from the first argument.

3 and 4 are implemented in bothDmdResult.

Equality needed for fixpoints in DmdAnal

Note [The need for BothDmdArg]


Previously, the right argument to bothDmdType, as well as the return value of dmdAnalStar via postProcessDmdType, was a DmdType. But bothDmdType only needs to know about the free variables and termination information, but nothing about the demand put on arguments, nor cpr information. So we make that explicit by only passing the relevant information.

Note [Demands from unsaturated function calls]


Consider a demand transformer d1 -> d2 -> r for f. If a sufficiently detailed demand is fed into this transformer, e.g. <C(C(S)), C1(C1(S))> arising from “f x1 x2” in a strict, use-once context, then d1 and d2 are precisely the demands unleashed onto x1 and x2 (similarly for the free variable environment), and furthermore the result information r is the one we want to use.

An anonymous lambda is also an unsaturated function call (it needs one argument, but none is given), so this applies to that case as well.

But the demand fed into f might be less than <C(C(S)), C1(C1(S))>. There are a few cases:
  • Not enough demand on the strictness side:
    - In that case, we need to zap all strictness in the demand on arguments
      and free variables.
    - Furthermore, we remove CPR information. It could be left, but given
      the incoming demand is not enough to evaluate so far, we just do not
      bother.
    - And finally, termination information: if r says that f diverges for
      sure, then this holds when the demand guarantees that two arguments
      are going to be passed. If the demand is lower, we may just as well
      converge. If we were tracking definite convergence, then that would
      still hold under a weaker demand than expected by the demand
      transformer.
  • Not enough demand from the usage side: The missing usage can be expanded
    using UCall Many, therefore this is subsumed by the third case:
  • At least one of the uses has a cardinality of Many.
    - Even if f puts a One demand on any of its arguments or free variables,
      if we call f multiple times, we may evaluate this argument or free
      variable multiple times. So forget about any occurrence of “One” in
      the demand.

In dmdTransformSig, we call peelManyCalls to find out if we are in any of these cases, and then call postProcessUnsat to reduce the demand appropriately.

Similarly, dmdTransformDictSelSig and dmdAnal, when analyzing a Lambda, use peelCallDmd, which peels only one level, but also returns the demand put on the body of the function.

Note [Default demand on free variables]


If the variable is not mentioned in the environment of a demand type, its demand is taken to be a result demand of the type.

For the strictness component,
if the result demand is a Diverges, then we use HyperStr
else we use Lazy

For the usage component, we use Absent.

So we use either absDmd or botDmd.

Also note the equations for lubDmdResult (resp. bothDmdResult) noted there.

Note [Always analyse in virgin pass]


Tricky point: make sure that we analyse in the ‘virgin’ pass. Consider
rec { f acc x True  = f (…rec { g y = …g… }…)
      f acc x False = acc }

In the virgin pass for ‘f’ we’ll give ‘f’ a very strict (bottom) type. That might mean that we analyse the sub-expression containing the E = “…rec g…” stuff in a bottom demand. Suppose we didn’t analyse E, but just returned botType.

Then in the next (non-virgin) iteration for ‘f’, we might analyse E in a weaker demand, and that will trigger doing a fixpoint iteration for g. But because it’s not the virgin pass we won’t start g’s iteration at bottom. Disaster. (This happened in $sfibToList’ of nofib/spectral/fibheaps.)

So in the virgin pass we make sure that we do analyse the expression at least once, to initialise its signatures.

Note [Analyzing with lazy demand and lambdas]


The insight for analyzing lambdas follows from the fact that for strictness S = C(L). This polymorphic expansion is critical for cardinality analysis of the following example:

{-# NOINLINE build #-}
build g = (g (:) [], g (:) [])

h c z = build (\x ->
          let z1 = z ++ z
          in if c
             then \y -> x (y ++ z1)
             else \y -> x (z1 ++ y))

One can see that build assigns to g demand <L,C(C1(U))>. Therefore, when analyzing the lambda (\x -> …), we expect each lambda \y -> … to be annotated as a “one-shot” one. Therefore (\x -> \y -> x (y ++ z)) should be analyzed with a demand <C(C(..), C(C1(U))>.

This is achieved by, first, converting the lazy demand L into the strict S by the second clause of the analysis.

Note [Analysing with absent demand]


Suppose we analyse an expression with demand <L,A>. The “A” means “absent”, so this expression will never be needed. What should happen? There are several wrinkles:

  • We do want to analyse the expression regardless. Reason: Note [Always
    analyse in virgin pass]. But we can post-process the results to ignore
    all the usage demands coming back. This is done by postProcessDmdType.
  • In a previous incarnation of GHC we needed to be extra careful in the case of an unlifted type, because unlifted values are evaluated even if they are not used. Example (see #9254):

    f :: (() -> (# Int#, () #)) -> ()
       -- Strictness signature is
       --    <C(S(LS)), 1*C1(U(A,1*U()))>
       -- I.e. calls k, but discards first component of result
    f k = case k () of (# _, r #) -> r

    g :: Int -> ()
    g y = f (\n -> (# case y of I# y2 -> y2, n #))
Here f's strictness signature says (correctly) that it calls its
argument function and ignores the first component of its result.
This is correct in the sense that it'd be fine to (say) modify the
function so that it always returned 0# in the first component.
But in function g, we *will* evaluate the 'case y of ...', because
it has type Int#.  So 'y' will be evaluated.  So we must record this
usage of 'y', else 'g' will say 'y' is absent, and will w/w so that
'y' is bound to an aBSENT_ERROR thunk.
However, the argument of toCleanDmd always satisfies the let/app
invariant; so if it is unlifted it is also okForSpeculation, and so
can be evaluated in a short finite time -- and that rules out nasty
cases like the one above.  (I'm not quite sure why this was a
problem in an earlier version of GHC, but it isn't now.)
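The f/g example above can be run as-is (modulo extensions) to confirm that ‘y’ is evaluated even though f discards the first component of k’s result:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Control.Exception
import GHC.Exts (Int#, Int (I#))

f :: (() -> (# Int#, () #)) -> ()
f k = case k () of (# _, r #) -> r

g :: Int -> ()
g y = f (\n -> (# case y of I# y2 -> y2, n #))

main :: IO ()
main = do
  -- Building the unlifted Int# component forces y, so 'g undefined'
  -- throws rather than returning ().
  r <- try (evaluate (g undefined)) :: IO (Either SomeException ())
  putStrLn $ case r of
    Left _  -> "y was evaluated, as the note explains"
    Right _ -> "y was not evaluated"
```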

Note [Demand transformer for a dictionary selector]


If we evaluate (op dict-expr) under demand ‘d’, then we can push the demand ‘d’ into the appropriate field of the dictionary. What is the appropriate field? We just look at the strictness signature of the class op, which will be something like: U(AAASAAAAA). Then replace the ‘S’ by the demand ‘d’.

For single-method classes, which are represented by newtypes, the signature of ‘op’ won’t look like U(…), so the splitProdDmd_maybe will fail. That’s fine: if we are doing strictness analysis we are also doing inlining, so we’ll have inlined ‘op’ into a cast. So we can bale out in a conservative way, returning nopDmdType.

It is (just.. #8329) possible to be running strictness analysis without having inlined class ops from single-method classes. Suppose you are using ghc --make; and the first module has a local -O0 flag. So you may load a class without interface pragmas, ie (currently) without an unfolding for the class ops. Now if a subsequent module in the --make sweep has a local -O flag you might do strictness analysis, but there is no inlining for the class op. This is weird, so I’m not worried about whether this optimises brilliantly; but it should not fall over.

Note [Computing one-shot info]


Consider a call
f (\pqr. e1) (\xyz. e2) e3
where f has usage signature
C1(C(C1(U))) C1(U) U
Then argsOneShots returns a [[OneShotInfo]] of
[[OneShot,NoOneShotInfo,OneShot], [OneShot]]

The occurrence analyser propagates this one-shot info to the binders pqr and xyz; see Note [Use one-shot information] in OccurAnal.

Note [Unsaturated applications]


If a function having bottom as its demand result is applied to fewer arguments than its syntactic arity, we cannot say for sure that it is going to diverge. This is the reason why we use the function appIsBottom, which, given a strictness signature and a number of arguments, says conservatively whether the function is going to diverge or not.
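A minimal illustration of the conservatism appIsBottom needs: the partial application of a bottoming function is an ordinary value, so forcing it is harmless.

```haskell
-- Bottom as a demand result: every saturated call raises an error.
diverge :: Int -> Int -> Int
diverge x y = x `seq` y `seq` error "boom"

main :: IO ()
main =
  -- One argument fewer than the syntactic arity: this is a plain
  -- partial application, and forcing it does not diverge.
  diverge 1 `seq` putStrLn "underapplied: no divergence"
```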

Zap absence or one-shot information, under control of flags

Note [Killing usage information]


The flags -fkill-one-shot and -fkill-absence let you switch off the generation of absence or one-shot information altogether. This is only used for performance tests, to see how important they are.

Note [HyperStr and Use demands]


The information “HyperStr” needs to be in the strictness signature, and not in the demand signature, because we still want to know about the demand on things. Consider

f (x,y) True  = error (show x)
f (x,y) False = x+1

The signature of f should be <S(SL),1*U(1*U(U),A)><S,1*U>m. If we were not distinguishing the uses on x and y in the True case, we could either not figure out how deeply we can unpack x, or that we do not have to pass y.
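Both halves of that signature are observable at runtime (a small sketch): y’s use is absent, so a bottoming y is fine in the False branch, while x is still genuinely demanded.

```haskell
f :: (Int, Int) -> Bool -> Int
f (x, y) True  = error (show x)
f (x, y) False = x + 1

-- f (1, undefined) False == 2 : y is absent (the A in 1*U(1*U(U),A)),
-- while x's deep use 1*U(U) is what tells w/w how far the pair can be
-- unpacked.
```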