compiler/simplStg/StgLiftLams.hs

Note [Late lambda lifting in STG]

See also the <https://gitlab.haskell.org/ghc/ghc/wikis/late-lam-lift wiki page>
and #9476.

The basic idea behind lambda lifting is to turn locally defined functions into top-level functions. Free variables are then passed as additional arguments at call sites instead of having a closure allocated for them at the definition site. Example:

@
let x = …; y = … in
let f = {x y} \a -> a + x + y in
let g = {f x} \b -> f b + x in
g 5
@

Lambda lifting @f@ would
  1. Turn @f@’s free variables into formal parameters
  2. Update @f@’s call site within @g@ to @f x y b@
  3. Update @g@’s closure: Add @y@ as an additional free variable, while removing @f@, because @f@ no longer allocates and can be floated to top-level.
  4. Actually float the binding of @f@ to top-level, eliminating the @let@ in the process.
This results in the following program (with free var annotations):
@
f x y a = a + x + y;
let x = …; y = … in
let g = {x y} \b -> f x y b + x in
g 5
@
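The lift above can also be sketched at the Haskell source level (hypothetical concrete values for @x@ and @y@; the @{x y}@ free-variable annotations of STG have no source counterpart):

```haskell
-- Before the lift: f closes over x and y, so evaluating 'before'
-- allocates a closure for f.
before :: Int
before =
  let x = 1; y = 2 in
  let f a = a + x + y in
  let g b = f b + x in
  g 5

-- After the lift: f's free variables became formal parameters, so it
-- can live at the top level and no closure is allocated for it.
fLifted :: Int -> Int -> Int -> Int
fLifted x y a = a + x + y

after :: Int
after =
  let x = 1; y = 2 in
  let g b = fLifted x y b + x in
  g 5

main :: IO ()
main = print (before, after)
```

Both versions compute the same result; only the allocation behaviour differs.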

This optimisation is all about lifting only when it is beneficial to do so.
The above seems like a worthwhile lift, judging from heap allocation:
We eliminate @f@'s closure, saving the allocation of a closure with 2 free
variable words, while leaving the size of @g@'s closure unchanged (it loses
@f@ but gains @y@).
You can probably sense that there's some kind of cost model at play here.
And you are right! But we also employ a couple of other heuristics for the
lifting decision which are outlined in "StgLiftLams.Analysis#when".
The transformation is done in “StgLiftLams.Transformation”, which calls out to ‘StgLiftLams.Analysis.goodToLift’ for its lifting decision. It relies on “StgLiftLams.LiftM”, which abstracts some subtle STG invariants into a monadic substrate.
Suffice it to say: we trade heap allocation for stack allocation.
The additional arguments have to be passed on the stack (or in registers,
depending on architecture) every time we call the function to save a single
heap allocation when entering the let binding. Nofib suggests a mean
improvement of about 1% for this pass, so it seems like a worthwhile thing to
do. Compile-times went up by 0.6%, so all in all a very modest change.
For a concrete example, look at @spectral/atom@. There's a call to 'zipWith'
that is ultimately compiled to something like this
(modulo desugaring/lowering to actual STG):
@
propagate dt = …;
runExperiment … =
  let xs = … in
  let ys = … in
  let go = {dt go} \xs ys -> case (xs, ys) of
        ([], [])       -> []
        (x:xs', y:ys') -> propagate dt x y : go xs' ys'
  in go xs ys
@

This will lambda lift @go@ to top-level, speeding up the resulting program by roughly one percent:

@
propagate dt = …;
go dt xs ys = case (xs, ys) of
  ([], [])       -> []
  (x:xs', y:ys') -> propagate dt x y : go dt xs' ys'
runExperiment … =
  let xs = … in
  let ys = … in
  go dt xs ys
@
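The effect can be sketched with runnable source Haskell (a hypothetical 'propagate' and made-up inputs; the real definitions live in nofib's @spectral/atom@):

```haskell
-- Hypothetical stand-in for the real propagate.
propagate :: Double -> Double -> Double -> Double
propagate dt x y = x + dt * y

-- Before the lift: go closes over dt (and over itself for the recursion),
-- so a closure is allocated for it on every call to runBefore.
runBefore :: Double -> [Double] -> [Double] -> [Double]
runBefore dt = go
  where
    go xs ys = case (xs, ys) of
      (x:xs', y:ys') -> propagate dt x y : go xs' ys'
      _              -> []

-- After the lift: dt is an explicit argument and goLifted is top-level,
-- so no closure is allocated; dt is passed on the stack or in a register.
goLifted :: Double -> [Double] -> [Double] -> [Double]
goLifted dt xs ys = case (xs, ys) of
  (x:xs', y:ys') -> propagate dt x y : goLifted dt xs' ys'
  _              -> []

runAfter :: Double -> [Double] -> [Double] -> [Double]
runAfter = goLifted

main :: IO ()
main = print (runBefore 0.5 [1, 2] [3, 4] == runAfter 0.5 [1, 2] [3, 4])
```

The two loops produce identical results; the lifted one merely shifts the cost of @dt@ from a closure field to an extra argument.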