compiler/basicTypes/MkId.hs

Note [Wired-in Ids]

A “wired-in” Id can be referred to directly in GHC (e.g. ‘voidPrimId’) rather than by looking up its name in some environment or fetching it from an interface file.

There are several reasons why an Id might appear in the wiredInIds:

  • ghcPrimIds: see Note [ghcPrimIds (aka pseudoops)]
  • magicIds: see Note [magicIds]
  • errorIds, defined in coreSyn/MkCore.hs. These error functions (e.g. rUNTIME_ERROR_ID) are wired in because the desugarer generates code that mentions them directly

In all cases except ghcPrimIds, there is a definition site in a library module, which may be called (e.g. in higher order situations); but the wired-in version means that the details are never read from that module’s interface file; instead, the full definition is right here.

Note [ghcPrimIds (aka pseudoops)]

The ghcPrimIds

  • Are exported from GHC.Prim
  • Can’t be defined in Haskell, and hence no Haskell binding site, but have perfectly reasonable unfoldings in Core
  • Either have a CompulsoryUnfolding (hence always inlined), or
    an EvaldUnfolding and void representation (e.g. void#)
  • Are (or should be) defined in primops.txt.pp as ‘pseudoop’. Reason: that’s how we generate documentation for them

Note [magicIds]

The magicIds

  • Are exported from GHC.Magic
  • Can be defined in Haskell (and are, in ghc-prim:GHC/Magic.hs). This definition at least generates Haddock documentation for them.
  • May or may not have a CompulsoryUnfolding.
  • But have some special behaviour that can’t be done via an unfolding from an interface file

Note [Wrappers for data instance tycons]

In the case of data instances, the wrapper also applies the coercion turning the representation type into the family instance type to cast the result of the wrapper. For example, consider the declarations

data family Map k :: * -> *
data instance Map (a, b) v = MapPair (Map a (Pair b v))

The tycon to which the datacon MapPair belongs gets a unique internal name of the form :R123Map, and we call it the representation tycon. In contrast, Map is the family tycon (accessible via tyConFamInst_maybe). A coercion allows you to move between representation and family type. It is accessible from :R123Map via tyConFamilyCoercion_maybe and has kind

Co123Map a b v :: {Map (a, b) v ~ :R123Map a b v}

The wrapper and worker of MapPair get the types

-- Wrapper
$WMapPair :: forall a b v. Map a (Map a b v) -> Map (a, b) v
$WMapPair a b v = MapPair a b v `cast` sym (Co123Map a b v)

-- Worker
MapPair :: forall a b v. Map a (Map a b v) -> :R123Map a b v

This coercion is conditionally applied by wrapFamInstBody.

It’s a bit more complicated if the data instance is a GADT as well!

data instance T [a] where
     T1 :: forall b. b -> T [Maybe b]

Hence we translate to

-- Wrapper
$WT1 :: forall b. b -> T [Maybe b]
$WT1 b v = T1 (Maybe b) b (Maybe b) v
             `cast` sym (Co7T (Maybe b))

-- Worker
T1 :: forall c b. (c ~ Maybe b) => b -> :R7T c

-- Coercion from family type to representation type
Co7T a :: T [a] ~ :R7T a

Newtype instances throw an additional wrinkle into the mix. Consider the following example (adapted from #15318, comment:2):

data family T a
newtype instance T [a] = MkT [a]

Within the newtype instance, there are three distinct types at play:

  1. The newtype’s underlying type, [a].
  2. The instance’s representation type, TList a (where TList is the representation tycon).
  3. The family type, T [a].

We need two coercions in order to cast from (1) to (3):

  1. A newtype coercion axiom:
     axiom coTList a :: TList a ~ [a]
     (where TList is the representation tycon of the newtype instance).
  2. A data family instance coercion axiom:
     axiom coT a :: T [a] ~ TList a

When we translate the newtype instance to Core, we obtain:

-- Wrapper
$WMkT :: forall a. [a] -> T [a]
$WMkT a x = MkT a x |> Sym (coT a)

-- Worker
MkT :: forall a. [a] -> TList [a]
MkT a x = x |> Sym (coTList a)

Unlike for data instances, the worker for a newtype instance is actually an executable function which expands to a cast, but otherwise, the general strategy is essentially the same as for data instances. Also note that we have a wrapper, which is unusual for a newtype, but we make GHC produce one anyway for symmetry with the way data instances are handled.
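
None of this machinery is visible in source Haskell. As a minimal sketch of the user-facing side (the `T Bool` instance and the helper names `unwrapList` and `unwrapBool` are illustrative assumptions, not part of the Note), the following program declares a data instance and a newtype instance of the same family; the casts between family type, representation tycon and underlying type are all inserted by GHC:

```haskell
{-# LANGUAGE TypeFamilies #-}
module Main where

-- The data family from the Note. Each instance below gets its own
-- representation tycon behind the scenes.
data family T a

-- Newtype instance, as in the Note.
newtype instance T [a] = MkT [a]

-- Data instance (illustrative assumption, not from the Note).
data instance T Bool = TBool Int

-- Pattern matching is written against the family type T [a];
-- the coercions described above never appear in source code.
unwrapList :: T [a] -> [a]
unwrapList (MkT xs) = xs

unwrapBool :: T Bool -> Int
unwrapBool (TBool n) = n

main :: IO ()
main = do
  print (unwrapList (MkT "abc"))
  print (unwrapBool (TBool 42))
```

Compiling with -ddump-simpl should show the casts, though the exact Core printed varies between GHC versions.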

Note [Newtype datacons]

The “data constructor” for a newtype should always be vanilla. At one point this wasn’t true, because the newtype arising from

class C a => D a

looked like

newtype T:D a = D:D (C a)

so the data constructor for T:D had a single argument, namely the predicate (C a). But now we treat that as an ordinary argument, not part of the theta-type, so all is well.

Note [Compulsory newtype unfolding]

Newtype wrappers, just like workers, have compulsory unfoldings. This is needed so that two optimizations involving newtypes have the same effect whether a wrapper is present or not:

  1. Case-of-known constructor. See Note [beta-reduction in exprIsConApp_maybe].
  2. Matching against the map/coerce RULE. Suppose we have the RULE
{-# RULE "map/coerce" map coerce = ... #-}
As described in Note [Getting the map/coerce RULE to work],
the occurrence of 'coerce' is transformed into:
{-# RULE "map/coerce" forall (c :: T1 ~R# T2).
                      map ((\v -> v) `cast` c) = ... #-}
We'd like 'map Age' to match the LHS. For this to happen, Age
must be unfolded, otherwise we'll be stuck. This is tested in T16208.
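
To make the 'map Age' situation concrete, here is a small sketch. The constructor name `MkAge` and the functions `agesViaMap` and `agesViaCoerce` are assumptions made for illustration. Whether the map/coerce RULE actually fires depends on the optimisation level and GHC version; the two functions return the same result either way, and the rule only affects whether the list is traversed at run time.

```haskell
module Main where

import Data.Coerce (coerce)

-- A newtype like the 'Age' mentioned above (constructor name is illustrative).
newtype Age = MkAge Int
  deriving (Eq, Show)

-- Candidate for the map/coerce RULE: the constructor has to be unfolded
-- to a cast before the rule's left-hand side can match.
agesViaMap :: [Int] -> [Age]
agesViaMap = map MkAge

-- What the rule rewrites to in spirit: a zero-cost coercion of the whole list.
agesViaCoerce :: [Int] -> [Age]
agesViaCoerce = coerce

main :: IO ()
main = print (agesViaMap [1, 2, 3] == agesViaCoerce [1, 2, 3])
```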

Note [Inline partially-applied constructor wrappers]

We allow the wrapper to inline when partially applied to avoid boxing values unnecessarily. For example, consider

data Foo a = Foo !Int a
instance Traversable Foo where
  traverse f (Foo i a) = Foo i <$> f a

This desugars to

traverse f foo = case foo of
     Foo i# a -> let i = I# i#
                 in map ($WFoo i) (f a)

If the wrapper $WFoo is not inlined, we get a fruitless reboxing of i. But if we inline the wrapper, we get

map (\a. case i of I# i# -> Foo i# a) (f a)

and now case-of-known-constructor eliminates the redundant allocation.
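
For reference, a complete version of the example that can be compiled and inspected (for instance with -O -ddump-simpl) might look as follows. The Functor and Foldable instances are added here only because Traversable requires them; they are not part of the Note.

```haskell
module Main where

-- The strict Int field is what gives Foo a non-trivial wrapper $WFoo.
data Foo a = Foo !Int a
  deriving (Eq, Show)

-- Superclass instances required by Traversable (not from the Note).
instance Functor Foo where
  fmap f (Foo i a) = Foo i (f a)

instance Foldable Foo where
  foldr f z (Foo _ a) = f a z

-- 'Foo i' is a partial application of the constructor wrapper.
instance Traversable Foo where
  traverse f (Foo i a) = Foo i <$> f a

main :: IO ()
main = print (traverse (\x -> Just (x + 1)) (Foo 1 (2 :: Int)))
```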

Note [Activation for data constructor wrappers]

The Activation on a data constructor wrapper allows it to inline only in Phase 0. This way rules have a chance to fire if they mention a data constructor on the left

RULE "foo" f (K a b) = ...

Since the LHS of a rule is simplified with InitialPhase, we won’t inline the wrapper on the LHS either.

On the other hand, this means that exprIsConApp_maybe must be able to deal with wrappers so that case-of-constructor is not delayed; see Note [exprIsConApp_maybe on data constructors with wrappers] for details.

It used to activate in phase 2 (afterInitial) and later, but that made it awkward to write a RULE[1] with a constructor on the left: it would work if the constructor had no wrapper, but whether a constructor has a wrapper depends, for instance, on the order of the type arguments of that constructor. Changing the order of type arguments could therefore make previously working RULEs fail.

See also https://gitlab.haskell.org/ghc/ghc/issues/15840 .
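
As a sketch of the kind of rule this activation is meant to support, consider a constructor with strict fields (and hence a wrapper) mentioned on a RULE's left-hand side. The names `sumK` and "sumK/K" are hypothetical, and whether the rule fires depends on optimisation settings; it is written so that the program's result is the same either way.

```haskell
module Main where

-- Strict fields give K a wrapper, $WK, that evaluates its arguments.
data K = K !Int !Int

-- NOINLINE keeps sumK from being inlined before the rule can match.
sumK :: K -> Int
sumK (K a b) = a + b
{-# NOINLINE sumK #-}

-- A rule with a data constructor on the left-hand side.
{-# RULES "sumK/K" forall a b. sumK (K a b) = a + b #-}

main :: IO ()
main = print (sumK (K 1 2))
```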

Note [Bangs on imported data constructors]

We pass Maybe [HsImplBang] to mkDataConRep to make use of HsImplBangs from imported modules.

  • Nothing <=> use HsSrcBangs
  • Just bangs <=> use HsImplBangs

For imported types we can’t work it all out from the HsSrcBangs, because we want to be very sure to follow what the original module (where the data type was declared) decided, and that depends on what flags were enabled when it was compiled. So we record the decisions in the interface file.

The HsImplBangs passed are in 1-1 correspondence with the dataConOrigArgTys of the DataCon.

Note [Data con wrappers and unlifted types]

Consider
data T = MkT !Int#
We certainly do not want to make a wrapper
$WMkT x = case x of y { DEFAULT -> MkT y }

For a start, it’s silly to generate a no-op. But worse, since wrappers are currently injected at TidyCore, we don’t even optimise it away! So the stupid case expression stays there. This actually happened for the Integer data type (see #1600 comment:66)!

Note [Data con wrappers and GADT syntax]

Consider these two very similar data types:

data T1 a b = MkT1 b
data T2 a b where
  MkT2 :: forall b a. b -> T2 a b

Despite their similar appearance, T2 will have a data con wrapper but T1 will not. What sets them apart? The types of their constructors, which are:

MkT1 :: forall a b. b -> T1 a b
MkT2 :: forall b a. b -> T2 a b

MkT2’s use of GADT syntax allows it to permute the order in which a and b would normally appear. See Note [DataCon user type variable binders] in DataCon for further discussion on this topic.

The worker data cons for T1 and T2, however, both have types such that a is expected to come before b as arguments. Because MkT2 permutes this order, it needs a data con wrapper to swizzle around the type variables to be in the order the worker expects.

A somewhat surprising consequence of this is that newtypes can have data con wrappers! After all, a newtype can also be written with GADT syntax:

newtype T3 a b where
  MkT3 :: forall b a. b -> T3 a b

Again, this needs a wrapper data con to reorder the type variables. It does mean that this newtype constructor requires another level of indirection when being called, but the inliner should make swift work of that.
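
The difference in type variable order can be observed from source code with visible type application. The sketch below assumes the TypeApplications extension; the value names `v1`, `v2`, `v3` and the `payload` helpers are illustrative. It type-checks only because MkT1 takes a then b, while MkT2 and MkT3 take b then a.

```haskell
{-# LANGUAGE GADTs, RankNTypes, TypeApplications #-}
module Main where

data T1 a b = MkT1 b

data T2 a b where
  MkT2 :: forall b a. b -> T2 a b

newtype T3 a b where
  MkT3 :: forall b a. b -> T3 a b

-- Haskell-98 syntax: type arguments are supplied in declaration order (a, b).
v1 :: T1 Bool Int
v1 = MkT1 @Bool @Int 1

-- GADT syntax: the user-written forall puts b first.
v2 :: T2 Bool Int
v2 = MkT2 @Int @Bool 2

-- The same holds for a newtype declared with GADT syntax.
v3 :: T3 Bool Int
v3 = MkT3 @Int @Bool 3

payload1 :: T1 a b -> b
payload1 (MkT1 x) = x

payload2 :: T2 a b -> b
payload2 (MkT2 x) = x

payload3 :: T3 a b -> b
payload3 (MkT3 x) = x

main :: IO ()
main = print (payload1 v1, payload2 v2, payload3 v3)
```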

Note [HsImplBangs for newtypes]

Most of the time, we use the dataConSrcToImplBang function to decide what strictness/unpackedness to use for the fields of a data type constructor. But there is an exception to this rule: newtype constructors. You might not think that newtypes would pose a challenge, since newtypes are seemingly forbidden from having strictness annotations in the first place. But consider this (from #16141):

{-# LANGUAGE StrictData #-}
{-# OPTIONS_GHC -O #-}
newtype T a b where
  MkT :: forall b a. Int -> T a b

Because StrictData (plus optimization) is enabled, invoking dataConSrcToImplBang would sneak in and unpack the field of type Int to Int#! This would be disastrous, since the wrapper for MkT uses a coercion involving Int, not Int#.

Bottom line: dataConSrcToImplBang should never be invoked for newtypes. In the case of a newtype constructor, we simply hardcode its dcr_bangs field to [HsLazy].

Note [Unpacking GADTs and existentials]

There is nothing stopping us unpacking a data type with equality components, like

data Equal a b where
  Equal :: Equal a a

And it’d be fine to unpack a product type with existential components too, but that would require a bit more plumbing, so currently we don’t.

So for now we require: null (dataConExTyCoVars data_con). See #14978.

Note [Unpack one-wide fields]

The flag UnboxSmallStrictFields ensures that any field that can (safely) be unboxed to a word-sized unboxed field, should be so unboxed. For example:

data A = A Int#
newtype B = B A
data C = C !B
data D = D !C
data E = E !()
data F = F !D
data G = G !F !F

All of these should have an Int# as their representation, except G which should have two Int#s.

However

data T = T !(S Int)
data S a = S !a

Here we can represent T with an Int#.

Note [Recursive unboxing]

Consider
data R = MkR {-# UNPACK #-} !S Int
data S = MkS {-# UNPACK #-} !Int

The representation arguments of MkR are the representation arguments of S (plus Int); the rep args of MkS are Int#. This is all fine.

But be careful not to try to unbox this!
data T = MkT {-# UNPACK #-} !T Int

Because then we’d get an infinite number of arguments.

Here is a more complicated case:
data S = MkS {-# UNPACK #-} !T Int
data T = MkT {-# UNPACK #-} !S Int

Each of S and T must decide independently whether to unpack and they had better not both say yes. So they must both say no.

Also behave conservatively when there is no UNPACK pragma:
data T = MkS !T Int

With -funbox-strict-fields or -funbox-small-strict-fields we need to behave as if there were an UNPACK pragma there.

But it’s the argument type that matters. This is fine:
data S = MkS S !Int

because Int is non-recursive.
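
A compilable sketch of the non-recursive and recursive cases follows. The type `IntList` and the functions `sumR` and `sumList` are illustrative assumptions. The UNPACK pragmas only affect the runtime representation, so the program behaves the same whether or not GHC honours them; in `IntList`, only the Int field is a candidate for unpacking, because the tail field has the recursive type itself.

```haskell
module Main where

-- Non-recursive nesting: MkR can store the Int# from MkS directly.
data S = MkS {-# UNPACK #-} !Int
data R = MkR {-# UNPACK #-} !S Int

-- The strict tail field refers back to IntList, so it stays a pointer;
-- the Int field is not recursive and can be unpacked.
data IntList = Nil | Cons {-# UNPACK #-} !Int !IntList

sumR :: R -> Int
sumR (MkR (MkS a) b) = a + b

sumList :: IntList -> Int
sumList Nil           = 0
sumList (Cons x rest) = x + sumList rest

main :: IO ()
main = do
  print (sumR (MkR (MkS 1) 2))
  print (sumList (Cons 1 (Cons 2 (Cons 3 Nil))))
```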

Note [Dict funs and default methods]

Dict funs and default methods are not ImplicitIds. Their definition involves user-written code, so we can’t figure out their strictness etc based on fixed info, as we can for constructors and record selectors (say).

NB: See also Note [Exported LocalIds] in Id

Note [Unsafe coerce magic]

We define a primitive
GHC.Prim.unsafeCoerce#
and then in the base library we define the ordinary function
Unsafe.Coerce.unsafeCoerce :: forall (a:*) (b:*). a -> b
unsafeCoerce x = unsafeCoerce# x

Notice that unsafeCoerce has a civilized (albeit still dangerous) polymorphic type, whose type args have kind *. So you can’t use it on unboxed values (unsafeCoerce 3#).

In contrast unsafeCoerce# is even more dangerous because you can use it on unboxed things, (unsafeCoerce# 3#) :: Int. Its type is

forall (r1 :: RuntimeRep) (r2 :: RuntimeRep) (a: TYPE r1) (b: TYPE r2). a -> b
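
A minimal sketch of the lifted-only `unsafeCoerce` in use follows; the newtype `Age` and the function `ageToInt` are assumptions for illustration. The coercion here is between a newtype and its underlying type, which share a runtime representation, so this particular use happens to be sound. In real code `Data.Coerce.coerce` would be the safe way to express it.

```haskell
module Main where

import Unsafe.Coerce (unsafeCoerce)

newtype Age = MkAge Int

-- Both types have kind *, so the "civilized" unsafeCoerce applies.
-- Nothing checks that the two types are compatible: that is on the caller.
ageToInt :: Age -> Int
ageToInt = unsafeCoerce

main :: IO ()
main = print (ageToInt (MkAge 3))
```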

Note [seqId magic]

‘GHC.Prim.seq’ is special in several ways.

  1. In source Haskell its second arg can have an unboxed type

    x `seq` (v +# w)

    But see Note [Typing rule for seq] in TcExpr, which explains why we give seq itself an ordinary type

    seq :: forall a b. a -> b -> b

    and treat it as a language construct from a typing point of view.

  2. Its fixity is set in LoadIface.ghcPrimIface

  3. It has quite a bit of desugaring magic. See DsUtils.hs Note [Desugaring seq (1)] and (2) and (3)

  4. There is some special rule handling: Note [User-defined RULES for seq]

Note [User-defined RULES for seq]

Roman found situations where he had
case (f n) of _ -> e

where he knew that f (which was strict in n) would terminate if n did. Notice that the result of (f n) is discarded. So it makes sense to transform to

case n of _ -> e

Rather than attempt some general analysis to support this, I’ve added enough support that you can do this using a rewrite rule:

RULE "f/seq" forall n.  seq (f n) = seq n

You write that rule. When GHC sees a case expression that discards its result, it mentally transforms it to a call to ‘seq’ and looks for a RULE. (This is done in Simplify.trySeqRules.) As usual, the correctness of the rule is up to you.

VERY IMPORTANT: to make this work, we give the RULE an arity of 1, not 2. If we wrote

RULE "f/seq" forall n e.  seq (f n) e = seq n e

with rule arity 2, then two bad things would happen:

  • The magical desugaring done in Note [seqId magic] item 3 for saturated application of ‘seq’ would turn the LHS into a case expression!
  • The code in Simplify.rebuildCase would need to actually supply the value argument, which turns out to be awkward.

Note [lazyId magic]

lazy :: forall a?. a? -> a? (i.e. works for unboxed types too)

‘lazy’ is used to make sure that a sub-expression, and its free variables, are truly used call-by-need, with no code motion. Key examples:

  • pseq: pseq a b = a `seq` lazy b. We want to make sure that the free vars of ‘b’ are not evaluated before ‘a’, even though the expression is plainly strict in ‘b’.
  • catch: catch a b = catch# (lazy a) b Again, it’s clear that ‘a’ will be evaluated strictly (and indeed applied to a state token) but we want to make sure that any exceptions arising from the evaluation of ‘a’ are caught by the catch (see #11555).

Implementing ‘lazy’ is a bit tricky:

  • It must not have a strictness signature: by being a built-in Id, all the info about lazyId comes from here, not from GHC.Base.hi. This is important, because the strictness analyser will spot it as strict!

  • It must not have an unfolding: it gets “inlined” by a HACK in CorePrep. It’s very important to do this inlining after unfoldings are exposed in the interface file. Otherwise, the unfolding for (say) pseq in the interface file will not mention ‘lazy’, so if we inline ‘pseq’ we’ll totally miss the very thing that ‘lazy’ was there for in the first place. See #3259 for a real world example.

  • Suppose CorePrep sees (catch# (lazy e) b). At all costs we must avoid using call by value here:

    case e of r -> catch# r b

    Avoiding that is the whole point of ‘lazy’. So in CorePrep (which generates the ‘case’ expression for a call-by-value call) we must spot the ‘lazy’ on the arg (in CorePrep.cpeApp), and build a ‘let’ instead.

  • lazyId is defined in GHC.Base, so we don’t have to inline it. If it appears un-applied, we’ll end up just calling it.
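
The pseq example can be written out in user code with `lazy` as exported from GHC.Exts. This is only a sketch: `myPseq` is a hypothetical name, and the function mirrors the definition quoted above rather than being a copy of the library's source.

```haskell
module Main where

import GHC.Exts (lazy)

-- Evaluate the first argument to weak head normal form, then return
-- the second. Wrapping b in 'lazy' hides the strictness in b from the
-- strictness analyser, so b is not evaluated ahead of a.
myPseq :: a -> b -> b
myPseq a b = a `seq` lazy b

main :: IO ()
main = putStrLn (myPseq (1 + 1 :: Int) "done")
```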

Note [noinlineId magic]

noinline :: forall a. a -> a

‘noinline’ is used to make sure that a function f is never inlined, e.g., as in ‘noinline f x’. Ordinarily, the identity function with NOINLINE could be used to achieve this effect; however, this has the unfortunate result of leaving a (useless) call to noinline at runtime. So we have a little bit of magic to optimize away ‘noinline’ after we are done running the simplifier.

‘noinline’ needs to be wired-in because it gets inserted automatically when we serialize an expression to the interface format. See Note [Inlining and hs-boot files] in ToIface

Note [The oneShot function]

In the context of making left-folds fuse somewhat okish (see ticket #7994 and Note [Left folds via right fold]) it was determined that it would be useful if library authors could explicitly tell the compiler that a certain lambda is called at most once. The oneShot function allows that.

‘oneShot’ is levity-polymorphic, i.e. the type variables can refer to unlifted types as well (#10744); e.g.

oneShot (\x:Int# -> x +# 1#)

Like most magic functions it has a compulsory unfolding, so there is no need for a real definition somewhere. We have one in GHC.Magic for the convenience of putting the documentation there.

It uses setOneShotLambda on the lambda’s binder. That is the whole magic:

A typical call looks like
     oneShot (\y. e)
after unfolding the definition oneShot = \f \x[oneshot]. f x we get
     (\f \x[oneshot]. f x) (\y. e)
 --> \x[oneshot]. ((\y.e) x)
 --> \x[oneshot] e[x/y]

which is what we want.

It is only effective if the one-shot info survives as long as possible; in particular it must make it into the interface in unfoldings. See Note [Preserve OneShotInfo] in CoreTidy.

Also see https://gitlab.haskell.org/ghc/ghc/wikis/one-shot.
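
As an illustration of the left-fold use case, here is a sketch of a left fold written via foldr, with `oneShot` marking the accumulator lambda as called at most once. The name `foldlViaFoldr` is hypothetical and the code is not taken from the library source. `oneShot` is semantically the identity, so the result is the same as `foldl`; only the optimiser's view of the lambda changes.

```haskell
module Main where

import GHC.Exts (oneShot)

-- A left fold expressed with foldr. Each step builds a function awaiting
-- the accumulator; oneShot tells GHC that this function is applied at
-- most once.
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr k z xs =
  foldr (\v fn -> oneShot (\acc -> fn (k acc v))) id xs z

main :: IO ()
main = print (foldlViaFoldr (-) 10 [1, 2, 3 :: Int])
```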

Note [magicDictId magic]

The identifier magicDict is just a place-holder, which is used to implement a primitive that we cannot define in Haskell but we can write in Core. It is declared with a place-holder type:

magicDict :: forall a. a

The intention is that the identifier will be used in a very specific way, to create dictionaries for classes with a single method. Consider a class like this:

class C a where
  f :: T a

We are going to use magicDict, in conjunction with a built-in Prelude rule, to cast values of type T a into dictionaries for C a. To do this, we define a function like this in the library:

data WrapC a b = WrapC (C a => Proxy a -> b)

withT :: (C a => Proxy a -> b)
      -> T a -> Proxy a -> b
withT f x y = magicDict (WrapC f) x y

The purpose of WrapC is to avoid having f instantiated. Also, it avoids impredicativity, because magicDict’s type cannot be instantiated with a forall. The field of WrapC contains a Proxy parameter which is used to link the type of the constraint, C a, with the type of the Wrap value being made.

Next, we add a built-in Prelude rule (see prelude/PrelRules.hs), which will replace the RHS of this definition with the appropriate definition in Core. The rewrite rule works as follows:

magicDict @t (wrap @a @b f) x y
---->
f (x `cast` co a) y

The co coercion is the newtype-coercion extracted from the type-class. The type class is obtained by looking at the type of wrap.

voidArgId is a Local Id used simply as an argument in functions where we just want an arg to avoid having a thunk of unlifted type. E.g.

x = \ void :: Void# -> (# p, q #)

This comes up in strictness analysis.

Note [evaldUnfoldings]

The evaldUnfolding makes it look as if some primitive value is evaluated, which in turn makes Simplify.interestingArg return True, which in turn makes INLINE things applied to said value likely to be inlined.