compiler/basicTypes/MkId.hs¶
Note [Wired-in Ids]¶
A “wired-in” Id can be referred to directly in GHC (e.g. ‘voidPrimId’) rather than by looking up its name in some environment or fetching it from an interface file.
There are several reasons why an Id might appear in the wiredInIds:
- ghcPrimIds: see Note [ghcPrimIds (aka pseudoops)]
- magicIds: see Note [magicIds]
- errorIds, defined in coreSyn/MkCore.hs. These error functions (e.g. rUNTIME_ERROR_ID) are wired in because the desugarer generates code that mentions them directly
In all cases except ghcPrimIds, there is a definition site in a library module, which may be called (e.g. in higher order situations); but the wired-in version means that the details are never read from that module’s interface file; instead, the full definition is right here.
Note [ghcPrimIds (aka pseudoops)]¶
The ghcPrimIds
- Are exported from GHC.Prim
- Can’t be defined in Haskell, and hence no Haskell binding site, but have perfectly reasonable unfoldings in Core
- Either have a CompulsoryUnfolding (hence always inlined), or an EvaldUnfolding and void representation (e.g. void#)
- Are (or should be) defined in primops.txt.pp as ‘pseudoop’. Reason: that’s how we generate documentation for them
Note [magicIds]¶
The magicIds
- Are exported from GHC.Magic
- Can be defined in Haskell (and are, in ghc-prim:GHC/Magic.hs). This definition at least generates Haddock documentation for them.
- May or may not have a CompulsoryUnfolding.
- But have some special behaviour that can’t be done via an unfolding from an interface file
Note [Wrappers for data instance tycons]¶
In the case of data instances, the wrapper also applies the coercion turning the representation type into the family instance type to cast the result of the wrapper. For example, consider the declarations
data family Map k :: * -> *
data instance Map (a, b) v = MapPair (Map a (Pair b v))
The tycon to which the datacon MapPair belongs gets a unique internal name of the form :R123Map, and we call it the representation tycon. In contrast, Map is the family tycon (accessible via tyConFamInst_maybe). A coercion allows you to move between representation and family type. It is accessible from :R123Map via tyConFamilyCoercion_maybe and has kind
Co123Map a b v :: {Map (a, b) v ~ :R123Map a b v}
The wrapper and worker of MapPair get the types
-- Wrapper
$WMapPair :: forall a b v. Map a (Map a b v) -> Map (a, b) v
$WMapPair a b v = MapPair a b v `cast` sym (Co123Map a b v)
-- Worker
MapPair :: forall a b v. Map a (Map a b v) -> :R123Map a b v
This coercion is conditionally applied by wrapFamInstBody.
It’s a bit more complicated if the data instance is a GADT as well!
data instance T [a] where
  T1 :: forall b. b -> T [Maybe b]
Hence we translate to
-- Wrapper
$WT1 :: forall b. b -> T [Maybe b]
$WT1 b v = T1 (Maybe b) b (Maybe b) v `cast` sym (Co7T (Maybe b))
-- Worker
T1 :: forall c b. (c ~ Maybe b) => b -> :R7T c
-- Coercion from family type to representation type
Co7T a :: T [a] ~ :R7T a
Newtype instances throw an additional wrinkle into the mix. Consider the following example (adapted from #15318, comment:2):
data family T a
newtype instance T [a] = MkT [a]
Within the newtype instance, there are three distinct types at play:
1. The newtype’s underlying type, [a].
2. The instance’s representation type, TList a (where TList is the representation tycon).
3. The family type, T [a].
We need two coercions in order to cast from (1) to (3):
- A newtype coercion axiom:
axiom coTList a :: TList a ~ [a]
(Where TList is the representation tycon of the newtype instance.)
- A data family instance coercion axiom:
axiom coT a :: T [a] ~ TList a
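When we translate the newtype instance to Core, we obtain:
-- Wrapper
$WMkT :: forall a. [a] -> T [a]
$WMkT a x = MkT a x `cast` sym (coT a)
-- Worker
MkT :: forall a. [a] -> TList a
MkT a x = x `cast` sym (coTList a)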
Unlike for data instances, the worker for a newtype instance is actually an executable function which expands to a cast, but otherwise, the general strategy is essentially the same as for data instances. Also note that we have a wrapper, which is unusual for a newtype, but we make GHC produce one anyway for symmetry with the way data instances are handled.
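At the source level, the three types in the newtype-instance example can be seen in a few lines. This is only a minimal sketch that assumes nothing beyond the two declarations from the example; the helpers `wrapList` and `unwrapList` are hypothetical names introduced for illustration, and the coercions themselves remain invisible in source Haskell.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- The data family and its newtype instance, as in the example above.
data family T a
newtype instance T [a] = MkT [a]

-- Hypothetical helper: from the underlying type [a] to the family type T [a].
-- In Core, this is where the two coercion axioms come into play.
wrapList :: [a] -> T [a]
wrapList = MkT

-- Hypothetical helper: the reverse direction, by pattern matching on MkT.
unwrapList :: T [a] -> [a]
unwrapList (MkT xs) = xs
```

Loaded into GHCi, `unwrapList (wrapList "abc")` gives back the original list.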
Note [Newtype datacons]¶
The “data constructor” for a newtype should always be vanilla. At one point this wasn’t true, because the newtype arising from
class C a => D a
looked like
newtype T:D a = D:D (C a)
so the data constructor for T:D had a single argument, namely the predicate (C a). But now we treat that as an ordinary argument, not part of the theta-type, so all is well.
Note [Compulsory newtype unfolding]¶
Newtype wrappers, just like workers, have compulsory unfoldings. This is needed so that two optimizations involving newtypes have the same effect whether a wrapper is present or not:
- Case-of-known constructor. See Note [beta-reduction in exprIsConApp_maybe].
- Matching against the map/coerce RULE. Suppose we have the RULE
{-# RULE "map/coerce" map coerce = ... #-}
As described in Note [Getting the map/coerce RULE to work],
the occurrence of 'coerce' is transformed into:
{-# RULE "map/coerce" forall (c :: T1 ~R# T2).
map ((\v -> v) `cast` c) = ... #-}
We'd like 'map Age' to match the LHS. For this to happen, Age
must be unfolded, otherwise we'll be stuck. This is tested in T16208.
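A small source-level sketch of the situation the map/coerce RULE targets. The newtype `Age` is assumed here to wrap an `Int` (the note does not show its definition), and the function names are hypothetical. Whether the RULE actually fires depends on optimisation settings; the sketch only shows that both spellings compute the same result.

```haskell
import Data.Coerce (coerce)

-- Assumed definition of the 'Age' newtype mentioned above.
newtype Age = Age Int
  deriving (Eq, Show)

-- Mapping the newtype constructor over a list. With optimisation on,
-- the map/coerce RULE aims to turn this into a plain coercion.
agesViaMap :: [Int] -> [Age]
agesViaMap = map Age

-- The same conversion written directly with 'coerce'.
agesViaCoerce :: [Int] -> [Age]
agesViaCoerce = coerce
```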
Note [Inline partially-applied constructor wrappers]¶
We allow the wrapper to inline when partially applied to avoid boxing values unnecessarily. For example, consider
data Foo a = Foo !Int a
instance Traversable Foo where
  traverse f (Foo i a) = Foo i <$> f a
This desugars to
traverse f foo = case foo of
  Foo i# a -> let i = I# i#
              in map ($WFoo i) (f a)
If the wrapper $WFoo is not inlined, we get a fruitless reboxing of i. But if we inline the wrapper, we get
map (\a. case i of I# i# -> Foo i# a) (f a)
and now case-of-known-constructor eliminates the redundant allocation.
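For reference, here is a complete, compilable version of the `Foo` example. The `Functor` and `Foldable` instances are added only because `Traversable` requires them, and `keepPositive` is a hypothetical helper used to exercise `traverse`; none of this shows the Core-level inlining itself.

```haskell
-- The strict-field type from the example above.
data Foo a = Foo !Int a
  deriving (Eq, Show)

instance Functor Foo where
  fmap f (Foo i a) = Foo i (f a)

instance Foldable Foo where
  foldr f z (Foo _ a) = f a z

instance Traversable Foo where
  -- 'Foo i' is the partially applied constructor discussed above.
  traverse f (Foo i a) = Foo i <$> f a

-- Hypothetical helper: keeps the payload only when it is positive.
keepPositive :: Foo Int -> Maybe (Foo Int)
keepPositive = traverse (\x -> if x > 0 then Just x else Nothing)
```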
Note [Activation for data constructor wrappers]¶
The Activation on a data constructor wrapper allows it to inline only in Phase 0. This way rules have a chance to fire if they mention a data constructor on the left
RULE "foo" f (K a b) = …
Since the LHS of rules are simplified with InitialPhase, we won’t inline the wrapper on the LHS either.
On the other hand, this means that exprIsConApp_maybe must be able to deal with wrappers so that case-of-constructor is not delayed; see Note [exprIsConApp_maybe on data constructors with wrappers] for details.
It used to activate in phases 2 (afterInitial) and later, but that made it awkward to write a RULE[1] with a constructor on the left: it would work if a constructor has no wrapper, but whether a constructor has a wrapper depends, for instance, on the order of the type arguments of that constructor. Therefore changing the order of type arguments could make previously working RULEs fail.
Note [Bangs on imported data constructors]¶
We pass Maybe [HsImplBang] to mkDataConRep to make use of HsImplBangs from imported modules.
- Nothing <=> use HsSrcBangs
- Just bangs <=> use HsImplBangs
For imported types we can’t work it all out from the HsSrcBangs, because we want to be very sure to follow what the original module (where the data type was declared) decided, and that depends on what flags were enabled when it was compiled. So we record the decisions in the interface file.
The HsImplBangs passed are in 1-1 correspondence with the dataConOrigArgTys of the DataCon.
Note [Data con wrappers and unlifted types]¶
Consider
data T = MkT !Int#
We certainly do not want to make a wrapper
$WMkT x = case x of y { DEFAULT -> MkT y }
For a start, it’s silly to generate a no-op. But worse, since wrappers are currently injected at TidyCore, we don’t even optimise it away! So the stupid case expression stays there. This actually happened for the Integer data type (see #1600 comment:66)!
Note [Data con wrappers and GADT syntax]¶
Consider these two very similar data types:
data T1 a b = MkT1 b
data T2 a b where
  MkT2 :: forall b a. b -> T2 a b
Despite their similar appearance, T2 will have a data con wrapper but T1 will not. What sets them apart? The types of their constructors, which are:
MkT1 :: forall a b. b -> T1 a b
MkT2 :: forall b a. b -> T2 a b
MkT2’s use of GADT syntax allows it to permute the order in which a and b would normally appear. See Note [DataCon user type variable binders] in DataCon for further discussion on this topic.
The worker data cons for T1 and T2, however, both have types such that a is expected to come before b as arguments. Because MkT2 permutes this order, it needs a data con wrapper to swizzle around the type variables to be in the order the worker expects.
A somewhat surprising consequence of this is that newtypes can have data con wrappers! After all, a newtype can also be written with GADT syntax:
newtype T3 a b where
  MkT3 :: forall b a. b -> T3 a b
Again, this needs a wrapper data con to reorder the type variables. It does mean that this newtype constructor requires another level of indirection when being called, but the inliner should make swift work of that.
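The permuted quantification order is observable from source code through visible type application. The following sketch reuses T1 and T2 from above; the values `t1`, `t2` and the accessors `unT1`, `unT2` are hypothetical names added for illustration.

```haskell
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeApplications #-}

data T1 a b = MkT1 b

data T2 a b where
  MkT2 :: forall b a. b -> T2 a b

-- MkT1 quantifies a first, then b.
t1 :: T1 Bool Int
t1 = MkT1 @Bool @Int 3

-- MkT2 quantifies b first, then a, so the type arguments are swapped.
t2 :: T2 Bool Int
t2 = MkT2 @Int @Bool 3

-- Hypothetical accessors used to inspect the payloads.
unT1 :: T1 a b -> b
unT1 (MkT1 x) = x

unT2 :: T2 a b -> b
unT2 (MkT2 x) = x
```

Both values have the same result type, `T Bool Int` up to the tycon, even though the explicit type arguments are given in opposite orders.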
Note [HsImplBangs for newtypes]¶
Most of the time, we use the dataConSrcToImplBang function to decide what strictness/unpackedness to use for the fields of a data type constructor. But there is an exception to this rule: newtype constructors. You might not think that newtypes would pose a challenge, since newtypes are seemingly forbidden from having strictness annotations in the first place. But consider this (from #16141):
{-# LANGUAGE StrictData #-}
{-# OPTIONS_GHC -O #-}
newtype T a b where
  MkT :: forall b a. Int -> T a b
Because StrictData (plus optimization) is enabled, invoking dataConSrcToImplBang would sneak in and unpack the field of type Int to Int#! This would be disastrous, since the wrapper for MkT uses a coercion involving Int, not Int#.
Bottom line: dataConSrcToImplBang should never be invoked for newtypes. In the case of a newtype constructor, we simply hardcode its dcr_bangs field to [HsLazy].
Note [Unpacking GADTs and existentials]¶
There is nothing stopping us unpacking a data type with equality components, like
data Equal a b where
  Equal :: Equal a a
And it’d be fine to unpack a product type with existential components too, but that would require a bit more plumbing, so currently we don’t.
So for now we require: null (dataConExTyCoVars data_con). See #14978.
Note [Unpack one-wide fields]¶
The flag UnboxSmallStrictFields ensures that any field that can (safely) be unboxed to a word-sized unboxed field, should be so unboxed. For example:
data A = A Int#
newtype B = B A
data C = C !B
data D = D !C
data E = E !()
data F = F !D
data G = G !F !F
All of these should have an Int# as their representation, except G which should have two Int#s.
However
data T = T !(S Int)
data S a = S !a
Here we can represent T with an Int#.
Note [Recursive unboxing]¶
Consider
data R = MkR {-# UNPACK #-} !S Int
data S = MkS {-# UNPACK #-} !Int
The representation arguments of MkR are the representation arguments of S (plus Int); the rep args of MkS are Int#. This is all fine.
But be careful not to try to unbox this!
data T = MkT {-# UNPACK #-} !T Int
Because then we’d get an infinite number of arguments.
Here is a more complicated case:
data S = MkS {-# UNPACK #-} !T Int
data T = MkT {-# UNPACK #-} !S Int
Each of S and T must decide independently whether to unpack and they had better not both say yes. So they must both say no.
Also behave conservatively when there is no UNPACK pragma
data T = MkS !T Int
with -funbox-strict-fields or -funbox-small-strict-fields we need to behave as if there was an UNPACK pragma there.
But it’s the argument type that matters. This is fine:
data S = MkS S !Int
because Int is non-recursive.
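The first, non-recursive case can be written out as ordinary source code. A minimal sketch: the function `sumR` is a hypothetical helper, and whether the fields are actually unpacked depends on optimisation being enabled, which this snippet does not check.

```haskell
-- Non-recursive: the Int# inside S can be stored directly in MkR.
data S = MkS {-# UNPACK #-} !Int
data R = MkR {-# UNPACK #-} !S Int

-- Hypothetical helper: pattern matching is unaffected by unpacking;
-- the source-level view of the constructors stays the same.
sumR :: R -> Int
sumR (MkR (MkS x) y) = x + y
```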
Note [Dict funs and default methods]¶
Dict funs and default methods are not ImplicitIds. Their definition involves user-written code, so we can’t figure out their strictness etc based on fixed info, as we can for constructors and record selectors (say).
NB: See also Note [Exported LocalIds] in Id
Note [Unsafe coerce magic]¶
We define a primitive
GHC.Prim.unsafeCoerce#
and then in the base library we define the ordinary function
Unsafe.Coerce.unsafeCoerce :: forall (a:*) (b:*). a -> b
unsafeCoerce x = unsafeCoerce# x
Notice that unsafeCoerce has a civilized (albeit still dangerous) polymorphic type, whose type args have kind *. So you can’t use it on unboxed values (unsafeCoerce 3#).
In contrast unsafeCoerce# is even more dangerous because you can use it on unboxed things, (unsafeCoerce# 3#) :: Int. Its type is
forall (r1 :: RuntimeRep) (r2 :: RuntimeRep) (a: TYPE r1) (b: TYPE r2). a -> b
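A short illustration of the lifted variant, unsafeCoerce, on boxed values. The newtype `Age` and the function `ageToInt` are hypothetical; the use is sound here only because a newtype shares the runtime representation of its underlying type, and in real code `coerce` from Data.Coerce would be the safe choice.

```haskell
import Unsafe.Coerce (unsafeCoerce)

-- Hypothetical newtype with the same runtime representation as Int.
newtype Age = Age Int

-- Both type arguments have kind *, so this type-checks; nothing here
-- verifies that the conversion is actually meaningful.
ageToInt :: Age -> Int
ageToInt = unsafeCoerce
```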
Note [seqId magic]¶
‘GHC.Prim.seq’ is special in several ways.
(a) In source Haskell its second arg can have an unboxed type
x `seq` (v +# w)
But see Note [Typing rule for seq] in TcExpr, which explains why we give seq itself an ordinary type
seq :: forall a b. a -> b -> b
and treat it as a language construct from a typing point of view.
(b) Its fixity is set in LoadIface.ghcPrimIface.
(c) It has quite a bit of desugaring magic. See DsUtils.hs Note [Desugaring seq (1)] and (2) and (3).
(d) There is some special rule handling: see Note [User-defined RULES for seq].
Note [User-defined RULES for seq]¶
Roman found situations where he had
case (f n) of _ -> e
where he knew that f (which was strict in n) would terminate if n did. Notice that the result of (f n) is discarded. So it makes sense to transform to
case n of _ -> e
Rather than attempt some general analysis to support this, I’ve added enough support that you can do this using a rewrite rule:
RULE "f/seq" forall n. seq (f n) = seq n
You write that rule. When GHC sees a case expression that discards its result, it mentally transforms it to a call to ‘seq’ and looks for a RULE. (This is done in Simplify.trySeqRules.) As usual, the correctness of the rule is up to you.
VERY IMPORTANT: to make this work, we give the RULE an arity of 1, not 2. If we wrote
RULE "f/seq" forall n e. seq (f n) e = seq n e
with rule arity 2, then two bad things would happen:
- The magical desugaring done in Note [seqId magic] item (c) for saturated application of ‘seq’ would turn the LHS into a case expression!
- The code in Simplify.rebuildCase would need to actually supply the value argument, which turns out to be awkward.
Note [lazyId magic]¶
lazy :: forall a?. a? -> a? (i.e. works for unboxed types too)
‘lazy’ is used to make sure that a sub-expression, and its free variables, are truly used call-by-need, with no code motion. Key examples:
- pseq: pseq a b = a `seq` lazy b
  We want to make sure that the free vars of ‘b’ are not evaluated before ‘a’, even though the expression is plainly strict in ‘b’.
- catch: catch a b = catch# (lazy a) b
  Again, it’s clear that ‘a’ will be evaluated strictly (and indeed applied to a state token) but we want to make sure that any exceptions arising from the evaluation of ‘a’ are caught by the catch (see #11555).
Implementing ‘lazy’ is a bit tricky:
It must not have a strictness signature: by being a built-in Id, all the info about lazyId comes from here, not from GHC.Base.hi. This is important, because otherwise the strictness analyser would spot it as strict!
It must not have an unfolding: it gets “inlined” by a HACK in CorePrep. It’s very important to do this inlining after unfoldings are exposed in the interface file. Otherwise, the unfolding for (say) pseq in the interface file will not mention ‘lazy’, so if we inline ‘pseq’ we’ll totally miss the very thing that ‘lazy’ was there for in the first place. See #3259 for a real world example.
Suppose CorePrep sees (catch# (lazy e) b). At all costs we must avoid using call by value here:
case e of r -> catch# r b
Avoiding that is the whole point of ‘lazy’. So in CorePrep (which generates the ‘case’ expression for a call-by-value call) we must spot the ‘lazy’ on the arg (in CorePrep.cpeApp), and build a ‘let’ instead.
lazyId is defined in GHC.Base, so we don’t have to inline it. If it appears un-applied, we’ll end up just calling it.
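The pseq definition quoted above can be reproduced in user code, since `lazy` is exported from GHC.Exts. This is only a sketch: `myPseq` is a hypothetical name, and the snippet shows the shape of the definition rather than demonstrating the evaluation-order guarantee, which cannot be observed with pure values like these.

```haskell
import GHC.Exts (lazy)

-- Hypothetical re-implementation following the definition in the note:
-- evaluate 'a' first, and keep 'b' behind 'lazy' so that the strictness
-- analyser does not treat the whole expression as strict in 'b'.
myPseq :: a -> b -> b
myPseq a b = a `seq` lazy b
```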
Note [noinlineId magic]¶
noinline :: forall a. a -> a
‘noinline’ is used to make sure that a function f is never inlined, e.g., as in ‘noinline f x’. Ordinarily, the identity function with NOINLINE could be used to achieve this effect; however, this has the unfortunate result of leaving a (useless) call to noinline at runtime. So we have a little bit of magic to optimize away ‘noinline’ after we are done running the simplifier.
‘noinline’ needs to be wired-in because it gets inserted automatically when we serialize an expression to the interface format. See Note [Inlining and hs-boot files] in ToIface
Note [The oneShot function]¶
In the context of making left-folds fuse somewhat okish (see ticket #7994 and Note [Left folds via right fold]) it was determined that it would be useful if library authors could explicitly tell the compiler that a certain lambda is called at most once. The oneShot function allows that.
‘oneShot’ is levity-polymorphic, i.e. the type variables can refer to unlifted types as well (#10744); e.g.
oneShot (\x:Int# -> x +# 1#)
Like most magic functions it has a compulsory unfolding, so there is no need for a real definition somewhere. We have one in GHC.Magic for the convenience of putting the documentation there.
It uses setOneShotLambda on the lambda’s binder. That is the whole magic:
A typical call looks like
oneShot (\y. e)
after unfolding the definition oneShot = \f \x[oneshot]. f x we get
(\f \x[oneshot]. f x) (\y. e)
--> \x[oneshot]. ((\y.e) x)
--> \x[oneshot] e[x/y]
which is what we want.
It is only effective if the one-shot info survives as long as possible; in particular it must make it into the interface in unfoldings. See Note [Preserve OneShotInfo] in CoreTidy.
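A sketch of the left-fold-via-right-fold pattern that motivated oneShot, using the version exported from GHC.Exts. The function `sumViaFoldr` is a hypothetical example, not code from the libraries; it shows where a library author might place the annotation, and makes no claim about the code GHC generates for it.

```haskell
import GHC.Exts (oneShot)

-- A left fold (summation) expressed through foldr. Each accumulator
-- lambda is entered at most once, which is what oneShot asserts.
sumViaFoldr :: [Int] -> Int
sumViaFoldr xs = foldr (\x k -> oneShot (\acc -> k (acc + x))) id xs 0
```

Semantically, oneShot is the identity on functions, so removing it leaves the result unchanged.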
Note [magicDictId magic]¶
The identifier magicDict is just a place-holder, which is used to implement a primitive that we cannot define in Haskell but we can write in Core. It is declared with a place-holder type:
magicDict :: forall a. a
The intention is that the identifier will be used in a very specific way, to create dictionaries for classes with a single method. Consider a class like this:
class C a where
  f :: T a
We are going to use magicDict, in conjunction with a built-in Prelude rule, to cast values of type T a into dictionaries for C a. To do this, we define a function like this in the library:
data WrapC a b = WrapC (C a => Proxy a -> b)
withT :: (C a => Proxy a -> b)
      -> T a -> Proxy a -> b
withT f x y = magicDict (WrapC f) x y
The purpose of WrapC is to avoid having f instantiated. Also, it avoids impredicativity, because magicDict’s type cannot be instantiated with a forall. The field of WrapC contains a Proxy parameter which is used to link the type of the constraint, C a, with the type of the Wrap value being made.
Next, we add a built-in Prelude rule (see prelude/PrelRules.hs), which will replace the RHS of this definition with the appropriate definition in Core. The rewrite rule works as follows:
magicDict @t (wrap @a @b f) x y
---->
f (x `cast` co a) y
The co coercion is the newtype-coercion extracted from the type-class. The type class is obtained by looking at the type of wrap.
voidArgId is a Local Id used simply as an argument in functions where we just want an arg to avoid having a thunk of unlifted type. E.g.
x = \ void :: Void# -> (# p, q #)
This comes up in strictness analysis
Note [evaldUnfoldings]¶
The evaldUnfolding makes it look as if some primitive value is evaluated, which in turn makes Simplify.interestingArg return True, which in turn makes INLINE things applied to said value likely to be inlined.