Unlifted Newtypes¶
GHC 8.0 introduced a more sane way to talk about the kind of unlifted types,
levity polymorphism. Following this, the kind of unboxed tuples and sums was
revised. Many of the unlifted kinds have a finite number of inhabitants. For
example, TYPE 'IntRep
is only inhabited by Int#
. This proposal provides
a way to create additional types that inhabit the unlifted kinds. With the
UnliftedNewtypes
language pragma, the existing newtype
construct would
begin to accept types of unlifted kinds. GHC currently rejects the following
definition:
newtype PersonId = PersonIdConstructor { getPersonId :: Int# }
With UnliftedNewtypes
, this definition would be accepted. The kinds and types
of the generated types and functions would be:
PersonId :: TYPE 'IntRep
PersonIdConstructor :: Int# -> PersonId
getPersonId :: PersonId -> Int#
In GHC Core, the two function shown above would be implemented as casts, just as they currently are with lifted newtypes.
Motivation: Fewer Allocations on Decode¶
Consider a data type for representing an interval. It may look like this:
data Interval = Interval {-# UNPACK #-} !Int {-# UNPACK #-} !Int
The first argument is the lower bound, and the second argument is the upper bound. An interval in which the lower bound is higher than the upper bound is meaningless. We may wish to make such intervals unrepresentable by hiding the data constructor and exporting functions guaranteed to preserve this invariant:
-- | The arguments of 'smartInterval' are ordered such that
-- the smaller one comes first in the interval.
smartInterval :: Int -> Int -> Interval
inInterval :: Interval -> Int -> Bool
translateIntervalByOne :: Interval -> Interval
Assume that we have a ByteArray
of packed Interval
. We will
use a simple packing strategy in which each interval takes up two
machine words, one for the lower bound and one for the upper bound. If we
need to map over all the intervals, with translateInterval
, we
have to pay a nursery allocation for each of these:
mapIntervals :: (Interval -> Interval) -> ByteArray -> ByteArray
shiftFive :: ByteArray -> ByteArray
shiftFive arr = mapIntervals translateIntervalByOne arr
We could have avoided this by instead defining everything to work on the equivalent unboxed tuple of unboxed ints:
type Interval = (# Int#, Int# #)
Now, mapIntervals
can be called without paying an allocation for
every element in the array. But since Interval
is now a type alias,
we are no longer able to hide its internals. Users can easily circumvent
the guarantee the API was originally supposed to provide. With
UnliftedNewtypes
, we can get the best of both worlds. We can define
Interval
as:
newtype Interval = Interval (# Int#, Int# #)
We can then hide the data constructor as we did in the first example.
However, we can also have a non-allocating implementation of
mapIntervals
. This gives us the best of both worlds.
Motivation: Typed Unlifted Pointers¶
Functions that allocate memory often take a callback argument that uses
the pointer. Consider alloca
from Foreign.Marshal.Alloc
:
alloca :: Storable a => (Ptr a -> IO b) -> IO b
The callback takes a lifted argument. This means that if alloca
(or a similar function) is not inlined, the function passed to it
will end up being given a boxed argument at runtime. Most functions
that take a pointer as an argument are strict in that argument.
Typically, such functions have the worker wrapper transformation
applied to them, and the wrapper is inlined into the call site
to eliminate the boxing. However, when the function is passed
as an argument, this does not (and cannot) work.
It would be more performant manually unbox the argument:
alloca :: Storable a => (Addr# -> IO b) -> IO b
But now we have lost our phantom a
type variable. With UnliftedNewtypes
,
we could instead write:
newtype Ptr# a = Ptr# Addr#
alloca :: Storable a => (Ptr# a -> IO b) -> IO b
And now we have a variant of alloca
that preseves the phantom
type variable without needlessly boxing the pointer.
Motivation: Typed Unlifted Arrays as a Library¶
Currently, ArrayArray#
offers an unsafe interface that does not keep track
of the element type. This problem, as well as a proposed solution, is described
in greater detail on the GHC issue tracker (See this issue). Alternatively, the
primitive package offers a typeclass-based solution. If we ignore the
PrimMonad
machinery and specialize to ST
, the interface looks
like this:
data UnliftedArray e
data MutableUnliftedArray s e
class PrimUnlifted a
instance PrimUnlifted ByteArray
instance PrimUnlifted (Array a)
instance PrimUnlifted (MutableByteArray s)
instance PrimUnlifted (MutableArray s a)
indexUnliftedArray :: PrimUnlifted a => UnliftedArray a -> Int -> a
readUnliftedArray :: PrimUnlifted a => MutableUnliftedArray s a -> Int -> ST s a
writeUnliftedArray :: PrimUnlifted a => MutableUnliftedArray s a -> Int -> a -> ST s ()
However, typeclasses are not guaranteed to specialize. Users working with a
function built on top of these PrimUnlifted
functions need to be
careful to ensure that specialization happens. Consider a function
like:
-- | The first array is a list of target indices as machine integers.
-- The length of the first argument must be the length of the second
-- argument times the size in bytes of a machine integer.
shuffleUnliftedArray :: PrimUnlifted a => ByteArray -> UnliftedArray a -> UnliftedArray a
Maybe this function is defined in such a way that it can be inlined
and subsequently specialized, or maybe we could add a SPECIALIZE
pragma
to it. But it’s madness that we even have to worry about this. All of the
PrimUnlifted
dictionaries are just unsafeCoerce
(check the source
code). Specializations of shuffleUnliftedArray
are all going to end
up being the same exact code. In this case, it isn’t a big deal since
the implementation of shuffleUnliftedArray
is probably short, but
if the function were larger, this would needlessly bloat the executable.
The solution in the aforementioned GHC issue is a more strongly typed interface to arrays of unlifted things:
data UnliftedArray# (a :: TYPE 'UnliftedRep)
data MutableUnliftedArray# s (a :: TYPE 'UnliftedRep)
indexUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). UnliftedArray# a -> Int# -> a
writeUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). MutableUnliftedArray# s a -> Int# -> a -> State# s -> State# s
readUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). MutableUnliftedArray# s a -> Int# -> State# s -> (# State# s, a #)
unsafeFreezeUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). MutableUnliftedArray# s a -> State# s -> (#State# s, UnliftedArray# a#)
newUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). Int# -> a -> State# s -> (# State# s, MutableUnliftedArray# s a #)
Notice that the type signature of shuffleUnliftedArray#
under this scheme
would not have any typeclass constraints:
shuffleUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). ByteArray# -> UnliftedArray# a -> UnliftedArray# a
However, adding these functions requires modifying GHC and adding
more primops. With UnliftedNewtypes
, this interface can be implemented from
the existing ArrayArray#
interface without modifying GHC:
newtype UnliftedArray# (a :: TYPE 'UnliftedRep) = UnliftedArray# ArrayArray#
newtype MutableUnliftedArray# s (a :: TYPE 'UnliftedRep) = MutableUnliftedArray# (MutableArrayArray# s)
indexUnliftedArray# :: forall (a :: TYPE 'UnliftedRep). UnliftedArray# a -> Int# -> a
indexUnliftedArray# (UnliftedArray# a) i = unsafeCoerce# (indexArrayArrayArray# a i)
The data constructors of UnliftedArray#
and MutableUnliftedArray#
could
be hidden to prevent the user from unsafely casting elements.
Proposal Change Specification¶
The restriction that a newtype
wrap a type of kind TYPE LiftedRep
would be dropped. It would be replaced by a restriction that the newtype
must kind something of kind TYPE (r :: RuntimeRep)
. This proposal
does not include the ability for a newtype
to wrap a Constraint
.
This does not require any additions to the language’s grammar.
This proposal would allow a levity-polymorphic type variable to appear inside a newtype. Such appearances are currently forbidden (and would remain forbidden) in data constructors, since they violate the levity-polymorphism binder rule. However, newtype constructors and pattern matches become casts. Consider:
newtype Id# (r :: RuntimeRep) (a :: TYPE r) = IdC# a
The calling convetion for the IdC#
data constructor does not depend on
r
, so code generation is still possible. All other restrictions around
levity polymorphism are still in place, so the following would be rejected:
bad :: forall (r :: RuntimeRep) (a :: TYPE r). (a -> a -> Bool) -> Id# r a -> Id# r a -> Bool
bad f (IdC# a) (IdC# b) = f a b
However, this would be accepted:
good :: forall (a :: TYPE IntRep). (a -> a -> Bool) -> Id# IntRep a -> Id# IntRep a -> Bool
good f (IdC# a) (IdC# b) = f a b
If the user does not specify the kind of an unlifted newtype with GADT syntax, the kind should be inferred. Newtype that are recursive or mutually recursive in a way that make them uninhabited will be inferred to have lifted runtime representation. For example:
newtype Foo = Foo Foo
newtype Baz = Baz Tor
newtype Tor = Tor Baz
All three of the above types are currently happily accepted by GHC, and
with UnliftedNewtypes
, they will remain accepted with the same kind
that they already had. Here are the same three types defined
using GADT syntax to illustrate what the inferred kind would be:
newtype Foo :: TYPE 'LiftedRep where
Foo :: Foo -> Foo
newtype Baz :: TYPE 'LiftedRep where
Baz :: Tor -> Baz
newtype Tor :: TYPE 'LiftedRep where
Tor :: Baz -> Tor
If the user wanted the levity-polymorphic variant of the uninhabited newtype, they could write:
newtype Bar :: TYPE r where
Bar :: Bar -> Bar
Recursion in the presence of a changing runtime representation should be rejected. For example:
newtype Recurse = Recurse (# Int#, Recurse #)
newtype Sneak = Sneak (# Sneak #)
Both of these types are ill-kinded, as their kinds would involve an
infinite nested of TupleRep
. The inferred kinds would be:
Recurse :: TYPE (TupleRep [IntRep, TupleRep [IntRep, TupleRep …]]) Sneak :: TYPE (TupleRep [TupleRep [TupleRep …]])
Just as terms cannot have infinite types, types cannot have infinite kinds. This is only a problem when a recursion of unlifted types is involved. To illustrate the issue further:
newtype BadA = BadA (# Word#, BadB #)
newtype BadB = BadB (# Word#, BadA #)
newtype GoodA = GoodA (# Word#, GoodB #)
newtype GoodB = GoodB (Word#, GoodA)
The types BadA
and BadB
and ill-kinded and should be rejected.
However, GoodA
and GoodB
are well-kinded, and the kinds can
be inferred. More generally, if an unlifted newtype is well-kinded, then its kind
should always be inferrable.
Effects and Interactions¶
Generalized Newtype Deriving: The interaction with GND is straigtforward. Since typeclasses (since GHC 8.0) can accept unlifted types (or even levity-polymorphic types), GND should work exactly for an unlifted newtype as it does on a lifted newtype.
GADT Syntax: It is currently possible, although uncommon in practice, to use GADT syntax with newtypes. With newtypes, GADT-like analysis of the type variable is never allowed. The following is an example of a newtype using GADT syntax:
newtype Foo :: Type -> Type where
FooC :: a -> Foo a
Unlifted newtypes should be allowed to use GADT syntax as well. The only way this
differs from the status quo, is that kinds other than Type
all now allowed
to the right of the final arrow. All of the following should be accepted:
newtype PersonId :: TYPE 'IntRep where
PersonId :: Int# -> PersonId
newtype Id :: TYPE rep -> TYPE rep where
Id :: a -> Id a
newtype Pair# :: TYPE rep -> TYPE rep' -> TYPE (TupleRep '[rep, rep']) where
Pair# :: (# a, b #) -> Pair# a b
newtype Maybe# (a :: TYPE r) :: TYPE (SumRep '[r, TupleRep '[]]) where
Maybe# :: (# a | (# #) #) -> Maybe# a
Coercible: Both ~R#
and the Coercible
typeclass are already
levity polymorphic. However, the function coerce
is not. This proposal
requires that coerce
become levity polymorphic.
Type Classes in Base: This proposal does not change any type classes
in base
or in any of the core libraries. Making typeclasses like Num
levity-polymorphic would help a little with overloading, but no one has
measure what the impact of such a change would be on error message clarity.
Discussion of this issue is best had on the Levity Polymorphic Type Classes
proposal.
Data Families: Data families currently do not allow unlifted return kinds. This means that the following is rejected by the compiler:
data family Foo (a :: Type) :: TYPE 'IntRep
Under this proposal, this restriction would be lifted, not only in modules
where UnliftedTuples
is enabled, but everywhere. Although defining
the data families itself would not require the extension, defining
instances would. Instances could be defined with newtype instance
:
newtype instance Foo Bool = FooBoolConstructor Int#
newtype instance Foo (Maybe a) = FooIntConstructor Int#
Lazy unboxed tuples / Warn on unbanged strict patterns: This proposal, currently still under discussion, suggests tweaking the strictness of unboxed tuple patterns. Regardless of whether that proposal is accepted, a variant of it is accepted, or it is rejected, there is a simple rule for determining the strictness of an unboxed newtype pattern. It should agree with the strictness of an equivalent unboxed one-tuple pattern. For example suppose we have:
bar = ()
where
foo :: Bool
(# (# 3#, foo #) #) = undefined
newtype Wrap = Wrap (# Int#, Bool #)
baz = ()
where
foo :: Bool
Wrap (# 3#, foo #) = undefined
If bar
throws an exception, then baz
should too. If it doesn’t,
then neither should baz
.
Backpack: Since GHC 8.4, backpack allows module signatures with type declarations of unlifted kinds. For example:
signature NumberUnknown where
import GHC.Types
data Rep :: RuntimeRep
data Number :: TYPE Rep
plus :: Number -> Number -> Number
Currently, these type can only be implemented by a type synonym, not by a data declaration. Edward Yang discusses this in a comment on the aforementioned issue. This proposal would lift this restriction.
Costs and Drawbacks¶
Currently, all unlifted types have a hash appended to their name (Array#
,
Int#
, etc.). This happened because (1) GHC adopted this naming
convention and (2) no one had any way to define new unlifted types.
Since this proposal eliminates (2), users lose their easy visual cue
for knowing if a type is unlifted.
To the author’s understanding (which is not great), the implementation is not complicated and will be a comparitively small burden on maintainers.
Alternatives¶
Unlifted newtypes are briefly mentioned in the much further-reaching
unlifted data types proposal. One alternative would be to wait for
a full implementation of unlifted data types. Then a single LANGUAGE
pragma would enable both unlifted newtypes and unlifted data types.
The drawback of this is that the design of unlifted data types is
non-trivial, and their is no agreement on what they should actually
look like. Additionally, the implementation would be more
complicated than an implementation that only allowed unlifted
newtypes.
Alternatively, we could take a step in the other direction and simplify this proposal. Disallowing levity-polymorphic newtypes might make this easier to implement. Most of what this proposal has to offer comes from the ability to work with an unlifted type whose data constructor is hidden, and restricting users to the realm of the levity-monomorphic does not take away from this.
Concerns¶
Currently, haddock does not indicate the kind of data types. For an unboxed newtype, this would be desirable. Otherwise, from a cursory scan of a library’s docs, it would be easy to miss that a data type is unlifted (and consequently cannot be used in most polymorphic functions).
Implementation¶
I do not have sufficient knowledge of GHC to implement this. I welcome anyone else to implement it, or if it’s approved and enough time goes by, I may try to figure out how to implement it.
Richard Eisenburg has indicated that he might be interested in implementing the proposal.