| ===================== |
| Cogent Surface Syntax |
| ===================== |
| |
| .. note:: For new users of this language, this section is not intended to be a |
| tutorial for the language. It merely introduces its surface syntax and |
| should be used as a reference while using the language. |
| |
| .. note:: As described in `#324 <https://github.com/NICTA/cogent/issues/324>`_, |
| we are in the progress of revising the surface sytnax. So it's subject |
| to change in near future. |
| |
| |
| Types |
| ===== |
| |
| All type names must begin with an uppercase letter. Tuple, Function, |
| Record and Variant types have special syntax. |
| |
| Record types |
| ------------ |
| |
| Now indicated by a series of typing declarations inside braces, e.g |
| |
| :: |
| |
| {x : A, y : B} |
| |
| If a field ``f`` is removed by ``take``, the postfix type operator |
| ``take f`` may be used: |
| |
| :: |
| |
| {x : A, y : B} take x |
| |
| This works, even if the type on the LHS is a type synonym (c.f. :ref:`typedefs`). |
| |
| Sugar: Taking multiple fields: |
| |
| :: |
| |
| {x : A, y : B} take (x,y) -- parens mandatory |
| |
| :: |
| |
| {x : A, y : B} take (..) -- parens mandatory |
| |
| This ``(..)`` sugar means to take all currently untaken fields (if any). |
| |
| Similarly, we can write ``put`` instead of ``take`` with the same |
| syntax. ``put`` is dual to ``take``. One common usage would be |
| |
| :: |
| |
| ({x : A, y : B, z : C, u : D, w : E} take (..)) put x |
| |
| Note that the parentheses around the record and first ``take`` is mandatory. |
| Arbitrary many levels of nestings is possible. |
| |
| Unboxed records |
| --------------- |
| |
| Unboxed records are pretty much like regular records, except that their |
| wrappers (i.e. the unboxed container) are lightweight that no allocation |
| is required. |
| |
| The syntax is simple, just a prefixing ``#`` on its counterpart record |
| type. ``take``, ``put`` operations (and punning) and member extraction |
| are of exactly the same syntax as regular records. As a consequence of |
| its lightweightness, we can construct and deconstruct (by means of |
| pattern matching) unboxed records, just as what we do on tuples (see |
| :ref:`product-types` for tuples). |
| |
| E.g., |
| |
| :: |
| |
| #{x : A, y : B} -- type |
| #{x = e1, y = e2} -- expression or pattern |
| a.x -- member |
| a {x = f} -- take and put, same as regular records |
| |
| About its *permission* rules (see :ref:`permissions`): |
| the wrapper per se is non-linear (i.e. shareable |
| and discardable). If |
| any linear fields are present (untaken), then the unboxed record becomes |
| linear. Member operation can only be used if there're no linear |
| fields inside, or the unboxed record is banged (``!``). |
| |
| Unboxed abstract types |
| ---------------------- |
| |
| The ``#`` sigil is not used in the declaration of an unboxed abstract |
| type. Instead, we attach the ``#`` sigil when the type is used. E.g.: |
| |
| :: |
| |
| type A |
| type T = {a : #A, b : A} |
| |
| In the above case, field ``a`` is unboxed, whereas ``b`` is not. When |
| generating C code, boxed abstract types will be pointers, unboxed are |
| not. It's the users' responsibility to keep the C code consistent, as |
| these types are abstract. |
| |
| Bang (``!``) |
| ------------ |
| |
| Record types and abstract types have *sigils*, but outwardly, a *Write* |
| sigil is just a plain record type / abstract type, and an *Observe* (or *Readonly*) sigil |
| would be indicated with a postfix bang, for example: ``{x : A, y : B}!``, ``C!``. |
| |
| The postfix bang operator can be applied to any type, which converts all |
| write sigils to readonly sigils internally (and any type variables to |
| readonly type variables). |
| |
| To bang a parametric type, the type must be surrounded by parentheses, |
| otherwise ``!`` will be applied to the type parameter right before the |
| ``!`` symbol. |
| |
| .. _product-types: |
| |
| Product Types |
| ------------- |
| |
| Nameless, unboxed tuples may be used everywhere you like. Their syntax |
| is identical to Haskell. A unit type (which is freely discardable and |
| not linear) also exists, ``()`` with a value ``()``. Tuples are not |
| associative, namely ``(a, b, c, d) /= (a, (b, (c, d)))`` and |
| ``(a, b, c, d) /= (((a, b), c), d)``, just as in Haskell. |
| |
| Variant Types |
| ------------- |
| |
| Variant types look like this. |
| |
| :: |
| |
| < Ok (Obj, Heap) | Fail Heap > |
| |
| They can be constructed using ``Ok (obj,h)``, or ``Fail e``. |
| |
| We can determine from context if ``Identifier`` (``Ok`` and ``Fail`` in |
| the example above) is a type or a data constructor, much like Haskell. |
| We will have to do a little bit of inference to determine which variant |
| type the thing actually belongs to. |
| |
| They can have as many alternatives as you like, and there is no |
| restriction on what goes inside a variant type. Each alternative can |
| take any number of arguments (0 or more). They will be desugared to |
| tuples of all the arguments. E.g.: |
| |
| :: |
| |
| <Ok Obj U8 | Fail > -- equiv to <Ok (Obj, U8) | Fail ()> |
| |
| Polymorphic types |
| ----------------- |
| |
| Types can contain variables. Functions may be declared as having |
| polymorphic type. |
| |
| :: |
| |
| size : all (k, v). Map k v -> U32 |
| |
| Monomorphic functions are first class, but to get a monomorphic function |
| from a polymorphic function requires instantiation, e.g ``length[Int]``. |
| Since Cogent-2.0.9, explicit type applications are not mandatory, |
| although in some cases they must be supplied to guide the type inference |
| engine. Type applications can be partial, or with type holes. For |
| example, ``foo [_, A, B]``; ``foo [S, T, _]`` is equivalent to ``foo [S,T]``. |
| |
| A type variable under observation (i.e ``let!``-ed) is annotated with a |
| prefix bang (e.g ``!a``) |
| |
| .. _permissions: |
| |
| Permissions |
| ~~~~~~~~~~~ |
| |
| *Permissions* (they used to be called "kinds") are provided for |
| polymorphic signatures as follows: |
| |
| :: |
| |
| length : all (a :< k). Array a -> U32 |
| |
| Permissions are internally a set of three booleans: whether or not the |
| type can be: |
| |
| - ``D`` for Discard (i.e by weakening) |
| - ``S`` for Share (i.e by contraction) |
| - ``E`` for Escape (i.e returned from ``let!``) |
| |
| The permission signature on a type variable is more like a constraint. |
| They are some combination of those three letters. If no permission |
| is provided, it is assumed that none of those permissions are required, |
| and the value will be linear and cannot escape a ``let!``. |
| |
| .. _typedefs: |
| |
| Type synonyms |
| ============= |
| |
| Type synonyms may be provided using the ``type`` keyword as follows: |
| |
| :: |
| |
| type X a b = { foo : a, bar : b, baz : Int } |
| |
| The type synonym ``X`` must always be fully saturated with its two |
| arguments wherever it is used, however. |
| |
| Abstract types (defined in C) may also be declared, and they also may |
| take parameters. This corresponds to a family of types in C. |
| |
| :: |
| |
| type Buffer |
| type Array a |
| |
| Constants and top-level definitions |
| =================================== |
| |
| Constants are mono-typed. |
| |
| :: |
| |
| abc : U8 |
| abc = 3 |
| |
| But the right hand side can be much more expressive now, with let |
| bindings and whatnot. We must be able to prevent users from doing |
| side-effects like allocation in the top-level---see :ref:`effects`. |
| |
| To make the syntax easier to parse, a function or constant's body must |
| be indented by at least one space. This means that any non-indented |
| bareword is the start of a new definition or signature. |
| |
| .. _effects: |
| |
| Effects |
| ======= |
| |
| Most effects are currently (successfully) modelled via linear types. For |
| allocation, Cogent does not know anything about it. Memory management |
| involves the heap. I propose modelling the heap as an explicit linear |
| value, just as with any other state. |
| |
| Allocation functions must take and return a linear heap, as they |
| modify it: |
| |
| :: |
| |
| allocateObj : Heap -> < Ok Obj Heap | Fail Heap > |
| |
| Special syntax for allocation functions and automating heap-threading |
| are nice to have, so I welcome proposals. |
| |
| |
| Expression language |
| =================== |
| |
| Matching and Error Handling |
| --------------------------- |
| |
| Matching may be accomplished by the following syntax: |
| |
| :: |
| |
| f : Heap -> < Ok Obj Heap | Fail Heap > |
| f h = allocateobj h |
| | Ok obj h => allocateobj h |
| | Ok obj' h => Ok (mergeObj (obj, obj')) h |
| | Fail h -> let () = free obj in Fail h |
| | Fail h -> Fail h |
| |
| This is an alignment-based syntax, grouping determined based on the |
| alignment of the bars. |
| |
| The rightward arrows for each case can either be ``=>`` or ``->``. |
| ``=>`` indicates that that branch is likely, to enable compiler |
| optimisations. ``~>`` can also be used to indicate an unlikely branch. |
| |
| A pattern may be ``_`` but only if the kind of the value allows it to be |
| discarded. |
| |
| Biased pattern matching |
| ----------------------- |
| |
| The syntax above poses a problem if many levels of nestings occur---you |
| will end up with cascading matches which indent a lot. To solve this |
| problem, we allow a syntax for early exit, which is inspired by `Idris <https://www.idris-lang.org/>`_. |
| The syntax looks like: |
| |
| :: |
| |
| f : Heap -> <Ok Obj Heap | Fail Heap> |
| f h = let Ok obj h <= allocateobj h |> Fail h -> Fail h |
| and Ok obj' h <= allocateobj h |> Fail h -> let _ = free obj in Fail h |
| in Ok (mergeObj (obj, obj')) h |
| |
| This piece of code is semantically identical to the one above. ``<=`` |
| matches the major case, and ``|>`` bails out with the minor case. |
| |
| Patterns |
| -------- |
| |
| Patterns may be refutable (could fail, e.g ``Ok a`` or ``43``) or |
| irrefutable (always match, e.g ``(a,b)`` or ``_``). Refutable patterns |
| can be used in a matching block only, but they can only nest irrefutable |
| patterns. So, unlike in Haskell, you can't go: |
| |
| :: |
| |
| f x = foo x |
| | Ok (Alt1 3) -> bar |
| | _ -> baz |
| |
| As this nests a refutable pattern (``3``) inside another refutable |
| pattern (``Alt1 3``) inside another refutable pattern (``Ok (Alt1 3)``). |
| |
| This is forbidden to make compilation much more straightforward in the |
| presence of linear types. |
| |
| Let-binding |
| ----------- |
| |
| Let expressions take the form of ML. They are not ever recursive. |
| Multiple let-bindings can be introduced by separating them with ``and``: |
| |
| :: |
| |
| f () = let x = 3 |
| and y = 4 |
| in foo (x,y) |
| |
| Is equivalent to: |
| |
| :: |
| |
| f () = let x = 3 |
| in let y = 4 |
| in foo (x,y) |
| |
| Irrefutable single patterns may occur on the left hand side of let-binding, but |
| refutable patterns must use regular pattern matching. |
| |
| To force inference to go the way you want, a type signature can be |
| provided for a let binding: |
| |
| :: |
| |
| f () = let x : U8 = 3 |
| in let y : U16 = 4 |
| in foo (x,y) |
| |
| Observation and ``let!`` |
| ------------------------ |
| |
| Variables may be observed using ``!``: |
| |
| :: |
| |
| f (x, y) = let (a,b) = foo (x, y) !x !y |
| in bar (a, b) |
| |
| Prefix ``!`` annotations can be used inline with pattern matching also: |
| |
| :: |
| |
| f (x,y) = foo(x,y) !x !y |
| | Blah x => bar x |
| | Blorp z -> baz z |
| |
| If |
| -- |
| |
| Conditionals can be expressed in the form of if-expressions. They are in |
| the form of ``if c !v1 !v2 ... then e1 else e2``. The ``!v``\ s are |
| similar to the ``!`` syntax introduced above, allowing for temporary |
| access to linear objects in the condition. |
| |
| Apart from the normal if-then-else syntax, Cogent offers a multi-way if |
| syntax, inspired by GHC/Haskell. For example, |
| |
| :: |
| |
| if | cond_1 -> expr_1 |
| | cond_2 -> expr_2 |
| | ... |
| | else -> expr_n |
| |
| In the code snippet above, the conditions are Boolean **expressions**, |
| instead of patterns as one might think. The final ``else`` is part of |
| the syntax. The pipes have to be aligned. The arrows, as usual, can be |
| any of ``=>``, ``->`` or ``~>``, which have the same semantics as used |
| in alternatives. Prefix ``!``\ s can be added after each condition (but |
| not after the ``else`` keyword), like ``| cond_1 !v1 !v2 => e``. |
| |
| Sequencing |
| ---------- |
| |
| Occasionally, it is useful to free a bunch of things, and using ``let`` for |
| this purpose can be somewhat annoying: |
| |
| :: |
| |
| f : (Obj, Obj) -> Int |
| f (a, b) = let _ = free a |
| and _ = free b |
| in 42 |
| |
| So, a little sugar is added for a series of discarding let bindings: |
| |
| :: |
| |
| f : (Obj, Obj) -> Int |
| f (a, b) = free a; free b; 42 |
| |
| These two expressions are equivalent. |
| |
| |
| Records |
| -------- |
| Unboxed records are initialized as follows: |
| |
| :: |
| #{x = e1, y = e2} |
| |
| Boxed records are allocated on the heap. They are typically initialized by |
| first calling the external C malloc function and then updating the record |
| using the put operation. |
| |
| Take/put |
| -------- |
| |
| There is pattern syntax for ``take``, and a similar expression syntax |
| for ``put``: |
| |
| :: |
| |
| f : {a : Foo, b : Bar} -> {a : Foo, b : Bar} |
| f (r {a = ra, b = rb}) = r {a = ra, b = rb} |
| |
| .. note:: ``take`` is always in patterns (i.e. LHS of ``=``), whereas |
| ``put`` is always in expressions (i.e. RHS of ``=``). |
| |
| Punning is also allowed: |
| |
| :: |
| |
| f : {a : Foo, b : Bar} -> {a : Foo, b : Bar} |
| f (r {a, b}) = r {a, b} |
| |
| (where just ``a`` is equivalent to ``a = a``) |
| |
| Arithmetic and comparison operators |
| ----------------------------------- |
| |
| Currently Cogent will use the smallest type possible for integer |
| literals and generate upcasts (but not downcasts) automatically when |
| used in a context where they are required. For non-literals, an explicit |
| ``upcast`` primitive may be needed. |
| |
| Type annotations |
| ---------------- |
| |
| To guide the type inference engine, the user can give type annotations |
| to any expressions. The syntax is ``e : t``. |
| |
| |
| Experimental features |
| ===================== |
| |
| .. warning:: Experimental features, as they are called, are indeed experimental, |
| which means they are not stable, and may fail, terribly. Don't rely on them |
| until they become stable. |
| |
| .. _static-arrays: |
| |
| Static arrays |
| ------------- |
| |
| .. note:: This feature is not implemented on the ``master`` branch. It's implemented |
| on the ``array`` branch and needs to be enabled by the ``--builtin-arrays`` flag |
| (see :ref:`optional-features` for more information on how to do that). |
| |
| Array literals can be introduced using square brackets, like |
| ``[a, b, c]``. This syntax can also be used as patterns. Array types can |
| be defined like ``U8[3]``, similar to C. Indexing can be made with the |
| ``@`` operator. E.g.: ``arr @ 3``. Arrays can also be ``take``\ n and ``put``. |
| The type-level ``take`` and ``put`` operators are spelt as ``@take`` and ``@put`` |
| respectively. On the term level, it's written as ``arr @{ @idx = val }``--- |
| just prefix the open bracket and the indices with ``@`` symbols, and the rest |
| are the same as those for records. |
| |
| |
| Lambda expressions |
| ------------------ |
| |
| We only allow for a very limited form of lambda expressions. A lambda |
| expression has the syntax ``\irref => exp``, where ``irref`` is an |
| irrefutable pattern, and ``exp`` is an expression which does not refer |
| to any variables outside the lambda binding (no free variables). The |
| bound variables have to be non-linear. |
| |