So, the only parser left undefined in the previous parsers is typeSuffixP. It's used in typeP, which is the top-level single-type parser.
typeSuffixP = choice [arrayP <?> "an array type"
, tok "lazy" >> return (Generic (Id "lazy") . list)
, tok "when" >> constraintP
, (do id <- identP; return (Generic (Id id) . list))
<?> "a stupid suffix type"]
typeP = do
t <- termP
option t (return . nestTypes t . reverse =<< many typeSuffixP)
Type suffixes break down into four categories: .NET-style arrays, lazy values, constraints and Caml-style "stupid suffix types". Yeah. One reason I don't like them is that they were hard to parse in the first version of my code: it's hard to see where they end. Lazy values return the same structure as generics, because they really are just a special case of a generic type. They just get a keyword for some reason (probably left over from O'Caml). So that leaves only arrays and constraints.
arrayP = do
dims <- between (tok "[") (tok "]") (many (tok ","))
return $ Array (1+length dims)
Arrays are simple, just brackets around some commas. A 2D array of ints is written int[,], a 4D one is int[,,,]. That leaves the most complicated for last: constraints. Constraints are a rabbit hole. Like I said last time, they take about 3 times the code of the rest of the parser, for types that will not occur the majority of the time. Oh well. I should stop complaining and show you constraintP. Remember, from typeSuffixP, we know this is a normal type followed by 'when' and then a constraint. For example, list<'a> when ....
constraintP = typevarChoiceP <|> singletypevar
where singletypevar = do
Var var <- typevarP
subtype var <|> (tok ":" >>
choice [tok "null"
>> return (Constraint (Null var))
, tok "struct"
>> return (Constraint (Struct var))
,try (toks (words "( new : unit -> ' T )")
>> return (Constraint (DefaultConstructor var)))
, toks (words "not struct")
>> return (Constraint (NotStruct var))
, enum var
, traitP var
, delegate var])
subtype var = do
tok ":>"
t <- arrowP
return $ Constraint (Subtype var t)
enum var = do
tok "enum"
t <- between (tok "<") (tok ">") arrowP
return $ Constraint (Enum var t)
delegate var = do
tok "delegate"; tok "<"
leftT <- arrowP
tok ","
rightT <- arrowP
tok ">"
return $ Constraint (Delegate var leftT rightT)
Yowza! That's kind of long. Uh...OK. The first choice is a high level choice between a single constraint or multiple constraints. I still don't understand the semantics of multiple constraints, but it looks like
list<'a> when ('a or 'b) : (read: 'a -> 'b)
Maybe. I'll get back to multiple constraints later. For now, look at singletypevar. It first parses a typevar (eg 'a, ^a or _), then parses the constraint on that typevar. There are eight kinds of constraints; the simple ones are included inside constraintP. The simplest is the subtypeconstraint, for example
list<'a:> when 'a :> Microsoft.Collections.IComparable
All the other constraints have an intervening tok ":". Most of these are quite simple, like the null, struct, not-struct and default-constructor constraints. (Disclaimer: I'm still not sure of the semantics of some of these, but they are definitely easy to parse). Enum and delegate constraints are almost as simple to parse. They look like
list<'a> when 'a : enum<int>
or
list<'a> when 'a : delegate<int,int>
That leaves one constraint that leads us further down the rabbit hole: traitP. Traits are structural typing shoehorned into a nominal type system, so they are naturally pretty complicated. In fact, the grammar calls for attributes, which can contain arbitrary expressions, to be allowed in the grammar at one point. At this point this parser does not support that, but I suppose some day it will have to in order to be correct. The basic idea of trait is that you specify a method that a type must contain. The types do not need be explicitly related by an inheritance relation; all subtypes are implicitly related because they support the same set of methods. For example:
^a when ^a : (toString : ^a -> string)
Is the structural specification for "all types that have a toString method". O'Caml's object system works exactly this way, and Haskell's type classes are pretty close. It's not used much in F# because of the already existing nominal type system used for inheritance. It's basically type-checked duck typing, although from my fading memory of TAPL, there are some drawbacks in complexity of compile times and in the details of inheritance. Aside: At least the F# compiler actually checks its structural types. C++ appears to support structural typing for itsgenerics, but really it's just untyped text substitution. Type errors happen in the SECOND stage of compilation, where everything is a concrete type and has the longest, ugliest name in the world. C++ was supposed to get traits, but -sigh- they wanted to make sure it remained the unintentionally least friendly language in the world. So anyway, that's two paragraphs explaining traits. Here is the parser for them:
traitP var =
between (tok "(") (tok ")") (memberSigP var)
memberSigP var = do
id <- identP
(vars,constraint) <- option ([],Nothing) typevarDefnsP
tok ":"
sig <- curriedSigP
property <- option Function accessorsP
let t = if null vars then Id id else Generic (Id id) vars
case constraint of
Nothing -> return $ Constraint (Sig var t sig property)
Just con -> return $ Constraint (Sig var
(Constraint (TyparConstraint con) t)
sig
property)
Let's see...a trait is a member sig for some var, surrounded by parentheses. The actual member sig starts off with an identifier, something like toString in the above example. Then there's the option to introduce MORE type variables, which can themselves have constraints. (More on that below.) Then there's the actual sig (called curriedSigP, you'll see why in a second), followed by an option modifier that specifies that the whole thing is actually a property instead of a method. In other words, you can say this:
^a when ^a : (Stringed : ^a -> string with get)
and the with get means that Stringed is a property, not a function. This is necessary because of the sugary nature of properties, hiding two functions behind one, etc. I suspect this makes it hard to pass properties around as first-class functions, but I don't have an F# REPL running in front me right now to check. Anyway, here accessorsP to show you how the with getbusiness is parsed. It's super simple.
accessorsP = do
tok "with"
choice [try (toks (words "get , set") >> return GetSet)
, try (toks (words "set , get") >> return GetSet)
, tok "get" >> return Get
, tok "set" >> return Set]
Anyway, the real heart of the thing is curriedSigP. I feel a little bad about it, because it's basically a copy of the main grammar loop of arrowP/tupleP/typeP. The difference is that, um, instead of "arrowSigP", I named it "curriedSigP", which is a different way to say the same thing...oh no wait, it's the spec's fault. The spec calls it that. ANYWAY, here is the code:
curriedSigP = do
arg <- argsSpecP
tok "->"
args <- sepBy1 argsSpecP (tok "->")
return $ Arrow (arg:args)
argsSpecP = do
args <- sepBy1 argSpecP (tok "*")
case args of
[arg] -> return arg
args -> return $ Tuple args
argSpecP = do
argspec <- optionMaybe (try argNameSpecP)
t <- typeP
case argspec of
Just (optional,name) -> return $ NamedArg name t optional
Nothing -> return t
argNameSpecP = do
optional <- optionMaybe (tok "?")
argname <- identP
tok ":"
return $ (isJust optional, argname)
The first real difference from the main loop is that curriedSigP requires the toplevel type to be a function. It parses one argsSpecP, an arrow, then one or more argsSpecPs. That's because the structural typing of the F# compiler can't guarantee the presence of data members? I guess? That's a good thing from an OO perspective at least. The second difference is that you can name the args, and named arguments are allowed to be optional. So, to continue with the toString example, you could name the argument:
^a when ^a : (toString : ?zelf : ^a -> string with get)
And I made zelf optional for kicks, even though that makes no sense here and probably wouldn't really compile. Commentary: coming from strict and minimal languages like Haskell and Scheme, I think that optional arguments are kind of over-complicated, especially in a language that already has ad-hoc polymorphism via overloading. I'd rather have a smaller language and a smarter type inferencer. Because, you know, Haskell is sort of OO, and I can count on one hand the times I've had to supply type annotations. In F#, which is OO plus so much more, I have to drop in type annotations at least once a file, more if I interact with .NET code a lot. OK, that leaves one last crazy constraint. Seriously, I'm not sure why you'd need a type this constrained. Here's an example:
^a when ^a : (something<'b,'c> : ^a * 'c -> 'b)
So, yeah, you can introduce new type variables. But that's not all! You can alsointroduce recursive constraints! Like so:
^a when ^a : (something<^b
when ^b : (ohNoNotAgain : ^b -> ^a)> : ^a -> ^b)
It's fully recursive, all right. You can structurally constrain the types that are bound in the scope of your structurally constrained methods. (Don't worry if you didn't understand that, you might not be human.)
typevarDefnsP = do
tok "<"
defns <- sepBy1 typevarP (tok ",")
constraints <- optionMaybe typevarConstraintsP
tok ">"
case (defns,constraints) of
(_,Nothing) -> return (defns, Nothing)
([defn],Just cs) -> return (defns,Just $ nestTypes defn cs)
(defns,Just cs) ->
return (defns,Just$nestTypes (Var (Choice (map devar defns))) cs)
typevarConstraintsP = do
tok "when"
sepBy1 constraintP (tok "and")
devar (Var var) = var
list x = [x]
The code is pretty straightforward, just brackets around some comma-separated type variables (typevarDefnsP), optionally followed by when and some and-separated constraints. It does some fancy processing when you do have constraints, but that's because I need a unique way to represent each one of these structures internally. It's a representation that's probably wrong: best not get too attached to them; they'll probably all have to change when I start actually USING them. By the way, if you're hoping to get in there on a beta of Fing and start with some CRAZY RUDE constraints on structurally constrained types, you may be disappointed by the first few versions. My priority is to be able to search for obvious things first, like int -> seq<'a> -> 'a, then for subtype relations so that you can also search for list<'a>, ResizeArray<'a>, etc. OK, one last thing: type var choices. These are, as far as I can tell, genuinely ambiguous. You can write something like
map<^a,^b> when (^a or ^b) : (toString : ^a -> string)
I think we can all agree that's pretty messed up. Is it 'a or 'b that has to have a toString member? Who knows? Put an orbetween 'em and call it done! There is one quirk, though: only structural constraints are allowed. So this is illegal (hilarious though it may be):
map<^a,^b> when (^a or ^b) : (^a :> ^b)
The code should be easy to read; it's a lot like the other parsers.
typevarChoiceP = do
typars <- between (tok "(") (tok ")") (sepBy typevarP (tok "or"))
let typars' = map devar typars
tok ":"
Constraint (Sig _ id member prop) _ <- liftM ($ Id "DUMMY") (traitP Anonymous)
return $ Constraint (Sig (Choice typars') id member prop)
So that wraps up the Parsec version of the parser. The entire thing is available to download. The FParsec version is a lot longer, because I had to write the low-level backtracking parser tok by hand there. (I might be able to fix it now, but I'm also sick of parsing now.) I will post the FParsec version at the top of my next post, where I'll start to look at the data structure that the parser builds. That may be a little while, because my next step is to get a simple version running. This data structure is quite a bit more complicated than the toy version I got from kvb.