Learn You A Haskell For Great Good

by sandersn 28. May 2011 08:30

 

If you visit learnyouahaskell.com, you'll see that Learn You a Haskell For Great Good originated as a Haskell tutorial when the author decided he wanted to make sure he really knew Haskell. Haskell tutorials are a dime a dozen, especially monad tutorials, but that's exactly what it is; an introductory Haskell tutorial followed by a monad tutorial. It has the distinction that it is very well explained and much more complete than the average tutorial, as well as now existing in dead tree form. But it is not like Real World Haskell, which explains concepts in the order a practising coder would need to get a useful program off the ground. Learn You A Haskell For Great Good assumes that you are a practising coder in an imperative language, but generally explains concepts in order from simplest to most complex.

 

That's cool if you want a summary of the language, and if you're approaching Haskell as an enthusiast rather than a practitioner then you will probably like it. The focus is on expanding your mind, not (for example) finding the right tradeoffs between safe and succinct. It's arguable that this is the right approach for Haskell, which was traditionally a research playground, and the that the people behind Real World Haskell are engaging in wishful thinking when they try to make Haskell a production language.

 

Personally I prefer the practical approach (I've even used Haskell for scripting (!) before), but the tutorial approach has its advantages; it's a lot more thorough, and it covers cool new stuff in favour of things like database libraries. Even after 2-3 years of Haskell use, I learned interesting new things about Haskell's kinds, applicative functors and zippers.

 

The book is divided into two tuturials: basic Haskell and monads. The basic Haskell section is not too surprising--it covers the syntax and semantics of the basic constructs of the language. I think it could have done with a bit more emphasis on solving problems recursively, because the primary difficulty of learning Haskell is not the syntax. Really, it assumes that you are an intelligent person who is willing to work through new concepts yourself. If you don't already know at least one of: Lisp*, Erlang or ML*, then for additional help, you might want look at the Little Schemer, which does a great job of teaching recursive thinking without much language overhead.

 

*Popular members of the Lisp family are Scheme, Common Lisp and Clojure. Popular members of the ML family are F#, Caml and SML.

 

The other half of the book is a monad tutorial. It's tied with Real World Haskell for the best tutorial I've read. Instead of starting with ugly code and cleaning it up, Learn You a Haskell starts with functors, then progresses through applicative functors, monoids and on to monads. Although I suspect that this order does not recapitulate the history of these libraries, it makes a lot of sense. You get to see how functors generalise `map` on lists, then how applicative functors allow arbitrary functions to be lifted and applied in a functor (as well as providing left-to-right evaluation, which is something I missed when I was learning monads), finally ending with the way that monads allow chaining of functions to produce an imperative style.

 

I don't know if this is how the web site presented it--it takes a lot of space, but I think that it's worth it. If you have bounced off monad tutorials in the past, this one may work for you. You'll do a lot of reading, but you'll also end up with a wide, solid base for understanding monad usage.

 

Besides, if you get bored, there's always the fact that Learn You a Haskell is Officially Funny. However, the humour quickly fades into the background. This is no _why's Poignant Guide--the humour is closer to the occasional cracks in my ancient Learn Visual J++ 1.1 book. Well, I guess the drawing are pretty cool. I can't believe Capcom let them include a cartoony sketch of Quickman, but it's AWESOME. (There's also a Wiggler from Mario used to demonstrate head/tail.) Unfortunately, the drawings too fade into the background as they become less topical, or at least start referring to pop culture that I haven't heard of. Overall, it reminds me a lot of O'Reilly-quality writing. That's good but (and here I'm biased) not too surprising coming from the extraordinarily literate Haskell community.

 

Although the writing is good, it's hard for me to estimate how good the book is at teaching, since I already know Haskell. Fortunately, there is a chapter on zippers tucked in at the end, so I can use that to estimate for the rest of the book. Here, I easily acquired an intuitive understanding of what zippers are and how they work--the examples and analogies were good. But I didn't feel like I got a concrete understanding of their implementations, or when I'd need to implement my own. I'll need to actually play with zippers myself to understand them properly.

 

Bottom line: I didn't think too much of the basic Haskell tutorial, but I liked the monad tutorial. It's worthwhile if you have the patience to read the whole thing. Compared to the other two Haskell books I own, Haskell: The Craft of Functional Programming  (Thompson) and Real World Haskell (O'Sullivan, Goerzen & Stewart), Learn You a Haskell For Great Good fills the niche for smart programmers who already know an imperative language and want to learn Haskell for its own sake, covering concepts thoroughly whether or not they will be practically useful. The result is a more focussed book that covers less.

Tags: , , , ,

Books | Language | Programming

Fing 0.2 module-level overview

by sandersn 12. December 2010 02:42

It's been about six months since I blogged about Fing. I've only touched the code a couple of times since then, so I will restart the series by describing the module organisation as it stands today, in the process of developing version 0.2

Fortunately, I wrote all this down six months ago on the fing github wiki, so if you want the details, including module contents, look there. This is a module-level overview.

Util

Utilities. That usually means: utilities that ship with Haskell but not with F#.

Types

The basic types, plus basic operations on them: fold, map, string formatting.


Everything below this line depends on Util and Types (except for Opt).

ParsedTypes

string → internal type representation. For example, user entered types.

FSharpTypes

F# Powerpack Metadata → internal type representation.

CSharpTypes

C# Reflection representation → internal type representation. This is not implemented yet; maybe it should be called DotNetTypes instead.

Search

Search for functions matching a given type.


Everything below this line depends on everything above this line (mostly).

Fing

The public interface: string → list<result> and related operations.

Opt

Command line parsing. I can't believe this isn't part of the standard library.


Program depends on Fing and Opt

Program

Command line interface. To be augmented later by other kinds of interfaces, probably an ASP one next.

Most of my time so far has been refactoring and working on the woefully incomplete tests. Earlier I made a conscious decision to push out a number of features without testing so that I'd have something people could use. Now I'm coming back through and writing the missing tests.

My approach was to build a decent database of types for testing. Most of my tests then assert that a property holds about all the types. This is a lot like QuickCheck, but without the random generation. I personally think it serves me better, because I manually generated all the complex cases of the grammar without recursion on the obvious cases. I could be missing stupid-simple cases, of course. (Also I didn't want to mess with installing F#'s port of QuickCheck—without typeclasses, I assume that it's kind of klunky and painful.)

For example, my newest test asserts that, for all types, if you shuffle the order of type variables, then run Types.index, you will get the original type back. This holds because (1) Types.index overwrites type variables with ones in canonical order and (2) all my test cases were manually created with type variables in canonical order. Actually (2) is a bad idea; it's OK for parser testing but is overly simple when you start testing semantics.

The more I think about it, the more I think I may need something like QuickCheck, at least some hacky fuzzing of my existing, over-regular test cases. That should probably be the next major step in the testing of the code, even if F# Quickcheck is painful to install.

Tags: , , , ,

Language | Programming

Developing Fing, part 4: command line interface

by sandersn 16. June 2010 08:33

The previous entry in this series finished (sort of) the parser. Two things have been hurting my motivation to continue blogging. First is that I needed to blog about the F# version of the parser, since the version I've already written about is Haskell. I don't really want to write about the same thing twice. Second, and more important, is that I was busy working on the actual code instead of blogging about it. There is a decent version online at http://github.com/sandersn/fing. It's still missing three major features (web app, subtyping and type aliases), but it works for everything in FSharp.Core. Now I'm at the point of bringing tests up-to-date with the code, so, you guessed it, I have more motivation to blog again. But I still don't have the gumption to talk about the parser. I'm sick of parsing for now, so maybe I'll write it up in six months, or when I have time to revamp the core token parser, which is super ugly. Instead I'm going to write about simple stuff: the command line interface. It's straightforward, but there is at least one interesting principle in there. I'll work up to the more complicated parts of the code later.

module Main

[<EntryPoint>]
let main args = 
  let args,argmap = Opt.parse args
  let getrefs s = Seq.choose id (Opt.mapGet s argmap Seq.empty)
  let references = getrefs "r" |>Seq.append<| getrefs "reference"
  Fing.addReferences references
  match args with
  | [| t |] -> Fing.textSearch t
  | _ -> printfn "Fing is F# API Search. (rest of usage message...)"

There isn't much surprising here: first I call Opt.parse to produce a list of arguments and keyword arguments. Then I define a function to arguments matching a string s, and use it to build a sequence of "-r" and "--reference" arguments. Well, wait a minute, that's kind of weird, a nested function. I could have done it this way:

  let rs = Seq.choose id (Opt.mapGet s argmap Seq.empty)
  let references = Seq.choose id (Opt.mapGet s argmap Seq.empty)
  Fing.addReferences (rs |>Seq.append<| references)

And that's not badstyle, it just isn't ideal. In the second way, I name the two values, repeating the code that generates them. In the first way, I name the code that generates the two values and repeat only the name I gave it. There's a little more repetition the second way, with not much benefit. (I also get a chance to show off the infix-function trick that I discovered. I'm still very proud of that.) This illustrates one of the subtler aspects of functional programming. The three aspects of functional programming that I think of first are:

  1. Immutability
  2. Functions as data
  3. Functions everywhere

Well, 'functions as data' looks like a subset of 'functions everywhere', but both have a specific meaning. 'functions as data' refers to uses of functions that doesn't occur at all in imperative or OO style: passing them around, saving them in data structures. 'functions everywhere' refers to the size and frequency of functions, which is far higher than in imperative or OO styles. Even though the ideal for all styles is many small functions (or methods), functional languages make it natural to achieve this ideal. It is definitely not natural in normal imperative, structured code, and all the "short methods are good for you" advice books indicate to me that it's not natural in OO either. In any case, if you compare functional languages and OO languages, I think you will find a much greater percentage of the former make short function definition natural. So, to finish up:

  match args with
  | [| t |] -> Fing.textSearch t
  | _ -> printfn "Fing is F# API Search. (rest of usage message...)"

There's not much to say here, except that I am glad pattern matching works on arrays and I wonder why it doesn't work on sequences. I can think of two reasons: first, seq (aka IEnumerable) is an interface, and pattern matching is a concrete operation. Second, it might be inefficient to pattern match the tail of a sequence. I don't grok the efficiency model of iterators, but I think that, because they're not defined recursively, a pattern-match, which treats them as recursive, must create a new enumerator to point to the tail of the matched sequence. Regardless, it's an inconvenience and likely one that can be fixed with an active pattern. That's another F# feature I still need to learn.


Post-script: I had to write Opt.parse myself. I couldn't find one built-in to .NET. Maybe I was just didn't search hard enough, but all I found was half-baked code online. I started with one example, intending to translate it to F# in order to understand it. Once I understood it, I saw that it was buggy and incomplete, and wrote my own from scratch. (I just want back, read the comments below the code, and found a link to a reflection/annotation-based library called Consolery. Nothing in the standard library though. Anyway, continuing with my earlier theme of "I'll blog about whatever I want to blog about", I'll probably talk about Opt.parse next. You might be able to help me iron out some of its ugly parts.

Tags: , ,

Language | Programming

Stack Overflow, The Death of Words and Type Classes

by sandersn 5. June 2010 21:16

C S Lewis has a perceptive essay called "The Death of Words". You can find the title screen here; you can poke around and probably convince Google to let you read most of the essay. Anyway, the problem that he's complaining about is that the meaning of words tend to change over time. A lot of words that meant something just mean "good" now, partly because we liked the original thing. This is well-known in historical linguistics, so maybe Lewis picked it up from hanging around Tolkien. Lewis captures this in the quote: "And as long as most people are more anxious to express their likes and dislikes, this must remain a universal truth about language." He starts giving examples and a couple of paragraphs hits upon the word which, in the 21st century, is almost beyond dead. People have almost stopped using it, even to mean "good", except in computer science. Lewis says, "Modern ... has ceased to be a chronological term ... and often means little more than 'efficient'". A recent question on Stack Overflow about dynamic scopehas an answer by Eli Bendersky. The answer is great, but it's also a good example of how dead 'modern' is. Here's the end of his post (emphasis mine).

That said, lexical binding is IMHO much better for 99% of the cases. Note that modernLisps are not dynamic-binding-only like Emacs lisp.
  • Common Lisp supports both forms of binding, though the lexical one is used much more
  • The Scheme specification doesn't even specify dynamic binding (only lexical one), though many implementations support both.
In addition, modern languages like Python and Ruby that were somewhat inspired by Lisp usually support lexical-binding in a straightforward way, with dynamic binding also available but less straightforward.

Now, Bendersky frequently gives insightful answers about all kinds of Lisps and the P-languages descended from Lisp. I am sure he knows that Scheme (1975) is 10 years older than Emacs Lisp (1985) and Common Lisp is a year older (1984). I am sure he also knows that Python (1991) is only 6 years younger and Ruby (1995) 10 years. The reason E-Lisp didn't have lexical scope is not because it's less modern than the other languages. It's because it was a mediocre scripting language from its inception. The fact that it hasn't added lexical scope in the intervening 25 years means that it's now a bad scripting language. Look: Python didn't have any nested scope until 2.0. But at least Guido added it. No such luck with E-Lisp. The word "modern" is dead. It just means "good" here, specifically "has good features". But this meaning doesn't survive from previous usage; it's picked up from context. And it really annoys me. It would be a lot better just to say "Note that good Lisps..." and "In addition, "good languages like Python and Ruby...". Even better, be substantive in your dislike: "In addition, "Note that full-featured Lisps..." and "In addition, well-designed languages like Python and Ruby..." By the way, those substantive reasons mean that Emacs would be better off with any of the languages in the preceding paragraph besides E-Lisp, by the way. They're all better languages, even Common Lisp, which hasn't changed since 1994. If somebody could port all the millions and millions lines of E-Lisp, or least provide backward compatibility, it would probably extend Emacs' popularity ten-fold. (I was going to say "life span", but I'm convinced that's converging on co-equal with that of the human race at this point.) In conclusion, Lispers, don't mince words: E-Lisp is a bad Lisp, not because it's not modern, but because it's badly designed and feature-free.


The second question I noticed today is related to my recent post trying to emulate some uses of typeclasses with "standard" (ie Java) OO. Here, the question is "how similar are Go's interfaces and Haskell's type classes?" Answer: just as similar as standard OO's interfaces, because Go's interfaces are no more powerful than Java's. They're just type-checked structurally. That's cool, because implementation is implicit and can be done by anybody, not just the original author. But it's nowhere near as powerful as typeclasses. Of course systems with delayed typing (like Ruby's mixins and C++'s templates) can be as powerful as typeclasses; by the time the type is checked, the type variables have been fully instantiated so the checker sees nothing but concrete types. But...it also means your error messages also contain nothing but concrete types; that's the trade-off. I guess the only reason for this question is that Go only has two tutorial documents, and the concept of structural typing is so unfamiliar that people stop to re-examine their assumptions about what you can and can't do. I did, but after re-reading the Go tutorial I'm convinced that you don't get any extra power above Java. Update: Norman Ramsey chimed in a few minutes ago to point out that Haskell's type system is equivalent in power to cut-free Prolog, ie "pure" Prolog, ie mini-Kanren. So if I ever want to subject myself to that pain again, I know I can do it without leaving Haskell.


Edit: Lispers have been abusing 'modern' for a long time. While packing for my upcoming move, I ran across Eugene Charniak's AI book from 1985. The Lisp tutorial chapter refers to Scheme as a "modern" Lisp and Franz Lisp as a "less modern" Lisp. Yup, still backwards: Scheme: 1975; Franz Lisp: 1978. (Although it is based on MacLisp, which is an older design.)

Tags: , , ,

Language | Programming

Developing Fing, part 3: Parsec parser 3

by sandersn 26. May 2010 14:01

So, the only parser left undefined in the previous parsers is typeSuffixP. It's used in typeP, which is the top-level single-type parser.

typeSuffixP = choice [arrayP <?> "an array type"
                      , tok "lazy" >> return (Generic (Id "lazy") . list)
                      , tok "when" >> constraintP
                      , (do id <- identP; return (Generic (Id id) . list))
                                  <?> "a stupid suffix type"]
typeP = do
  t <- termP
  option t (return . nestTypes t . reverse =<< many typeSuffixP)

Type suffixes break down into four categories: .NET-style arrays, lazy values, constraints and Caml-style "stupid suffix types". Yeah. One reason I don't like them is that they were hard to parse in the first version of my code: it's hard to see where they end. Lazy values return the same structure as generics, because they really are just a special case of a generic type. They just get a keyword for some reason (probably left over from O'Caml). So that leaves only arrays and constraints.

arrayP = do
  dims <- between (tok "[") (tok "]") (many (tok ","))
  return $ Array (1+length dims)

Arrays are simple, just brackets around some commas. A 2D array of ints is written int[,], a 4D one is int[,,,]. That leaves the most complicated for last: constraints. Constraints are a rabbit hole. Like I said last time, they take about 3 times the code of the rest of the parser, for types that will not occur the majority of the time. Oh well. I should stop complaining and show you constraintP. Remember, from typeSuffixP, we know this is a normal type followed by 'when' and then a constraint. For example, list<'a> when ....

constraintP = typevarChoiceP <|> singletypevar
  where singletypevar = do
          Var var <- typevarP
          subtype var <|> (tok ":" >>
                           choice [tok "null"
                                   >> return (Constraint (Null var))
                                  , tok "struct"
                                   >> return (Constraint (Struct var))
                                  ,try (toks (words "( new : unit -> ' T )")
                                        >> return (Constraint (DefaultConstructor var)))
                                  ,  toks (words "not struct")
                                   >> return (Constraint (NotStruct var))
                                  , enum var
                                  , traitP var
                                  , delegate var])
        subtype var = do
          tok ":>"
          t <- arrowP
          return $ Constraint (Subtype var t)
        enum var = do
          tok "enum"
          t <- between (tok "<") (tok ">") arrowP
          return $ Constraint (Enum var t)
        delegate var = do
          tok "delegate"; tok "<"
          leftT <- arrowP
          tok ","
          rightT <- arrowP
          tok ">"
          return $ Constraint (Delegate var leftT rightT)

Yowza! That's kind of long. Uh...OK. The first choice is a high level choice between a single constraint or multiple constraints. I still don't understand the semantics of multiple constraints, but it looks like

list<'a> when ('a or 'b) : (read: 'a -> 'b)

Maybe. I'll get back to multiple constraints later. For now, look at singletypevar. It first parses a typevar (eg 'a, ^a or _), then parses the constraint on that typevar. There are eight kinds of constraints; the simple ones are included inside constraintP. The simplest is the subtypeconstraint, for example

list<'a:> when 'a :> Microsoft.Collections.IComparable

All the other constraints have an intervening tok ":". Most of these are quite simple, like the null, struct, not-struct and default-constructor constraints. (Disclaimer: I'm still not sure of the semantics of some of these, but they are definitely easy to parse). Enum and delegate constraints are almost as simple to parse. They look like

list<'a> when 'a : enum<int>

or

list<'a> when 'a : delegate<int,int>

That leaves one constraint that leads us further down the rabbit hole: traitP. Traits are structural typing shoehorned into a nominal type system, so they are naturally pretty complicated. In fact, the grammar calls for attributes, which can contain arbitrary expressions, to be allowed in the grammar at one point. At this point this parser does not support that, but I suppose some day it will have to in order to be correct. The basic idea of trait is that you specify a method that a type must contain. The types do not need be explicitly related by an inheritance relation; all subtypes are implicitly related because they support the same set of methods. For example:

^a when ^a : (toString : ^a -> string)

Is the structural specification for "all types that have a toString method". O'Caml's object system works exactly this way, and Haskell's type classes are pretty close. It's not used much in F# because of the already existing nominal type system used for inheritance. It's basically type-checked duck typing, although from my fading memory of TAPL, there are some drawbacks in complexity of compile times and in the details of inheritance. Aside: At least the F# compiler actually checks its structural types. C++ appears to support structural typing for itsgenerics, but really it's just untyped text substitution. Type errors happen in the SECOND stage of compilation, where everything is a concrete type and has the longest, ugliest name in the world. C++ was supposed to get traits, but -sigh- they wanted to make sure it remained the unintentionally least friendly language in the world. So anyway, that's two paragraphs explaining traits. Here is the parser for them:

traitP var =
  between (tok "(") (tok ")") (memberSigP var)
memberSigP var = do
  id <- identP
  (vars,constraint) <- option ([],Nothing) typevarDefnsP
  tok ":"
  sig <- curriedSigP
  property <- option Function accessorsP
  let t = if null vars then Id id else Generic (Id id) vars
  case constraint of
    Nothing -> return $ Constraint (Sig var t sig property)
    Just con -> return $ Constraint (Sig var
                                         (Constraint (TyparConstraint con) t)
                                         sig
                                         property)

Let's see...a trait is a member sig for some var, surrounded by parentheses. The actual member sig starts off with an identifier, something like toString in the above example. Then there's the option to introduce MORE type variables, which can themselves have constraints. (More on that below.) Then there's the actual sig (called curriedSigP, you'll see why in a second), followed by an option modifier that specifies that the whole thing is actually a property instead of a method. In other words, you can say this:

^a when ^a : (Stringed : ^a -> string with get)

and the with get means that Stringed is a property, not a function. This is necessary because of the sugary nature of properties, hiding two functions behind one, etc. I suspect this makes it hard to pass properties around as first-class functions, but I don't have an F# REPL running in front me right now to check. Anyway, here accessorsP to show you how the with getbusiness is parsed. It's super simple.

accessorsP = do
  tok "with"
  choice [try (toks (words "get , set") >> return GetSet)
         , try (toks (words "set , get") >> return GetSet)
         , tok "get" >> return Get
         , tok "set" >> return Set]

Anyway, the real heart of the thing is curriedSigP. I feel a little bad about it, because it's basically a copy of the main grammar loop of arrowP/tupleP/typeP. The difference is that, um, instead of "arrowSigP", I named it "curriedSigP", which is a different way to say the same thing...oh no wait, it's the spec's fault. The spec calls it that. ANYWAY, here is the code:

curriedSigP = do
  arg <- argsSpecP
  tok "->"
  args <- sepBy1 argsSpecP (tok "->")
  return $ Arrow (arg:args)
argsSpecP = do
  args <- sepBy1 argSpecP (tok "*")
  case args of
    [arg] -> return arg
    args -> return $ Tuple args
argSpecP = do
  argspec <- optionMaybe (try argNameSpecP)
  t <- typeP
  case argspec of
    Just (optional,name) -> return $ NamedArg name t optional
    Nothing -> return t
argNameSpecP = do
  optional <- optionMaybe (tok "?")
  argname <- identP
  tok ":"
  return $ (isJust optional, argname)

The first real difference from the main loop is that curriedSigP requires the toplevel type to be a function. It parses one argsSpecP, an arrow, then one or more argsSpecPs. That's because the structural typing of the F# compiler can't guarantee the presence of data members? I guess? That's a good thing from an OO perspective at least. The second difference is that you can name the args, and named arguments are allowed to be optional. So, to continue with the toString example, you could name the argument:

^a when ^a : (toString : ?zelf : ^a -> string with get)

And I made zelf optional for kicks, even though that makes no sense here and probably wouldn't really compile. Commentary: coming from strict and minimal languages like Haskell and Scheme, I think that optional arguments are kind of over-complicated, especially in a language that already has ad-hoc polymorphism via overloading. I'd rather have a smaller language and a smarter type inferencer. Because, you know, Haskell is sort of OO, and I can count on one hand the times I've had to supply type annotations. In F#, which is OO plus so much more, I have to drop in type annotations at least once a file, more if I interact with .NET code a lot. OK, that leaves one last crazy constraint. Seriously, I'm not sure why you'd need a type this constrained. Here's an example:

^a when ^a : (something<'b,'c> : ^a * 'c -> 'b)

So, yeah, you can introduce new type variables. But that's not all! You can alsointroduce recursive constraints! Like so:

^a when ^a : (something<^b 
    when ^b : (ohNoNotAgain : ^b -> ^a)> : ^a -> ^b)

It's fully recursive, all right. You can structurally constrain the types that are bound in the scope of your structurally constrained methods. (Don't worry if you didn't understand that, you might not be human.)

typevarDefnsP = do
  tok "<"
  defns <- sepBy1 typevarP (tok ",")
  constraints <- optionMaybe typevarConstraintsP
  tok ">"
  case (defns,constraints) of
    (_,Nothing) -> return (defns, Nothing)
    ([defn],Just cs) -> return (defns,Just $ nestTypes defn cs)
    (defns,Just cs) ->
      return (defns,Just$nestTypes (Var (Choice (map devar defns))) cs)
typevarConstraintsP = do
  tok "when"
  sepBy1 constraintP (tok "and")
devar (Var var) = var
list x = [x]

The code is pretty straightforward, just brackets around some comma-separated type variables (typevarDefnsP), optionally followed by when and some and-separated constraints. It does some fancy processing when you do have constraints, but that's because I need a unique way to represent each one of these structures internally. It's a representation that's probably wrong: best not get too attached to them; they'll probably all have to change when I start actually USING them. By the way, if you're hoping to get in there on a beta of Fing and start with some CRAZY RUDE constraints on structurally constrained types, you may be disappointed by the first few versions. My priority is to be able to search for obvious things first, like int -> seq<'a> -> 'a, then for subtype relations so that you can also search for list<'a>, ResizeArray<'a>, etc. OK, one last thing: type var choices. These are, as far as I can tell, genuinely ambiguous. You can write something like

map<^a,^b> when (^a or ^b) : (toString : ^a -> string)

I think we can all agree that's pretty messed up. Is it 'a or 'b that has to have a toString member? Who knows? Put an orbetween 'em and call it done! There is one quirk, though: only structural constraints are allowed. So this is illegal (hilarious though it may be):

map<^a,^b> when (^a or ^b) : (^a :> ^b)

The code should be easy to read; it's a lot like the other parsers.

typevarChoiceP = do
  typars <- between (tok "(") (tok ")") (sepBy typevarP (tok "or"))
  let typars' = map devar typars
  tok ":"
  Constraint (Sig _ id member prop) _ <- liftM ($ Id "DUMMY") (traitP Anonymous)
  return $ Constraint (Sig (Choice typars') id member prop)

So that wraps up the Parsec version of the parser. The entire thing is available to download. The FParsec version is a lot longer, because I had to write the low-level backtracking parser tok by hand there. (I might be able to fix it now, but I'm also sick of parsing now.) I will post the FParsec version at the top of my next post, where I'll start to look at the data structure that the parser builds. That may be a little while, because my next step is to get a simple version running. This data structure is quite a bit more complicated than the toy version I got from kvb.

Tags: , , , ,

Language | Programming

Scaling simple file I/O

by sandersn 3. May 2010 10:39
I previously came to the conclusion that Haskell and Python are about the same in terms of proximity to the (non-existent) One True Programming Language. Where one expresses things better, the other suffers, and vice versa. Still, I find that, in terms of sheer brevity, Haskell scales a little better than Python. (about 15% better, in case you're wondering. I measured it last time.) This is true even where Python wins initially. Take this simple script for example:
# Python
print(withFileLines(scale, sys.argv[0]))
-- Haskell
getArgs >>= head & withFileLines scale >>= print

(assume appropriate definitions for the undefined functions withFileLines and scale*) Now we improve the formatting to print line-at-a-time.
for line in withFileLines(scale, sys.argv[0]):
    print(line)

getArgs >>= head & withFileLines scale >>= mapM_ putStrLn
Now we improve the script to combine more than one input file:
for line in scale(concat(withFileLines(cdr, arg) for arg in sys.argv)):
    print(line)

getArgs >>= mapM (withFileLines tail) >>= concat & scale & mapM_ putStrLn
(The first line of each file is the title and must be dropped, although id would work almost as well. Functional utilities for Python are below.*) Haskell requires a lot more up-front knowledge, but once you've learned all the combinators, they are more powerful, so they require less restructuring when expanding. The principal difference here is that the Haskell code stays almost as flat as it was when it started, the pipeline just gets longer. In Python the shape of the code jumps around because things don't combine quite as well. (Though it's really close.) In 2003, when I first started using it, I tried to explain to a classmate why Python is better than Java. The main reason is that it gives the programmer a larger vocabulary, a larger set of abstractions to use in building programs. A larger vocabulary makes for shorter programs. It turns out I have a nearly unlimited hunger for new vocabulary, and a lot of the people coming up with new words are working in Haskell. PS: Yes, these stupid posts will continue until I finish my dissertation. I don't have time to do a serious write-up of parsing until then. Fortunately it's only two more weeks. *Here are some example implementations:
def withFileLines(f, file):
  return f(iter(open(file)))
def scale(lines):
  l = list(lines)
  return (str(float(line) / len(l)) for line in l)

def cdr(it):
  it.next()
  return it
def concat(it):
  return (y for x in it for y in x)
def id(x):
  return x
and Haskell:
withFileLines f = readFile >=> lines & f & return
scale lines = lines |> map ((read :: Double) & (/ len) & printf "%.5f")
  where len = fromIntegral (length lines)

Tags: , , ,

Language | Programming

Developing Fing, part 2: Parsec parser 2

by sandersn 11. April 2010 08:35
Last time I explained that I got stuck writing an F# type parser for Fing in FParsec. So I switched to Parsec. This time, I'll show the entire Parsec parser instead of just the tokeniser. Here's the tokeniser again (along with the imports I need for the rest of the code).
import Text.ParserCombinators.Parsec
import Control.Monad (sequence, msum, liftM, liftM2, when)
import Data.Either (rights)
import Data.List (intercalate)
import Data.Maybe (isJust)
tokeniser = choice [try (many1 (digit <|> letter <|> oneOf ".`_"))
                   , string "->"
                   , try (string ":>")
                   , (return . list =<< oneOf "*<>()[],'^#:?")
                   ]
tok t = try $ do
  spaces
  x <- tokeniser
  if x==t then return x else fail "Token not recognised"
toks = mapM tok
notTok ts = try $ do
  spaces
  x <- tokeniser
  if x `elem` ts then fail "Incorrect token" else return x
Read my previous post if you want an explanation of the tokeniser. Before I start, let me warn you that I'm going to explain the parser twice. This post covers the parser's input—the strings it accepts. The next post covers the parser's output—the structures it generates. So ignore anything that says 'return' on it (or liftM). I'll explain it later. The reason I'm breaking it up like this is that I'm sure my parser works, but I'm not sure the type representation I'm building up is correct; I really need to try to fit some real .NET types into it to find out. And I can't do that until the FParsec version is done. So...take anything with 'return' on it with a grain of salt, even if you can't ignore it.
identP = notTok ["lazy", "with", "when", "if", "else", "do", "new", "get", "set"
                , "<", ">", "[", ",", "]", "(", ")", "#", ":", "?"
                , "->", "*", "'", "_", "^", ":>"]
First up is the basic identifier. I don't have all the F# keywords here yet, and I think I'm also missing some punctuation, but you get the idea. An identifier is a token that is not one of these things. Here's an example of some things that can be parsed so far:
  • int
  • list
  • Microsoft.FSharp.Core.list`1
arrowP = liftM (toplevel Arrow) $ sepBy1 tupleP (tok "->")
tupleP = liftM (toplevel Tuple) $ sepBy1 typeP (tok "*")
typeP = do
  t <- termP
  option t (return . nestTypes t . reverse =<< many typeSuffixP)
termP = typevarP <|> between (tok "(") (tok ")") arrowP <|> longidentP
Identifiers are the heart of the parser, but the "main loop" is actually these four parsers: arrows, tuples, types and terms. For arrowP and tupleP, ignore what they return for now and focus on the part after the dollar sign. It's pretty simple: an arrow is a bunch of tuples separated by arrows and a tuple is a bunch of types separated by stars. A type is a term followed by an optional type suffix. That's the Caml-style int list, plus some other things. There can be many of these, for example: int list list option ref. A term is a type variable, a parenthesised arrow or a "long" identifier. That means an identifier, optionally followed by a .NET-style generic type. In other words, instead of int list (Caml-style), you write list<int> (.NET-style). I personally prefer the .NET style because I think it reads more naturally and explicitly, especially when you try to mentally parse int list list option ref versus ref<option<list<list<int>>>> when read aloud. (The first one requires you to build the mental structure bottom-up instead of top-down.)
longidentP = do
  id <- identP
  option (Id id) (postfixNameP (Id id))
postfixNameP id =
  liftM (Generic id) $ between (tok "<") (tok ">") (sepBy arrowP (tok ","))
typevarP = choice [anon, normal, structural]
  where anon = tok "_" >> return (Var Anonymous)
        normal = tok "'" >> liftM (Var . Normal) identP
        structural = tok "^" >> liftM (Var . Structural) identP
Type variables are pretty simple: either the anonymous _, the normal 'a or the structural ^a. I'm not up on the semantics of structural variables, so even though they're the most mysterious of the three, I can't explain them right now. Sorry. So, given those seven parsers, here are some things that they can parse:
  • int->int
  • int->int->int
  • int*int
  • int->int*int->int
  • int*int->int*int
  • (((int)))
  • list
  • list<>
  • list<int>
  • list<int->int>
  • list<int*int>
  • list<int*(int->int*int)>
  • list<list<int>>
  • int list list
  • 'a->'a
  • 'a*'a
  • _*^a
  • list<int,'a,_>
That's enough to parse most types, so if you stop reading here, you won't miss much. Mainly what's left is more .NET compatibility (arrays) and advanced F# features (constraints). Constraints are complicated to parse, so they take about 3 times more code than the rest of the parser put together. In fact, because I'm short on time, I'm going to break the post in two so that at least I'll have something to post today. Here's the starting point for next time: the type suffix parser which I left undefined above.
typeSuffixP = choice [arrayP <?> "an array type"
                      , tok "lazy" >> return (Generic (Id "lazy") . list)
                      , tok "when" >> constraintP
                      , (do id <- identP; return (Generic (Id id) . list))
                                  <?> "a stupid suffix type"]

Tags: , , , ,

Language | Programming

Programming language makeup of my dialectology dissertation

by sandersn 1. April 2010 10:37

This week, Josh was surprised to learn that my entire dissertation isn't in Haskell. Actually, the core is written in C++, but there are a couple of reasons for that. First, I wrote most of the code before I started using Haskell. Second, the original version of the program had to prioritise memory efficiency over speed because of the experiment I was running at the time. This meant that the Python version was useless, and the Caml version was unbearably slow and often failed on our 2 GB corpus server. The garbage collection overhead, either in space (for maximum efficiency) or in time (for minimum space) was greater than the stupid inefficiencies in my novice C++ code.

Even now, a naive Haskell version is about twice as slow as the (still fairly naive) C++ version, and it's not worth the effort to switch. Still, only the core is written in C++. The wrapper that runs the whole thing is Python, the myriad of data transformation programs are Haskell, and the statistics checking at the end is R. Also, there's a lot of Java which I didn't write in MaltParser and the Berkeley parser. Lots.

Here are the numbers:

Python

     255 build.py
     152 consts.py
      51 norte.py
      44 test.py
     502 total

C++

      38 icedist.cpp
      21 icefeat.cpp
      18 icesig.cpp
     258 icecore.h
     125 iceextra.h
     460 total

Haskell

     10 CalculateGeoDistance.hs
       8 CombineFeatures.hs
     113 Consensus.hs
     174 Consts.hs
      65 ConvertBerkeleyToFeature.hs
       6 ConvertDistToL04.hs
      20 ConvertMaltToFeature.hs
      10 ConvertPTBToTags.hs
      14 ConvertTagsToConll.hs
      10 ConvertTagsToFeature.hs
      40 ConvertTalbankenToPTB.hs
      12 ConvertTalbankenToTags.hs
       6 ConvertTntToTxt.hs
      55 Distance.hs
      31 FormatDistance.hs
      12 RankFeatures.hs
       7 Sexp.hs
      47 Swedia.hs
      11 Talbanken.hs
      65 TestConvertTags.hs
     174 TestConvertTalbanken.hs
      49 TestDistance.hs
      55 TestPath.hs
     170 TestSwedia.hs
      16 TriangleInequality.hs
      55 Util.hs
    1235 total

R*

      63 montecarlo Mantel example.R
      57 genAnalysis.R
     120 total

So, yes, the majority of the code for my dissertation is actually Haskell. A lot of that is in an area where Haskell is not thought of as strong: text-file format munging. That's what all those Convert files are. But really the main difficulty is learning to keep the monadic and pure variants of functions straight. The usage of map vs mapM is not immediately obvious, and the type errors you get are confusing. The good news is that once learned, the patterns are pretty obvious. But that's the story of Haskell for everything.

(Also you need a good way to format strings. I've got by with unwords and intercalate so far.)

Why not Python instead of Haskell?

So why not Python for text-file munging? Three reasons: the weakest is that functional code in Python is ugly. The second is that functional code is either inefficient (Python 2.x) or unreliable (Python 3.x) compared to Haskell. Haskell is fully lazy and has functional data structures. That means that functional code behaves the same way as imperative code---processing is line-by-line and generated incrementally, which makes debugging a lot easier. This is not true in Python 2.x and requires manual analysis of laziness in Python 3.x (since iterators must be explicitly cached to lists where necessary).

The third reason is that functional code is unmaintainable by default in Python. The text-munging I do is not your run-of-the-mill Perl sweet-spot of per-line regex processing; much of it involves building a tree representation and then manipulating it (there is a lot of parsing in my dissertation after all). So, intermingled with the text-munging is some fairly complex code. Python will punish you for writing complex functional code without documentation. In contrast, Haskell either generates the documentation for you (eg :browse Module) or requires declarations up front (eg data Tree a = Leaf | Node a [Tree a]).

I think what surprised Josh is that I talk a lot more about Haskell than about Python. But I use the right tool for the job**, and Python's place is making imperative code simple, so that's where I use it. (I wrote an awesome parallel execution function that's compatible with Python 2.6 and 3.0, for example). And I use C++ when I need maximum memory efficiency and speed in the same program.

*I didn't write the first R file. Stephanie Dickinson of the IU Statistics Consulting Center department did. If you have a single statistics question, this is an invaluable place. Ask them what statistics method you need, and they will not only tell you, but even write code for you to run it.

**Of course, my understanding of these tools might be flawed. In particular, my view of Python's applicability may be artificially restricted. And my view of C++ may be inflated...

Tags: , , , , , , ,

Language | Programming

Bing Maps API

by sandersn 7. March 2010 13:40

Recently my advisor suggested that I compare my results to travel distance between towns instead of straight-line geographical distance. That seems like a good idea—if there's a huge lake and a mountain in between two towns, the languages are likely to be very different. The problem is taking the time to gather all the distances; it's not a big deal to identify 30 or so sites on Google Maps, but when you need to find the 435 distances between all 30 sites, you probably want a script to do it.

The obvious choice for this is an API to the mapping service you were going to use in the first place. For me, that led to a couple of choices:

Google Maps vs Bing Maps

Google Maps requires me to set up a web page (publicly accessible) and make a visible embedded map, then manipulate it with Javascript. This is really intended for mashups I think. I was disappointed because I prefer to let my knowledge of HTML and Javascript rot and moulder. (I found some discussion that indicates there is no SOAP API, although it's still possible that I just missed it.)

In contrast, the 3rd Google result for "google maps soap api" is for Bing Maps, admittedly under the previous branding "Virtual Earth". The SOAP API that allows you to pretend (given a Proper Library) that you are calling a function that magically gives distances between two points. The Proper Library manages all the XML conversion/parsing and sending over the Internet. So Bing Maps won this one, because I just want to import a library and call a function from a script instead of embedding the whole thing in a web page. This led to a second choice:

Programming Language

I momentarily considered Haskell, because it does have a lot of libraries listed in Hackage. But it's pretty weak on the XML front and its lone SOAP library is sort of just a text wrapper around XML that you have to build yourself. No XML conversion or parsing, just the sending-over-Internet part.

From there I went to Python. I quickly found that Dive Into Python has a chapter on SOAP, but it's 6 or 7 years out of date, and SOAPpy, the library it uses, to has been replaced by ZSI, "Zolera Soap Infrastructure". I distrust these companies that start with Z (*cof* Zope *cof* Zend) and the source was package in a .egg (or maybe a .muffin? no it was definitely .egg) which I don't even know how to install and it was all just Fear-Uncertainty-Doubt.

The real reason that I wasn't enthusiastic about installing a Python library is that (1) I probably will never use SOAP again and (2) in the Bing Maps documentation, it mentions that Visual Studio has SOAP libraries installed with it by default. And all the examples are in C#, so it becomes a matter of cut-and-paste. And Visual Studio has a wizard for generating the proxy objects you need in a static language. (Imagine that, Visual Studio having a wizard!) Well, the wizard was the last straw—I mean who can resist all that code the wizards generate for you? Plus it was getting late and I just wanted to get something working. I even resisted the temptation to write the code in F#, because F# is extremely short on wizards right now.

(The code would have been a lot better in F#, though. It has tuples and pattern matching and it's hard to write a program these days that doesn't use one or both of those features. You'll see when I show you the code.)

So let's see. Here's the gist of my workflow. I went to Bing Maps and identified all 30 of the interview sites. Then I dumped to some format called KML. Now, Bing Maps doesn't make it obvious how to give somebody a link to these sites, but the option to dump to KML is front and centre. I suspect that it started life as a desktop product. Anyway, it's annoying for directions but it's perfect for my purposes. (Hint: click the icon of a letter in the bottom-left, even if you just want a link to the map.)

I did grep coordinates swedia.kml >pbcopy and pasted that in to an emacs buffer. From there I recorded some macros to produce a Python literal of a list of pairs. With the list loaded in the REPL, I added on the location names to each location, something like this:

[(site,lat,long) for (site,(lat,long)) in zip(consts.swediaSites, coords)]

At this point I had a list in Python whose elements looked like this:

("Ankarsrum", 16.33, 57.70)

Then I pasted the list back into the emacs buffer and added some noise around each line to make it valid C#:

sites.Add(new Loc("Ankarsrum", 16.33, 57.70));

I should have just written a string format comprehension in Python but I was really tired by this point.

This, of course assumes a struct named Loc and a List<Loc> called sites. But that's easy enough to do in C#. Pardon my outmoded accent—the machines I coded for at Bing had not yet updated to .NET 3.5 so I didn't really learn a lot of 3.5 conveniences fluently.

    struct Loc
    {
        private string _site;
        private double _lat;
        private double _long_;
        public String Site { get { return _site; } }
        public double Lat { get { return _lat; } }
        public double Long { get { return _long_; } }
        public Loc(String site, double lat, double long_)
        {
            _site = site;
            _lat = lat;
            _long_ = long_;
        }
    }
    class Program
    {
        private static List<Loc> sites = new List<Loc>(30);
        private static void init()
        {
            sites.Add(new Loc("Ankarsrum", 16.331280825605035, 57.700431990646344));
            // ... 29 more of these ...
        }

OK, so that's the boring setup. Then I needed a pairwise iteration over a list. This is a lot easier with linked lists and type inference—managing indices is fiddly and the type annotations get unwieldy so I said Forget It and wrote some C-like code instead. I didn't want to fiddle with it to get it right. (But it is pretty ugly).

        private static IEnumerable<int[]> pairwise(int n)
        {
            for (int i = 0; i < n; i++)
            {
                for (int j = i + 1; j < n; j++)
                {
                    yield return new int[] { i, j };
                }
            }
        }

Finally, the code in Main is not all that long as badly written C# goes:

        static void Main(string[] args)
        {
            init();
            var req = new BingMaps.RouteRequest();
            req.Credentials = new BingMaps.Credentials();
            req.Credentials.ApplicationId = 
              "Very-Long-String-You-Get-By-Registering-With-Bing";
            BingMaps.Waypoint[] route = new BingMaps.Waypoint[2];
            req.Waypoints = route;
            var client = new BingMaps.RouteServiceClient(
              "BasicHttpBinding_IRouteService");

            foreach (int[] indices in pairwise(30))
            {
                var from = sites[indices[0]];
                var to = sites[indices[1]];
                route[0] = new BingMaps.Waypoint()
                { 
                    Description = from.Site,
                    Location = new BingMaps.Location() { 
                      Latitude = from.Long, Longitude = from.Lat 
                    }
                };
                route[1] = new BingMaps.Waypoint()
                {
                    Description = to.Site,
                    Location = new BingMaps.Location() { 
                      Latitude = to.Long, Longitude = to.Lat 
                    }
                };
                var d = client.CalculateRoute(req).Result.Summary.Distance;
                Console.WriteLine(string.Format("(\\"{0}\\", \\"{1}\\"): {2},", 
                  from.Site, to.Site, d));
            }
        }

I added all the type inference it could handle and shortened some of the ridiculous example names. The API is still pretty verbose but I think that's because it's SOAP. Let's see, you have to

  1. Create a request.
  2. Create new credentials and give it your personal key.
  3. Create a new route and add the route to the request.
  4. Create a new Route Service client. (There are several other services, geocoding for example.)
  5. Inside the loop, set each point in the route.
  6. Also inside the loop, call client.CalculateRoute, passing in the request. This is the actual SOAP call that sends XML across the wire and back.
  7. Print out the distance of the route summary.

If you specify two points that do not have a connecting route, then you get a hilarious null pointer exception somewhere along the chain CalculateRoute().Result.Summary.Distance. It seems to me that this should be an exception, but no, just null.

Really, though, the only reason that I got null pointer exceptions is that KML stores its points in Longitude, Latitude order, while the rest of the world uses Latitude, Longitude order. So instead of two Swedish villages, I requested the distance between two points in the Arab Sea. The alert reader will notice the switcheroo I have to pull to make this work. I should have changed the Loc constructor but I didn't think of that until just now.

The code takes about 100 seconds to get all 435 distances. It's faster than I expected, so I guess most of the lag in the web interface is the browser drawing the route.

In conclusion, this probably won't be useful to anybody else, but at least maybe you'll think it's cool to see how you can easily get driving distances between all unique pairs of sites.

Tags: , , , , , , ,

Language | Programming

Reading and writing consensus trees in Haskell

by sandersn 3. March 2010 14:43

I use the consensus tree code I wrote about the last two times in my dissertation to find agreement between clusters of Swedish dialect interviews. The input is cluster trees from R (the open source S) and the output is formatted for use by the qtree Latex package. I also need to filter out cluster trees that contain non-significant distances—all the input distances should be statistically significant.

main = do
  sigs <- withFileLines findSigs "sig-10-1000-interview.csv"
  interactFiles (withFileLines (makeConsensusTree sigs & list)) qtree

makeConsensusTree uses the list of significant files and qtree converts a Tree to a String, suitable for output to stdout. But first, findSigs:

findSigs =
  tail
  & map (replace ',' ' ' & words & tail
         & zip ["path", "trigram", "dep", "psg", "grand", "unigram", "all"]
         & filter (snd & (=="0"))
         & map fst)
  & zip ["r", "r_sq", "kl", "js", "cos"]
  & concatMap (uncurry (zip . repeat))
  & Set.fromList

(Reminder: (&) == flip (.)) findSigs is a hoky CSV parser. It drops the column heads (tail), then chops each line into fields (map (replace ',' ' ') & words & tail), assuming that they don't contain spaces so that it can use words to do the chopping.

Then it zips the hard-coded column heads back on, and throws away fields that aren't (=="0")—that is, feature sets that have more than 0 non-significant distances.

For each line, the significant feature sets are zipped with their with distance measure, and a hilarious and incomprehensible concatMap call turns them into a list of pairs like [("path","r"), ("trigram", "r"), ("path", "js") ...].

I know this is statically typed, but I don't think anybody would hesitate to call this ad-hoc mess a script. It was written in about 5 minutes for a specific file and will break as soon as that file changes.

Moving on.

makeConsensusTree sigs =
  filter (isPrefixOf "Cluster: ")
  & map rReader
  & filter (fst & (`Set.member` sigs))
  & map (snd & buildRTree)
  & consensus

makeConsensusTree is the top-level function for reading in the different dendrograms and writing out a consensus tree. Its first task is to read Yet Another Tree Format, this time R's binary dendrogram format. Each tree is on a line that starts with "Cluster: " (hence, filter (isPrefixOf "Cluster: "). Then the measure and feature set follows, and finally a list of numbers. If you divide those numbers in half (which rReader does), you get two columns.

rReader line = 
  ((measure,features),
   splitAt (floor (fromIntegral (length ns) / 2)) ns |> uncurry zip)
  where (_:measure:features:ss) = words line
        ns = map read ss

For example, if you have the line

Cluster: r all -2 1 -4 2 -4 -9 -8 3

The numbers are interpreted as the following two columns:

1.-2-3
2.1-9
3.-4-8
4.23

buildRTree further interprets these numbers as a binary tree: negative numbers are leaf nodes, (negative) indices into the interview sites or whatever the things are that you're clustering. Positive numbers are internal nodes, indices that refer to a previously created node. The index it refers to is the step at which the node was created. So, in the above example, internal node 1 has two leaf children: values 2 and 3. Internal node 2 has two children. One is node 1 and the other is a leaf (with value 9). The last node, 4, is the root. For the tiny example above, the Lisp representation of the tree is:

'(((2 3)
   9)
  (4 8))
label (Leaf a) = a
label (Node a _) = a
buildRTree tuples = last nodes
  where nodes = map buildNode tuples
        buildNode (i,j) = Node ('$':label inode++label jnode) [inode,jnode]
          where (inode,jnode) = (buildChild i, buildChild j)
        buildChild i | i < 0 = Leaf $ swediaSites !! (-i-1)
                     | otherwise = nodes !! (i-1)

The way that buildRTree implements this is kind of clever; it has to refer to previous parts of the node list as the list is being built. In Haskell, laziness is the most natural way to do this. So when buildChild implements the negative/positive → leaf/non-terminal decision, it refers to nodes for the non-terminal case. But nodes is just map buildNode and buildNode first calls buildChild and oh-no-circular-definition!

But it turns out all right, because, assuming that R has properly dumped the node indices, buildChild's i will always be less than the length of nodes at the time that it's required.

(Of course, you should always avoid (!!) and this particular use is egregiously O(n2) blah blah, but remember once again that this is a script and that I happen to know that I'll never run this on trees with more than 100 terminals. And that I'm willing to wait for up to 5 minutes.)

qtree (Leaf a) = Set.findMax a
qtree (Node a []) = Set.findMax a
qtree (Node a kids) = "[. {" ++ left ++ "} " ++ right ++ " ]"
  where (leaves,nodes) = partition leaf kids
        left = intercalate "\\\\\\\\" (map qtree leaves)
        right = unwords (map qtree nodes)

Finally, qtree converts a tree into Yet Another Tree Format, this time the qtree Latex package. I can then paste the string into my dissertation. It's pretty straightforward except for the abuse of Set.findMax—it's not finding anything since there's only one item left in the set anyway. I just don't have a way to destructure sets.

Well, now you've seen how messy script code can be in Haskell. I think it's a pretty good argument that static typing doesn't buy you much up front—I can be as messy and ugly as I want in either Python or Haskell. All you can hope for is that the type system gets out of your way while you're doing it. The win comes in six months when I decide I need to make the code more robust or (more likely) add a tiny feature. Haskell is more discoverable out of the box—no type documentation needed: the compiler figures it out for you—and more reassuring to make changes—less need for tests: many breaking changes will cause the compiler to complain.

In case you notice a tacit unwillingness to write documentation and tests for my script code, you got me. Yup. I wrote tests for some of it, but after the code was done. And documentation? I'm writing a dissertation! It's pages and pages of high-level documentation! (Just don't expect any commentary on actual code.)

Tags: , , , ,

Language | Programming