Mostly Typed

For Hackers and Other Heretics

Making GHC and cabal sandbox play nice

So you’re building something in Haskell. You’re most likely using GHC as your compiler, and almost certainly using cabal to manage package dependencies. If you’re up on all the hippest new features of cabal, you’re probably using sandboxes to make sure your project compiles reliably and can, e.g., depend on older or bleeding-edge versions of packages without affecting your global environment. Great!

UPDATE: cabal exec is now built into cabal, so this guide is outdated.

However, if your project invokes GHC in any way (e.g. using the hint package or invoking runghc to run a user-generated script), you’ll suddenly find yourself outside of the sandbox.

This is actually a familiar problem from other languages. For example, the Ruby community uses bundler to achieve goals similar to cabal’s. However, because Ruby is interpreted, not compiled, bundler inherently needed bundle exec. Similarly, Java’s maven works because you must have the classpath set up properly to run a Java program in the first place. Since Haskell is compiled, and libraries are in fact usually statically linked into the binary, this is not the case: a compiled Haskell program runs in an environment completely independent of the one in which the binary was originally compiled.

This is problematic for the migration scripts of postgresql-orm [1]. Migrations are written as Haskell source files and run by the pg_migrate utility, which invokes runghc. This works great if postgresql-orm is installed in the global or user package databases. But that would cause conflicts if, say, you’re developing multiple apps on the same machine, each relying on a different version of postgresql-orm.

The (Hacky) Solution

The solution I’ve been using comes from my advisor: a shell function that adds an exec subcommand to cabal. When you run cabal exec [some_command], the shell function looks for a cabal.sandbox.config file in the current directory or the nearest parent directory that has one, and uses it to set GHC_PACKAGE_PATH and PATH appropriately:

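cabal() {
  if [[ $1 == "exec" ]]
  then
    shift
    local dir=$PWD conf db pkg_path
    # Walk up from the current directory until a sandbox config is found.
    while :; do
        conf=$dir/cabal.sandbox.config
        [[ ! -f $conf ]] || break
        if [[ -z $dir ]]; then
            echo "Cannot find cabal.sandbox.config" >&2
            return 1
        fi
        dir=${dir%/*}
    done

    # Read the sandbox's package database location from the config file.
    db=$(sed -ne '/^package-db: */{s///p;q}' "$conf")
    if [[ ! -d $db ]]; then
        echo "$db: does not exist" >&2
        return 1
    fi

    # Collect the databases cabal itself uses, sandbox first, joined by ':'.
    pkg_path=$(command cabal sandbox hc-pkg list 2> /dev/null | grep \: | tac | sed 's/://' | paste -d: - -)
    if [[ $# == 0 ]]; then
        # No command given: just print the computed setting.
        echo "GHC_PACKAGE_PATH=${pkg_path}:"
    else
        # Run the command with the sandbox's databases and bin directory.
        GHC_PACKAGE_PATH=${pkg_path} PATH=$(dirname "$db")/bin:$PATH "$@"
    fi
  else
    # Any other subcommand goes straight to the real cabal.
    command cabal "$@"
  fi
}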
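To make the package-db lookup a bit more concrete, here is a minimal sketch of that single step in isolation. The helper name sandbox_package_db and the sample config contents are made up for illustration (a real cabal.sandbox.config has more fields), and the sed expression is assumed to run under GNU sed, as in the function above.

```shell
# Hypothetical helper: print the package-db path recorded in a sandbox config.
sandbox_package_db() {
    # Find the "package-db:" line, strip the key, print the value, then stop.
    sed -ne '/^package-db: */{s///p;q}' "$1"
}

# Made-up config, for illustration only.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
-- This is a Cabal package environment file.
user-install: False
package-db: /home/me/app/.cabal-sandbox/x86_64-linux-ghc-7.6.3-packages.conf.d
EOF

# Prints the path from the package-db line of the sample config.
sandbox_package_db "$sample_conf"
rm -f "$sample_conf"
```

The directory that contains this database is the sandbox itself, which is why the function prepends $(dirname "$db")/bin to PATH.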

To figure out the right value for GHC_PACKAGE_PATH, the function inspects the output of cabal sandbox hc-pkg list, which is the cabal-sandbox equivalent of ghc-pkg list. This is important! You might think to simply set GHC_PACKAGE_PATH to something like “.cabal-sandbox/x86_64-linux-ghc-7.6.3-packages.conf.d:”, where the trailing colon causes GHC to look in that package database and then fall back on the default package databases (namely the user database, then the global database). However, because cabal-sandbox explicitly excludes the user database when compiling packages, this can cause compilation errors if you have different versions of the same package installed in the user and global databases.
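As a rough illustration of what that pipeline does, the sketch below runs the same grep, tac, sed and paste steps over canned text instead of a live cabal sandbox hc-pkg list call. The helper name package_path_from_list, the paths, and the package versions are all hypothetical, and the sample assumes the listing format the function relies on: each database path on its own line, ending in a colon.

```shell
# Hypothetical helper: turn `hc-pkg list` output (read from stdin) into a
# colon-separated database list with the sandbox database first.
package_path_from_list() {
    # Keep only the database header lines, reverse their order, drop the
    # trailing colon from each, then join the pair with ':'.
    grep ':' | tac | sed 's/://' | paste -d: - -
}

# Made-up listing in the assumed format: global database, then sandbox.
sample_list='/usr/lib/ghc-7.6.3/package.conf.d:
    base-4.6.0.1
    ghc-prim-0.3.0.0
/home/me/app/.cabal-sandbox/x86_64-linux-ghc-7.6.3-packages.conf.d:
    postgresql-orm-0.2.1'

printf '%s\n' "$sample_list" | package_path_from_list
```

For this sample the result is the sandbox database followed by the global one, joined by a single colon and with no trailing colon, so the user database never enters the picture.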

This solution is problematic for a couple of reasons:

  1. It’s not built in – that means yet another configuration to forget when, e.g. deploying code to a new server or setting up a new workstation.

  2. It won’t allow a program to invoke cabal itself, since cabal refuses to run if GHC_PACKAGE_PATH is set. This could be an issue for building any kind of build system or continuous integration service on top of cabal without giving up sandboxes.
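One possible way to soften the second problem, offered only as a sketch, is to clear GHC_PACKAGE_PATH for just the nested cabal call. The helper name without_ghc_package_path is hypothetical, and this assumes the nested cabal can find the sandbox on its own from cabal.sandbox.config.

```shell
# Hypothetical helper: run a command with GHC_PACKAGE_PATH removed from its
# environment, leaving the caller's environment untouched.
without_ghc_package_path() {
    # The subshell confines the unset to this one invocation.
    ( unset GHC_PACKAGE_PATH; "$@" )
}

# Example (not run here): a script started via `cabal exec` could call
#   without_ghc_package_path cabal build
```

This keeps the workaround local to one call, but it is still one more piece of shell configuration to carry around, so it does nothing for the first problem.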

The Less Hacky Solution

Cabal should natively expose an exec subcommand and have a reasonable way of dealing with GHC_ environment variables. We’ve been discussing this on GitHub in this issue. Chime in!

  1. postgresql-orm is an object-relational mapper for PostgreSQL.