Modernizing dev ex for Standard ML

by Andrew Chang-DeWitt, Tue Jul 01 2025

intro

This summer I had the opportunity to work on implementing some new features in the compiler for a research language, PriML (more on that later). The compiler is implemented in Standard ML, a language I'd never worked with before. Once I got familiar with the basics, I started running into slowdowns with building & getting tight feedback loops while working on the codebase. In particular, it seemed difficult to get make to rebuild the project on changes to source files without first cleaning everything already built. As a result, attempts to automatically rebuild &/or run tests/code examples on file changes were either very slow or just broken.

As I got into creating a dataflow analysis framework for the PriML compiler, one thing I needed was a decent directed graph library. While there was an undirected graph library in the vendored grab bag library we had in our code base, we had no existing directed graph implementation. To solve this, I began working on my own graph implementation & chose to externalize it as a separate library to include in PriML as a dependency later. This presented an opportunity to see if I could streamline the SML build process we were using, with the goals of improving build time, letting make rebuild on changes to required source files, & improving the process of including external dependencies without having to vendor them.

For my own notes & in the interest of documenting this in case someone else finds it useful, I've written the following about my process of improving the dev ex for coding in SML with mlton & make.

creating a simple feedback loop

Many software projects use make and an appropriate compiler to simplify their build commands. Standard ML is no exception. A typical project structure might look something like this:

./                   // prj-root
|- Makefile
|- hello.sml         // single source file
|- hello             // compiled exe bin

and orchestrate its build with a Makefile like the following:

MLTON       := mlton

all: hello

hello: hello.sml
	$(MLTON) hello.sml

With this in place, running make from the root directory will recompile the binary if ./hello.sml has changed since the last build.

I find I'm most productive when I rebuild/test in a tight feedback loop. To achieve this, I usually automate rebuilding/testing using entr with a command like ls | entr make, which will rerun the command make any time a file listed by ls changes. Now, assuming the builds/tests are fast, results are recreated right away every time a file changes.

However, it's not all sunshine & roses yet.

getting more complicated: building from multiple source files

When building an SML source with mlton, things get a little more complicated once the source tree is more than one file. The typical way I saw to handle discovering sources & making their definitions available to other sources is by using an ml basis file, which acts as a sort of dependency list. When compiling with an ml basis file, mlton simply needs to be given the path to the basis file, then it handles the rest.

For a (still relatively trivial) example of multiple sources referenced by an ml basis file like this:

./                   // prj-root
|- Makefile
|- hello.mlb         // ml basis file
|- hello             // compiled exe bin
|- source1.sml       // sources...
|- source2.sml
|  ...
|- sourcen.sml

the Makefile might be updated to look like this:

MLTON       := mlton

all: hello

hello: hello.mlb
	$(MLTON) hello.mlb

"What's so complicated about that?" you say? Well, if you're following along & told entr to rerun the hello target above with ls | entr make, you might notice that any time you change a source file, make reports that there's nothing to be done. You might also notice that your changes aren't incorporated if you rerun the executable.

This happens because make only reruns a target's recipe if the target is missing or one of its listed prerequisites is newer than it. In this example, that means the hello target will only get rebuilt if the basis file ./hello.mlb changes, not if any of the sources listed within it change.
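To make that rule concrete, here's a rough shell sketch of the same newer-than comparison. It isn't how make is implemented; the needs_rebuild helper, the file names, & the timestamps are all made up for illustration, & the -nt test is a widely supported extension rather than strict POSIX.

```shell
# needs_rebuild TARGET PREREQ...
# Succeeds when TARGET is missing or older than any listed prerequisite.
needs_rebuild() {
  target=$1
  shift
  [ -e "$target" ] || return 0
  for prereq in "$@"; do
    if [ "$prereq" -nt "$target" ]; then
      return 0
    fi
  done
  return 1
}

dir=$(mktemp -d)
touch -t 202501010000 "$dir/hello.mlb"     # basis file, untouched since before the build
touch -t 202501010100 "$dir/hello"         # binary from the last build
touch -t 202501010200 "$dir/source1.sml"   # source edited after the build

# Only the basis file is listed as a prerequisite.
if needs_rebuild "$dir/hello" "$dir/hello.mlb"; then
  only_mlb=stale
else
  only_mlb=fresh
fi

# The source file is listed as well.
if needs_rebuild "$dir/hello" "$dir/hello.mlb" "$dir/source1.sml"; then
  with_sources=stale
else
  with_sources=fresh
fi

echo "prerequisites = hello.mlb only:      $only_mlb"
echo "prerequisites = hello.mlb + sources: $with_sources"
rm -rf "$dir"
```

The first check comes back fresh even though source1.sml is newer than the binary, which is the "nothing to be done" behavior described above. The second comes back stale only because the source file was actually listed.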

Luckily, make can dynamically include another makefile, which we can generate on the fly from a basis file using mlton & sed:

mlton -stop f hello.mlb | \
  sed -e "1ihello hello.mlb.d:\\\\" -e "s|.*|  & \\\\|" -e "\$s| \\\\||"  > \
  hello.mlb.d; \
  [ -s hello.mlb.d ]

This might look scary, but taken line-by-line, it does the following things:

  1. has mlton read the basis file given, but stop once it's created a list of files it needs in order to compile the requested program
  2. pipes that list to sed, which turns it into a make rule with two targets, hello & hello.mlb.d, that both depend on every file in the list from mlton
  3. saves the output from sed to a new file, hello.mlb.d (replacing any existing file)
  4. checks that previous steps were successful by asserting the file hello.mlb.d exists and is non-empty
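To see what the sed step produces without needing mlton installed, here's a small sketch that feeds it a made-up file list. The list_sources function is only a stand-in for mlton -stop f hello.mlb & its three file names are hypothetical. The sed expressions follow the command above, written for a plain shell, & assume GNU sed's one-line form of the i command.

```shell
# Stand-in for `mlton -stop f hello.mlb`; the file names are hypothetical.
list_sources() {
  printf '%s\n' source1.sml source2.sml source3.sml
}

# 1st expression: insert the rule header before the first input line
# 2nd expression: indent every file name & append a line-continuation backslash
# 3rd expression: strip that continuation from the last line only
dep_rule=$(list_sources |
  sed -e "1ihello hello.mlb.d:\\\\" -e "s|.*|  & \\\\|" -e "\$s| \\\\||")

printf '%s\n' "$dep_rule"
```

The result should be a four-line rule: a header naming both targets, then one indented prerequisite per line, with a trailing backslash on every line but the last.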

With that dependency target file created, it just needs included into the Makefile with a simple include hello.mlb.d. Generalizing these two steps gives an updated Makefile that looks like:

MLTON := mlton

EXE   := ./hello
MLB   := $(EXE).mlb
DEP   := $(MLB).d

all: hello

%.mlb.d: %.mlb
	$(SHELL) -ec '$(MLTON) -stop f $< \
	  | sed -e "1i$(<:.mlb=) $@:\\\\" -e "s|.*|  & \\\\|" -e "\$$s| \\\\||" \
	  > $@; \
	  [ -s $@ ]'

ifneq ($(findstring hello,$(MAKECMDGOALS)),)
        include $(DEP)
endif

hello: $(MLB)
	$(MLTON) $<

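Now running ls | entr -s 'make hello && ./hello' should build the program, execute the built binary, then wait to rebuild (if necessary) & re-execute when any source file is changed. Note that the hello goal is named explicitly: the conditional above only includes the generated dependency file when hello appears in MAKECMDGOALS, which is empty when make is run with no arguments.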

Unfortunately, due to its focus on being a "whole-program compiler", mlton doesn't support incremental compiling (where only the changed parts are recompiled when updating), so this is likely about the best that can be done easily to tighten the feedback loop.

That doesn't mean there isn't more that can be done to improve the developer experience when building w/ SML.

loading dependencies

Generally, I prefer modern languages to older ones like SML for one simple reason above just about any other: package/dependency management. While tools like NPM or PIP are far from perfect, they still make using third-party libraries in a project much, much easier than it is without. With that said, while working on sml-graph, I managed to come up with a system using make & git to help simplify the process quite a bit.

A common solution to dependency management I saw in most SML codebases I interacted with while working on PriML was to simply "vendor" the library needed (or sometimes just some files from it) & ship it with their own source code. This can make pulling updates to third party code more difficult, increases the size of the source code base, & can make managing dependencies brittle. All of these problems can cause maintainers to avoid changing/updating dependencies out of fear of breaking something.

To build sml-graph, I wanted to use SMLUnit & SMLFormat to test & auto-format my code. Luckily both libraries are hosted on GitHub, so downloading them is fairly trivial with git. Each also includes a fairly well-done build system that even provides a method for "installing" the libraries (& binary in the case of SMLFormat) in the user's directory of choice, typically recommended to be some cache directory for mlton in the user's home directory.

To improve upon this & keep the dependencies local to this project, I followed a process that roughly does the following:

  1. download sources w/ git (or other tools if needed) into a local dependency cache folder
  2. build dependencies from source & save output to a local lib folder
  3. refer to those dependencies in appropriate basis files using an MLB path variable
  4. pass path to lib folder to mlton at compile time using the -mlb-path-var compile time option

After doing the process a few times when needing to rebuild things from scratch, I ended up automating it in the project's Makefile.
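As a minimal sketch of steps 3 & 4, assuming a made-up project layout (lib/ for installed dependencies, test/ for a test harness) & a path variable named SMLUNIT_LIB, a basis file & the matching mlton invocation might look roughly like the following. The mlton command is only printed, so the sketch runs without mlton installed. The file names & the src/sources.mlb path inside the dependency are illustrative assumptions.

```shell
# Hypothetical project layout: lib/ holds installed dependencies, test/ the harness.
proj=$(mktemp -d)
mkdir -p "$proj/lib/smlunit" "$proj/test"

# Step 3: the basis file names the dependency through a path variable.
# The quoted heredoc delimiter keeps $(...) literal, so mlton expands it, not the shell.
cat > "$proj/test/sources.mlb" <<'EOF'
$(SML_LIB)/basis/basis.mlb
$(SMLUNIT_LIB)/src/sources.mlb
test.sml
EOF

# Step 4: bind the variable at compile time with -mlb-path-var 'NAME VALUE'.
# Only printed here rather than executed.
mlton_cmd="mlton -mlb-path-var 'SMLUNIT_LIB $proj/lib/smlunit' $proj/test/sources.mlb"
printf '%s\n' "$mlton_cmd"

# Read the line back to confirm the variable reference reached the file unexpanded.
dep_line=$(grep -F 'SMLUNIT_LIB' "$proj/test/sources.mlb")
printf '%s\n' "$dep_line"
rm -rf "$proj"
```

Keeping the absolute path out of the basis file is the point of the design: the same .mlb works on any machine, & only the value passed on the command line changes.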

step 1: download sources

In the case of sml-graph, the only dependencies were hosted on GitHub as repos in the smlsharp organization, so generalizing the download process was fairly easy:

CACHE_DIR       := $(ROOT)/.cache
SMLUNIT_CACHE   := $(CACHE_DIR)/SMLUnit
SMLFORMAT_CACHE := $(CACHE_DIR)/SMLFormat
CACHE           := $(SMLUNIT_CACHE) $(SMLFORMAT_CACHE)

# ...

# download or update dependency source using git
# if repo already cached, just pull latest commits
# otherwise clone the repo from github
$(CACHE): $(CACHE_DIR)/%: | $(CACHE_DIR)
	if [ -d "$@" ]; then \
	  cd $@ && git pull; \
	else \
	  git clone git@github.com:smlsharp/$(@F) $@; \
	fi

# ensure cache directory exists to save dep source repos
$(CACHE_DIR):
	mkdir -p $(CACHE_DIR)

step 2: build sources

Once cloned/updated, the sources may need built/rebuilt. Unfortunately, this step has to be tailored to each dependency, since SML projects have no standard build process & each tends to roll its own. SMLUnit & SMLFormat both use make & helpfully include install targets which place the necessary library & binary files in the directory given. I put that together into a couple of make targets for sml-graph, resembling something like this:

# build & install SMLUnit dependency
$(SMLUNIT_LIB): $(SMLUNIT_CACHE) | $(DEPS_DIR)
	mkdir -p $(SMLUNIT_CACHE)/bin
	cd $(SMLUNIT_CACHE) \
	  && $(MAKE) \
	    -f $(SMLUNIT_CACHE)/Makefile.mlton \
	    PREFIX=$(DEPS_DIR) \
	    install-nodoc

# build & install SMLFormat dependency
$(SMLFORMAT_EXE): $(SMLFORMAT_CACHE) | $(DEPS_DIR)
	cd $(SMLFORMAT_CACHE) \
	  && $(MAKE) \
	    -f $(SMLFORMAT_CACHE)/Makefile.mlton \
	    PREFIX=$(DEPS_DIR) \
	    install-nodoc

$(DEPS_DIR):
	mkdir -p $(DEPS_DIR)

steps 3 & 4: add dependencies in basis files

To enable testing with SMLUnit, the installed library's basis file needs to be included in the test harness' basis file. This can be readily accomplished by adding the line $(SMLUNIT_LIB)/src/sources.mlb to that basis file & telling mlton where to find $(SMLUNIT_LIB) using the -mlb-path-var option when compiling the test harness. At the same time, the test harness binary needs to depend on the SMLUnit dependency having been properly installed & made available. I ended up with a test binary target that looked like this:

TST_DIR := $(ROOT)/test
TST_MLB := $(TST_DIR)/sources.mlb
TST_TGT := $(TGT_PRE)/test
TST_EXE := $(TST_TGT)/test_hello
TST_DEP := $(TST_MLB:.mlb=.mlb.d)

# add path vars to compile time flags
$(TST_EXE): MLTON_FLAGS += -mlb-path-var "SMLUNIT_LIB $(SMLUNIT_LIB_DIR)" -mlb-path-var "SML_DX $(SRC_DIR)"
# test binary target depends on deps targets & generated .mlb.d
$(TST_EXE): $(DEPS) $(TST_DEP) | $(TST_TGT)
	$(MLTON) $(MLTON_FLAGS) -output $(TST_EXE) $(TST_MLB)

which I invoked with a phony test target that depended on the test harness binary & simply executed it:

.PHONY: test
test: $(TST_EXE)
	$(TST_EXE)

Finally, all of that gets wrapped with entr for quick feedback loops that run unit tests with a simple one-liner shell command:

ag -l -g . | entr make test

footnotes