Modernizing dev ex for Standard ML
by Andrew Chang-DeWitt, Tue Jul 01 2025
intro
This summer I had the opportunity to work on implementing some new features in
the compiler for a research language,
PriML (more on that later). The
compiler is implemented in Standard ML, a language I'd never worked with
before. Once I got familiar with the basics, I started running into slowdowns
with building & getting tight feedback loops while working on the codebase. In
particular, it seemed difficult to have make rebuild the project on changes to
source files without first cleaning everything already built. This also meant
that attempts to automatically rebuild &/or run tests/code examples on file changes
were either very slow or just broken.
As I got into creating a dataflow analysis framework for the PriML compiler,
one thing I needed was a decent directed graph library. While there was an
undirected graph library in the vendored grab bag library we had in our code
base, we had no existing directed graph implementation. To solve this, I began
working on my own graph implementation & chose to externalize it as a separate
library for inclusion
in PriML as a dependency later. This presented an opportunity to see if I could
figure out how to streamline the SML build process we were using, with the goals
of improving build time, letting make rebuild on changes in
required source files, & improving the process of including external dependencies
without having to vendor them.
For my own notes & in the interest of documenting this in case someone else
finds it useful, I've written the following about my process of improving the dev
ex for coding in SML with mlton & make.
creating a simple feedback loop
Many software projects use make and an appropriate compiler to simplify their
build commands. Standard ML is no exception. A typical project structure might
look something like this:
./ // prj-root
|- Makefile
|- hello.sml // single source file
|- hello // compiled exe bin
and orchestrate its build with a Makefile like the following:
MLTON := mlton
all: hello
hello: hello.sml
	$(MLTON) hello.sml
With this in place, running make from the root directory will recompile
the binary if ./hello.sml has changed since the last build.
I find I'm most productive when I rebuild/test in a tight feedback loop. To
achieve this, I usually automate rebuilding/testing using entr
with a command like ls | entr make, which will rerun the command make
any time a file listed by ls changes. Now, assuming the builds/tests are
fast, results are recreated right away every time a file changes.
However, it's not all sunshine & roses yet.
getting more complicated: building from multiple source files
When building an SML source with mlton, things get a little more complicated
once the source tree is more than one file. The typical way I saw to handle
discovering sources & making their definitions available to other sources is by
using an ml basis file, which acts as a sort of dependency list. When compiling
with an ml basis file, mlton simply needs to be given the path to the basis
file, then it handles the rest.
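For reference, a basis file is mostly just a list of paths, one per line, elaborated in order. Here's a minimal sketch that writes one from the shell. The names `write_mlb`, `source1.sml` & `source2.sml` are placeholders I made up for illustration; `$(SML_LIB)/basis/basis.mlb` is the path mlton uses for the Standard ML Basis Library.

```shell
#!/bin/sh
# Write a minimal ML basis file for a two-file program to the path in $1.
# The quoted 'EOF' keeps the shell from treating $(SML_LIB) as a command
# substitution; mlton expands that path variable itself.
write_mlb() {
  cat > "$1" <<'EOF'
$(SML_LIB)/basis/basis.mlb
source1.sml
source2.sml
EOF
}
```

Calling `write_mlb hello.mlb` produces a three-line file. Because entries are elaborated top to bottom, `source2.sml` can use definitions from `source1.sml`, but not the other way around.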
For a (still relatively trivial) example of multiple sources referenced by an ml basis file like this:
./ // prj-root
|- Makefile
|- hello.mlb // ml basis file
|- hello // compiled exe bin
|- source1.ml // sources...
|- source2.ml
| ...
|- sourcen.ml
the Makefile might be updated to look like this:
MLTON := mlton
all: hello
hello: hello.mlb
	$(MLTON) hello.mlb
"What's so complicated about that?" you say? Well, if you're following along &
told entr to rerun the hello target above with ls | entr make, you might
notice that any time you change a source file, make reports that there's
nothing to be done. You might also notice that your changes aren't incorporated
if you rerun the executable.
This happens because make only reruns a target if it detects a change in one
of that target's listed dependencies. In this example, that means
the hello target will only get rerun if the basis file ./hello.mlb
changes, not if any of the sources listed within it change.
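To make that rule concrete, here's a rough model of the staleness check in plain shell. This is only a sketch of the idea, not how make is actually implemented; `is_stale` is a name I made up, & it relies on the `-nt` ("newer than") file test, which bash & dash both support.

```shell
#!/bin/sh
# Rough model of make's staleness check: a target gets rebuilt when it is
# missing or when any *listed* prerequisite is newer than it.
# Usage: is_stale <target> <prerequisite>...
is_stale() {
  target=$1
  shift
  [ -e "$target" ] || return 0   # missing target: must build
  for prereq in "$@"; do
    # -nt is true when $prereq was modified more recently than $target
    [ "$prereq" -nt "$target" ] && return 0
  done
  return 1                       # nothing listed is newer: up to date
}
```

With a freshly edited source file, `is_stale hello hello.mlb` still reports "up to date" because the edited file was never listed, while `is_stale hello hello.mlb source1.sml` reports "stale". That second call is the behavior we want from make.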
Luckily, make can dynamically include another makefile, which we can generate
on the fly from a basis file, using sed:
mlton -stop f hello.mlb | \
sed -e "1ihello hello.mlb.d:\\\\" -e "s|.*| & \\\\|" -e "\$s| \\\\||" > \
hello.mlb.d; \
[ -s hello.mlb.d ]
This might look scary, but taken line-by-line, it does the following things:
- has mlton read the basis file given, but stop once it's created a list of files it needs in order to compile the requested program
- pipes that list to sed, which then makes a new make target named "hello hello.mlb.d" that depends on that list of files from mlton
- saves the stream from sed to a new file hello.mlb.d (or replacing the existing file)
- checks that previous steps were successful by asserting the file hello.mlb.d exists and is non-empty
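To show the shape of the generated file, here's a small sketch that produces the same layout from any file list on stdin, so it can be tried without mlton installed. Note that it swaps the sed one-liner for awk, since the one-line `1i text` form is a GNU sed extension. The function name `gen_deps` is made up for illustration.

```shell
#!/bin/sh
# Turn a list of files (one per line on stdin) into a make rule of the form
#   <target> <depfile>:\
#    file1 \
#    file2
# Usage: gen_deps <target> <depfile>
# Exits non-zero on empty input, mirroring the [ -s ... ] check above.
gen_deps() {
  awk -v tgt="$1" -v dep="$2" '
    { files[NR] = $0 }
    END {
      if (NR == 0) exit 1
      printf "%s %s:\\\n", tgt, dep
      for (i = 1; i <= NR; i++)
        printf " %s%s\n", files[i], (i < NR ? " \\" : "")
    }'
}
```

In a real build the input would come from mlton, as in `mlton -stop f hello.mlb | gen_deps hello hello.mlb.d > hello.mlb.d`. Feeding it `printf 'a.sml\nb.sml\n'` instead gives a three-line rule where every line but the last ends in a backslash.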
With that dependency target file created, it just needs included into the
Makefile with a simple include hello.mlb.d. Generalizing these two steps
gives an updated Makefile that looks like:
MLTON := mlton
EXE := ./hello
MLB := $(EXE).mlb
DEP := $(MLB).d
all: hello
%.mlb.d: %.mlb
	$(SHELL) -ec '$(MLTON) -stop f $< \
	  | sed -e "1i$(<:.mlb=) $@:\\\\" -e "s|.*| & \\\\|" -e "\$$s| \\\\||" \
	  > $@; \
	  [ -s $@ ]'
ifneq ($(findstring hello,$(MAKECMDGOALS)),)
include $(DEP)
endif
hello: $(MLB)
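	$(MLTON) $<
Now running ls | entr -s 'make hello && ./hello' should build the program, execute
the built binary, then wait to rebuild (if necessary) & re-execute when any
source file is changed. (The explicit hello goal matters here: the include above
only fires when hello is among the goals named on the command line, so a bare
make would skip the generated dependency list.)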
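One possible refinement, which is my own addition rather than part of the setup above: since the generated .d file already lists every file the build depends on, it can also drive the file watcher. The sketch below assumes the .d layout described earlier (a header line, then one path per line with a leading space & a trailing backslash on all but the last); `deps_from_d` is a made-up name.

```shell
#!/bin/sh
# Print the prerequisite paths recorded in a generated .d file, one per line.
# Usage: deps_from_d <depfile>
deps_from_d() {
  # 1d        drop the "targets:\" header line
  # s/^ //    strip the leading space
  # s/ \\$//  strip the trailing " \" continuation
  sed -e '1d' -e 's/^ //' -e 's/ \\$//' "$1"
}
```

A watch command along the lines of `{ echo hello.mlb; deps_from_d hello.mlb.d; } | entr -s 'make hello && ./hello'` would then react only to files that actually feed into the build.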
Unfortunately, due to its focus on being a "whole-program compiler", mlton
doesn't support incremental compiling (where only the changed parts are
recompiled when updating), so this is likely about the best that can be done
easily to tighten the feedback loop.
That doesn't mean there isn't more that can be done to improve the developer experience when building w/ SML.
loading dependencies
Generally, I prefer modern languages to older ones like SML for one simple
reason above just about any other: package/dependency management. While tools
like NPM or PIP are far from perfect, they still make using third-party
libraries in a project much, much easier than it is without. With that said,
while working on sml-graph, I managed to
come up with a system using make & git to help simplify the process quite a
bit.
A common solution to dependency management I saw in most SML codebases I interacted with while working on PriML was to simply "vendor" the library needed (or sometimes just some files from it) & ship it with their own source code. This can make pulling updates to third party code more difficult, increase the size of the source code base, & make managing dependencies brittle. All of these problems can cause maintainers to avoid changing/updating dependencies out of fear of breaking things.
To build sml-graph, I wanted to use SMLUnit & SMLFormat to test &
auto-format my code. Luckily both libraries are hosted on GitHub, so
downloading them is fairly trivial with git. Each also includes a fairly
well-done build system that even provides a method for "installing" the
libraries (& binary in the case of SMLFormat) in the user's directory of
choice, typically recommended to be some cache directory for mlton in the
user's home directory.
To improve upon this & keep the dependencies local to this project, I followed a process that roughly does the following:
- download sources w/ git (or other tools if needed) into a local dependency cache folder
- build dependencies from source & save output to a local lib folder
- refer to those dependencies in appropriate basis files using mlb-path-var
- pass path to lib folder to mlton at compile time using the -mlb-path-var compile time option
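The last two steps of that list are the least familiar, so here's a minimal sketch of how they fit together. The function names, the `test_main.sml` file name & the `MLTON` override variable are assumptions made up for illustration; the `-mlb-path-var` & `-output` options are real mlton flags.

```shell
#!/bin/sh
# Step 3: a basis file that refers to a dependency through a path variable
# instead of a hard-coded location. The quoted 'EOF' stops the shell from
# expanding $(SMLUNIT_LIB); mlton resolves it at compile time.
write_test_mlb() {
  cat > "$1" <<'EOF'
$(SML_LIB)/basis/basis.mlb
$(SMLUNIT_LIB)/src/sources.mlb
test_main.sml
EOF
}

# Step 4: tell mlton what SMLUNIT_LIB means when compiling.
# Usage: build_tests <smlunit-install-dir> <output-exe> <mlb-file>
# The compiler command can be overridden with the MLTON variable.
build_tests() {
  # name & value travel together as ONE argument: "SMLUNIT_LIB <dir>"
  "${MLTON:-mlton}" -mlb-path-var "SMLUNIT_LIB $1" -output "$2" "$3"
}
```

Because the basis file only mentions the variable, the same file works no matter where the library was installed; only the directory passed to `build_tests` changes.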
After doing the process a few times when needing to rebuild things from
scratch, I ended up automating it in the project's Makefile.
step 1: download sources
In the case of sml-graph, the only dependencies were hosted on GitHub as repos
in the smlsharp organization, so generalizing the download process was fairly
easy:
CACHE_DIR := $(ROOT)/.cache
SMLUNIT_CACHE := $(CACHE_DIR)/SMLUnit
SMLFORMAT_CACHE := $(CACHE_DIR)/SMLFormat
CACHE := $(SMLUNIT_CACHE) $(SMLFORMAT_CACHE)
# ...
# download or update dependency source using git
# if repo already cached, just pull latest commits
# otherwise clone the repo from github
$(CACHE): $(CACHE_DIR)/%: | $(CACHE_DIR)
	if [ -d "$@" ]; then \
	  cd $@ && git pull; \
	else \
	  git clone git@github.com:smlsharp/$(@F) $@; \
	fi
# ensure cache directory exists to save dep source repos
$(CACHE_DIR):
	mkdir -p $(CACHE_DIR)
step 2: build sources
Once cloned/updated, the sources may need to be built/rebuilt. Unfortunately, this step
has to be tailored to each dependency, as every SML project
has its own build process & there is no strictly followed standard. For SMLUnit
& SMLFormat, they both use make & helpfully include install targets which
place necessary library & binary files at the directory given. I put that
together into a couple of make targets for sml-graph, resembling something like
this:
# build & install SMLUnit dependency
$(SMLUNIT_LIB): $(SMLUNIT_CACHE) | $(DEPS_DIR)
	if [ ! -d "$(SMLUNIT_CACHE)/bin" ]; then \
	  mkdir $(SMLUNIT_CACHE)/bin; \
	fi
	cd $(SMLUNIT_CACHE) \
	  && $(MAKE) \
	    -f $(SMLUNIT_CACHE)/Makefile.mlton \
	    PREFIX=$(DEPS_DIR) \
	    install-nodoc
# build & install SMLFormat dependency
$(SMLFORMAT_EXE): $(SMLFORMAT_CACHE) | $(DEPS_DIR)
	cd $(SMLFORMAT_CACHE) \
	  && $(MAKE) \
	    -f $(SMLFORMAT_CACHE)/Makefile.mlton \
	    PREFIX=$(DEPS_DIR) \
	    install-nodoc
$(DEPS_DIR):
	mkdir -p $(DEPS_DIR)
steps 3 & 4: add dependencies in basis files
To enable testing with SMLUnit, the installed library's basis file needs to be
included in the test harness' basis file. This can be readily accomplished by
adding the line $(SMLUNIT_LIB)/src/sources.mlb & telling mlton where to
find $(SMLUNIT_LIB) using the -mlb-path-var option when compiling the test
harness. At the same time, the test harness binary needs to depend on the
SMLUnit dependency having been properly installed & made available. In the end,
the test binary target looked like this:
TST_DIR := $(ROOT)/test
TST_MLB := $(TST_DIR)/sources.mlb
TST_TGT := $(TGT_PRE)/test
TST_EXE := $(TST_TGT)/test_hello
TST_DEP := $(TST_MLB:.mlb=.mlb.d)
# add path vars to compile time flags
$(TST_EXE): MLTON_FLAGS += -mlb-path-var "SMLUNIT_LIB $(SMLUNIT_LIB_DIR)" -mlb-path-var "SML_DX $(SRC_DIR)"
# test binary target depends on deps targets & generated .mlb.d
$(TST_EXE): $(DEPS) $(TST_DEP) | $(TST_TGT)
	$(MLTON) $(MLTON_FLAGS) -output $(TST_EXE) $(TST_MLB)
which I invoked with a phony test target that depended on the test harness
binary & simply executed it:
.PHONY: test
test: $(TST_EXE)
	$(TST_EXE)
Finally, all of that gets wrapped with entr for quick feedback loops that run unit tests with a simple one-liner shell command:
ag -l -g . | entr make test
footnotes
- A trivial example code base implementing everything discussed here can be found on GitHub
- A less trivial example can be found in sml-graph
- Much of the process outlined here was inspired by browsing sources in the smlsharp GitHub repos, in particular their makefile style, seen in
  - SMLUnit
  - SMLFormat
  - SMLDoc