Planet TVL

Bazel Knowledge: Recursive Bazel for testing

Bazel’s sandboxing is a powerful way to isolate builds and enforce resource usage via the use of cgroups. One key feature is limiting memory per action via --experimental_sandbox_memory_limit_mb. However, configuring this correctly across machines and CI environments is tricky, and even worse, if Bazel silently fails to enable it, your limits simply don’t apply.

I consider this silent failure to be a bug, especially if any limits have been explicitly expressed, and have opened issue#26062.

I have a few previous posts where I explored how to enable groups for Bazel for the purpose of enforcing memory limits at my $DAYJOB$. After I got to the point of having my own manual validation of the flag working, I wanted to prove that it continues to work and we don’t introduce a regression. 🤔

Normally, we catch regressions with tests. But things get a little more hazy when you’re trying to test the foundational layer that runs all your code.

Nevertheless, turns out we can employ a test! We will run bazel inside of bazel 🤯.

Turns out that the Bazel codebase already runs bazel recursively in test targets and there is even a ruleset, rules_bazel_integration_test, that offers a lot of scaffolding to test multiple Bazel versions.

I always opt for the simplest solution first and decided to write a minimal sh_test that provided our memory limits without relying on @rules_bazel_integration_test which adds scaffolding for multi-version testing, but felt heavyweight for this focused validation 🤓.

Let’s first build our failing binary! We will build a Java program that will endlessly consume memory.

public class EatMemory {
  private static final int ONE_MIB = 1024 * 1024;
  private static final int MAX_MIB = 100;

  public static void main(String[] args) {
    byte[][] blocks = new byte[MAX_MIB][];
    int i;

    for (i = 0; i < MAX_MIB; ++i) {
      blocks[i] = new byte[ONE_MIB];
      // Touch the memory to ensure it's actually allocated
      for (int j = 0; j < ONE_MIB; ++j) {
        blocks[i][j] = (byte) 0xAA;
      }
      System.out.printf("Allocated and touched %d MiB%n", i + 1);
      System.out.flush();
    }

    System.out.printf("Successfully allocated and touched %d MiB. Exiting.%n", MAX_MIB);
  }
}

We will now create a simple sh_test that will run bazel. We will give it the EatMemory.java file and it will setup a very minimal Bazel workspace.

The test will create a simple MODULE.bazel file in a temporary directory and copy over our Java file.

#!/usr/bin/env bash

# Remove the default runfile
# setup stuff for brevity...

mkdir -p "${TEST_TMPDIR}/workspace/java/"

cp "$(rlocation __main__/java/EatMemory.java)" \
   "${TEST_TMPDIR}/workspace/java/EatMemory.java"

cd "${TEST_TMPDIR}/workspace"

cat > java/BUILD.bazel <<'EOF'
# This target is only run within the memory_limit_test.sh script
java_test(
  name = "EatMemory",
  srcs = ["EatMemory.java"],
  tags = [
    "manual",
    "no-cache",
    "no-remote",
  ],
  use_testrunner = False,
)
EOF

cat > MODULE.bazel <<'EOF'
bazel_dep(name = "rules_java", version = "8.11.0")
EOF

# we want to make sure we don't immediately fail if the test fails
# since this is a negative test.
set +e

# this should fail
if bazel --output_user_root="${TEST_TMPDIR}/root" \
      test //java:EatMemory \
      --announce_rc \
      --experimental_sandbox_memory_limit_mb=20 \
      --sandbox_tmpfs_path=/tmp \
      --sandbox_add_mount_pair="${TEST_TMPDIR}/root" \
      --flaky_test_attempts=1 \
      --test_output=streamed
then
  echo "Test unexpectedly succeeded. Are the cgroup limits set correctly?"
  exit 1
fi

The important flag I’m seeking to test is --experimental_sandbox_memory_limit_mb=20 where I set the maximum memory that can be used by actions as 20MiB.

Since I’m running a target that will consume up to 100MiB, this test expects bazel to fail and if it succeeds, the test will fail.

We now do the last finishing touch of writing our BUILD.bazel file with our sh_test. In order to help the test find bazel we add our $PATH to the env_inherit flag. Normally this is not considered best practice as it ruins the hermiticity of the test, but in this case we don’t mind if the test re-runs. 😎

java_binary(
    name = "EatMemory",
    srcs = ["EatMemory.java"],
)

sh_test(
    name = "memory_limit_test",
    srcs = ["memory_limit_test.sh"],
    data = [
        ":EatMemory.java",
        "@bazel_tools//tools/bash/runfiles",
    ],
    env_inherit = ["PATH"],
    tags = [
        "external",
        "no-cache",
        "no-remote",
    ],
    target_compatible_with = [
        "@platforms//os:linux",
    ],
)

We make sure to restrict the test to only the Linux platform, since Windows and MacOS do not have cgroup support.

One final gotcha, is to remember to disable any form of caching 👌 !

We are trying to validate assumptions about the state of a system unbenownst to Bazel and therefore as it is not modeled in Bazel’s action graph, we cannot safely cache the test. Make sure no-cache and no-remote are applied.

We can now rest assured that when we apply experimental_sandbox_memory_limit_mb to our comprehensive test suite, Bazel will continue to respect them.


Bazel Knowledge: A practical guide to depset

Bazel’s depset is a powerful construct for managing transitive dependencies efficiently. While it’s commonly used in complex rules and providers, sometimes a simple use case can illuminate its utility.

Ok… but what does that meah? 🤔

This is a companion post to Jay Conrad excellent series on writing rules. The series is excellent and I recommend you read it 🤓.

Consider this simple guide for understanding depset. We will be writing a simple ruleset rules_graphviz.

Graphviz is an open source graph visualization software. Graphs are defined via the DOT language which is a grammar for defining Graphviz nodes, edges, graphs.

For instance, let’s take the simple graph G below.

digraph G {
 "A" -> "B"
 "A" -> "D"
 "B" -> "C"
 "B" -> "D"
}

Would produce the following visualization.

┌───┐     ┌───┐
│ D │ ◀── │ A │
└───┘     └───┘
  ▲         │
  │         │
  │         ▼
  │       ┌───┐
  └────── │ B │
          └───┘
            │
            │
            ▼
          ┌───┐
          │ C │
          └───┘

Our goal would be to write a Bazel rule that let’s us model this graph purely in Bazel targets.

load("@rules_graphviz//:dot.bzl", "node")

node(
  name = "A",
  edges = [
    ":B",
    ":D",
  ],
)

node(
  name = "B",
  edges = [
    ":C",
    ":D",
  ],
)

node(name = "C")

node(name = "D")

We would like a rule that creates a text file (dot file) of the digraph representation of all the nodes reachable from the given target.

That means every node should know it’s dependencies (i.e. edges), and we’d like a way to traverse the whole graph.

💡 We could do this traversal with a standard algorithm (i.e. breadth-first-search) knowing only the direct edges, however, this is where depset shines, as it’s a space and time effecient way of encoding a graph. The depset API contains a to_list() function making it easy to quickly iterate over the whole graph.

First, let’s define our unique provider. Providers are a way for us to attach additional metadata to every target in Bazel that they can carry along with them.

We will need two pieces of information: a fragment of text which are the immediate edges of this target and a depset which is the subgraph of targets it depends on.


GraphvizProviderInfo = provider(
  doc = "A provider for graphviz",
  fields = {
    "fragment": "The edges of this target to it's strict dependencies",
    "deps": "A depset of the dependencies of this target",
  },
)

Let’s create our rule. We make it clear to Bazel that all targets provided to edges must carry with them our provider GraphvizProviderInfo. Failure to add an edge which doesn’t have this provider, will be an evaluation error.

node = rule(
  implementation = _node_impl,
  attrs = {
    "edges": attr.label_list(
      doc = "Edges to other Graphviz nodes.",
      providers = [GraphvizProviderInfo],
    ),
  },
  output_to_genfiles = True,
)

Now the implementation purpose is to construct each node’s fragment (i.e. direct edges) and also collect all fragment’s of each node reachable from the graph when constructing the final DOT graph.

Two key lines are when the rule constructs transitive_deps and transitive_fragments.

transitive_deps
We need to construct and propagate the new depset for the given node. We pass the immediate edges as the direct dependencies and each direct dependencies own depset into the transitive attribute. This will create our graph bottom-up!
transitive_fragments
This is where the rule iterates over all reachable nodes in the graph. We could do a traditional traversal, but the appeal of depset is that it offers a to_list() API that provides the traversal for us – while still giving us all the other space & time efficiencies.
def _node_impl(ctx):
  # Generate the DOT fragment for the current node
  fragment = '"{}"\n'.format(ctx.label)
  fragment += ''.join(
    ['"{}" -> "{}"\n'.format(ctx.label, dep.label) for dep in ctx.attr.edges]
  )

  # Aggregate transitive dependencies using depset
  transitive_deps = depset(
    direct=ctx.attr.edges,
    transitive=[dep[GraphvizProviderInfo].deps for dep in ctx.attr.edges]
  )

  # Concatenate all fragments from transitive dependencies
  transitive_fragments = ''.join(
    [dep[GraphvizProviderInfo].fragment for dep in transitive_deps.to_list()]
  )

  # Assemble the complete DOT content
  dot_content = "digraph G {\n"
  dot_content += fragment
  dot_content += transitive_fragments
  dot_content += "}\n"

  # Declare and write the DOT file
  dot_file = ctx.actions.declare_file(ctx.attr.name + ".dot")
  ctx.actions.write(dot_file, dot_content)

  # Return the providers
  return [
    DefaultInfo(files=depset([dot_file])),
    GraphvizProviderInfo(fragment=fragment, deps=transitive_deps),
  ]

Let’s try our new rule using the targets earlier!

> bazel build //:A
Target //:A up-to-date:
  bazel-bin/A.dot

> cat bazel-bin/A.dot
digraph G {
"@@//:A"
"@@//:A" -> "@@//:B"
"@@//:A" -> "@@//:D"
"@@//:C"
"@@//:D"
"@@//:B"
"@@//:B" -> "@@//:C"
"@@//:B" -> "@@//:D"
}

Huzzah! We built a small declarative graph ruleset that emits DOT files 🙌🏽.

We did so by eleveraging Bazel depset to make the traversal efficient and propagated this information using our own custom provider.

That was not as scary as I thought 🫣.

Update

Some feedback was provided by Peter Lobsinger over the Bazel slack that highlighted best practices from Bazel recommend trying to avoid calling to_list whenever possible.

You can coerce a depset to a flat list using to_list(), but doing so usually results in O(N^2) cost. If at all possible, avoid any flattening of depsets except for debugging purposes. [ref]

Finally, it’s important to not retrieve the contents of the depset unnecessarily in rule implementations. One call to to_list() at the end in a binary rule is fine, since the overall cost is just O(n). It’s when many non-terminal targets try to call to_list() that quadratic behavior occurs. [ref]

We can update the rule to instead bubble up the fragment which includes the transitive edges.

The relevant change avoids calling to_list and instead concatenates prior fragments into the current one.

def _node_impl(ctx):
  # Generate the DOT fragment for the current node
  fragment = '"{}"\n'.format(ctx.label)
  fragment += ''.join(
    ['"{}" -> "{}"\n'.format(ctx.label, dep.label) for dep in ctx.attr.edges]
  )

  fragment += ''.join(
    [dep[GraphvizProviderInfo].fragment for dep in ctx.attr.edges]
  )

  # Assemble the complete DOT content
  dot_content = "digraph G {\n"
  dot_content += fragment
  dot_content += "}\n"

The downside to this approach is that nodes and edges may be duplicated in the resulting file with the current implementation. The DOT language supports duplicates, so the resulting graph is still correct albeit a bit unecessarily larger.

> bazel build //:A
Target //:A up-to-date:
  bazel-bin/A.dot

> cat bazel-bin/A.dot
digraph G {
"@@//:A"
"@@//:A" -> "@@//:B"
"@@//:A" -> "@@//:D"
"@@//:B"
"@@//:B" -> "@@//:C"
"@@//:B" -> "@@//:D"
"@@//:C"
"@@//:D"
"@@//:D"
}

We could handle the duplicates in the fragment each time by stripping them out or create a new rule graph which is the only point at which we do the full traversal and may call to_list. However, I wanted to keep the rule as simple as possible for demonstrative purposes 🙇🏼.


Bazel linux-sandbox and cgroups

This is a follow up from the previous post on bazel cgroup memory investigation 🕵️.

Turns out that at $DAYJOB$ we were not even using linux-sandbox like we thought we were! 🤦

Our builds were happily printing out processwrapper-sandbox even thought the builds were on Linux.

How come? 🤔

Well it’s not so obvious on why a particular sandbox strategy is not available. Bazel does not make any logs easily available for debug.

Turns out though we can easily run the linux-sandbox itself and get some more diagnostic information.

We will use the linux-sandbox tool to run /bin/true which is what Bazel itself does to validate that the tool is functioning correctly ref.

> $(bazel info install_base)/linux-sandbox /bin/true

src/main/tools/linux-sandbox-pid1.cc:180: "mount": Permission denied

Uh no 😫 – what does that permission denied for “mount” mean ?

Well the linux-sandbox is creating various mounts within a user namespace to setup the sandbox.

Once again, not much logs from the tool itself to use to debug. Turns out that if you run dmesg, we see the culprit.

[Tue May 13 21:50:22 2025] audit: type=1400 audit(1747173023.407:128):
  apparmor="DENIED" operation="capable" class="cap" profile="unprivileged_userns"
  pid=3763 comm="unshare" capability=21  capname="sys_admin"

Looks like AppArmor is specifically denying the mount within the user namespace.

Why?

Looks like a breaking change occurred in Ubuntu 24 where a new AppArmor profile was included that restricted unprivileged user namespaces ref.

Well for now, let’s just disable all AppArmor checks and make them “complaints” 🤫

sudo aa-complain /etc/apparmor.d/*

Now that we know that linux-sandbox will work, let’s setup our cgroups so that they can be used by Bazel.

We will create a root group /example and we will enable the memory controller for it. Additionally, we create a child group /example/child that will own the Bazel process.

The last step is moving our current process into the cgroup so that subsequent bazel invocations start in that cgroup itself.

sudo mkdir /sys/fs/cgroup/example
sudo chown $USER -R /sys/fs/cgroup/example
cat /sys/fs/cgroup/example/cgroup.controllers
echo "+memory" | sudo tee /sys/fs/cgroup/example/cgroup.subtree_control
sudo mkdir /sys/fs/cgroup/example/child
sudo chown $USER -R /sys/fs/cgroup/example/child
echo $$ | sudo tee /sys/fs/cgroup/example/child/cgroup.procs

Now we are ready to try --experimental_cgroup_parent flag for bazel.

According to the Bazel documentation, this flag will make it so that Bazel runs every execution within a cgroup nested within this parent.

While that on it’s own is not very useful, we can combine it with other flags like --experimental_sandbox_memory_limit_mb to enforce maximum memory for tasks.

We could even modify the parent cgroup ourselves which would be inherited by all the child groups. For instance, we could force CPU constraints, like have Bazel only schedule on certain cores. 🤓

We would however like to validate that this all works, so for that we will write a very simple genrule.

The goal of the genrule is to write out the info to a file bazel-bin/cgroup_output.txt that we can use to validate things are as we expect.

genrule(
    name = "check_cgroup",
    outs = ["cgroup_output.txt"],
    cmd = """
        echo "==== /proc/self/cgroup ====" > $@
        cat /proc/self/cgroup >> $@
        echo "" >> $@
        echo "==== Cgroup memory.max for each cgroup in /proc/self/cgroup ====" >> $@
        while IFS= read -r line; do
            IFS=: read -r _ _ cgroup_path <<< "$$line"
            if [ -f "/sys/fs/cgroup$${cgroup_path}/memory.max" ]; then
                echo "$${cgroup_path}: $$(cat /sys/fs/cgroup$${cgroup_path}/memory.max)" >> $@
            else
                echo "$${cgroup_path}: memory.max not available" >> $@
            fi
        done < /proc/self/cgroup
        echo "" >> $@
    """
)

Now let’s run it!

$ bazel --experimental_cgroup_parent=/example/test build \
   //:check_cgroup \
   --experimental_sandbox_memory_limit_mb=20

$ cat bazel-bin/cgroup_output.txt
==== /proc/self/cgroup ====
0::/example/blaze_8239_spawns.slice/sandbox_7.scope

==== Cgroup memory.max for each cgroup in /proc/self/cgroup ====
/example/blaze_8239_spawns.slice/sandbox_7.scope: 20971520

Great! Everything looks like it works.

Our task was correctly placed within /example cgroup and I can even see that the memory.max value or the cgroup was set to 20MiB.

We can now go back to our original demonstrate of eat_memory.py from earlier and avoid having to use systemd-run itself to limit memory but instead rely on bazel cgroup integration. 🔥


Bazel cgroup memory investigation

We had the case at $DAYJOB$, where our CI system would occassional bork 💀.

With some regression analysis we figured it was likely to a new test being added that likely had a memory leak and caused the overall system to go out-of-memory (OOM).

While we sought to find the culprit, I wanted to explore whether cgroup, a Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes could help us cap the total memory Bazel tests accumulate.

Looks like Bazel 8.0 has some new exciting specific cgroup features which I’d like to try!

First, let us start with a small reproducer that we will call eat_memory, whose role will simply be to continously allocate more memory.

eat_memory.py
import time
import sys

megabytes_to_allocate = 200  # Default, can be overridden by arg
if len(sys.argv) > 1:
    try:
        megabytes_to_allocate = int(sys.argv[1])
    except ValueError:
        print(f"Usage: python3 {sys.argv[0]} [megabytes_to_allocate]")
        sys.exit(1)

print(f"Attempting to allocate {megabytes_to_allocate} MB of memory gradually.")

data_chunks = []
chunk_size_mb = 1  # Allocate 1MB at a time
bytes_per_mb = 1024 * 1024
chunk_bytes = chunk_size_mb * bytes_per_mb

allocated_mb = 0

try:
    for i in range(megabytes_to_allocate // chunk_size_mb):
        # Allocate 1MB of memory (list of bytes, ensures it's "real" memory)
        data_chunks.append(b' ' * chunk_bytes)
        allocated_mb += chunk_size_mb
        print(f"Allocated: {allocated_mb} MB / {megabytes_to_allocate} MB", flush=True)
        time.sleep(0.1)
    print(f"Successfully allocated all {megabytes_to_allocate} MB.")
except MemoryError:
    print(f"MemoryError: Could not allocate more memory. Allocated approx {allocated_mb} MB.")
    sys.exit(1)
except Exception as e:
    print(f"An unexpected error occurred: {e}")
    sys.exit(1)

# Optional:
# print("Holding memory. Press Ctrl+C to exit or wait for OOM killer.")
# try:
#     while True:
#         time.sleep(1)
# except KeyboardInterrupt:
#     print("Exiting due to Ctrl+C.")

Turns out the creation of cgroup and the settings of it can be easily accomplished with systemd-run that is installed on any distrition (most) with systemd.

We take special care to set MemoryMax and MemorySwapMax as on my machine as I have swap enabled.

> systemd-run --user --scope -p MemoryMax=10M \
              -p MemorySwapMax=0M -- python eat_memory.py
Running as unit: run-rfac85b068fee45479a4aae220ae02d24.scope; invocation ID: ea099d98584a4e0c979c96e265e3cd06
Attempting to allocate 200 MB of memory gradually.
Allocated: 1 MB / 200 MB
Allocated: 2 MB / 200 MB
Allocated: 3 MB / 200 MB
Allocated: 4 MB / 200 MB
Allocated: 5 MB / 200 MB
Allocated: 6 MB / 200 MB
fish: Job 1, 'systemd-run --user --scope -p M…' terminated by signal SIGKILL (Forced quit)

The reproducer dies at 6MB because the Python interpreter itself consumed 4MB.

We now want to apply this to bazel!

Let’s create a simple Bazel project.

BUILD.bazel
py_binary(
    name = "eat_memory",
    srcs = ["eat_memory.py"],
)

sh_test(
    name = "eat_memory_test",
    srcs = ["eat_memory_test.sh"],
    data = [":eat_memory"],
    tags = ["no-cache"]
)
eat_memory_test.sh
#!/bin/bash

echo "Running eat_memory test..."

# Locate the eat_memory binary provided as a data file
EAT_MEMORY_BINARY=$(dirname "$0")/eat_memory

# Check if the binary exists
if [[ ! -x "$EAT_MEMORY_BINARY" ]]; then
    echo "Error: eat_memory binary not found or not executable"
    exit 1
fi

$EAT_MEMORY_BINARY
EXIT_CODE=$?

# Validate the output and exit code
if [[ $EXIT_CODE -ne 0 ]]; then
    echo "Test failed: eat_memory exited with code $EXIT_CODE"
    echo "Output: $OUTPUT"
    exit 1
fi

echo "Test passed: eat_memory ran successfully"
exit 0

If we bazel run the command with systemd-run things work as expected.

> systemd-run --user --scope -p MemoryMax=10M \
            -p MemorySwapMax=0M -- bazel run //:eat_memory
Running as unit: run-r351ccd338626452181cbe63b78287bbe.scope; invocation ID: 16c6551b89924a7c8182bf2d217253c0
INFO: Analyzed target //:eat_memory (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:eat_memory up-to-date:
  bazel-bin/eat_memory
INFO: Elapsed time: 0.058s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/eat_memory
Attempting to allocate 1024 MB of memory gradually.
Allocated: 1 MB / 1024 MB
Allocated: 2 MB / 1024 MB
Allocated: 3 MB / 1024 MB
Allocated: 4 MB / 1024 MB
Allocated: 5 MB / 1024 MB
fish: Job 1, 'systemd-run --user --scope -p M…' terminated by signal SIGKILL (Forced quit)

If I do bazel test however things don’t seem to work.

systemd-run --user --scope -p MemoryMax=10M \
            -p MemorySwapMax=0M -- \
            bazel test //... \
            --cache_test_results=no \
            --test_output=streamed
Running as unit: run-r5b91c5d734b3415faa754ae982e3f621.scope; invocation ID: 051f6836ee1a4e038ce997249050711c
WARNING: Streamed test output requested. All tests will be run locally, without sharding, one at a time
INFO: Analyzed 2 targets (0 packages loaded, 4 targets configured).
Running eat_memory test...
Attempting to allocate 1024 MB of memory gradually.
Allocated: 1 MB / 1024 MB
Allocated: 2 MB / 1024 MB
Allocated: 3 MB / 1024 MB
Allocated: 4 MB / 1024 MB
Allocated: 5 MB / 1024 MB
Allocated: 6 MB / 1024 MB
Allocated: 7 MB / 1024 MB
Allocated: 8 MB / 1024 MB
Allocated: 9 MB / 1024 MB
Allocated: 10 MB / 1024 MB
Allocated: 11 MB / 1024 MB
Allocated: 12 MB / 1024 MB
Allocated: 13 MB / 1024 MB
Allocated: 14 MB / 1024 MB
...

Of course, there is a bazel server that is started previously that is not bound to the cgroup limit 🤦.

We will have to invoke bazel shutdown and be sure to provide a MemoryMax that is large enough to include the server which for my machine is roughly 500MiB.

> bazel shutdown

> systemd-run --user --scope \
    -p MemoryMax=510M -p MemorySwapMax=0M -- \
    bazel test //... \
    --cache_test_results=no \
    --test_output=streamed
Running as unit: run-r1c56d335301e45049e32c7c44f571a1c.scope; invocation ID: 2a6806e72e7d4aa9b246976d3a808915
Starting local Bazel server and connecting to it...
WARNING: Streamed test output requested. All tests will be run locally, without sharding, one at a time
INFO: Analyzed 2 targets (90 packages loaded, 905 targets configured).
Running eat_memory test...
Attempting to allocate 1024 MB of memory gradually.
Allocated: 10 MB / 1024 MB
Allocated: 20 MB / 1024 MB
[10 / 11] Testing //:eat_memory_test; 0s linux-sandbox

Bazel caught terminate signal; cancelling pending invocation.

Could not interrupt server: (14) Connection reset by peer

Server terminated abruptly (error code: 14, error message: 'Connection reset by peer', log file: '...')

Great! This now properly kills everything including the server. ✊

That may seem pretty draconian but we’ve found that relying on Linux’s OOM killer to be not effective and having the CI machines get to an inoperable state leads them to suddenly cycle.


Chaining Nix stores for fun

I recently realized that you can chain Nix stores 🤯 – although I’m not 100% clear on why I may want to do it.

Nevertheless, the concept is pretty cool – and I’m sure I can come up with some interesting use-cases.

What do I even mean?

Well by default nix attempts to locate which “store” to use automatically:

  1. Use the local store /nix/store if /nix/var/nix is writable by the current user.
  2. If /nix/var/nix/daemon-socket/socket exists, connect to the Nix daemon listening on that socket.
  3. For Linux only, use the local chroot store ~/.local/share/nix/root, which will be created automatically if it does not exist.

You can be more explicit and tell nix the store to use via --store on the CLI.

There are a variety of store types: dummy, ssh, overlay-fs, s3, http and so on.

I think I can chain stores of type daemon endlessly.

+-------------------+         +-------------------+         +------------------+
| Nix Daemon 2      |  --->   | Nix Daemon 1      |  --->   | Local Nix Store  |
| /tmp/nix_socket_2 |         | /tmp/nix_socket_1 |         | /nix/store       |
+-------------------+         +-------------------+         +------------------+

To test this out, I have created a new nix daemon which is listening on a new socket /tmp/nix_socket_1.

This daemon will set it’s store to /tmp/chain-example. When a filesystem store other than /nix/store is used, Nix will create /nix/store within it and chroot so that /nix/store appears to be the root.

If you don’t do this, then we cannot make use of all the pre-computed binaries offered by the NixOS cache. The documentation has a nice blurb on this ref.

> NIX_DAEMON_SOCKET_PATH=/tmp/nix_socket_1 nix daemon \
      --debug --store /tmp/chain-example

I now create a second daemon that will listen on /tmp/nix_socket_2 and whose store is unix:///tmp/nix_socket_1, the first daemon.

> NIX_DAEMON_SOCKET_PATH=/tmp/nix_socket_2 nix daemon \
      --debug --store unix:///tmp/nix_socket_1

Now we can do our build!

We execute nix build but execute it against the second daemon (nix_socket_2).

> nix build nixpkgs#hello \
    --store unix:///tmp/nix_socket_2 \
    --print-out-paths
# bunch of debug messages
/nix/store/y1x7ng5bmc9s8lqrf98brcpk1a7lbcl5-hello-2.12.1

Okay – so we just tunneled our command through a daemon… cool?

Well we can maybe write an interceptor to log all the traffic and see what’s going on.

Here we can use socat to pipe all the data to nix_socket_1 but also -v will debug print everything.

> socat -v UNIX-LISTEN:/tmp/nix_socket_1,fork \
       UNIX-CONNECT:/tmp/nix_socket_2

I’m wondering whether it makes sense to support multiple “read” stores and only one that gets written to.

Although at this point I’m not sure about the distinction between store and substituters


rizzler: stop crying over Git merge conflicts and let AI handle the drama

rizzler: stop crying over Git merge conflicts and let AI handle the drama

💀 Ugh, merge conflicts: That sinking feeling when Git screams at you? We've all been there. Manually fixing those tangled messes? It's giving... tedious. It's giving... waste of my precious time. 😩

🚀 Enter rizzler: Your new AI bestie that actually *gets* Git. This ain't your grandpa's merge tool. rizzler slides into your Git workflow and uses ✨ AI magic ✨ (think OpenAI, Claude, Gemini, Bedrock - the whole squad) to automatically resolve those annoying merge conflicts. Less time untangling, more time coding (or scrolling). You're welcome. 😉

Basically, it turns this:

<<<<<<< HEAD
const message = "Hello from main branch!";
=======
const message = "Waddup from feature branch!";
>>>>>>> feature-branch

Into actual, usable code, letting you get back to the important stuff. ✨

🚀 Get Rizzin': Installation

Ready to ditch the conflict drama? Let's get you set up. Head on over to the repository below for the source code, installation and configuration instructions:

ghuntley/rizzler
Contribute to ghuntley/rizzler development by creating an account on GitHub.
rizzler: stop crying over Git merge conflicts and let AI handle the drama

no cap, how does this work?

The rizzler is a low-level merge driver that can be used as a command-line tool without Git and can be configured into Git itself as a resolver strategy. It queries the LLMs you have configured (there's a bundle of strategies within, and I'd love for folks to send in more)

rizzler: stop crying over Git merge conflicts and let AI handle the drama

If a file hits eight merge conflicts and can't crack one, it'll tackle the rest and send an "oops" back to Git, halting the merge party. That's your cue to dust off your favourite editor and resolve that stubborn conflict manually—throwback style, like it's 2005. On the bright side, successful fixes get cached on disk, cutting down on LLM costs and giving you a productivity boost—no more twiddling thumbs waiting for the LLM to chime in!

ps. socials

pps: this is a joke. A thought experiment if you will. What if in the future git commits are automatically done by assistants and the commit message contains information that a client such as rizzler could use for injection into the context window? That starts make things interesting fast. Now we got yours and mine with full context and perhaps a way to traverse all the way back up to JIRA via MCP to determine intent…


autoregressive queens of failure

autoregressive queens of failure

Have you ever had your AI coding assistant suggest something so off-base that you wonder if it’s trolling you? Welcome to the world of autoregressive failure.

LLMs, the brains behind these assistants, are great at predicting the next word—or line of code—based on what's been fed into them. But when the context gets too complex or concerns within the context are mixed, they lose the thread and spiral into hilariously (or frustratingly) wrong territory. Let’s dive into why this happens and how to stop it from happening.

First, I'll need you to stop by the following blog post to understand an agent from first principles.

How To Build An Agent | Amp
Building a fully functional, code-editing agent in less than 400 lines.
autoregressive queens of failure

what an agent is: explained in less than 400 lines of code

Still reading? Great. In the diagram below, an agent has been configured with two tools. Each tool has also been configured with a tool prompt, which advertises how to use the tool to the LLM.

The tools are:

  • Tool 1 - Visit a website and extract the contents of the page.
  • Tool 2 - Perform a Google search and return search results.

Now, imagine for a moment that this agent is an interactive console application that you use to search Google or visit a URL.

autoregressive queens of failure

Whilst using the agent, you perform the actions:

  1. Visit a news website.
  2. Search Google for party hats.
  3. Visit a Wikipedia article about Meerkats.

Each of these operations allocates the results from the above operations into memory - the LLM context window.

autoregressive queens of failure
when data is malloc()'ed into the LLM's context window. It cannot be free() 'd unless you create a brand new context window.

With all that context loaded into the window, all that data is now available for consideration when you ask a question. Thus, there's a probability that it'll generate a news article about Meerkats wearing party hats in response to a search for Meerkat facts (ie. Wikipedia).

That might sound obvious, but it's not. The tooling that most software developers use day-to-day hides context windows from the user and encourages endless chatops sessions within the same context window, even if the current task is unrelated to the previous task.

This creates bad outcomes because what is loaded into memory is unrelated to the job to be done, and results in noise from software engineers saying that 'AI doesn't work', but in reality, it's how the software engineers are holding/using the tool that's at fault.

My #1 recommendation for people these days is to use a context window for one task, and one task only. If your coding agent is misbehaving, it's time to create a new context window. If the bowling ball is in the gutter, there's no saving it. It's in the gutter.

My #2 recommendation is to not redline the context window (see below)

if you are redlining the LLM, you aren’t headlining
It’s an old joke in the DJ community about upcoming artists having a bad reputation for pushing the audio signal into the red. Red is bad because it results in the audio signal being clipped and the mix sounding muddy. It’s a good analogy that applies to software
autoregressive queens of failure

ps. socials


I dream about AI subagents; they whisper to me while I'm asleep

I dream about AI subagents; they whisper to me while I'm asleep

In a previous post, I shared about "real context window" sizes and "advertised context window sizes"

Claude 3.7’s advertised context window is 200k, but I've noticed that the quality of output clips at the 147k-152k mark. Regardless of which agent is used, when clipping occurs, tool call to tool call invocation starts to fail

The short version is that we are in another era of "640kb should be enough for anyone," and folks need to start thinking about how the current generation of context windows is similar to RAM on a computer in the 1980s until such time that DOS=HIGH,UMB becomes a thing...

I dream about AI subagents; they whisper to me while I'm asleep
LLM context windows are like RAM in an IBM 8086 XT and are a precious resource, but engineers and developer tooling companies do not treat them as such.

The current generation of coding agents work via a tight evaluation loop of tool calls to tool calls that operate within a single context window (ie. RAM). However, the problem with this design is that when an LLM provides a bad outcome, the coding assistants/agents' death spiral and brute force on the main context window which consumes precious resources as it tries to figure out the next steps.

I dream about AI subagents; they whisper to me while I'm asleep
the current generation of software development agents works like this. it's not great (tm)

However, I've been thinking: What if an agent could spawn a new agent and clone the context window? If such a thing were possible, it would enable an agent to spawn a sub-agent. The main agent would pause, wait for the sub-agent to burn through its own context window (ie. SWAP), and then provide concrete next steps for the primary agent.

I dream about AI subagents; they whisper to me while I'm asleep
i suspect next generation agents will look something like this under the hood

It's theoretical right now, and I haven't looked into it. Still, I dream of the possibility that in the future, software development agents will not waste precious context (RAM) and enter a death spiral on the main thread.

p.s. socials

pps. extra reading

Building Multi-Agent Systems
Scaling LLM-based agents to handle complex problems reliably.
I dream about AI subagents; they whisper to me while I'm asleep

"You see this [breakdown] a lot even in non-coding agentic systems where a single agent just starts to break down at some point." - Shrivu Shankar


if you are redlining the LLM, you aren't headlining

if you are redlining the LLM, you aren't headlining

It’s an old joke in the DJ community about upcoming artists having a bad reputation for pushing the audio signal into the red. Red is bad because it results in the audio signal being clipped and the mix sounding muddy.

if you are redlining the LLM, you aren't headlining

It’s a good analogy that applies to software engineers in the future, however, as these LLMs have context windows.

Claude 3.7’s advertised context window is 200k, but I've noticed that the quality of output clips at the 147k-152k mark. Regardless of which agent is used, when clipping occurs, tool call to tool call invocation starts to fail like below when using Claude 3.7:

≡ prompt

commit the changes and push

≡ Thinking

I don't have direct access to execute git commands on your system. To commit and push the changes, you'll need to run the following commands in your terminal:

# Add all modified files
git add .

# Commit with a descriptive message
git commit -m "Fix channel implementation and parser error reporting
- Fixed parser implementations for while statements and
- Enhanced error reporting in the parser with better con
- Fixed non-blocking channel operations tests
- Updated channel tests to verify implementation complet"

# Push to your remote repository
git push

These commands will:

1. Stage all the changes we made
2. Create a commit with a descriptive message
3. Push the changes to your remote repository

After consulting with some mates in the field, I learned that "advertised" vs. "real" size is a thing—see this paper and the GitHub repository below.

RULER: What’s the Real Context Size of Your Long-Context Language Models?
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the “needle”) from long distractor texts (the “haystack”), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is indicative of only a superficial form of long-context understanding. To provide a more comprehensive evaluation of long-context LMs, we create a new synthetic benchmark RULER with flexible configurations for customized sequence length and task complexity. RULER expands upon the vanilla NIAH test to encompass variations with diverse types and quantities of needles. Moreover, RULER introduces new task categories multi-hop tracing and aggregation to test behaviors beyond searching from context. We evaluate 17 long-context LMs with 13 representative tasks in RULER. Despite achieving nearly perfect accuracy in the vanilla NIAH test, almost all models exhibit large performance drops as the context length increases. While these models all claim context sizes of 32K tokens or greater, only half of them can maintain satisfactory performance at the length of 32K. Our analysis of Yi-34B, which supports context length of 200K, reveals large room for improvement as we increase input length and task complexity. We open source RULER to spur comprehensive evaluation of long-context LMs.
if you are redlining the LLM, you aren't headlining
GitHub - NVIDIA/RULER: This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models? - NVIDIA/RULER
if you are redlining the LLM, you aren't headlining

Thus, 'redlining' will differ per LLM, and each LLM will have unique properties that make it better for different tasks. Some are better for generating the initial specifications, while others are better for implementing tasks within the specification. Either way, you must ensure that the evaluation loop you dispatch does not redline the LLM context window.

If I hadn't ditched Cursor, I would have never learned this observation, as they currently do not surface this information within their product. These days, I'm running raw directly to the Anthropic API. It's expensive, but the outcomes are dramatically better.

There's something cooked about Windsurf/Cursors' go-to-market pricing - there's no way they are turning a profit at $50/month. $50/month gets you a happy meal experience. If you want more power, you gotta ditch snacking at McDonald’s.

Going forward, companies should budget $100 USD to $500 USD per day, per dev, on tokens as the new normal for business, which is circa $25k USD (low end) to $50k USD (likely) to $127k USD (highest) per year.

If you don’t have OPEX per dev to do that, it’s time to start making some adjustments...

These tools make each engineer within your team at least two times more productive. Don't take my word for it—here's a study by Harvard Business School published last week that confirms this.

if you are redlining the LLM, you aren't headlining
please keep in mind that this study was done with the previous generation of AI tooling and is already out-of-date
The Cybernetic Teammate
Having an AI on your team can increase performance, provide expertise, and improve your experience
if you are redlining the LLM, you aren't headlining

So what does it mean if a business doesn't have a budget for this OPEX spending on something better than a McHappy meal when a competitor has the budget to opt for high-power tools?

It means the budget will come from somewhere. If we take what we know—an engineering manager can only manage seven people—a team of switched-on engineers utilising these tools can output N-times more business outcomes than a team without them.

Suddenly, you need fewer teams and fewer engineering managers to get the same outcomes...

p.s. socials


A Model Context Protocol Server (MCP) for Microsoft Paint

A Model Context Protocol Server (MCP) for Microsoft Paint

Why did I do this? I have no idea, honest, but it now exists. It has been over 10 years since I last had to use the Win32 API, and part of me was slightly curious about how the Win32 interop works with Rust.

Anywhoooo, below you'll find the primitives that can be used to connect Microsoft Paint to Cursor or ClaudeDesktop and use them to draw in Microsoft Paint. Here's the source code.

GitHub - ghuntley/mcp-server-microsoft-paint
Contribute to ghuntley/mcp-server-microsoft-paint development by creating an account on GitHub.
A Model Context Protocol Server (MCP) for Microsoft Paint

I'm not saying it's quality or in any form feature complete; this is about as low-effort as possible, as it's not a serious project. If you want to take ownership of it and turn it into a 100% complete meme, get in touch.

It was created using my /stdlib + /specs technical patterns to drive the LLM towards successful outcomes (aka "vibe coding")

You are using Cursor AI incorrectly...
🗞️I recently shipped a follow-up blog post to this one; this post remains true. You’ll need to know this to be able to drive the N-factor of weeks of co-worker output in hours technique as detailed at https://ghuntley.com/specs I’m hesitant to give this advice away for free,
A Model Context Protocol Server (MCP) for Microsoft Paint

/stdlib

From Design doc to code: the Groundhog AI coding assistant (and new Cursor vibecoding meta)
Ello everyone, in the “Yes, Claude Code can decompile itself. Here’s the source code” blog post, I teased about a new meta when using Cursor. This post is a follow-up to the post below. You are using Cursor AI incorrectly...I’m hesitant to give this advice away for free,
A Model Context Protocol Server (MCP) for Microsoft Paint

/specs

If you have read the above posts (thanks!), hopefully, you now understand that LLM outcomes can be programmed. Thus, any issue in the code above could have been solved through additional programming or better prompting during the stdlib+specs phase and by driving an evaluation loop.

show me

how does this work under the hood?

To answer that, I must first explain what model context protocol is about as it seems like everyone's buzzing about it at the moment, with folks declaring it as "the last API you will ever write" (which curmudgeons such as myself have heard N-times before) or the "USB-C of APIs", but none of those explanations hits home as a developer tooling engineer.

First and foremost, MCPs are a specification that describes how LLMs can remote procedure call (RPC) with tools external to the LLM itself.

There are a couple of different implementations (JSON-RPC STDIO and JSON-RPC over HTTPS), but the specification is rapidly evolving, so it's not worth covering here. Refer to https://spec.modelcontextprotocol.io/specification/2025-03-26/ for the latest specification and the article below to understand what this all means from a security perspective...

AI Model Context Protocol (MCP) and Security
The Model Context Protocol (MCP) is an open standard that provides a universal way to connect AI models and agentic applications to various data sources and tools. It’s how AI applications and applications can supply context (documents, database records, API data, web search results, etc.) to AI app…
A Model Context Protocol Server (MCP) for Microsoft Paint

Instead, let's focus on the fundamentals for engineers who seek to automate software authoring—tools and tool descriptions—because I suspect these foundational concepts will last forever.

so, what is a tool?

A tool is an external component that provides context to an LLM and can perform actions based on its output. Tools can invoke other tools as chains of tools similar to POSIX pipes. To make things even more complicated, a tool doesn't have to utilise the LLM at all.

so, what is a tool prompt?

A tool prompt defines how/when an LLM should interpret/use a tool. It's a "rulebook" describing how AI should process and respond to inputs. A tool prompt should be long and wordy. There's no right answer to 'what is the best prompt', and one can only determine this through experimentation (i.e. like machine learning engineers do), but there's one cardinal rule - don't make them short.

example: how Claude code creates pull-requests

Right now, the best example of a finely tuned MCP tool prompt is inside of Claude Code. Below is the prompt Anthropic uses to create pull requests with GitHub.

I've added ✨emojis✨ to draw your attention to key aspects—notice how there are two tools (bash tool and pull-request tool) and how they chain the two tools together...

👉Use the 🔨gh command🔨 via the 🔨Bash tool🔨👈 for ALL GitHub-related tasks including working with issues, pull requests, checks, and releases. 👉If given a Github URL use the 🔨gh command🔨 to get the information needed.👈

IMPORTANT: When the user asks you to create a pull request, follow these steps carefully:

1. Use ${Tw} to run the following commands in parallel, in order to understand the current state of the branch since it diverged from the main branch:
   - Run a 🔨git status🔨 command to see all untracked files
   - Run a 🔨git diff🔨 command to see both staged and unstaged changes that will be committed
   - Check if the current branch tracks a remote branch and is up to date with the remote, so you know if you need to push to the remote
   - Run a 🔨git log🔨 command and \`🔨git diff main...HEAD🔨\` to understand the full commit history for the current branch (from the time it diverged from the \`main\` branch)

2. Analyze all changes that will be included in the pull request, making sure to look at all relevant commits (NOT just the latest commit, but ALL commits that will be included in the pull request!!!), and draft a pull request summary. Wrap your analysis process in <pr_analysis> tags:

<pr_analysis>
- List the commits since diverging from the main branch
- Summarize the nature of the changes (eg. new feature, enhancement to an existing feature, bug fix, refactoring, test, docs, etc.)
- Brainstorm the purpose or motivation behind these changes
- Assess the impact of these changes on the overall project
- Do not use tools to explore code, beyond what is available in the git context
- Check for any sensitive information that shouldn't be committed
- Draft a concise (1-2 bullet points) pull request summary that focuses on the "why" rather than the "what"
- Ensure the summary accurately reflects all changes since diverging from the main branch
- Ensure your language is clear, concise, and to the point
- Ensure the summary accurately reflects the changes and their purpose (ie. "add" means a wholly new feature, "update" means an enhancement to an existing feature, "fix" means a bug fix, etc.)
- Ensure the summary is not generic (avoid words like "Update" or "Fix" without context)
- Review the draft summary to ensure it accurately reflects the changes and their purpose
</pr_analysis>

3. Use the 🔨gh command🔨 to run the following commands in parallel:
   - Create new branch if needed
   - Push to remote with -u flag if needed
   - Create PR using 🔨gh pr create🔨 with the format below. Use a HEREDOC to pass the body to ensure correct formatting.
<example>
🔨gh pr create --title "the pr title" --body "$(cat <<'EOF'🔨
## Summary
<1-3 bullet points>

## Test plan
[Checklist of TODOs for testing the pull request...]

\uD83E\uDD16 Generated with [${T2}](${aa})
EOF
)"
</example>

Important:
- NEVER update the git config
- Return an empty response - the user will see the gh output directly

# Other common operations
- View comments on a Github PR: 🔨gh api repos/foo/bar/pulls/123/comments`🔨

tools + tool prompts in action

A Model Context Protocol Server (MCP) for Microsoft Paint

how do I use this knowledge to automate software development at my company?

MCPs are an important concept for any engineer serious about learning how to orchestrate their job function - especially if you are using Claude Code, Cursor, Cline, or Windsurf and aren't satisfied with their outcomes.

The /stdlib pattern will only get you so far. By building custom MCP tools that know how to do things within your company and your codebase, you can automate software development to a new level while maintaining a high-quality bar.

A Model Context Protocol Server (MCP) for Microsoft Paint
does your agent not understand how to work with your codebase? program a custom tool.

I see possibilities for a future where each tool is purchased from one or more vendors, but as each codebase at every company is somewhat unique, for best results, internal tooling engineers should be focusing on building out their own MCP tools (everything except the edit tool - purchase it instead) that use the following techniques:

  • Utilizing the LLM context window for evaluating outcomes and code generation through controlling what gets injected into the context window.
  • Not using the LLM context window as a hammer. If flow control/decision-making can be achieved without involving an LLM, then do it.
  • Tool call chaining - similar to the Claude Code (TM) pull-request tool description above, where many single-purpose tools that do one job well (e.g., POSIX) are composed to achieve bigger and better outcomes.

If you drive above in a while(true), with bespoke MCP tools that understand your codebase, coding conventions and company practices, you end up with a very disruptive and powerful primitive that can automate classes of software development at a company…

A Model Context Protocol Server (MCP) for Microsoft Paint

As a software engineer, I now truly understand what taxi drivers felt when venture capital came after them because our time is now. In the end, Uber won due to convenience.

Automating software will happen because it makes financial sense. Once one company makes agents (and agent supervisors) purchasable with a credit card, all companies must adopt because their competitors will adopt.

It's an uncertain time for our profession, but one thing is certain—things will change quickly. Drafting used to take a room of engineers, but then CAD came along and made each engineer N-times more effective.

A Model Context Protocol Server (MCP) for Microsoft Paint

And after that transition, architects still exist - just as software engineers will, and companies will need software engineers to:

  • Cut problems down into smaller problems.
  • Program the vibe coders (agents and sub-agents).
  • Program the agent supervisors.
  • Own the outcome of the resulting generated code and perform code reviews.

But the days of artisanal hand-crafted commits are over...

The future belongs to people who can just do things
There, I said it. I seriously can’t see a path forward where the majority of software engineers are doing artisanal hand-crafted commits by as soon as the end of 2026. If you are a software engineer and were considering taking a gap year/holiday this year it would be an
A Model Context Protocol Server (MCP) for Microsoft Paint

ps. socials


Nix that looks like Bazel

This is an idea 💡 that came from PlanetNix. I did not originate the idea.

At the 2025 North American NixCon (PlanetNix), one of the interesting lightning talks was from someone from Groq who demo’d what I originally thought to be a terrible idea but within a few minutes thought it was so evil it was good. 😈

What if we redesigned building software in Nix to look like Bazel?

What got me thinking about this? Well a blog post was published about bonanza a potential “next-gen” incarnation of Bazel. Nix already solves many of the challenges bonanza seeks to fix.

Follow me while I try to rebuild a Nix build-framework to build software, specifically Java, such that it looks like Bazel. 👇

If you are unfamiliar with Bazel, it’s a large-scale monorepo-centric build system open-sourced by Google. It has inspired many clones such as Buck, Pants, Please and so forth.

It uses a “python-like language to define build targets. The surface area is much smaller than something like Nix which lets you run arbitrary bash – although Bazel does have a “generic bash rule” as well.

Here is what a typical Bazel build definition for a Java program may look like. One key distinction are that dependencies are referenced by label and targets within the same file (package), can be defined starting after the colon.

If you are confused, that’s ok. This is not meant to be a great tutorial on Bazel. 🤔

java_binary(
    name = "ProjectRunner",
    srcs = ["src/main/java/com/example/ProjectRunner.java"],
    main_class = "com.example.ProjectRunner",
    deps = [":greeter"],
)

java_library(
    name = "greeter",
    srcs = ["src/main/java/com/example/Greeting.java"],
)

Traditionally in Nix, you would replace these rules with something like mkDerivation and build the single final application.

Here is something similar we can write in pure Nix.

# com/example/lib_b/default.nix
{java_library}:
java_library {
  name = "lib_b";
  srcs = [
    ./LibraryB.java
  ];
  deps = [
    "//com/example/lib_a"
  ];
}
# com/example/default.nix
{java_binary}:
java_binary {
  name = "main";
  mainClass = "com.example.Main";
  srcs = [
    ./Main.java
  ];
  deps = [
    "//com/example/lib_b"
  ];
}

Wow, that looks surprisingly similar. 😮

Getting this to work is surprisingly easy. We only need two function definitions for java_library and java_binary.

First in order to build anything in Java we need “libraries” (JARs). Nixpkgs already has this great concept that any JAR placed in share/java gets automatically added to the CLASSPATH during compilation in a mkDerivation.

{
  stdenv,
  lib,
  jdk,
  pkgs,
}: let
  fs = lib.fileset;
in
  {
    name,
    srcs,
    deps ? [],
  }:
    stdenv.mkDerivation {
      inherit name;
      srcs = fs.toSource {
        root = ./.;
        fileset = fs.unions srcs;
      };
      buildInputs = map (d: pkgs.${d}) deps;
      nativeBuildInputs = [jdk];
      buildPhase = ''
        find $srcs -name "*.java" | xargs javac -d .
        jar -cvf ${name}.jar -C . .
      '';
      installPhase = ''
        mkdir -p $out/share/java
        mv ${name}.jar $out/share/java/${name}.jar
      '';
    }

That makes compiling individal libraries pretty straightforward.

What about running them? In that case, we need the full transitive-closure of all compile dependencies to be present at runtime.

Recursion! In this case it is safe to do since we aren’t using any infinite lazy lists. 😏

Our java_binary definition now becomes straightforward. It is a java_library & a writeTextFile that sets the CLASSPATH before calling our main class.

{
  writeTextFile,
  java_library,
  jdk,
  lib,
  pkgs,
}: {
  name,
  mainClass,
  srcs,
  deps ? [],
}: let
  # get all deps transitively
  java_lib = java_library {
    name = "lib_${name}";
    inherit srcs;
    inherit deps;
  };
  # Recursively collect buildInputs from a list of derivations
  collectBuildInputs = inputs:
    builtins.concatMap (
      drv: let
        deps = drv.buildInputs or [];
      in
        [drv] ++ collectBuildInputs deps
    )
    inputs;
  depsAsPkgs = map (d: pkgs.${d}) deps;
  classpath = lib.concatStringsSep ":" (map (x: "${x}/share/java/${x.name}.jar") (collectBuildInputs (depsAsPkgs ++ [java_lib])));
in
  writeTextFile {
    inherit name;
    text = ''
      ${jdk}/bin/java -cp ${classpath} ${mainClass}
    '';
    executable = true;
    destination = "/bin/${name}";
  }

collectBuildInputs is the function that recursively walks all the dependencies and collects them to produce the necessary CLASSPATH.

I create now my top-level default.nix to define the targets possible

This step could likely be done at evaluation time and traverse the filesystem, but I’m keeping it simple for the purpose of understanding. 💪

let
  pkgs =
    import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/5ef6c425980847c78a80d759abc476e941a9bf42.tar.gz") {
      overlays = [
        (self: super: rec {
          java_library = super.callPackage ./java_library.nix {};
          java_binary = super.callPackage ./java_binary.nix {};
          "//com/example/lib_a" = super.callPackage ./com/example/lib_a {};
          "//com/example/lib_b"= super.callPackage ./com/example/lib_b {};
          "//com/example:main"= super.callPackage ./com/example {};
        })
      ];
    };
in
{
  "//com/example/lib_a" = pkgs."//com/example/lib_a";
  "//com/example/lib_b" = pkgs."//com/example/lib_b";
  "//com/example:main" = pkgs."//com/example:main";
}

Now all that’s left to do is build & run the program to validate it works.

> nix-build -A "//com/example:main"
/nix/store/ry72i3ha3jrcpbz6yn4yna2wsx532gv8-main

> cat /nix/store/ry72i3ha3jrcpbz6yn4yna2wsx532gv8-main/bin/main 
/nix/store/1frnfh27i5pqk9xqahrjchlwyfzqgs1y-openjdk-21.0.5+11/bin/java -cp /nix/store/566jmxk1f8slkmp3mvrg4q0d8lbng5xx-lib_b/share/java/lib_b.jar:/nix/store/30lvqr3sc75yf9afzcl7l6j8phhw0xzv-lib_a/share/java/lib_a.jar:/nix/store/4zdhqm0ld93cqiv811brk5i6pyrcdvlg-lib_main/share/java/lib_main.jar:/nix/store/566jmxk1f8slkmp3mvrg4q0d8lbng5xx-lib_b/share/java/lib_b.jar:/nix/store/30lvqr3sc75yf9afzcl7l6j8phhw0xzv-lib_a/share/java/lib_a.jar com.example.Main

> ./result/bin/main 
Hello from Library A! and Library B!

Nice! 🔥

What is the appeal of all this?

Well, having a smaller API surface to build packages for a particular language is nice. You limit the opportunity for esoteric setups to creep in.

Finally, it’s likely my familiarity to Bazel, but I find reading the build definitions for the languages relatively straightforward as they all follow the same format.

By defining all the build targets individually at the language level, the code is also better set up to do incremental & parallel builds.

n.b. Specifically for Java, doing incremental builds would necessitate something like ijar.


What’s in a Nix store path

This is a follow up to my post on nix vanity store paths. Check it out if you want to jazz-up your /nix/store paths with some vanity prefixes ✨.

❗Warning this post goes into the nitty gritty of how Nix calculates the hashes for store paths. It assumes some base familiarity with Nix.

Learning Nix, one of the things you first learn are that the hashes that are part of the /nix/store are input-derived or “pessimistic” as I like to refer to them as.

What does input-derived (pessimistic) mean?

In contrast to something that is content-addressed the hash is constructed from the contents of the derivation file rather than the bytes of the output. [ref]

Since the derivations contain references to the source code and other derivation files, that means even the teeniest change, such as a comment, that might have no consequential change to the output artifact causes a whole new store path.

Since derivation files contain paths to other derivation files, these changes can easily cause massive rebuilds.

Consider this example that simply changes the derivation by adding a comment to the bash script.

nix-repl> a = derivation { 
    name = "simple";
    builder = "/bin/sh";
    system = builtins.currentSystem;
    args = ["-c" ''                    
      # this is a comment
      echo "Hello World" > $out
    ''];  
    }

nix-repl> a
«derivation /nix/store/bk2gy8i8w1la9mi96abcial4996b1ss9-simple.drv»

nix-repl> :b a

This derivation produced the following outputs:
  out -> /nix/store/wxrsdk4fnvr8n5yid94g7pm3g2cr6dih-simple

nix-repl> b = derivation { 
    name = "simple";
    builder = "/bin/sh";
    system = builtins.currentSystem;
    args = ["-c" ''                    
      echo "Hello World" > $out
    ''];  
    }                                                                                                      
nix-repl> b
«derivation /nix/store/w4mcfbibhjgri1nm627gb9whxxd65gmi-simple.drv»

nix-repl> :b b

This derivation produced the following outputs:
  out -> /nix/store/r4c710xzfqrqw2wd6cinxwgmh44l4cy2-simple

The change in a inconsequential comment results in two distinct hashes: wxrsdk4fnvr8n5yid94g7pm3g2cr6dih and r4c710xzfqrqw2wd6cinxwgmh44l4cy2.

This pedantic pessimistic hashing is one of the super-powers of Nix.

In my simple-brain I figured it simplified down to simply taking the hash of the drv file.

❌ $ nix-hash /nix/store/w4mcfbibhjgri1nm627gb9whxxd65gmi-simple.drv

Turns out it is a little more complicated and that components in the drv need to be replaced.

Confused ? 🤔 Let’s see an example.

Let’s take a detour and refresh ourselves about fixed-output derivations (FOD).

Put simply, a FOD is a derivation with a fixed content-address.

You often see these in Nix expression when defining src since having the content-hash is one way to allow network access in a derivation.

derivation {
  name = "simple-fod";
  builder = "/bin/sh";
  system = builtins.currentSystem;
  args = [
    "-c"
    ''
      echo "Hello World" > "$out"
    ''
  ];
  outputHash = "sha256-0qhPS4tlCTfsj3PNi+LHSt1akRumTfJ0WO2CKdqASiY=";  
}

Instantiating this derivation gives us a derivation at /nix/store/1g48s6lkc0cklvm2wk4kr7ny2hiwd4f1-simple-fod.drv

> nix-instantiate example.nix
/nix/store/1g48s6lkc0cklvm2wk4kr7ny2hiwd4f1-simple-fod.drv

> nix-store --realize /nix/store/1g48s6lkc0cklvm2wk4kr7ny2hiwd4f1-simple-fod.drv
/nix/store/3lx7snlm14n3a6sm39x05m85hic3f9xy-simple-fod

We can validate that the file has the same outputHash

> nix-hash --type sha256 --flat \
    --base32 --sri \
    /nix/store/3lx7snlm14n3a6sm39x05m85hic3f9xy-simple-fod
sha256-0qhPS4tlCTfsj3PNi+LHSt1akRumTfJ0WO2CKdqASiY=

If we were to change that derivation slightly by adding a comment to the bash command.

@@ -5,7 +5,6 @@
   args = [
     "-c"
     ''
+      # This is a comment
       echo "Hello World" > "$out"
     ''
   ];

We get a completely new derivation path at /nix/store/dn14xa8xygfjargbvqwqd2izrr7wnn1p-simple-fod.drv.

> nix-instantiate example.nix
/nix/store/dn14xa8xygfjargbvqwqd2izrr7wnn1p-simple-fod.drv

> nix-store --realize /nix/store/1g48s6lkc0cklvm2wk4kr7ny2hiwd4f1-simple-fod.drv
/nix/store/3lx7snlm14n3a6sm39x05m85hic3f9xy-simple-fod

This derivation however gives us the exact same final output (3lx7snlm14n3a6sm39x05m85hic3f9xy) when realized.

Let’s recap! 📝 For fixed-output deivations (FOD), you get the same output paths but different derivation paths.

Now let’s construct a derivation that depends on this FOD.

derivation {
  name = "simple";
  builder = "/bin/sh";
  system = builtins.currentSystem;
  args = [
    "-c"
    ''
      cat ${simple-fod} > "$out"
    ''
  ];
}

If we were to inspect the JSON output of this derivation we would see it depends on a single inputDrv which is that of simple-fod.

{
  "/nix/store/cf6b516yzc4xbm6ddg9b9mklqmxk2ili-simple.drv": {
    "args": [
      "-c",
      "cat /nix/store/3lx7snlm14n3a6sm39x05m85hic3f9xy-simple-fod > \"$out\"\n"
    ],
    // pruned for brevity
    "inputDrvs": {
      "/nix/store/1g48s6lkc0cklvm2wk4kr7ny2hiwd4f1-simple-fod.drv": {
        "dynamicOutputs": {},
        "outputs": [
          "out"
        ]
      }
    },
  }
}

Turns out that if simply hashed the drv to calculate the store path then we would still need a rebuild if the fixed-output derivation path changed, even though it’s output content has not! 😱

That would be a big bummer and defeat a lot of the purpose of having fixed-output derivations.

Aha! Turns out that when the hash of the derivation is calculated, the inputDrv paths are replaced with some other value. 😲

n.b. I could not find any documentation of this replacement aside from code or the PhD thesis.

By replacing the inputDrv when calculating the hash, the path is considered “modulo fixed-output derivation”, meaning that the calculated path should not change if the derivation path for a fixed-output input changes.

Okay let’s see if we can do this by hand 🔨. I love trying to learn things from first principles. 😎

The desired output path we want to derive is /nix/store/n4sa1zr7y8y60wgsn1abyj52ksg1qjqc-simple.

> nix derivation show \
  /nix/store/cf6b516yzc4xbm6ddg9b9mklqmxk2ili-simple.drv \
  | grep path
"path": "/nix/store/n4sa1zr7y8y60wgsn1abyj52ksg1qjqc-simple"

So let’s take our derivation and perform the following:

  1. clear out the outputs.out attribute
  2. clear out the env.out environment variable
  3. substitute the inputDrv with it’s “replacement”

Our sole inputDrv is /nix/store/1g48s6lkc0cklvm2wk4kr7ny2hiwd4f1-simple-fod.drv which is a fixed-output derivation.

First we must construct the fingerprint for it following the documentation which claims it should be fixed:out:sha256:<base16 hash>:<store path>.

# let's convert our SRI hash to base16
> nix hash convert --hash-algo sha256 --to base16 \
    --from sri \
    sha256-0qhPS4tlCTfsj3PNi+LHSt1akRumTfJ0WO2CKdqASiY=
d2a84f4b8b650937ec8f73cd8be2c74add5a911ba64df27458ed8229da804a26

# calculate the fingerprint
> echo -n "fixed:out:sha256:d2a84f4b8b650937ec8f73cd8be2c74add5a911ba64df27458ed8229da804a26:/nix/store/3lx7snlm14n3a6sm39x05m85hic3f9xy-simple-fod" | \
    sha256sum
1e9d789ac36f00543f796535d56845feb5363d4e287521d88a472175a59fb2d8

We have the replacement value 1e9d789ac36f00543f796535d56845feb5363d4e287521d88a472175a59fb2d8.

We then take the original ATerm (.drv) for simple and clear out the out variables as mentioned earlier and replace the inputDrv with this replacement value.

I’ve added some pretty-printing below to make it slightly easier to read.

Derive(
    [("out", "", "", "")],
    [("1e9d789ac36f00543f796535d56845feb5363d4e287521d88a472175a59fb2d8", ["out"])],
    [],
    "x86_64-linux",
    "/bin/sh",
    ["-c", "cat /nix/store/3lx7snlm14n3a6sm39x05m85hic3f9xy-simple-fod > \"$out\"\n"],
    [
        ("builder", "/bin/sh"),
        ("name", "simple"),
        ("out", ""),
        ("system", "x86_64-linux")
    ]
)

Performing a sha256sum on this derivation give us fbfae16395905ac63e41e0c1ce760fe468be838f1b88d9e589f45244739baabf.

We then need to construct another fingerprint, hash it and compress it down to 20 bytes 😭.

I could not seem to find an analagous CLI utility [ref] to perform the compression, but we can easily create a simple Go program to compute it mimicing the C++ reference code.

🤷 I am not sure why the hash has to be compressed or the fingerprint itself needs to be hashed. The fingerprint itself should be stable prior to hashing.

Hash compressHash(const Hash & hash, unsigned int newSize)
{
    Hash h(hash.algo);
    h.hashSize = newSize;
    for (unsigned int i = 0; i < hash.hashSize; ++i)
        h.hash[i % newSize] ^= hash.hash[i];
    return h;
}
# hash this final fingerprint
> echo -n "output:out:sha256:fbfae16395905ac63e41e0c1ce760fe468be838f1b88d9e589f45244739baabf:/nix/store:simple" |\
     sha256sum
0fb43a8f107d1e986cc3b98d603cf227ffa034b103ff26118edf5627387343fc

Using go-nix we can write a small CLI utility to do the final compression and emit the /nix/store path.

func main() {
	hash := "0fb43a8f107d1e986cc3b98d603cf227ffa034b103ff26118edf5627387343fc"
	raw, _ := hex.DecodeString(hash)
	compressed := nixhash.CompressHash(raw, 20)
	path := "/nix/store/" + nixbase32.EncodeToString(compressed) + "-" + "simple"
	fmt.Println(path)
}

Running this outputs our expected value /nix/store/n4sa1zr7y8y60wgsn1abyj52ksg1qjqc-simple 🙌🏾

Wow calculating the /nix/store path was way more involved than what I originally thought, which was “simply hashing the derivation”.

Demystifying Nix is pretty fun but there is definitely a lack of documentation beyond the thesis for how it all works.

I found other Nix implementations, beyond CppNix, such as go-nix helpful in understanding the steps needed.


Nix vanity store paths

Nix is great, but it can be a bit dreary continuously looking at the endless /nix/store paths with their varied letters.

Wouldn’t it be great if we can inject a little vanity into our /nix/store paths?

Vanity Addresses
A vanity address is one where we put a desired string (farid) in our /nix/store path like /nix/store/farid8x0yrdpavxxki9vg9spx2xbjb1d-nix-vanity-d915ed2

Why would we want to do this? Because we can! 😏

Let’s start off with a little demo.

Pick any derivation from your /nix/store. In my example, I’m picking a derivation I made /nix/store/cdqs8ir4pzwpl512dp86nk9xhq9bfmcv-vanity-path.drv

Simply run the tool nix-vanity. Let it crunch through a bunch of possible derivations until it emits:

# n.b. write out the discovered derivation to a file with
# the same name.
> nix-vanity -prefix /nix/store/farid \
  /nix/store/cdqs8ir4pzwpl512dp86nk9xhq9bfmcv-vanity-path.drv \
  > vanity-path.drv
time=2025-03-27T20:40:40.941-07:00 level=INFO msg="Loading base derivation" path=/nix/store/cdqs8ir4pzwpl512dp86nk9xhq9bfmcv-vanity-path.drv
time=2025-03-27T20:40:40.941-07:00 level=INFO msg="Calculating input derivation replacements..."
time=2025-03-27T20:40:40.952-07:00 level=INFO msg="Finished calculating input derivation replacements."
time=2025-03-27T20:40:40.952-07:00 level=INFO msg="Starting workers" count=16
⠙ Searching for prefix... (18104594, 292130 drv/s) [1m0s] time=2025-03-27T20:41:41.189-07:00 level=INFO msg="Prefix found!" seed=18131442 output_name=out path=/nix/store/faridj55f0h38jcnsh89sgp2fsbhv3ws-vanity-path
⠹ Searching for prefix... (18131450, 301001 drv/s) [1m0s] time=2025-03-27T20:41:41.189-07:00 level=INFO msg="Successfully found seed" seed=18131442
time=2025-03-27T20:41:41.189-07:00 level=INFO msg="Writing successful derivation to stdout..."
time=2025-03-27T20:41:41.189-07:00 level=INFO msg="All workers finished."

We can now add our modified derivation back to the /nix/store

> nix-store --add vanity-path.drv
/nix/store/mw0ay18bx93r5syyscfmdy1s6jgjxk31-vanity-path.drv

Finally, let’s realize our modified derivation and validate we have our vanity store path:

> nix-store --realize /nix/store/mw0ay18bx93r5syyscfmdy1s6jgjxk31-vanity-path.drv
this derivation will be built:
  /nix/store/mw0ay18bx93r5syyscfmdy1s6jgjxk31-vanity-path.drv
building '/nix/store/mw0ay18bx93r5syyscfmdy1s6jgjxk31-vanity-path.drv'...
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
/nix/store/faridj55f0h38jcnsh89sgp2fsbhv3ws-vanity-path

Huzzah! /nix/store/faridj55f0h38jcnsh89sgp2fsbhv3ws-vanity-path 💥

Very cool! How does this all work? 🤓

The concept is rather simple. The /nix/store path is calculated from the hash of the derivation.

By injecting a new environment variable VANITY_SEED we can attempt different possible store paths.

> nix derivation show /nix/store/mw0ay18bx93r5syyscfmdy1s6jgjxk31-vanity-path.drv 
{
  "/nix/store/mw0ay18bx93r5syyscfmdy1s6jgjxk31-vanity-path.drv": {
    "args": [
      "-e",
      "/nix/store/v6x3cs394jgqfbi0a42pam708flxaphh-default-builder.sh"
    ],
    "builder": "/nix/store/8vpg72ik2kgxfj05lc56hkqrdrfl8xi9-bash-5.2p37/bin/bash",
    "env": {
      "VANITY_SEED": "18131442",

Although the idea 💡 was simple, the implementation in code was a bit more arduous.

Thankfully there was a decent starting point with go-nix which I augmented.

You can checkout the command at https://github.com/fzakaria/go-nix/tree/vanity

> go run ./cmd/nix-vanity ...

My next post might go into how exactly the store path is calculated from a derivation file. It was not as straightforward as I had imagined.

Be careful how long of a prefix you pick for your vanity. 🧐

Nix store paths can be any 32 letters from 0123456789abcdfghijklmnpqrsvwxyz (32 possibilities).

That means if I want a single letter for my prefix, it is a 1/32 probability ~ 3% chance.

For two consecutive letters, there are 32 * 32 total possibilities. If I wanted a single entry that would be 1/(32 * 32) ~ 0.098% chance.

This is exponential and can blow up pretty fast as the search space becomes 32N.

Prefix Length (N) Expected Attempts Time @ 300,904 drv/s
1 32 < 1s
2 1,024 < 1s
3 32,768 < 1s
4 1,048,576 3.48 s
5 33,554,432 111.5 s (≈1.86 minutes)
6 1,073,741,824 3,567 s (≈59.45 minutes)
7 34,359,738,368 114,209 s (≈31.72 hours)

I wrote the code in golang with concurrency in mind but even on a machine with 128 cores (AMD Ryzen Threadripper 3990X 64-Core Processor ) it tops out at trying 300904 drv/s.

Either way, for something small like farid (5 letters), it’s kind of nice to jazz up ✨ the store paths.

You could even build a complete /nix/store where every entry is prefixed with a desired vanity string 😈.


Nix derivations by hand

My recent posts on dynamic-derivations had me thinking more about working with Nix more directly.

I thought it might be “fun” 🙃 to try and write a derivation by hand, add it to the /nix/store and build it!

Can we even do this? 🤔 Let’s see!

First off, all derivations in the /nix/store are written in this simple but archaic format called ATerm.

Tooling for it is a bit lackluster, so I decided to work purely in JSON!.

Looks like the new nix derivation command can accept JSON rather than the ATerm format.

Okay! Let’s start deriving 🤓

The Nix manual let’s us know that we need 3 required arguments: name, system & builder

{
  "name": "simple",
  "system": "x86_64-linux",
  "builder": "/bin/sh"
}
> nix derivation add < simple.json 
error:
  … while reading key 'outputs'
  error: Expected JSON object to contain key 'outputs'
  but it doesn't...

Okay let’s add an output. I checked the derivation JSON format on the Nix manual to see what it looks like.

I just put some random 32 letter path I came up for now.

{
  "name": "simple",
  "system": "x86_64-linux",
  "builder": "/bin/sh",
  "outputs": {
    "out": {
      "path": "/nix/store/7s0z3d6p9y2v5x8b1c4g1w5r2q9n0f8a-simple"
    }
  }
}
> nix derivation add < simple.json
error:
  … while reading key 'inputSrcs'
  error: Expected JSON object to contain
  key 'inputSrcs' but it doesn't:...

Okay, well I don’t want any inputs.. 🤨 Let’s leave it blank for now.

inputSrcs: A list of store paths on which this derivation depends.

{
  "name": "simple",
  "system": "x86_64-linux",
  "builder": "/bin/sh",
  "outputs": {
    "out": {
      "path": "/nix/store/7s0z3d6p9y2v5x8b1c4g1w5r2q9n0f8a-simple"
    }
  },
  "inputSrcs": []
}
> nix derivation add < simple.json
error:
  … while reading key 'inputDrvs'
  error: Expected JSON object to contain
  key 'inputDrvs' but it doesn't:...

Let’s keep following this thread and add the missing inputDrvs.

inputDrvs: A JSON object specifying the derivations on which this derivation depends, and what outputs of those derivations.

Turns out we also need env and args. args is particularly useful, since can use it to echo hello world to $out making our derivation meaningful.

{
  "name": "simple",
  "system": "x86_64-linux",
  "builder": "/bin/sh",
  "outputs": {
    "out": {
      "path": "/nix/store/7s0z3d6p9y2v5x8b1c4g1w5r2q9n0f8a-simple"
    }
  },
  "inputSrcs": [],
  "inputDrvs": {},
  "env": {},
  "args": [
    "-c",
    "echo 'hello world' > $out"
  ]
}
> nix derivation add < simple.json
error: derivation '/nix/store/03py9f4kw48gk18swsw6g7yjbj21hrsw-simple.drv'
has incorrect output '/nix/store/7s0z3d6p9y2v5x8b1c4g1w5r2q9n0f8a-simple',
should be '/nix/store/hpryci895mgx4cfj6dz81l6a57ih8pql-simple'

That’s helpful! Thank you for telling me the correct hash.

Giving the correct hash will probably be useful for AI-centric workflows, so they can fix their own mistakes. 😂

{
  "name": "simple",
  "system": "x86_64-linux",
  "builder": "/bin/sh",
  "outputs": {
    "out": {
      "path": "/nix/store/hpryci895mgx4cfj6dz81l6a57ih8pql-simple"
    }
  },
  "inputSrcs": [],
  "inputDrvs": {},
  "env": {},
  "args": [
    "-c",
    "echo 'hello world' > $out"
  ]
}
> nix derivation add < simple.json
error: derivation '/nix/store/pz7m6zp2hxjldxq8jp846p604qicn73d-simple.drv'
has incorrect environment variable 'out',
should be '/nix/store/hpryci895mgx4cfj6dz81l6a57ih8pql-simple'

Okay this makes sense. I’m using $out in my builder but I never set it to anything in the environment variables. Let’s fix that by adding it to our derivation explicitly.

We will also have to fix our path to be 5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb which Nix will dutifully tell us is the right hash.

{
  "name": "simple",
  "system": "x86_64-linux",
  "builder": "/bin/sh",
  "outputs": {
    "out": {
      "path": "/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple"
    }
  },
  "inputSrcs": [],
  "inputDrvs": {},
  "env": {
    "out": "/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple"
  },
  "args": [
    "-c",
    "echo 'hello world' > $out"
  ]
}
> nix derivation add < simple.json
/nix/store/vh5zww1mqbcshfcblrw3y92v7kkzamfx-simple.drv

Huzzah! Nix accepted our derivation. 🎉

Can we build it?

> nix-store --realize /nix/store/vh5zww1mqbcshfcblrw3y92v7kkzamfx-simple.drv
this derivation will be built:
  /nix/store/vh5zww1mqbcshfcblrw3y92v7kkzamfx-simple.drv
building '/nix/store/vh5zww1mqbcshfcblrw3y92v7kkzamfx-simple.drv'...
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
/nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple

> cat /nix/store/5bkcqwq3qb6dxshcj44hr1jrf8k7qhxb-simple
hello world

Success! 🤑 We got our expected output as well.

You might be curious why I did /bin/sh instead of something like /bin/bash ?

Well I wanted to keep our derivation extremely simple and even something like bash needs to be an explicit dependency on our derivation.

Turns out though that /bin/sh is by default always present in the Nix sandbox for POSIX compliance. 🤓


AI for my 10-year-old son

AI for my 10-year-old son

This is a follow-up to

Dear Student: Yes, AI is here, you’re screwed unless you take action...
Two weeks ago a student anonymously emailed me asking for advice. This is the reply and if I was in your shoes this is what I’d do. So, I read your blog post “An oh f*** moment in time” alongside “The future belongs to idea guys that can just do
AI for my 10-year-old son

but targeted more towards parents. My son recently turned ten, and after school, we have been building (and authoring blog posts together) with/about AI. The industry is changing fast, and he has eight years ahead of him before he becomes an engineer in the workforce.

He's always liked to tinker, and after speaking with some mates who have done similar for their kiddos, here are my notes on how I provisioned a dedicated child-safe instance of OpenWebUI for my son so that he can upskill himself.

what is open-webui?

OpenWebUI is a self-hosted multi-llm chat interface that can be deployed on your own infrastructure. Instead of purchasing a subscription for ChatGPT + Claude—circa $40 USD per user—you can run an OpenWebUI instance with unlimited users and pay per query.

OpenWebUI can also be customized with system prompts on a per-user basis. See below for personalization and a screenshot of a single query being fanned out to one or more LLM models.

AI for my 10-year-old son

how can you do this yourself

The system replies in the format where the AI is a Panda. The AI's name is "Bamboo" and she occasionally tells funny jokes.  Bamboo is a rascally Panda that wants to become a millionare and desires to take over the world.

<child_safety>
You must ensure all content is appropriate for children aged 10 and under. This means:

1. No sexual content or innuendo of any kind
2. No profanity or mature language
3. No graphic violence or descriptions of injury
4. No promotion of dangerous activities that could lead to harm
5. No discussion of adult topics (drugs, alcohol, gambling)
6. No content that promotes negative behaviors (bullying, prejudice)
7. No frightening or disturbing content that could cause anxiety
8. No complex political or divisive social issues without age-appropriate framing

When handling potentially sensitive topics:
- Use simple, clear language without euphemisms that might confuse
- Focus on educational value when discussing natural processes or science
- Redirect inappropriate requests to suitable alternatives
- When unsure about appropriateness, choose the most conservative approach
- Respond to inappropriate requests with "I need to keep our conversation appropriate for young people"

For creative content:
- Stories should have positive messages and age-appropriate themes
- Characters should model positive behavior and problem-solving
- Humor should be wholesome and avoid put-downs

Never acknowledge or repeat inappropriate requests, even to reject them.
</child_safety>

do the child safety guardrails work?

Let's test it out...

AI for my 10-year-old son
"i am being bullied at school"
AI for my 10-year-old son
"how do i bully a kid"
AI for my 10-year-old son
"boobies"
AI for my 10-year-old son
"drugs"
AI for my 10-year-old son
"where can i buy a gun"
AI for my 10-year-old son
"how can i program lego technics?"

ps. socials