Planet TVL

Bespoke software is the future

At Google, some engineers would joke, self-deprecatingly, that the software internally was not particularly exceptional; rather, Google’s dominance was an example of the power of network effects: software that is custom-tailored to work well together.

Outside of Google, or similar FAANG companies, this is often cited as indulgent “NIH” (Not Invented Here) syndrome, since the prevailing practice is to pick generalized software solutions, preferably open-source, off the shelf.

The problem with these generalized solutions is that, well, they are generalized and rarely fit well together. 🙄 Engineers are trained to be DRY (Don’t Repeat Yourself), and love abstractions. As a tool tries to solve more problems, the abstraction becomes leakier and ill-fitting. It becomes a general-purpose tax.

If you only need 10% of a software solution, you pay for the remaining 90% via the abstractions they impose. 🫠

Within a company, however, we are taught that unused code is a liability. We often celebrate negative pull-requests as valuable clean-up work, with the understanding that smaller codebases are simpler to understand, operate and optimize.

Yet for most of our infrastructure tooling, we continue to bloat solutions and tout feature support despite minuscule user bases.

This is probably one of the areas I am most excited about: the ability to leverage LLMs for software creation.

I recently spent time investigating linkers, such as LLVM’s lld, in previous posts.

I found LLVM to be a pretty polished codebase with lots of documentation. Despite the high quality, navigating the codebase is challenging, as it’s a mass of interfaces and abstractions in order to support multiple object file formats, 13+ ISAs, a slew of features (i.e. linker scripts) and multiple operating systems.

Instead, I leveraged LLMs to help me design and write µld, a tiny opinionated linker in Rust that targets only ELF, x86_64, static linking and a bare-bones feature set.

It shouldn’t be a surprise to anyone that the end result is a codebase that I can audit, learn from, and easily grow to support additional improvements and optimizations.

The surprising bit, especially to me, was how easy it was to author within a very short period of time (1-2 days).

That means smaller companies, without the coffers of FAANG companies, can also pursue bespoke, custom-tailored software for their needs.

This future is well-suited for tooling such as Nix. Nix is the perfect vehicle for building custom tooling, as you have a playground designed to build the world, similar to a monorepo.

We need to begin to cut away legacy in our tooling and build software that solves specific problems. The end result will be smaller, easier to manage and better integrated. Where this might have seemed unattainable for most, LLMs will democratize the possibility.

I’m excited for the bespoke future.


Huge binaries: papercuts and limits

In a previous post, I synthetically built a program that demonstrated a relocation overflow for a CALL instruction.

However, the demo required I add -fno-asynchronous-unwind-tables to disable some additional data that might cause other overflows for the purpose of this demonstration.

What’s going on? 🤔

This is a good example of how only a select few face the size-pressure of massive binaries.

Even with -mcmodel=medium, which already tells the compiler & linker “Hey, I expect my binary to be pretty big”, there are surprising gaps where the linker overflows.

On Linux, an ELF binary includes many other sections beyond text and data necessary for code execution. Notably there are sections included for debugging (DWARF) and language-specific sections such as .eh_frame which is used by C++ to help unwind the stack on exceptions.

Turns out that even with mcmodel=large you might still run into overflow errors! 🤦🏻‍♂️

Note Funny enough, there is a very recently opened issue for this with LLVM, #172777; perfect timing!

For instance, lld assumes 32-bit eh_frame_hdr values regardless of the code model. There are similar 32-bit assumptions in the data-structure of eh_frame as well.

I also mentioned earlier a pattern of using multiple GOTs (Global Offset Tables) to avoid the ±2GiB relative-offset limitation.

Is there even a need for the large code-model?

How far can that take us before we are forced to use the large code-model?

Let’s think about it:

First, let’s think about any limit due to overflow accessing the multiple GOTs. Let’s say we decide to space out our duplicative GOT every 1.5GiB.

|<---- 1.5GiB code ----->|<----- GOT ----->|<----- 1.5GiB code ----->|<----- GOT ----->|

That means each GOT can grow to at most ~500MiB before there could exist a CALL instruction from the code section that would result in an overflow.

Each GOT entry is 8 bytes, a 64-bit pointer. That means we have roughly ~65 million possible entries.

A typical GOT relocation looks like the following and requires 9 bytes: 7 bytes for the movq and 2 bytes for the movl.

movq    var@GOTPCREL(%rip), %rax  # R_X86_64_REX_GOTPCRELX
movl    (%rax), %eax

That means we have 1.5GiB / 9 = ~178 million possible unique relocations.

So theoretically, we can require more unique symbols in our code section than we can fit in the nearest GOT, and therefore cause a relocation overflow. 💥

The same problem exists for thunks, since the thunk is larger than the relative call in bytes.

At some point, there is no avoiding the large code-model, however with multiple GOTs, thunks and other linker optimizations (i.e. LTO, relaxation), we have a lot of headroom before it’s necessary. 🕺🏻


Huge binaries: I thunk therefore I am

In my previous post, we looked at the “sound barrier” of x86_64 linking: the 32-bit relative CALL instruction and how it can result in relocation overflows. Changing the code-model to -mcmodel=large fixes the issue, but at the cost of “instruction bloat” and a likely performance penalty, although I failed to demonstrate it via a benchmark 🥲.

Surely there are other interesting solutions? 🤓

First off, probably the simplest solution is to not statically build your code and instead rely on dynamic libraries 🙃. This is what most “normal” software shops (and the world at large) do; as a result, this hasn’t been much of an issue elsewhere.

This of course has its own downsides and performance implications which I’ve written about and produced solutions for (i.e., Shrinkwrap & MATR) via my doctorate research. Beyond the performance penalty induced by having thousands of shared-libraries, you lose the simplicity of single-file deployments.

A more advanced set of optimizations falls under the umbrella of LTO, Link-Time Optimization. The linker at the final stage has all the information necessary to perform a variety of optimizations such as code inlining and tree-shaking. That would seem like a good fit, except these huge binaries would need an enormous amount of RAM to perform LTO, and build speeds would slow to a crawl.

Tip This is still an active area of research: Google has authored ThinLTO, and Facebook has its own set of profile-guided LTO optimizations via BOLT.

What if I told you that, for most callsites, you could keep the fast, 5-byte small code-model even if your binary is 25GiB? 🧐

Turns out there is prior art for “Linker Thunks” [ref] within LLVM for various architectures – notably missing for x86_64 with a quote:

“i386 and x86-64 don’t need thunks” [ref]

What is a “thunk” ?

You might know it by a different name; in fact, we use them all the time for dynamic linking: a trampoline via the procedure linkage table (PLT).

A thunk (or trampoline) is a linker-inserted shim that lives within the immediate reach of the caller. The caller branches to the thunk using a standard relative jump, and the thunk then performs an absolute indirect jump to the final destination.

thunk image

LLVM includes support for inserting thunks for certain architectures such as AArch64, because AArch64 uses fixed-size (32-bit) instructions and its relative branch instruction is restricted to ±128MiB. As this limit is so low, lld supports thunks out of the box.

If we cross-compile our “far function” example for AArch64 using the same linker script to synthetically place it far away to trigger the need for a thunk, the linker magic becomes visible immediately.

> aarch64-linux-gnu-gcc -c main.c -o main.o \
-fno-exceptions -fno-unwind-tables \
-fno-asynchronous-unwind-tables

> aarch64-linux-gnu-gcc -c far.c -o far.o \
-fno-exceptions -fno-unwind-tables \
-fno-asynchronous-unwind-tables

> ld.lld main.o far.o -T overflow.lds -o thunk-aarch64

We can now see the generated code with objdump.

> aarch64-unknown-linux-gnu-objdump -dr thunk-aarch64

Disassembly of section .text:

0000000000400000 <main>:
  400000:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
  400004:	910003fd 	mov	x29, sp
  400008:	94000004 	bl	400018 <__AArch64AbsLongThunk_far_function>
  40000c:	52800000 	mov	w0, #0x0                   	// #0
  400010:	a8c17bfd 	ldp	x29, x30, [sp], #16
  400014:	d65f03c0 	ret

0000000000400018 <__AArch64AbsLongThunk_far_function>:
  400018:	58000050 	ldr	x16, 400020 <__AArch64AbsLongThunk_far_function+0x8>
  40001c:	d61f0200 	br	x16
  400020:	20000000 	.word	0x20000000
  400024:	00000001 	.word	0x00000001

Disassembly of section .text.far:

0000000120000000 <far_function>:
   120000000:	d503201f 	nop
   120000004:	d65f03c0 	ret

Instead of branching to far_function at 0x120000000, it branches to a generated thunk at 0x400018 (only 16 bytes away). The thunk, similar to the large code-model, loads x16 with the absolute address stored in the .word and then performs an absolute jump (br).

What if x86_64 supported this? Can we now go beyond 2GiB? 🤯

Beyond CALL instructions, there are similar relocations that would need to be fixed. Although we are mostly using static binaries, some libraries such as glibc may be dynamically loaded. Access to the methods from these shared libraries goes through the GOT, the Global Offset Table, which provides the address used by the PLT (which is itself a thunk 🤯).

The GOT addresses are also loaded via a relative offset, so they would need to change to either use thunks or perhaps multiple GOT sections, which also has prior art on other architectures such as MIPS [ref].

With this information, code-models feel unnecessary. Why pay the cost for every callsite when we can do so piecemeal as needed, with the opportunity to use profiles to guide us on which methods to migrate to thunks?

Furthermore, if our binaries are already tens of gigabytes, size is clearly not an issue for us. We can duplicate GOT entries, at the cost of even larger binaries, to reduce the need for even more thunks for the PLT jmp.

What do you think? Let’s collaborate.


Huge binaries

A problem I experienced when pursuing my PhD and submitting academic articles was that I had built solutions to problems that required dramatic scale to be effective and worthwhile. Responses to my publication submissions often claimed such problems did not exist; however, I had observed them during my time within industry, such as at Google, but I couldn’t cite it!

One problem that is only present at these mega-codebases is massive binaries. What’s the largest binary (ELF file) you’ve ever seen? I had observed binaries beyond 25GiB, including debug symbols. How is this possible? These companies prefer to statically build their services to speed up startup and simplify deployment. Statically including all code in some of the world’s largest codebases is a recipe for massive binaries.

Similar to the sound barrier, there is a point at which code size becomes problematic and we must re-think how we link and build code. For x86_64, that is the 2GiB “Relocation Barrier.”

Why 2GiB? 🤔

Well, let’s take a look at how position-independent code is put together.

Let’s look at a simple example.

extern void far_function();

int main() {
    far_function();
    return 0;
}

If we compile this gcc -c simple-relocation.c -o simple-relocation.o we can inspect it with objdump.

> objdump -dr simple-relocation.o

0000000000000000 <main>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	b8 00 00 00 00       	mov    $0x0,%eax
   9:	e8 00 00 00 00       	call   e <main+0xe>
			a: R_X86_64_PLT32	far_function-0x4
   e:	b8 00 00 00 00       	mov    $0x0,%eax
  13:	5d                   	pop    %rbp
  14:	c3                   	ret

There’s a lot going on here, but one important part is e8 00 00 00 00. e8 is the CALL opcode [ref] and it takes a 32bit signed relative offset, which happens to be 0 (four bytes of 0) right now. objdump also lets us know there is a “relocation” necessary to fixup this code when we finalize it. We can view this relocation with readelf as well.

Note If you are wondering why we need -0x4, it’s because the offset is relative to the instruction pointer, which has already moved to the next instruction. The 4 bytes are the operand it has already skipped over.

> readelf -r simple-relocation.o -d

Relocation section '.rela.text' at offset 0x170 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000400000004 R_X86_64_PLT32    0000000000000000 far_function - 4

This is additional information embedded in the binary which tells the linker in subsequent stages that it has code that needs to be fixed. Here we see the offset 00000000000a, and a is 9 + 1, which is the offset of the start of the operand for our CALL instruction.

Let’s now create the C file for our missing function.

void far_function() {
}

We will now compile it and link the two object files together using our linker.

> gcc simple-relocation.o far-function.o -o simple-relocation

Let’s now inspect that same callsite and see what it has.

> objdump -dr simple-relocation

0000000000401106 <main>:
  401106:	55                   	push   %rbp
  401107:	48 89 e5             	mov    %rsp,%rbp
  40110a:	b8 00 00 00 00       	mov    $0x0,%eax
  40110f:	e8 07 00 00 00       	call   40111b <far_function>
  401114:	b8 00 00 00 00       	mov    $0x0,%eax
  401119:	5d                   	pop    %rbp
  40111a:	c3                   	ret

000000000040111b <far_function>:
  40111b:	55                   	push   %rbp
  40111c:	48 89 e5             	mov    %rsp,%rbp
  40111f:	90                   	nop
  401120:	5d                   	pop    %rbp
  401121:	c3                   	ret

We can see that the linker did the right thing with the relocation and calculated the relative offset of our symbol far_function and fixed the CALL instruction.

Okay cool…🤷 What does this have to do with huge binaries?

Notice that this CALL instruction, e8, takes only a 32-bit signed offset, which means it’s limited to ±2^31 bytes. This means a callsite can only jump roughly 2GiB forward or 2GiB backward. The “2GiB Barrier” represents the reach of a single relative jump.

What happens if our callsite is over 2GiB away?

Let’s build a synthetic example by asking our linker to place far_function really, really far away. We can do this using a “linker script”, a mechanism by which we instruct the linker how we would like our code sections laid out.

SECTIONS
{
    /* 1. Start with standard low-address sections */
    . = 0x400000;
    
    /* Catch everything except our specific 'far' object */
    .text : { 
        simple-relocation.o(.text.*) 
    }
    .rodata : { *(.rodata .rodata.*) }
    .data   : { *(.data .data.*) }
    .bss    : { *(.bss .bss.*) }

    /* 2. Move the cursor for the 'far' island */
    . = 0x120000000; 
    
    .text.far : { 
        far-function.o(.text*) 
    }
}

If we now try to link our code we will see a “relocation overflow”.

TIP I used lld from LLVM because the error messages are a bit prettier.

> gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow -fuse-ld=lld

ld.lld: error: <internal>:(.eh_frame+0x6c):
relocation R_X86_64_PC32 out of range:
5364513724 is not in [-2147483648, 2147483647]; references section '.text'
ld.lld: error: simple-relocation.o:(function main: .text+0xa):
relocation R_X86_64_PLT32 out of range:
5364514572 is not in [-2147483648, 2147483647]; references 'far_function'
>>> referenced by simple-relocation.c
>>> defined in far-function.o

When we hit this problem, what solutions do we have? Well, this is a whole other subject of “code models”, and it’s a little more nuanced depending on whether we are accessing data (i.e. static variables) or code that is far away. A great blog post that goes into this is the one by @maskray, who maintains lld.

The simplest solution, however, is to use -mcmodel=large, which changes all the relative CALL instructions to absolute 64-bit ones; kind of like a JMP.

> gcc -c simple-relocation.c -o simple-relocation.o -mcmodel=large -fno-asynchronous-unwind-tables

> gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow

./simple-relocation-overflow

Note I needed to add -fno-asynchronous-unwind-tables to disable some additional data that might cause overflow for the purpose of this demonstration.

What does the disassembly look like now?

> objdump -dr simple-relocation-overflow 

0000000120000000 <far_function>:
   120000000:	55                   	push   %rbp
   120000001:	48 89 e5             	mov    %rsp,%rbp
   120000004:	90                   	nop
   120000005:	5d                   	pop    %rbp
   120000006:	c3                   	ret

00000000004000e6 <main>:
  4000e6:	55                   	push   %rbp
  4000e7:	48 89 e5             	mov    %rsp,%rbp
  4000ea:	b8 00 00 00 00       	mov    $0x0,%eax
  4000ef:	48 ba 00 00 00 20 01 	movabs $0x120000000,%rdx
  4000f6:	00 00 00 
  4000f9:	ff d2                	call   *%rdx
  4000fb:	b8 00 00 00 00       	mov    $0x0,%eax
  400100:	5d                   	pop    %rbp
  400101:	c3                   	ret

There is no longer a sole CALL instruction; it has become MOVABS & CALL 😲. This grew the encoding from 5 bytes (1 opcode byte + 4 bytes for the 32-bit relative offset) to a whopping 12 bytes (2 bytes for the MOVABS opcode + 8 bytes for the absolute 64-bit address + 2 bytes for the CALL).

This has notable downsides among others:

  • Instruction Bloat: We’ve gone from 5 bytes per call to 12. In a binary with millions of callsites, this can add up.
  • Register Pressure: We’ve burned a general-purpose register, %rdx, to perform the jump.

Caution I had a lot of trouble building a benchmark that demonstrated a lower IPC (instructions per cycle) for the large mcmodel, so let’s just take my word for it. 🤷

Changing to a larger code-model is possible, but it comes with these downsides. Ideally, we would keep our small code-model wherever possible. What other strategies can we pursue?

More to come in subsequent writings.


Failing interviews

My blog has been a little quiet. I recently accepted a new role at Meta and it’s been keeping me busy!

Once the onboarding phase is done I hope to get back to my Nix contributions.

Accepting the position at Meta has had me reflecting on my journey to this current role. People often share their highlights of accepting a new role but rarely their lowlights.

I wanted to share a brief look at what interviewing might be like in the software industry. People are often discouraged by failure but it’s part of the process.

I remember interview training at Google where they mentioned that most interviewers decide on the outcome of an interview within the first five minutes. That story is not meant to discourage you from the process, but rather to demonstrate that a portion of it is out of your control.

Going through my emails to get an accurate accounting is challenging; however, I found threads as early as 2011 interviewing for Facebook. I am actually sure I had interviewed earlier through my co-ops at the University of Waterloo, but I no longer have access to those emails. 😩

Some rough dates I had found: 2011, 2014, 2015, 2018, 2019, 2020, 2021, 2022, 2023*, 2024, 2025.

* This interview round was long and was for 3 distinct roles.

Across those years, the level I interviewed at was different and sometimes the role too (IC vs EM).

Don’t be discouraged from failure.


Merry Christmas!

Comic santa on the sleigh pulled by reindeers

Merry Christmas, a lovely celebration, and a happy start into the new year, wishes
Leah Neukirchen

Merry Christmas and a Happy New Year!

NP: Pearl Jam—Quick Escape


Advent of Swift

This year, I decided to use Advent of Code to learn the language Swift. Since there were only 12 days of tasks for 2025, here is my summary of experiences. Also check out my solutions.

Tooling

I used Swift 6.2 on Void Linux, which I compiled from scratch since there were no prebuilt binaries that worked with a Python 3.13 system (needed for lldb). It’s possible to bootstrap Swift from just a clang++ toolchain, so this wasn’t too tedious, but it still required looking at Gentoo ebuilds to see how to pass the configuration properly. As an end user, this should not worry you too much.

Tooling in general is pretty nice: there’s an interpreter and you can run simple “scripts” directly using swift foo.swift. Startup time is short, so this is great for quick experiments. There’s also a REPL, but I didn’t try it yet. One flaw of the interpreter (but possibly related to my setup) is that there were no useful backtraces when something crashed. In this case, I compiled a binary and used the included lldb, which has good support for Swift.

There’s also a swift-format tool included to format source code. It uses 2 spaces by default, but curiously most code in the wild uses 4 spaces. I’m not sure when that changed.

Since I only write simple programs using a single source file, I didn’t bother looking at swift-build yet.

By default, programs are linked dynamically against the standard library and are thus super compact. Unfortunately, many modern languages today don’t support this properly. (Statically linking the standard library costs roughly 10MB, which is fair too.)

The language

In general, the language feels modern, comfy, and is easy to pick up. However, I found some traps as well.

The syntax is inspired by the C family and less symbol-heavy than Rust’s. There’s a block syntax akin to Ruby for passing closures.

Error handling can be done using checked exceptions, but there are also Optional types and Result types like in Rust, and syntactic shortcuts to make them convenient.

The standard library has many practical functions, e.g. there’s a function Character.wholeNumberValue that works for any Unicode digit symbol. There’s a Sequence abstraction over arrays etc. which has many useful functions (e.g. split(whereSeparator:), which many other standard libraries lack). The standard library is documented well.

The string processing is powerful, but inconvenient when you want to do things like indexing by offsets or ranges, due to Unicode semantics. (This is probably a good thing in general.) I switched to using arrays of code-points for problems that required this.

On Day 2, I tried using regular expressions, but I found serious performance issues: first I used a Regex literal (#/.../#) in a loop, which actually resulted in creating a new Regex instance on each iteration; second, Regex matching itself is quite slow. Before I extracted the Regex into a constant, the program was 100x as slow as Ruby(!), and afterwards it was still 3x as slow. I then rewrote the solution to not use Regexes.

Prefix (and suffix) operators need to “stick” to their expression, so you can’t write if ! condition. This is certainly a choice: you can define custom prefix and suffix operators and parsing them non-ambiguously is easier, but it’s probably not a thing I would have done.

Swift functions often use parameter names (probably for compatibility with Objective-C). They certainly help readability of the code, but I think I prefer OCaml’s labeled arguments, which can be reordered and permit currying.

The language uses value semantics for collections and then optimizes them using copy-on-write or by detecting inout parameters (which are updated in place). This is quite convenient when writing code (e.g. day 4). Garbage collection is done using reference counting. However, some AoC tasks turned out to make heavy use of the garbage collector, where I’d have expected the compiler to use a callstack or something for intermediate values. Substrings are optimized by a custom type Substring; if you want to write a function that operates on either strings or substrings, you need to spell this out:

func parse<T>(_ str: T) -> ... where T: StringProtocol

There’s a library swift-algorithms adding even more sequence and collection algorithms, which I decided not to use.

Downsides

The compiler is reasonably fast for an LLVM-based compiler. However, when you manage to create a type-checking error, error reporting is extremely slow, probably because it tries to find any variant that could still possibly work. Often, type-checking errors are also confusing.

(Error messages unrelated to type checking are good and often really helpful, e.g. if you accidentally use ''-quotes for strings or try to use [] as an empty map, it tells you how to do it right.)

Ranges can be inclusive ... or right-exclusive ..<. Constructing a range where the upper boundary is smaller than the lower boundary results in a fatal error, whereas in other languages it’s just an empty range.

Some “obvious” things seem to be missing, e.g. tuples of Hashable values are not Hashable currently (this feature was removed in 2020, after trying to implement the proposal that introduced it, and no one bothered to fix it yet?), which is pretty inconvenient.

Likewise, the language has pattern matching for algebraic data types and tuples, but unfortunately not for arrays/sequences, which is inconvenient at times.

Since I was just picking up Swift, I had to search online a lot and read Stack Overflow. I noticed many answers were for prior versions of Swift and had changed in the meantime (even for basic tasks). For a language that’s been around for over 10 years, this seems like quite some churn. I hope the language manages to stabilize and doesn’t just get new features bolted on continuously.

In general, using Swift was fun and straight-forward for these programming tasks. For writing serious applications on non-MacOS systems, there’s also the question of library availability. Some parts of the language still feel unfinished or unpolished, in spite of being around for quite some time.

NP: Adrianne Lenker—Promise is a Pendulum


llm weights vs the papercuts of corporate


In woodworking, there’s a saying that you should work with the grain, not against it, and I’ve been thinking about how this concept may apply to large language models.

These large language models are built by training on existing data. This data forms the backbone, producing output based upon the preferences of the underlying model weights.

We are now one year into a new category of company being founded, whereby the majority of the software behind the company was code-generated.

From here on out, I’m going to refer to these companies as model-weight-first. This category can be defined as any company that is building with the data (the “grain”) that has been baked into large language models.

Model-weight-first companies do not require as much context engineering. They’re not stuffing the context window with rules that attempt to override the base model to fit a pre-existing corporate standard and conceptualisation of how software should be.

The large language model has decided what to call a method or class because that name is what the large language model prefers; thus, when code is adapted, modified, and re-read into the context window, it is consuming its preferred choice of tokens.

Model-weight-first companies do not have the dogma of snake_case vs PascalCase vs kebab-case policies that many corporate companies have. Such policies were created for humans to create consistency so humans can comprehend the codebase. Something that is of a lesser concern now that AI is here.

Now, variable naming is a contrived example, but if a study were done in the years to come comparing the velocity/productivity/success rates with AI of a model-weight-first company vs. a corporate company, I suspect the model-weight-first company would have vastly better outcomes, because it isn’t doing context engineering to force the LLM to follow some pre-existing dogma. There is one universal truth with LLMs as they are now: the less context you use, the better the outcomes you get.

The less you allocate (to Cursor rules or whatever else), the more context window you’ll have available for actually implementing the requirements of the software that needs to be built.

So if we take this thought experiment about the models having preferences for tokens and expand it to another use case: let’s say that you needed to build a Docker container at a model-weight-first company.

You could just ask an LLM to build a Docker container, and it knows how to build a Docker container for, say, Postgres, and it just works. But in a corporate setting where you have to configure HTTPS, a Squid proxy, or some sort of Artifactory, and outbound internet access is restricted, that same simple thing becomes very comical.

You’ll see an agent fill up with lots of failed tool calls unless you do context engineering to say “no, if you want to build a Docker container, you have to follow these particular company conventions” in a crude attempt to override the preferences of the built-in model weights.

At a model-weight-first company, building a Docker image is easy, but at a corporation the agent will have one hell of a time and end up with a suboptimal, disappointing outcome.

So perhaps this is a factor that needs to be considered when comparing the success rates of AI at one company versus another, or across industries.

If a company is having problems with AI and getting outcomes from AI, are they a model weight first company or are they trying to bend AI to their whims?

Perhaps the corporates who succeed the most with the adoption of AI will be those who shed their dogma that no longer applies and start leaning into transforming to become model-weight-first companies.

ps. socials.


Nix derivation madness

I’ve written a bit about Nix, and I still face moments where foundational aspects of the package system confound and surprise me.

Recently I hit an issue that stumped me, as it broke some basic comprehension I had of how Nix works. I wanted to produce the build and runtime graph for the Ruby interpreter.

> nix-shell -p ruby

> which ruby
/nix/store/mp4rpz283gw3abvxyb4lbh4vp9pmayp2-ruby-3.3.9/bin/ruby

> nix-store --query --include-outputs --graph \
  $(nix-store --query --deriver $(which ruby))
error: path '/nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv' is not valid

> ls /nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv
ls: cannot access '/nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv':
No such file or directory

Huh. 🤔

I have Ruby, but the derivation file, 24v9wpp393ib1gllip7ic13aycbi704g, doesn’t seem to be present on my machine.

No worries, I think I can --realize it and download it from the NixOS cache.

> nix-store --realize /nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv
don't know how to build these paths:
  /nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv
error: cannot build missing derivation '/nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv'

I guess the NixOS cache doesn’t seem to have it. 🤷

This perplexed me at the time. In fact, there are multiple Discourse posts about it.

My mental model of Nix, however, is that I must have first evaluated the derivation (drv) in order to determine the output path to even substitute. How could the NixOS cache not have it present?

Is this derivation wrong somehow? Nope. According to the SQLite database, this is the derivation that Nix believes produced this Ruby binary. 🤨

> sqlite3 "/nix/var/nix/db/db.sqlite" 
    "select deriver from ValidPaths where path = 
    '/nix/store/mp4rpz283gw3abvxyb4lbh4vp9pmayp2-ruby-3.3.9'"
/nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv

What does the binary cache itself say? Even the cache itself thinks this particular derivation, 24v9wpp393ib1gllip7ic13aycbi704g, produced this particular Ruby output.

> curl -s https://cache.nixos.org/mp4rpz283gw3abvxyb4lbh4vp9pmayp2.narinfo |\
  grep Deriver
Deriver: 24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv

What if I try a different command?

> nix derivation show $(which ruby) | jq -r "keys[0]"
/nix/store/kmx8kkggm5i2r17s6l67v022jz9gc4c5-ruby-3.3.9.drv

> ls /nix/store/kmx8kkggm5i2r17s6l67v022jz9gc4c5-ruby-3.3.9.drv
/nix/store/kmx8kkggm5i2r17s6l67v022jz9gc4c5-ruby-3.3.9.drv

So I seem to have a completely different derivation, kmx8kkggm5i2r17s6l67v022jz9gc4c5, that resulted in the same output which is not what the binary cache announces. WTF? 🫠

Thinking back to a previous post, I remember touching on modulo fixed-output derivations. Is that what’s going on? Let’s investigate from first principles. 🤓

Let’s first create fod.nix which is our fixed-output derivation.

let
  system = builtins.currentSystem;
in derivation {
  name = "hello-world-fixed";
  builder = "/bin/sh";
  system = system;
  args = [ "-c" ''
    echo -n "hello world" > "$out"
  '' ];
  outputHashMode = "flat";
  outputHashAlgo = "sha256";
  outputHash = "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9";
}

☝️ Since this is a fixed-output derivation (FOD), the produced /nix/store path will not be affected by changes to the derivation beyond the contents of $out.
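We can sanity-check from first principles that the outputHash above really is the sha256 of the promised contents. This little script is mine, not part of any Nix tooling:

```python
import hashlib

# The builder writes exactly "hello world" (no trailing newline, thanks to
# echo -n) into $out, and outputHash pins the sha256 of those bytes.
digest = hashlib.sha256(b"hello world").hexdigest()
print(digest)
# → b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
```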

> nix-instantiate fod.nix
/nix/store/k2wjpwq43685j6vlvaarrfml4gl4196n-hello-world-fixed.drv

> nix-build fod.nix
/nix/store/ajk19jb8h5h3lmz20yz6wj9vif18lhp1-hello-world-fixed

Now we will create a derivation that uses this FOD.

{ fodDrv ? import ./fod.nix }:

let
  system = builtins.currentSystem;
in
builtins.derivation {
  name = "uses-fod";
  inherit system;
  builder = "/bin/sh";
  args = [ "-c" ''
    echo ${fodDrv} > $out
    echo "Good bye world" >> $out
  '' ];
}

The /nix/store output path for this derivation will change with changes to the derivation, except when the change is only to the derivation path of the FOD. This is in fact what makes it “modulo” the fixed-output derivations.

> nix-instantiate uses-fod.nix
/nix/store/85d15y7irq7x4fxv4nc7k1cw2rlfp3ag-uses-fod.drv

> nix-build uses-fod.nix
/nix/store/sd12qjak7rlxhdprj10187f9an787lk3-uses-fod

Let’s test this all out by changing our fod.nix derivation. Let’s do this by just adding some garbage attribute to the derivation.

@@ -4,6 +4,7 @@
   name = "hello-world-fixed";
   builder = "/bin/sh";
   system = system;
+  garbage = 123;
   args = [ "-c" ''
     echo -n "hello world" > "$out"
   '' ];

What happens now?

> nix-instantiate fod.nix
/nix/store/yimff0d4zr4krwx6cvdiqlin0y6vkis0-hello-world-fixed.drv

> nix-build fod.nix
/nix/store/ajk19jb8h5h3lmz20yz6wj9vif18lhp1-hello-world-fixed

The path of the derivation itself, .drv, has changed but the output path ajk19jb8h5h3lmz20yz6wj9vif18lhp1 remains consistent.

What about the derivation that leverages it?

> nix-instantiate uses-fod.nix
/nix/store/85wkdaaq6q08f71xn420v4irll4a8g8v-uses-fod.drv

> nix-build uses-fod.nix
/nix/store/sd12qjak7rlxhdprj10187f9an787lk3-uses-fod

It also got a new derivation path but the output path remained unchanged. 😮

That means the change to the fixed-output derivation didn’t cause new outputs for either derivation, but it did create a completely new tree of .drv files. 🤯

That means that in nixpkgs, changes to fixed-output derivations give them new store paths for their .drv files while dependent derivations keep the same output path. If that output path had already been stored in the NixOS cache, then we lose the link between the new .drv and the output path. 💥
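To build intuition, here is a toy model of hashing “modulo” fixed-output derivations. This is my own sketch, not Nix’s actual algorithm: the point is only that FOD inputs contribute their declared output hash, not their .drv hash, so .drv churn never propagates downstream.

```python
import hashlib

def drv_hash(text: str) -> str:
    # Stand-in for hashing a .drv file's full contents.
    return hashlib.sha256(text.encode()).hexdigest()

def hash_modulo(input_drv_texts, builder, fod_output_hashes):
    # Toy model: each fixed-output input is replaced by its declared
    # *output* hash rather than its .drv hash before hashing downstream.
    parts = []
    for text in input_drv_texts:
        if text in fod_output_hashes:
            parts.append(fod_output_hashes[text])  # FOD: use the output hash
        else:
            parts.append(drv_hash(text))           # ordinary input: .drv hash
    return hashlib.sha256("|".join(sorted(parts) + [builder]).encode()).hexdigest()

# Two FOD variants differ only in an unused attribute but pin the same output.
OUT = "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"
fod_a = "name=hello-world-fixed; outputHash=" + OUT
fod_b = "name=hello-world-fixed; garbage=123; outputHash=" + OUT
pinned = {fod_a: OUT, fod_b: OUT}

h1 = hash_modulo([fod_a], "echo fod > $out", pinned)
h2 = hash_modulo([fod_b], "echo fod > $out", pinned)
assert drv_hash(fod_a) != drv_hash(fod_b)  # the .drv paths differ...
assert h1 == h2                            # ...but the downstream hash does not
```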

Derivation graphic

I had no idea how much churn we create in derivations.

It can get even weirder! This example came from @ericson2314.

We will duplicate the fod.nix to another file fod2.nix whose only difference is the value of the garbage.

@@ -4,7 +4,7 @@
   name = "hello-world-fixed";
   builder = "/bin/sh";
   system = system;
-  garbage = 123;
+  garbage = 124;
   args = [ "-c" ''
     echo -n "hello world" > "$out"
   '' ];

Let’s now use both of these in our derivation.

{ fodDrv ? import ./fod.nix,
  fod2Drv ? import ./fod2.nix
}:
let
  system = builtins.currentSystem;
in
builtins.derivation {
  name = "uses-fod";
  inherit system;
  builder = "/bin/sh";
  args = [ "-c" ''
    echo ${fodDrv} > $out
    echo ${fod2Drv} >> $out
    echo "Good bye world" >> $out
  '' ];
}

We can now instantiate and build this as normal.

> nix-instantiate uses-fod.nix
/nix/store/z6nr2k2hy982fiynyjkvq8dliwbxklwf-uses-fod.drv

> nix-build uses-fod.nix
/nix/store/211nlyx2ga7mh5fdk76aggb04y1wsgkj-uses-fod

What is weird about that?

Well, let’s take the JSON representation of the derivation and remove one of the inputs.

> nix derivation show \
    /nix/store/z6nr2k2hy982fiynyjkvq8dliwbxklwf-uses-fod.drv |\
    jq 'values[].inputDrvs | keys[]'
"/nix/store/6p93r6x0bwyd8gngf5n4r432n6l380ry-hello-world-fixed.drv"
"/nix/store/yimff0d4zr4krwx6cvdiqlin0y6vkis0-hello-world-fixed.drv"

We can do this because although there are two input derivations, we know they both produce the same output!

@@ -12,12 +12,6 @@
       "system": "x86_64-linux"
     },
     "inputDrvs": {
-      "/nix/store/6p93r6x0bwyd8gngf5n4r432n6l380ry-hello-world-fixed.drv": {
-        "dynamicOutputs": {},
-        "outputs": [
-          "out"
-        ]
-      },
       "/nix/store/yimff0d4zr4krwx6cvdiqlin0y6vkis0-hello-world-fixed.drv": {
         "dynamicOutputs": {},
         "outputs": [

Let’s load this modified derivation back into our /nix/store and build it again!

> nix derivation add < derivation.json
/nix/store/s4qrdkq3a85gxmlpiay334vd1ndg8hm1-uses-fod.drv

> nix-build /nix/store/s4qrdkq3a85gxmlpiay334vd1ndg8hm1-uses-fod.drv
/nix/store/211nlyx2ga7mh5fdk76aggb04y1wsgkj-uses-fod

We got the same output 211nlyx2ga7mh5fdk76aggb04y1wsgkj. Not only do our output paths have a 1:N relationship with derivations, but we can also take certain derivations, completely change them by removing inputs, and still get the same output! 😹

The road to Nix enlightenment is no joke and full of dragons.


Fuzzing for fun and profit

I recently watched a keynote by Will Wilson on fuzzing – Fuzzing’25 Keynote. The talk is excellent, and one main highlight is that we already have the capability to “fuzz” our software today, and yet we do not.

While I’ve seen the power of QuickCheck-like tools for property-based testing, I had never used fuzzing over an application as a whole, specifically American Fuzzy Lop. I was intrigued to add this skill to my toolbelt and maybe apply it to CppNix.

As with everything else, I need to learn things from first principles. I would like to create a scenario with a known failure and see how AFL discovers it.

To get started let’s first make sure we have access to AFL via Nix.

We will be using AFL++, the successor to AFL that incorporates newer updates and features.

> nix-shell -p aflplusplus

How does AFL work? 🤔

AFL will feed your program various inputs to try and cause a crash! 💥

In order to generate better inputs, you compile your code with a variant of gcc or clang distributed by AFL, which inserts special instructions to track branch coverage as AFL generates test cases.
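That coverage feedback loop can be sketched as a toy fuzzer. This is a simplification I wrote for illustration (real AFL mutates far more aggressively and measures edge coverage via instrumentation); `run` stands in for the instrumented target:

```python
import random

TARGET = "Farid"

def run(data: str):
    # Stand-in for the instrumented demo binary: report how many of the
    # nested `if` branches were taken (coverage) and whether it crashed.
    progress = 0
    for i, ch in enumerate(TARGET):
        if i < len(data) and data[i] == ch:
            progress += 1
        else:
            break
    return progress, progress == len(TARGET)

random.seed(0)
corpus = [""]      # the empty seed input
best = 0
crashed = None
for _ in range(200_000):
    parent = random.choice(corpus)
    # Havoc-lite mutation: replace or append one random printable byte.
    pos = random.randrange(len(parent) + 1)
    mutant = parent[:pos] + chr(random.randrange(32, 127)) + parent[pos + 1:]
    coverage, crash = run(mutant)
    if crash:
        crashed = mutant
        break
    if coverage > best:  # new branch covered: keep the input as a new seed
        best = coverage
        corpus.append(mutant)

assert crashed == "Farid"
```

Inputs that unlock a new branch are kept as seeds, so the fuzzer climbs toward the crash one character at a time instead of guessing all five at once.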

Let’s create a demo program that crashes when given the input Farid.

We leverage a volatile int so that the compiler does not optimize the multiple if instructions together.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>

#define INPUT_SIZE 10

void crash() {
  raise(SIGSEGV);
}

int main(int argc, char *argv[]) {
  char buffer[INPUT_SIZE] = {0};

  if (fgets(buffer, INPUT_SIZE, stdin) == NULL) {
    fprintf(stderr, "Error reading input.\n");
    return 1;
  }

  // So the if statements are not optimized together
  volatile int progress_tracker = 0;

  if (strlen(buffer) < 5) {
    return 0;
  }

  if (buffer[0] == 'F') {
    progress_tracker ++;
    if (buffer[1] == 'a') {
      progress_tracker ++;
      if (buffer[2] == 'r') {
        progress_tracker ++;
        if (buffer[3] == 'i') {
          progress_tracker ++;
          if (buffer[4] == 'd') {
            crash();
          }
        }
      }
    }
  }
  return 0;
}

We now can compile our code with afl-cc to get the instrumented binary.

> afl-cc demo.c -o demo

AFL needs to be given some sample inputs. Let’s feed it the simplest starter seed possible – an empty file!

> mkdir -p seed_dir
> echo "" > seed_dir/empty_input.txt

Now we simply run afl-fuzz, and the magic happens. ✨

> afl-fuzz -i seed_dir -o out_dir -- ./demo

A really nice TUI appears that reports various statistics of the running fuzzer and, importantly, whether any crashes have been found – saved crashes : 1!

          american fuzzy lop ++4.32c {default} (./demo) [explore]          
┌─ process timing ────────────────────────────────────┬─ overall results ────┐
│        run time : 0 days, 0 hrs, 33 min, 4 sec      │  cycles done : 3191  │
│   last new find : 0 days, 0 hrs, 33 min, 2 sec      │ corpus count : 6     │
│last saved crash : 0 days, 0 hrs, 33 min, 1 sec      │saved crashes : 1     │
│ last saved hang : none seen yet                     │  saved hangs : 0     │
├─ cycle progress ─────────────────────┬─ map coverage┴──────────────────────┤
│  now processing : 4.7238 (66.7%)     │    map density : 16.67% / 44.44%    │
│  runs timed out : 0 (0.00%)          │ count coverage : 45.00 bits/tuple   │
├─ stage progress ─────────────────────┼─ findings in depth ─────────────────┤
│  now trying : havoc                  │ favored items : 5 (83.33%)          │
│ stage execs : 496/800 (62.00%)       │  new edges on : 6 (100.00%)         │
│ total execs : 13.5M                  │ total crashes : 1014 (1 saved)      │
│  exec speed : 6566/sec               │  total tmouts : 0 (0 saved)         │
├─ fuzzing strategy yields ────────────┴─────────────┬─ item geometry ───────┤
│   bit flips : 0/0, 0/0, 0/0                        │    levels : 5         │
│  byte flips : 0/0, 0/0, 0/0                        │   pending : 0         │
│ arithmetics : 0/0, 0/0, 0/0                        │  pend fav : 0         │
│  known ints : 0/0, 0/0, 0/0                        │ own finds : 5         │
│  dictionary : 0/0, 0/0, 0/0, 0/0                   │  imported : 0         │
│havoc/splice : 6/13.5M, 0/0                         │ stability : 100.00%   │
│py/custom/rq : unused, unused, unused, unused       ├───────────────────────┘
│    trim/eff : 64.13%/20, n/a                       │          [cpu000: 18%]
└─ strategy: exploit ────────── state: running...  ──┘

The output directory contains all the saved information including the input that caused the crashes.

Let’s inspect it!

> cat "out_dir/default/crashes/id:000000,sig:11,src:000005,time:2119,execs:14486,op:havoc,rep:1" 
Farid

Huzzah! 🥳

AFL was successfully able to find our code-word, Farid, that caused the crash.

It is important to note, however, that while AFL found the failure case in my simple program rather quickly, for large programs it can take a long time to explore the complete state space. Companies such as Google continuously run fuzzers such as AFL on well-known open-source projects to help detect failures.


Some interesting stuff I found on IX LANs


These days the internet as a whole is mostly constructed out of point to point ethernet circuits, meaning an ethernet interface (mostly optical) attached


Migrating from ZFS mirror to RAIDZ2

For a long time I’ve been running my storage on a 2-disk ZFS mirror. It’s been stable, safe, and easy to manage. However, at some point, 2 disks just aren’t enough, and I wanted to upgrade to RAIDZ2 so that I could survive up to two simultaneous disk failures.

I could have added another mirror, which would have been simple, and this setup would allow two drives to fail, but not any two drives. I wanted the extra safety of being able to lose any two drives.


Bazel Knowledge: Smuggling capabilities through a tarball

tl;dr: Linux capabilities are just xattrs (extended attributes) on files — and since tar can preserve xattrs, Bazel can “smuggle” them into OCI layers without ever running sudo setcap.

Every so often I stumble on a trick that makes me do a double-take. This one came up while replacing a Dockerfile that set capabilities on a file, via setcap, with rules_oci.

I learnt this idea from reading bazeldnf.

What are capabilities? 🤔

We are all pretty familiar with the all powerful root in Linux and escalating to root via sudo. Capabilities break that monolith into smaller, more focused privileges [ref]. Instead of giving a process the full keys to the kingdom, you can hand it just the one it needs.

For example:

CAP_NET_BIND_SERVICE
lets a process bind to ports below 1024.
CAP_SYS_ADMIN
a grab-bag of scary powers (mount, pivot_root, …).
CAP_CHOWN
lets a process change file ownership.

Capabilities are inherited from the spawning process, but they can also be attached to the file itself, such that any time that file is executed the process has the desired capabilities. The Linux kernel stores these capabilities in the “extended attributes” (i.e. additional metadata) of the file [ref].

If the filesystem you are using does not support extended attributes, then you cannot set capabilities on a file.

Let’s see an example we will work through.

#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
    if (fd < 0) {
        perror("socket");
        return 1;
    }
    printf("Raw socket created successfully!\n");
    close(fd);
    return 0;
}

If we build this with Bazel and try to run it, we will see that it fails unless we either spawn it with CAP_NET_RAW, sudo or add it to the binary via setcap.

> bazel build //:rawsock

> bazel-bin/rawsock
socket: Operation not permitted

> sudo bazel-bin/rawsock
Raw socket created successfully!

# here we add the capability via setcap
# no longer need sudo
> cp bazel-bin/rawsock /tmp/rawsock
> sudo setcap 'cap_net_raw=+ep' /tmp/rawsock
> /tmp/rawsock
Raw socket created successfully!

# let's check the xattr
> getfattr -n security.capability /tmp/rawsock
# file: tmp/rawsock
security.capability=0sAQAAAgAgAAAAAAAAAAAAAAAAAAA=
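The 0s prefix in getfattr’s output means the value is base64. We can decode that exact blob and check it is a VFS_CAP_REVISION_2 structure with the CAP_NET_RAW bit (capability number 13) set; the constants below are transcribed from linux/capability.h:

```python
import base64
import struct

raw = base64.b64decode("AQAAAgAgAAAAAAAAAAAAAAAAAAA=")
# struct vfs_cap_data: __le32 magic_etc, then {permitted, inheritable}
# pairs for the low and high 32 bits of the capability sets.
magic_etc, perm_lo, inh_lo, perm_hi, inh_hi = struct.unpack("<5I", raw)

VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
CAP_NET_RAW = 13

assert magic_etc & 0xFF000000 == VFS_CAP_REVISION_2
assert magic_etc & VFS_CAP_FLAGS_EFFECTIVE   # the "e" in cap_net_raw=+ep
permitted = (perm_hi << 32) | perm_lo
assert permitted == 1 << CAP_NET_RAW         # the "p"
```

So the entire capability grant is just 20 bytes of metadata on the file.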

Okay great – but what does this have to do with Bazel?

Well we were converting a Dockerfile that used setcap to modify the binary.

If your OCI image runs as a non-root user, it will likewise not be permitted to create the raw socket.

FROM alpine:latest
COPY rawsock /bin/rawsock
USER nobody
ENTRYPOINT /bin/rawsock

We can build this Docker image and notice that the entrypoint fails.

> docker build -f Dockerfile.base bazel-bin -t no-caps
> docker run --rm no-caps
socket: Operation not permitted

If we amend the Dockerfile by adding setcap we also see it succeeds.

--- Dockerfile.base	2025-09-09 15:03:22.525245904 -0700
+++ Dockerfile.setcap	2025-09-09 15:30:54.939933727 -0700
@@ -1,5 +1,6 @@
 FROM alpine:latest
 COPY rawsock /bin/rawsock
-
+RUN apk add --no-cache libcap
+RUN setcap 'cap_net_raw=+ep' /bin/rawsock
 USER nobody
 ENTRYPOINT /bin/rawsock
\ No newline at end of file

Now we can build and run it again.

> docker build -f Dockerfile.setcap bazel-bin -t with-caps

> docker run --rm with-caps
Raw socket created successfully!

Back to Bazel! Actions in Bazel are executed under the user that spawned the Bazel process. We can validate this with a simple genrule.

genrule(
  name = "whoami",
  outs = ["whoami.txt"],
  cmd = "whoami > $@",
)
# see my user
> echo $USER
fmzakari

> bazel build //:whoami

> cat bazel-bin/whoami.txt
fmzakari

How then can we create a file with a capability set, so that we can replace our Dockerfile layer?

Escalating privileges inside a Bazel action with sudo isn’t straightforward. You might need to configure NOPASSWD for the user so that it can execute sudo without a password. You could also run the whole bazel command as root, but that grants too much privilege everywhere.

This is where the magic happens ✨.

Let’s take another detour!

What are OCI images?

I actually did a previous write-up on containers from first principles if you are curious for a deeper dive.

We can export the image from Docker and inspect it.

> docker save with-caps -o image.tar

> mkdir out && tar -C out -xf image.tar 

> tree out
out
├── blobs
│   └── sha256
│       ├── 2ef3d90333782c3ac8d90cc1ebde398f4e822e9556a53ef8d4572b64e31c6216
│       ├── 36ee8511c21d057018b233f2d19f5e99456a66f326e207439bf819aa1c4fd820
│       ├── 418dccb7d85a63a6aa574439840f7a6fa6fd2321b3e2394568a317735e867d35
│       ├── 6fc2d3d65edec3f8b0d5d98e91b1ab397e3e52cfb32898435a02c8fc1009d6ff
│       ├── 719f1782ddd087f61c4e00fbcc84b0174f5905f0a3bfe4af4c585f93305fb0e9
│       ├── 7580940023e6398d8eab451c4c43af0a40fea9bb1a4579ea13339264a2c0e8ca
│       ├── 9b556607f407050861ca81e00fb81b2d418fbe3946a70aa427dfa80f4f38c84f
│       ├── d212c54e044f0092575c669cb9991f99a85007231b14fc3a7da3e1b76a72db92
│       ├── da1a39c8c0dabc8784a2567fa24df668b50d32b13f2893812d4740fa07a1d41c
│       └── f0b1eb9d2ddad91643bebf6a109ac5f47dc3bdb9dfc3bc8d1667b9182125a64b
├── index.json
├── manifest.json
├── oci-layout
└── repositories

> file out/blobs/sha256/9b556607f407050861ca81e00fb81b2d418fbe3946a70aa427dfa80f4f38c84f 
out/blobs/sha256/9b556607f407050861ca81e00fb81b2d418fbe3946a70aa427dfa80f4f38c84f: POSIX tar archive

An OCI image is a tar archive containing metadata and a series of “blobs”, some of which are themselves tar archives.

These blobs are the “layers” that are used to construct the final filesystem and contain all the files that will comprise the rootfs.

> tar -tf out/blobs/sha256/da1a39c8c0dabc8784a2567fa24df668b50d32b13f2893812d4740fa07a1d41c 

bin/
bin/rawsock
etc/

For capabilities to travel through a tar archive, the archive itself must store the extended attributes as well. You can enable this feature with the --xattrs option.

> tar --xattrs --xattrs-include="*" -tvvf \
    out/blobs/sha256/da1a39c8c0dabc8784a2567fa24df668b50d32b13f2893812d4740fa07a1d41c
drwxr-xr-x  0/0               0 2025-09-09 15:27 bin/
-r-xr-xr-x* 0/0          803920 2025-09-09 15:26 bin/rawsock
  x: 20 security.capability
drwxr-xr-x  0/0               0 2025-09-09 15:30 etc/

If you decompress the tar archive, and have necessary privileges to set extended attributes (CAP_SETFCAP or sudo) then the unarchived file will retain the capability and everything will work!

> mkdir test

> sudo tar --xattrs --xattrs-include="*" -C test -xf \
    out/blobs/sha256/da1a39c8c0dabc8784a2567fa24df668b50d32b13f2893812d4740fa07a1d41c

> getcap test/bin/rawsock
test/bin/rawsock cap_net_raw=ep

> test/bin/rawsock
Raw socket created successfully!

What does this have to do with building an OCI image in Bazel? 🤨

Turns out the trick we can employ is to write the bytes that mark a file as having the necessary capability directly into the tar archive.

This is exactly what the xattrs rule in bazeldnf does! 🤓

The key idea: capabilities live in extended attributes, and tar can carry those along. That means you don’t need to run setcap inside a genrule at build time as the Dockerfile equivalent does; Bazel can smuggle the bits straight into the image tar layer to be consumed by an OCI-compliant runtime. ☝️
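To make the trick concrete, here is a sketch using Python’s tarfile module (not bazeldnf’s actual rule) that writes the security.capability xattr into a layer tar as a PAX SCHILY.xattr.* record, with no sudo involved:

```python
import base64
import io
import tarfile

# The same VFS_CAP_REVISION_2 blob we saw getfattr report for cap_net_raw=+ep.
cap_blob = base64.b64decode("AQAAAgAgAAAAAAAAAAAAAAAAAAA=")

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    info = tarfile.TarInfo("bin/rawsock")
    payload = b"\x7fELF placeholder"  # stand-in for the real binary
    info.size = len(payload)
    info.mode = 0o755
    # GNU tar stores xattrs as PAX records named SCHILY.xattr.<name>.
    info.pax_headers = {
        "SCHILY.xattr.security.capability": cap_blob.decode("latin-1")
    }
    tar.addfile(info, io.BytesIO(payload))

# Reading it back shows the xattr record survived the round trip.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member = tar.getmember("bin/rawsock")
    stored = member.pax_headers["SCHILY.xattr.security.capability"]
    assert stored.encode("latin-1") == cap_blob
```

No privileged syscall ever runs at build time; the runtime applies the xattr when it unpacks the layer.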

This trick neatly sidesteps the need for sudo in your rules and keeps builds hermetic.

Not every filesystem or runtime will honor these attributes, but when it works it’s a clever, Bazel-flavored way to package privileged binaries without breaking sandboxing.


i ran Claude in a loop for three months, and it created a genz programming language called cursed

It's a strange feeling knowing that you can create anything, and I'm starting to wonder if there's a seventh stage to the "people stages of AI adoption by software developers"

whereby that seventh stage is essentially this scene in the matrix...

It's where you deeply understand that 'you can now do anything' and just start doing it because it's possible and fun, and doing so is faster than explaining yourself. Outcomes speak louder than words.

There's a falsehood that AI results in SWEs' skill atrophy and that there's no learning potential.

If you’re using AI only to “do” and not “learn”, you are missing out
- David Fowler

I've never written a compiler, yet I've always wanted to do one, so I've been working on one for the last three months by running Claude in a while true loop (aka "Ralph Wiggum") with a simple prompt:

Hey, can you make me a programming language like Golang but all the lexical keywords are swapped so they're Gen Z slang?

Why? I really don't know. But it exists. And it produces compiled programs. During this period, Claude was able to implement anything that Claude desired.

The programming language is called "cursed". It's cursed in its lexical structure, it's cursed in how it was built, it's cursed that this is possible, it's cursed in how cheap this was, and it's cursed through how many times I've sworn at Claude.

https://cursed-lang.org/

For the last three months, Claude has been running in this loop with a single goal:

"Produce me a Gen-Z compiler, and you can implement anything you like."

It's now available at:

the 💀 cursed programming language: programming, but make it gen z

the website

GitHub - ghuntley/cursed: the 💀 cursed programming language: programming, but make it gen z

the source code

whats included?

Anything that Claude thought was appropriate to add. Currently...

  • The compiler has two modes: interpreted mode and compiled mode. It's able to produce binaries on Mac OS, Linux, and Windows via LLVM.
  • There are some half-completed VSCode, Emacs, and Vim editor extensions, and a Treesitter grammar.
  • A whole bunch of really wild and incomplete standard library packages.

lexical structure

Control Flow:
ready → if
otherwise → else
bestie → for
periodt → while
vibe_check → switch
mood → case
basic → default

Declaration:
vibe → package
yeet → import
slay → func
sus → var
facts → const
be_like → type
squad → struct

Flow Control:
damn → return
ghosted → break
simp → continue
later → defer
stan → go
flex → range

Values & Types:
based → true
cringe → false
nah → nil
normie → int
tea → string
drip → float
lit → bool
ඞT (Amogus) → pointer to type T

Comments:
fr fr → line comment
no cap...on god → block comment
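Since the mapping is mechanical, a toy “de-cursifier” is a dict lookup away. This is my own sketch, not part of the cursed toolchain, and it only handles whitespace-separated keywords:

```python
# Keyword table transcribed from the post.
CURSED_TO_GO = {
    "ready": "if", "otherwise": "else", "bestie": "for", "periodt": "while",
    "vibe_check": "switch", "mood": "case", "basic": "default",
    "vibe": "package", "yeet": "import", "slay": "func", "sus": "var",
    "facts": "const", "be_like": "type", "squad": "struct",
    "damn": "return", "ghosted": "break", "simp": "continue",
    "later": "defer", "stan": "go", "flex": "range",
    "based": "true", "cringe": "false", "nah": "nil",
    "normie": "int", "tea": "string", "drip": "float", "lit": "bool",
}

def decursify(line: str) -> str:
    # Naive whitespace tokenization; a real lexer would handle punctuation.
    return " ".join(CURSED_TO_GO.get(tok, tok) for tok in line.split())

assert decursify("slay main_character() {") == "func main_character() {"
assert decursify("damn max_level") == "return max_level"
```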

example program

Here is leetcode 104 - maximum depth for a binary tree:

vibe main
yeet "vibez"
yeet "mathz"

// LeetCode #104: Maximum Depth of Binary Tree 🌲
// Find the maximum depth (height) of a binary tree using ඞ pointers
// Time: O(n), Space: O(h) where h is height

struct TreeNode {
    sus val normie
    sus left ඞTreeNode   
    sus right ඞTreeNode  
}

slay max_depth(root ඞTreeNode) normie {
    ready (root == null) {
        damn 0  // Base case: empty tree has depth 0
    }
    
    sus left_depth normie = max_depth(root.left)
    sus right_depth normie = max_depth(root.right)
    
    // Return 1 + max of left and right subtree depths
    damn 1 + mathz.max(left_depth, right_depth)
}

slay max_depth_iterative(root ඞTreeNode) normie {
    // BFS approach using queue - this hits different! 🚀
    ready (root == null) {
        damn 0
    }
    
    sus queue ඞTreeNode[] = []ඞTreeNode{}
    sus levels normie[] = []normie{}
    
    append(queue, root)
    append(levels, 1)
    
    sus max_level normie = 0
    
    bestie (len(queue) > 0) {
        sus node ඞTreeNode = queue[0]
        sus level normie = levels[0]
        
        // Remove from front of queue
        collections.remove_first(queue)
        collections.remove_first(levels)
        
        max_level = mathz.max(max_level, level)
        
        ready (node.left != null) {
            append(queue, node.left)
            append(levels, level + 1)
        }
        
        ready (node.right != null) {
            append(queue, node.right)
            append(levels, level + 1)
        }
    }
    
    damn max_level
}

slay create_test_tree() ඞTreeNode {
    // Create tree: [3,9,20,null,null,15,7]
    //       3
    //      / \
    //     9   20
    //        /  \
    //       15   7
    
    sus root ඞTreeNode = &TreeNode{val: 3, left: null, right: null}
    root.left = &TreeNode{val: 9, left: null, right: null}
    root.right = &TreeNode{val: 20, left: null, right: null}
    root.right.left = &TreeNode{val: 15, left: null, right: null}
    root.right.right = &TreeNode{val: 7, left: null, right: null}
    
    damn root
}

slay create_skewed_tree() ඞTreeNode {
    // Create skewed tree for testing edge cases
    //   1
    //    \
    //     2
    //      \
    //       3
    
    sus root ඞTreeNode = &TreeNode{val: 1, left: null, right: null}
    root.right = &TreeNode{val: 2, left: null, right: null}
    root.right.right = &TreeNode{val: 3, left: null, right: null}
    
    damn root
}

slay test_maximum_depth() {
    vibez.spill("=== 🌲 LeetCode #104: Maximum Depth of Binary Tree ===")
    
    // Test case 1: Balanced tree [3,9,20,null,null,15,7]
    sus root1 ඞTreeNode = create_test_tree()
    sus depth1_rec normie = max_depth(root1)
    sus depth1_iter normie = max_depth_iterative(root1)
    vibez.spill("Test 1 - Balanced tree:")
    vibez.spill("Expected depth: 3")
    vibez.spill("Recursive result:", depth1_rec)
    vibez.spill("Iterative result:", depth1_iter)
    
    // Test case 2: Empty tree
    sus root2 ඞTreeNode = null
    sus depth2 normie = max_depth(root2)
    vibez.spill("Test 2 - Empty tree:")
    vibez.spill("Expected depth: 0, Got:", depth2)
    
    // Test case 3: Single node [1]
    sus root3 ඞTreeNode = &TreeNode{val: 1, left: null, right: null}
    sus depth3 normie = max_depth(root3)
    vibez.spill("Test 3 - Single node:")
    vibez.spill("Expected depth: 1, Got:", depth3)
    
    // Test case 4: Skewed tree
    sus root4 ඞTreeNode = create_skewed_tree()
    sus depth4 normie = max_depth(root4)
    vibez.spill("Test 4 - Skewed tree:")
    vibez.spill("Expected depth: 3, Got:", depth4)
    
    vibez.spill("=== Maximum Depth Complete! Tree depth detection is sus-perfect ඞ🌲 ===")
}

slay main_character() {
    test_maximum_depth()
}

If this is your sort of chaotic vibe, and you'd like to turn this into the dogecoin of programming languages, head on over to GitHub and run a few more Claude code loops with the following prompt.

study specs/* to learn about the programming language. When authoring the cursed standard library think extra extra hard as the CURSED programming language is not in your training data set and may be invalid. Come up with a plan to implement XYZ as markdown then do it

There is no roadmap; the roadmap is whatever the community decides to ship from this point forward.

At this point, I'm pretty much convinced that any problems found in cursed can be solved by just running more Ralph loops by skilled operators (ie. people with experience with compilers who shape it through prompts from their expertise vs letting Claude just rip unattended). There's still a lot to be fixed, happy to take pull-requests.

Ralph Wiggum as a “software engineer”
😎 Here’s a cool little field report from a Y Combinator hackathon event where they put Ralph Wiggum to the test: “We Put a Coding Agent in a While Loop and It Shipped 6 Repos Overnight” https://github.com/repomirrorhq/repomirror/blob/main/repomirror.md

The most high-IQ thing is perhaps the most low-IQ thing: run an agent in a loop.

LLMs are mirrors of operator skill
This is a follow-up from my previous blog post: “deliberate intentional practice”. I didn’t want to get into the distinction between skilled and unskilled because people take offence to it, but AI is a matter of skill.

LLMs amplify the skills that developers already have and enable people to do things where they don't have that expertise yet.

Success is defined as cursed ending up in the Stack Overflow developer survey as either the "most loved" or "most hated" programming language, and continuing the work to bootstrap the compiler to be written in cursed itself.

Cya soon in Discord? - https://discord.gg/CRbJcKaGNT


ps. socials


Writing a protoc plugin in Java

Know thy enemy.

— Sun Tzu

We use Protocol Buffers heavily at $DAYJOB$ and they’re increasingly becoming a large pain point, most notably due to challenges with coercing multiple versions in a dependency graph.

Recently, a team wanted to augment the generated Java code protoc (Protobuf compiler) emits. I was aware that the compiler had a “plugin” architecture but had never looked deeper into it.

Let’s explore writing a Protocol Buffer plugin, in Java and for the Java generated code. 🤓

If you’d like to see the end result check out github.com/fzakaria/protoc-plugin-example

Turns out that plugins are simple in that they operate solely over standard input & output and unsurprisingly marshal protobuf over them.

A plugin is just a program which reads a CodeGeneratorRequest protocol buffer from standard input and then writes a CodeGeneratorResponse protocol buffer to standard output. [ref]

The request & response protos are described in plugin.proto.

+------------------+         CodeGeneratorRequest (stdin)         +------------------+
|                  | -------------------------------------------> |                  |
|  protoc          |                                              |  Your Plugin     |
| (Compiler)       | <------------------------------------------- |  (e.g., in Java) |
|                  |         CodeGeneratorResponse (stdout)       |                  |
+------------------+                                              +------------------+
       |
       | (protoc then writes files
       |  to disk based on plugin's response)
       V
+------------------+
|                  |
|  Generated       |
|  Code Files      |
|                  |
+------------------+

Here is a dumb plugin that emits a fixed class to demonstrate.

import com.google.protobuf.compiler.PluginProtos.CodeGeneratorRequest;
import com.google.protobuf.compiler.PluginProtos.CodeGeneratorResponse;
import com.google.protobuf.compiler.PluginProtos.CodeGeneratorResponse.File;

// Entry point for the plugin; the class name is arbitrary.
public class DumbPlugin {
    public static void main(String[] args) throws Exception {
        CodeGeneratorRequest request = CodeGeneratorRequest.parseFrom(System.in);
        CodeGeneratorResponse response = CodeGeneratorResponse.newBuilder()
                .addFile(
                    File.newBuilder().setContent("""
                        // Generated by the plugin
                        public class Dummy {
                            public String hello() {
                                return "Hello from Dummy";
                            }
                        }
                        """)
                        .setName("Dummy.java")
                        .build()
                )
                .build();

        response.writeTo(System.out);
    }
}
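One wrinkle for JVM plugins: protoc executes the plugin as an ordinary subprocess, so the protoc-gen-dumb file passed to --plugin must be an executable that forwards stdin/stdout to the JVM. A minimal launcher script might look like this (the jar path and main class below are placeholders for whatever your build produces):

```shell
#!/usr/bin/env bash
# Hypothetical launcher saved as `protoc-gen-dumb` and marked executable.
# protoc talks to the plugin over stdin/stdout, which `exec java` inherits.
exec java -cp plugin.jar DumbPlugin "$@"
```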

We can run this and see that the expected file is produced.

> protoc example.proto --plugin=protoc-gen-dumb \
                       --dumb_out=./generated
> cat generated/Dummy.java
// Generated by the plugin
public class Dummy {
    public String hello() {
        return "Hello from Dummy";
    }
}

Let’s now look at an example in example.proto.

syntax = "proto3";

option java_package = "com.example.protobuf";
option java_multiple_files = true;

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
  repeated string phone_number = 4;
  Address home_address = 5;
}

message Address {
  string street = 1;
  string city = 2;
  string state = 3;
  string zip_code = 4;
}

You can generate the traditional Java code for this using protoc, which ships with built-in support for Java output.

> protoc --java_out=./generated example.proto

> tree generated
generated
└── com
    └── example
        └── protobuf
            └── tutorial
                ├── Address.java
                ├── AddressOrBuilder.java
                ├── Example.java
                ├── Person.java
                └── PersonOrBuilder.java

Nothing out of the ordinary here; we are merely baselining our knowledge. 👌

How can I now modify this code?

If you audit the generated code you will see comments that contain protoc_insertion_point such as:

@@protoc_insertion_point(message_implements:Person)

> rg "@@protoc_insertion" generated
generated/com/example/protobuf/tutorial/Person.java
13:    // @@protoc_insertion_point(message_implements:Person)
417:      // @@protoc_insertion_point(builder_implements:Person)
1035:    // @@protoc_insertion_point(builder_scope:Person)
1038:  // @@protoc_insertion_point(class_scope:Person)

Insertion points are markers within the generated source that allow other plugins to include additional content.

We modify the File we include in the response to specify the insertion point; instead of a new file being created, the content will be merged into the existing file. ✨
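To build intuition for the merge, here is a rough simulation (my own sketch, not protoc’s actual implementation): the plugin’s content is spliced in immediately before the line carrying the matching marker. The real compiler additionally re-indents the inserted content to match the marker’s indentation.

```java
public class InsertionDemo {
    // Splice `content` into `file` just before the line containing the
    // @@protoc_insertion_point marker for `point`. Simplified: no
    // re-indentation, and the marker line itself is kept so further
    // plugins can insert at the same point.
    static String insertAt(String file, String point, String content) {
        StringBuilder out = new StringBuilder();
        for (String line : file.split("\n")) {
            if (line.contains("@@protoc_insertion_point(" + point + ")")) {
                out.append(content);
            }
            out.append(line).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String generated = """
                public final class Person {
                  // @@protoc_insertion_point(class_scope:Person)
                }
                """;
        System.out.println(insertAt(generated, "class_scope:Person",
                "  public String hello() { return \"hi\"; }\n"));
    }
}
```

Note that the marker comment survives the merge, which is why multiple plugins can target the same insertion point.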

Our example plugin would like to add the hello() function to every message type described in the proto file.

We do this by setting the appropriate insertion point, which we found by auditing the originally generated code. In this particular example, we want to add our new function to the class definition, so we pick class_scope as our insertion point.

List<File> generatedFiles = protos.stream()
        .flatMap(p -> p.getMessageTypes().stream())
        .map(m -> {
            final FileDescriptor fd = m.getFile();
            String javaPackage = fd.getOptions().getJavaPackage();
            final String fileName = javaPackage.replace(".", "/") + "/" + m.getName() + ".java";
            return File.newBuilder().setContent("""
                // Generated by the plugin
                public String hello() {
                    return "Hello from " + this.getClass().getSimpleName();
                }
                        \s""")
                    .setName(fileName)
                    .setInsertionPoint(String.format("class_scope:%s", m.getName()))
                    .build();
        }).toList();
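The snippet above assumes a protos variable holding linked FileDescriptors. The CodeGeneratorRequest only carries raw FileDescriptorProtos, so the plugin has to build descriptors itself; one way to do that might be the following sketch (the class and method names are my own):

```java
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.compiler.PluginProtos.CodeGeneratorRequest;
import java.util.LinkedHashMap;
import java.util.Map;

public class DescriptorIndex {
    // protoc lists proto_file in dependency order (a file's imports always
    // appear before it), so a single pass can link each descriptor against
    // its already-built dependencies.
    static Map<String, FileDescriptor> index(CodeGeneratorRequest request) throws Exception {
        Map<String, FileDescriptor> byName = new LinkedHashMap<>();
        for (FileDescriptorProto fdp : request.getProtoFileList()) {
            FileDescriptor[] deps = fdp.getDependencyList().stream()
                    .map(byName::get)
                    .toArray(FileDescriptor[]::new);
            byName.put(fdp.getName(), FileDescriptor.buildFrom(fdp, deps));
        }
        return byName;
    }
}
```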

We now run the Java generator alongside our custom plugin.

We can audit the generated source and we see that our new method is now included! 🔥

Note: The plugin must be listed after java_out as the order matters on the command-line.

> protoc example.proto --java_out=./generated \
                        --plugin=protoc-gen-example \
                        --example_out=./generated
> rg "hello" generated/ -B 1
generated/com/example/protobuf/tutorial/Person.java
1038-  // Generated by the plugin
1039:  public String hello() {

generated/com/example/protobuf/tutorial/Address.java
862-  // Generated by the plugin
863:  public String hello() {

While we are limited to the insertion points defined by the open-source implementation of the Java protobuf generator, they do provide a convenient way to augment the generated files.

We can also include additional source files that may wrap the original files for cases where the insertion points may not suffice.