Planet TVL

don’t waste your back pressure


I am fortunate to be surrounded by folks who listen, and the post linked below will go down as seminal reading for people interested in AI context engineering.

A simple convo between mates - well, Moss translated it into words, and I've been waiting for it to come out so I didn't front-run him.

Don't waste your back pressure
Back pressure for agents You might notice a pattern in the most successful applications of agents over the last year. Projects that are able to set up structure around the agent itself, to provide it with automated feedback on quality and correctness, have been able to push them to work on longer horizon tasks. This back pressure helps the agent identify mistakes as it progresses and models are now good enough that this feedback can keep them aligned to a task for much longer. As an engineer, this means you can increase your leverage by delegating progressively more complex tasks to agents, while increasing trust that when completed they are at a satisfactory standard.

read this and internalise this

Enjoy. This is what engineering now looks like in the post loom/gastown era or even when doing ralph loops.

software engineering is now about preventing failure scenarios and preventing the wheel from turning over through back pressure to the generative function

If you aren’t capturing your back-pressure then you are failing as a software engineer.

Back-pressure is part art, part engineering, and a whole bunch of performance engineering: you need "just enough" to reject invalid generations (aka "hallucinations"), but if the wheel spins too slowly ("tests take a long time to run or the application is slow to compile") then it's too much resistance.

There are many ways to tune back-pressure. As Moss states, it starts with the choice of programming language and applying engineering knowledge to design a fast test suite that provides signal, but perhaps my favorite is pre-commit hooks (via prek).

GitHub - j178/prek: ⚡ Better `pre-commit`, re-engineered in Rust
⚡ Better `pre-commit`, re-engineered in Rust. Contribute to j178/prek development by creating an account on GitHub.

Under normal circumstances pre-commit hooks are annoying because they slow down humans but now that humans aren't the ones doing the software development it really doesn't matter anymore.
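As a rough illustration of what hook-based back-pressure can look like, prek consumes the same .pre-commit-config.yaml format as pre-commit, so a setup might look something like this (the local cargo hook is illustrative and assumes a Rust project; swap in whatever gives your agent fast signal):

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: local
    hooks:
      - id: cargo-check
        name: cargo check          # fast "does it even compile" back pressure
        entry: cargo check
        language: system
        pass_filenames: false

Every commit rejected by a hook is an invalid generation that never reaches the repository, with no human in the loop required to notice it.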


everything is a ralph loop


I've been thinking about how the way I build software is so very, very different from how I used to do it three years ago.

No, I'm not talking about acceleration through the usage of AI, but about a more fundamental shift in approach, techniques and best practices.

Standard software practice is to build vertically, brick by brick - like Jenga - but these days I approach everything as a loop. You see, ralph isn't just about forward mode (building autonomously) or reverse mode (clean rooming); it's also a mindset that these computers can indeed be programmed.

watch this video to learn the mindset

I'm there as an engineer just as I was in the brick-by-brick era, but instead I am programming the loop, automating my job function and removing the need to hire humans.

Everyone right now is going through their zany period - just like I did with forward mode and building software AFK on full auto - however I hope that folks will come back down from orbit and remember this from the original ralph post.

While I was in SFO, everyone seemed to be trying to crack on multi-agent, agent-to-agent communication and multiplexing. At this stage, it's not needed. Consider microservices and all the complexities that come with them. Now, consider what microservices would look like if the microservices (agents) themselves are non-deterministic—a red hot mess.

What's the opposite of microservices? A monolithic application. A single operating system process that scales vertically. Ralph is monolithic. Ralph works autonomously in a single repository as a single process that performs one task per loop.

Software is now clay on the pottery wheel and if something isn’t right then i just throw it back on the wheel to address items that need resolving.

Ralph is an orchestrator pattern where you allocate the array with the required backing specifications, give it a goal, and then loop the goal.

It's important to watch the loop as that is where your personal development and learning will come from. When you see a failure domain – put on your engineering hat and resolve the problem so it never happens again.

In practice this means doing the loop manually via prompting, or via automation with a pause that requires pressing CTRL+C to progress onto the next task. This is still ralphing, as ralph is about getting the most out of how the underlying models work through context engineering, and that pattern is GENERIC and can be used for ALL TASKS.
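As a rough sketch (the agent command here is a placeholder for whichever coding-agent CLI you drive, not a real tool), the paused variant of the loop is little more than a shell script:

while true; do
  your-agent < PROMPT.md   # hypothetical agent invocation: one task per iteration
  echo "iteration complete - press Enter to kick off the next task"
  read _
done

The prompt stays fixed; the repository and its back pressure are what change between iterations.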

In other news I've been cooking on something called "The Weaving Loom". The source code of loom can now be found on my GitHub; do not use it if your name is not Geoffrey Huntley. Loom is something that has been in my head for the last three years (and various prototypes were developed last year!) and it is essentially infrastructure for evolutionary software. Gas town focuses on spinning plates and orchestration - a full level 8.

see https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04

I’m going for a level 9 where autonomous loops evolve products and optimise automatically for revenue generation. Evolutionary software - also known as a software factory.


There is a divide now - we have software engineers outwardly rejecting AI, or merely consuming it via Claude Code/Cursor to accelerate the lego brick building process....

but software development is dead - I killed it. Software can now be developed cheaper than the wage of a burger flipper at maccas and it can be built autonomously whilst you are AFK.

hi, it me. i’m the guy

I’m deeply concerned for the future of these people and have started publishing videos on YouTube to send down ladders before the big bang happens.

i now won’t hire you unless you have this fundamental knowledge and can show what you have built with it

Whilst software development/programming is now dead, we deeply need software engineers with these skills who understand that LLMs are a new form of programmable computer. If you haven't built your own coding agent yet - please do.

how to build a coding agent: free workshop
It’s not that hard to build a coding agent. 300 lines of code running in a loop with LLM tokens. You just keep throwing tokens at the loop, and then you’ve got yourself an agent.
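The shape of that loop, stripped to its bones, is roughly the following (a hypothetical sketch; llm_complete and run_tool stand in for whatever model API and tool dispatch you wire up - they are not a real library):

messages = [
    {"role": "system", "content": "You are a coding agent. Use tools to edit and test code."},
    {"role": "user", "content": "fix the failing tests"},
]

while True:
    reply = llm_complete(messages)        # send the whole conversation to the model
    messages.append(reply)
    if not reply.get("tool_calls"):       # no tool requests left: the agent considers itself done
        break
    for call in reply["tool_calls"]:      # execute each requested tool (read file, edit, run tests)
        messages.append({"role": "tool", "content": run_tool(call)})

Everything else - the editing tools, the test runners, the back pressure - is plumbing bolted onto that while loop.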

ps. think this is out there?

It is but watch it happen live. We are here right now, it’s possible and i’m systemising it.

Here in the tweet below I am putting loom under the mother of all ralph loops to automatically perform system verification. Instead of days of planning, discussions and weeks of verification I’m programming this new computer and doing it afk whilst I DJ so that I don’t have to hire humans.


Any faults identified can be resolved through forward ralph loops to rectify issues. Over the last year the models have become quite good and it's only now that I'm able to realise this full vision, but I'll leave you with this dear reader....

What if the models don't stop getting good?

How well will you fare if you are still building jenga stacks when there are classes of principal software engineers out there proving the point that we are here right now - please pay attention.


Go build your agent, go learn how to program the new computer (guidance forthcoming in future posts), fall in love with all the possibilities and then join me in this space race of building automated software factories.

ps. socials


two AI researchers are now funded by Solana


Hey folks, it's been a wild week. Ralph Wiggum has finally crossed the chasm and folks are starting to grasp that software development is now dead, as it can now be done whilst you sleep for $10.42/hr. Software Engineering is alive and well, but the skills needed now are the same, yet different.

don't use the plugin - learn the theory instead!

GitHub - ghuntley/how-to-ralph-wiggum: The Ralph Wiggum Technique—the AI development methodology that reduces software costs to less than a fast food worker’s wage.
The Ralph Wiggum Technique—the AI development methodology that reduces software costs to less than a fast food worker’s wage. - ghuntley/how-to-ralph-wiggum

An upcoming post will go into the changes in unit dynamics, but if you want a sneak preview of my thinking here, see these two videos:

but for now, I’m going to be recapping something else that was quite extraordinary that happened in my life.

I am now a walking, talking financial instrument, an underlying. You see, when Ralph started to cross the chasm a whole bunch of people started speculating on me on the Solana cryptocurrency network.

This entire idea was wild and I initially rejected it - quite publicly I may add. It was a gut reaction, partly because my inbox was blowing up with people I didn't know acting in ways that tripped my "this is scammy behavior" radar, and I had lived through the NFT days five years ago.

heck, I even sat down with Coffeezilla

but within the pile of DMs that were stacking up there was one person who caught my attention.

It was this conversation which completely changed my mind about what is going on - folks who hold cryptocurrency are looking for genuine people who are doing good things out there in the world, and accelerating those people through funding via cryptocurrency is something they want to do.


at first I ignored these messages but I looked up who they were coming from, their background and started opening up.


as I had lived through the NFT craze - I already had a wallet (and an ENS!) but still had concerns - you see, I have perhaps as many trust issues as the average cryptocurrency holder.

life’s going pretty good right now but it wasn’t always the case
a new chapter: full-time working from a van in a forest
For many people, the year 2020 will go down as a moment in time of hardship in their lives but for me, the year 2019 was dramatically harder as it was the realization that a long-term relationship wasn’t going to work out…

the conversation progressed and I came to understand that people are indeed sick of the scammers. I was hesitant, but I became open to the idea that a new form of creator funding dynamics is happening and that I am in fact patient zero!

They suggested I catch up with the founder of BAGS and after a Zoom call with them I came away with the following feelings, having opened up with this statement:

I have to believe you (BAGS) are trying to create something great here and that you need me as much as I need you. If you did anything untoward then you would in fact be damaging your own reputation and business.

So, this really is a letter to the people who are having fun with the idea of turning me into a walking, talking meme that they can speculate on - for who am I to judge, when as of a couple of months ago I work in the high frequency trading domain. For all I know my co-workers are the ones who are trading me as an asset as a gag.

There’s so much I’ll never be able to share - prop shops are secretive. It creates somewhat of a divide between how much I can publicly talk about in the realm of AI and teach. It’s a fine balance that causes me much internal friction.

but maybe, just maybe, this creator economy on the Solana network is real, as I'm now staring at $300,000 in my physical bank account after under seven days, which is enough of a safety net in case things ever go balls up.

So it has me thinking. Whilst I could literally throw out a tweet right now and be drowning in opportunities if such a thing was to occur...

What if instead we have the perfect recipe in the making for truly independent research that's published openly and freely, and all I needed to do was open up and communicate with you? I've already got venture capitalists chasing me left, right and center for meetings but heading down that path would likely result in the same scenario that I'm in right now: conflicted. I am an old-school hippie hacker who deeply believes that knowledge should be free.

Sorry for calling you a bunch of degens. Thank you for your support.

Here’s the next steps and commitments:

  • I am now redirecting my earnings/fees to buy $RALPH as a way to say thank you to those who got in early and to improve the pool liquidity.
  • The $RALPH coin is the official and only coin that I support. Please cease creating other coins and please note if you do create them I will claim them so that I can buy more $RALPH.
  • Clarity as to what loom is in an upcoming post - it's bigger than gas town folks. I'm rethinking the last forty years of software engineering and rebuilding the whole damn stack as something that can be self-hosted on-prem. For readers who know me - Loom used to be known as Pherrit. Loom is pherrit folks. Over the last year I've rebuilt the concept many times in different languages (and even streamed it live on YouTube) but now the models are really good so it's time to build. At this stage I have:
    • A fully functional source code host which replicates GitHub, but it uses JJ as the baseline source control primitive with backwards compatibility to Git - in loom speak it's called "spool". Spool is whatever I want it to be - by freeing myself from group think and backwards compatibility I'll be able to explore topics such as virtual filesystems (think replicating Google's Piper or Meta's Mononoke) as a way to provide context for agents (i.e. think BEADS by Yegge).
    • A fully functional implementation of GitHub Codespaces and sand boxing primitives so that weavers can run in the background on remote secure infrastructure (think similar to Daytona/E2B or OpenAI Codex) that uses spiffe://.
    • A fully functional audit system/data source which can be used as loopback sources to drive agents via eBPF.
    • A partially functional implementation of Sourcegraph Amp 😎 as a weaver ("agent")
    • A partially functional implementation of Posthog analytics so that agents ("weavers") are driven by product telemetry and in time drive autonomous ralph loops to improve product outcomes.
    • A partially functional implementation of Launchdarkly so that agents ("weavers") can autonomously release features into production via feature flags/experiments.

It's at this point where I explain what the heck is going on. In short, there's this new platform called BAGS which has designed contracts on the Solana network where market-making fees are redirected to the creator.

The more people speculate on the underlying (me) doing something cool, the more fees are collected by the platform. In my particular case 99% of fees are redirected to me and I'm now using them to buy $RALPH.

This is in no way financial advice or solicitation dear reader - cryptocurrency is volatile and I completely understand if it's not for you. If you want to support me directly please consider subscribing to my newsletter as a paid reader - all those funds go directly to me.

The intention of this post is to recap what the heck is going on and to extend an invitation to other open source developers or researchers: if this happens to you, I'm happy to sit down on a call and explain it all.

Until next time - thanks for reading,
Geoff.


Disclaimer: $RALPH is a memecoin created to celebrate the Ralph Wiggum Technique and AI development culture. The token was created and is operated by BagsApp. Geoffrey Huntley did not deploy the smart contract and has no control over it. Always do your own research before investing. Crypto is volatile—only invest what you can afford to lose. This is not financial advice. Not affiliated with Anthropic, Ralph Wiggum, or 20th Century Fox.


Bespoke software is the future

At Google, some of the engineers would joke, self-deprecatingly, that the software internally was not particularly exceptional, but rather that Google's dominance was an example of the power of network effects: software custom-tailored to work well together.

Outside of Google, or similar FAANG companies, this is often cited as indulgent "NIH" (Not Invented Here) syndrome; the prevailing practice elsewhere is to pick generalized software solutions, preferably open-source, off the shelf.

The problem with these generalized solutions is that, well, they are generalized and rarely fit well together. 🙄 Engineers are trained to be DRY (Don’t Repeat Yourself), and love abstractions. As a tool tries to solve more problems, the abstraction becomes leakier and ill-fitting. It becomes a general-purpose tax.

If you only need 10% of a software solution, you pay for the remaining 90% via the abstractions they impose. 🫠

Internally to a company, however, we are taught that unused code is a liability. We often celebrate negative pull-requests as valuable clean-up work with the understanding that smaller code-bases are simpler to understand, operate and optimize.

Yet for most of our infrastructure tooling, we continue to bloat solutions and tout support despite minuscule user bases.

This is probably one of the areas I am most excited about with the ability to leverage LLMs for software creation.

I recently spent time investigating linkers in previous posts such as LLVM’s lld.

I found LLVM to be a pretty polished codebase with lots of documentation. Despite the high quality, navigating the codebase is challenging as it's a mass of interfaces and abstractions in order to support multiple object file formats, 13+ ISAs, a slew of features (i.e. linker scripts) and multiple operating systems.

Instead, I leveraged LLMs to help me design and write µld, a tiny opinionated linker in Rust that only targets ELF, x86_64, static linking and barebone feature-set.

It shouldn't be a surprise to anyone that the end result is a codebase that I can audit, educate myself on, and easily grow to support additional improvements and optimizations.

The surprising bit, especially to me, was how easy it was to author and write within a very short period of time (1-2 days).

That means smaller companies, without the coffers of FAANG companies, can also pursue bespoke custom-tailored software for their needs.

This future is well-suited for tooling such as Nix. Nix is the perfect vehicle to help build custom tooling as you have a playground that is designed to build the world similar to a monorepo.

We need to begin to cut away legacy in our tooling and build software that solves specific problems. The end-result will be smaller, easier to manage and better integrated. Where this might have seemed unattainable for most, LLMs will democratize this possibility.

I’m excited for the bespoke future.


Huge binaries: papercuts and limits

In a previous post, I synthetically built a program that demonstrated a relocation overflow for a CALL instruction.

However, the demo required I add -fno-asynchronous-unwind-tables to disable some additional data that might cause other overflows for the purpose of this demonstration.

What’s going on? 🤔

This is a good example that only a select few are facing the size-pressure of massive binaries.

Even with mcmodel=medium, which already begins to articulate to the compiler & linker "hey, I expect my binary to be pretty big", there are surprising gaps where the linker overflows.

On Linux, an ELF binary includes many other sections beyond text and data necessary for code execution. Notably there are sections included for debugging (DWARF) and language-specific sections such as .eh_frame which is used by C++ to help unwind the stack on exceptions.

Turns out that even with mcmodel=large you might still run into overflow errors! 🤦🏻‍♂️

Note Funnily enough, there is a very recently opened issue for this with LLVM #172777; perfect timing!

For instance, lld assumes 32-bit eh_frame_hdr values regardless of the code model. There are similar 32-bit assumptions in the data-structure of eh_frame as well.

I also mentioned earlier a pattern of using multiple GOTs, Global Offset Tables, to avoid the 31-bit (±2GiB) relative offset limitation.

Is there even a need for the large code-model?

How far can these tricks take us before we are forced to use the large code-model?

Let’s think about it:

First, let's think about any limit due to overflow when accessing the multiple GOTs. Let's say we decide to space out our duplicate GOTs every 1.5GiB.

|<---- 1.5GiB code ----->|<----- GOT ----->|<----- 1.5GiB code ----->|<----- GOT ----->|

That means each GOT can grow to at most 500MiB before there could exist an instruction in the code section referencing the GOT that would result in an overflow.

Each GOT entry is 8 bytes, a 64bit pointer. That means we have roughly ~65 million possible entries.

A typical GOT relocation looks like the following and it requires 9 bytes: 7 bytes for the movq and 2 bytes for movl.

movq    var@GOTPCREL(%rip), %rax  # R_X86_64_REX_GOTPCRELX
movl    (%rax), %eax

That means we have 1.5GiB / 9 = ~178 million possible unique relocations.

So theoretically, we can require more unique symbols in our code section than we can fit in the nearest GOT, and therefore cause a relocation overflow. 💥

The same problem exists for thunks, since the thunk is larger than the relative call in bytes.

At some point, there is no avoiding the large code-model, however with multiple GOTs, thunks and other linker optimizations (i.e. LTO, relaxation), we have a lot of headroom before it’s necessary. 🕺🏻


Huge binaries: I thunk therefore I am

In my previous post, we looked at the “sound barrier” of x86_64 linking: the 32-bit relative CALL instruction and how it can result in relocation overflows. Changing the code-model to -mcmodel=large fixes the issue but at the cost of “instruction bloat” and likely a performance penalty although I had failed to demonstrate it via a benchmark 🥲.

Surely there are other interesting solutions? 🤓

First off, probably the simplest solution is to not statically build your code and rely on dynamic libraries 🙃. This is what most "normal" software shops and the world at large do; as a result this hasn't been much of an issue elsewhere.

This of course has its own downsides and performance implications which I've written about and produced solutions for (i.e., Shrinkwrap & MATR) via my doctoral research. Beyond the performance penalty induced by having thousands of shared libraries, you lose the simplicity of single-file deployments.

A more advanced set of optimizations falls under the umbrella of "LTO", Link Time Optimizations. The linker at the final stage has all the information necessary to perform a variety of optimizations such as code inlining and tree-shaking. That would seem like a good fit, except these huge binaries would need an enormous amount of RAM to perform LTO and build speeds would slow to a crawl.

Tip This is still an active area of research and Google has authored ThinLTO. Facebook has its own set of profile guided LTO optimizations as well via Bolt.

What if I told you that you could keep the fast, 5-byte small code-model for most callsites, even if your binary is 25GiB? 🧐

Turns out there is prior art for “Linker Thunks” [ref] within LLVM for various architectures – notably missing for x86_64 with a quote:

“i386 and x86-64 don’t need thunks” [ref]

What is a “thunk” ?

You might know it by a different name; we use them all the time for dynamic linking, in fact: a trampoline via the procedure linkage table (PLT).

A thunk (or trampoline) is a linker-inserted shim that lives within the immediate reach of the caller. The caller branches to the thunk using a standard relative jump, and the thunk then performs an absolute indirect jump to the final destination.

thunk image

LLVM includes support for inserting thunks for certain architectures such as AArch64 because it is a fixed-size (32-bit) instruction set, so the relative branch instruction is restricted to ±128MiB. As this limit is so low, lld has support for thunks out of the box.

If we cross-compile our “far function” example for AArch64 using the same linker script to synthetically place it far away to trigger the need for a thunk, the linker magic becomes visible immediately.

> aarch64-linux-gnu-gcc -c main.c -o main.o \
-fno-exceptions -fno-unwind-tables \
-fno-asynchronous-unwind-tables

> aarch64-linux-gnu-gcc -c far.c -o far.o \
-fno-exceptions -fno-unwind-tables \
-fno-asynchronous-unwind-tables

> ld.lld main.o far.o -T overflow.lds -o thunk-aarch64

We can now see the generated code with objdump.

> aarch64-unknown-linux-gnu-objdump -dr thunk-aarch64

Disassembly of section .text:

0000000000400000 <main>:
  400000:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
  400004:	910003fd 	mov	x29, sp
  400008:	94000004 	bl	400018 <__AArch64AbsLongThunk_far_function>
  40000c:	52800000 	mov	w0, #0x0                   	// #0
  400010:	a8c17bfd 	ldp	x29, x30, [sp], #16
  400014:	d65f03c0 	ret

0000000000400018 <__AArch64AbsLongThunk_far_function>:
  400018:	58000050 	ldr	x16, 400020 <__AArch64AbsLongThunk_far_function+0x8>
  40001c:	d61f0200 	br	x16
  400020:	20000000 	.word	0x20000000
  400024:	00000001 	.word	0x00000001

Disassembly of section .text.far:

0000000120000000 <far_function>:
   120000000:	d503201f 	nop
   120000004:	d65f03c0 	ret

Instead of branching to far_function at 0x120000000, it branches to a generated thunk at 0x400018 (only 16 bytes away). The thunk, similar to the large code-model, loads x16 with the absolute address stored in the .word and then performs an absolute jump (br).

What if x86_64 supported this? Can we now go beyond 2GiB? 🤯
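Hypothetically (lld does not do this for x86_64 today), an equivalent thunk could reuse the large code-model's movabs + indirect-jump pair while every callsite keeps its 5-byte relative CALL:

    call   far_function_thunk       # still the 5-byte e8 relative call

far_function_thunk:                 # hypothetical linker-inserted island, within ±2GiB of the caller
    movabs $0x120000000, %r11       # absolute address of far_function
    jmp    *%r11                    # absolute indirect jump to the real target

Only the callsites that actually overflow would pay the extra indirection; everything else keeps the small code-model encoding.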

There would be some more similar fixups needed beyond CALL instructions. Although we are mostly using static binaries, some libraries such as glibc may be dynamically loaded. Access to the functions from these shared libraries is through the GOT, Global Offset Table, which gives the address to the PLT (which is itself a thunk 🤯).

The GOT addresses are also loaded via a relative offset, so they would need to be changed to either use thunks or perhaps multiple GOT sections; which also has prior art for other architectures such as MIPS [ref].

With this information, the large code-model feels unnecessary. Why pay the cost for every callsite when we can do so piecemeal as necessary, with the opportunity to use profiles to guide us on which methods to migrate to thunks?

Furthermore, if our binaries are already tens of gigabytes, clearly size for us is not an issue. We can duplicate GOT entries, at the cost of even larger binaries, to reduce the need for even more thunks for the PLT jmp.

What do you think? Let’s collaborate.


Huge binaries

A problem I experienced when pursuing my PhD and submitting academic articles was that I had built solutions to problems that required dramatic scale to be effective and worthwhile. Responses to my publication submissions often claimed such problems did not exist; however, I had observed them during my time within industry, such as at Google, but I couldn’t cite it!

One problem that is only present at these mega-codebases is massive binaries. What’s the largest binary (ELF file) you’ve ever seen? I had observed binaries beyond 25GiB, including debug symbols. How is this possible? These companies prefer to statically build their services to speed up startup and simplify deployment. Statically including all code in some of the world’s largest codebases is a recipe for massive binaries.

Similar to the sound barrier, there is a point at which code size becomes problematic and we must re-think how we link and build code. For x86_64, that is the 2GiB “Relocation Barrier.”

Why 2GiB? 🤔

Well, let's take a look at how position independent code is put together.

Let’s look at a simple example.

extern void far_function();

int main() {
    far_function();
    return 0;
}

If we compile this with gcc -c simple-relocation.c -o simple-relocation.o, we can inspect it with objdump.

> objdump -dr simple-relocation.o

0000000000000000 <main>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	b8 00 00 00 00       	mov    $0x0,%eax
   9:	e8 00 00 00 00       	call   e <main+0xe>
			a: R_X86_64_PLT32	far_function-0x4
   e:	b8 00 00 00 00       	mov    $0x0,%eax
  13:	5d                   	pop    %rbp
  14:	c3                   	ret

There’s a lot going on here, but one important part is e8 00 00 00 00. e8 is the CALL opcode [ref] and it takes a 32bit signed relative offset, which happens to be 0 (four bytes of 0) right now. objdump also lets us know there is a “relocation” necessary to fixup this code when we finalize it. We can view this relocation with readelf as well.

Note If you are wondering why we need -0x4, it’s because the offset is relative to the instruction-pointer which has already moved to the next instruction. The 4 bytes is the operand it has skipped over.

> readelf -r simple-relocation.o -d

Relocation section '.rela.text' at offset 0x170 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000400000004 R_X86_64_PLT32    0000000000000000 far_function - 4

This is additional information embedded in the binary which tells the linker in subsequent stages that it has code that needs to be fixed. Here we see the address 00000000000a, and a is 9 + 1, which is the offset of the start of the operand for our CALL instruction.

Let’s now create the C file for our missing function.

void far_function() {
}

We will now compile it and link the two object files together using our linker.

> gcc simple-relocation.o far-function.o -o simple-relocation

Let’s now inspect that same callsite and see what it has.

> objdump -dr simple-relocation

0000000000401106 <main>:
  401106:	55                   	push   %rbp
  401107:	48 89 e5             	mov    %rsp,%rbp
  40110a:	b8 00 00 00 00       	mov    $0x0,%eax
  40110f:	e8 07 00 00 00       	call   40111b <far_function>
  401114:	b8 00 00 00 00       	mov    $0x0,%eax
  401119:	5d                   	pop    %rbp
  40111a:	c3                   	ret

000000000040111b <far_function>:
  40111b:	55                   	push   %rbp
  40111c:	48 89 e5             	mov    %rsp,%rbp
  40111f:	90                   	nop
  401120:	5d                   	pop    %rbp
  401121:	c3                   	ret

We can see that the linker did the right thing with the relocation and calculated the relative offset of our symbol far_function and fixed the CALL instruction.

Okay cool…🤷 What does this have to do with huge binaries?

Notice that this call instruction, e8, only takes a 32-bit signed offset, which means it's limited to ±2^31 bytes. This means a callsite can only jump roughly 2GiB forward or 2GiB backward. The "2GiB Barrier" represents the total reach of a single relative jump.

What happens if our callsite is over 2GiB away?

Let's build a synthetic example by asking our linker to place far_function really, really far away. We can do this using a "linker script", which is a mechanism by which we can instruct the linker how we would like our code sections laid out when the program starts.

SECTIONS
{
    /* 1. Start with standard low-address sections */
    . = 0x400000;
    
    /* Catch everything except our specific 'far' object */
    .text : { 
        simple-relocation.o(.text.*) 
    }
    .rodata : { *(.rodata .rodata.*) }
    .data   : { *(.data .data.*) }
    .bss    : { *(.bss .bss.*) }

    /* 2. Move the cursor for the 'far' island */
    . = 0x120000000; 
    
    .text.far : { 
        far-function.o(.text*) 
    }
}

If we now try to link our code we will see a “relocation overflow”.

TIP I used lld from LLVM because the error messages are a bit prettier.

> gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow -fuse-ld=lld

ld.lld: error: <internal>:(.eh_frame+0x6c):
relocation R_X86_64_PC32 out of range:
5364513724 is not in [-2147483648, 2147483647]; references section '.text'
ld.lld: error: simple-relocation.o:(function main: .text+0xa):
relocation R_X86_64_PLT32 out of range:
5364514572 is not in [-2147483648, 2147483647]; references 'far_function'
>>> referenced by simple-relocation.c
>>> defined in far-function.o

When we hit this problem, what solutions do we have? Well, this is a complete other subject on "code models", and it's a little more nuanced depending on whether we are accessing data (i.e. static variables) or code that is far away. A great blog post that goes into this is the following by @maskray, who maintains lld.

The simplest solution however is to use -mcmodel=large which changes all the relative CALL instructions to absolute 64bit ones; kind of like a JMP.

> gcc -c simple-relocation.c -o simple-relocation.o -mcmodel=large -fno-asynchronous-unwind-tables

> gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow

> ./simple-relocation-overflow

Note I needed to add -fno-asynchronous-unwind-tables to disable some additional data that might cause overflow for the purpose of this demonstration.

What does the disassembly look like now?

> objdump -dr simple-relocation-overflow 

0000000120000000 <far_function>:
   120000000:	55                   	push   %rbp
   120000001:	48 89 e5             	mov    %rsp,%rbp
   120000004:	90                   	nop
   120000005:	5d                   	pop    %rbp
   120000006:	c3                   	ret

00000000004000e6 <main>:
  4000e6:	55                   	push   %rbp
  4000e7:	48 89 e5             	mov    %rsp,%rbp
  4000ea:	b8 00 00 00 00       	mov    $0x0,%eax
  4000ef:	48 ba 00 00 00 20 01 	movabs $0x120000000,%rdx
  4000f6:	00 00 00 
  4000f9:	ff d2                	call   *%rdx
  4000fb:	b8 00 00 00 00       	mov    $0x0,%eax
  400100:	5d                   	pop    %rbp
  400101:	c3                   	ret

There is no longer a sole CALL instruction; it has become MOVABS & CALL 😲. This changed the call sequence from 5 bytes (1-byte opcode + 4 bytes for the 32-bit relative offset) to a whopping 12 bytes (2 bytes for the MOVABS opcode + 8 bytes for the absolute 64-bit address + 2 bytes for the indirect CALL).

This has notable downsides among others:

  • Instruction Bloat: We’ve gone from 5 bytes per call to 12. In a binary with millions of callsites, this can add up.
  • Register Pressure: We’ve burned a general-purpose register, %rdx, to perform the jump.

Caution I had a lot of trouble building a benchmark that demonstrated a lower IPC (instructions per cycle) for the large mcmodel, so let's just take my word for it. 🤷

Changing to a larger code-model is possible but it comes with these downsides. Ideally, we would like to keep our small code-model wherever we can. What other strategies can we pursue?

More to come in subsequent writings.


Failing interviews

My blog has been a little quiet. I recently accepted a new role at Meta and it’s been keeping me busy!

Once the onboarding phase is done I hope to get back to my Nix contributions.

Accepting the position at Meta has had me reflecting on my journey to this current role. People often share their highlights of accepting a new role but rarely their lowlights.

I wanted to share a brief look at what interviewing might be like in the software industry. People are often discouraged by failure but it’s part of the process.

I remember having done interview training at Google where they discussed how most interviewers decide on the outcome of the interview within the first five minutes. That story is not meant to totally discourage you from the process but rather to demonstrate that there is a portion that is out of your control.

Going through my emails to get an accurate accounting is challenging; however, I found threads as early as 2011 interviewing for Facebook. I am actually sure I had interviewed earlier through my co-ops at the University of Waterloo, but I no longer have access to those emails. 😩

Some rough dates I had found: 2011, 2014, 2015, 2018, 2019, 2020, 2021, 2022, 2023*, 2024, 2025.

* This interview round was long and was for 3 distinct roles.

Across those years, the level I interviewed at was different and sometimes the role too (IC vs EM).

Don’t be discouraged from failure.


Merry Christmas!

Comic santa on the sleigh pulled by reindeer

Wishing you a Merry Christmas, a lovely celebration, and a good start into the new year,
Leah Neukirchen

Merry Christmas and a Happy New Year!

NP: Pearl Jam—Quick Escape


Advent of Swift

This year, I decided to use Advent of Code to learn the language Swift. Since there were only 12 days of tasks for 2025, here is my summary of experiences. Also check out my solutions.

Tooling

I used Swift 6.2 on Void Linux, which I compiled from scratch since there were no prebuilt binaries that worked with a Python 3.13 system (needed for lldb). It's possible to bootstrap Swift from just a clang++ toolchain, so this wasn't too tedious, but it still required looking at Gentoo ebuilds to see how to pass the configuration properly. As an end user, this should not worry you too much.

Tooling in general is pretty nice: there’s an interpreter and you can run simple “scripts” directly using swift foo.swift. Startup time is short, so this is great for quick experiments. There’s also a REPL, but I didn’t try it yet. One flaw of the interpreter (but possibly related to my setup) is that there were no useful backtraces when something crashed. In this case, I compiled a binary and used the included lldb, which has good support for Swift.

There’s also a swift-format tool included to format source code. It uses 2 spaces by default, but most code in the wild uses 4 spaces curiously. I’m not sure when that changed.

Since I only write simple programs using a single source file, I didn’t bother looking at swift-build yet.

By default, programs are linked dynamically against the standard library and are thus super compact. Unfortunately, many modern languages today don’t support this properly. (Statically linking the standard library costs roughly 10MB, which is fair too.)

The language

In general, the language feels modern, comfy, and is easy to pick up. However, I found some traps as well.

The syntax is inspired by the C family and less symbol-heavy than Rust’s. There’s a block syntax akin to Ruby for passing closures.

Error handling can be done using checked exceptions, but there are also Optional types and Result types like in Rust, and syntactic shortcuts to make them convenient.

The standard library has many practical functions, e.g. there’s a function Character.wholeNumberValue that works for any Unicode digit symbol. There’s a Sequence abstraction over arrays etc. which has many useful functions (e.g. split(whereSeparator:), which many other standard libraries lack). The standard library is documented well.
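For example (illustrative snippets, not from my solutions):

let digits = "42".compactMap { $0.wholeNumberValue }            // [4, 2]
let fields = "3  12\t7".split(whereSeparator: \.isWhitespace)   // ["3", "12", "7"]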

The string processing is powerful, but inconvenient when you want to do things like indexing by offsets or ranges, due to Unicode semantics. (This is probably a good thing in general.) I switched to using arrays of code-points for problems that required this.

On Day 2, I tried using regular expressions, but I found serious performance issues: first I used a Regexp literal (#/.../#) in a loop, which actually resulted in creating a new Regexp instance on each iteration; second, Regexp matching itself is quite slow. Before I extracted the Regexp into a constant, the program was 100x as slow as Ruby(!), and after it still was 3x as slow. I then rewrote the solution to not use Regexps.
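In other words, the fix was to hoist the literal out of the loop (roughly; illustrative code):

let lines = ["3x4", "10x20"]
let pattern = #/(\d+)x(\d+)/#                 // built once, not on every iteration
for line in lines {
    if let m = line.firstMatch(of: pattern) {
        print(m.1, m.2)                       // captured substrings
    }
}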

Prefix (and suffix) operators need to “stick” to their expression, so you can’t write if ! condition. This is certainly a choice: you can define custom prefix and suffix operators and parsing them non-ambiguously is easier, but it’s probably not a thing I would have done.

Swift functions often use parameter names (probably for compatibility with Objective-C). They certainly help readability of the code, but I think I prefer OCaml’s labeled arguments, which can be reordered and permit currying.

The language uses value semantics for collections and then optimizes them using copy-on-write and/or by detecting inout parameters (which are updated in-place). This is quite convenient when writing code (e.g. day 4). Garbage collection is done using reference counting. However, some AoC tasks turned out to make heavy use of the garbage collector, where I'd have expected the compiler to use a callstack or something for intermediate values. Substrings are optimized by a custom type Substring; if you want to write a function that operates on either strings or substrings, you need to spell this out:

func parse<T>(_ str: T) -> ... where T: StringProtocol

There’s a library swift-algorithms adding even more sequence and collection algorithms, which I decided not to use.

Downsides

The compiler is reasonably fast for an LLVM-based compiler. However, when you manage to create a type checking error, error reporting is extremely slow, probably because it tries to find any variant that could possibly work still. Often, type checking errors are also confusing.

(Error messages unrelated to type checking are good and often really helpful, e.g. if you accidentally use ''-quotes for strings or try to use [] as an empty map, it tells you how to do it right.)

Ranges can be inclusive ... or right-exclusive ..<. Constructing a range where the upper boundary is smaller than the lower boundary results in a fatal error, whereas in other languages it’s just an empty range.
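For instance (illustrative), the second line traps at runtime instead of yielding an empty range:

let ok = 3..<5      // [3, 4]
let bad = 5..<3     // fatal error at runtime (upperBound < lowerBound)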

Some “obvious” things seem to be missing, e.g. tuples of Hashable values are not Hashable currently (this feature was removed in 2020, after trying to implement the proposal that introduced it, and no one bothered to fix it yet?), which is pretty inconvenient.

Likewise, the language has pattern matching for algebraic data types and tuples, but unfortunately not for arrays/sequences, which is inconvenient at times.

Since I was just picking up Swift, I had to search stuff online a lot and read Stack Overflow. I noticed I found many answers for prior versions of Swift that changed in the meantime (even for basic tasks). For a language that's been around for over 10 years, this seems like quite some churn. I hope the language manages to stabilize better and doesn't just get new features bolted on continuously.

In general, using Swift was fun and straightforward for these programming tasks. For writing serious applications on non-macOS systems, there's also the question of library availability. Some parts of the language still feel unfinished or unpolished, in spite of being around for quite some time.

NP: Adrianne Lenker—Promise is a Pendulum


llm weights vs the papercuts of corporate


In woodworking, there's a saying that you should work with the grain, not against the grain and I've been thinking about how this concept may apply to large language models.

These large language models are built by training on existing data. This data forms the backbone which creates output based upon the preferences of the underlying model weights.

We are now one year into a new category of companies being founded where the majority of the software behind the company was code-generated.

From here on out I'm going to refer to these companies as model weight first. This category of companies can be defined as any company that is building with the data ("grain") that has been baked into the large language models.

Model weight first companies do not require as much context engineering. They're not stuffing the context window with rules in an attempt to override and change the base models to fit a pre-existing corporate standard and conceptualisation of how software should be.

The large language model has decided on what to call a method name or class name because that method or class name is what the large language model prefers; thus, when code is adapted, modified, and re-read into the context window, it is consuming its preferred choice of tokens.

Model-weight-first companies do not have the dogma of snake_case vs PascalCase vs kebab-case policies that many corporate companies have. Such policies were created for humans to create consistency so humans can comprehend the codebase. Something that is of a lesser concern now that AI is here.

Now, variable naming is a contrived example, but if a study were done in the years to come comparing the velocity/productivity/success rates with AI of a model weight first company vs. a corporate company, I suspect the model weight first company would have vastly better outcomes because they're not trying to do context engineering to force the LLM to follow some pre-existing dogma. There is one universal truth with LLMs as they are now: the less context that you use, the better the outcomes you get.

The less you allocate (i.e., cursor rules or whatever else), the more context window you'll have available for actually implementing the requirements of the software that needs to be built.

So if we take this thought experiment about the models having preferences for tokens and expand it out to another use case, let's say that you needed to build a Docker container at a model weight first company.

You could just ask an LLM to build a Docker container, and it knows how to build a Docker container for, say, Postgres, and it just works. But in the corporate setting, if you ask it to build a Docker container, and at that corporate you have to configure HTTPS, a squid proxy, or some sort of artifactory and outbound internet access is restricted, that same simple thing becomes very comical.

You'll see an agent fill up with lots of failed tool calls unless you do context engineering to say "no, if you want to build a docker container, you got to follow these particular allocations of company conventions” in a crude attempt to override the preferences of the inbuilt model weights.

At a model weight first company, building a docker image is easy but at a corporate the agent will have one hell of a time and end up with a suboptimal/disappointing outcome.

So, perhaps this is going to be a factor that needs to be considered when talking about and comparing the success rates of AI at one company versus another company, or across industries.

If a company is having problems with AI and getting outcomes from AI, are they a model weight first company or are they trying to bend AI to their whims?

Perhaps the corporates who succeed the most with the adoption of AI will be those who shed their dogma that no longer applies and start leaning into transforming to become model-weight-first companies.

ps. socials.


Nix derivation madness

I've written a bit about Nix and I still face moments where foundational aspects of the package system confound and surprise me.

Recently I hit an issue that stumped me, as it broke some basic comprehension I had of how Nix works. I wanted to produce the build and runtime graph for the Ruby interpreter.

> nix-shell -p ruby

> which ruby
/nix/store/mp4rpz283gw3abvxyb4lbh4vp9pmayp2-ruby-3.3.9/bin/ruby

> nix-store --query --include-outputs --graph \
  $(nix-store --query --deriver $(which ruby))
error: path '/nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv' is not valid

> ls /nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv
ls: cannot access '/nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv':
No such file or directory

Huh. 🤔

I have Ruby but I don’t seem to have the derivation, 24v9wpp393ib1gllip7ic13aycbi704g, file present on my machine.

No worries, I think I can --realize it and download it from the NixOS cache.

> nix-store --realize /nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv
don't know how to build these paths:
  /nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv
error: cannot build missing derivation '/nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv'

I guess the NixOS cache doesn’t seem to have it. 🤷

This was genuinely perplexing me at this point. In fact, there are multiple Discourse posts about it.

My mental model of Nix, however, is that I must have first evaluated the derivation (drv) in order to determine the output path to even substitute. How could the NixOS cache not have it present?

Is this derivation wrong somehow? Nope. According to the sqlite database, this is the derivation Nix believes produced this Ruby binary. 🤨

> sqlite3 "/nix/var/nix/db/db.sqlite" 
    "select deriver from ValidPaths where path = 
    '/nix/store/mp4rpz283gw3abvxyb4lbh4vp9pmayp2-ruby-3.3.9'"
/nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv

What does the binary cache itself say? Even the cache itself thinks this particular derivation, 24v9wpp393ib1gllip7ic13aycbi704g, produced this particular Ruby output.

> curl -s https://cache.nixos.org/mp4rpz283gw3abvxyb4lbh4vp9pmayp2.narinfo |\
  grep Deriver
Deriver: 24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv

What if I try a different command?

> nix derivation show $(which ruby) | jq -r "keys[0]"
/nix/store/kmx8kkggm5i2r17s6l67v022jz9gc4c5-ruby-3.3.9.drv

> ls /nix/store/kmx8kkggm5i2r17s6l67v022jz9gc4c5-ruby-3.3.9.drv
/nix/store/kmx8kkggm5i2r17s6l67v022jz9gc4c5-ruby-3.3.9.drv

So I seem to have a completely different derivation, kmx8kkggm5i2r17s6l67v022jz9gc4c5, that resulted in the same output which is not what the binary cache announces. WTF? 🫠

Thinking back to a previous post, I remember touching on modulo fixed-output derivations. Is that what’s going on? Let’s investigate from first principles. 🤓

Let’s first create fod.nix which is our fixed-output derivation.

let
  system = builtins.currentSystem;
in derivation {
  name = "hello-world-fixed";
  builder = "/bin/sh";
  system = system;
  args = [ "-c" ''
    echo -n "hello world" > "$out"
  '' ];
  outputHashMode = "flat";
  outputHashAlgo = "sha256";
  outputHash = "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9";
}

☝️ Since this is a fixed-output derivation (FOD), the produced /nix/store path will not be affected by changes to the derivation beyond the contents of $out.

> nix-instantiate fod.nix
/nix/store/k2wjpwq43685j6vlvaarrfml4gl4196n-hello-world-fixed.drv

> nix-build fod.nix
/nix/store/ajk19jb8h5h3lmz20yz6wj9vif18lhp1-hello-world-fixed

Now we will create a derivation that uses this FOD.

{ fodDrv ? import ./fod.nix }:

let
  system = builtins.currentSystem;
in
builtins.derivation {
  name = "uses-fod";
  inherit system;
  builder = "/bin/sh";
  args = [ "-c" ''
    echo ${fodDrv} > $out
    echo "Good bye world" >> $out
  '' ];
}

The /nix/store output path for this derivation will change whenever the derivation changes, except when the change is only to the derivation path of the FOD it depends on. This is in fact what makes it "modulo" the fixed-output derivations.

> nix-instantiate uses-fod.nix
/nix/store/85d15y7irq7x4fxv4nc7k1cw2rlfp3ag-uses-fod.drv

> nix-build uses-fod.nix
/nix/store/sd12qjak7rlxhdprj10187f9an787lk3-uses-fod

Let’s test this all out by changing our fod.nix derivation. Let’s do this by just adding some garbage attribute to the derivation.

@@ -4,6 +4,7 @@
   name = "hello-world-fixed";
   builder = "/bin/sh";
   system = system;
+  garbage = 123;
   args = [ "-c" ''
     echo -n "hello world" > "$out"
   '' ];

What happens now?

> nix-instantiate fod.nix
/nix/store/yimff0d4zr4krwx6cvdiqlin0y6vkis0-hello-world-fixed.drv

> nix-build fod.nix
/nix/store/ajk19jb8h5h3lmz20yz6wj9vif18lhp1-hello-world-fixed

The path of the derivation itself, .drv, has changed but the output path ajk19jb8h5h3lmz20yz6wj9vif18lhp1 remains consistent.

What about the derivation that leverages it?

> nix-instantiate uses-fod.nix
/nix/store/85wkdaaq6q08f71xn420v4irll4a8g8v-uses-fod.drv

> nix-build uses-fod.nix
/nix/store/sd12qjak7rlxhdprj10187f9an787lk3-uses-fod

It also got a new derivation path but the output path remained unchanged. 😮

That means changes to fixed-output-derivations didn’t cause new outputs in either derivation but it did create a complete new tree of .drv files. 🤯

That means in nixpkgs changes to fixed-output derivations can cause them to have new store paths for their .drv but result in dependent derivations to have the same output path. If the output path had already been stored in the NixOS cache, then we lose the link between the new .drv and this output path. 💥

Derivation graphic

The amount of churn that we are creating in derivations was unbeknownst to me.

It can get even weirder! This example came from @ericson2314.

We will duplicate the fod.nix to another file fod2.nix whose only difference is the value of the garbage.

@@ -4,7 +4,7 @@
   name = "hello-world-fixed";
   builder = "/bin/sh";
   system = system;
-  garbage = 123;
+  garbage = 124;
   args = [ "-c" ''
     echo -n "hello world" > "$out"
   '' ];

Let’s now use both of these in our derivation.

{ fodDrv ? import ./fod.nix,
  fod2Drv ? import ./fod2.nix
}:
let
  system = builtins.currentSystem;
in
builtins.derivation {
  name = "uses-fod";
  inherit system;
  builder = "/bin/sh";
  args = [ "-c" ''
    echo ${fodDrv} > $out
    echo ${fod2Drv} >> $out
    echo "Good bye world" >> $out
  '' ];
}

We can now instantiate and build this as normal.

> nix-instantiate uses-fod.nix
/nix/store/z6nr2k2hy982fiynyjkvq8dliwbxklwf-uses-fod.drv

> nix-build uses-fod.nix
/nix/store/211nlyx2ga7mh5fdk76aggb04y1wsgkj-uses-fod

What is weird about that?

Well, let’s take the JSON representation of the derivation and remove one of the inputs.

> nix derivation show \
    /nix/store/z6nr2k2hy982fiynyjkvq8dliwbxklwf-uses-fod.drv \
    | jq 'values[].inputDrvs | keys[]'
"/nix/store/6p93r6x0bwyd8gngf5n4r432n6l380ry-hello-world-fixed.drv"
"/nix/store/yimff0d4zr4krwx6cvdiqlin0y6vkis0-hello-world-fixed.drv"

We can do this because although there are two input derivations, we know they both produce the same output!

@@ -12,12 +12,6 @@
       "system": "x86_64-linux"
     },
     "inputDrvs": {
-      "/nix/store/6p93r6x0bwyd8gngf5n4r432n6l380ry-hello-world-fixed.drv": {
-        "dynamicOutputs": {},
-        "outputs": [
-          "out"
-        ]
-      },
       "/nix/store/yimff0d4zr4krwx6cvdiqlin0y6vkis0-hello-world-fixed.drv": {
         "dynamicOutputs": {},
         "outputs": [

Let’s load this modified derivation back into our /nix/store and build it again!

> nix derivation add < derivation.json
/nix/store/s4qrdkq3a85gxmlpiay334vd1ndg8hm1-uses-fod.drv

> nix-build /nix/store/s4qrdkq3a85gxmlpiay334vd1ndg8hm1-uses-fod.drv
/nix/store/211nlyx2ga7mh5fdk76aggb04y1wsgkj-uses-fod

We got the same output 211nlyx2ga7mh5fdk76aggb04y1wsgkj. Not only do we have a 1:N trait for our output paths to derivations but we can also take certain derivations and completely change them by removing inputs and still get the same output! 😹

The road to Nix enlightenment is no joke and full of dragons.


Fuzzing for fun and profit

I recently watched a keynote by Will Wilson on fuzzing – Fuzzing'25 Keynote. The talk is excellent, and one main highlight is the fact that we have at our disposal the capability to "fuzz" our software today, and yet we do not.

While I've seen the power of QuickCheck-like tools for property-based testing, I had never used fuzzing over an application as a whole, specifically American Fuzzy Lop. I was intrigued to add this skill to my toolbelt and maybe apply it to CppNix.

As with everything else, I need to learn things from first principles. I would like to create a scenario with a known failure and see how AFL discovers it.

To get started let’s first make sure we have access to AFL via Nix.

We will be using AFL++, the daughter of AFL that incorporates newer updates and features.

> nix-shell -p aflplusplus

How does AFL work? 🤔

AFL will feed your program various inputs to try and cause a crash! 💥

In order to generate better inputs, you compile your code with a variant of gcc or clang distributed by AFL which will insert special instructions to keep track of coverage of branches as it creates various test cases.

Let’s create a demo program that crashes when given the input Farid.

We leverage a volatile int so that the compiler does not optimize the multiple if instructions together.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>

#define INPUT_SIZE 10

void crash() {
  raise(SIGSEGV);
}

int main(int argc, char *argv[]) {
  char buffer[INPUT_SIZE] = {0};

  if (fgets(buffer, INPUT_SIZE, stdin) == NULL) {
    fprintf(stderr, "Error reading input.\n");
    return 1;
  }

  // So the if statements are not optimized together
  volatile int progress_tracker = 0;

  if (strlen(buffer) < 5) {
    return 0;
  }

  if (buffer[0] == 'F') {
    progress_tracker ++;
    if (buffer[1] == 'a') {
      progress_tracker ++;
      if (buffer[2] == 'r') {
        progress_tracker ++;
        if (buffer[3] == 'i') {
          progress_tracker ++;
          if (buffer[4] == 'd') {
            crash();
          }
        }
      }
    }
  }
  return 0;
}

We now can compile our code with afl-cc to get the instrumented binary.

> afl-cc demo.c -o demo

AFL needs to be given some sample inputs. Let's feed it the simplest starter seed possible – an empty file!

> mkdir -p seed_dir
> echo "" > seed_dir/empty_input.txt

Now we simply run afl-fuzz, and the magic happens. ✨

> afl-fuzz -i seed_dir -o out_dir -- ./demo

A really nice TUI appears that informs you of various statistics of the running fuzzer and, importantly, whether any crashes have been found (saved crashes : 1)!

          american fuzzy lop ++4.32c {default} (./demo) [explore]          
┌─ process timing ────────────────────────────────────┬─ overall results ────┐
│        run time : 0 days, 0 hrs, 33 min, 4 sec      │  cycles done : 3191  │
│   last new find : 0 days, 0 hrs, 33 min, 2 sec      │ corpus count : 6     │
│last saved crash : 0 days, 0 hrs, 33 min, 1 sec      │saved crashes : 1     │
│ last saved hang : none seen yet                     │  saved hangs : 0     │
├─ cycle progress ─────────────────────┬─ map coverage┴──────────────────────┤
│  now processing : 4.7238 (66.7%)     │    map density : 16.67% / 44.44%    │
│  runs timed out : 0 (0.00%)          │ count coverage : 45.00 bits/tuple   │
├─ stage progress ─────────────────────┼─ findings in depth ─────────────────┤
│  now trying : havoc                  │ favored items : 5 (83.33%)          │
│ stage execs : 496/800 (62.00%)       │  new edges on : 6 (100.00%)         │
│ total execs : 13.5M                  │ total crashes : 1014 (1 saved)      │
│  exec speed : 6566/sec               │  total tmouts : 0 (0 saved)         │
├─ fuzzing strategy yields ────────────┴─────────────┬─ item geometry ───────┤
│   bit flips : 0/0, 0/0, 0/0                        │    levels : 5         │
│  byte flips : 0/0, 0/0, 0/0                        │   pending : 0         │
│ arithmetics : 0/0, 0/0, 0/0                        │  pend fav : 0         │
│  known ints : 0/0, 0/0, 0/0                        │ own finds : 5         │
│  dictionary : 0/0, 0/0, 0/0, 0/0                   │  imported : 0         │
│havoc/splice : 6/13.5M, 0/0                         │ stability : 100.00%   │
│py/custom/rq : unused, unused, unused, unused       ├───────────────────────┘
│    trim/eff : 64.13%/20, n/a                       │          [cpu000: 18%]
└─ strategy: exploit ────────── state: running...  ──┘

The output directory contains all the saved information including the input that caused the crashes.

Let’s inspect it!

> cat "out_dir/default/crashes/id:000000,sig:11,src:000005,time:2119,execs:14486,op:havoc,rep:1" 
Farid

Huzzah! 🥳

AFL was successfully able to find our code-word, Farid, that caused the crash.

It is important to note that for my simple program it found the failure case rather quickly; for large programs, however, it can take a long time to explore the complete state space. Companies such as Google continuously run fuzzers such as AFL on well-known open source projects to help detect failures.


Some interesting stuff I found on IX LANs


These days the internet as a whole is mostly constructed out of point to point ethernet circuits, meaning an ethernet interface (mostly optical) attached


Migrating from ZFS mirror to RAIDZ2

For a long time I’ve been running my storage on a 2-disk ZFS mirror. It’s been stable, safe, and easy to manage. However, at some point, 2 disks just aren’t enough, and I wanted to upgrade to RAIDZ2 so that I could survive up to two simultaneous disk failures.

I could have added another mirror, which would have been simple, and this setup would allow two drives to fail, but not any two drives. I wanted the extra safety of being able to lose any two drives.