Planet TVL

Software development now costs less than the wage of a minimum wage worker


Hey folks, for the last year I've been pondering this and doing game theory around the discovery of Ralph, how good the models are getting, and how that's going to intersect with society. What follows is a cold, stark write-up of how I think it's going to go down.

https://www.theregister.com/2026/01/27/ralph_wiggum_claude_loops/

The financial impacts are already unfolding. Back when Ralph started to go really viral, a private equity firm that was previously long on Atlassian deliberately went short because of Ralph. In the last couple of days, they released their new investor report, and they made absolute bank.

Dec 2025 - https://www.minotaurcapital.com/reports/quarterly/2025-12

I discovered Ralph almost a year ago today, and when I made that discovery, I sat on it for a while and focused on education: teaching juniors to pay attention, writing prolifically, and doing keynotes internationally, pleading with people to pay attention and to invest in themselves.

Dear Student: Yes, AI is here, you’re screwed unless you take action...
Two weeks ago a student anonymously emailed me asking for advice. This is the reply and if I was in your shoes this is what I’d do. So, I read your blog post “An oh f*** moment in time” alongside “The future belongs to idea guys that can just do

It's now one year later, and the cost of software development is $10.42 an hour, which is less than minimum wage; a burger flipper at Macca's gets paid more than that. What does it mean to be a software developer when everyone in the world can develop software? Just two nights ago, I was at a Cursor meetup, and nearly everyone in the room was not a software developer, yet they were showing off their latest and greatest creations.


Well, they just became software developers because Cursor enabled them to become one. You see, the knowledge and skill of being a software developer has been commoditised. If everyone can be a software developer, what does that mean if your identity function is that you're a software developer and you write software for a living?

My theory of how it all goes down, and gets feral really, really fast, is quite simple...


For the past month, I've been catching up with venture capitalists in Australia and San Francisco and rubber-ducking this concept. You see, a lot of them aren't even sure whether their business model as venture capitalists still exists.

Why does someone need to raise a large amount of capital if it's just a five-man show now?

So let's open up with a classic K shape.


Rewind to Christmas two years ago, when I originally posted "An oh fuck moment in time". It was clear to me then where this was going. The models were already good enough back then to cause societal disruption. They were pretty wild; like wild horses, they needed a great deal of skill to get outcomes from them...


If we fast-forward to the last Christmas holidays, many people had their "oh fuck" moment a year later, and the difference between now and then is twofold.

One: they actually picked up the guitar, played it, and took the Christmas period off because they had the space, capacity, and time to invest in themselves and make discoveries.

deliberate intentional practice
Something I’ve been wondering about for a really long time is, essentially, why do people say AI doesn’t work for them? What do they mean when they say that? From which identity are they coming from? Are they coming from the perspective of an engineer with a job title and

Two: the horses, or models, came with factory defaults of "broken in and ready to get shit done", which made them more accessible. They're easier to use to achieve outcomes, so people didn't need to invest as much time learning how to juice them to get disruptive outcomes.


The world is now divided into two types of companies. On one side: model-first companies that are lean apex predators, able to operate on razor-thin margins and crush incumbents.

llm weights vs the papercuts of corporate
In woodworking, there’s a saying that you should work with the grain, not against the grain and I’ve been thinking about how this concept may apply to large language models. These large language models are built by training on existing data. This data forms the backbone which creates output based

The other side of the equation is nearly every other company out there today, which needs to go through a people transformation program, figure out what to do with AI, and deal with the fact that the fundamentals of business have changed.


Jack is doing the right thing for his company by acting early. What will happen is that the time for a competitor to be at your door will be measured in months, not years. And as models get better, the timeframe only compresses.

The real question is for the folks who, unfortunately, were laid off today. They will need jobs, and they will now see the importance of upskilling with AI. So they'll move on to their next employer, or to other industries, upskill with AI, and then seek to implement what is needed: automating job functions via AI.

Then the cycle continues across all industries, all disciplines.

But it's not going to be triggered just by layoffs. It'll also be triggered by executives who don't get it. When you understand what is going on and how real AI is, it is maddening to be in a company surrounded by people who don't get it.

You see, there is a difference between employer suicide and employment suicide. The smart folks who don't want to commit employment suicide will leave.

The smarter ones in that segment will just go and found their own companies, then come back and do what they know. And they'll attack their employers vertically, operating leaner and meaner.


As the models get better, and the pace at this stage is slope-on-slope, and as model-first companies get better and better at automating their job function, they can be at the door of their previous employer in months, not years.


To make matters worse, as the models get better, time gets compressed, and the snake eating its tail speeds up.

idk how to visualize this, if you've got ideas let me know...


Which results in employers who did not take corrective actions, unlike Jack, having to lay off people in the long run because margins are being squeezed by new competitors operating leaner, meaner, and faster.

Then the cycle continues across all industries, all disciplines.


As I've been stressing in my writing for almost a year now, employers and employees trade time and skill for money. If a company is having problems adopting AI, then that is a company issue, not an employee issue.

Experience as a software engineer today doesn’t guarantee relevance tomorrow. The dynamics of employment are changing: employees trade time and skills for money, but employers’ expectations are evolving rapidly. Some companies are adapting faster than others.

Another thing I've been thinking: when someone says, “AI doesn’t work for me,” what do they mean? Are they referring to concerns related to AI in the workplace or personal experiments on greenfield projects that don't have these concerns?

This distinction matters.

Employees trade skills for employability, and failing to upskill in AI could jeopardise their future. I’m deeply concerned about this.

If a company struggles with AI adoption, that’s a solvable problem - it's now my literal job. But I worry more about employees.

In history, there are tales of employees departing companies that resisted cloud adoption to keep their skills competitive.

The same applies to AI. Companies that lag risk losing talent who prioritise skill relevance.

- June 2025 from https://ghuntley.com/six-month-recap/

Model-weight-first companies should be scaring the fuck out of every founder right now if they're not a utility service. For what is a moat now, in the era when you can /z80 something?

llm weights vs the papercuts of corporate
In woodworking, there’s a saying that you should work with the grain, not against the grain and I’ve been thinking about how this concept may apply to large language models. These large language models are built by training on existing data. This data forms the backbone which creates output based
Can a LLM convert C, to ASM to specs and then to a working Z/80 Speccy tape? Yes.
✨Daniel Joyce used the techniques described in this post to port ls to rust via an objdump. You can see the code here: https://github.com/DanielJoyce/ls-rs. Keen, to see more examples - get in contact if you ship something! Damien Guard nerd sniped me and other folks wanted

On the topic of moats, I've been thinking about this for almost a year now, and I think I've now got a clearer sense of what moats are in the AI era, but first, let's talk about what moats aren't...

  • Any business model based on per-seat pricing. As AI starts to rip harder and harder, it's going to become much harder to maintain headcount within a corporation, because model-first companies will be entering the market and operating much leaner on utility-based pricing. It's a margin game now.
  • Any product features or platforms that were designed for humans. I know that's going to sound really wild, but these days I go window-shopping on SaaS companies' websites for product features, rip a screenshot into Claude Code, and it rebuilds that product feature/platform. As we enter the era of hyper-personalised software, I think this will be the case more and more. In my latest creation, I have cloned Posthog, Jira, Pipedrive, and Calendly, and the list just keeps on growing, because I want to build a hyper-personalised business that meets all my needs, with full control and everything first-party. I think we're going to see more and more model-first companies operating with this mindset.
  • Any business thinking that the high cost of switching or migrating from one technology to another was a form of lock-in. That is provably falsified now. It is so easy to rip a fart into Claude Code and migrate from one technology to another. Just last week, I migrated from Cloudflare D1 to a PlanetScale Postgres database automatically using a Ralph loop, and it just worked. A full-on data migration. When have you ever heard of a database migration going successfully unattended? We're here now, folks.

If you currently work at a company that matches any of the three bullet points above, then understand that things are going to get really tight at your employer. I don't know when, but with certainty it will happen. Your best choices are either to find a new employer if the people around you don't get it, or, if there is a need and desire for automation, to lean so hard into AI, automate everything, and become the champion of AI within your company. If your company has banned AI outright, you need to depart right now and find another employer.

So with that out of the way, what is a moat?

  • Distribution. Any form of distribution. Brand awareness. Steaks and handshakes.
  • Utility-based pricing, similar to cloud infrastructure on a cents per megabyte or CPU hour.
  • Operating as a model-first company and accelerating the transformation so you can operate under the principles below:
Principles — Latent Patterns
Principles for building products with large language models and the latent space — hard-won lessons from shipping AI-native software.
AI erases traditional developer identities—backend, frontend, Ruby, or Node.js. Anyone can now perform these roles, creating emotional challenges for specialists with decades of experience. - https://ghuntley.com/six-month-recap

This is going to be a really hard time for a lot of people because identity functions have been erased, and the hard thing is, it's not just software developers. It's people managers as well. If your identity function is managing people, you need to make adjustments. You need to get back onto the tools ASAP.

Were smaller but effectively cut 2/3rds by telling board I wouldn’t backfill in May 2023. Best decision as got rid of all the people who “are sick of hearing about ai”. 20ish people now do about 30x the output of what having more than 60 did 3 years ago.
- an anonymous founder in my DMs today.

This transformation is going to be brutal. Organisations need to be designed differently, transforming from the traditional deep org chart to a flattened one.

And one of the hardest things is that AI is being rammed into the world non-consensually, pushed by employers and Silicon Valley. Yeah, it sucks, but you've gotta pull your chin up, process those feelings, and deal with it. For others, though, it's gonna be really, really rough. There are going to be people who have spent years of their lives doing Game of Thrones social-political stuff to get to where they are within a company, and it will have been all for nothing.


Consider the traditional org chart: what is the value of the senior engineer, the team lead, the manager, and the senior manager in this brave new world? How much time is spent doing Dilbert activities? What if you could flatten the org chart? If you were a founder, why wouldn't you?


This is what I've been fearing for a year. I could be wrong, I don't know. Anyone who says that they know for sure is selling horseshit. One thing is absolutely certain: things will change, and there's no going back. The unit economics of business have forever changed.

Whether a company does layoffs really comes down to the quality of its leadership. If they're being lazy and don't have ambitious plans, they will need to lay off, because eventually the backlog will run dry, and everything will get automated.

This isn't me throwing shit at Jack. Like, literally, it's a cold, hard fact that you need fewer people to run a business now. So if you have too many people on your payroll, you need to make changes, but having said that, there will be ambitious founders and leaders who didn't overhire and understand that AI enables them to do anything, and they can do it today. They can make that five-year roadmap happen in a year and provide a backlog for all employees to work on while they utilise AI.

It's going to be really interesting to see how this pans out.

All I can ask you to do is tap someone else on the shoulder and stress to them to treat this topic seriously, upskill, and explain the risks going forward, and then ask them to do the same. You see, for a lot of people, they haven't noticed AI is knocking on their door because AI is burrowing under their house.




My experience of migrating from Google G-Suite to ProtonMail

For about 9 years, I’ve been a customer of Google G-Suite, using it for email, file storage, and photos. I’ve never fully trusted them; however, I have always claimed the following.

As a paying customer, I hope that they mine my data less than they do for free users.

There’s a lot of uncertainty in that sentence. Words like hope and less aren’t exactly reassuring, and there’s no proof it’s actually the case either. With a recent price increase warning at renewal time and the current state of politics between the EU and USA, I decided to switch to an EU provider.


Linker Pessimization

In a previous post, I wrote about linker relaxation: the linker's ability to replace a slower, larger instruction with a faster, smaller one when it has enough information at link time. For instance, an indirect call through the GOT can be relaxed into a direct call plus a nop. This is a well-known technique to optimize instructions for performance.

Does it ever make sense to go the other direction? 🤔

We've been working on linking some massive binaries that include Intel's Math Kernel Library (MKL), a prebuilt static archive. MKL ships as object files compiled with the small code model (mcmodel=small), meaning its instructions assume everything is reachable within ±2 GiB. The included object files also have some odd relocations where the addend is a very large negative number (>1 GiB).

The calculation for the relocation value is S + A - P: the symbol address plus the addend minus the instruction address. With a sufficiently large negative addend, the relocation value can easily exceed the 2 GiB limit, and the linker fails with relocation overflows.

We can't recompile MKL (it's a prebuilt proprietary archive), and we can't simply switch everything to the large code model. What can we do? 🤔

I am calling this technique linker pessimization: the reverse of relaxation. Instead of shrinking an instruction, we expand one to tolerate a larger address space. 😈

The Problematic LEA

The specific instructions that overflow in our case are LEA (Load Effective Address) instructions.

In x86_64, lea r9, [rip + disp32] performs pure arithmetic: it computes RIP + disp32 and stores the result in r9 without accessing memory. The disp32 is a 32-bit signed integer embedded directly into the instruction encoding, and the linker fills it in via an R_X86_64_PC32 relocation.

The relocation formula is S + A - P. Let's look at an example with a large addend.

Term          Meaning                                   Value (approximate)
S (Symbol)    Address of symbol                         ~200 MB into .rodata
A (Addend)    Constant baked into the object file       -0x44000000 (-1,062 MB)
P (Position)  Address of the instruction being patched  ~1,200 MB into .text

S + A - P  =  200 + (-1,062) - 1,200
           =  -2,062 MB

A 32-bit signed integer can only represent ±2,048 MB (±2 GiB). Our value of -2,062 MB exceeds that range and the linker rightfully complains 💥:

ld.lld: error: libfoo.a(...):(function ...: .text+0x...):
  relocation R_X86_64_PC32 out of range:
  -2160984064 is not in [-2147483648, 2147483647]

Note: These LEA instructions appear in MKL because the library uses them to compute the address of a data table relative to the instruction pointer. The large negative addend (-0x44000000) is intentional; it's an offset within a large lookup table.
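Using the approximate figures from the table above, the overflow is easy to check numerically. A quick sketch (the addresses are the illustrative example values, not real MKL layout):

```python
# R_X86_64_PC32 stores its value in a signed 32-bit field.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def pc32(S: int, A: int, P: int) -> int:
    """Relocation value for R_X86_64_PC32: symbol + addend - place."""
    return S + A - P

# Approximate example values from the table (in bytes).
S = 200 * 2**20          # symbol ~200 MiB into .rodata
A = -0x44000000          # large negative addend baked into the object file
P = 1200 * 2**20         # instruction ~1,200 MiB into .text

value = pc32(S, A, P)
fits = INT32_MIN <= value <= INT32_MAX
print(value, fits)       # value lands well below INT32_MIN, so fits is False
```

This is exactly the check the linker performs before refusing to patch the disp32 field.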

The Idea: Replace LEA with MOV

The core idea is delightful: as engineers we are trained to optimize systems, but in this case we want the opposite. We swap the LEA for a MOV that reads through a nearby pointer.

Recall from the relaxation post: relaxation shrinks instructions (e.g. indirect call -> direct call). Here we do the opposite: we make the instruction do more work (pure arithmetic -> memory load) in exchange for a reachable displacement. That's why I consider it a pessimization, or reverse-relaxation.

Both instructions use the same encoding length (7 bytes with a REX prefix), so the patch is a single-byte change in the opcode. 🤓

LEA:  4C 8D 0D xx xx xx xx    lea r9, [rip + disp32]   (opcode 0x8D)
MOV:  4C 8B 0D xx xx xx xx    mov r9, [rip + disp32]   (opcode 0x8B)
         ^^
 only this byte changes!

The difference in behavior is critical:

  • LEA: r9 = RIP + disp32 (arithmetic, no memory access). disp32 must encode the entire distance to the far-away data. This overflows.
  • MOV: r9 = *(RIP + disp32) (memory load). disp32 points to a nearby 8-byte pointer slot. The pointer slot holds the full 64-bit address. This never overflows.

Visualizing the Change

Original - the LEA must reach across the entire binary:

                   disp32 must encode this entire distance
                 ╭──────────────────────────────────────────╮
                 │           ~2+ GiB  (OVERFLOW!)           │
                 │                                          │
  .text          ▼                                          │
  ┌──────────────────────────┐                              │
  │ lea r9, [rip + disp32]   │─────────── X ────────────────┤
  │        (0x8D)            │  can't fit in 32 bits!       │
  └──────────────────────────┘                              │
                                                            │
  .rodata (far away)                                        │
  ┌──────────────────────────┐                              │
  │ symbol + offset          │◄─────────────────────────────╯
  └──────────────────────────┘

Pessimized - the MOV reads a nearby pointer that holds the full address:

  .text                          .data.fixup (nearby)
  ┌────────────────────────┐     ┌──────────────────────────┐
  │ mov r9, [rip + disp32] │───▶ │ .quad <64-bit address>   │
  │        (0x8B)          │     │  (R_X86_64_64 reloc)     │
  └────────────────────────┘     └──────────┬───────────────┘
         small offset ✓                     │
         always fits in 32 bits             │  full 64-bit pointer
                                            │  NEVER overflows
  .rodata (far away)                        │
  ┌──────────────────────────┐              │
  │ symbol + offset          │◄─────────────╯
  └──────────────────────────┘

We've traded one direct LEA computation for an indirect MOV through a pointer, and we make sure the displacement is now tiny. The 64-bit pointer slot can reach any address in the virtual address space. 👌

Implementation Details

For each problematic relocation, three changes are needed in the object file:

1. Opcode Patch: In .text, change byte 0x8D to 0x8B (1 byte).

This converts the LEA (compute address) into a MOV (load from address). The rest of the instruction encoding (ModR/M byte, REX prefix) stays identical because both instructions use the same operand format.

 Before:  4C 8D 0D xx xx xx xx    lea  r9, [rip + disp32]
 After:   4C 8B 0D xx xx xx xx    mov  r9, QWORD PTR [rip + disp32]
             ^^

2. New Pointer Slot: Create a new section (.data.fixup) containing 8 zero bytes per patch site, plus a new R_X86_64_64 relocation pointing to the original symbol with the original addend.

 .data.fixup:
   .quad 0x0000000000000000      # linker fills via R_X86_64_64
         ▲
         └── relocation: R_X86_64_64  sym=symbol  addend=-0x44000000

R_X86_64_64 is a 64-bit absolute relocation. Its formula is simply S + A, no subtraction of P. There is no 32-bit range limitation; it can address the entire 64-bit address space. This is the key insight that makes the fix work.

3. Retarget the Original Relocation: In the .rela.text entry for the patched instruction, change the symbol to point at the new pointer slot in .data.fixup and update the type to R_X86_64_PC32. The addend becomes a small offset (the distance from the instruction to the fixup slot), which is guaranteed to fit.

Note: Because both LEA and MOV with a [rip + disp32] operand are exactly the same length (7 bytes with a REX prefix), we don't shift any code, don't invalidate any other relocations, and don't need to rewrite any other parts of the object file. It's truly a surgical patch.
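Step 1 is small enough to show concretely. A minimal sketch of the opcode patch, hand-rolled over raw instruction bytes rather than a real object-file library (the byte layout follows the encoding shown earlier):

```python
LEA_OPCODE = 0x8D
MOV_OPCODE = 0x8B

def pessimize_lea(text: bytes, insn_offset: int) -> bytes:
    """Turn 'lea r, [rip + disp32]' into 'mov r, [rip + disp32]' in place.

    Both encodings are REX + opcode + ModR/M + disp32 (7 bytes), so only
    the opcode byte at insn_offset + 1 changes; nothing shifts.
    """
    buf = bytearray(text)
    if buf[insn_offset + 1] != LEA_OPCODE:
        raise ValueError("no LEA at this offset")
    buf[insn_offset + 1] = MOV_OPCODE
    return bytes(buf)

# 4C 8D 0D xx xx xx xx   lea r9, [rip + disp32]
before = bytes.fromhex("4c8d0d78563412")
after = pessimize_lea(before, 0)
print(after.hex())  # only the second byte (the opcode) differs
```

Because the patched instruction stays 7 bytes, every other offset in the section, and therefore every other relocation, is untouched.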

The pessimized MOV now performs a memory load where the original LEA did pure register arithmetic. Thatโ€™s an extra cache line fetch and a data dependency. If this instruction is in a tight loop, it could be a performance hit.

Optimization is the root of all evil; what does that make pessimization? 🧌


Creating massively huge fake files and binaries

I was writing a test case for lld to support "thunks" [llvm#180266], which uses a linker script to place two sections very far apart (8 GiB) in the virtual address space.

SECTIONS {
    .text_low 0x10000: { *(.text_low) }
    .text_high 0x200000000: { *(.text_high) }
}

After linking a trivially small assembly file, I ran ls -l on the resulting binary and was confused:

$ ls -lh output
-rwxr-xr-x 1 fzakaria fzakaria 8.0G Feb 11 16:00 output

8 GiB. For what amounts to a handful of instructions. 😲

What's going on? And where did all that space come from?

Apparent size vs. on-disk size

Turns out ls -l reports the logical (apparent) size of the file, which is simply an integer stored in the inode metadata. It represents the offset of the last byte written. Since .text_high lives at 0x200000000 (~8 GiB), the file's logical size extends out that far even though the actual code is tiny.

The real story is told by du:

$ du -h output
12K     output

12 KiB on disk. The file is sparse. 🤓

What is a sparse file?

A sparse file is one where the filesystem doesn't bother allocating blocks for regions that are all zeros. The filesystem (ext4, btrfs, etc.) stores a mapping of logical file offsets to physical disk blocks in the inode's extent tree. For a sparse file, there are simply no extents for the hole regions.

For our 8 GiB binary, the extent tree looks something like:

Inode extent tree:
  [offset 0,       12 blocks]  → disk blocks 48392-48403   (.text_low code)
  [offset 0x1FFFF, 4 blocks]   → disk blocks 48404-48407   (.text_high code)

  (nothing for the ~8 GiB in between - no extents exist)

We can also use filefrag to see the same information, albeit a little more condensed.

$ filefrag -v output
Filesystem type is: 9123683e
File size of output is 8589873896 (2097138 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       1:  461921719.. 461921720:      2:             encoded
   1:  2097137.. 2097137:  461921740.. 461921740:      1:  464018856: last,eof
output: 2 extents found

When something reads the file:

  1. The virtual filesystem (VFS) receives read(fd, buf, size) at some offset
  2. The filesystem looks up the extent tree for that offset
  3. If extent found then read from the physical disk block
  4. If no extent (hole) then the kernel fills the buffer with zeros, no disk I/O
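You can observe the apparent-vs-allocated split from code as well. A small stdlib-only sketch; the exact block counts are filesystem-dependent (this assumes a filesystem that supports sparse files, such as ext4 or btrfs):

```python
import os
import tempfile

# Seek far past the end and write a single byte: everything before it
# becomes a hole, with no disk blocks allocated for it.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(1 << 30)              # 1 GiB offset
    f.write(b"\x01")
    path = f.name

st = os.stat(path)
apparent = st.st_size            # what `ls -l` reports: last byte offset + 1
allocated = st.st_blocks * 512   # what `du` counts: blocks actually on disk

print(apparent, allocated)       # apparent is ~1 GiB; allocated is a few KiB
os.unlink(path)
```

The same st_size/st_blocks distinction is exactly why ls -l and du disagreed above.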

Creating sparse files yourself

You don't need a linker to create sparse files. truncate will do it:

$ truncate -s 1P bigfile
$ ls -lh bigfile
-rw-r--r-- 1 fzakaria fzakaria 1.0P Feb 11 16:00 bigfile

$ du -h bigfile
0       bigfile

A 1 PiB file that takes zero bytes on disk. dd with seek works too:

$ dd if=/dev/null of=bigfile bs=1 seek=1P

Both produce the same result: a file whose logical size is 1 PiB but whose on-disk footprint is effectively nothing.


teleporting into the future and robbing yourself of retirement projects


I'm going to make this a really quick one because this is doing the rounds, and whilst I've tweeted about it, it's time to dig in.

What Gergely is articulating here is something that I, and everyone else who was paying attention, went through a year ago. AI enables you to teleport to the future and rob your future self of retirement projects. Anything that you've been putting off to do someday, you can do now.

To quote a post I authored almost eight months ago:

It might surprise some folks, but I'm incredibly cynical when it comes to AI and what is possible; yet I keep an open mind. That said, two weeks ago, when I was in SFO, I discovered another thing that should not be possible.

Every time I find out something that works, which should not be possible, it pushes me further and further, making me think that we are already in post-AGI territory.
- https://ghuntley.com/no/ (dated July 2025)

And another post back in September 2025:

It's a strange feeling knowing that you can create anything, and I'm starting to wonder if there's a seventh stage to the "people stages of AI adoption by software developers"
whereby that seventh stage is essentially this scene in The Matrix...

In the previous 12 months, I've cloned SaaS product feature sets of many different companies. I've built file systems, networking protocols and even developed my own programming language.

From my perspective, nothing really changed in December. The models were already great, but what was needed was a time of rest - people just needed to pick up the guitar and play.

deliberate intentional practice
Something I’ve been wondering about for a really long time is, essentially, why do people say AI doesn’t work for them? What do they mean when they say that? From which identity are they coming from? Are they coming from the perspective of an engineer with a job title and

What made December an inflection point was that the models became much easier to use to achieve good outcomes, and people picked up the guitar with an open mind and played.

Over the last couple of weeks, I've been catching up with software engineers, venture capitalists, business owners, and people in sales and marketing who are all going through this period of adjustment.

Universally, it can be described as a mild form of creative psychosis for people who like to create things. All builders who have an internal reward function of creating things as a form of pleasure go through it because AI enables them to just do things.

The future belongs to people who can just do things
There, I said it. I seriously can’t see a path forward where the majority of software engineers are doing artisanal hand-crafted commits by as soon as the end of 2026. If you are a software engineer and were considering taking a gap year/holiday this year it would be an

Everyone who gets AI goes through it, and it typically lasts about two to three months, until they get it out of their system by completing all the projects they were putting off until retirement.

Perhaps it could be described as a bit of a reset, similar to what happened during COVID-19, when people were able to reassess what they wanted to do in life.

It's a coin flip, really: some employees will commit more to their current employer, but on the other side of the coin, others are realising they no longer depend as much on other people to achieve certain financial outcomes.

Perhaps this is the tipping point where more people throw their hats in and become entrepreneurs.

People with ideas and unique insight can get concepts to market rapidly and be less dependent on others' expertise, as the world's knowledge is now in the palm of everyone's hand.

Technologists are still required; perhaps it's the ideas guys/gals who should be concerned, as software engineers now have a path to bootstrap a concept in every white-collar industry (recruiting, law, finance, accounting, et al.) at breakneck speed without having to find co-founders.

- From Feb 2025

I guess I need to wrap this up now, but I will say this:

I've written about how some people won't make it, and I've spent the last year talking about this, pleading with people to pick up the guitar and play...

If you're having trouble sleeping because of all the things that you want to create, congratulations.

You've made it through to the other side of the chasm, and you are developing skills that employers in 2026 are expecting as a bare minimum.

The only question that remains is whether you are going to be a consumer of these tools or someone who understands them deeply and automates your job function.

how to build a coding agent: free workshop
It’s not that hard to build a coding agent. 300 lines of code running in a loop with LLM tokens. You just keep throwing tokens at the loop, and then you’ve got yourself an agent.

go build yourself an agent and taste building in the recursive latent space

Trust me, you want to be in the latter camp because consumption is now the baseline for employment.

After you come out of this phase, I hope you get to where I am: just because you can build something doesn't mean you necessarily should. Knowing what not to build, now that anything can be built, is a very important life lesson.

ps. socials


Crazy shit linkers do: Common Data (COMDAT) sections

Managing code at scale is hard and comes with a lot of weird quirks in your toolchain. I wrote previously about some of the crazy shit linkers can do and that is really the tip of the iceberg.

Let's take a peek at COMDAT (Common Data) sections and some of the weird hiccups you can run into.

What even is COMDAT?

Well, to understand what a COMDAT section is, let's create a simple example.

Consider this example where we will create a Cache<T> helper class and leverage it across two different translation units: library.o and main.o

Note: This example was inspired by @grigorypas in a discussion on the LLVM Discourse.

We can compile each individually, e.g. gcc -std=c++17 -g -O0 -c library.cpp -o library.o. The -O0 is important here, otherwise this simple code would be inlined, and -std=c++17 allows us to use inline static variables.

// cache.h
#pragma once

template<typename T>
struct Cache {
    inline static T data;
    static void set(T val) { data = val; }
};

// library.cpp
#include "cache.h"

void foo() {
    Cache<int>::set(42);
}

// main.cpp
#include "cache.h"

void bar() {
    Cache<int>::set(31);
}

extern void foo();

int main() {
    foo();
    bar();
    return 0;
}

Because Cache<T> is a template, the compiler must generate the machine code for Cache<int>::set in every object file (.o) that uses it. If you compile main.cpp and library.cpp and they both use Cache<int>, both object files will contain this code.

We can double check this with objdump, and sure enough, both main.o and library.o contain a duplicate section (i.e. the instructions) for _ZN5CacheIiE3setEi, which is the mangled name of Cache<int>::set.

> objdump -d -j .text._ZN5CacheIiE3setEi main.o

Disassembly of section .text._ZN5CacheIiE3setEi:

0000000000000000 <_ZN5CacheIiE3setEi>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	89 7d fc             	mov    %edi,-0x4(%rbp)
   7:	8b 45 fc             	mov    -0x4(%rbp),%eax
   a:	89 05 00 00 00 00    	mov    %eax,0x0(%rip)
  10:	90                   	nop
  11:	5d                   	pop    %rbp
  12:	c3                   	ret


> objdump -d -j .text._ZN5CacheIiE3setEi library.o

Disassembly of section .text._ZN5CacheIiE3setEi:

0000000000000000 <_ZN5CacheIiE3setEi>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	89 7d fc             	mov    %edi,-0x4(%rbp)
   7:	8b 45 fc             	mov    -0x4(%rbp),%eax
   a:	89 05 00 00 00 00    	mov    %eax,0x0(%rip)
  10:	90                   	nop
  11:	5d                   	pop    %rbp
  12:	c3                   	ret

Wow! Given the prevalent use of templates in C++, this already seems incredibly wasteful, since every .o has to include the instructions for the same templates. 😲

At link time, the linker has to resolve the function to use only one of these implementations.

What do we do with all the other duplicate implementations?

That's where COMDAT comes in! 🤓

To prevent your final binary from being 10x larger than necessary, the compiler marks these duplicate sections as COMDAT (Common Data). The linkerโ€™s job is simple: pick one, discard the rest.

We can inspect these groupings using readelf -g.

> readelf -g main.o -W

COMDAT group section [    1] `.group' [_ZN5CacheIiE3setEi] contains 2 sections:
   [Index]    Name
   [    6]   .text._ZN5CacheIiE3setEi
   [    7]   .rela.text._ZN5CacheIiE3setEi

Here is the pickle. How does the linker pick which section to use?

Traditionally (this is not specified by any ABI), the linker selects the definition from the first .o provided to it on the command line.
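In pseudocode, the first-wins rule might look like the following. This is a hedged simplification: real linkers key on the group's signature symbol and keep or discard the whole section group (code plus its relocations) together.

```python
def resolve_comdat(object_files):
    """object_files: list of (name, {signature: section}) in command-line
    order. Returns {signature: (winning object, section)}."""
    chosen = {}
    for obj_name, groups in object_files:
        for signature, section in groups.items():
            # the first definition seen wins; later duplicates are dropped
            chosen.setdefault(signature, (obj_name, section))
    return chosen

objs = [
    ("library.o", {"_ZN5CacheIiE3setEi": "small code-model body"}),
    ("main.o",    {"_ZN5CacheIiE3setEi": "large code-model body"}),
]
winner, _ = resolve_comdat(objs)["_ZN5CacheIiE3setEi"]
print(winner)  # → library.o (it was listed first)
```

Note that nothing in this rule looks at the section contents, only the signature name, which is exactly what makes the mismatches below possible.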

Is this problematic?

Well, what if the two object files were built with different code models (i.e. -mcmodel)? Let's build main.cpp with the large code model, -mcmodel=large.

> gcc -g -O0 -mcmodel=large -c main.cpp -o main.o

> objdump -d -j .text._ZN5CacheIiE3setEi main.o

Disassembly of section .text._ZN5CacheIiE3setEi:

0000000000000000 <_ZN5CacheIiE3setEi>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	89 7d fc             	mov    %edi,-0x4(%rbp)
   7:	48 ba 00 00 00 00 00 	movabs $0x0,%rdx
   e:	00 00 00 
  11:	8b 45 fc             	mov    -0x4(%rbp),%eax
  14:	89 02                	mov    %eax,(%rdx)
  16:	90                   	nop
  17:	5d                   	pop    %rbp
  18:	c3                   	ret

> objdump -d -j .text._ZN5CacheIiE3setEi library.o

Disassembly of section .text._ZN5CacheIiE3setEi:

0000000000000000 <_ZN5CacheIiE3setEi>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	89 7d fc             	mov    %edi,-0x4(%rbp)
   7:	8b 45 fc             	mov    -0x4(%rbp),%eax
   a:	89 05 00 00 00 00    	mov    %eax,0x0(%rip)
  10:	90                   	nop
  11:	5d                   	pop    %rbp
  12:	c3                   	ret

Although the section names are the same, the instructions generated are now different. The large code-model uses movabs which has worse performance characteristics.

Let's verify what the linker does by linking them.

# Link library.o first
> gcc library.o main.o -o a.out
> objdump -d a.out
0000000000401117 <_ZN5CacheIiE3setEi>:
  401117:	55                   	push   %rbp
  401118:	48 89 e5             	mov    %rsp,%rbp
  40111b:	89 7d fc             	mov    %edi,-0x4(%rbp)
  40111e:	8b 45 fc             	mov    -0x4(%rbp),%eax
  401121:	89 05 ed 2e 00 00    	mov    %eax,0x2eed(%rip)
  401127:	90                   	nop
  401128:	5d                   	pop    %rbp
  401129:	c3                   	ret

# Link main.o first
> gcc main.o library.o -o a.out
> objdump -d a.out
0000000000401141 <_ZN5CacheIiE3setEi>:
  401141:	55                   	push   %rbp
  401142:	48 89 e5             	mov    %rsp,%rbp
  401145:	89 7d fc             	mov    %edi,-0x4(%rbp)
  401148:	48 ba 14 40 40 00 00 	movabs $0x404014,%rdx
  40114f:	00 00 00 
  401152:	8b 45 fc             	mov    -0x4(%rbp),%eax
  401155:	89 02                	mov    %eax,(%rdx)
  401157:	90                   	nop
  401158:	5d                   	pop    %rbp
  401159:	c3                   	ret

We see that the section selected does depend on the .o order provided. 😬

Why does all this matter?

We are pursuing moving some code to the medium code model to overcome some relocation overflows; however, we have some prebuilt code built with the small code model. We noticed that although our goal was to leverage the medium code model, the linker might choose the small-code-model variant of a section if it happened to be found first.

If the linker blindly picks the "small model" version (which uses 32-bit relative offsets) but places the data more than 2GiB away, we might still end up with the relocation overflow errors we sought to resolve.

But wait, it gets worse.

The reason we may instantiate multiple incarnations of a particular symbol but only select one is the One Definition Rule (ODR). The ODR requires that the definition of a symbol be identical across all translation units. But the linker generally doesn't check this (unless you use LTO, and even then, it's fuzzy). It just checks the symbol name.

Imagine if library.cpp was compiled with -DLOGGING_ENABLED which injected printf calls into Cache::set, while main.cpp was compiled in release mode without it.

If the linker picks the main.o (release) version of the COMDAT group, your "debug" library implementation loses its logging, effectively muting your debug logic. Conversely, if it picks the library.o version, your high-performance release binary suddenly has debug logging in critical hot paths.

You aren't just gambling with instruction selection that may affect performance, as in the case of code models; you are gambling with program logic. Given that the section name is based purely on the name of the symbol, it's easy to see how you can run into oddities if you accidentally link implementations that wildly differ.

I can now see why many languages force symbols to only ever be defined in a single translation unit, as it avoids this whole conundrum. 🙃


Crazy shit linkers do: Relaxation

I have been looking into linkers recently, and I've been amazed at all the crazy options and optimizations a linker may perform. Compilers are a well-understood domain, taught in schools with a plethora of books, but few resources exist for linkers aside from what you may find on some excellent technical blogs, such as Ian Lance Taylor's series on writing the gold linker and the very in-depth blog of Fangrui Song, also known as MaskRay.

I wanted to write down, in my own style, the concepts I'm learning from first principles.

Recently, I came across the term "relaxation" as I was fiddling around in LLVM's lld.

What is it? 🤔

Note: Relaxation looks to be relatively new; the original RFC was proposed to the x86-64-abi Google group in 2015.

Well, let's look at a super simple example to understand what it is and why we want it.

If you want to follow along take a look at this godbolt example.

// Declare it, but don't define it.
// The compiler assumes it might be in a shared library.
extern void external_function();

void example() {
    external_function();
}

If we compile this with -O0 -fno-plt -fpic -mcmodel=medium -Wa,-mrelax-relocations=no we see the following disassembly in the object file using objdump.

example():
 push   rbp
 mov    rbp,rsp
 call   QWORD PTR [rip+0x0]        # a <example()+0xa>
    R_X86_64_GOTPCREL external_function()-0x4
 pop    rbp
 ret

Specifically, the compiler has left a "note" for the linker in the form of a relocation: R_X86_64_GOTPCREL.

You can see that the address in the emitted code is 0x0 after compilation. The linker needs to replace that value with the offset to the function's GOT entry, relative to the rip register (instruction pointer).

This works great and is necessary for shared libraries, but what if we are building a final static binary? 🤓

It turns out that, in some cases, this instruction can be further simplified by the linker, since when producing the final executable it has all the information.

We will have to look at the actual instruction encoding to understand this further.

If we look at the hex for that assembly, we see the following:

ff 15 00 00 00 00 call *0x0(%rip)

This indirect call (ff 15) via the GOT is 6 bytes long: 2 bytes for the opcode and 4 bytes for the offset to the GOT entry.

Note: Understanding x86-64 encoding is its own can of worms. The ISA is incredibly dense and complex, but you can reference it here if you want.

x86-64, though, has another call encoding (e8) that operates in direct mode, calling an address relative to the instruction pointer.

This direct call is only 5 bytes long: 1 byte for the opcode and 4 bytes for the offset to the function.

If we knew the location of the function ahead of time, it would be nice if we could skip checking the GOT completely and just go to where we want to be.

Why would we want to do this?

Well, it's more efficient to jump directly to the address we want to end up at. The CPU doesn't have to load the address stored in the GOT before jumping to it.

When building a static binary the linker should know all the final relative addresses of all the functions, so going through the GOT is no longer necessary.

Since the number of bytes is nearly equal, the linker can effectively patch the binary without disrupting other relative calculations, provided it can fill the small gap.

We only need to find a single byte to pad our more-efficient call! 🕵️

Turns out, the nop instruction is only a single byte. 👌

We then get the equivalence:

call *foo@GOTPCREL(%rip) => [nop call foo] or [call foo nop]

This is what the R_X86_64_GOTPCRELX relocation indicates. It tells the linker it is safe to "relax" and modify the instruction to the more performant variation.

When we enable relaxation, we now generate the same code as above but with this new relocation type instructing the linker to perform the optimization if possible.

 call   QWORD PTR [rip+0x0]        # a <example()+0xa>
    R_X86_64_GOTPCRELX external_function()-0x4

Note: Why not just always optimize R_X86_64_GOTPCREL when possible and forgo introducing a new relocation? My own guess is that it's important to be backwards compatible, and you wouldn't want the emitted code to vary depending on the linker version, but I would be interested to hear something more concrete if you know!

Interestingly, many linkers optimize this even further!

Rather than generating a nop instruction, the linker instead prefixes the call with 0x67 (addr32).

On x86-64, 0x67 (addr32) normally requests 32-bit addressing for the memory operand. However, on a relative call instruction it is benign: the override is effectively ignored, but it consumes exactly the 1 byte we need.

If we go back to our example and enable relaxation, and produce a final binary, we can disassemble it to see whether it was relaxed.

> objdump -SD main

0000000000401133 <example>:
  401133:	55                   	push   %rbp
  401134:	48 89 e5             	mov    %rsp,%rbp
  401137:	48 8d 05 9a 2e 00 00 	lea    0x2e9a(%rip),%rax        # 403fd8 <_GLOBAL_OFFSET_TABLE_>
  40113e:	b8 00 00 00 00       	mov    $0x0,%eax
  401143:	67 e8 bd ff ff ff    	addr32 call 401106 <external_function>
  401149:	90                   	nop
  40114a:	5d                   	pop    %rbp
  40114b:	31 c0                	xor    %eax,%eax
  40114d:	c3                   	ret

Here we can see that our call was indeed relaxed, as shown by addr32 call 401106. 🥳
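Mechanically, the relaxation is a small byte-level patch. Here is a hedged sketch (not lld's actual code) of the rewrite, using the addresses from the disassembly above; the patched bytes come out as 67 e8 bd ff ff ff, matching objdump:

```python
import struct

def relax_got_call(code: bytearray, off: int, insn_vaddr: int, target: int) -> None:
    """Rewrite `ff 15 <disp32>` (indirect call via a rip-relative GOT slot)
    in place as `67 e8 <rel32>` (addr32 prefix + direct call). Sketch only;
    a real linker does this while applying R_X86_64_GOTPCRELX."""
    assert code[off:off + 2] == b"\xff\x15", "not an indirect rip-relative call"
    rel32 = target - (insn_vaddr + 6)        # displacement from the next insn
    assert -(1 << 31) <= rel32 < (1 << 31)   # target must be reachable to relax
    code[off:off + 2] = b"\x67\xe8"          # addr32 prefix + call-rel32 opcode
    code[off + 2:off + 6] = struct.pack("<i", rel32)

# The call site from the final binary above: instruction at 0x401143,
# external_function at 0x401106.
code = bytearray(b"\xff\x15\x00\x00\x00\x00")
relax_got_call(code, 0, insn_vaddr=0x401143, target=0x401106)
print(code.hex())  # → 67e8bdffffff
```

Both encodings are 6 bytes, which is why the patch can be done in place without shifting any surrounding code.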

As it happens, you can apply this same "relaxation" optimization to a few other instructions, such as test, jmp and mov, but the basic premise is the same.


don't waste your back pressure


I am fortunate to be surrounded by folks who listen, and the post linked below will go down as seminal reading for people interested in AI context engineering.

It started as a simple convo between mates; Moss translated it into words, and I've been waiting for it to come out so I didn't front-run him.

Don’t waste your back pressure ·
Back pressure for agents You might notice a pattern in the most successful applications of agents over the last year. Projects that are able to setup structure around the agent itself, to provide it with automated feedback on quality and correctness, have been able to push them to work on longer horizon tasks. This back pressure helps the agent identify mistakes as it progresses and models are now good enough that this feedback can keep them aligned to a task for much longer. As an engineer, this means you can increase your leverage by delegating progressively more complex tasks to agents, while increasing trust that when completed they are at a satisfactory standard.

read this and internalise this

Enjoy. This is what engineering now looks like in the post-loom/Gas Town era, or even when doing ralph loops.

software engineering is now about preventing failure scenarios and preventing the wheel from turning over through back pressure to the generative function

If you aren’t capturing your back-pressure then you are failing as a software engineer.


everything is a ralph loop


I've been thinking about how the way I build software is so very, very different from how I used to do it three years ago.

No, I’m not talking about acceleration through usage of AI but instead at a more fundamental level of approach, techniques and best practices.

Standard software practice is to build vertically, brick by brick, like Jenga, but these days I approach everything as a loop. You see, ralph isn't just about forward mode (building autonomously) or reverse mode (clean-rooming); it's also a mindset that these computers can indeed be programmed.

watch this video to learn the mindset

I'm there as an engineer, just as I was in the brick-by-brick era, but instead I am programming the loop, automating my job function and removing the need to hire humans.

Everyone right now is going through their zany period, just like I did with forward mode and building software AFK on full auto, but I hope folks will come back down from orbit and remember this from the original ralph post.

While I was in SFO, everyone seemed to be trying to crack on multi-agent, agent-to-agent communication and multiplexing. At this stage, it's not needed. Consider microservices and all the complexities that come with them. Now, consider what microservices would look like if the microservices (agents) themselves are non-deterministic—a red hot mess.

What's the opposite of microservices? A monolithic application. A single operating system process that scales vertically. Ralph is monolithic. Ralph works autonomously in a single repository as a single process that performs one task per loop.

Software is now clay on the pottery wheel, and if something isn't right, I just throw it back on the wheel to address the items that need resolving.

Ralph is an orchestrator pattern where you allocate the array with the required backing specifications, give it a goal, and then loop the goal.

It's important to watch the loop as that is where your personal development and learning will come from. When you see a failure domain – put on your engineering hat and resolve the problem so it never happens again.

In practice this means doing the loop manually via prompting, or via automation with a pause that requires pressing CTRL+C to progress onto the next task. This is still ralphing, as ralph is about getting the most out of how the underlying models work through context engineering, and that pattern is GENERIC and can be used for ALL TASKS.
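As a sketch, the automation-with-a-pause variant is just a loop around your agent. Nothing here is a real agent API: run_agent is a hypothetical stand-in for whichever coding-agent CLI or SDK you drive.

```python
def run_agent(prompt: str) -> str:
    # Hypothetical stand-in for invoking a coding agent (CLI or SDK).
    return f"agent completed one task for: {prompt}"

def ralph_loop(prompt: str, iterations: int, pause: bool = False) -> None:
    """Feed the same prompt every iteration, one task per loop.
    pause=True models the manual checkpoint between tasks."""
    for i in range(iterations):
        print(f"[loop {i}] {run_agent(prompt)}")
        if pause:
            input("Enter for the next task, CTRL+C to stop... ")

# Bounded here for illustration; the real thing loops until the plan is done.
ralph_loop("implement the next unchecked item in PLAN.md", iterations=3)
```

The point of the sketch is the shape, not the plumbing: the same goal is re-fed each iteration, and the human checkpoint is where you watch the loop and engineer away failure domains.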

In other news, I've been cooking on something called "The Weaving Loom". The source code of loom can now be found on my GitHub; do not use it if your name is not Geoffrey Huntley. Loom is something that has been in my head for the last three years (various prototypes were developed last year!) and it is essentially infrastructure for evolutionary software. Gas Town focuses on spinning plates and orchestration - a full level 8.

see https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04

I’m going for a level 9 where autonomous loops evolve products and optimise automatically for revenue generation. Evolutionary software - also known as a software factory.


There is a divide now: we have software engineers outwardly rejecting AI, or merely consuming it via Claude Code/Cursor to accelerate the lego-brick building process...

but software development is dead - I killed it. Software can now be developed cheaper than the wage of a burger flipper at macca's, and it can be built autonomously whilst you are AFK.

hi, it me. i’m the guy

I’m deeply concerned for the future of these people and have started publishing videos on YouTube to send down ladders before the big bang happens.

i now won’t hire you unless you have this fundamental knowledge and can show what you have built with it

Whilst software development/programming is now dead, we deeply need software engineers with these skills who understand that LLMs are a new form of programmable computer. If you haven't built your own coding agent yet, please do.

how to build a coding agent: free workshop
It’s not that hard to build a coding agent. 300 lines of code running in a loop with LLM tokens. You just keep throwing tokens at the loop, and then you’ve got yourself an agent.

ps. think this is out there?

It is, but watch it happen live. We are here right now, it's possible, and I'm systemising it.

Here in the tweet below, I am putting loom under the mother of all ralph loops to automatically perform system verification. Instead of days of planning and discussions and weeks of verification, I'm programming this new computer and doing it AFK whilst I DJ, so that I don't have to hire humans.


Any faults identified can be resolved through forward ralph loops. Over the last year the models have become quite good, and it's only now that I'm able to realise this full vision, but I'll leave you with this, dear reader...

What if the models don't stop getting good?

How well will you fare if you are still building Jenga stacks when there are classes of principal software engineers out there proving the point that we are here right now? Please pay attention.


Go build your agent, go learn how to program the new computer (guidance forthcoming in future posts), fall in love with all the possibilities and then join me in this space race of building automated software factories.

ps. socials


Bespoke software is the future

At Google, some of the engineers would joke, self-deprecatingly, that the software internally was not particularly exceptional, but rather that Google's dominance was an example of the power of network effects: software custom-tailored to work well together.

Outside Google and similar FAANG companies, this is often cited as indulgent "NIH" (Not Invented Here) syndrome; the prevailing practice elsewhere is to pick generalized, preferably open-source, off-the-shelf software solutions.

The problem with these generalized solutions is that, well, they are generalized and rarely fit well together. 🙄 Engineers are trained to be DRY (Don't Repeat Yourself) and love abstractions. As a tool tries to solve more problems, the abstraction becomes leakier and more ill-fitting. It becomes a general-purpose tax.

If you only need 10% of a software solution, you pay for the remaining 90% via the abstractions it imposes. 🫠

Internally to a company, however, we are taught that unused code is a liability. We often celebrate negative pull-requests as valuable clean-up work with the understanding that smaller code-bases are simpler to understand, operate and optimize.

Yet for most of our infrastructure tooling, we continue to bloat solutions and tout support despite minuscule user bases.

This is probably one of the areas I am most excited about: the ability to leverage LLMs for software creation.

In previous posts, I spent time investigating linkers such as LLVM's lld.

I found LLVM to be a pretty polished codebase with lots of documentation. Despite the high quality, navigating the codebase is challenging, as it's a mass of interfaces and abstractions needed to support multiple object file formats, 13+ ISAs, a slew of features (e.g. linker scripts) and multiple operating systems.

Instead, I leveraged LLMs to help me design and write µld, a tiny opinionated linker in Rust that targets only ELF, x86_64, static linking and a barebones feature set.

It shouldn't be a surprise to anyone that the end result is a codebase that I can audit, learn from, and easily grow to support additional improvements and optimizations.

The surprising bit, especially to me, was how easy it was to write in a very short period of time (1-2 days).

That means smaller companies, without the coffers of FAANG companies, can also pursue bespoke, custom-tailored software for their needs.

This future is well-suited to tooling such as Nix. Nix is the perfect vehicle to help build custom tooling, as it gives you a playground designed to build the world, similar to a monorepo.

We need to begin to cut away legacy in our tooling and build software that solves specific problems. The end result will be smaller, easier to manage and better integrated. Where this might have seemed unattainable for most, LLMs will democratize this possibility.

I'm excited for the bespoke future.


Huge binaries: papercuts and limits

In a previous post, I synthetically built a program that demonstrated a relocation overflow for a CALL instruction.

However, the demo required I add -fno-asynchronous-unwind-tables to disable some additional data that might cause other overflows for the purpose of this demonstration.

What's going on? 🤔

This is a good example of how only a select few are facing the size pressure of massive binaries.

Even with -mcmodel=medium, which already articulates to the compiler and linker "hey, I expect my binary to be pretty big", there are surprising gaps where the linker overflows.

On Linux, an ELF binary includes many sections beyond the text and data necessary for code execution. Notably, there are sections for debugging (DWARF) and language-specific sections such as .eh_frame, which is used by C++ to unwind the stack on exceptions.

It turns out that even with -mcmodel=large you might still run into overflow errors! 🤦🏻‍♂️

Note: Funnily enough, there is a very recently opened LLVM issue for this, #172777; perfect timing!

For instance, lld assumes 32-bit eh_frame_hdr values regardless of the code model. There are similar 32-bit assumptions in the data-structure of eh_frame as well.

I also mentioned earlier a pattern of using multiple GOTs (Global Offset Tables) to avoid the ±2GiB relative-offset limitation.

Is there even a need for the large code-model?

How far can that take us before we are forced to use the large code-model?

Let's think about it:

First, let's think about any limit due to overflow when accessing the multiple GOTs. Let's say we decide to place a duplicate GOT every 1.5GiB.

|<---- 1.5GiB code ----->|<----- GOT ----->|<----- 1.5GiB code ----->|<----- GOT ----->|

That means each GOT can grow to at most 512MiB before there could exist an instruction in the code section whose GOT-relative offset would overflow.

Each GOT entry is 8 bytes, a 64-bit pointer. That means we have roughly ~67 million possible entries.

A typical GOT relocation looks like the following and it requires 9 bytes: 7 bytes for the movq and 2 bytes for movl.

movq    var@GOTPCREL(%rip), %rax  # R_X86_64_REX_GOTPCRELX
movl    (%rax), %eax

That means we have room for 1.5GiB / 9 = ~179 million possible unique relocation sites.

So theoretically, we can require more unique symbols in our code section than we can fit in the nearest GOT, and therefore cause a relocation overflow. 💥
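The arithmetic above, spelled out (assuming the 1.5GiB GOT spacing and the 9-byte movq+movl sequence; the exact figures shift a little with the assumptions):

```python
GiB, MiB = 1 << 30, 1 << 20

reach = 2 * GiB                 # ±2GiB signed 32-bit rip-relative reach
code_span = int(1.5 * GiB)      # code placed between consecutive GOTs
got_budget = reach - code_span  # how big a GOT can grow and stay reachable
got_entries = got_budget // 8   # one 64-bit pointer per GOT entry
reloc_sites = code_span // 9    # one 9-byte GOT access per unique site

print(got_budget // MiB, "MiB per GOT")  # → 512 MiB per GOT
print(got_entries)                       # → 67108864  (~67 million entries)
print(reloc_sites)                       # → 178956970 (~179 million sites)
assert reloc_sites > got_entries         # more sites than slots: overflow possible
```

Since the code span can host more unique GOT-referencing sites than the nearest GOT has slots, the scheme can still overflow at sufficient symbol counts.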

The same problem exists for thunks, since the thunk is larger than the relative call in bytes.

At some point there is no avoiding the large code model; however, with multiple GOTs, thunks and other linker optimizations (e.g. LTO, relaxation), we have a lot of headroom before it's necessary. 🕺🏻


Huge binaries: I thunk therefore I am

In my previous post, we looked at the "sound barrier" of x86_64 linking: the 32-bit relative CALL instruction and how it can result in relocation overflows. Changing the code model to -mcmodel=large fixes the issue, but at the cost of "instruction bloat" and a likely performance penalty, although I have failed to demonstrate the latter via a benchmark 🥲.

Surely there are other interesting solutions? 🤓

First off, probably the simplest solution is to not build your code statically and instead rely on dynamic libraries 🙃. This is what most "normal" software shops (and the world) do, which is why this hasn't been such a widespread issue.

This of course has its own downsides and performance implications, which I've written about and produced solutions for (i.e. Shrinkwrap & MATR) via my doctorate research. Beyond the performance penalty induced by having thousands of shared libraries, you lose the simplicity of single-file deployments.

A more advanced set of optimizations falls under the umbrella of "LTO" (Link Time Optimization). At the final stage, the linker has all the information necessary to perform a variety of optimizations such as inlining and tree-shaking. That would seem like a good fit, except these huge binaries would need an enormous amount of RAM to perform LTO, and build speeds would slow to a crawl.

Tip: This is still an active area of research; Google has authored ThinLTO, and Facebook has its own set of profile-guided post-link optimizations via BOLT.

What if I told you that you could keep the fast, 5-byte small-code-model call for most callsites, even if your binary is 25GiB? 🧐

It turns out there is prior art for "linker thunks" [ref] within LLVM for various architectures, though notably missing for x86_64, with the quote:

"i386 and x86-64 don't need thunks" [ref]

What is a "thunk"?

You might know it by a different name; in fact, we use them all the time for dynamic linking: a trampoline via the procedure linkage table (PLT).

A thunk (or trampoline) is a linker-inserted shim that lives within the immediate reach of the caller. The caller branches to the thunk using a standard relative jump, and the thunk then performs an absolute indirect jump to the final destination.
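The decision itself can be sketched as a reachability check. This is hedged: real thunk placement is iterative, because inserting thunks moves everything placed after them.

```python
# Maximum displacement of a plain relative branch, per architecture.
BRANCH_RANGE = {"x86_64": 2 << 30, "aarch64": 128 << 20}  # ±2GiB, ±128MiB

def needs_thunk(call_site: int, target: int, arch: str) -> bool:
    """True if the target is out of range of a direct relative branch,
    so the linker must route the call through a nearby thunk."""
    delta = target - call_site
    return not (-BRANCH_RANGE[arch] <= delta < BRANCH_RANGE[arch])

# Addresses from the AArch64 example below: caller at 0x400008,
# far_function at 0x1_2000_0000, generated thunk at 0x400018.
print(needs_thunk(0x400008, 0x1_2000_0000, "aarch64"))  # → True (thunk needed)
print(needs_thunk(0x400008, 0x400018, "aarch64"))       # → False (16 bytes away)
```

The thunk only works because it lands within range of the caller; the thunk's own indirect jump then has unlimited reach.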


LLVM includes support for inserting thunks on certain architectures such as AArch64 because it has a fixed-size (32-bit) instruction set, so the relative branch instruction is restricted to ±128MiB. As this limit is so low, lld supports thunks out of the box.

If we cross-compile our "far function" example for AArch64 using the same linker script to synthetically place it far away to trigger the need for a thunk, the linker magic becomes visible immediately.

> aarch64-linux-gnu-gcc -c main.c -o main.o \
-fno-exceptions -fno-unwind-tables \
-fno-asynchronous-unwind-tables

> aarch64-linux-gnu-gcc -c far.c -o far.o \
-fno-exceptions -fno-unwind-tables \
-fno-asynchronous-unwind-tables

> ld.lld main.o far.o -T overflow.lds -o thunk-aarch64

We can now see the generated code with objdump.

> aarch64-unknown-linux-gnu-objdump -dr thunk-aarch64

Disassembly of section .text:

0000000000400000 <main>:
  400000:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
  400004:	910003fd 	mov	x29, sp
  400008:	94000004 	bl	400018 <__AArch64AbsLongThunk_far_function>
  40000c:	52800000 	mov	w0, #0x0                   	// #0
  400010:	a8c17bfd 	ldp	x29, x30, [sp], #16
  400014:	d65f03c0 	ret

0000000000400018 <__AArch64AbsLongThunk_far_function>:
  400018:	58000050 	ldr	x16, 400020 <__AArch64AbsLongThunk_far_function+0x8>
  40001c:	d61f0200 	br	x16
  400020:	20000000 	.word	0x20000000
  400024:	00000001 	.word	0x00000001

Disassembly of section .text.far:

0000000120000000 <far_function>:
   120000000:	d503201f 	nop
   120000004:	d65f03c0 	ret

Instead of branching to far_function at 0x120000000, main branches to a generated thunk at 0x400018 (only 16 bytes away). The thunk, much like the large code-model, loads x16 with the absolute address stored in the trailing .word pair and then performs an absolute indirect jump (br).

What if x86_64 supported this? Can we now go beyond 2GiB? 🤯

There are a few more cases beyond CALL instructions that would need similar thunks. Although we are mostly using static binaries, some libraries such as glibc may still be dynamically loaded. Access to functions from these shared libraries goes through the GOT (Global Offset Table), which supplies the address used by the PLT (which is itself a thunk 🤯).

The GOT addresses are also loaded via a relative offset, so they would need to be changed to either use thunks or perhaps multiple GOT sections; there is prior art for this on other architectures as well, such as MIPS [ref].

With this information, code-models start to feel unnecessary. Why pay the cost at every callsite when we can do so piecemeal, as needed, with the opportunity to use profiles to guide which callsites to migrate to thunks?

Furthermore, if our binaries are already tens of gigabytes, size is clearly not our main concern. We can duplicate GOT entries, at the cost of an even larger binary, to reduce the need for additional thunks for the PLT jmp.

What do you think? Let's collaborate.


Huge binaries

A problem I experienced while pursuing my PhD and submitting academic articles was that I had built solutions to problems that require dramatic scale to be effective and worthwhile. Responses to my submissions often claimed such problems did not exist; I had observed them during my time in industry, such as at Google, but I couldn't cite that!

One problem that is only present at these mega-codebases is massive binaries. What's the largest binary (ELF file) you've ever seen? I have observed binaries beyond 25GiB, including debug symbols. How is this possible? These companies prefer to statically build their services to speed up startup and simplify deployment. Statically including all code in some of the world's largest codebases is a recipe for massive binaries.

Similar to the sound barrier, there is a point at which code size becomes problematic and we must re-think how we link and build code. For x86_64, that is the 2GiB "Relocation Barrier."

Why 2GiB? 🤔

Well, let's take a look at how position-independent code is put together, starting with a simple example.

extern void far_function();

int main() {
    far_function();
    return 0;
}

If we compile this with gcc -c simple-relocation.c -o simple-relocation.o, we can inspect the result with objdump.

> objdump -dr simple-relocation.o

0000000000000000 <main>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	b8 00 00 00 00       	mov    $0x0,%eax
   9:	e8 00 00 00 00       	call   e <main+0xe>
			a: R_X86_64_PLT32	far_function-0x4
   e:	b8 00 00 00 00       	mov    $0x0,%eax
  13:	5d                   	pop    %rbp
  14:	c3                   	ret

There's a lot going on here, but one important part is e8 00 00 00 00. e8 is the CALL opcode [ref], and it takes a 32-bit signed relative offset, which happens to be 0 (four bytes of zeros) right now. objdump also lets us know there is a "relocation" necessary to fix up this code when we finalize it. We can view this relocation with readelf as well.

Note If you are wondering why we need -0x4, it's because the offset is relative to the instruction pointer, which has already moved to the next instruction. The 4 bytes account for the operand it has skipped over.

> readelf -r simple-relocation.o -d

Relocation section '.rela.text' at offset 0x170 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000400000004 R_X86_64_PLT32    0000000000000000 far_function - 4

This is additional information embedded in the object file which tells the linker in subsequent stages that it has code that needs to be fixed. Here we see the offset 00000000000a; a is 9 + 1, which is where the operand of our CALL instruction starts.

Letโ€™s now create the C file for our missing function.

void far_function() {
}

We will now compile it and link the two object files together using our linker.

> gcc simple-relocation.o far-function.o -o simple-relocation

Letโ€™s now inspect that same callsite and see what it has.

> objdump -dr simple-relocation

0000000000401106 <main>:
  401106:	55                   	push   %rbp
  401107:	48 89 e5             	mov    %rsp,%rbp
  40110a:	b8 00 00 00 00       	mov    $0x0,%eax
  40110f:	e8 07 00 00 00       	call   40111b <far_function>
  401114:	b8 00 00 00 00       	mov    $0x0,%eax
  401119:	5d                   	pop    %rbp
  40111a:	c3                   	ret

000000000040111b <far_function>:
  40111b:	55                   	push   %rbp
  40111c:	48 89 e5             	mov    %rsp,%rbp
  40111f:	90                   	nop
  401120:	5d                   	pop    %rbp
  401121:	c3                   	ret

We can see that the linker did the right thing with the relocation: it calculated the relative offset to our symbol far_function and fixed up the CALL instruction.

Okay cool… 🤷 What does this have to do with huge binaries?

Notice that this call instruction, e8, only takes a 32-bit signed offset, which means it's limited to a range of ±2^31 bytes. A callsite can therefore only jump roughly 2GiB forward or 2GiB backward. The "2GiB Barrier" represents the total reach of a single relative jump.

What happens if our callsite is over 2GiB away?

Let's build a synthetic example by asking our linker to place far_function really, really far away. We can do this using a "linker script", a mechanism that instructs the linker how we would like our code sections laid out in the final binary.

SECTIONS
{
    /* 1. Start with standard low-address sections */
    . = 0x400000;
    
    /* Catch everything except our specific 'far' object */
    .text : { 
        simple-relocation.o(.text.*) 
    }
    .rodata : { *(.rodata .rodata.*) }
    .data   : { *(.data .data.*) }
    .bss    : { *(.bss .bss.*) }

    /* 2. Move the cursor for the 'far' island */
    . = 0x120000000; 
    
    .text.far : { 
        far-function.o(.text*) 
    }
}

If we now try to link our code we will see a "relocation overflow".

Tip I used lld from LLVM because the error messages are a bit prettier.

> gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow -fuse-ld=lld

ld.lld: error: <internal>:(.eh_frame+0x6c):
relocation R_X86_64_PC32 out of range:
5364513724 is not in [-2147483648, 2147483647]; references section '.text'
ld.lld: error: simple-relocation.o:(function main: .text+0xa):
relocation R_X86_64_PLT32 out of range:
5364514572 is not in [-2147483648, 2147483647]; references 'far_function'
>>> referenced by simple-relocation.c
>>> defined in far-function.o

When we hit this problem, what solutions do we have? This opens a whole other subject of "code models", and it's a little more nuanced depending on whether we are accessing data (i.e., static variables) or code that is far away. A great blog post that goes into this is the one by @maskray, a maintainer of lld.

The simplest solution, however, is to use -mcmodel=large, which changes all the relative CALL instructions into absolute 64-bit ones; kind of like an indirect JMP.

> gcc -c simple-relocation.c -o simple-relocation.o -mcmodel=large -fno-asynchronous-unwind-tables

> gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow

> ./simple-relocation-overflow

Note I needed to add -fno-asynchronous-unwind-tables to disable some additional data that might cause overflow for the purpose of this demonstration.

What does the disassembly look like now?

> objdump -dr simple-relocation-overflow 

0000000120000000 <far_function>:
   120000000:	55                   	push   %rbp
   120000001:	48 89 e5             	mov    %rsp,%rbp
   120000004:	90                   	nop
   120000005:	5d                   	pop    %rbp
   120000006:	c3                   	ret

00000000004000e6 <main>:
  4000e6:	55                   	push   %rbp
  4000e7:	48 89 e5             	mov    %rsp,%rbp
  4000ea:	b8 00 00 00 00       	mov    $0x0,%eax
  4000ef:	48 ba 00 00 00 20 01 	movabs $0x120000000,%rdx
  4000f6:	00 00 00 
  4000f9:	ff d2                	call   *%rdx
  4000fb:	b8 00 00 00 00       	mov    $0x0,%eax
  400100:	5d                   	pop    %rbp
  400101:	c3                   	ret

There is no longer a lone CALL instruction; it has become MOVABS & CALL 😲. This grows the callsite from 5 bytes (1-byte opcode + 4-byte 32-bit relative offset) to a whopping 12 bytes (2 bytes for the MOVABS opcode + 8 bytes for the absolute 64-bit address + 2 bytes for the indirect CALL).

This has notable downsides among others:

  • Instruction Bloat: We've gone from 5 bytes per call to 12. In a binary with millions of callsites, this can add up.
  • Register Pressure: We've burned a general-purpose register, %rdx, to perform the jump.

Caution I had a lot of trouble building a benchmark that demonstrated a lower IPC (instructions per cycle) for the large code-model, so let's just take my word for it. 🤷

Changing to a larger code-model is possible, but it comes with these downsides. Ideally, we would keep our small code-model wherever we can. What other strategies can we pursue?

More to come in subsequent writings.


Failing interviews

My blog has been a little quiet. I recently accepted a new role at Meta and it's been keeping me busy!

Once the onboarding phase is done I hope to get back to my Nix contributions.

Accepting the position at Meta has had me reflecting on my journey to this current role. People often share their highlights of accepting a new role but rarely their lowlights.

I wanted to share a brief look at what interviewing might be like in the software industry. People are often discouraged by failure but it's part of the process.

I remember interview training at Google where they discussed how most interviewers decide on the outcome of an interview within the first five minutes. That story is not meant to discourage you from the process but rather to demonstrate that a portion of it is out of your control.

Going through my emails to get an accurate accounting is challenging; however, I found threads as early as 2011 interviewing for Facebook. I am fairly sure I had interviewed even earlier through my co-ops at the University of Waterloo, but I no longer have access to those emails. 😩

Some rough dates I had found: 2011, 2014, 2015, 2018, 2019, 2020, 2021, 2022, 2023*, 2024, 2025.

* This interview round was long and was for 3 distinct roles.

Across those years, the level I interviewed at was different and sometimes the role too (IC vs EM).

Don't be discouraged by failure.


Merry Christmas!

Comic Santa on a sleigh pulled by reindeer

Wishing you a Merry Christmas, a lovely holiday, and a happy start to the new year,
Leah Neukirchen

Merry Christmas and a Happy New Year!

NP: Pearl Jam—Quick Escape