Monday, June 15, 2026

RL economics, morally charged terms, and "distillation"

After a number of Twitter discussions, and repeating myself a lot in these discussions, it is time to write a short note on the economics of advancing LLM capabilities through RL, on principles of propaganda and coining new words, and on my stubborn refusal to use the term "distillation" except in a specific narrow sense.

How do models advance when human-curated data has run out?

It's been a while since we ran out of human data to train LLMs on. We are training on copies of the internet, large piles of (originally pirated, then purchased-and-scanned-and-wholesale ingested) books, and whatever other data sources we can obtain. This leads to a certain performance plateau, as we haven't quite figured out how to make the models more data-efficient in training.

The advancements we have seen in coding and mathematics in the last year are mostly due to reinforcement learning. At the highest level, you pose a problem to an LLM that the LLM has a small but nontrivial chance of solving. You then run N copies of the LLM to generate solutions, and you get a small number of solutions and many failures. You can then use the successful solutions as new data to improve your model - moving the weights in a way that helps the model succeed with greater probability.

This is very elegant in a way, because you are kinda pulling yourself up by your own bootstraps. The cost is computational - if you have a 1% chance of finding a solution given your current LLM and current training data, you need to do hundreds or thousands of rollouts to get a reasonable variety of useful solutions.
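To make that cost concrete, here is a rough back-of-the-envelope sketch. It assumes each rollout succeeds independently with the same probability, which is a simplification, and the function names and numbers are made up for illustration. Under that assumption the expected number of rollouts needed to collect k successes is k / p.

```python
import random


def expected_rollouts(p_success: float, wanted: int) -> float:
    """Expected number of independent rollouts needed for `wanted` successes."""
    return wanted / p_success


def collect_successes(p_success: float, wanted: int, rng: random.Random) -> int:
    """Simulate rollouts until `wanted` of them succeed; return how many were run."""
    rollouts = 0
    successes = 0
    while successes < wanted:
        rollouts += 1
        # Stand-in for "the generated solution passed verification".
        if rng.random() < p_success:
            successes += 1
    return rollouts


if __name__ == "__main__":
    # Ten useful solutions at a 1% success rate: about a thousand rollouts.
    print(expected_rollouts(0.01, 10))
    print(collect_successes(0.01, 10, random.Random(0)))
```

A second mover who can simply ask a stronger model for solutions effectively replaces the 1% with a much higher success rate, which is where the cost asymmetry comes from.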

Once you have a model that can generate a good solution for this problem with high probability, and you make that model available to others, you also provide a much cheaper way of producing the better training data: Third parties can now just ask your model to generate good solutions for them.

So for the second-mover that gets to use your model, improving their model from your model outputs is cheaper, as they can skip the more-or-less-random-search into a high-dimensional solution space and be guided better.

This is a fundamental part of the "closed LLM as a service" business, and it is painful for the leader of the pack because they need to spend money to advance, and others can catch up more cheaply.

Terms of service, copyright law, crimes vs. contract disputes

Copyright law imposes concrete ownership rights on copyrighted material. Pirating material and commercially exploiting it is often a crime.

The frontier labs have all argued that training on public data does not require them to obtain licenses from the copyright holders (a self-serving and somewhat dubious claim). The Llama release further muddied the waters by attaching a license to the redistribution of model weights - by law, the output of an algorithm (such as model weights) is not a copyrightable object, and Meta just pretended it was. Other model labs followed suit, in the hope of establishing a practical precedent that can then be used to shape legislation in the future.

But a priori, model weights are not copyrightable.

There is an argument, though, that prompts and the resulting model output are copyrightable by the person submitting the prompts. Certainly not by the model provider: Running an algorithm on somebody else's copyrightable work without human input does not make you the owner of the work. There is no human creative input, which is the minimum threshold for establishing copyright in our current legal system.

Model providers have no rights to the output of their models if they provide access to these models to third parties.

What rights do model providers have? They have the right to set terms-of-service for their service - e.g. if you don't use the tool in a way we like, we revoke access to the tool.

Terms-of-service are very different from copyright law - they are essentially private law contracts about the exchange of services between entities. So if a model provider says "you may not use this service to generate training data for your competing LLM", they can say so, and they have the right to terminate your account if they catch you doing so.

That said - let's say I were to run a benchmarking service that tests the progress of LLMs against my favorite programming problems, and all I do is (a) run rollouts against these services (b) score the results (c) archive the results (d) sell access to the results to third parties so they can evaluate progress of models and the quality of their reasoning and (e) publish the positive results after a few months for free.

This is not a violation of the terms of service -- I am just measuring the capabilities of the models and having them solve problems for me. Publishing the data isn't a violation of the terms of service either.

Yet publishing the positive results on the wider internet makes them part of the training corpus, so the improvement in capability that the model provider achieved will flow into other models. There is no way around this in our current legal system.

Reframing an inconvenient issue with your business model in moral terms

Imagine you've raised billions of dollars and you realize that your business model has a rather inconvenient flaw - you have a good business, but for it to become a fantastic business, you'd need to fix this flaw. And the flaw, as you perceive it, is the current legal system for intellectual property with its old and well-tested precedents and mechanisms.

It will be easy to convince yourself that the flaw in your business model that gives your competitors a way to catch up with lesser investment is a moral outrage - it is so unjust! - and then complain about the fact that others have the right to do what they are doing.

Once you've convinced yourself of the immorality of what your competition is doing (how dare they compress your margins?), you will need to somehow re-frame what they are doing in moral terms. So "training on solved problems to improve" doesn't quite have the right ring to it. We need something malicious, like "distillation attacks".

"Distillation" is great, because it evokes bootlegging and 1920s prohibition-era intrigue. And "attack" is great because only bad people attack. So you leverage the fact that "distillation" already names a technique for teaching a smaller model from a larger one when you have access to the internals of the larger model, you tack on the word "attack" to make it sound more nefarious, and you start screaming from the rooftops that evil distillation attackers are killing your morally superior business (that started with actual copyright violations, justified only ex post by your success).

This is what happened here, and I urge every reader to not go along with it. Distillation means having access to a large model, including all the last-layer token probabilities, and training a smaller model by taking those internal last-layer probabilities into account.
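The difference can be sketched in a few lines. The following is a toy illustration, not any lab's actual training code: it uses one common formulation of distillation (a KL divergence against the teacher's full next-token distribution) and contrasts it with ordinary cross-entropy on a single sampled token. The function names and logit values are made up.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student): requires the teacher's full token distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


def output_only_loss(sampled_token, student_logits):
    """Cross-entropy on one emitted token: requires only the visible output."""
    q = softmax(student_logits)
    return -math.log(q[sampled_token])


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]   # internal signal, not exposed by a text API
    student = [0.5, 0.5, 0.5]
    print(distillation_loss(teacher, student))
    print(output_only_loss(0, student))  # token 0 is what the API returned
```

The first loss cannot be computed from generated text alone, since it needs the probabilities of every token at every position. The second needs nothing beyond the text that any user of the service receives.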

Just training on model output isn't it. And you cannot have a world where people use LLMs to write code or text, and are allowed to publish that on the internet, and simultaneously prevent up-leveling other models as they train on that data. You have no legal or moral legs to stand on if you want to prevent that.

If the Chinese models are distilled, so is the Cursor fine-tune of Kimi, or any model that is trained on the output of other models - and most of human output is now model-assisted.

You are free to argue that this is inconvenient for your business model, and a legal framework which allows you to prevent that would be useful in attracting more investment to advance your model, but that's about it.

This is why I don't call training on other models' output "distillation"

Let's call it "training on model output", or whatever else that is not morally charged. And let's be honest that the existence of LLMs in their current form is the result of highly dubious approaches to copyright that are ex-post legitimized by the actual value these models bring to society. Let's please avoid allowing parties with a particular financial interest to build a moral framing around their interests, though.


Tuesday, March 24, 2026

Slightly safer vibecoding by adopting old hacker habits

I have seen a lot of public discussion around supply-chain attacks on the Python ecosystem, prompt injection risks when using coding agents, and general worries about the security implications of "vibe coding" for the development machine.

In some of these discussions I find myself puzzled as to what problem is being solved - and it took me a while to realize that my failure to understand lies in the development setup that I tend to use.

In this blog post I'll quickly explain my development setup.

The setup is pretty simple:

  1. The actual development happens on a rented server (or a VM on that server).
  2. In order to do development, I SSH into that server with agent forwarding enabled for my GitHub keys.
  3. I perform my development on the server by attaching to a screen or tmux session.
  4. I used to just use vim with various extensions, but with the advent of coding agents I also use Claude Code etc. nowadays.
  5. I avoid keeping secrets inside the development VM or on the development server.
  6. I let the agent churn away on problems for extended periods of time while I am detached from the tmux/screen.
A setup like this reduces the impact of a large number of supply-chain attacks to - at worst - a compromise of the development VM.

There is still a significant risk of the forwarded GitHub keys being abused to compromise the upstream main repository.

The way around this is a bit cumbersome, but not much different from what many open-source projects already do: You keep a main repository, and you *fork* a development repository from it. Then you do all your development on the dev repository, and when you're done in your development branch, you issue a cross-repository pull request.

Obviously, a human needs to go through that PR with a fine-tooth comb - but this is something you want to do for insider risk etc. anyhow, so your risk profile changes only marginally.

In a setup like this, the main secrets that you'll lose in a supply-chain attack are your Claude credentials. And you don't need to worry about prompt injection into your coding agent too much, and can just focus on writing code.

Interestingly, the development model of "SSH into a machine and attach to a screen session" was popularized by the hacker subculture (as in "computer break-in" subculture) since historically it was never a good idea to have data on machines you physically own. SSH'ing into a random machine in a different country that law enforcement couldn't easily get access to was a reasonable way of keeping your hands clean. I mainly switched to that development model because I almost always need long-running compute and was travelling a lot, and with agent-first development the model is seeing a bit of a resurgence.

Friday, December 12, 2025

Ask your LLM for receipts: What I learned teaching Claude C++ crash triage

I recently embarked on a small toy project/experiment: How well can I equip Claude Code to automatically analyze and triage crashes in a C++ code base?

For the experimentation, I worked on a small number of crashes in the ffmpeg bug tracker. The initial results were very discouraging: Claude hallucinated all sorts of implausible root causes and tended to write typical "AI slop" -- things that follow the form of a well-written report, but that had no bearing on reality.

I iterated for a few days, but ultimately I got things to work reasonably well, at least to the point where I was happy with the result.

The result of this little diversion is a bunch of .md files (subagents and skills) that I contributed to https://github.com/gadievron/raptor - specifically the following parts:

https://github.com/gadievron/raptor/blob/main/.claude/agents/crash-analyzer-agent.md


The task itself is not necessarily a natural fit for an LLM: I find that LLMs tend to perform better in situations where their results can be immediately verified. This is not the case here - crash triage fundamentally includes a component of "narrative building", and it is not super clear how to validate such a narrative.

There are a few things that I took from my experience in using Claude Code for C++ development in the last year which I applied:
  • Since LLMs only perceive the world through text, but their context is a scarce resource, it makes sense to provide them with effective ways of gathering extra data without wasting too much context.
  • LLMs will hallucinate arbitrary things but tend to course-correct if their context includes too much data that is obviously in contradiction with their current trajectory.
In my C++ development, I learnt to provide the LLMs with copious amounts of conditionally-compiled logging, and ways of running granular tests, so gathering information about what is happening without totally swamping the context window was possible.

Anyhow, what does the crash-analyzer-agent end up doing?
  1. It gathers a lot of stuff that provides text-level data about what is going on in the program that crashes: A function-level execution trace, gcov data, an ASAN build, and an rr recording that allows deterministic replay of a particular crashing execution.
  2. It launches a subagent to then formulate a hypothesis of what is going on. This subagent is instructed to "provide receipts" for each step in the reasoning: Show the precise place where the pointer that ultimately leads to the crashing deref is allocated, and show all modifications to it, both in the source code and in the rr trace, including the pointer values pre/post modification.
  3. This hypothesis document is then validated by a separate subagent that is instructed to carefully vet each of the steps in the first document, and reject the file if any evidence is missing. On rejection, a rebuttal is written. This rebuttal is then passed to the previous agent again, until a report is generated that the validator accepts.
  4. The final output is a report that includes specific breakpoints, pointer values, pointer modifications etc. that can be manually verified by a human by stepping through the provided rr trace.
In some sense, this is "LLM as a judge", but it appears to me that the usual problem ("generating LLM is convincing enough that the judge LLM waves everything through") is side-stepped by making the judging LLM focus on the formal correctness of the individual steps.
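The control flow of steps 2 and 3 can be sketched as a small loop. This is a minimal illustration and not the code in the raptor repository: the two subagents are passed in as plain callables, the stub implementations and their strings are made up, and the `max_rounds` cap is an assumption added here so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    accepted: bool
    rebuttal: str = ""


def analyze_with_receipts(
    evidence: str,
    write_hypothesis: Callable[[str, Optional[str]], str],
    validate: Callable[[str, str], Verdict],
    max_rounds: int = 5,
) -> Optional[str]:
    """Alternate hypothesis and validation until accepted or rounds run out."""
    rebuttal: Optional[str] = None
    for _ in range(max_rounds):
        report = write_hypothesis(evidence, rebuttal)
        verdict = validate(evidence, report)
        if verdict.accepted:
            return report
        rebuttal = verdict.rebuttal  # fed back to the hypothesis writer
    return None


def stub_writer(evidence: str, rebuttal: Optional[str]) -> str:
    # Hypothetical stand-in for the hypothesis subagent: it only cites
    # evidence after the validator has complained once.
    if rebuttal is None:
        return "The pointer is freed too early."
    return "The pointer is freed too early. Receipt: see rr trace (made-up)."


def stub_validator(evidence: str, report: str) -> Verdict:
    # Hypothetical stand-in for the validating subagent: reject any report
    # that does not cite evidence.
    if "Receipt:" in report:
        return Verdict(accepted=True)
    return Verdict(accepted=False, rebuttal="No evidence cited for the free.")


if __name__ == "__main__":
    print(analyze_with_receipts("trace, gcov, ASAN log", stub_writer, stub_validator))
```

The point of the structure is that the validator never has to judge whether the overall story is convincing. It only checks that each step comes with evidence it can look up.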

I didn't think much of this, but when I presented this to an audience during the last week, some of the feedback I got was that the technique of "ask the LLM for detailed receipts & have a second LLM validate the receipts" was not necessarily widely known.

So here we are. If you have a task that is perhaps not verifiable on its final output, but involves verifiable substeps, you can greatly boost performance by providing the LLM with tools/skills to "provide receipts" for the substeps - the final output might still be wrong, but it is so with a much decreased probability.


Friday, July 11, 2025

Understand Neural Nets better, post 5 of N -- Code Assistant shootout

In a series of previous blogposts [1, 2, 3, 4] I ran some experiments drawing the boundaries of the polytopes generated by a fully-connected leaky ReLU network while it was getting trained on reproducing an input image.

As I tried to scale the experiments to larger networks, I noticed a dramatic slowdown in the code, caused by the calculation of a hash of the activation pattern happening on the CPU. Each training step would be fast, but then everything would grind to a halt for the visualisation: for each pixel, the code would forward-evaluate the NN (1024*1024 times in all), transfer the activation pattern to the CPU, and then perform the hashing. This was very slow, and very non-parallel.

I had contemplated writing some custom CUDA code to speed things up - there's no reason to store the activation pattern or transfer it, the "right" way to solve the problem is computing a hash on the fly, ideally a hash with a commutative update function so the order in which the different ReLU neurons update the hash doesn't matter.
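To illustrate what a commutative update function means here, the following is a minimal CPU-side sketch in plain Python, not CUDA and not the code from either branch discussed below. It assumes one fixed random coefficient per neuron and uses modular addition as the update. The modulus, seed, and function names are arbitrary choices for the example.

```python
import random

MODULUS = 2**61 - 1  # a Mersenne prime; an arbitrary choice for this sketch


def make_coefficients(num_neurons, seed=1234):
    """One fixed random coefficient per neuron, drawn from a private RNG."""
    rng = random.Random(seed)
    return [rng.randrange(1, MODULUS) for _ in range(num_neurons)]


def update(hash_value, coeff, is_active):
    """Fold one neuron into the hash; addition makes the order irrelevant."""
    return (hash_value + coeff) % MODULUS if is_active else hash_value


def pattern_hash(pattern, coeffs, order=None):
    """Hash an activation pattern, visiting the neurons in the given order."""
    order = range(len(pattern)) if order is None else order
    h = 0
    for i in order:
        h = update(h, coeffs[i], pattern[i])
    return h


if __name__ == "__main__":
    coeffs = make_coefficients(4)
    pattern = [True, False, True, True]
    forward = pattern_hash(pattern, coeffs)
    shuffled = pattern_hash(pattern, coeffs, order=[3, 0, 2, 1])
    print(forward == shuffled)  # same hash regardless of visiting order
```

Because each neuron only adds its own coefficient, every unit can contribute to the running hash as soon as its activation is known, and the full activation pattern never needs to be stored or transferred.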

Then again, this is a hobby project, and I don't have the time to do anything overly smart for the moment. So I decided that - before doing anything sophisticated - I would see if one of the two coding assistants that I use regularly could solve the problem for me.

So I created two different directories, checked out the same base repo into both, created branches in both, and then asked both Gemini CLI and Claude Code to perform the task, using the following prompt:

The Python script in this directory trains a fully connected leaky ReLU network on an input image and tries 
to reproduce it. It also draws pictures illustrating the boundaries of the polytopes generated by the creases
that the ReLU creates in input space. Unfortunately, the code to generate the polytope visualisation is slow,
because it involves 1024*1024 evaluations of the NN forward, and then it needs to hash the activation pattern
into a hash to uniquely identify what polytope the pixel resides on.

I would like to speed up this computation, by - instead of calculating a hash of the activation pattern at the 
end - somehow embedding the calculation of a hash into the forward pass on-GPU. This might be doable with 
PyTorch hooks, but I don't know precisely. 

What I do know is that if I run 
```
python3 ./draw-poly-while-training.py  --input ./centered_ring.png --shape [100]*20 --epochs 30 --seed 12345678 --points 5050 --save-interval 10
``` 

the output looks something like this: 
```
(...)
Input size (MB): 0.01
Forward/backward pass size (MB): 16.39
Params size (MB): 0.77
Estimated Total Size (MB): 17.17
==========================================================================================
2025-07-08 15:15:25,811 - polytope_nn - INFO - Epoch 1/2000000 - Train Loss: 3.315190, Val Loss: 0.329414
2025-07-08 15:15:25,857 - polytope_nn - INFO - Epoch 2/2000000 - Train Loss: 1.045730, Val Loss: 0.065818
2025-07-08 15:15:25,901 - polytope_nn - INFO - Epoch 3/2000000 - Train Loss: 1.414065, Val Loss: 0.488735
2025-07-08 15:15:25,948 - polytope_nn - INFO - Epoch 4/2000000 - Train Loss: 0.201550, Val Loss: 0.102159
2025-07-08 15:15:26,100 - polytope_nn - INFO - Epoch 5/2000000 - Train Loss: 0.198983, Val Loss: 0.050712
2025-07-08 15:15:26,145 - polytope_nn - INFO - Epoch 6/2000000 - Train Loss: 0.255710, Val Loss: 0.060731
2025-07-08 15:15:26,189 - polytope_nn - INFO - Epoch 7/2000000 - Train Loss: 0.122960, Val Loss: 0.091274
2025-07-08 15:15:26,232 - polytope_nn - INFO - Epoch 8/2000000 - Train Loss: 0.180629, Val Loss: 0.053913
2025-07-08 15:15:26,276 - polytope_nn - INFO - Epoch 9/2000000 - Train Loss: 0.826762, Val Loss: 0.156673
2025-07-08 15:15:26,320 - polytope_nn - INFO - Epoch 10/2000000 - Train Loss: 0.211313, Val Loss: 0.117810
2025-07-08 15:16:27,853 - polytope_nn - INFO - Visualization @ epoch 10: 61.53s
2025-07-08 15:16:27,899 - polytope_nn - INFO - Epoch 11/2000000 - Train Loss: 0.174978, Val Loss: 0.053103
2025-07-08 15:16:27,943 - polytope_nn - INFO - Epoch 12/2000000 - Train Loss: 0.332561, Val Loss: 0.095801
2025-07-08 15:16:27,987 - polytope_nn - INFO - Epoch 13/2000000 - Train Loss: 0.192859, Val Loss: 0.064341
2025-07-08 15:16:28,031 - polytope_nn - INFO - Epoch 14/2000000 - Train Loss: 0.115424, Val Loss: 0.051763
2025-07-08 15:16:28,076 - polytope_nn - INFO - Epoch 15/2000000 - Train Loss: 0.362009, Val Loss: 0.128609
2025-07-08 15:16:28,122 - polytope_nn - INFO - Epoch 16/2000000 - Train Loss: 0.117143, Val Loss: 0.058641
2025-07-08 15:16:28,165 - polytope_nn - INFO - Epoch 17/2000000 - Train Loss: 0.335812, Val Loss: 0.082517
2025-07-08 15:16:28,211 - polytope_nn - INFO - Epoch 18/2000000 - Train Loss: 0.079342, Val Loss: 0.060753
2025-07-08 15:16:28,257 - polytope_nn - INFO - Epoch 19/2000000 - Train Loss: 0.104123, Val Loss: 0.047914
2025-07-08 15:16:28,304 - polytope_nn - INFO - Epoch 20/2000000 - Train Loss: 0.097466, Val Loss: 0.050452
2025-07-08 15:17:31,553 - polytope_nn - INFO - Visualization @ epoch 20: 63.25s
```

From this we can see that a single visualisation step takes more than a minute for a network of this size, and 
profiling shows that most of this time is spent in hashing things on the CPU, not the GPU.
I would like you to find a way to do the calculation of the hash during the forward pass on the GPU, ideally 
without storing the activation vector in memory, and instead having a hash function that can be updated
commutatively so each ReLU unit can update the final hash while it calculates the forward pass.

I want you to:

1) Create a plausible plan for improving and speeding up the code.
2) Implement that plan.
3) Re-run the script with the specified command line, and observe if a speedup indeed took place -- e.g. check
that (a) the visualisation was sped up and (b) the sum of 10 training steps and the visualisation together was
sped up.

It is frightfully easy to speed up the visualisation step but slow down the training steps so much that 10
training steps and 1 visualisation step get *slower*.

Please also verify that the image output is the same between the pre-change and post-change version, to ensure
that the changes do not break anything.

I then allowed both models to churn for a while. Both models provided changes, but Gemini failed to actually verify that the results are the same. Claude one-shotted the problem; Gemini needed the following additional prompt:

I have run your example code, and checked the output. The output images are not identical between the
pre-change and post-change version, and even the training loss changed. FWIW, none of the polytopes
are visible in your version. Could you re-check your work, and this time make sure you check whether
the outputs are the same?

With that extra prodding / prompting, the solution provided by Gemini worked flawlessly, and was even a tiny bit faster than the Claude version.

Let's look at the code that both models generated: The Gemini branch and the Claude branch. Reading the changes, a few things become clear:

  1. Gemini shot itself in the foot on the RNG by generating a bunch of random hash coefficients, and that messed up the state of the RNG, so the training runs were no longer comparable pre/post change.
  2. Gemini is using torch.matmul for the hash computation, whereas Claude is computing the hash as torch.sum( A * B ).
  3. Claude has broken up the code into more, smaller functions, whereas Gemini didn't. Claude's code is mildly more readable, Gemini's is the more minimal change.
Interesting stuff. Neither solution is quite what I had in mind, but they are good enough for the moment, and provide a pretty significant speedup over the (also vibe-coded) stuff that I started out with. This is the first time for me that a coding assistant helped me optimize code in a nontrivial manner, and that's ... certainly something.
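The RNG issue from the first point is easy to reproduce in miniature. The sketch below uses Python's standard `random` module rather than PyTorch, and the function names are made up. It is not a reconstruction of what either branch did, only an illustration of why drawing extra random numbers from a shared generator changes every later draw, and how a dedicated generator avoids that.

```python
import random


def next_training_draws(n=3):
    """Stand-in for the random numbers the training loop consumes."""
    return [random.random() for _ in range(n)]


def coefficients_from_global_rng(k):
    """Draws from the shared global RNG, which advances its state."""
    return [random.getrandbits(32) for _ in range(k)]


def coefficients_from_private_rng(k, seed=999):
    """Draws from a dedicated generator; the global RNG is left untouched."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(k)]


random.seed(12345678)
baseline = next_training_draws()

random.seed(12345678)
coefficients_from_global_rng(8)
perturbed = next_training_draws()  # differs from the baseline

random.seed(12345678)
coefficients_from_private_rng(8)
isolated = next_training_draws()   # identical to the baseline

if __name__ == "__main__":
    print(perturbed == baseline, isolated == baseline)
```

In PyTorch the analogous tool would be a separate `torch.Generator` passed via the `generator=` argument of the sampling functions, so that the hash coefficients do not consume numbers from the default generator.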

Anyhow, with these optimizations I can now run my data visualisation movie generation on slightly larger NNs with millions of parameters, so more studying ahead. I now need to figure out how to upload YouTube videos programmatically, but in the meantime, here is a video of training a 100-neuron, 10 layer deep network on the "circle drawing" task from my previous posts. Vibe coding randomly changed the color of my lines, but hey, that's ok.

As per usual, there are more questions than answers in this video. The thing that puzzles me most is the relative "instability" of the training in later epochs. This is visible in "flickers" where seemingly randomly the SGD step hits on a vastly higher loss, with parts of the screen turning black and loss spiking, and then the training needs to recover. Interestingly, the geometry of the polytopes doesn't change a lot in these situations, but the linear function on many of them changes at once, in a way that is very detrimental to overall performance. Once programmatic uploading works, I'll upload many more videos, because one of the intriguing observations I have is the following:

When training diverges (for larger and deeper nets), the divergence starts by first messing up the linear functions, and only after they are gloriously messed up, the geometry of the polytopes starts to go haywire, too.

Until then!






Sunday, July 06, 2025

A non-anthropomorphized view of LLMs

In many discussions where questions of "alignment" or "AI safety" crop up, I am baffled by seriously intelligent people attributing almost magical human-like powers to something that - in my mind - is just MatMul with interspersed nonlinearities.

In one of these discussions, somebody correctly called me out on the simplistic nature of this argument - "a brain is just some proteins and currents". I felt like I should explain my argument a bit more, because it feels less simplistic to me:

The space of words

The tokenization and embedding step maps individual words (or tokens) to some \(\mathbb{R}^n\) vectors. So let us imagine for a second that we have \(\mathbb{R}^n\) in front of us. A piece of text is then a path through this space - going from word to word to word, tracing a (possibly convoluted) line.

Imagine now that you label each of the "words" that form the path with a number: The last word with 1, counting upward as you walk back along the path until you hit the first word or the maximum context length \(c\). If you've ever played the game "Snake", picture something similar, but played in very high-dimensional space - you're moving forward through space with the tail getting truncated off.

The LLM takes your previous path into account, calculates probabilities for the next point to go to, and then makes a random pick into the next point according to these probabilities. An LLM instantiated with a fixed random seed is a mapping of the form \((\mathbb{R}^n)^c \mapsto (\mathbb{R}^n)^c\).
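As a toy illustration of this picture, here is a sketch that works on token ids instead of \(\mathbb{R}^n\) vectors (a real model would embed them first). The four-token "model" and all names are made up. What it shows is that, once the seed is fixed, sampling becomes a deterministic map from one context window to the next.

```python
import random
from typing import List, Tuple

Context = Tuple[int, ...]  # token ids; a real model would embed these into R^n


def toy_next_token_probs(context: Context) -> List[float]:
    """Hypothetical 4-token 'model': prefers the token after the last one."""
    vocab_size = 4
    last = context[-1] if context else 0
    probs = [0.1] * vocab_size
    probs[(last + 1) % vocab_size] = 0.7
    return probs


def step(context: Context, rng: random.Random, max_len: int) -> Context:
    """Sample one token and truncate the tail, like the snake moving forward."""
    probs = toy_next_token_probs(context)
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return (context + (token,))[-max_len:]


def generate(context: Context, steps: int, seed: int, max_len: int = 8) -> Context:
    """With a fixed seed, the whole path is a deterministic function of the input."""
    rng = random.Random(seed)
    for _ in range(steps):
        context = step(context, rng, max_len)
    return context


if __name__ == "__main__":
    print(generate((0,), steps=20, seed=7))
    print(generate((0,), steps=20, seed=7))  # same path again
```

Changing the seed or the starting context gives a different path, but nothing in the loop has any state beyond the context window and the random generator.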

In my mind, the paths generated by these mappings look a lot like strange attractors in dynamical systems - complicated, convoluted paths that are structured-ish.

Learning the mapping

We obtain this mapping by training it to mimic human text. For this, we use approximately all human writing we can obtain, plus corpora written by human experts on a particular topic, plus some automatically generated pieces of text in domains where we can automatically generate and validate them.

Paths to avoid

There are certain language sequences we wish to avoid - because the sequences these models generate try to mimic human speech in all its empirical structure, but we feel that some of the things that humans have empirically written are very undesirable to be generated. We also feel that a variety of other paths should ideally not be generated, if - when interpreted by either humans or other computer systems - undesirable results arise.

We can't specify strictly in a mathematical sense which paths we would prefer not to generate, but we can provide examples and counterexamples, and we try to hence nudge the complicated learnt distribution away from them.

"Alignment" for LLMs

Alignment and safety for LLMs mean that we should be able to quantify and bound the probability with which certain undesirable sequences are generated. The trouble is that we largely fail at describing "undesirable" except by example, which makes calculating bounds difficult.

For a given LLM (without random seed) and sequence, it is trivial to calculate the probability of the sequence to be generated. So if we had a way of somehow summing or integrating over these probabilities, we could say with certainty "this model will generate an undesirable sequence once every N model evaluations". We can't, currently, and that sucks, but at the heart, this is the mathematical and computational problem we'd need to solve.
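Written out as a sketch: the symbols \(x_t\) for the \(t\)-th token, \(p_\theta\) for the model's next-token distribution, and \(U\) for the set of undesirable sequences are notation introduced here, and each sequence is assumed to be a complete output that fits within the context length \(c\).

\[ P(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_1, \dots, x_{t-1}), \qquad T \le c \]

\[ P(U) = \sum_{x \in U} P(x) \]

The first expression is the easy part, since it is just a product of the model's own next-token probabilities. The second is the sum we cannot currently evaluate, because \(U\) is only described by example. If we could, the "once every N model evaluations" figure would be roughly \(N \approx 1 / P(U)\).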

The surprising utility of LLMs

LLMs solve a large number of problems that could previously not be solved algorithmically. NLP (as the field was a few years ago) has largely been solved.

I can write a request in plain English to summarize a document for me and put some key datapoints from the document in a structured JSON format, and modern models will just do that. I can ask a model to generate a children's book story involving raceboats and generate illustrations, and the model will generate something that is passable. And much more, all of which would have seemed like absolute science fiction 5-6 years ago.

We're on a pretty steep improvement curve, so I expect the number of currently-intractable problems that these models can solve to keep increasing for a while.

Where anthropomorphization loses me

The moment that people ascribe properties such as "consciousness" or "ethics" or "values" or "morals" to these learnt mappings is where I tend to get lost. We are speaking about a big recurrence equation that produces a new word, and that stops producing words if we don't crank the shaft.

To me, wondering if this contraption will "wake up" is as bewildering as asking a computational meteorologist whether he is afraid that his numerical weather simulation will "wake up".

I am baffled that the AI discussions seem to never move away from treating a function to generate sequences of words as something that resembles a human. Statements such as "an AI agent could become an insider threat so it needs monitoring" are simultaneously unsurprising (you have a randomized sequence generator fed into your shell, literally anything can happen!) and baffling (you talk as if you believe the dice you play with had a mind of their own and could decide to conspire against you).

Instead of saying "we cannot ensure that no harmful sequences will be generated by our function, partially because we don't know how to specify and enumerate harmful sequences", we talk about "behaviors", "ethical constraints", and "harmful actions in pursuit of their goals". All of these are anthropocentric concepts that - in my mind - do not apply to functions or other mathematical objects. And using them muddles the discussion, and our thinking about what we're doing when we create, analyze, deploy and monitor LLMs.

It muddles the public discussion, too. We have many historical examples of humanity ascribing bad random events to "the wrath of god(s)" (earthquakes, famines, etc.), "evil spirits" and so forth. The fact that intelligent, highly educated researchers talk about these mathematical objects in anthropomorphic terms makes the technology seem mysterious, scary, and magical.

We should think in terms of "this is a function to generate sequences" and "by providing prefixes we can steer the sequence generation around in the space of words and change the probabilities for output sequences". And for every possible undesirable output sequence of a length smaller than \(c\), we can pick a context that maximizes the probability of this undesirable output sequence.

This is a much clearer formulation, and it helps articulate the problems to solve more precisely.

Why many AI luminaries tend to anthropomorphize

Perhaps I am tilting at windmills, or rather fighting a self-selection bias: A fair number of current AI luminaries have self-selected by their belief that they might be the ones getting to AGI - "creating a god" so to speak, the creation of something like life, as good as or better than humans. You are more likely to choose this career path if you believe that it is feasible, and that current approaches might get you there. Possibly I am asking people to "please let go of the belief that you based your life around" when I am asking for an end to anthropomorphization of LLMs, which won't fly.

Why I think human consciousness isn't comparable to an LLM

The following is uncomfortably philosophical, but: In my worldview, humans are dramatically different things than a function \((\mathbb{R}^n)^c \mapsto (\mathbb{R}^n)^c\). For hundreds of millions of years, nature generated new versions, and only a small number of these versions survived. Human thought is a poorly-understood process, involving enormously many neurons, extremely high-bandwidth input, an extremely complicated cocktail of hormones, constant monitoring of energy levels, and millions of years of harsh selection pressure.

We understand essentially nothing about it. In contrast to an LLM, given a human and a sequence of words, I cannot begin putting a probability on "will this human generate this sequence". 

To repeat myself: To me, the idea that any human concept such as ethics, will to survive, or fear applies to an LLM appears as strange as discussing the feelings of a numerical meteorology simulation.

The real issues

The function class represented by modern LLMs is very useful. Even if we never get anywhere close to AGI and just deploy the current state of technology everywhere where it might be useful, we will get a dramatically different world. LLMs might end up being as impactful as electrification.

My grandfather lived from 1904 to 1981, a period which encompassed moving from gas lamps to electric, the replacement of horse carriages by cars, nuclear power, transistors, all the way to computers. It also spanned two world wars, the rise of Communism and Stalinism, almost the entire lifetime of the USSR and GDR etc. The world at his birth looked nothing like the world when he died.

Navigating the dramatic changes of the next few decades while trying to avoid world wars and murderous ideologies is difficult enough without muddying our thinking.

Thursday, May 22, 2025

Some experiments to help me understand Neural Nets better, post 4 of N

After the previous blog posts here, here, and here, a friend of mine pointed me to some literature to read, and I will do so now :-).

The papers on my reading list are:

1. https://proceedings.mlr.press/v80/balestriero18b.html - Randall Balestriero's paper on DNNs as splines.
2. https://arxiv.org/abs/1906.00904 - ReLU networks have surprisingly few activation patterns (2019)
3. https://arxiv.org/abs/2305.09145 - Deep ReLU networks have surprisingly simple polytopes (2023)
4. https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2023.1274831/full

I'll blog more once I get around to reading them all.

Thursday, April 10, 2025

Some experiments to help me understand Neural Nets better, post 3 of N

What is this? After my first post on the topic, 9 months elapsed before I posted again, and now I am posting within days of the last post?

Anyhow, after my last post I could not resist and started running some experiments trying to see whether I could induce "overfitting" in the neural networks I had been training - trying to get a heavily overparametrized neural network to just "memorize" the training points so it generalizes poorly.

In the experiments I ran in previous posts, one of the key advantages is that I know the "true distribution" from which we are drawing our training data -- the input image. An overfit network would hence find ways to color the points in the training data correctly, but somehow not do so by drawing a black ring on white background (so it would be correct on the training data but fail to generalize).
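To illustrate what "knowing the true distribution" buys us, here is a minimal sketch of how such training data could be drawn. It is not the code used for these experiments: the ring radii, the centering, the helper names `is_black` and `sample_training_points`, and sampling distinct pixels uniformly are all assumptions. Only the 1024*1024 image size and the black-ring-on-white target come from the post.

```python
import random

SIZE = 1024  # image side length; the post works on a 1024*1024 grid


def is_black(x, y, inner=0.25, outer=0.4):
    """Assumed ground truth: a black ring centered in the image.

    `inner` and `outer` are hypothetical radii, as fractions of the image size.
    """
    dx = x / SIZE - 0.5
    dy = y / SIZE - 0.5
    r2 = dx * dx + dy * dy
    return inner ** 2 <= r2 <= outer ** 2


def sample_training_points(n, seed=0):
    """Draw n distinct pixels uniformly and label each with the ground truth."""
    rng = random.Random(seed)
    pixels = rng.sample(range(SIZE * SIZE), n)
    points = []
    for p in pixels:
        x, y = p % SIZE, p // SIZE
        points.append(((x, y), is_black(x, y)))
    return points


if __name__ == "__main__":
    data = sample_training_points(5000)
    black = sum(1 for _, label in data if label)
    print(f"{len(data)} points, {black} of them inside the ring")
```

Because `is_black` is defined for every pixel, a trained network can be compared against the whole image rather than just a held-out sample, which is what makes overfitting easy to spot visually.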

So the experiment I kicked off was the following: Start with a network that has many times more parameters than we have training points. Since we start with 5000 training points, I picked 30 layers of 30 neurons for a total parameter count of approximately 27000 parameters. If von Neumann said he can draw an elephant with 4 parameters and make it wriggle its trunk with 5, he'd certainly manage to fit 5000 training points with 27000 parameters?
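The parameter count is easy to sanity-check. The sketch below assumes a plain fully connected network with 2 inputs (the pixel coordinates) and a single output; the post does not spell out the input and output sizes, so treat those as assumptions. `mlp_parameter_count` is a hypothetical helper, not something from the experiment code.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    Each layer with n_in inputs and n_out outputs contributes
    n_in * n_out weights plus n_out biases.
    """
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum(n_in * n_out + n_out for n_in, n_out in pairs)


# Assumed shape: 2 inputs (x, y), 30 hidden layers of 30 neurons, 1 output.
LAYER_SIZES = [2] + [30] * 30 + [1]

if __name__ == "__main__":
    total = mlp_parameter_count(LAYER_SIZES)
    print("total parameters:", total)
    print("parameters per training point at 5000 points:", total / 5000)
```

Under these assumptions the total lands at roughly 27000, so the network has more than five parameters for every one of the 5000 training points.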

Anyhow, to my great surprise, there was no hint of overfitting:


The network very clearly learns to draw a circle instead of fitting individual points. That is somewhat surprising, but perhaps this is just an artifact of our training points being relatively "dense" in the space: 5000 training points out of 1024*1024 is still roughly 0.5%, which is a good chunk of the total space.

As a next step, I trained the same network, but with ever-reduced quantities of training data: 2500 points, 1250 points, 625 points, and 312 points. Surely training on 312 data points using 27000 parameters should generate clear signs of overfitting?
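To get a feel for how sparse these training sets are, the following sketch computes what share of the 1024*1024 grid each training set covers, and how many parameters the network has per training point. The 27000 figure is the approximate count quoted above; the helper names are made up for this example.

```python
GRID_POINTS = 1024 * 1024  # size of the input space, from the text
PARAMETERS = 27000         # approximate parameter count quoted in the text


def coverage_percent(n_points):
    """Share of the grid covered by n_points training points, in percent."""
    return 100.0 * n_points / GRID_POINTS


def parameters_per_point(n_points):
    """How many network parameters there are per training point."""
    return PARAMETERS / n_points


if __name__ == "__main__":
    for n in (5000, 2500, 1250, 625, 312):
        print(f"{n:5d} points: {coverage_percent(n):.3f}% of the grid, "
              f"{parameters_per_point(n):.1f} parameters per point")
```

At the low end the network has dozens of parameters available for every single training point, which is why one would expect memorization to be the easy way out.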

At 2500 points, while there is a noticeable slowdown in the training process, the underlying concept seems to be learnt just fine:

As we drop much lower, to 625 points, we can see how the network is struggling much more to learn the concept, but ... it still seems to have a strong bias toward creating a geometric shape resembling the ring instead of overfitting on individual points?

It appears that the learning process is slowed down - by epoch 6000 the network hasn't managed to reproduce the entire circle yet - and training seems to be less stable - but it looks as if the network is moving in the right direction. What happens if we halve the training points once more?

It's a bit of a mystery - I would have expected that by now we're clearly in a regime where the network should try to fit individual points, since we gave it just 0.03% of the points in the space. The network is clearly struggling to learn, and by epoch 6000 it is far from "ready" -- but it's certainly working towards a ring shape.

These experiments raise a number of questions for me:

1. It seems clear to me that the networks have some form of baked-in tendency to form contiguous areas - perhaps even a geometric shape - and the data needs to become very very sparse in order for true overfitting to occur. It's really unclear to me why we see the emergence of shapes here -- it would certainly be easy for the network to just pick the 312 polytopes in which the training points reside, and their immediate neighbors, and then have a steep linear function with big parameters to color just the individual dots black. But that's not what is happening here; there's some mechanism or process that leads to the emergence of a shape.
2. It almost seems like there is a trade-off -- if you have less data, you need to train longer, perhaps much longer. But it's really not clear to me that we will not arrive at comparatively good approximations even with 312 data points.

As a next step, I am re-running these experiments with 20000 epochs instead of 6000, to see if the network trained on very sparse training data catches up with the networks that have more data over time.