Wednesday, July 10, 2024

Someone is wrong on the internet (AGI Doom edition)

The last few years have seen a wave of hysteria about LLMs becoming conscious and then suddenly attempting to kill humanity. This hysteria, often expressed in scientific-sounding pseudo-bayesian language typical of the „lesswrong“ forums, has seeped into the media and from there into politics, where it has influenced legislation.

This hysteria arises from the claim that there is an existential risk to humanity posed by the sudden emergence of an AGI that then proceeds to wipe out humanity through a rapid series of steps that cannot be prevented.

Much of it is entirely wrong, and I will try to collect my views on the topic in this article - focusing on the „fast takeoff scenario“.

I had encountered strange forms of seemingly irrational views about AI progress before, and I made some critical tweets about the messianic tech-pseudo-religion I dubbed "Kurzweilianism" in 2014, 2016 and 2017 - my objection at the time was that believing in an exponential speed-up of all forms of technological progress looked too much like a traditional messianic religion, e.g. "the end days are coming, if we are good and sacrifice the right things, God will bring us to paradise, if not He will destroy us", dressed in techno-garb. I could never quite understand why people chose to believe Kurzweil, who, in my view, has largely had an abysmal track record predicting the future.

Apparently, the Kurzweilian ideas have mutated over time, and seem to have taken root in a group of folks associated with a forum called "LessWrong", a more high-brow version of 4chan where mostly young men try to impress each other by their command of mathematical vocabulary (not of actual math). One of the founders of this forum, Eliezer Yudkowsky, has become one of the most outspoken proponents of the hypothesis that "the end is nigh".

I have heard a lot of of secondary reporting about the claims that are advocated, and none of them ever made any sense to me - but I am also a proponent of reading original sources to form an opinion. This blog post is like a blog-post-version of a (nonexistent) YouTube reaction video of me reading original sources and commenting on them.

I will begin with the interview published at https://intelligence.org/2023/03/14/yudkowsky-on-agi-risk-on-the-bankless-podcast/

The proposed sequence of events that would lead to humanity being killed by an AGI is approximately the following:

  1. Assume that humanity manages to build an AGI, which is a computational system that for any decision "outperforms" the best decision of humans. The examples used are all zero-sum games with fixed rule sets (chess etc.).
  2. After managing this, humanity sets this AGI to work on improving itself, e.g. writing a better AGI.
  3. This is somehow successful and the AGI obtains an "immense technological advantage".
  4. The AGI also decides that it is in conflict with humanity.
  5. The AGI then coaxes a bunch of humans to carry out physical actions that enable it to then build something that kills all of humanity, in case of this interview via a "diamondoid bacteria that replicates using carbon, hydrogen, oxygen, nitrogen, and sunlight", that then kills all of humanity.
This is a fun work of fiction, but it is not even science fiction. In the following, a few thoughts:

Incorrectness and incompleteness of human writing


Human writing is full of lies that are difficult to disprove theoretically

As a mathematician with an applied bent, I once got drunk with another mathematician, a stack of coins, and a pair of pliers and some tape. The goal of the session was „how can we deform an existing coin as to create a coin with a bias significant enough to measure“. Biased coins are a staple of probability theory exercises, and exist in writing in large quantities (much more than loaded dice).

It turns out that it is very complicated and very difficult to modify an existing coin to exhibit even a reliable 0.52:0.48 bias. Modifying the shape needs to be done so aggressively that the resulting object no longer resembles a coin, and gluing two discs of uneven weight together so that they achieve nontrivial bias creates an object that has a very hard time balancing on its edge.

An AI model trained on human text will never be able to understand the difficulties in making a biased coin. It needs to be equipped with actual sensing, and it will need to perform actual real experiments. For an AI, a thought experiment and a real experiment are indistinguishable.

As a result, any world model that is learnt through the analysis of text is going to be a very poor approximation of reality. 

Practical world-knowledge is rarely put in writing

Pretty much all economies and organisations that are any good at producing something tangible have an (explicit or implicit) system of apprenticeship. The majority of important practical tasks cannot be learnt from a written description. There has never been a chef that became a good chef by reading sufficiently many cookbooks, or a woodworker that became a good woodworker by reading a lot about woodworking.

Any skill that affects the real world has a significant amount of real-world trial-and-error involved. And almost all skills that affect the real world involve large quantities of knowledge that has never been written down, but which is nonetheless essential to performing the task.

The inaccuracy and incompleteness of written language to describe the world leads to the next point:

No progress without experiments

No superintelligence can reason itself to progress without doing basic science

One of the most bizarre assumptions in the fast takeoff scenarios is that somehow once a super-intelligence has been achieved, it will be able to create all sorts of novel inventions with fantastic capabilities, simply by reasoning about them abstractly, and without performing any basic science (e.g. real-world experiments that validate hypotheses or check consistency of a theory or simulation with reality).

Perhaps this is unsurprising, as few people involved in the LessWrong forums and X-Risk discussions seem to have any experience in manufacturing or actual materials science or even basic woodworking.

The reality, though, is that while we have made great strides in areas such as computational fluid dynamics (CFD), crash test simulation etc. in recent decades, obviating the need for many physical experiments in certain areas, reality does not seem to support the thesis that technological innovations are feasible „on paper“ without extensive and painstaking experimental science.

Concrete examples:
  1. To this day, CFD simulations of the air resistance that a train is exposed to when hit by wind at an angle need to be experimentally validated - simulations have the tendency to get important details wrong.
  2. It is safe to assume that the state-supported hackers of the PRCs intelligence services have stolen every last document that was ever put into a computer at all the major chipmakers. Having all this knowledge, and the ability to direct a lot of manpower at analyzing these documents, have not yielded the knowledge necessary to make cutting-edge chips. What is missing is process knowledge, e.g. the details of how to actually make the chips.
  3. Producing ballpoint pen tips is hard. There are few nations that can reliably produce cheap, high-quality ballpoint pen tips. China famously celebrated in 2017 that they reached that level of manufacturing excellence.
Producing anything real requires a painstaking process of theory/hypothesis formation, experiment design, experiment execution, and slow iterative improvement. Many physical and chemical processes cannot be accelerated artificially. There is a reason why it takes 5-8 weeks or longer to make a wafer of chips.

The success of of systems such as AlphaGo depend on the fact that all the rules of the game of Go are fixed in time, and known, and the fact that evaluating the quality of a position is cheap and many different future games can be simulated cheaply and efficiently.

None of this is true for reality: 
  1. Simulating reality accurately and cheaply is not a thing. We cannot simulate even simple parts of reality to a high degree of accuracy (think of a water faucet with turbulent flow splashing into a sink). 
  2. The rules for reality are not known in advance. Humanity has created some good approximations of many rules, but both humanity and a superintelligence still need to create new approximations of the rules by careful experimentation and step-wise refinement.
  3. The rules for adversarial and competitive games (such as a conflict with humanity) are not stable in time.
  4. Evaluating any experiment in reality has significant cost, particularly to an AI.
A thought experiment I often use for this is: 

Let us assume that scaling is all you need for greater intelligence. If that is the case, Orcas or Sperm Whales are already much more intelligent than the most intelligent human, so perhaps an Orca or a Sperm Whale is already a superintelligence. Now imagine an Orca or Sperm Whale equipped with all written knowledge of humanity and a keyboard with which to email people. How quickly could this Orca or Sperm Whale devise and execute a plot to kill all of humanity?

People that focus on fast takeoff scenarios seem to think that humanity has achieved the place it has by virtue of intelligence alone. Personally, I think there are at least three things that came together: Bipedalism with opposable thumbs, an environment where you can have fire, and intelligence.

If we lacked any of the three, we would not have built any of our tech. Orcas and Sperm Whales lack thumbs and fire, and you can’t think yourself to world domination.


Superintelligence will also be bound by fundamental information-theoretic limits

The assumption that superintelligences can somehow simulate reality to arbitrary degrees of precision runs counter to what we know about thermodynamics, computational irreducibility, and information theory.

A lot of the narratives seem to assume that a superintelligence will somehow free itself from constraints like „cost of compute“, „cost of storing information“, „cost of acquiring information“ etc. - but if I assume that I assume an omniscient being with infinite calculation powers and deterministically computational physics, I can build a hardcore version of Maxwells Demon that incinerates half of the earth by playing extremely clever billards with all atoms in the atmosphere. No diamandoid bacteria (whatever that was supposed to mean) necessary.

The reason we cannot build Maxwells Demon, and no perpetuum mobile, is that there is a relationship between information theory and thermodynamics, and nobody, including no superintelligence, will be able to break it.

Irrespective of whether you are a believer or an atheist, you cannot accidentally create capital-G God, even if you can build a program that beats all primates on earth at chess. Cue reference to the Landauer principle here.

Conflicts (such as an attempt to kill humanity) have no zero-risk moves

Traditional wargaming makes extensive use of random numbers - units have a kill probability (usually determined empirically), and using random numbers to model random events is part and parcel for real-world wargaming. This means that a move “not working”, something going horrendously wrong is the norm in any conflict. There are usually no gainful zero-risk moves; e.g. every move you make does open an opportunity for the opponent.

I find it somewhat baffling that in all the X-risk scenarios, the superintelligence somehow finds a sequence of zero-risk or near-zero risk moves that somehow yield the desired outcome, without humanity finding even a shred of evidence before it happens.

A more realistic scenario (if we take the far-fetched and unrealistic idea of an actual synthetic superintelligence that decides on causing humans harm for granted) involves that AI making moves that incur risk to the AI based on highly uncertain data. A conflict would therefore not be brief, and have multiple interaction points between humanity and the superintelligence.


Next-token prediction cannot handle Kuhnian paradigm shifts

Some folks have argued that next-token prediction will lead to superintelligence. I do not buy it, largely because it is unclear to me how predicting the next token would deal with Kuhnian paradigm shifts. Science proceeds in fits and bursts; and usually you stay within a creaky paradigm until there is a „scientific revolution“ of sorts. The scientific revolution necessarily changes the way that language is produced — e.g. a corpus of all of human writing prior to a scientific revolution is not a good representation of the language used after a scientific revolution - but the LLM will be trained to mimic the distribution of the training corpus. People point to in-context learning and argue that LLMs can incorporate new knowledge, but I am not convinced of that yet - the fact that all current models fail at generating a sequence of words that - when cut into 2-tuples - occur rarely or never in the training corpus shows that ICL is extremely limited in the way that it can adjust the distribution of LLM outputs.


Enough for today. Touch some grass, build some stuff

In theory, theory equals practice. In practice it doesn't. Stepping out of the theoretical realm of software (where generations of EE and chip engineers sacrificed their lives to give software engineers an environment where theory is close to practice most of the time) into real-world things that involve dust, sun, radiation, and equipment chatter is a sobering experience that we should all do more often. It's easy to devolve into scholasticism if you're not building anything.



Thursday, July 04, 2024

Some experiments to help me understand Neural Nets better, post 1 of N

While I have been a sceptic of using ML and AI in adversarial (security) scenarios forever, I also quite like the fact that AI/ML has become important, if only to make me feel like my Math MSc (and abortive Math PhD) were not a waste of time.

I am a big proponent of "bottom-up" mathematics: Playing with a large number of examples to inform conjectures to be dealt with later. I tend to run through many experiments to build intuition; partly because I have crippling weaknesses when operating purely formally, partly because most of my mathematics is somewhat "geometric intuition" based -- e.g. I rely a lot on my geometric intuition for understanding problems and statements.

For a couple years I've wanted to build myself a better intuition about what deep neural networks actually "do". There are folks in the community that say "we cannot understand them", and folks that say "we believe in mechanistic interpretability, and we have found the neuron to recognize dogs"; I never found either statement to be particularly convincing.

As a result, earlier this year, I finally found time to take a pen, pencil, and wastebasket and began thinking a bit about what happens when you send data through a neural network consisting of ReLU units. Why only ReLUs? Well, my conjecture is that ReLUs are as good as anything, and they are both reasonably easy to understand and actually used in practical ML applications. They are also among the "simplest examples" to work with, and I am a big fan of trying the simple examples first.

This blog post shares some of my experiments and insights; I called it the "paper plane or origami perspective to deep learning". I subsequently found out that there are a few people that have written about these concepts under the name "the polytope lens", although this seems to be a fringe notion in the wider interpretability community (which I find strange, because - unsurprisingly - I am pretty convinced this is the right way to think about NNs).

Let's get started. In order to build intuition, we're going to work with a NN that is supposed to learn a function from R^2 to R - essentially learning a grayscale image. This has several advantages:

1. We can intuitively understand what the NN is learning.
2. We can simulate training error and generalisation errors by taking very high-resolution images and training on low-resolution samples.
3. We stay within the realm of low-dimensional geometry for now, which is something most of us have an intuitive understanding of. High dimensions will create all sorts of complications soon enough.

Let's begin by understanding a 2-dimensional ReLU neuron - essentially the function f(x, y) = max( ax + by + c, 0) for various values of a, b, and c.

This will look a bit like a sheet of paper with a crease in it:

How does this function change if we vary the parameters a, b, or c? Let's begin by varying a:

Now let's have a look at varying b:
And finally let's have a look at varying c:

So the parameters a, b, c really just decide "in which way" the plane should be folded / creased, and the steepness and orientation of the non-flat part. It divides the plane into halfspaces; the resulting function is 0 on one half-plane and linear (respectively affine) on the other.

As a next step, let's imagine a single-layer ReLU network that takes the (x,y) coordinates of the plane, and then feeds it into 10 different ReLU neurons, and then combines the result by summing them using individual weights.

The resulting network will have 3 parameters to learn for each neuron: a, b, and c. Each "neuron" will represent a separate copy of the plane that will then be combined (linearly, additively, with a weight) into the output function. The training process will move the "creases" in the paper around until the result approximates the desired output well.

Let's draw that process when trying to learn the picture of a circle: The original is here:





This shows us how the network tries to incrementally move the creases around so that on each of the convex areas that are created by the creases, it can choose a different affine function (with the condition that on the "creases" the functions will take on the same value).

Let's do another movie, this time with a higher number of first-layer neurons - 500. And let's see how well we will end up approximating the circle.


Aside from being mesmerizing to watch, this is also kinda intriguing and raises a bunch of questions:

  1. I don't understand enough about Adam as an optimizer to understand where the very visible "pulse" in the optimization process is coming from. What's going on here?
  2. I am pretty surprised by the fact that so many creases end up being extremely similar -- what would cause them to bundle up into groups in the way they do? The circle is completely rotation invariant, but visually the creases seem to bunch into groups much more than random distribution would suggest. Why?
  3. It's somewhat surprising how difficult it appears to be to learn a "sharp" edge, the edge between white and black in the above diagram is surprisingly soft. I had expected it to be easier to learn to have a narrow polytope with very large a/b constants to create a sharp edge, somehow this is difficult? Is this regularization preventing the emergence of sharp edges (by keeping weights bounded)?
Clearly, there's work to do. For now, some entertainment: Training the same 500-neuron single-layer network to learn to reproduce a picture of me with a face full of zinc sunscreen:



It's interesting (perhaps unsurprising) that the reproduced image feels visually like folded paper.

Anyhow, this was the first installment. I'll write more about this stuff as I play and understand more.
Steps I'll explain in the near future:
  1. What happens as you deepen your network structure?
  2. What happens if you train a network on categorical data and cross-entropy instead of a continuous output with MSE?
  3. What can we learn about generalization, overfitting, and overparametrization from these experiments?
See you soon.

Wednesday, January 31, 2024

The end of my Elastic/optimyze journey ...

Hey all,

== tl;dr ==

Today is my last day at Elastic. I'll take an extended break and focus on rest, family, health, writing, a bit of startup mentoring/investing, and some research - at least for a while.

I'm thankful for my great colleagues and my leadership at Elastic - y'all are stellar, even if I was often grumbly about some technical or architectural issues. I'll also miss the ex-optimyze team a lot; you were the best team anyone doing technically sophisticated work could wish for - great individuals, but in sum greater than the parts. I think the future for the tech we built is bright, particularly in light of the recent Otel events :)

========

Extended Version:

Today is my last day at Elastic, and with that, the last day of my journey with optimyze. I am leaving with a heavy heart, and complicated emotions. The 5 years of optimyze (3 years optimyze, 2 years optimyze-integration-into-Elastic) were intense - moderately intense on the work front, but extremely intense on the life front. Fate somehow managed to cram a lot of the ups and downs of midlife into a very small number of years.

A timeline:

  1. I left Google on the 31st of December 2018, and started optimyze.cloud in February 2019. I was highly motivated by the idea of building a company that aligns my ecological, economic, and technical interests. I visited the RSA conference in SF in spring 2019 to network and get people interested in our "cut-of-savings" consulting approach. I met Corey Quinn for coffee, and to this day much appreciate all the sage advice he had (even if I had to ignore some and learn the hard lesson myself).
  2. In May 2019, I was elated to (finally!) become a father for the first time.
  3. During 2019, my co-founder Sean and me mostly spent our time trying to get our "cut-of-savings" consulting business of the ground, only to be thwarted by the unfortunate combination that (a) companies nimble enough to do it were too small to make it worth it, and (b) companies big enough to make it worth it couldn't figure out how to make the contract work from a legal and accounting perspective.
    We did a few small gigs with friendly startups, and realized in late summer that a zero-instrumentation, multi-runtime, fleet-wide profiler was sorely missing as a product. We also realized that with BOLT making progress, there'd be real value in being a SaaS that sits on profiling data from different verticals. Hence the vision for optimyze.cloud as a product company was born.
  4. By late 2019, we had a prototype for unwinding C/C++ stacks using .eh_frame, and Python code, both from eBPF. We knew we could be really zero-friction in deployment, which made us very happy and excited.
  5. We decided to raise funding, and did so over the winter months - with the funding wire transfer finally hitting our (Silicon Valley Bank) account some time in early 2020. We started building, and hiring what would turn out the best team I've ever worked on.
  6. We had a working UI and product by late fall 2020, and the first in-prod deployments around the same time. One particular part of the stack was too slow (a particular query that we knew we'd need to move to a distributed K/V store, but hadn't done yet), and we spent the next few months rebuilding that part of the stack to use Scylla.
  7. We made some very bad calls on the investor relations front, I foolishly stumbled into a premature, fumbled, and retrospectively idiotic fundraise, into the middle of which my second child was born and the first acquisition offers came in.
  8. We launched Prodfiler in August 2021, to great acclaim and success. People loved the product, they loved the frictionless deployment, they loved the fact that all their stack traces were symbolized out of the box etc. - the product experience was great.
  9. In mid-October, we were acquired by Elastic with the closing date November 1st. My mother had a hip surgery from which complications arose, which led to her being transferred into an ICU.
    The day the deal closed, my mother fell into a coma, and she would never wake up again. I spent the next weeks shuttling back and forth between Zurich (where my wife and my two kids were) and Essen, Germany, to spend time bedside in the ICU.
    My mother died in the morning hours of Jan 1st 2022, a few hours after the fireworks.
  10. My elderly father needed a lot of help dealing with the aftermath; at the same time the transition into the Elastic tech stack was technically challenging to pull off.
  11. In Summer 2022, my father stumbled after a small leg surgery, fell, and hit his head; after some complications in the German medical system, it became clear that the injury had induced dementia. We transferred him to a specialist hospital in Berlin and ultimately to a care home close to my brother's family. Since then, I've been shuttling back and forth to see him often.
  12. After two years of hard work at Elastic, we finally managed to launch our product again in fall 2023.

So the entire thing was 5 years, in which I had two children, started a company, hired the best team I've known, launched a product I was (and am) immensely proud of, then lost my mother, most of my father ... and "reluctantly let go" of the company and product.

The sheer forces at play when you cram so much momentum into such a short time-frame will strain everybody; and they will strain everybody's support system. I'm extremely grateful for my entire support system, in particular my brother. I don't know how I would've fared without him, but I hope my kids will have as good a relationship with each other as I do with my brother.

I'm also grateful to the folks at Elastic and the optimyze team, who were extremely supportive and understanding as I was dealing with complications outside of work.

I'm proud that we managed to build, I am also proud that we managed to port it to the Elastic stack and re-launch it. Even after more than 2 years focused on porting the back-end, our profiler remains ahead of the competition. I'm optimistic about what Elastic and the team can build on top of our technology, in particular with OTel profiling moving toward reality.

At the same time, I am pretty spent. My productivity is nowhere near where I expect it to be (it never is - I have difficulty accepting that I am a finite human - but the gap is bigger than usual), and this leads to me having difficulty switching off: When I feel like I am not getting the things I want to get done done, my brain wants to compensate by working more - which is rarely the right step.

So, with a heavy heart, I decided that I will take an extended break. It's been intense, and emotional, and I need some time to rest and recover, and accompany my father on his last few steps into the darkness (or light?). 2019 and 2020 were among the happiest years of my life, the last chunk of 2021 and most of 2022 the most difficult parts of my life. 2023 was trending up, and I expect things to continue trending up for the foreseeable future.

I have planned to do a bit of writing (I think having done two companies, one bootstrapped and one with VC money, gives me a few things I'd like to pass on), perhaps a bit of angel investing or VC scouting, perhaps a bit of consulting where things of particular interest arise - but mostly, I intend to stretch, breathe, be there for my kids, and get a clear view of the horizon.

Monday, December 11, 2023

A list of factors that act(ed) as drag on the European Tech/Startup scene

This post is an adaption of a Twitter thread where I listed the various factors that in my experience led to a divergence of the trajectories of the US tech industry around Silicon Valley (SV) and the tech industry in Europe. Not all of these factors are current (some of the cultural ones are less pronounced today than they used to be), and some of them could be relatively easily fixable.

I'll add a separate post on policy suggestions at a later point.

I should also note that there's many great things about Europe -- I still live here, I'd build my next company here, and I don't think I'd ever want to migrate to SV. I'll also write about the advantages in the future.

Now, on to the list, which was spawned by a thread with @martin_casado and @bgurley on the website previously known as Twitter.
  1. Cultural factors: When I was growing up in the 90s, there was significant uncertainty in the labor market, and one way to achieve economic security was seeking a government job. In many European countries, running a limited liability construct into insolvency effectively bans you from running another one in the foreseeable future. The mentality of "start a company in your 20s, and if you fail, you can either try again or get a job" wasn't a thing. So we are operating from a risk-averse base, due to a labor market with then-sluggish job creation and strong incumbent effects. (Bert Hubert has written a more extensive article on the cultural factors here).
  2. A terrifyingly fragmented market, along legal, linguistic, and cultural lines. Imagine every US state had its own language, defense budget, legal system, tax system, culture, employment law etc. - in the US, you build a product and you tap into a market of 340m people. The biggest market in Europe is Germany at 80m, not even a quarter of the size. Then France (65m), Italy (59m), Spain (47m), and then things fragment into a long tail. By the time you hit 340m customers, you're operating in 9-10 countries, 7+ languages and legal systems etc.
  3. Equally fragmented capital markets that are individually much smaller. Take the US stock market and cut it into 10+ pieces. This has knock-on effects for IPOs: IPOs, when they happen, tend to be much smaller. Raising large amounts of capital is more difficult, while big wins are smaller. This has terrible knock-on effects all the way down to seedstage VCs: If the power law home run you're angling for is 1/10th the size of the home run in the US, early stage investors need to be way more risk averse. You can see this even today where most European VC funds will offer less money at worse terms than their US counterparts. It was much worse in 2006-2007, when the Samwers were almost the only game in town for VC in the EU.
    Smaller IPOs also mean that it is comparatively much more attractive to sell to an existing (US-based) giant.
  4. The absence of a DARPA to shoulder fundamental research risks in technology. Different stages of R&D require different investors. The government is in the strange situation that they can indirectly benefit from investments without having an ownership stake because it gets to tax GDP. That means at the extremely high risk end of R&D, fundamental research, it can afford to just finance many many long shots blindly and (comparatively) simply, as it doesn't need to track ownership. So how do you fund fundamental R&D without it devolving into scholasticism? Interestingly, the most basic test ("can I use this to cause some damage") is already helpful. Europe's defense sector has never since WW2 grasped it's role in advancing technology, and it's terribly fragmented, underfunded, and can't do much research. DARPA has financed the early-stage development of many enabling technologies. Having a guaranteed customer (DoD) for high risk research has enabled better and higher risk-taking, and had large downstream effects.
  5. Terrible legislation with regards to employee stock options. People talk about how many big companies in Europe are family-owned as if that is something good. It's also a symptom of legal systems that make (or made) it terribly difficult to give lots of equity to early employees. This is slowly changes through concerted lobbying, but it is still difficult in most jurisdictions, and not unified at all.
  6. The way the EU is constructed where the EU gives a directive and each country implements it's own flavor is worst-case for legal complexity. Imagine if every state got to re-implement its own flavor of each federal law.
  7. Founder Brain Drain. Why would an ambitious founder not go to where the markets are bigger, capital is easier to raise on better terms, and incentivizing early employees is easier?
  8. Ecosystem effects permit risk-taking by employees in SV. SV has such strong demand for talent that an employee can "take risks" on early stage startups because the next job is easy to get. If you live in a place with just 1-2 big employers, leaving with intent to return is riskier.
  9. Network effects and path dependence. The fragmentation of the market led to smaller players in search and ads that then sold to larger US-based players. Without the deep revenue streams, no European player had the capital or expertise to go into cloud. As a result, there is no European player with enough compute, or datasets, or capital to effectively compete in cloud or AI. China has homegrown players, even Russia has to some extent, Europe's closest equivalent are OVH and Hetzner, which sell on price, not on higher-level services.
  10. GDPR after effects: EUparl saw that in situations where US states are fragmented they can act as a standards body, and there's a weird effect of "if we cannot be relevant through tech, we can still be relevant through shaping the legal landscape", and that's what leads to this terrible idea of "Europe as regulatory superpower", where it is more important for members of EUparl to have done "something" than having done "something right" - a mentality that seems to prefer bad regulation over no regulation, when good regulation would be needed. GDPR led to higher market concentration in Ads, which arguably undermines privacy in a different way, and it's imposed huge compliance and convenience cost on everybody. But in EUparl it's celebrated as success, because hey, for once Europe was relevant (even if net effects are negative).
  11. Pervasive shortsightedness among EU national legislators, undermining the single market and passing poor laws with negative side effects for startup and capital formation. The best example is Germans "exit tax": Imagine you are an Angel Investor in the US but if you move out of state it triggers immediate cap gains on all your illiquid holdings/Angel Investments at the valuation of the last round. It essentially means you can't angel invest if you don't know if you'll have to move in the next 8-10 years because you don't know if you can afford the tax bill. It's hair-raisingly insane, and likely illegal under EU rules, but who wants to fight the German IRS in European court?
I think these are the most important factors that come to mind. I'll add more if I remember more of them.

Also, given that this post has a strong resonance with extreme "anti government" and "libertarian" types, please be aware that I am very much on a different area of the political spectrum (centre-left, somewhere where the social democrats used to reside historically in Germany). I am strongly in favor of good and competent regulation to ensure markets function, competition works, and customers are protected.

Tuesday, February 23, 2021

Book Review: "This Is How They Tell Me the World Ends"

This blog post is a review of the book "This Is How They Tell Me the World Ends" by Nicole Perlroth. The book tries to shed light on the "zero day market" and how the US government interacts in this market, as well on various aspects of nation-to-nation "cyberwarfare".

I was excited to see this book come out given that there are relatively few hats in this field I have not worn. I have worked in information security since the late 1990's; I was part of a youth culture that playfully pioneered most of the software exploitation techniques that are now used by all major military powers. I have run a business that sold technology to both defenders and offensive actors. I have written a good number of exploits and a paper clarifying the theoretical foundations for understanding them. I have trained governments and members of civil society on both the construction and the analysis of exploits, and on the analysis of backdoors and implants. I have spent several months of my life reading the disassembled code of Stuxnet, Duqu, and the Russian Uroburos. I spent half a decade at Google on supporting Google's defense against government attackers; I spent a few additional years in Project Zero trying to nudge the software industry toward better practices. Nowadays, I spend my time on efficiency instead of security.

I have always been close to, but never part of, the zero-day market. My background and current occupation give me a deep understanding of the subject, while not tying me economically to any particular perspective. I therefore feel qualified like few others to review the book.

"This Is How They Tell Me the World Ends" tackles an important question: What causes the vulnerability of our modern world to "cyberattacks"? Some chapters cover various real-world cyberattacks, some chapters try to shed light on the "market for exploits", and the epilogue of the book discusses ideas for a policy response.

The author managed to get access to a fantastic set of sources. Many things were captured on the record that were previously only discussed on background. Several chapters recount interviews with former practitioners in the exploit market, and these chapters provide a glimpse into the many fascinating and improbable personalities that make up the field. This is definitely a strong asset of the book.

Given the exciting and impactful nature of the "cyberwar" subject, the many improbable characters populating it, and the many difficult and nuanced policy questions in the field, the level of access and raw material the author gathered could have been enough for a fantastic book (or even two).

Unfortunately, "This Is How They Tell Me the World Ends" is not a fantastic book. The potential of the source material is diluted by a large number of inaccuracies or even falsehoods, a surprising amount of ethnocentricity and US-American exceptionalism (that, while being a European, I perceived to border on xenophobia), a hyperbolic narration style, and the impression of facts bent to support a preconceived narrative that has little to do with reality.

For the layperson (presumably the target audience of this book) the many half-truths and falsehoods make the book an untrustworthy guide to an important and difficult topic. For the expert, the book may be an entertaining, if jarring read, provided one has the ability to dig through a fair bit of mud to find some gold. I am confident that the raw material must be great, and where it shines through, the book is good.

Inaccuracies and Falsehoods

The topic is complex, and technical details can be difficult to get right and transmit clearly. A book without any errors cannot and should not be expected, and small technical errors should not concern the reader. That said, the book is full of severe and significant errors - key misunderstandings and false statements that are used as evidence and to support conclusions - and those do raise concerns.

I will highlight a few examples of falsehoods or misleading claims. I only found those falsehoods that overlapped with expertise of mine; extrapolating from this, I am afraid that there may be many more in the book.

The following examples are from the first third of the book; and they are illustrative of the sort of mistakes throughout: Facts are either twisted or exaggerated to the point of becoming demonstrably false; and these twists and exaggerations seem to always happen in support of a narrative that places an unhealthy focus on zero-days.

First, one of the more egregious falsehoods is the claim that NSA hacked into Google servers to steal data:

... the agency hacked its way into the internal servers at companies like Google and Yahoo to grab data before it was encrypted.
This simply did not happen. As far as anyone in the industry knows, in the case of Google, unencrypted network connections between datacenters were tapped. This may sound inconsequential, but undermines the central "zero days are how hacking happens" theme of the book.

Second, the entire description of zero-days is full of false claims and hyperbole:
Chinese spies used a single Microsoft zero-day to steal some of Silicon Valley's most closely held source code.
This alludes to the Aurora attacks on Google; but anyone that knows Google's internal culture knows that source code is not most closely held by design. Google always had a culture where every engineer could roam through almost all the code to help fix issues.
...Once hackers have figured out the commands or written the code to exploit it, they can scamper through the world's computer networks undetected until the day the underlying flaw is discovered
This is simply not true. While a zero-day exploit will provide access to a given machine or resource, it is not a magic invisibility cloak. The Chinese attackers were detected, and many other attackers are routinely detected in spite of having zero-day exploits.
...Only a select few multinationals are deemed secure enough to issue the digital certificates that vouch (...) that Windows operating system could trust the driver  (...) Companies keep the private keys needed to abuse their certificates in the digital equivalent of Fort Knox.
This section is at best misleading: The driver in question was signed with a stolen JMicron "end-entity" certificate. There are thousands of those, all with the authority to sign device drivers to be trusted, and the due diligence to issue one used to be limited to providing a fax of an ID and a credit card number.

The "select few multinationals" Perlroth writes about here are the certificate authorities that issue such "end-entity" certificates. It is true that a CA is required to keep their keys on a hardware security module (a very high-security setup), and that the number of CAs that can issue driver-signing certificates is limited (and falling).

The text makes it appear as if a certificate from a certificate authority (and hence from a hardware security module) had been stolen. This is simply false. End-entity certificates are issued to hardware vendors routinely, and many hardware vendors play fast and loose with them.

(It is widely rumored - but difficult to corroborate - that there used to be a thriving black market where stolen end-entity certificates were traded a few years ago; the going rate was between $30k and $50k if I remember correctly.

Ethnocentricity and US exceptionalism

As a non-US person, the strangest part of the book was the rather extreme ethnocentricity of the book: The US is equated with "respecting human rights", everything outside of the US is treated as both exotic and vaguely threatening, and the book obsesses about a "capability gap" where non-US countries somehow caught up with superior US technology.

This ranges from the benign-but-silly (Canberra becomes the "outback", and Glenn Greenwald lives "in the jungles of Brazil" - evoking FARC-style guerillas, when - as far as I am informed - he lives in a heavily forested suburb of Rio) to seriously impacting and distorting the narrative.

The author seems to find it unimaginable that exploitation techniques and the use of exploits are not a US invention. The text seems to insinuate that exploit technologies and "tradecraft" were invented at NSA and then "proliferated" outward to potentially human-rights-violating "foreign-born" actors via government contractors that ran training classes.

This is false, ridiculous, and insulting on multiple levels.

First off, it is insulting to all non-US security researchers that spent good parts of their lives pioneering exploit techniques.

The reality is that the net flow of software exploitation expertise out of NSA is negative: Half a generation of non-US exploit developers migrated to the US over the years and acquired US passports eventually. The US exploit supply chain has always been heavily dependent on "foreign-born" people. NSA will enthusiastically adopt outside techniques; I have yet to learn about any exploitation technique of the last 25 years that "leaked" out of NSA vs. being invented outside.

The book's prologue, when covering NotPetya, seems to imply that Russia had needed the Shadowbrokers leaks - ("American weapons at its disposal") - to cause severe damage. Anybody with any realistic visibility into both the history of heap exploitation and the vulnerability development community knows this to be absolutely wrong.

Secondly, it seems to willfully ignore recent US history with regards to human rights. Somehow implying that the French police or the Norwegian government have a worse human rights track record than the US government - which unilaterally kills people abroad without fair trial via the drone-strike program, relatively recently stopped torturing people, and keeps prisoners in Guantanamo for 15+ years by having constructed a legal grey zone outside of the Geneva Conventions - is a bit rich.

In the chapter on Argentina, Ivan Arce calls the author out on her worldview (which was one of my favorite moments in the book), but this seems to have not caused any introspection or change of perspective. This chapter also reveals an odd relationship to gender: The narrative focuses on men wreaking havoc, and women seem to exist to rein in the out-of-control hackers. Given that there are (admittedly few, but extremely capable) women and non-binary folks active in the zero-day world, I find this narrative puzzling.

There is also an undercurrent that everything bad is caused by nefarious foreign intervention: The author expresses severe doubts that the 2016 US election would have had the same outcome without "Russian meddling", and in the Epilogue writes "it is now easier for a rogue actor to (...) sabotage (...) the Boeing 737 Max", somehow managing to link a very US-American management failure to vague evil forces.

In its centricity on the US and belief in US exceptionalism, its noticeable grief about the 2016 US election, and the vague suspicion that everything bad must have a foreign cause, the reader learns more about the mindset of a certain subset of the US population than about cybersecurity or cyberwarfare.

Hyperbolic language

The book is also made more difficult to read by constant use of hyperbolic language. Exploits are capable of "crashing Spacecraft into earth", "detonated to steal data", and things always need to be "the most secure", "the most secret", and so forth. The book would have benefitted from the editor-equivalent of an equalizer to balance out the wording.

The good parts

There are several things to like about the book: The chapters that are based on interviews with former practitioners are fun and engaging to read. The history of software exploits is full of interesting and unorthodox characters, and these chapters provide a glimpse into their world and mindsets.

The book also improves as it goes on: The frequency of glaring falsehoods seems to decrease - which lets the fact that it is generally engaging come through.

Depending on what one perceives the thesis of the book to be, one can also argue that the book advances an important point. The general subject - "how should US government policy balance offensive and defensive considerations" - is a deep and interesting one, and there is a deep, important, and nuanced discussion to be had about this. If the underlying premise of the book is "this discussion needs to be had", then that is good. The book seems to go much beyond this (reasonable) premise, and seems to mistakenly identify the zero-day market as the root cause of pervasive insecurity.

As a result, the book contributes little of utility to a defensive policy debate. The main drivers of the cyber insecurity are hardly discussed until the Epilogue: The economic misincentives that cause the tech industry to earn hundreds of billions of dollars from creating the vulnerabilities in the first place (for every million earned through the sale of exploits, an order of magnitude or two more is earned through the sale of the software that creates the security flaw), and the organisational misincentives that keep effective regulation from arising (NSA - rightly - has neither mission or authority to regulate the tech industry into better software, so accusing them of not doing so is a bit odd). By placing too much emphasis on governments knowing about vulnerabilities, the book distracts from the economic forces that create a virtually infinite supply of them.

The Epilogue (while containing plenty to disagree with) was one of the stronger parts of the book. The shortness makes it a bit shallow, but it touches on many points that warrant a serious discussion. (Unfortunately, it again insinuates that "ex-NSA hackers tutor Turkish Generals in their tradecraft"). If anything, the Epilogue can be used as a good (albeit incomplete) list of topics to discuss in any cybersecurity policy class.

Concluding thoughts

I wish the book realized more of the potential that the material provided. The debate about the policy trade-offs for both offense and defense needs to be had (although there is less of a trade-off than most people think: Other countries have SIGINT agencies that can do offense, and defensive agencies focused on improving the overall security level of society; fixing individual bugs will not fix systemic misincentives), and a good book about that topic would be very welcome.

Likewise, a book that gives a layperson a good understanding of the zero-day trade and the practitioners in the trade would be both useful and fascinating.

The present book had the potential to become either of the above good books - the first one by cutting large parts of the book and expanding the Epilogue, the second one by rigorous editing and sticking to the truth.

So I regret having to write that the present book is mostly one of unfulfilled potential, and that the layperson needs to consult experts before taking any mentioned "fact" in the book at face value.

Wednesday, September 16, 2020

The missing OS

Preface:

When I joined Google in 2011, I quoted a quip of a friend of mine:
"There are roughly one and a half computers in the world, and Google has one of them."
The world has changed quite a bit since 2011, and there may possibly be half a dozen computers in the world now. That said, for the following text to make sense, when I say "the computer", I mean a very large assembly of individual machines that have been connected to make them act like one computer.

Actual blog post:

The tech landscape of modern microservice deployments can be confusing - it is fast-changing, with a proliferation of superficially similar projects claiming to do similar things. Even to me as someone fairly deeply into technology, it isn't always clear what precise purpose the different projects serve.

I've quipped repeatedly about "Datacenter OS" (at least here and here), and mused about it since I first left Google for my sabbatical in 2015. I recently had the chance to chat with a bunch of performance engineers (who sit very much at the crossing between Dev and Ops), and they reminded me to write up my thoughts. This is a first post, but there may be more coming (particularly on the security models for it).

Warning: This post is pure, unadulterated opinion. It is full of unsubstantiated unscientific claims. I am often wrong.

I claim the following:
When we first built computers, it took a few decades until we had the first real "operating systems". Before a 'real' OS emerged, there were a number of proto-OS -- collections of tools that had to be managed separately and cobbled together. There were few computers overall in the world, and if you wanted to work on one, you had to work at a large research institution or organization. These machines ran cobbled-together OSs that were unique to that computer.

Since approximately 2007, we're living through a second such period: The "single computer" model is replaced with "warehouse-sized computers". Initially, few organizations had the financial heft to have one of them, but cloud computing is making "lots of individual small computers" accessible to many companies that don't have a billion of cash for a full datacenter.

The hyperscalers (GOOG, FB, but also Tencent etc.) are building approximations to a "proto-datacenter-OS" internally; Amazon is externalizing some of theirs, and a large zoo of individual components for a Datacenter-OS exist as open-source projects.

What does not exist yet is an actual complete DatacenterOS that "regular" companies can just install.

There is a "missing OS" - a piece of software that you install on a large assembly of computers, and that transform this assembly of computers into "one computer".

What would a "Datacenter OS" consist of? If you look at modern tech stacks, you find that there is a surprising convergence - not in the actual software people are running, but in the "roles" that need to be filled. For each role, there are often many different available implementations.

The things you see in every large-scale distributed infrastructure are:

  1. Some form of cluster-wide file system. Think GFS/Colossus if you are inside Google, GlusterFS or something like it if you are outside. Many companies end up using S3 because the available offerings aren't great.
  2. A horizontally scalable key-value store. Think BigTable if you are inside Google, or Cassandra, or Scylla, or (if you squint enough) even ElasticSearch.
  3. A distributed consistent key-value store. Think Chubby if you are inside Google, or etcd if you are outside. This is not directly used by most applications and mostly exists to manage the cluster.
  4. Some sort of pub/sub message queuing system. Think PubSub, or in some sense Kafka, or SQS on AWS, or perhaps RabbitMQ.
  5. A job scheduler / container orchestrator. A system that takes the available resources, and all the jobs that ought to be running, and a bunch of constraints, and then solves a constrained bin-packing optimization problem to make sure resources are used properly. Think Borg, or to some extent Kubernetes. This may or may not be integrated with some sort of MapReduce-style batch workload infrastructure to make use of off-peak CPU cycles.

I find it very worthwhile to think about "what other pieces do I have on a single-laptop-OS that I really ought to have on the DatacenterOS?".

People are building approximations of a process explorer via Prometheus and a variety of other data collection agents.

One can argue that distributed tracing (which everybody realizes they need) is really the Datacenter-OS-strace (and yes, it is crucially important). The question "what is my Datacenter-OS-syslog" is similarly interesting. 

A lot of the engineering that goes into observability is porting the sort of introspection capabilities we are used to having on a single machine to "the computer".

Is this "service mesh" that people are talking about just the DatacenterOS version of the portmapper?

There are other things for which we really have no idea how to build the equivalent. What does a "debugger" for "the computer" look like? Clearly, single-stepping on a single host isn't the right way to fix problems in modern distributed systems - your service may be interacting with dozens of other hosts that may be crashing at the same time (or grinding to a halt or whatever), and re-starting and single-stepping is extremely difficult.

Aside from the many monitoring, development, and debugging tools that need to be rebuilt for "the computer", there are many other - even more fundamental - questions that really have no satisfactory answer. Security is a particularly uncharted territory:

What is a "privileged process" for this computer? What are the privilege and trust boundaries? How does user management work? How does cross-service authentication and credential delegation work? How do we avoid re-introducing literally every logical single-machine privilege escalation that James Forshaw describes in his slides into our new OS and the various services running there? Is there any way that a single Linux Kernel bug in /mm does not spell doom for our entire security model?

To keep the post short:

In my opinion, the emerging DatacenterOS is the most exciting thing that has happened in computer science in decades. I sometimes wish I was better at convincing billionaires to give me a few hundred million dollars to invest in interesting problems -- because if there is a problem that I think I'd love to work on, it'd be a FOSS DatacenterOS - "install this on N machines, and you have 'a computer'".

A lot of the technological landscape is easier to understand if one asks the question: What function in "the computer" does this particular piece of the puzzle solve? What is the single-machine equivalent of this project?

This post will likely have follow-up posts, because there are many more ill-thought-out ideas I have on the topic:

  • Security models for a DatacenterOS
  • Kubernetes: Do you want to be the scheduler, or do you want to be the OS? Pick one.
  • How do we get the power of bash scripting, but for a cluster of 20k machines?

 

 

 



Friday, August 14, 2020

My Twitter-Discussion-Deescalation Policy

Twitter is great, and Twitter is terrible. While it enables getting in contact and starting loose discussions with a great number of people, and while it has certainly helped me broaden my perspectives and understanding of many topics, it also has a lot of downsides.

Most importantly, Twitter discussions, due to their immediacy of feedback and the fact that everybody is busy, often end up in shouting matches where "learning from each other while discussing a topic" (the actual purpose of a discussion) is forgotten.

Most importantly: Twitter can be very repetitive, and it can be very difficult to convey the context for complex topics - and nobody has time to repeat all the context in each Twitter discussion.

Today, I am recovering from a migraine attack that coincided with my kid having a cranky night, and as a result, I cut a few Twitter discussions short. The people on the receiving end of this "short-cutting" may rightly feel slighted, so I am writing this blog post in preparation for future similar situations.

There are some topics (often related to security or economics) about which I have thought for a reasonably long time. Particularly for security, we're talking about a few decades of hands-on experience with a fairly obsessive work on the topic, both on the theoretical and on the practical side. Rooted in this experience, I sometimes make statements on Twitter. These statements may be in conflict with what other people (you?) may think, and we may engage in a discussion. It is possible, though, that we will reach a point in the discussion where my feeling is "oh, in order to convey my point, I'd now need to spend 25 minutes conveying the context necessary for my point, and I only have a few hours in my day after I deduct sleep and other obligations".

At this point, I need to make a judgement call: Do I invest that time? I also need to make the call without having the most important context: Does the other side care about understanding me at all?

So if we end up in a Twitter discussion, and I reply to you with a link to this blog post at some point, please understand: I have run out of time to spend on this Twitter thread, and I need to cut the discussion short because conveying the necessary context is too time consuming without knowing that this is actually desired, and that our discussion is a mutual learning exercise.

If you very much care about the topic, and about understanding the perspective I have, I will happily schedule a 25-minute video call to discuss in person, and will obviously make an effort to understand your perspective, too. My DM's are open, ping me and I will send you a calendly link.