Wednesday, September 16, 2020

The missing OS

Preface:

When I joined Google in 2011, I quoted a quip of a friend of mine:
"There are roughly one and a half computers in the world, and Google has one of them."
The world has changed quite a bit since 2011, and there may be half a dozen computers in the world now. That said, for the following text to make sense, when I say "the computer", I mean a very large assembly of individual machines that have been connected to make them act like one computer.

Actual blog post:

The tech landscape of modern microservice deployments can be confusing - it is fast-changing, with a proliferation of superficially similar projects claiming to do similar things. Even to me, as someone fairly deep into technology, it isn't always clear what precise purpose the different projects serve.

I've quipped repeatedly about "Datacenter OS" (at least here and here), and mused about it since I first left Google for my sabbatical in 2015. I recently had the chance to chat with a bunch of performance engineers (who sit very much at the crossing between Dev and Ops), and they reminded me to write up my thoughts. This is a first post, but there may be more coming (particularly on the security models for it).

Warning: This post is pure, unadulterated opinion. It is full of unsubstantiated unscientific claims. I am often wrong.

I claim the following:
When we first built computers, it took a few decades until we had the first real "operating systems". Before a 'real' OS emerged, there were a number of proto-OSs -- collections of tools that had to be managed separately and cobbled together. There were few computers overall in the world, and if you wanted to work on one, you had to work at a large research institution or organization. Each of these machines ran a cobbled-together proto-OS that was unique to that particular computer.

Since approximately 2007, we're living through a second such period: The "single computer" model is being replaced by "warehouse-sized computers". Initially, few organizations had the financial heft to own one of them, but cloud computing is making "lots of individual small computers" accessible to many companies that don't have a billion in cash for a full datacenter.

The hyperscalers (GOOG, FB, but also Tencent etc.) are building approximations to a "proto-datacenter-OS" internally; Amazon is externalizing some of theirs, and a large zoo of individual components for a Datacenter-OS exist as open-source projects.

What does not exist yet is an actual complete DatacenterOS that "regular" companies can just install.

There is a "missing OS" - a piece of software that you install on a large assembly of computers, and that transform this assembly of computers into "one computer".

What would a "Datacenter OS" consist of? If you look at modern tech stacks, you find that there is a surprising convergence - not in the actual software people are running, but in the "roles" that need to be filled. For each role, there are often many different available implementations.

The things you see in every large-scale distributed infrastructure are:

  1. Some form of cluster-wide file system. Think GFS/Colossus if you are inside Google, GlusterFS or something like it if you are outside. Many companies end up using S3 because the available offerings aren't great.
  2. A horizontally scalable key-value store. Think BigTable if you are inside Google, or Cassandra, or Scylla, or (if you squint enough) even ElasticSearch.
  3. A distributed consistent key-value store. Think Chubby if you are inside Google, or etcd if you are outside. This is not directly used by most applications and mostly exists to manage the cluster.
  4. Some sort of pub/sub message queuing system. Think PubSub, or in some sense Kafka, or SQS on AWS, or perhaps RabbitMQ.
  5. A job scheduler / container orchestrator. A system that takes the available resources, all the jobs that ought to be running, and a bunch of constraints, and then solves a constrained bin-packing optimization problem to make sure resources are used properly (a toy sketch of this bin-packing follows the list). Think Borg, or to some extent Kubernetes. This may or may not be integrated with some sort of MapReduce-style batch workload infrastructure to make use of off-peak CPU cycles.
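
To make item 5 concrete, here is a deliberately naive toy sketch (my own illustration, not how Borg or Kubernetes actually work) of the core decision: jobs with CPU/RAM requests get packed, first-fit, onto machines with finite capacity. All numbers are made up.

    # Toy first-fit packing: place each job on the first machine with
    # enough remaining CPU and RAM. All numbers are made up.
    machines = [{"name": f"m{i}", "cpu": 16.0, "ram": 64.0} for i in range(3)]
    jobs = [
        {"name": "web", "cpu": 4.0, "ram": 8.0},
        {"name": "kvstore", "cpu": 8.0, "ram": 32.0},
        {"name": "batch", "cpu": 12.0, "ram": 16.0},
    ]

    def first_fit(jobs, machines):
        placement = {}
        for job in jobs:
            for m in machines:
                if m["cpu"] >= job["cpu"] and m["ram"] >= job["ram"]:
                    m["cpu"] -= job["cpu"]
                    m["ram"] -= job["ram"]
                    placement[job["name"]] = m["name"]
                    break
            else:
                placement[job["name"]] = None   # nothing fits right now
        return placement

    print(first_fit(jobs, machines))
    # {'web': 'm0', 'kvstore': 'm0', 'batch': 'm1'}

Everything that makes real schedulers useful - constraints, priorities, preemption, quota, rescheduling when machines die - is layered on top, but the core decision is this bin packing.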

I find it very worthwhile to think about "what other pieces do I have on a single-laptop-OS that I really ought to have on the DatacenterOS?".

People are building approximations of a process explorer out of Prometheus and a variety of other data collection agents - in effect, a "ps" or "top" for the entire cluster.
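
As a minimal sketch of what that looks like in practice today - assuming a Kubernetes cluster and a local kubeconfig, and using the official Python client - a cluster-wide "ps" is a short loop:

    # "ps" for "the computer": list every pod (roughly: every process)
    # in the cluster, with its state and the machine it runs on.
    # Assumes a reachable cluster and a local kubeconfig.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    for pod in v1.list_pod_for_all_namespaces(watch=False).items:
        print(f"{pod.metadata.namespace:<20} {pod.metadata.name:<50} "
              f"{pod.status.phase:<10} {pod.spec.node_name}")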

One can argue that distributed tracing (which everybody realizes they need) is really the Datacenter-OS-strace (and yes, it is crucially important). The question "what is my Datacenter-OS-syslog" is similarly interesting. 

A lot of the engineering that goes into observability is porting the sort of introspection capabilities we are used to having on a single machine to "the computer".

Is this "service mesh" that people are talking about just the DatacenterOS version of the portmapper?

There are other things for which we really have no idea how to build the equivalent. What does a "debugger" for "the computer" look like? Clearly, single-stepping on a single host isn't the right way to fix problems in modern distributed systems - your service may be interacting with dozens of other hosts that may be crashing at the same time (or grinding to a halt or whatever), and re-starting and single-stepping is extremely difficult.

Aside from the many monitoring, development, and debugging tools that need to be rebuilt for "the computer", there are many other - even more fundamental - questions that really have no satisfactory answer. Security is a particularly uncharted territory:

What is a "privileged process" for this computer? What are the privilege and trust boundaries? How does user management work? How does cross-service authentication and credential delegation work? How do we avoid re-introducing literally every logical single-machine privilege escalation that James Forshaw describes in his slides into our new OS and the various services running there? Is there any way that a single Linux Kernel bug in /mm does not spell doom for our entire security model?

To keep the post short:

In my opinion, the emerging DatacenterOS is the most exciting thing that has happened in computer science in decades. I sometimes wish I were better at convincing billionaires to give me a few hundred million dollars to invest in interesting problems -- because if there is one problem I would love to work on, it'd be a FOSS DatacenterOS - "install this on N machines, and you have 'a computer'".

A lot of the technological landscape is easier to understand if one asks the question: What function in "the computer" does this particular piece of the puzzle solve? What is the single-machine equivalent of this project?

This post will likely have follow-up posts, because I have many more ill-thought-out ideas on the topic:

  • Security models for a DatacenterOS
  • Kubernetes: Do you want to be the scheduler, or do you want to be the OS? Pick one.
  • How do we get the power of bash scripting, but for a cluster of 20k machines?


Friday, August 14, 2020

My Twitter-Discussion-Deescalation Policy

Twitter is great, and Twitter is terrible. While it enables getting in contact and starting loose discussions with a great number of people, and while it has certainly helped me broaden my perspectives and understanding of many topics, it also has a lot of downsides.

Most importantly, Twitter discussions, due to their immediacy of feedback and the fact that everybody is busy, often end up in shouting matches where "learning from each other while discussing a topic" (the actual purpose of a discussion) is forgotten.

Just as importantly: Twitter can be very repetitive, and it can be very difficult to convey the context for complex topics - and nobody has time to repeat all the context in each Twitter discussion.

Today, I am recovering from a migraine attack that coincided with my kid having a cranky night, and as a result, I cut a few Twitter discussions short. The people on the receiving end of this "short-cutting" may rightly feel slighted, so I am writing this blog post in preparation for future similar situations.

There are some topics (often related to security or economics) about which I have thought for a reasonably long time. Particularly for security, we're talking about a few decades of hands-on experience with fairly obsessive work on the topic, both on the theoretical and on the practical side. Rooted in this experience, I sometimes make statements on Twitter. These statements may be in conflict with what other people (you?) may think, and we may engage in a discussion. It is possible, though, that we will reach a point in the discussion where my feeling is "oh, in order to convey my point, I'd now need to spend 25 minutes conveying the context necessary for my point, and I only have a few hours in my day after I deduct sleep and other obligations".

At this point, I need to make a judgement call: Do I invest that time? I also need to make the call without having the most important context: Does the other side care about understanding me at all?

So if we end up in a Twitter discussion, and I reply to you with a link to this blog post at some point, please understand: I have run out of time to spend on this Twitter thread, and I need to cut the discussion short because conveying the necessary context is too time consuming without knowing that this is actually desired, and that our discussion is a mutual learning exercise.

If you very much care about the topic, and about understanding the perspective I have, I will happily schedule a 25-minute video call to discuss in person, and will obviously make an effort to understand your perspective, too. My DMs are open; ping me and I will send you a calendly link.

Monday, May 18, 2020

My self-help guide to making sense of a confusing world

It has become painfully evident over the last decade or so that social media has a somewhat corrosive effect on "truth" and "discussion". There are a variety of reasons for this - many unidentified - but a few factors are:
  1. For every opinion, no matter how bizarre, it has become easy to find a community with similar beliefs.
  2. The discoverability of almost all information, coupled with the shortening of attention spans, allows people with strange beliefs to search for information that - at least if only glanced at for 15 seconds - may be interpreted to confirm their strange belief.
  3. Algorithms that maximize engagement also maximize enragement -- if the algorithms show me content that draws me into time-sink discussions with no benefit, they are "winning" (in terms of the metrics against which they are evaluated).
  4. The social media platforms favor "immediacy of feedback" vs. taking time to think things through. Social media discussions often devolve into name-calling or rapid-fire quoting of dubious studies without any qualitative understanding - people quote papers and sources they never critically evaluated.
Aside from that, creating false but engagement-producing content has become a veritable industry. Attention can be monetized, so one can earn good money by making up lies that appear credible to some subset of the population. The quality of mainstream reporting has been caught up in this downward spiral.

The result of this is "fake news" and the "death of a consensus on reality"; strange conspiracy theories; and generally many many hours wasted. The problem cuts across class and educational layers; it is simply not true that "only the uneducated" fall prey to internet falsehoods.

Personally, I am terrified of believing things that are not true. I am not quite sure why; but to assuage my fears of misleading myself, I have adopted a number of habits to function as checks on my own beliefs.

By and large, I have found them very useful.

In this blog post, I intend to share my thoughts on how people mislead themselves, and the checks I have decided to apply to my own beliefs. I am not always successful, but this is the bar I try to measure myself against. My hope is that this is helpful for others; perhaps it can help reject false beliefs somewhere.

So let's begin with an underlying assumption I make:

People tend to believe in things that help them.

As a young man I believed that people try to understand a situation, and then form a belief based on that. This is not what I observed in my life. My observation is that people choose beliefs and systems of belief to fulfill a function for them.

My father was born in the 1930s in Germany, and as a pre-teen and early teen, he got a front-row seat to watch all the adults around him perform an ideological 180-degree turn. The question of "how do people adjust their beliefs" has always been important in my discussions with him.

My conclusion is that people are very good at identifying what they want, and what is beneficial to them. They also like to feel good about themselves, and about what they do. Given these constraints, people tend to pick beliefs and systems of belief that ...
  • ... allow them to do what they want to do.
  • ... allow them to reap benefits.
  • ... allow them to feel good about themselves at the same time.
I alluded to this with the sentence "Everybody wants to be the hero of their own story" in my disclosure Rashomon post.

It is crucially important to be aware that belief systems have a functional role for those that believe them. This is why it can be so hard to "convince" anyone of the incorrectness of their belief system: You are asking the person to give up more than a false belief - often, you are asking the person to adjust their view of themselves as being less benign than they like to believe, or you are asking the person to adjust their view in a manner that would cast doubt on their ability to obtain some other benefit.

When I write "people" above, this includes you and me.

Being aware of the functional role of beliefs is hence important when you investigate your own beliefs (more on that later). Trying to believe what makes us feel good is the most fundamental cognitive bias.

So what am I trying to do to counter that bias? Here's my list of 7 habits:
  1. Clarify your beliefs
  2. Ask about the benefits of your beliefs
  3. Seek out original sources
  4. Examine evidence for your beliefs and competing hypotheses
  5. What new information would change your beliefs?
  6. Provide betting odds
  7. Discuss for enlightenment, not "winning"

Habit 1: Clarify beliefs

It may sound odd, but it takes conscious effort to turn everyday diffuse "belief" into an actual clearly articulated statement. For me, nothing quite clarifies thoughts like writing them down - often things that appear clear and convincing in my head turn out to be quite muddled and unstructured when I try to put them to paper.

Asking oneself the question "what are my beliefs on a given topic", and trying to write them down coherently, is surprisingly powerful. It forces a deliberate effort to determine what one actually believes, and committing to that belief in writing (at least to oneself).

Habit 2: Ask about the benefits of your beliefs - "am I the baddie?"

Awareness of the functional role of beliefs is important when examining one's own beliefs. Feynman famously said about science that "the first principle is that you must not fool yourself and you are the easiest person to fool".

When examining my own beliefs, I try to ask myself: What benefits does this belief bestow on me? What does this belief permit me to do? How does this belief make me feel good about myself?

It is quite often helpful to actively try to examine alternative narratives in which one casts oneself in a bad light. Trying to maintain our positive image of ourselves is a strong bias; making a conscious effort at examining alternate, unflattering narratives can be helpful.

My wife and I sometimes play "mock custody battle" - a game where we jokingly try to portray each other as some sort of terrible person and reinterpret each other's entire biography as that of a villain - and it is quite enlightening.

Habit 3: Seek out original sources, distrust secondary reporting

Both for things like presidential debates and for scientific papers, the secondary reporting is often quite incorrect. In political debates, it is often much less relevant how the debate went and what happened -- only a small fraction of the population will have witnessed the event. What really counts is the narrative that gets spun around what happened.

You can observe this during election season in the US, where as soon as the debate is over, all sides will try to flood all the talkshows and newscasts with their "narrative" (which has often been pre-determined prior to the debate happening - "He is a flip-flopper. He flipflops." or something along those lines).

Likewise, scientific papers often get grossly misrepresented in popular reporting, but also by people that only superficially read the paper. Reporting is often grossly inaccurate, and if you are an expert on a topic, you will notice that on your topic of expertise, reporting is often wrong; at the same time, we somehow forget about this and assume that it is more accurate on topics where we are not experts (the famous "Gell-Mann amnesia").

A friend of mine (scientist himself) recently forwarded me a list of papers that estimated COVID-19 IFR; one of them reported an IFR of 0%. Closer examination of the contents of the paper revealed that they examined blood from blood donors for antibodies; nobody that died of COVID-19 went to donate blood 2 weeks later, so clearly there were no fatalities in their cohort.

Nonetheless, the paper was cited as "evidence that the IFR is lower than people think".

A different friend sent me a "scientific paper" that purported to show evidence that tetanus vaccines had been laced with a medication to cause infertility. Examining the paper, it turned out to be little more than an assembly of hearsay; no experimental setup, no controls, no alternative hypotheses, etc. A look at the homepages of the authors revealed that they were all cranks peddling various strange beliefs. It was "published" in a "we accept everything" journal.

Acquiring the habit to read original sources is fascinating - there are bookshelves full of books that are quoted widely, mostly by people who have never read them (Sun Tzu, Clausewitz, and the Bible are probably the most common). It is also useful: Getting into the habit of reading original papers helps cut out the middle-man and start judging the results directly; it is also a good motivator to learn a good bit of statistics.

 

Habit 4: Examine evidence for your beliefs, analyze competing hypotheses

Once one's own beliefs are clarified in writing, and one has looked at the primary sources, one can gather evidence for one's belief.

While one does so, one should also make a deliberate effort to gather and entertain competing hypotheses: What other explanations for the phenomenon under discussion exist? What are hypotheses advanced by others?

Given one's own beliefs, and alternate hypotheses, one can look at the evidence supporting each of them carefully.

A few principles help me at this stage to discount less-credible hypotheses (including my own):
  • Occam's Razor: Often, the simpler explanation is the more likely explanation
  • Structural malice: Malicious individuals are pretty rare, but if an incentive structure exists where an individual can benefit from malice while explaining it away, the tendency is for that to happen.
  • Incompetence is much more common than competent malice. The Peter principle and Parkinson's law apply to human organisations.
After this step, I end up forming an opinion - looking at the various competing hypotheses, I re-examine which I find most credible. Often, but not always, it is the beliefs that I declared initially, but with reasonable frequency I have to adjust my beliefs after this step.

Habit 5: What new information would change my opinion?

John Maynard Keynes is often quoted with "When the facts change, I change my mind; what do you do, Sir?". It is worth examining what would be necessary to change one's beliefs ahead of time.

Given my belief on a subject right now, what new information would need to be disclosed to me for me to change my mind?

This is very helpful to separate "quasi-religious" beliefs from changeable beliefs.

Habit 6: Provide betting odds

This is perhaps the strangest, but ultimately one of my more useful points. Over the last years, I have read a bit about the philosophy of probability; particularly De Finetti's "Philosophical Lectures on Probability".

When we speak about "probability", we actually mean two different things: The fraction for a given outcome if we can repeat an experiment many times (coin-flip etc.), and the strength of belief that a given thing is true or will happen in situations where the experiment cannot be repeated.

These are very different things - the former has an objective truth, the second one is fundamentally subjective.

At the same time, if my subjective view of reality is accurate, I will assign good probability values (in the 2nd sense) to different events. ("Good" here means that if a proper scoring rule was applied, I would do well).
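
To make "good under a proper scoring rule" concrete, here is a tiny worked example (my illustration, with made-up numbers) using the Brier score:

    # The Brier score: mean squared error between stated probabilities
    # and outcomes (1 = happened, 0 = did not). Lower is better, and
    # reporting your honest probabilities minimizes it in expectation -
    # that is what makes it a "proper" scoring rule.
    def brier(forecasts, outcomes):
        return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

    outcomes = [1, 0, 1, 1]                       # what actually happened (made up)
    print(brier([0.9, 0.2, 0.8, 0.7], outcomes))  # 0.045 - confident and mostly right
    print(brier([0.5, 0.5, 0.5, 0.5], outcomes))  # 0.25  - hedging everything at 50%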

Beliefs carry little cost, and little accountability. Betting provides cost for being wrong, and accountability about being wrong.

This means that if I truly believe that my beliefs are correct, I should be willing to bet on them; and through the betting odds I provide, I can quantify the strength of my belief.

For me, going through the exercise of forcing myself to provide betting odds has been extremely enlightening: It forced me to answer the question "how strongly do I actually believe this?".

In a concrete example: Most currently available data about COVID-19 hints at an IFR of between 0.49% and 1% (source) with a p-value of < 0.001. My personal belief is that the IFR is almost certainly >= 0.5%. I am willing to provide betting odds of 3:1 (meaning if you bet against me, you get 3x the payout) for the statement "By the end of 2022, when the dust around COVID-19 has settled, the empirical IFR for COVID-19 will have been greater than 0.5%".

This expresses strong belief in the statement (much better than even odds), but some uncertainty around the estimate in the paper (the p-value would justify much more aggressive betting odds).
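
For the arithmetic-minded, here is the calculation behind that, under my reading of "3:1" (the counterparty stakes one unit against my three):

    # One common reading of "3:1 in favor of the statement": I put up
    # 3 units against your 1. For offering this bet to be rational for
    # me, my subjective probability p that the statement is true must
    # satisfy  p * 1 - (1 - p) * 3 >= 0,  i.e.  p >= 3 / (3 + 1).
    my_stake, your_stake = 3.0, 1.0
    implied_p = my_stake / (my_stake + your_stake)
    print(implied_p)            # 0.75 - the weakest belief consistent with offering 3:1
    print(1.0 / (1.0 + 1.0))    # 0.5  - what mere even odds would express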

(Be aware that these betting odds are only valid for the next 4 days, as my opinion may change).

To sum up: Providing betting odds is a great way of forcing oneself to confront one's own strength of belief. If I believe something, but am unwilling to bet on it, why would that be the case? If I believe something, and am unwilling to provide strong odds in favor of that belief, why is that the case? Do I really believe these things if I am unwilling to bet?

Habit 7: Discuss for enlightenment, not "winning"

When I was young, my father taught me that the purpose of a discussion is never to win, or even to convince. The purpose of a discussion is to understand - the topic under discussion, or the position of the other side, or a combination thereof. This gets lost a lot in modern social media "debates".
 
Social media encourages participating in discussions and arguing a side without ever carefully thinking about one's view on a topic. The memoryless and repetitive nature of the medium allows one to spend countless hours re-hashing the same arguments over and over, without making any advance, and ignoring any thought-out arguments that may have been put in writing.

A few weeks after the "Rashomon of Disclosure" post, a Twitter discussion about disclosure erupted, and I upset a few participants by more or less saying: "Y'know, I spent the time writing down my thoughts and arguments around the topic, and I am very willing to engage with anybody that is willing to spend the time writing down their thoughts and arguments, but I am not willing to engage in Twitter yelling matches where we ignore all the nuance and just whip up tribal sentiment."

This was not universally well-received, but refusal to engage in social media yelling matches and the dopamine kick that arises from the immediacy of the experience is an important step if we want to understand either the topic or the other participants in the debate.

Summary

This was a very long-winded post. Thank you for having read this far. I hope this post will be helpful - perhaps some of the habits can be useful to others. If not, this blog post can at least explain why I will sometimes withdraw from social media discussions, and insist on long-form write ups as a means of communication. It should also be helpful to reduce bewilderment if I offer betting odds in surprising situations.


Friday, March 20, 2020

Before you ship a "security mitigation" ...

Hey everybody,

During my years doing vulnerability research and my time in Project Zero, I frequently encountered proposals for new security mitigations. Some of these were great; some were not so great.

The reality is that most mitigations or "hardening" features will impose a tax on someone, somewhere, and quite possibly a heavy one. Many security folks do not have a lot of experience running reliable infrastructure, and some of their solutions can break things in production in rather cumbersome ways.

To make things worse, many mitigations are proposed and implemented with very handwavy and non-scientific justifications - "it makes attacks harder", it "raises the bar", etc., making it difficult or impossible for third parties to understand the design and trade-offs considered during the design.

Over the years, I have complained about this repeatedly, not least in this Twitter thread:

https://twitter.com/halvarflake/status/1156815950873804800

This blog post is really just a recapitulation of the Twitter thread:

Below are rules I wrote for a good mitigation a while ago: “Before you ship a mitigation...

  1. Have a design doc for a mitigation with clear claims of what it intends to achieve. This should ideally be something like "make it impossible to achieve reliable exploitation of bugs like CVE1, CVE2, CVE3", or similar; claims like "make it harder" are difficult to quantify. If you can't avoid such statements, quantify them: "Make sure that development of an exploit for bugs like CVE4, CVE5 takes more than N months".
  2. Pick a few historical bugs (ideally from the design doc) and involve someone with solid vuln-dev experience; give them 4-8 full engineering weeks to try to bypass the mitigation when exploiting historical bugs. See if the mitigation holds, and to what extent. Most of the time, the result will be that the mitigation did not live up to the promise. This is good news: You have avoided imposing a tax on everybody (in complexity, usability, performance) that provides no or insufficient benefit.
  3. When writing the code for the mitigation, *especially* when it touches kernel components, have a very stringent code review with review history. The reviewer should question any unwarranted complexity and ask for clarification.
    Follow good coding principles - avoid functions with hidden side effects that are not visible from the name etc. - the stringency of the code review should at least match the stringency of a picky C++ readability reviewer, if not exceed it.”

There are also these three slides to remember:
In short: Make it easy for people to understand the design rationale behind the mitigation. Make sure this design rationale is both accessible, and easily debated / discussed. Be precise and quantifiable in your claims so that people can challenge the mitigation on its professed merits.

Saturday, August 17, 2019

Rashomon of disclosure

In a world of changing technology, there are few constants - but if there is one constant in security, it is the rhythmic flare-up of discussions about disclosure on the social-media-du-jour (mailing lists in the past, now mostly Twitter and Facebook).

Many people in the industry have wrestled with, and contributed to, the discussions, norms, and modes of operation - I would particularly like to highlight contributions by Katie Moussouris and Art Manion, but there are many that would deserve mentioning whose true impact is unknown outside a small circle. In all discussions of disclosure, it is important to keep in mind that many smart people have struggled with the problem. There may not be easy answers.

In this blog post, I would like to highlight a few aspects of the discussion that are important to me personally - aspects which influenced my thinking, and which are underappreciated in my view.

I have been on many (but not most) sides of the table during my career:
  • On the side of the independent bug-finder who reports to a vendor and who is subsequently threatened.
  • On the side of the independent bug-finder that decided reporting is not worth my time.
  • On the side of building and selling a security appliance that handles malicious input and that needs to be built in a way that we do not add net exposure to our clients.
  • On the side of building and selling a software that clients install.
  • On the side of Google Project Zero, which tries to influence the industry to improve its practices and rectify some of the bad incentives.
The sides of the table that are notably missing here are the role of the middle- or senior-level manager that makes his living shipping software on a tight deadline and who is in competition for features, and the role of the security researcher directly selling bugs to governments. I will return to this in the last section.

I expect almost every reader will find something to vehemently disagree with. This is expected, and to some extent, the point of this blog post.

The simplistic view of reporting vulnerabilities to vendors

I will quickly describe the simplistic view of vulnerability reporting / patching. It is commonly brought up in discussions, especially by folks that have not wrestled with the topic for long. The gist of the argument is:
  1. Prior to publishing a vulnerability, the vulnerability is unknown except to the finder and the software vendor.
  2. Very few, if any, people are at risk while we are in this state.
  3. Publishing the information prior to the vendor publishing a patch puts many people at risk (because they can now be hacked). This should hence not happen.
Variants of this argument are used to claim that no researcher should publish vulnerability information before patches are available, or that no researcher should publish information until patches are applied, or that no researcher should publish information that is helpful for building exploits.

This argument, at first glance, is simple, plausible, and wrong. In the following, I will explain the various ways in which this view is flawed.

The Zardoz experience

For those that joined Cybersecurity in recent years: Zardoz was a mailing list on which "whitehats" discussed and shared security vulnerabilities with each other so they could be fixed without the "public" knowing about them.

The result of this activity was: Every hacker and active intelligence shop at the time wanted to have access to this mailing list (because it would regularly contain important new attacks). They generally succeeded. Quote from the Wikipedia entry on Zardoz:
On the other hand, the circulation of Zardoz postings among computer hackers was an open secret, mocked openly in a famous Phrack parody of an IRC channel populated by notable security experts.[3]
History shows, again and again, that small groups of people that share vulnerability information ahead of time always have at least one member compromised; there are always attackers that read the communication.

It is reasonably safe to assume that the same holds for the email addresses to which security vulnerabilities are reported. These are high-value targets, and getting access to them (even if it means physical tampering or HUMINT) is so useful that well-funded persistent adversaries need to  be assumed to have access to them. It is their job, after all.

(Zardoz isn't unique. Other examples are unfortunately less-well documented. Mail spools of internal mailing lists of various CERTs were circulated in hobbyist hacker circles in the early 2000s, and it is safe to assume that any dedicated intelligence agency today can reproduce that level of access.)

The fallacy of uniform risk

Risk is not uniformly distributed throughout society. Some people are more at-risk than others: Dissidents in oppressive countries, holders of large quantities of cryptocurrency, people who think their work is journalism when the US government thinks their work is espionage, political stakeholders and negotiators. Some of them face quite severe consequences from getting hacked, ranging from mild discomfort to death.

The majority of users in the world are much less at risk: The worst-case scenario for them, in terms of getting hacked, is inconvenience and a moderate amount of financial loss.

This means that the naive "counting" of victims in the original argument makes a false assumption:  Everybody has the same "things to lose" by getting hacked. This is not the case: Some people have their life and liberty at risk, but most people don't. For those that do not, it may actually be rational behavior to not update their devices immediately, or to generally not care much about security - why take precautions against an event that you think is either unlikely or largely not damaging to you?

For those at risk, though, it is often rational to be paranoid - to avoid using technology entirely for a while, to keep things patched, and to invest time and resources into keeping their things secure.

Any discussion of the pros and cons of disclosure should take into account that risk profiles vary drastically. Taking this argument to the extreme, the question arises: "Is it OK to put 100m people at risk of inconvenience if I can reduce the risk of death for 5 people?"

I do not have an answer for this sort of calculation, and given the uncertainty of all probabilities and data points in this, I am unsure whether one exists.

Forgetting about patch diffing

One of the lessons that our industry sometimes (and to my surprise) forgets is: Public availability of a patch is, from the attacker perspective, not much different than a detailed analysis of the vulnerability including a vulnerability trigger.

There used to be a cottage industry of folks that analyzed patches and wrote reports on what the fixed bugs were, whether they were fixed correctly, and how to trigger them. They usually operated away from the spotlight, but that does not mean they did not exist - many were our customers.

People in the offensive business can build infrastructure that helps them rapidly analyze patches and get the information they need out of them. Defenders, mostly due to organizational and not technical reasons, can not do this. This means that in the absence of a full discussion of the vulnerability, defenders will be at a significant information disadvantage compared to attackers.
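
To illustrate the core of that infrastructure with a deliberately oversimplified sketch: hash every function in the pre-patch and post-patch binary and report what changed; the handful of changed functions is where the analysis starts. (Real tools match functions on control-flow graphs rather than names and byte hashes, and the function extraction step below is assumed to come from whatever disassembler you use.)

    # Toy patch diff: given {function_name: function_bytes} maps
    # extracted from the unpatched and patched binary (by a
    # disassembler - assumed, not shown), list the functions whose
    # code changed. Those are where you start looking for the fix.
    import hashlib

    def digests(funcs):
        return {name: hashlib.sha256(code).hexdigest() for name, code in funcs.items()}

    def changed_functions(old, new):
        old_d, new_d = digests(old), digests(new)
        return sorted(n for n in old_d.keys() & new_d.keys() if old_d[n] != new_d[n])

    # Hypothetical input:
    old = {"parse_header": b"\x55\x8b\xec\x51", "copy_block": b"\x8a\x06\x88\x07"}
    new = {"parse_header": b"\x55\x8b\xec\x51", "copy_block": b"\x8a\x06\x3c\x00\x88\x07"}
    print(changed_functions(old, new))   # ['copy_block']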

Without understanding the details of the vulnerability, networks and hosts cannot be monitored for its exploitation, and mitigations-other-than-patching cannot be applied.

Professional attackers, on the other hand, will have all the information about a vulnerability not long after they obtain a patch (if they did not have it beforehand already).

The fallacy of "do not publish triggers"

When publishing about a vulnerability, should "triggers", small pieces of data that hit the vulnerability and crash the program, be published?

Yes, building the first trigger is often time-consuming for an attacker. Why would we save them the time?

Well, because without a public trigger for a vulnerability it is extremely hard for defensive staff to determine whether a particular product in use may contain the bug in question. A prime example of this is CVE-2012-6706: Everybody assumed that the vulnerability was only present in Sophos's product; no public PoC was provided. So nobody realized that the bug lived in upstream Unrar, and it wasn't until 2017 that it got re-discovered and fixed. Five years of extra time for a bug because no trigger was published.

If you have an Antivirus Gateway running somewhere, or any piece of legacy software, you need at least a trigger to check whether the product includes the vulnerable software. If you are attempting to build any form of custom detection for an attack, you also need the trigger.

The fallacy of "do not publish exploits"

Now, should exploits be published? Clearly the answer should be no?

In my experience, even large organizations with mature security teams and programs often struggle to understand the changing nature of attacks. Many people that are now in management positions cut their teeth on (from today's perspective) relatively simple bugs, and have not fully understood or appreciated how exploitation has changed.

In general, defenders are almost always at an information disadvantage: Attackers will not tell them what they do, and gleefully applaud and encourage when the defender gets a wrong idea in his head about what to do. Read the declassified cryptolog_126.pdf Eurocrypt trip report to get a good impression of how this works.
Three of the last four sessions were of no value whatever, and indeed there was almost nothing at Eurocrypt to interest us (this is good news!). The scholarship was actually extremely good; it's just that the directions which external cryptologic researchers have taken are remarkably far from our own lines of interest. 
Defense has many resources, but many of them are misapplied: Mitigations performed that do not hold up to an attacker slightly changing strategies, products bought that do not change an attacker calculus or the exploit economics, etc.

A nontrivial part of this misapplication is information scarcity about real exploits. My personal view is that Project Zero's exploit write-ups, and the many great write-ups by the Pwn2Own competitors and other security research teams (Pangu and other Chinese teams come to mind) about the actual internal mechanisms of their exploits is invaluable to transmit understanding of actual attacks to defenders, and are necessary to help the industry stay on course.

Real exploits can be studied, understood, and potentially used by skilled defenders for both mitigation and detection, and to test other defensive measures.

The reality of software shipping and prioritization

Companies that sell software make their money by shipping new features. Managers in these organizations get promoted for shipping said features and reaching more customers. If they succeed in doing so, their career prospects are bright, and by the time the security flaws in the newly-shipped features become evident, they are four steps up the career ladder and two companies away from the risk they created.

The true cost of attack surface is not properly accounted for in modern software development (even if you have an SDLC); largely because this cost is shouldered by the customers that run the software - and even then, only by a select few that have unusual risk profiles.

A sober look at current incentive structures in software development shows that there is next to zero incentive for a team that ships a product to invest in security on a 4-5 year horizon. Everybody perceives themselves to be in breakneck competition, and velocity is prioritized. This includes bug reports: The entire reason for the 90-day deadline that Project Zero enforced was the fact that without a hard deadline, software vendors would routinely not prioritize fixing an obvious defect, because ... why would you distract yourself with doing it if you could be shipping features instead?

The only disincentive to adding new attack surface these days is getting heckled on a blog post or in a Blackhat talk. Has any manager in the software industry ever had their career damaged by shipping particularly broken software and incurring risks for their users? I know of precisely zero examples. If you know of one, please reach out, I would be extremely interested to learn more.

The tech industry as risk-taker on behalf of others

(I will use Microsoft as an example in the following paragraphs, but you can replace it with Apple or Google/Android with only minor changes. The tech giants are quite similar in this.)

Microsoft has made 248bn$+ in profits since 2005. In no year did they make less than 1bn$ in profits per month. Profits in the decade leading up to 2005 were lower, and I could not find numbers, but even in 2000 MS was raking in more than a billion in profits a quarter. And part of these profits was made by incurring risks on behalf of their customers - by making decisions to not properly sandbox the SMB components, by under-staffing security, by not deprecating and migrating customers away from insecure protocols.

The software product industry (including mobile phone makers) has reaped excess profits for decades by selling risky products and offloading the risk onto their clients and society. My analogy is that they constructed financial products that yield a certain amount of excess return but blow up disastrously under certain geopolitical events, then sold some of the excess return and *all* of the risk to a third party that is not informed of the risk.

Any industry that can make profits while offloading the risks will incur excess risks, and regulation is required to make sure that those that make the profits also carry the risks. Due to historical accidents (the fact that software falls under copyright) and unwillingness to regulate the golden goose, we have allowed 30 years of societal-risk-buildup, largely driven by excess profits in the software and tech industry.

Now that MS (and the rest of the tech industry) has sold the rest of society a bunch of toxic paper that blows up in case of some geopolitical tail events (like the resurgence of great-power competition), they really do not wish to take the blame for it - after all, there may be regulation in the future, and they may have to actually shoulder some of the risks they are incurring.

What is the right solution to such a conundrum? Lobbying, and a concerted PR effort to deflect the blame. Security researchers, 0-day vendors, and people that happen to sell tools that could be useful to 0-day vendors are much more convenient targets than admitting: All this risk that is surfaced by security research and 0-day vendors is originally created for excess profit by the tech industry.

FWIW, it is rational for them to do so, but I disagree that we should let them do it :-). 

A right to know

My personal view on disclosure is influenced by the view that consumers have a right to get all available information about the known risks of the products they use. If an internal Tobacco industry study showed that smoking may cause cancer, that should have been public from day 1, including all data.

Likewise, consumers of software products should have access to all known information about the security of their product, all the time. My personal view is that the 90 days deadlines that are accepted these days are an attempt at balancing competing interests (availability of patches vs. telling users about the insecurity of their device).

Delaying much further, or withholding data from the customer, is - in my personal opinion - a form of deceit; the tech industry should be much more aggressive in warning users that, under current engineering practices, their personal data is never fully safe in any consumer-level device. Individual bug chains may cost a million dollars now, but that million dollars is amortized over a large number of targets, so the cost-per-individual compromise is reasonably low.

I admit that my view (giving users all the information so that they can (at least in theory) make good decisions using all available information) is a philosophical one: I believe that withholding available information that may alter someone's decision is a form of deceit, and that consent (even in business relationships) requires transparency with each other. Other people may have different philosophies.

Rashomon, or how opinions are driven by career incentives

The movie Rashomon that gave this blog post the title is a beautiful black-and-white movie from 1950, directed by the famous Akira Kurosawa. From the Wikipedia page:
The film is known for a plot device that involves various characters providing subjective, alternative, self-serving, and contradictory versions of the same incident.
If you haven't seen it, I greatly recommend watching it.

The reason why I gave this blog post the title "Rashomon of disclosure" is to emphasize the complexity of the situation. There are many facets, and my views are strongly influenced by the sides of the table I have been on - and those I haven't been on.

Everybody participating in the discussion has some underlying interests and/or philosophical views that influence their argument.

Software vendors do not want to face up to generating excess profits by offloading risk to society. 0-day vendors do not want to face up to the fact that a fraction of their clients kills people (sometimes entirely innocent ones), or at least break laws in some jurisdiction. Security researchers want to have the right to publish their research, even if they fail to significantly impact the broken economics of security.

Everybody wants to be the hero of their own story, and in their own account of the state of the world, they are.

None of the questions surrounding vulnerability disclosure, vulnerability discovery, and the trade-offs involved in it are easy. People that claim there is an easy and obvious path to go about security vulnerability disclosure have either not thought about it very hard, or have sufficiently strong incentives to self-delude that there is one true way.

After 20+ years of seeing this debate go to and fro, my request to everybody is: When you explain to the world why you are the hero of your story, take a moment to reflect on alternative narratives, and make an effort to recognize that the story is probably not that simple.

Tuesday, October 02, 2018

Turing completeness, weird machines, Twitter, and muddled terminology

First off, an apology to the reader: I normally spend a bit of effort to make my blog posts readable / polished, but I am under quite a few time constraints at the moment, so the following will be held to lesser standards of writing than usual.

A discussion arose on Twitter after I tweeted that the use of the term "Turing-complete" in academic exploit papers is wrong. During that discussion, it emerged that there are more misunderstandings of terms that play into this. Correcting these things on Twitter will not work (how I long for the days of useful mailing lists), so I ended up writing a short text. Pastebin is not great for archiving posts either, so for lack of a better place to put it, here it comes:

Our misuse of "Turing completeness" and "weird machine" is harmful and confusing

1. "Turing completeness"

TC refers to computability (in terms of simulating other computers), and is well-defined there. It means "can simulate any Turing machine". There are a few things to keep in mind about TC:

  • None of the machines we use day-to-day are TC in the absence of I/O. TC requires infinite memory.
  • TC arises really quickly; all you need is a single instruction that subtracts and branches if the result is not zero (a toy interpreter for a close cousin of such a machine follows below). It also arises almost by accident in quite minimal recurrence equations (the Game of Life, for example, is Turing-complete).
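
For illustration, here is a toy interpreter for such a one-instruction machine (I am using subleq, "subtract and branch if less than or equal to zero", a close cousin of the variant described above; with unbounded memory, this is Turing-complete):

    # One-instruction machine ("subleq"). Each instruction is three
    # cells a, b, c: mem[b] -= mem[a]; if the result is <= 0, jump to
    # c, otherwise fall through to the next instruction.
    def subleq(mem, pc=0, max_steps=1000):
        for _ in range(max_steps):
            if pc < 0:                 # negative jump target = halt
                break
            a, b, c = mem[pc:pc + 3]
            mem[b] -= mem[a]
            pc = c if mem[b] <= 0 else pc + 3
        return mem

    # Tiny program: instruction [6, 6, 3] clears cell 6 (42 -> 0),
    # instruction [7, 7, -1] clears the scratch cell 7 and halts.
    print(subleq([6, 6, 3, 7, 7, -1, 42, 0]))
    # [6, 6, 3, 7, 7, -1, 0, 0]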

For the exploitation case, the desirable outcome is "we got integer arithmetic and arbitrary read and write". Turing-completeness says nothing about exploitability as exploitability has little to do with computation and much more to do with "what sort of states are reachable given the possible interactions with my target". These are two very distinct questions.

Font renderers are Turing-complete, JavaScript is, and - considering that when I/O is present, part of the computation may actually be performed on the attacker's side - IMAP is as well.

It is simply a complete mis-use of terms. Even the paper that first used it just used it to say "we eyeballed the instructions we got, and they look like we can probably do everything we'd want to do":
The set of gadgets we describe is Turing complete by inspection, so return-oriented programs can do anything possible with x86 code.
Let's not mis-use a precisely defined term for something else just because saying "TC" makes us feel like we are doing real computer science.

Exploitation is about reachable states (and transitions) much more than about computability.

2. "Weird machines"


Weird machines are one of the most tragically misunderstood abstractions (which, if people understand it properly, helps greatly in reasoning about exploitation).

The point of weird machine research is *not* to show that everything is Turing-complete. The point of weird machine research is that when any finite state automaton is simulated, and that simulation gets corrupted, a new machine emerges, with its own instruction set. It is this instruction set that gets programmed in attacks. Constraining the state transitions (and hence the reachable states) of a weird machine is what makes exploitation impossible. The computational power (in the TC sense) is secondary.
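
A toy illustration (entirely made up, and no substitute for the precise definition): a two-state "lock" automaton is simulated with its transition table sitting directly behind a fixed-size input buffer in flat memory. The intact machine can never reach UNLOCKED; once a length-checking bug lets input overflow into the table, the attacker's bytes stop being data for the intended machine and become the program of a new one.

    # Toy model of "corrupted simulation = new machine". The bug:
    # the copy trusts the attacker-supplied length.
    LOCKED, UNLOCKED = 0, 1
    BUF_SIZE = 4

    def run_lock(user_input: bytes) -> int:
        mem = bytearray(BUF_SIZE) + bytearray([LOCKED, LOCKED])
        # mem[0:4] = input buffer
        # mem[4]   = next state on input byte 0x00
        # mem[5]   = next state on any other input byte
        # As designed, every transition leads to LOCKED: UNLOCKED is
        # simply unreachable, whatever data the intended machine reads.
        mem[0:len(user_input)] = user_input        # <- the memory corruption
        state = LOCKED
        for b in mem[0:BUF_SIZE]:
            state = mem[4] if b == 0 else mem[5]
        return state

    print(run_lock(b"\x00\x01\x02\x03"))   # in-spec input: 0 (stays LOCKED)
    print(run_lock(b"\x00\x00\x00\x00" + bytes([UNLOCKED, UNLOCKED])))
    # overflow rewrites the transition table: 1 (UNLOCKED is now reachable)

The overflow bytes are no longer "data" for the lock - they program the transitions of an unintended machine, and it is the reachable states of that machine that decide whether the attacker wins.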

The key insight that would have saved us all from the dreary experience of reading 25 tit-for-tat-ROP-and-bad-mitigation papers is that if you do not constrain what the weird machines that emerge on a memory corruption can do, your mitigation is probably not going to help. Most mitigations blacklist a particular weird machine program, without constraining the machine's capabilities.

Weird machines are the proper framework in which to reason about pretty much all exploits that are not side-channel based or missing-auth-based.

Now, unfortunately, the abstraction was well-understood intuitively by a few people that had done a lot of exploitation, but not made precise or put in writing. As a consequence, other researchers heard the term, and "filled it with their imagination": They then used the term wherever they found a surprising method of computation in random things by using them as designed. This made the term even harder to understand - in the same way that nobody will be able to understand "Turing complete" by just reading ROP papers.

I am particularly passionate about this part because I spent a few months putting the informally-defined weird machine onto sound footing, with precise written definitions, to prevent the term from getting even more muddled :-)

To summarize:


  • The use of "Turing completeness" in ROP papers is an abuse; the use of that term does not correspond to the use of the term in theoretical CS. If we are going to use rigorous terms, we should make sure we use them correctly. I see students and researchers be confused about the actual meaning all the time now.
  • The ROP-tit-for-tat-mitigation paper deluge (and the subsequent unhealthy focus on CFI) could have been avoided if people had reasoned about ROP things at the right abstraction levels (weird machines). They would have had the chance, because there were quite a few conversations between the authors of the tit-for-tat-papers and people that understood things properly :-), but my suspicion is that there was no incentive to reason on an abstraction level that messes with your ability to salami-slice more papers.
  • It was made harder to understand the concept of weird machines because people that had not properly understood weird machines started calling everything that looked vaguely interesting "weird machines". Nobody was sitting on any review boards to stop them :-), so now that term (which also has a precise definition) is used in a wrong manner, too.

My larger point is: We have so much handwavy and muddled terminology in the papers in this field that it is harmful to young researchers, other fields, and PhD students. The terminology is confusing and often ill-defined (what is a gadget?), and terms that have a precise meaning in general CS get used to mean something else entirely without anybody explaining it (Turing complete).

This creates unnecessary and harmful confusion, and it should be fixed.

Saturday, March 31, 2018

A bank statement for app activity (and thus personal data)

During my long sabbatical in 2015-2016 I had plenty of time to think about random things and come up with strange ideas. Most of these ideas are more funny than practical - their primary use is boring people that are reckless enough to have drinks with me.

This blog post describes one of these ideas. With the recent renewed interest in privacy and overreach of smart phone apps, it seems like a topic that is - at least temporarily - less boring than usual.

ML, software behavior, and the boundary between 'malicious' and 'non-malicious'

I have seen a lot of human brain power (and a vast amount of computational power) thrown at the problem of automatically deciding whether a given piece of software is good or bad. 

This is usually done as follows (a minimal sketch of the pipeline follows the list):
  1. Collect a lot of information about the behavior of software (normally by running the software in some simulated environment)
  2. Extract features from this information
  3. Apply some more-or-less sophisticated machine learning model to decide between "good" or "bad"
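
The sketch below shows the shape of such a pipeline (features, numbers and labels are entirely made up; scikit-learn merely stands in for whatever model is actually used):

    # Behavioral features in, good/bad label out.
    from sklearn.ensemble import RandomForestClassifier

    # One row per sample: [files_written, registry_keys_set,
    # network_connections, processes_spawned], collected from a
    # sandbox run (steps 1 and 2).
    X = [[3, 0, 1, 1], [120, 45, 30, 12], [5, 2, 0, 1], [200, 80, 55, 20]]
    y = [0, 1, 0, 1]    # 0 = "good", 1 = "bad"

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)  # step 3
    print(clf.predict([[4, 1, 2, 1]]))    # [0], i.e. "looks good"
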
The underlying idea behind this is that there is "bad" behavior, and "good" behavior, and if we could somehow build a machine learning model that is sufficiently powerful, we could automatically decide whether a given piece of software is good or bad.

In practice, this rarely works without significant false-positive problems, or significant false-negative-problems, or all sorts of complicated corner-cases where the system fails.

In 2015, I had to deal with the fallout of the badly-phrased Wassenaar wording: Export-control legislation which tried to define "bad behavior" for software. During this, it became clear to me that the idea that behavior alone determines good/bad is flawed.

The behavior of a piece of software does not determine whether it is malicious or not. The true defining line between malicious and non-malicious software is whether software does what the user expects it to do.

Users run software because they have an expectation for what this software does. They grant permissions for software because they have an expectation for the software to do something for them ("I want to make phone calls, so clearly the app should use the microphone"). This permission is given conditionally, with context -- the user does not want to give the app permission to switch on the microphone when the user does not intend to make a phone call.

The question of malicious / non-malicious software is a question of alignment between user expectations and software behavior.

This means, in practice, that efforts in applying machine learning to separate malicious from non-malicious software are doomed to fail, because they fail to measure the one dimension through which the boundary between good and bad runs.

Intuitively, this can be illustrated with the two pictures below. They show the same set of red and green points in 3d-space from two different perspectives -- once with their z-axis projected away, and once in a 3-d plot where the z-axis is still visible:

Cloud of points from the side, with the "important" dimension projected away. It is near-impossible to draw a sane boundary between red and green points, and whatever boundary you draw won't generalize well.

Same cloud of points, with the "important" dimension going from left to right. It is much clearer how to separate green from red points now.
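
For the programmatically inclined, here is a synthetic stand-in for those pictures (my own toy reconstruction): two classes whose (x, y) "behavior" features come from the same distribution, and which only separate along the z dimension (alignment with user intent):

    # Two classes that are indistinguishable in (x, y) and cleanly
    # separated along z. Synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    xy_good = rng.normal(size=(n, 2))
    xy_bad = rng.normal(size=(n, 2))             # same distribution as xy_good
    z_good = rng.normal(loc=-2.0, size=(n, 1))   # the "intent" dimension separates them
    z_bad = rng.normal(loc=+2.0, size=(n, 1))

    good = np.hstack([xy_good, z_good])
    bad = np.hstack([xy_bad, z_bad])

    # Projected onto (x, y), the clouds overlap almost completely; any
    # boundary drawn there is noise. Along z, they separate cleanly.
    print("xy means:", good[:, :2].mean(axis=0), bad[:, :2].mean(axis=0))
    print("z means: ", good[:, 2].mean(), bad[:, 2].mean())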
The question that arises naturally, then, is:

How can one measure the missing dimension (user intent)?

User intent is a difficult thing to measure. The software industry has the practice of forcing the user to agree to some ridiculously wide-reaching terms-of-services or EULA that few users read, even fewer understand, and which are often near-equivalent to giving the person you hire to clean your flat a power of attorney over all your documents, and allowing them to throw parties in your flat while you are not looking.

It is commonly argued that - because the user clicked "agree" to an extremely broad agreement - the user consented to everything the software can possibly do.

But consent to software actions is context-dependent and conditioned on particular, specific actions. It is fine for my messenger to request access to my camera, microphone and files - I may need to send a picture, I may need to make a phone call, and I may need to send an attachment. It is not OK for my messenger to use my microphone to see if a particular ultrasonic tracker sound is received, it is not OK for my messenger to randomly search through files etc.

Users do not get to tell the software vendor their intent and the context for which they are providing consent.

Now, given that user intent is difficult to measure up-front - how about we simply ask the user whether something that an app / software did was what he expected it to do?

Information and attention are a currency - but one with bad accounting

The modern ad economy runs on attention and private data. The big advertising platforms make their money by selling the combination of user attention and the ability to micro-target advertisements given contextual data about a user. The user "pays" for goods and services by providing attention and private data.

People often fear that big platforms will "sell their data". This is, at least for the smarter / more profitable platforms, an unnecessary fear: These platforms make their money by having data that others do not have, and which allows better micro-targeting. They do not make their money "selling data", they make money "monetizing the data they have".

The way to think about the relationship between the user and the platform is the clichéd "musician-agent" relationship: The musician produces something but does not know how to monetize it. His agent knows how to monetize it, and strikes a deal with the musician: You give me exclusive use of your product, and I will monetize it for you - and take a cut of the proceeds.

The profits accumulated by the big platforms are the difference between what the combination of attention & private data obtained from users is worth and the cost of obtaining this attention and data.

For payments in "normal" currency, users usually have pretty good accounting: They know what is in their wallet, and (to the extent that they use electronic means for payments) they get pretty detailed transaction statements. It is not difficult for a normal household to reconstruct from their bank statements relatively precisely how much they paid for what goods in a given month.

This transparency creates trust: We do not hesitate much to give our credit card numbers to online service providers, because we know that we can intervene if they charge our credit cards without reason and in excess of what we agreed to.

Private information, on the other hand, is not accounted for. Users have no way to see how much private data they provide, and whether they are actually OK with that.

A bank statement for app/software activity

How could one empower users to account for their private data, while at the same time helping platform providers identify malicious software better?

By providing users with the equivalent of a bank statement for app/software activity. The way I imagine it would be roughly as follows:

A separate component of my mobile phone (or computer) OS keeps detailed track of app activity: What peripherals are accessed at what times, what files are accessed, etc.

Users are given the option of checking the activity on their device through a UI that makes these details understandable and accessible:
  • App XYZ accessed your microphone in the last week at the following times, showing you the following screen:
    • Timestamp 1, screenshot 1
    • Timestamp 2, screenshot 2
  • Does this match your expectations of what the app should do? YES / NO
  • App ABC accessed the following files during the last week at the following times, showing you the following screen:
    • Timestamp 3, screenshot 3
      • Filename
      • Filename
      • filename
  • Does this match your expectations of what the app should do? YES / NO
At least on modern mobile platforms, most of the above data is already available - modern permissions systems can keep relatively detailed logs of "when what was accessed". Adding the ability to save screenshots alongside is easy.
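
A sketch of what the underlying record-keeping could look like (all names and fields are made up; the point is only that the "bank statement" is a grouped view over an append-only access log, plus the user's YES / NO verdicts):

    # The OS appends one record per sensitive access; the "statement"
    # UI is a grouped view over the log.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class AccessRecord:
        app: str
        resource: str                    # "microphone", "gps", "file:/sdcard/..."
        timestamp: str                   # ISO 8601
        screenshot: str                  # what was on screen at that moment
        expected: Optional[bool] = None  # the user's YES / NO, once given

    @dataclass
    class ActivityLedger:
        records: list = field(default_factory=list)

        def statement(self, app: str):
            """Everything one app did - the 'bank statement' view."""
            return [r for r in self.records if r.app == app]

    ledger = ActivityLedger()
    ledger.records.append(AccessRecord("XYZ", "microphone",
                                       "2018-03-30T21:04:00Z", "shot_0114.png"))
    for r in ledger.statement("XYZ"):
        print(r.resource, r.timestamp, "expected:", r.expected)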

Yes, a lot of work has to go into a thoughtful UI, but it seems worth the trouble: Even if most users will randomly click on YES / NO, the few thousand users that actually care will provide platform providers valuable information about whether an app is overreaching or not. At the same time, more paranoid users (like me) would feel less fearful about installing useful apps: If I see the app doing something in excess of what I would like it to do, I could remove it.

Right now, users have extremely limited transparency into what apps are actually doing. While the situation is improving slowly (most platforms allow me to check which app last used my GPS), it is still way too opaque for comfort, and overreach / abuse is likely pervasive.

Changing this does not seem hard, if any of the big platform providers could muster the will.

It seems like a win / win situation, so I can hope. I can also promise that I will buy the first phone to offer this in a credible way :-).

PS: There are many more side-benefits to the above model - for example making it more difficult to hack a trusted app developer to then silently exfiltrate data from users that trust said developer - but I won't bore you with those details now.