Tuesday, October 02, 2018

Turing completeness, weird machines, Twitter, and muddled terminology

First off, an apology to the reader: I normally spend a bit of effort to make my blog posts readable and polished, but I am under quite a few time constraints at the moment, so the following is held to a lower standard of writing than usual.

A discussion arose on Twitter after I tweeted that the use of the term "Turing-complete" in academic exploit papers is wrong. During that discussion, it emerged that there are more misunderstandings of terms that play into this. Correcting these things on Twitter will not work (how I long for the days of useful mailing lists), so I ended up writing a short text. Pastebin is not great for archiving posts either, so for lack of a better place to put it, here it comes:

Our misuse of "Turing completeness" and "weird machine" is harmful and confusing

1. "Turing completeness"

TC refers to computability (in terms of simulating other computers), and is well-defined there. It means "can simulate any Turing machine". There are a few things to keep in mind about TC:

  • None of the machines we use day-to-day are TC in the absence of I/O. TC requires infinite memory.
  • TC arises really quickly: all you need is a single instruction that subtracts and branches conditionally on the result (a minimal sketch of such a one-instruction machine follows this list). TC also arises almost by accident in surprisingly minimal systems - consider the fact that the Game of Life is Turing-complete.
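To make the "one instruction suffices" point concrete, here is a minimal sketch (my own illustration, not taken from any of the papers discussed) of the classic subleq variant - subtract and branch if the result is less than or equal to zero - in Python, assuming memory is unbounded:

    # One-instruction machine: each instruction is a triple (a, b, c).
    # Semantics: mem[b] -= mem[a]; if the result is <= 0, jump to c,
    # otherwise fall through to the next triple. A negative c halts.
    def subleq(mem, pc=0):
        while pc >= 0:
            a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
            mem[b] -= mem[a]
            pc = c if mem[b] <= 0 else pc + 3
        return mem

    # Adding two numbers (3 and 4, stored at cells 9 and 10) with nothing
    # but this single instruction; cell 11 is a scratch cell initialized to 0.
    program = [9, 11, 3,     # Z -= A
               11, 10, 6,    # B -= Z  (i.e. B += A)
               11, 11, -1,   # Z -= Z, then halt
               3, 4, 0]      # A, B, Z
    print(subleq(program)[10])   # prints 7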

For the exploitation case, the desirable outcome is "we got integer arithmetic and arbitrary read and write". Turing-completeness says nothing about exploitability as exploitability has little to do with computation and much more to do with "what sort of states are reachable given the possible interactions with my target". These are two very distinct questions.

Font renderers are Turing-complete, JavaScript is, and - considering that when I/O is present, part of the computation may actually be performed on the attacker's side - IMAP is as well.

It is simply a complete misuse of the term. Even the paper that first used it in this context only used it to say "we eyeballed the instructions we got, and they look like we can probably do everything we'd want to do":
The set of gadgets we describe is Turing complete by inspection, so return-oriented programs can do anything possible with x86 code.
Let's not misuse a precisely defined term for something else just because saying "TC" makes us feel like we are doing real computer science.

Exploitation is about reachable states (and transitions) much more than about computability.

2. "Weird machines"


Weird machines are one of the most tragically misunderstood abstractions - one which, properly understood, helps greatly in reasoning about exploitation.

The point of weird machine research is *not* to show that everything is Turing complete. The point of weird machine research is that when any finite state automaton is simulated, and that simulation gets corrupted, a new machine emerges, with its own instruction set. It is this instruction set that gets programmed in attacks. Constraining the state transitions (and hence the reachable states) of a weird machine is what makes exploitation impossible. The computational power (in the TC sense) is secondary.
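To illustrate what "a new machine emerges" means, here is a toy sketch (entirely my own, and deliberately oversimplified): a simulated two-state lock whose transition table lives in flat memory right behind an input buffer. An unchecked copy lets attacker-supplied bytes spill into that table, and those bytes become the "instructions" of the corrupted machine - states the intended automaton never allowed become reachable.

    LOCKED, UNLOCKED = 0, 1
    BUF_SIZE = 4   # name buffer occupies cells 0..3 of the flat "memory"

    def fresh_memory():
        # Intended finite state machine: only digit 7 moves LOCKED -> UNLOCKED.
        table = [[LOCKED] * 10 for _ in range(2)]
        table[LOCKED][7] = UNLOCKED
        # Flat memory layout: [ name buffer | transition table, row-major ]
        return [0] * BUF_SIZE + [t for row in table for t in row]

    def store_name(mem, name_bytes):
        # The bug: no bounds check against BUF_SIZE.
        for i, b in enumerate(name_bytes):
            mem[i] = b

    def step(mem, state, digit):
        # The simulation fetches its transitions out of the same flat memory.
        return mem[BUF_SIZE + state * 10 + digit]

    mem = fresh_memory()
    # 14 "name" bytes overflow the 4-byte buffer and rewrite the LOCKED row:
    store_name(mem, [0] * BUF_SIZE + [UNLOCKED] * 10)
    print(step(mem, LOCKED, 0))   # prints 1: UNLOCKED, without ever sending the PIN

The overflowing bytes are the program of the corrupted machine; a mitigation that merely blocks one particular byte sequence, rather than constraining which transitions are reachable at all, leaves the machine itself intact.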

The key insight that would have saved us all from the dreary experience of reading 25 tit-for-tat-ROP-and-bad-mitigation papers is that if you do not constrain what the weird machines that emerge from a memory corruption can do, your mitigation is probably not going to help. Most mitigations blacklist a particular weird machine program, without constraining the machine's capabilities.

Weird machines are the proper framework in which to reason about pretty much all exploits that are not side-channel based or missing-auth-based.

Now, unfortunately, the abstraction was well understood intuitively by a few people that had done a lot of exploitation, but it was never made precise or put in writing. As a consequence, other researchers heard the term and "filled it with their imagination": they then used the term wherever they found a surprising way of performing computation with something that was being used as designed. This made the term even harder to understand - in the same way that nobody will be able to understand "Turing complete" by just reading ROP papers.

I am particularly passionate about this part because I spent a few months putting the informally defined weird machine onto a sound footing, with precise written definitions, to prevent the term from getting even more muddled :-)

To summarize:


  • The use of "Turing completeness" in ROP papers is an abuse; the use of that term does not correspond to its use in theoretical CS. If we are going to use rigorous terms, we should make sure we use them correctly. I see students and researchers being confused about the actual meaning all the time now.
  • The ROP-tit-for-tat-mitigation paper deluge (and the subsequent unhealthy focus on CFI) could have been avoided if people had reasoned about ROP things at the right abstraction levels (weird machines). They would have had the chance, because there were quite a few conversations between the authors of the tit-for-tat-papers and people that understood things properly :-), but my suspicion is that there was no incentive to reason on an abstraction level that messes with your ability to salami-slice more papers.
  • The concept of weird machines was made harder to understand because people who had not properly understood it started calling everything that looked vaguely interesting a "weird machine". Nobody was sitting on any review boards to stop them :-), so now that term (which also has a precise definition) is used in a wrong manner, too.

My larger point is: we have so much handwavy and muddled terminology in the papers in this field that it is harmful to young researchers, PhD students, and neighboring fields. The terminology is confusing and often ill-defined (what is a gadget?), and terms that have a precise meaning in general CS get used to mean something completely different without anybody explaining the change (Turing complete).

This creates unnecessary and harmful confusion, and it should be fixed.

Saturday, March 31, 2018

A bank statement for app activity (and thus personal data)

During my long sabbatical in 2015-2016 I had plenty of time to think about random things and come up with strange ideas. Most of these ideas are more funny than practical - their primary use is boring people that are reckless enough to have drinks with me.

This blog post describes one of these ideas. With the recent renewed interest in privacy and overreach of smart phone apps, it seems like a topic that is - at least temporarily - less boring than usual.

ML, software behavior, and the boundary between 'malicious' and 'non-malicious'

I have seen a lot of human brain power (and a vast amount of computational power) thrown at the problem of automatically deciding whether a given piece of software is good or bad. 

This is usually done as follows:
  1. Collect a lot of information about the behavior of software (normally by running the software in some simulated environment)
  2. Extract features from this information
  3. Apply some more-or-less sophisticated machine learning model to decide between "good" or "bad"
The underlying idea is that there is "bad" behavior and "good" behavior, and that if we could somehow build a sufficiently powerful machine learning model, we could automatically decide whether a given piece of software is good or bad.
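A minimal sketch of that standard pipeline (the sandbox reports and feature names below are invented for illustration, not taken from any real system):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.pipeline import make_pipeline

    # Steps 1 and 2: behavior collected in a sandbox, reduced to features.
    reports = [
        {"calls_CreateRemoteThread": 1, "reads_browser_credentials": 1},
        {"opens_network_socket": 1, "writes_temp_file": 1},
    ]
    labels = ["bad", "good"]

    # Step 3: a more-or-less sophisticated model decides "good" vs "bad".
    model = make_pipeline(DictVectorizer(sparse=False), RandomForestClassifier())
    model.fit(reports, labels)
    print(model.predict([{"opens_network_socket": 1, "reads_browser_credentials": 1}]))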

In practice, this rarely works without significant false-positive problems, or significant false-negative problems, or all sorts of complicated corner cases where the system fails.

In 2015, I had to deal with the fallout of the badly-phrased Wassenaar wording: Export-control legislation which tried to define "bad behavior" for software. During this, it became clear to me that the idea that behavior alone determines good/bad is flawed.

The behavior of a piece of software does not determine whether it is malicious or not. The true defining line between malicious and non-malicious software is whether the software does what the user expects it to do.

Users run software because they have an expectation for what this software does. They grant permissions for software because they have an expectation for the software to do something for them ("I want to make phone calls, so clearly the app should use the microphone"). This permission is given conditionally, with context -- the user does not want to give the app permission to switch on the microphone when the user does not intend to make a phone call.

The question of malicious / non-malicious software is a question of alignment between user expectations and software behavior.

This means, in practice, that efforts in applying machine learning to separate malicious from non-malicious software are doomed to fail, because they fail to measure the one dimension through which the boundary between good and bad runs.

Intuitively, this can be illustrated with the two pictures below. They show the same set of red and green points in 3d-space from two different perspectives -- once with their z-axis projected away, and once in a 3-d plot where the z-axis is still visible:

[Figure: The cloud of points seen from the side, with the "important" dimension projected away. It is near-impossible to draw a sane boundary between red and green points, and whatever boundary you draw won't generalize well.]

[Figure: The same cloud of points, with the "important" dimension going from left to right. It is much clearer how to separate green from red points now.]
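The same intuition in a few lines of code (a synthetic sketch of mine, not the data behind the original plots): two classes separated only along z are trivially separable when z is available, and indistinguishable once it is projected away.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 1000
    xy = rng.normal(size=(n, 2))     # two "behavioral" dimensions, identical for both classes
    z = rng.uniform(-1, 1, size=n)   # the hidden dimension (user expectation)
    labels = (z > 0).astype(int)     # good/bad is determined by z alone
    points = np.column_stack([xy, z])

    # With z: near-perfect separation. Without z: accuracy collapses to chance.
    print(cross_val_score(LogisticRegression(), points, labels).mean())
    print(cross_val_score(LogisticRegression(), points[:, :2], labels).mean())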
The question that arises naturally, then, is:

How can one measure the missing dimension (user intent)?

User intent is a difficult thing to measure. The software industry has the practice of forcing the user to agree to some ridiculously wide-reaching terms of service or EULA that few users read, even fewer understand, and which are often near-equivalent to giving the person you hire to clean your flat a power of attorney over all your documents, and allowing them to throw parties in your flat while you are not looking.

It is commonly argued that - because the user clicked "agree" to an extremely broad agreement - the user consented to everything the software can possibly do.

But consent to software actions is context-dependent and conditioned on particular, specific actions. It is fine for my messenger to request access to my camera, microphone and files - I may need to send a picture, I may need to make a phone call, and I may need to send an attachment. It is not OK for my messenger to use my microphone to check whether a particular ultrasonic tracker sound is received, and it is not OK for my messenger to randomly search through my files.

Users do not get to tell the software vendor their intent and the context for which they are providing consent.

Now, given that user intent is difficult to measure up-front - how about we simply ask the user whether something that an app / software did was what he expected it to do?

Information and attention is a currency - but one with bad accounting

The modern ad economy runs on attention and private data. The big advertising platforms make their money by selling the combination of user attention and the ability to micro-target advertisements given contextual data about a user. The user "pays" for goods and services by providing attention and private data.

People often fear that big platforms will "sell their data". This is, at least for the smarter and more profitable platforms, an unnecessary fear: these platforms make their money by having data that others do not have, which allows better micro-targeting. They do not make their money by "selling data"; they make money by "monetizing the data they have".

The way to think about the relationship between the user and the platform is more like the clichéd "musician-agent" relationship: the musician produces something, but does not know how to monetize it. His agent knows how to monetize it, and strikes a deal with the musician: you give me exclusive use of your product, and I will monetize it for you - and take a cut of the proceeds.

The profits accumulated by the big platforms are the difference between what the combination of attention & private data obtained from users is worth and the cost of obtaining this attention and data.

For payments in "normal" currency, users usually have pretty good accounting: They know what is in their wallet, and (to the extent that they use electronic means for payments) they get pretty detailed transaction statements. It is not difficult for a normal household to reconstruct from their bank statements relatively precisely how much they paid for what goods in a given month.

This transparency creates trust: We do not hesitate much to give our credit card numbers to online service providers, because we know that we can intervene if they charge our credit cards without reason and in excess of what we agreed to.

Private information, on the other hand, is not accounted for. Users have no way to see how much private data they provide, and whether they are actually OK with that.

A bank statement for app/software activity

How could one empower users to account for their private data, while at the same time helping platform providers identify malicious software better?

By providing users with the equivalent of a bank statement for app/software activity. The way I imagine it would be roughly as follows:

A separate component of my mobile phone (or computer) OS keeps detailed track of app activity: What peripherals are accessed at what times, what files are accessed, etc.

Users are given the option of checking the activity on their device through a UI that makes these details understandable and accessible:
  • App XYZ accessed your microphone in the last week at the following times, showing you the following screen:
    • Timestamp 1, screenshot 1
    • Timestamp 2, screenshot 2
  • Does this match your expectations of what the app should do? YES / NO
  • App ABC accessed the following files during the last week at the following times, showing you the following screen:
    • Timestamp 3, screenshot 3
      • Filename
      • Filename
      • Filename
  • Does this match your expectations of what the app should do? YES / NO
At least on modern mobile platforms, most of the above data is already available - modern permissions systems can keep relatively detailed logs of "when what was accessed". Adding the ability to save screenshots alongside is easy.
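A rough sketch of what the underlying record-keeping and weekly roll-up could look like (all names, fields, and the grouping below are my own invention, not a description of any existing platform API):

    from collections import defaultdict
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class ActivityRecord:
        app: str               # e.g. "com.example.messenger" (hypothetical)
        resource: str          # "microphone", "camera", "file:/sdcard/..."
        timestamp: datetime
        screenshot_path: str   # what was on screen when the access happened

    def weekly_statement(records):
        """Group raw access records into the per-app, per-resource view shown to the user."""
        statement = defaultdict(list)
        for r in records:
            statement[(r.app, r.resource)].append((r.timestamp, r.screenshot_path))
        return statement

    def review(statement):
        """Render the 'bank statement' and ask the expectation question per app/resource."""
        for (app, resource), accesses in sorted(statement.items()):
            print(f"{app} accessed your {resource} {len(accesses)} time(s) this week:")
            for ts, shot in accesses:
                print(f"  {ts:%Y-%m-%d %H:%M}  (screen: {shot})")
            print("Does this match your expectations of what the app should do? YES / NO")
            # The YES / NO answers are the signal fed back to the platform provider.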

Yes, a lot of work has to go into a thoughtful UI, but it seems worth the trouble: Even if most users will randomly click on YES / NO, the few thousand users that actually care will provide platform providers valuable information about whether an app is overreaching or not. At the same time, more paranoid users (like me) would feel less fearful about installing useful apps: If I see the app doing something in excess of what I would like it to do, I could remove it.

Right now, users have extremely limited transparency into what apps are actually doing. While the situation is improving slowly (most platforms allow me to check which app last used my GPS), it is still way too opaque for comfort, and overreach / abuse is likely pervasive.

Changing this does not seem hard, if any of the big platform providers could muster the will.

It seems like a win / win situation, so I can hope. I can also promise that I will buy the first phone to offer this in a credible way :-).

PS: There are many more side-benefits to the above model - for example making it more difficult to hack a trusted app developer to then silently exfiltrate data from users that trust said developer - but I won't bore you with those details now.

Wednesday, February 21, 2018

Two small notes on the "malicious use of AI" report

After a long hiatus on this blog, a new post! Well, not really - but a whitepaper titled "The Malicious Use of Artificial Intelligence" was published today, and I decided I should cut/paste/publish two notes from an email I wrote a while ago that apply to the paper.

Perhaps they are useful to someone:
1) On the ill-definedness of AI: AI is a diffuse and ill-defined term. Pretty much *anything* where a parameter is inferred from data is called "AI" today. Yes, clothing sizes are determined by "AI", because mean measurements are inferred from real data.

To test whether one has fallen into the trap of viewing AI as something structurally different from other mathematics or computer science (it is not!), one should battle-test documents about AI policy, and check them for proportionality, by doing the following:

Take the existing text and search/replace every occurrence of the word "AI" or "artificial intelligence" with "mathematics", and every occurrence of the word "machine learning" with "statistics". Re-read the text and see whether you would still agree with it.
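For the literally minded, the battle test is a few lines of Python (a trivial sketch; the replacement map, not the code, is the point):

    import re

    def battle_test(policy_text):
        substitutions = [(r"\bartificial intelligence\b|\bAI\b", "mathematics"),
                         (r"\bmachine learning\b", "statistics")]
        for pattern, replacement in substitutions:
            policy_text = re.sub(pattern, replacement, policy_text, flags=re.IGNORECASE)
        return policy_text

    print(battle_test("AI will transform warfare, and machine learning raises novel risks."))
    # -> "mathematics will transform warfare, and statistics raises novel risks."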

2) "All science is always dual-use":

Everybody that works at the intersection of science & policy should read Hardy's "A Mathematician's Apology": https://www.math.ualberta.ca/mss/misc/A%20Mathematician%27s%20Apology.pdf

I am not sure how many of the contributors have done so, but it is a fascinating read - he contemplates among other things the effect that mathematics had on warfare, and to what extent science can be conducted if one has to assume it will be used for nefarious purposes.

My favorite section is the following:
We have still one more question to consider. We have concluded that the trivial mathematics is, on the whole, useful, and that the real mathematics, on the whole, is not; that the trivial mathematics does, and the real mathematics does not, ‘do good’ in a certain sense; but we have still to ask whether either sort of mathematics does harm. It would be paradoxical to suggest that mathematics of any sort does much harm in time of peace, so that we are driven to the consideration of the effects of mathematics on war. It is very difficult to argue such questions at all dispassionately now, and I should have preferred to avoid them; but some sort of discussion seems inevitable. Fortunately, it need not be a long one.
There is one comforting conclusion which is easy for a real mathematician. Real mathematics has no effects on war.
No one has yet discovered any warlike purpose to be served by the theory of numbers or relativity, and it seems very unlikely that anyone will do so for many years. It is true that there are branches of applied mathematics, such as ballistics and aerodynamics, which have been developed deliberately for war and demand a quite elaborate technique: it is perhaps hard to call them ‘trivial’, but none of them has any claim to rank as ‘real’. They are indeed repulsively ugly and intolerably dull; even Littlewood could not make ballistics respectable, and if he could not who can? So a real mathematician has his conscience clear; there is nothing to be set against any value his work may have; mathematics is, as I said at Oxford, a ‘harmless and innocent’ occupation. The trivial mathematics, on the other hand, has many applications in war.
The gunnery experts and aeroplane designers, for example, could not do their work without it. And the general effect of these applications is plain: mathematics facilitates (if not so obviously as physics or chemistry) modern, scientific, ‘total’ war.

The most fascinating bit about the above is how fantastically presciently wrong Hardy was when speaking about the lack of war-like applications for number theory or relativity - RSA and nuclear weapons respectively. In a similar vein - I was in a relationship in the past with a woman who was a social anthropologist, and who often mocked my field of expertise for being close to the military funding agencies (this was in the early 2000s). The first thing that SecDef Gates did when he took his position was hire a bunch of social anthropologists to help DoD unravel the tribal structure in Iraq.

The point of this digression is: it is impossible for any scientist to imagine the future uses and abuses of his scientific work. You cannot choose to work on "safe" or "unsafe" science - the only choice you have is between relevant and irrelevant, and the militaries of this world *will* take whatever is relevant and use it to maximize their warfare capabilities.