Meltdown And Spectre

Wayne Rolfe

8 years ago

The Meltdown and Spectre flaws—two related vulnerabilities that enable a wide range of information disclosure from every mainstream processor, with particularly severe flaws for Intel and some ARM chips—were originally revealed privately to chip companies, operating system developers, and cloud computing providers. That private disclosure was scheduled to become public some time next week, enabling these companies to develop (and, in the case of the cloud companies, deploy) suitable patches, workarounds, and mitigations.

Because researchers worked out one of the flaws ahead of that planned reveal, the schedule was abruptly brought forward, and the pair of vulnerabilities was publicly disclosed on Wednesday, prompting a rather disorderly set of responses from the companies involved.

There are three main groups of companies responding to the Meltdown and Spectre pair: processor companies, operating system companies, and cloud providers. Their reactions have been quite varied.

What Meltdown and Spectre do

A brief recap of the problem: modern processors perform speculative execution. To maximize performance, they try to execute instructions even before it is certain that those instructions need to be executed. For example, the processors will guess at which way a branch will be taken and execute instructions on the basis of that guess. If the guess is correct, great; the processor got some work done without having to wait to see if the branch was taken or not. If the guess is wrong, no big deal; the results are discarded and the processor resumes executing the correct side of the branch.
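
To make the idea of a branch guess concrete, here is a toy model of one textbook prediction scheme, the two-bit saturating counter. It is an illustrative sketch only. Real processors use far more elaborate (and largely undocumented) predictors, and the class and function names here are hypothetical.

```python
class TwoBitPredictor:
    """Toy two-bit saturating counter for a single branch (textbook scheme)."""

    def __init__(self) -> None:
        # States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
        self.state = 1

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Step one state toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_guesses(outcomes: list[bool]) -> tuple[int, int]:
    """Return (correct, wrong) predictions over a sequence of branch outcomes."""
    predictor = TwoBitPredictor()
    correct = wrong = 0
    for taken in outcomes:
        # A real CPU would speculatively run the predicted side at this point.
        if predictor.predict() == taken:
            correct += 1  # speculative work is kept
        else:
            wrong += 1  # speculative work is discarded; the other side runs
        predictor.update(taken)
    return correct, wrong


# A loop branch that is taken nine times and then falls through.
print(count_guesses([True] * 9 + [False]))  # (8, 2)
```

The property that makes prediction pay off, namely that a branch which has gone one way many times keeps being guessed that way, is also what lets an attacker "train" a predictor before deliberately triggering a misprediction.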

Although speculative execution does not change a program’s results, the Spectre and Meltdown research demonstrates that it perturbs the processor’s state, most notably the contents of its caches, in detectable ways. This perturbation can be detected by carefully measuring how long it takes to perform certain operations. Using these timings, it’s possible for one process to infer properties of data belonging to another process, or even to the operating system kernel or virtual machine hypervisor.
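
The timing measurement can be sketched with a simulated cache. This is a toy model, assuming hypothetical cycle counts for a cache hit and a miss and one probe entry per possible byte value. It shows a flush-and-reload style probe, one common way such timings are taken, rather than any real exploit code.

```python
HIT_CYCLES = 40    # hypothetical latency of a cached load
MISS_CYCLES = 300  # hypothetical latency of a load from main memory
STRIDE = 4096      # spacing between probe entries so each has its own cache line
LINE_SIZE = 64


class ToyCache:
    def __init__(self) -> None:
        self.lines: set[int] = set()

    def flush(self) -> None:
        self.lines.clear()

    def load(self, addr: int) -> int:
        """Load addr and return how many cycles the load took."""
        line = addr // LINE_SIZE
        if line in self.lines:
            return HIT_CYCLES
        self.lines.add(line)
        return MISS_CYCLES


def victim(cache: ToyCache, secret: int, probe_base: int) -> None:
    # The victim's (speculative) work touches one probe entry chosen by the
    # secret. Its result is thrown away, but the cache line stays loaded.
    cache.load(probe_base + secret * STRIDE)


def attacker_recover(cache: ToyCache, probe_base: int) -> int:
    # Time a load of every probe entry; the single fast one reveals the secret.
    timings = [cache.load(probe_base + guess * STRIDE) for guess in range(256)]
    return min(range(256), key=lambda guess: timings[guess])


cache = ToyCache()
probe_base = 0x10_0000
cache.flush()  # the attacker empties the cache first
victim(cache, secret=0x2A, probe_base=probe_base)
print(hex(attacker_recover(cache, probe_base)))  # 0x2a
```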

This information leakage can be used directly; for example, malicious JavaScript running in a browser could steal passwords stored in the browser. It can also be used in tandem with other security flaws to increase their impact. Information leakage tends to undermine protections such as ASLR (address space layout randomization), so these flaws may enable effective exploitation of buffer overflows.

Meltdown, which applies to virtually every Intel chip made for many years along with certain high-performance ARM designs, is the easier of the two to exploit and enables any user program to read vast tracts of kernel data. The good news, such as it is, is that Meltdown also appears easier to guard against robustly. The flaw depends on the way that operating systems share memory between user programs and the kernel, and the solution, albeit one that carries some performance penalty, is to put an end to that sharing.
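
The fix can be pictured as a change to which pages are mapped while user code runs. The sketch below is a deliberately simplified model in which a page table is just a set of page numbers. The class, the page numbers, and the small "trampoline" region (the minimal kernel entry and exit code that has to stay mapped) are illustrative assumptions, not any operating system’s actual data structures.

```python
from dataclasses import dataclass


@dataclass
class AddressSpace:
    user_pages: set[int]        # the program's own memory
    kernel_pages: set[int]      # kernel data, which may include secrets
    trampoline_pages: set[int]  # minimal kernel entry/exit code, no secrets

    def shared_table(self) -> set[int]:
        # Traditional layout: one table maps everything; kernel pages are only
        # marked supervisor-only, a check Meltdown's speculative loads slip past.
        return self.user_pages | self.trampoline_pages | self.kernel_pages

    def user_mode_table(self) -> set[int]:
        # Dual page tables: while user code runs, kernel data is simply unmapped.
        return self.user_pages | self.trampoline_pages

    def kernel_mode_table(self) -> set[int]:
        # Installed on every system call or interrupt; switching back and forth
        # is where the performance penalty comes from.
        return self.user_pages | self.trampoline_pages | self.kernel_pages


def exposed_kernel_pages(table: set[int], space: AddressSpace) -> set[int]:
    """Kernel data pages a user-mode speculative read could reach via this table."""
    return table & space.kernel_pages


space = AddressSpace(user_pages={1, 2, 3}, kernel_pages={900, 901}, trampoline_pages={950})
print(sorted(exposed_kernel_pages(space.shared_table(), space)))     # [900, 901]
print(sorted(exposed_kernel_pages(space.user_mode_table(), space)))  # []
```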

Spectre, which applies to chips from Intel, AMD, and ARM, and probably to every other processor on the market that offers speculative execution, is more subtle. It encompasses two kinds of attack: a trick that defeats array bounds tests to read memory within a single process, which can be used to attack the integrity of virtual machines and sandboxes; and cross-process attacks that use the processor’s branch predictors (the hardware that guesses which side of a branch is taken and hence controls the speculative execution). Systemic fixes for some aspects of Spectre appear to have been developed, but protecting against the whole range of attacks will require modification (or at least recompilation) of at-risk programs.

Intel

So, on to the responses. Intel is the company most significantly affected by these problems. Spectre hits everyone, but Meltdown only hits Intel and ARM, and among ARM designs only the highest-performance ones. For Intel, virtually every chip made in the last five or ten years, and possibly the last 20, is vulnerable to Meltdown.

The company’s initial statement, produced on Wednesday, was a masterpiece of obfuscation. It contains many statements that are technically true, for example “these exploits do not have the potential to corrupt, modify, or delete data,” but utterly beside the point. Nobody claimed otherwise! The statement doesn’t distinguish between Meltdown (a flaw that Intel’s biggest competitor, AMD, appears to have dodged) and Spectre, and hence it fails to acknowledge the unequal impact on the different companies’ products.

Follow-up material from Intel has been rather better. In particular, this whitepaper describing mitigation techniques and future processor changes to introduce anti-Spectre features appears sensible and accurate.

For the Spectre array bounds problem, Intel recommends inserting a serializing instruction (lfence is Intel’s choice, though there are others) in code between testing array bounds and accessing the array. Serializing instructions prevent speculation: every instruction that appears before the serializing instruction must complete before the serializing instruction can begin to execute. In this case, it means that the result of the array bounds test must have been definitively calculated before the array is ever accessed; no speculative access to the array that assumes the test succeeds is allowed.
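
A toy model makes the effect of the fence visible. It simulates the Spectre array-bounds pattern (test an index, then use the value loaded at that index to pick a probe slot) on a pretend processor whose branch predictor has been trained to expect in-bounds indices. Everything here is a hypothetical simulation: the fence flag stands in for an instruction such as lfence placed between the test and the access, and the cached set records which probe slots the loads left behind.

```python
from typing import Optional


class ToyCPU:
    def __init__(self, memory: list[int], array1_len: int) -> None:
        self.memory = memory            # array1 occupies memory[:array1_len]
        self.array1_len = array1_len
        self.cached: set[int] = set()   # probe slots currently in the cache
        self.predicts_in_bounds = False

    def train(self) -> None:
        # Many in-bounds calls teach the predictor that the test usually passes.
        self.predicts_in_bounds = True

    def victim(self, x: int, fence: bool) -> Optional[int]:
        """Models: if x < array1_len, use array1[x] to pick a probe slot."""
        in_bounds = x < self.array1_len
        if self.predicts_in_bounds and not in_bounds and not fence:
            # Mispredicted path: the loads run before the comparison resolves.
            leaked = self.memory[x]
            self.cached.add(leaked)  # footprint: probe slot chosen by the secret
            # The misprediction is then detected and the result discarded,
            # but the cache footprint remains.
        if in_bounds:
            value = self.memory[x]
            self.cached.add(value)
            return value
        return None


def leaks(fence: bool) -> bool:
    secret = 0x41
    cpu = ToyCPU(memory=[1, 2, 3, 4, secret], array1_len=4)
    cpu.train()
    cpu.victim(4, fence=fence)  # out-of-bounds index supplied by an attacker
    return secret in cpu.cached


print(leaks(fence=False))  # True
print(leaks(fence=True))   # False
```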

Less clear is where these serializing instructions should be added. Intel says that heuristics can be developed to figure out the best places in a program to include them but warns that they probably shouldn’t be used with every single array bounds test; the loss of speculative execution imposes too high a penalty. One imagines that perhaps bounds tests on indices that come from user data should be serialized and others left unaltered. This difficulty underscores the complexity of Spectre.
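
One possible heuristic, following the guess above that only bounds tests on attacker-controlled indices need serializing, can be sketched as a tiny taint-tracking pass. The instruction format is entirely hypothetical (real compilers work on their own intermediate representations), and the sketch only propagates taint through direct copies.

```python
Instr = tuple[str, ...]


def insert_fences(program: list[Instr]) -> list[Instr]:
    """Add ("lfence",) after each bounds check on an index derived from input."""
    tainted: set[str] = set()
    out: list[Instr] = []
    for instr in program:
        op = instr[0]
        if op == "input":      # ("input", var): var holds untrusted data
            tainted.add(instr[1])
        elif op == "const":    # ("const", var): var holds a trusted constant
            tainted.discard(instr[1])
        elif op == "copy":     # ("copy", dst, src): taint follows the value
            if instr[2] in tainted:
                tainted.add(instr[1])
            else:
                tainted.discard(instr[1])
        out.append(instr)
        if op == "bounds_check" and instr[1] in tainted:
            out.append(("lfence",))  # serialize before the guarded access
    return out


program = [
    ("input", "i"),
    ("copy", "j", "i"),
    ("const", "k"),
    ("bounds_check", "j"),
    ("load", "j"),
    ("bounds_check", "k"),
    ("load", "k"),
]
for instr in insert_fences(program):
    print(instr)
```

Only the test on j, which is derived from input, gets a fence. A real analysis would also have to follow arithmetic, memory, and function calls, which is part of why Intel leaves the placement to heuristics.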

For the Spectre branch prediction attack, Intel is going to add new capabilities to its processors to alter the behavior of branch prediction. Interestingly, some existing processors that are already in customer systems are going to have these capabilities retrofitted via a microcode update. Future generation processors will also include the capabilities, with Intel promising a lower performance impact. There are three new capabilities in total: one to “restrict” certain kinds of branch prediction, one to prevent one HyperThread from influencing the branch predictor of the other HyperThread on the same core, and one to act as a kind of branch prediction “barrier” that prevents branches before the “barrier” from influencing branches after the barrier.

These new restrictions will need to be supported and used by operating systems; they won’t be available to individual applications. Some systems appear to already have the microcode update; everyone else will have to wait for their system vendors to get their act together.
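
The article doesn’t name the three capabilities described above, but Intel’s mitigation documentation calls them indirect branch restricted speculation (IBRS), single thread indirect branch predictors (STIBP), and the indirect branch predictor barrier (IBPB). The sketch below assumes the bit positions and register numbers given in that documentation. It decodes the CPUID bits that advertise the features and builds the value an operating system would write to turn on the first two. Any program can execute CPUID, but the model-specific register writes need kernel privileges, which is why applications can’t use these controls directly; here the raw register values are simply passed in as integers.

```python
# Bit positions in CPUID.(EAX=7, ECX=0):EDX, per Intel's documentation.
IBRS_AND_IBPB_BIT = 26  # restricted speculation ("restrict") and the barrier
STIBP_BIT = 27          # isolate one HyperThread's predictions from the other's

# Model-specific registers; writing them requires kernel privileges.
IA32_SPEC_CTRL = 0x48   # bit 0 = IBRS, bit 1 = STIBP
IA32_PRED_CMD = 0x49    # writing bit 0 = IBPB, a one-shot predictor barrier


def decode_leaf7_edx(edx: int) -> dict[str, bool]:
    """Report which branch-prediction controls the processor advertises."""
    ibrs_ibpb = bool((edx >> IBRS_AND_IBPB_BIT) & 1)
    return {
        "ibrs": ibrs_ibpb,
        "ibpb": ibrs_ibpb,
        "stibp": bool((edx >> STIBP_BIT) & 1),
    }


def spec_ctrl_value(ibrs: bool, stibp: bool) -> int:
    """Value an OS would write to IA32_SPEC_CTRL to turn the controls on."""
    return (1 if ibrs else 0) | (2 if stibp else 0)


# A processor whose updated microcode advertises all three controls:
edx = (1 << IBRS_AND_IBPB_BIT) | (1 << STIBP_BIT)
print(decode_leaf7_edx(edx))              # {'ibrs': True, 'ibpb': True, 'stibp': True}
print(hex(spec_ctrl_value(True, True)))   # 0x3
```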

The ability to add this capability with a microcode update is interesting, and it suggests that the processors already had the ability to restrict or invalidate the branch predictor in some way—it was just never publicly documented or enabled. The capability likely exists for testing purposes.

Intel also suggests a way of representing certain branches in code with “return” instructions. Patches to enable this have already been contributed to the gcc compiler. Return instructions aren’t branch predicted in the same way and so aren’t susceptible to the same information leak. However, it appears that they’re not completely immune to branch predictor influence; a microcode update for Broadwell processors or newer is required to make this transformation a robust protection.

This approach would require every vulnerable application, operating system, and hypervisor to be recompiled.
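
The return-instruction technique (widely known as "retpoline") works because indirect jumps and returns are predicted from different structures. The toy model below assumes a simplified processor with a branch target buffer (BTB), which an attacker can train, and a return stack buffer (RSB), which is filled by the program’s own call instructions. The addresses and names are made up, and the model ignores corner cases such as an empty RSB.

```python
from dataclasses import dataclass, field

CAPTURE_LOOP = 0x1000  # hypothetical address of a harmless "pause; lfence; jmp" loop


@dataclass
class Predictors:
    btb: dict[int, int] = field(default_factory=dict)  # branch addr -> guessed target
    rsb: list[int] = field(default_factory=list)       # return-address predictions


def guess_indirect_jump(p: Predictors, branch_addr: int, fallback: int) -> int:
    # An ordinary indirect jump is guessed from the BTB, which code in another
    # process may have trained to point at an attacker-chosen gadget.
    return p.btb.get(branch_addr, fallback)


def guess_retpoline(p: Predictors) -> int:
    # Retpoline: 'call' pushes the capture loop's address onto the RSB (and the
    # stack); the code then overwrites the stacked return address with the real
    # target, and 'ret' is guessed from the RSB, so speculation stays in the
    # loop. Once the ret resolves, execution continues at the real target.
    p.rsb.append(CAPTURE_LOOP)
    return p.rsb.pop()


p = Predictors(btb={0x4010: 0xBAD})  # BTB entry poisoned by an attacker
print(hex(guess_indirect_jump(p, 0x4010, fallback=0x5000)))  # 0xbad
print(hex(guess_retpoline(p)))                               # 0x1000
```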

For Meltdown, Intel is recommending the operating system level fix that first sparked interest and intrigue late last year. The company also says that future processors will contain some unspecified mitigation for the problem.

AMD

AMD’s response has a lot less detail. AMD’s chips aren’t believed to be susceptible to the Meltdown flaw at all. The company also says (vaguely) that its chips should be less susceptible to the branch prediction attack.

The array bounds problem has, however, been demonstrated on AMD systems, and for that, AMD is suggesting a very different solution from Intel’s: specifically, operating system patches. It’s not clear what these might be. While Intel released awful PR, it also produced a good whitepaper, whereas AMD so far has only offered PR. The fact that AMD’s advice contradicts the responses of both Intel and (as we’ll see later) ARM is very peculiar.

AMD’s behavior before this all went public was also rather suspect. AMD, like the other important companies in this field, was contacted privately by the researchers, and the intent was to keep all the details private until a coordinated release next week, in a bid to maximize the deployment of patches before revealing the problems. Generally that private contact is made on the condition that any embargo or non-disclosure agreement is honored.

It’s true that AMD didn’t actually reveal the details of the flaw before the embargo was up, but one of the company’s developers came very close. Just after Christmas, an AMD developer contributed a Linux patch that excluded AMD chips from the Meltdown mitigation. In the note with that patch, the developer wrote, “The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.”

It was this specific information—that the flaw involved speculative attempts to access kernel data from user programs—that arguably led to researchers figuring out what the problem was. The message narrowed the search considerably, outlining the precise conditions required to trigger the flaw.

For a company operating under an embargo, with many different players attempting to synchronize and coordinate their updates, patches, whitepapers, and other information, this was a deeply unhelpful act. While there are certainly those in the security community who oppose this kind of information embargo and prefer to reveal any and all information at the earliest opportunity, given the rest of the industry’s approach to these flaws, AMD’s action seems, at the least, reckless.

ARM

ARM’s response was the gold standard: lots of technical detail in a whitepaper, which ARM chose to let stand alone, without the misleading PR of Intel or the vague imprecision of AMD.

For the array bounds attack, ARM is introducing a new instruction that provides a speculation barrier; similar to Intel’s serializing instructions, the new ARM instruction should be inserted between the test of array bounds and the array access itself. ARM even provides sample code to show this.

ARM doesn’t have a generic approach for solving the branch prediction attack, and, unlike Intel, it doesn’t appear to be developing any immediate solution. However, the company notes that many of its chips already have systems in place for invalidating or temporarily disabling the branch predictor and that operating systems should use that.

ARM’s very latest high-performance design, the Cortex-A75, is also vulnerable to Meltdown attacks. The proposed solution is the same one Intel suggests, and the same one that Linux, Windows, and macOS are known to have implemented: change the memory mapping so that kernel memory mappings are no longer shared with user processes. ARM engineers have contributed patches to Linux to implement this for ARM chips.

Apple

Apple’s response took a little longer than the others’, only coming late on Thursday. Apple’s position is a bit different; it designs its own chips and sells devices that contain them, and it also designs and develops its own operating system. Apple’s reaction was high-level, but fairly matter-of-fact and free of marketing fluff.

That Apple’s chips should be susceptible to Spectre was unsurprising; it’s a rather generic problem for any chips with speculative execution. Apple doesn’t go into detail about mitigation techniques and doesn’t distinguish between the array-bounds attack and the branch-predictor attack but says that it is updating Safari to protect against the problem.

The more interesting part of Apple’s response is that the company acknowledges that its processors are also vulnerable to Meltdown. This does go a long way toward diminishing Meltdown as an “Intel problem”; with most of Apple’s chips affected, too (though the simpler processors used in the Apple Watch are not vulnerable), there is a pretty substantial number of non-Intel chips that can also be attacked in this way.

As an operating system vendor, Apple has updated both iOS and macOS to use the dual page table mappings that are the recommended solution here. For Apple, this is perhaps not such a big change; the 32-bit x86 versions of macOS already used a similar scheme. That was because Apple wanted to give 32-bit applications access to the full 4GB of virtual memory, rather than splitting that 4GB between the program and the kernel. While this imposed a performance cost, it provided better compatibility with the PowerPC versions of the operating system, which also gave applications the full 4GB.

Apple’s work on the operating system isn’t complete, either; although the dual page table work is done, the company has further protective work in the pipeline that remains covered by a non-disclosure agreement.

Microsoft

Microsoft started testing its Meltdown protection in November last year. It tested the dual page table system in Insider builds of Windows 10, though at the time the reason for this work was unknown. From what we can tell, Microsoft’s implementation isn’t applied to AMD systems (nor to pre-2013 Intel Atoms, as they have no speculative execution), and, where available in the underlying hardware, Microsoft’s implementation will also use certain hardware capabilities to reduce the performance impact of the dual page tables.

There are a couple of wrinkles. During testing, Microsoft found that some anti-virus software tries to do undocumented, unsupported things with kernel memory, and these things break when dual page tables are used. Accordingly, dual page tables won’t be used when third-party anti-virus is installed, until and unless that anti-virus software sets a specific registry key to indicate that it supports dual page tables.

Further, Windows Server does not enable dual page tables by default. Although most workloads show little to no performance impact from the use of dual page tables, there are certain workloads, particularly for servers, that can show a bigger performance impact. Workloads that are particularly I/O intensive (both disk and network) will tend to see the greatest impact. System administrators will have to turn on dual page tables with a registry key change.
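
The conditions described above amount to a small decision procedure. The function below restates them as the article describes them; it is a hypothetical illustration, not Microsoft’s code, and the field names are invented.

```python
from dataclasses import dataclass


@dataclass
class WindowsSystem:
    # Hypothetical fields standing in for facts Windows would detect.
    cpu_vendor: str            # "Intel" or "AMD"
    pre_2013_atom: bool        # older Atoms do no speculative execution
    third_party_av: bool
    av_set_compat_key: bool    # the AV vendor's "I support this" registry key
    is_server: bool
    admin_enabled: bool        # administrator's registry change on Server


def uses_dual_page_tables(s: WindowsSystem) -> bool:
    if s.cpu_vendor == "AMD" or s.pre_2013_atom:
        return False  # not susceptible to Meltdown, so no mitigation applied
    if s.third_party_av and not s.av_set_compat_key:
        return False  # avoid breaking AV that pokes at kernel memory
    if s.is_server:
        return s.admin_enabled  # off by default on Windows Server
    return True


desktop = WindowsSystem("Intel", False, False, False, is_server=False, admin_enabled=False)
print(uses_dual_page_tables(desktop))  # True
```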

Redmond has also built support for the new Intel chip capabilities to help protect against Spectre branch prediction attacks. As soon as microcode updates light up the new features, Windows should be able to take advantage of them. While there are some extra nuances, the basic approach is that every time the operating system performs a context switch (switching from one process to another, or from one process to the kernel) it will also reset the branch-prediction buffers. This will prevent one process from being able to prime the branch predictor used by another, which should curtail, if not outright eliminate, the Spectre branch prediction attack.
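
Why resetting the predictor on a context switch helps can be shown with a toy branch target buffer. The sketch assumes, for illustration, a buffer indexed by only the low 12 bits of a branch’s address, so branches in different processes can collide; the sizes, addresses, and names are hypothetical.

```python
from typing import Optional


class ToyBTB:
    """Branch target buffer shared by everything running on one core."""

    def __init__(self, index_bits: int = 12) -> None:
        self.mask = (1 << index_bits) - 1  # only the low bits select an entry
        self.entries: dict[int, int] = {}

    def train(self, branch_addr: int, target: int) -> None:
        self.entries[branch_addr & self.mask] = target

    def predict(self, branch_addr: int) -> Optional[int]:
        return self.entries.get(branch_addr & self.mask)

    def flush(self) -> None:
        self.entries.clear()


def victim_guess(flush_on_switch: bool) -> Optional[int]:
    btb = ToyBTB()
    # The attacker process repeatedly runs a branch whose address shares its
    # low 12 bits with a branch in the victim, steering it toward 0x4242.
    btb.train(0x7000_1234, target=0x4242)
    # Context switch from the attacker to the victim.
    if flush_on_switch:
        btb.flush()  # models the reset performed on each context switch
    # The victim's indirect branch: a different address, the same low bits.
    return btb.predict(0x0040_1234)


print(hex(victim_guess(flush_on_switch=False)))  # 0x4242
print(victim_guess(flush_on_switch=True))        # None
```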

For systems without hardware support, it appears that Microsoft is working on some brute-force approaches to try to reset the branch predictor in certain situations.

(Screenshot caption: this system doesn’t have a suitable microcode update available yet, so it doesn’t include the branch prediction attack mitigation.)

Helpfully, Microsoft has also published a PowerShell script that reports the system’s current protection against Meltdown and Spectre.

For the array bounds variant of Spectre, Microsoft’s main action is to modify Edge and Internet Explorer. Browsers represent a particular risk for this attack, as it’s relatively straightforward to write JavaScript that sets up the conditions necessary to perform the attack. Depending on the browser, browser-based attacks can do things such as steal passwords, and in all browsers the attack provides data useful for breaking out of sandboxes.

Accordingly, Microsoft is disabling access to JavaScript SharedArrayBuffer—a kind of high-performance array that was only enabled in Edge a few months ago—and reducing the precision of timers available to JavaScript. Successful exploitation of both Meltdown and Spectre requires careful timing of actions that may differ by only a few hundred processor cycles. To make this timing harder to achieve in JavaScript, the high-precision JavaScript timer (intended mainly for things like benchmarking and performance profiling) is having both its precision reduced, from a granularity of 5 microseconds to 20 microseconds, and its accuracy reduced, by introducing up to 20 microseconds of random jitter to its results.

Mozilla is taking the same approach in Firefox, and the changes are already shipping in the latest version. Google, too, is applying these changes to Chrome, and they should ship to end users in late January. Mozilla and Google both say that they’re also developing more precise mitigations (likely to be the judicious insertion of serializing instructions to prevent speculative execution in certain places) to address the problem.
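
Some rough arithmetic shows why coarser timers help. At an assumed clock speed of 3GHz, a difference of 300 cycles is 0.1 microseconds, two hundred times smaller than a 20-microsecond timer step. The sketch below models a timer that rounds down to a fixed granularity and then adds random jitter; the function names and the exact rounding-plus-jitter scheme are assumptions for illustration, not any browser’s actual implementation.

```python
import random


def cycles_to_microseconds(cycles: int, clock_ghz: float) -> float:
    # clock_ghz * 1000 cycles elapse per microsecond.
    return cycles / (clock_ghz * 1000.0)


def coarse_now(true_us: float, rng: random.Random,
               granularity_us: float = 20.0, jitter_us: float = 20.0) -> float:
    """Round a true timestamp down to the granularity, then add random jitter."""
    quantized = (true_us // granularity_us) * granularity_us
    return quantized + rng.uniform(0.0, jitter_us)


rng = random.Random(0)
gap = cycles_to_microseconds(300, clock_ghz=3.0)
print(round(gap, 3))  # 0.1

# Without jitter, a cache hit and a miss 0.1 microseconds apart read identically.
start = 1000.0
print(coarse_now(start, rng, jitter_us=0.0) == coarse_now(start + gap, rng, jitter_us=0.0))  # True
```

Coarsening raises the cost of an attack rather than removing the signal outright: an attacker who can repeat a measurement many times can still average away some of the noise, which is consistent with the browser makers also pursuing the more precise mitigations mentioned above.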

Finally, Microsoft has updated its Azure cloud computing platform to protect against Meltdown. Microsoft has implemented a fix at the hypervisor level; this means that virtual machines running on Azure don’t need to be patched to protect against Meltdown. While patching your virtual machines is, of course, strongly advised as a general principle, for systems running on Azure there’s less of an immediate rush.

Guarding against Spectre will still need operating system and application-level changes.

The company claims that the performance impact on Azure is negligible, even for most I/O-intensive workloads. This makes the decision not to enable dual page tables by default on Windows Server a little surprising: if the approach is good enough for the wide range of customers and workloads on Azure, it’s probably good enough for everyone else. But perhaps the company felt that it was more acceptable to impose a 15-percent performance hit on some unlucky customer running just the wrong type of workload in the cloud than on an on-premises server.

Amazon

The Amazon cloud story is much the same as Azure’s. Amazon has rolled out patches for Meltdown, so EC2 and other services should be safe. Just as with Azure, Amazon’s fix is at the hypervisor level and so shouldn’t need any corresponding guest operating system updates.

And also as with Azure, Amazon says that it has not seen a “meaningful” performance impact for the “overwhelming majority” of workloads.

Google

Google’s two consumer operating systems, Android and Chrome OS, both depend on Linux kernels. Naturally, Google is adopting Linux’s protection where it makes sense to do so. Chrome OS for x86 processors has been updated to include dual page table protection, and Google will update Chrome OS for ARM processors at a later date.

Most Android hardware isn’t susceptible to Meltdown, or at least, it isn’t yet. When Cortex-A75 designs hit the market, Android too will likely become vulnerable; we wouldn’t be surprised to see Google using the ARM dual page table support in Linux at that time. To protect against Spectre, the latest Android release reduces the precision of the timers available to Android applications, making it harder for malicious code to make the precise measurements required.

As mentioned previously, the Chrome browser has been similarly modified to prevent precise timing.

Google has updated its cloud infrastructure to address Meltdown, though unlike Amazon and Microsoft, Google’s guidance suggests that guest virtual machines will still need to be patched for complete protection.

An effective, if uncoordinated, response

While the response to Meltdown and Spectre hasn’t been as smooth as originally hoped, vendors appear to have done a thorough job. Meltdown, though easier to exploit, is also easier to protect against; the operating system changes appear successful and should do a solid job for the Intel, Apple, and future ARM chips that are susceptible to the attack.

Spectre, however, is going to be a trickier customer. It doesn’t have any clean, simple fix. Operating system changes, ideally in conjunction with greater hardware control over branch prediction, will provide protection in some scenarios, but the array-bounds version of Spectre is going to require careful examination of, and repair to, vulnerable applications. Unlike the other attacks, it doesn’t appear to have any operating system-level fix, and applying the appropriate application-level fixes is in all likelihood going to require lots of manual effort by developers.

Longer term, it seems likely that Meltdown will recede into the distance—an annoyance, perhaps, but fully patched and protected against—but the rather more subtle Spectre is going to be with us for a while.

Original article can be found here