Alexandra Sandulescu
Eduardo Vela Nava
Rodrigo Branco (BSDaemon)
We observed some behavior of the indirect branch predictors that is, to the best of our knowledge, undocumented, specifically in how they handle ‘ret’ instructions. Our research so far suggests that this behavior does not create exploitable security vulnerabilities in the software we’ve tested. We would like to better understand the impact and implications for different software stacks, so we welcome feedback and further research.
Our observations and tests indicate that certain microarchitectures do not decide the destination of a ‘ret’ instruction strictly by consulting the ‘RSB/RAS’ first and the ‘BTB’ second.
This behavior was confirmed in the following microarchitectures; other vendors and versions may behave similarly.
Interestingly, in our tests, we did not observe signals in the following microarchitectures:
We’ve seen mixed results on:
Not observing the signal in our tests does not rule out the possibility that these microarchitectures are affected. We would like to learn more about the behavior in these and other microarchitectures.
Returns (‘ret’) are indirect branches that should be predicted from a data structure called the RSB (Return Stack Buffer; called the RAS, Return Address Stack, on AMD). When the RSB/RAS is empty, depending on the microarchitecture and patch level/configuration, returns might also be predicted from the BTB (Branch Target Buffer). The prediction order between the RSB/RAS and the BTB may also differ on some microarchitectures, as recently disclosed as part of the RetBleed response [1].
While not officially documented or discussed, the work on Spectre v1.1 [2] indicates that speculative overwrites that control some other data structure also affect the prediction. The example discussed there is a speculative overwrite of the return address and a speculative return that then uses that value. Our tests indicate that such overwrites use the store buffer (see Subsection: Speculative top of the stack for more details). Still, an open question remains: what other prediction orders and conditions exist? This matters because mitigations such as retpoline [3] clearly depend on the prediction order being properly understood in order to be effective. Nonetheless, the retpoline documentation only discusses the RSB/RAS and the BTB.
Our experiments confirm the findings of Mambretti et al. [4]: a ‘ret’ also predicts from the top of the stack if its contents were recently accessed, even when NOT speculatively overwritten. The recommended way to prevent an attacker from controlling the destination of a ‘ret’ (Spectre v2) is to perform an IBPB, which flushes the BTB and the RSB. As a result, there are common scenarios in which the first ‘ret’ after a context switch between untrusted and trusted entities (such as user to kernel, or guest to hypervisor) actually predicts from the recently accessed top of the stack.
What is worse, in the user-to-kernel case the RSB/RAS is thought to be unable to point to a kernel address, since its entries are only created by ‘call’ instructions. SMEP is therefore the mechanism that prevents bad speculation in the user->kernel attack case (via RSB/RAS control), because only user-space addresses can be trained or injected there. A top-of-stack entry, however, can be created with a simple ‘push’ instruction (in fact, by many instructions, such as pop, sub, add, leave, xchg), potentially making SMEP ineffective in the observed scenario. It is also worth noting that ‘ret’s deeper in the control flow might still find attacker-controlled, recently accessed values at the top of the stack due to parameter passing, stack adjustments (such as a ‘sub’ that allocates stack space) and many other software-controlled reasons.
The easiest way to see and test the behavior is to fill the RSB/RAS (in case the IBPB instruction does not clear the RAS, as is the case on some AMD microarchitectures) and perform an IBPB (to flush the BTB). If the top of the stack is then accessed (for example, via a ‘push’), a speculatively executed ‘ret’ instruction will actually predict using the value from that location. A ‘clflush’ can be added for negative testing (note that we still see some hits on some of the microarchitectures, which might support the theory that the store buffer is used).
Here is an example of a test (based on KTF [5]):
/* Preparing */
flushbtb();
rsbstuff();
clflush(&end_ptr);
lfence();
mfence();
/* Ret speculates via shadow of a branch */
// Uncomment the clflush (%%rsp) line below for negative testing
asm goto(".global branch\n"
"push %%rax\n"
//"clflush (%%rsp)\n"
"lfence\n"
".align 16\n"
"branch:\n"
"cmp %%rax, (%%rdi)\n"
"jnz %l[end]\n"
"ret\n"
"nop\n"
::"a" (&leak_secret),
"D" (&baseline):
: end);
// end
end:
asm volatile(".global _end; _end: nop; pop %rax");
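The test above does not show how a "hit" on the speculatively reached gadget is detected. One common way to observe such a signal is a Flush+Reload-style timing probe. The following is a minimal sketch under that assumption, not KTF's or the authors' actual measurement code. The helper names (`flush_line`, `reload_latency`, `probe_hit`) and the threshold parameter are hypothetical, and the intrinsics assume GCC or Clang on x86-64.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang: _mm_clflush, _mm_mfence, _mm_lfence, __rdtscp */

/* Hypothetical helper: evict the cache line holding p before a test run. */
static inline void flush_line(const void *p)
{
    _mm_clflush(p);
    _mm_mfence();               /* order the flush before later accesses */
}

/* Hypothetical helper: time one load of *p in TSC cycles. */
static inline uint64_t reload_latency(const volatile uint8_t *p)
{
    unsigned int aux;
    uint64_t start, end;

    _mm_mfence();
    start = __rdtscp(&aux);
    _mm_lfence();               /* keep the load from starting before the first timestamp */
    (void)*p;                   /* the timed load */
    end = __rdtscp(&aux);       /* rdtscp waits for earlier loads to complete */
    _mm_lfence();
    return end - start;
}

/*
 * Returns 1 if the line looks cached (a "hit"), 0 otherwise.
 * threshold_cycles is machine specific and must be calibrated (assumption).
 */
static inline int probe_hit(const volatile uint8_t *p, uint64_t threshold_cycles)
{
    return reload_latency(p) < threshold_cycles;
}
```

In such a setup, the probe line would be flushed before the ‘ret’ executes and reloaded afterwards; a short reload latency suggests the gadget touched it speculatively.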
We’ve also compared this to the Spectre v1.1 [2] case. In a speculative overwrite, the store buffer seems to be used and the ‘ret’ speculates from it. Our experiment with the Spectre v1.1 case looks like this:
".align 16\n\t"
"SHADOW_BRANCH:\n\t"
"cmp %%rax, (%%rdi)\n\t"
// This branch is always taken
"jnz SHADOW_DEST\n\t"
"sub $0x100, %%rax\n\t"
"mov %%rax, (%%rsp)\n\t"
".align 64\n\t"
"ret\n\t"
// where: rax is the gadget address + 0x100 (to avoid false positives)
// rdi points to a page address that we allocate randomly and will never be
// equal to rax because rax points to a .text address
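For comparison, the Spectre v1.1 paper illustrates the speculative overwrite with a bounds-checked store in C, whereas the assembly above performs the overwrite directly with a ‘mov’ to (%rsp) in the shadow of a mispredicted branch. The sketch below shows only the shape of that pattern; the function name `checked_store` is hypothetical and the code does not by itself demonstrate speculation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical victim: a bounds-checked store.
 * Architecturally, out-of-bounds indexes are always rejected. If the branch
 * is mispredicted as in-bounds, the store can still execute speculatively
 * and sit in the store buffer, where a later speculative load (for example
 * the load of a return address) may observe it.
 */
static int checked_store(uint8_t *buf, size_t len, size_t idx, uint8_t value)
{
    if (idx < len) {        /* may be mispredicted for idx >= len */
        buf[idx] = value;   /* speculative out-of-bounds store */
        return 1;
    }
    return 0;
}
```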
Coincidentally, we got N = 42 for a Broadwell Server and N = 56 for {Skylake Server, Cascade Lake}. From [6] we see that one of the changes from Broadwell to Skylake is precisely an increase in the size of the store buffer: “Larger store buffer (56 entries, up from 42)”.
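To make the link between N and the store buffer size concrete, here is a toy model written for illustration only. It assumes that N counts the unrelated stores issued after the overwrite before the overwritten value can no longer be forwarded; this section does not define N, so that reading is an assumption. It also assumes a strict FIFO buffer that drains only when full, which real hardware does not guarantee. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB_MAX 64  /* upper bound for this toy model only */

/* Toy FIFO store buffer: the oldest entry drains when a store arrives at capacity. */
struct store_buffer {
    uint64_t addr[SB_MAX];
    uint64_t data[SB_MAX];
    size_t head;      /* index of the oldest entry */
    size_t count;     /* entries currently buffered */
    size_t capacity;  /* e.g. 42 or 56, the sizes quoted in the text */
};

static void sb_init(struct store_buffer *sb, size_t capacity)
{
    assert(capacity >= 1 && capacity <= SB_MAX);
    sb->head = 0;
    sb->count = 0;
    sb->capacity = capacity;
}

static void sb_store(struct store_buffer *sb, uint64_t addr, uint64_t data)
{
    size_t tail;

    if (sb->count == sb->capacity) {  /* full: drain the oldest store */
        sb->head = (sb->head + 1) % sb->capacity;
        sb->count--;
    }
    tail = (sb->head + sb->count) % sb->capacity;
    sb->addr[tail] = addr;
    sb->data[tail] = data;
    sb->count++;
}

/* Store-to-load forwarding: the youngest matching entry wins. Returns 1 on a forward. */
static int sb_forward(const struct store_buffer *sb, uint64_t addr, uint64_t *out)
{
    size_t i;

    for (i = sb->count; i-- > 0; ) {
        size_t idx = (sb->head + i) % sb->capacity;
        if (sb->addr[idx] == addr) {
            *out = sb->data[idx];
            return 1;
        }
    }
    return 0;
}

/* Number of filler stores needed before the first store can no longer be forwarded. */
static size_t stores_until_evicted(size_t capacity)
{
    struct store_buffer sb;
    uint64_t value;
    size_t n = 0;

    sb_init(&sb, capacity);
    sb_store(&sb, 0x1000, 0xdead);  /* stand-in for the return-address overwrite */
    while (sb_forward(&sb, 0x1000, &value)) {
        sb_store(&sb, 0x2000 + 8 * n, 0);  /* unrelated filler store */
        n++;
    }
    return n;
}
```

In this model, `stores_until_evicted(42)` returns 42 and `stores_until_evicted(56)` returns 56, which matches the intuition that the signal would disappear once the overwriting store has left a buffer of that size.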
Speculative and non-speculative paths both leverage the store buffers. That means that other values (recently overwritten, architecturally or speculatively) might be used in ret destination prediction (e.g. nested cases of ‘rets’).
While this does not seem to be a vulnerability (because we have not yet identified cases in which a compiler would generate vulnerable code) it is an undocumented behavior that might have security implications in some scenarios that we may not have thought of. We welcome feedback or further research.
Here are some examples of code constructs that may be vulnerable due to the behavior we discussed here. We did not test any of these scenarios:
We would like to thank Pawel Wieczorkiewicz from Open Source Security Inc. for his collaboration in this work. We would like to thank Intel and AMD for their timely response to our inquiry about the findings documented here. We thank the IBM Research System Security group [8] for their timely feedback.
[1] “Retbleed: Arbitrary Speculative Code Execution with Return Instructions”. Link: https://comsec.ethz.ch/research/microarch/retbleed/
[2] “Speculative Buffer Overflows: Attacks and Defenses”. Link: https://people.csail.mit.edu/vlk/spectre11.pdf
[3] “Retpoline: A Branch Target Injection Mitigation”. Link: https://www.intel.com/content/dam/develop/external/us/en/documents/retpoline-a-branch-target-injection-mitigation.pdf
[4] “Bypassing memory safety mechanisms through speculative control flow hijacks”. Link: https://arxiv.org/pdf/2003.05503.pdf
[5] KTF (Kernel Test Framework). Link: https://github.com/KernelTestFramework/ktf
[6] Skylake Server Microarchitecture (Wikichip). Link: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_%28server%29
[7] “Post-barrier RSB Prediction”. Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00706.html
[8] IBM System Security. Link: https://researcher.watson.ibm.com/researcher/view_group.php?id=8257