Advanced Trampoline Hooks in x86 Linux
27 II 2023
After I mostly finished the initial plugin system implementation in Hyprland, a bunch of people were curious about the new hook system, and understandably so.
Hyprland uses trampoline hooks to let plugins call their own code before or after a chosen method is executed, allowing them to change the inputs, modify the outputs, call some additional code, or block the function from executing. It's a very powerful API allowing for the ultimate integration with Hyprland.
To the point though.
One person asked me: "Why not just make a list of function pointers and call them?"
Well, I'd have to stick C++ code at the beginning of each function in the source. That kinda sucks.
I could also do the same via assembly, but that's like making a trampoline for every function regardless of whether it's used or not. Inefficient.
Another reason is that trampolines are simple, and most importantly, they just work. The only downside is the lack of portability across architectures, but that would come with any method of hooking, unfortunately.
So, starting from the beginning: let's say we want to hook a function in Hyprland, for example CCompositor::focusWindow(CWindow*, wlr_surface*).
If we disassemble the function in memory, we see:
push %rbp
mov %rsp,%rbp
push %r15
push %r14
push %r13
push %r12
push %rbx
sub $0x528,%rsp
call *0x61f766(%rip) # 0xc21fb0
mov %rdi,-0x528(%rbp)
mov %rsi,-0x530(%rbp)
mov %rdx,-0x538(%rbp)
lea -0x4f0(%rbp),%r13
mov %r13,%r14
...
Now, in order to perform a "hook", we will need to overwrite the first few instructions with a jump to our hook.
Since this is AMD64 (x86_64), we will use %rax, a register that is not used to pass arguments to regular (non-variadic) functions in the Linux calling convention. However, since we may overwrite a sizeable chunk, I decided to err on the side of caution and preserve %rax anyway, in case any of the instructions we move use it.
Our jump "method" looks like this:
movabs $0x<addr>,%rax
jmp *%rax
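As a rough sketch, this is how those two instructions could be turned into raw bytes. The helper name makeAbsoluteJump is made up for illustration and is not Hyprland's actual code; the byte values are the standard x86-64 encodings of movabs with a 64-bit immediate into %rax (48 B8 followed by 8 bytes) and jmp *%rax (FF E0).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not Hyprland's real API): encodes
//   movabs $target,%rax
//   jmp    *%rax
// into 12 raw bytes.
std::array<std::uint8_t, 12> makeAbsoluteJump(std::uint64_t target) {
    std::array<std::uint8_t, 12> code{};
    code[0] = 0x48; // REX.W prefix: 64-bit operand
    code[1] = 0xB8; // mov imm64 -> %rax
    for (std::size_t i = 0; i < 8; ++i)
        code[2 + i] = static_cast<std::uint8_t>(target >> (8 * i)); // immediate, little-endian
    code[10] = 0xFF; // jmp r/m64 ...
    code[11] = 0xE0; // ... with %rax as the operand
    return code;
}
```

The result is always 12 bytes long, which is where the size requirement for the hook comes from.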
First, we calculate our required hook size. We need to overwrite at least 12 bytes, the size of our method. We can't split an operation in half, though.
Here, the minimum comes to 20 (up to and including the sub).
Our trampoline will look like this:
push %rbp
mov %rsp,%rbp
push %r15
push %r14
push %r13
push %r12
push %rbx
sub $0x528,%rsp
push %rax
mov $0x<BACK>,%rax
jmp %rax
As you can see, we preserve %rax with a push, which pushes it onto the stack. We will retrieve it back at the source in a moment.
BACK is the "back" address, which is basically our function's address + 12 bytes for the original jump to the hook.
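A minimal sketch of assembling such a trampoline as a byte buffer could look like the following. The function name buildTrampoline and its parameters are assumptions made for this example, not Hyprland's real interface; it simply appends push %rax (50) and the 12-byte jump back to function address + 12 after the relocated instructions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not Hyprland's real code): builds the trampoline as
//   <instructions moved out of the function>
//   push   %rax
//   movabs $BACK,%rax
//   jmp    *%rax
// where BACK = functionAddr + 12, i.e. right after the jump written into the function.
std::vector<std::uint8_t> buildTrampoline(const std::vector<std::uint8_t>& movedBytes,
                                          std::uint64_t functionAddr) {
    const std::uint64_t jumpSize = 12;
    const std::uint64_t back = functionAddr + jumpSize;

    std::vector<std::uint8_t> code(movedBytes); // relocated prologue, copied verbatim
    code.push_back(0x50);                       // push %rax
    code.push_back(0x48);                       // movabs $back,%rax
    code.push_back(0xB8);
    for (int i = 0; i < 8; ++i)
        code.push_back(static_cast<std::uint8_t>(back >> (8 * i)));
    code.push_back(0xFF);                       // jmp *%rax
    code.push_back(0xE0);
    return code;
}
```

In a real hook this buffer would also have to live in executable memory, which the sketch leaves out.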
Now, let's look at the function again, after hooking:
movabs $0x7fffddea3cb1,%rax
jmp *%rax
pop %rax
nop
nop
nop
nop
nop
nop
nop
call *0x61f766(%rip) # 0xc21fb0
mov %rdi,-0x528(%rbp)
mov %rsi,-0x530(%rbp)
mov %rdx,-0x538(%rbp)
lea -0x4f0(%rbp),%r13
mov %r13,%r14
...
As you can see, we have the jump to our hook first, which apparently decided to locate itself at a wonderful address of 0x7fffddea3cb1.
We also have a pop %rax to retrieve our pushed %rax.
We fill the rest of the bytes we had to yank out with nops so that our nice CPU can waste some cycles doing nothing.
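The bytes written over the start of the function can be sketched the same way. buildPatch is a hypothetical name used only for this example; it emits the jump to the hook, a pop %rax (58) at the spot the trampoline jumps back to, and nop (90) padding up to the number of bytes that were moved out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not Hyprland's real code): bytes written over the
// beginning of the hooked function.
//   movabs $hookAddr,%rax ; jmp *%rax ; pop %rax ; nop ...
// Returns an empty vector if the overwritten region is too small.
std::vector<std::uint8_t> buildPatch(std::uint64_t hookAddr, std::size_t overwrittenLen) {
    std::vector<std::uint8_t> patch = {0x48, 0xB8}; // movabs $hookAddr,%rax
    for (int i = 0; i < 8; ++i)
        patch.push_back(static_cast<std::uint8_t>(hookAddr >> (8 * i)));
    patch.push_back(0xFF); // jmp *%rax
    patch.push_back(0xE0);
    patch.push_back(0x58); // pop %rax, executed when the trampoline jumps back

    if (overwrittenLen < patch.size())
        return {};                     // not enough room for jump + pop
    patch.resize(overwrittenLen, 0x90); // pad the remainder with nops
    return patch;
}
```

With a 20-byte overwritten region this gives 12 bytes of jump, 1 byte of pop and 7 nops, matching the listing above.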
Great! Our function is now hooked. If the hook wants to call the original function, it can invoke the address of our trampoline to do so. That is exactly the address that is passed in m_pOriginal in Hyprland.
In order to unhook, we copy the original bytes back to the function and free the trampoline. Easy!
For the C++ symbols, our hook should look exactly like the original function, but if it was a member (like here), then we need to remember the first parameter is always a thisptr:
void hkFocusWindow(void* thisptr, CWindow* pWindow, wlr_surface* pSurface) {
// stuff
}
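To show how a hook might forward to the original through that pointer, here is a self-contained sketch. Everything in it apart from the hook's signature is a stand-in: the empty CWindow and wlr_surface structs, the global g_pOriginal (playing the role of the trampoline address from m_pOriginal), and fakeOriginalFocusWindow exist only so the example compiles on its own.

```cpp
#include <cstdint>

// Stand-in types so the sketch compiles alone; the real ones live in Hyprland / wlroots.
struct CWindow {};
struct wlr_surface {};

// Stand-in for the trampoline address Hyprland hands out as m_pOriginal.
void* g_pOriginal = nullptr;
int   g_originalCalls = 0;
int   g_hookCalls = 0;

// Stand-in for the original function; a real hook would reach it via the trampoline.
void fakeOriginalFocusWindow(void*, CWindow*, wlr_surface*) {
    ++g_originalCalls;
}

// The original's signature with the implicit this pointer made explicit.
using origFocusWindow = void (*)(void*, CWindow*, wlr_surface*);

void hkFocusWindow(void* thisptr, CWindow* pWindow, wlr_surface* pSurface) {
    ++g_hookCalls;      // code that runs before the original
    if (!pWindow)
        return;         // illustration only: block the call entirely
    reinterpret_cast<origFocusWindow>(g_pOriginal)(thisptr, pWindow, pSurface);
}
```

Casting the stored address to a function pointer type that spells out the this pointer is what lets a free function stand in for the member function here.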
As you might have noticed, we are moving a small part of the function to a different place. If you're used to any programming language other than assembly, you probably do not see an issue with that. Unfortunately, there is one: the dreaded %rip register.
If we bring back the disassembly for CCompositor::focusWindow, with the memory addresses this time:
0x0000000000602840 push %rbp
0x0000000000602841 mov %rsp,%rbp
0x0000000000602844 push %r15
0x0000000000602846 push %r14
0x0000000000602848 push %r13
0x000000000060284a push %r12
0x000000000060284c push %rbx
0x000000000060284d sub $0x528,%rsp
0x0000000000602854 call *0x61f756(%rip) # 0xc21fb0
0x000000000060285a mov %rdi,-0x528(%rbp)
0x0000000000602861 mov %rsi,-0x530(%rbp)
0x0000000000602868 mov %rdx,-0x538(%rbp)
0x000000000060286f lea -0x4f0(%rbp),%r13
0x0000000000602876 mov %r13,%r14
...
You can see that there is a call to *0x61f756(%rip). This time, we have managed to evade it (our trampoline didn't consume it), but in some cases our trampoline might just do that.
There is a comment there for a reason. It was automatically inserted by gdb because the target address never changes and can be computed right away.
If you have a keen eye, you'll notice that the comment is basically 0x0000000000602854 + 0x61f756 + 0x6.
Which is operation_addr + offset + 6. The 6 comes from the length of the call operation, which is 6 bytes.
Yes, this basically means that %rip holds operation_addr + 6, the address of the next instruction. And thus we have a problem.
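The arithmetic is small enough to write down as code. ripRelativeTarget is a hypothetical helper name; the numbers in the usage note are the ones from the disassembly above.

```cpp
#include <cstdint>

// Hypothetical helper: the address a %rip-relative operand refers to.
// %rip points at the NEXT instruction, so the instruction's own length is added.
constexpr std::uint64_t ripRelativeTarget(std::uint64_t instrAddr,
                                          std::uint64_t instrLen,
                                          std::int32_t  disp) {
    // Sign-extend the 32-bit displacement before adding it.
    return instrAddr + instrLen +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}
```

ripRelativeTarget(0x602854, 6, 0x61f756) gives 0xc21fb0, the value in gdb's comment. Feed it any other instruction address with the same displacement and the result no longer points at 0xc21fb0.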
If we move the operation to a different place, %rip has a different value, and thus we are instantly sent to crash city. Oops.
How to solve it? Well, for that I don't have a full answer yet, but from what I've seen, the only offenders so far are call opcodes, so what I did in Hyprland was simple.
Before we overwrite anything, Hyprland will find any and all calls to %rip offsets and note them.
Then, in the trampoline, the address gets calculated, and Hyprland replaces the call with:
movabs $0x<addr>,%rax
call *(%rax)
The reason why we're not preserving %rax here is that it holds the return value of the called function, so restoring it afterwards would clobber that value.
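A sketch of that replacement for a single instruction might look like this. rewriteRipCall is a made-up name and the function only recognizes one specific encoding, call *disp32(%rip) (FF 15 followed by a 32-bit displacement); it is an illustration of the idea rather than Hyprland's actual implementation, which relies on a disassembler to find these instructions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not Hyprland's real code). If `instr` is
//   call *disp32(%rip)        (bytes: FF 15 <disp32>)
// located at `instrAddr` in the original function, returns the bytes of
//   movabs $slot,%rax ; call *(%rax)
// where slot is the absolute address the original operand referred to.
// Returns an empty vector for anything else.
std::vector<std::uint8_t> rewriteRipCall(const std::uint8_t* instr, std::size_t len,
                                         std::uint64_t instrAddr) {
    if (len != 6 || instr[0] != 0xFF || instr[1] != 0x15)
        return {};

    // Decode the little-endian 32-bit displacement.
    const std::uint32_t raw = static_cast<std::uint32_t>(instr[2]) |
                              (static_cast<std::uint32_t>(instr[3]) << 8) |
                              (static_cast<std::uint32_t>(instr[4]) << 16) |
                              (static_cast<std::uint32_t>(instr[5]) << 24);
    const std::int64_t disp = static_cast<std::int32_t>(raw); // sign-extend

    // %rip is the address of the next instruction: instrAddr + 6.
    const std::uint64_t slot = instrAddr + 6 + static_cast<std::uint64_t>(disp);

    std::vector<std::uint8_t> out = {0x48, 0xB8}; // movabs $slot,%rax
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<std::uint8_t>(slot >> (8 * i)));
    out.push_back(0xFF); // call *(%rax)
    out.push_back(0x10);
    return out;
}
```

Note that the replacement is 12 bytes where the original call was 6, so the trampoline grows a little for every call rewritten this way.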
Probably not an ideal solution, as some situations might arise where %rip is used in other opcodes, but again, I haven't found such instances yet. If there are any, and I find a better solution, I will make an update blogpost.
Thanks to this article by spacehen for an interesting insight into Linux assembly patching, and canihavesomecoffee's udis fork.