Yes, you have read it right. Unless you are a compiler or a exploit developer you do not really want to know anything about stack canaries. You just need to know they exists and will prevent stack overflows. But if you want to know how they work in detail, then keep reading.
Stack canaries are a technique to detect stack overflows and therefore prevent the exploitation of vulnerable applications. They have been around for a while and in this paper I'm going to put together everything I have learnt about them so far.
Nowadays, a vulnerable program doesn't mean automatically a security breach, i.e. that it can be exploited. The use of ASLR (Address Space Layout Randomization), proper execution permission on process pages and stack canaries makes very difficult to exploit a classical buffer overflows, even tho it is still possible in some cases.
In this paper we'll explore the use of stack canaries, but before jumping in the details, and for completeness, let's describe the issue this technique solves so it will be way easier to understand what is coming next in this paper. If you already know how functions are called in assembler, what is a stack frame and how the stack layout looks like when calling a function, you can skip the next sections and go straight to section Stack Canaries. Otherwise do not panic, I will explain anything you need to know in the next sections.... and it is easier than you may expect.
Function Calls
Whenever we call a function in a high level language like C, the compiler needs to generate opcodes to change the program flow, but it has to do that in a way such as it can get back to the exact point it was just before calling the function.
This process implies the modification of the so-called Instruction Pointer or Program Counter (the name depends on the documentation you are reading). This is just a processor register that points to the next instruction to be executed. So, calling a function requires to temporally store the current value of the IP
, before modifying it with the address of the function (I will use IP
from now on as all examples will be x86 and that is how it is named for that architecture).
So, for a simple C program like this:
void func () {
// Do something
return;
}
int main () { func(); return 0}
The compiler will generate something like this:
main: ADDR+0 There is always some code there +----------- ADDR+1 F() => TMP=IP=ADDR+2; IP=func | ADDR+2 return; <----------------------+ | (...) | + -> func: ADDR+X Do something | (...) | ADD+X+N: return; => IP=TMP ------------+
Each processor provides instructions to change the IP
register (all branch instructions does) and store the current IP
value somewhere. This somewhere is either the stack (like the x86 processors) or a Return Address Register or Link Register (like ARM and MIPS). In the same way, processors also provides a way to easily restore the saved IP
value, from the stack or from the mentioned return address register.
Note that the platforms implementing the Return Address Register solution also need to use the stack whenever the function needs to call other function, because the Return Address Register will be overwritten, losing the original return address. Only leaf functions (functions don't calling any other function), can be implemented just using the register.
The Stack
To get the whole picture, we still need to depict the structure of the stack when a function is called.
Calling a single function is pretty straightforward, and everything looks very simple. We just store our arguments somewhere, the return address, and use some random memory for our local variables.
Now think about a complex function that calls many other functions. You can chose different areas of memory to store that information for each function. That will also work fine, but now, think about a recursive function, or even worst, a re-entrant function. Implementing this storing data in memory differently for each function may be possible but tricky.
Fortunately, a lot of smart people years ago, figured out how to solve this in a elegant and neat way... Just use a stack. Each block of memory required for each function (return address, parameters, local variables) just get stored in the stack. Each function call pushes its information into the stack, and pops it up when the function is done. We can call the same function many times. Each time a new memory block will be added to the stack and everything will work fine, as each function invocation will have its own data block. Each one of those blocks is generically known as a Stack Frame.
So, how does a typical Stack Frame looks like?
SP+XX: Parameters (depends on ABI) SP+00: Return Address SP-04: Base Pointer Save SP-08: Local Variables Ends SP- N: Depends on number and types of Local Variables
Note: SP is the Stack Pointer a register that points to either the last used entry or the first empty entry in the stack.
As we can see, depending on the ABI the first thing in the Stack Frame may be the list of parameters. For instance, 32bits x86 Linux ABI passes all parameters in the stack, while the 64bits ABI uses registers, until they get exhausted, then the stack will be used.... but that is very uncommon, you need to pass a lot of parameters to the function and that is not a good programming practice.
Then we find the return address as we have already discussed and after that, the local variables. Note that local variables are stored in the stack. This is very convenient because we can allocate memory just decreasing the Stack Pointer and free it straight away, just increasing it back.
NOTE:When you allocate memory in the heap, that data will be somewhere else, but the pointer to that data is likely to be stored in the stack. Languages like C++ that heavily rely on objects allocated in the heap (created with new
), usually need to unwind the stack before the function is left. In other words, it is not enough to just increase the Stack Pointer in those cases
Smashing The Stack
So, know we know everything to understand how a stack overflow works. We will now use some real code to match all the theoretical concepts introduced so far.
#include <string.h>
void func () {
unsigned char buf[32];
memset (buf, 0, 64);
return;
}
int main () {
func ();
return 0;
}
If we compile and run it we get this:
$ gcc -fno-stack-protector -o bufferoverflow bufferoverflow.c $ ./bufferoverflow Segmentation fault
Yes, we have smashed the stack. In this example it is very obvious isn't it? We are writing 64 zeros in a 32 bytes buffer. If you remember our stack diagram, the local variables (that is, buf
) are allocated in the stack in the lower addresses. When we start writing to buf
in the loop, we start filling the stack from the lower address up... towards the return address, eventually overwriting it. When the function returns, it will return to address 0x00
instead of the main
function (because we have written zeros in the buffer).
Let's take a look to the function:
$ objdump -Mintel -d bufferoverflow | grep -A20 "<func>:"
000000000000064a <func>:
64a: 55 push rbp
64b: 48 89 e5 mov rbp,rsp
64e: 48 83 ec 20 sub rsp,0x20
652: 48 8d 45 e0 lea rax,[rbp-0x20]
656: ba 40 00 00 00 mov edx,0x40
65b: be 00 00 00 00 mov esi,0x0
660: 48 89 c7 mov rdi,rax
663: e8 b8 fe ff ff call 520 <memset@plt>
668: 90 nop
669: c9 leave
66a: c3 ret
000000000000066b <main>:
66b: 55 push rbp
66c: 48 89 e5 mov rbp,rsp
66f: b8 00 00 00 00 mov eax,0x0
674: e8 d1 ff ff ff call 64a <func>
679: b8 00 00 00 00 mov eax,0x0
67e: 5d pop rbp
67f: c3 ret
In the main
function we can easily identify the call to func
using the call
instruction. This will push the return address,0x679
in this case, in the stack. Actually this is a PIE (Position Independent Executable) binary and the actual address will be some random address plus this offset.
RSP --> | RET (0x679) | EMPTY |--------------------
Note: For intel processors the Stack Pointer (RSP
) points to the last used position
Then we continue execution on func
and the first thing we do is to push in the stack the base index register. We haven't talked about this, but, in short, this register is used to index values in the stack (namely parameters and local variables).
| RET (0x679) RSP --> | RBP <=== RBP | EMPTY |--------------------
After that, the buffer is allocated. 32 bytes (0x20
in hexadecimal) just subtracting that amount to RSP
.
Higher Addresses RSP+0x30 | RET (0x679) | RBP <== RBP | 8 bytes | 8 bytes | 8 bytes RSP ---> | 8 bytes <-- buf (RBP-x20) | EMPTY |-------------------- Lower Addresses
So it is clear now, how the memset
function will write zeros pass the return address. This is a stack overflow.
In this case the program is not vulnerable as there is no way for an attacker to control what is written in the return address. Actually, it will always be zeros. However in many other cases, this buffer operations take as source some user input, some data read from the network, or from a file... external data sources that an attacker can manipulate to write an appropriate value in the return address and change the program execution flow.
Stack Canaries
Finally we have reach the point were we can introduce the canaries. The name comes from the old days in the mines. Miners use to carry canaries (the birds) to detect leaks of gases (firedamp). The canaries dead with small concentrations of the gas, concentration not mortal for humans, and that will warn miners to get the hell out of there.
Stack Canaries do something similar. They do not fix the buffer overflow, but they allow us to detect it, and finish execution in an ordered way.... In other words.... to get the hell out of there.
The concept they are based on is very simple. As we had seen in the previous section, the stack has to be smashed from lower address to top, being the return address the highest, and that address is only used by the very last instruction in the function. The ret
instruction.
A canary, is just a random number that we introduce just after the return address so it will always be smashed before the return address. Before leaving the function that value is checked again and if it was changed then the stack was smashed.
On the previous example, we omitted the generation of the canary on purpose so we get simpler code to better understand how the stack frame is build. For that we used the -fno-frame-protector
. Now we can recompile the test program without this flag and see what we get.
$ make bufferoverflow cc bufferoverflow.c -o bufferoverflow $ ./bufferoverflow *** stack smashing detected ***: <unknown> terminated Aborted
Now, instead of the segmentation fault (caused for the return to address 0x00000
) we get a message indicating than the stack has been smashed.
The anatomy of a canary
Let's take a look to the code for this last compilation:
$ objdump -Mintel -d bufferoverflow | grep -A20 "<func>:" 00000000000006aa <func>: 6aa: 55 push rbp 6ab: 48 89 e5 mov rbp,rsp 6ae: 48 83 ec 30 sub rsp,0x30 6b2: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28 6b9: 00 00 6bb: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax 6bf: 31 c0 xor eax,eax 6c1: 48 8d 45 d0 lea rax,[rbp-0x30] 6c5: ba 40 00 00 00 mov edx,0x40 6ca: be 00 00 00 00 mov esi,0x0 6cf: 48 89 c7 mov rdi,rax 6d2: e8 a9 fe ff ff call 580 <memset@plt> 6d7: 90 nop 6d8: 48 8b 45 f8 mov rax,QWORD PTR [rbp-0x8] 6dc: 64 48 33 04 25 28 00 xor rax,QWORD PTR fs:0x28 6e3: 00 00 6e5: 74 05 je 6ec <func+0x42> 6e7: e8 84 fe ff ff call 570 <__stack_chk_fail@plt> 6ec: c9 leave 6ed: c3 ret
The middle part of the function is still the same, but we can now see some code at the beginning and at the end. Specifically this:
6b2: mov rax,QWORD PTR fs:0x28
6bb: mov QWORD PTR [rbp-0x8],rax
This code just reads some value from address FS:0x28
(FS
is a segment register, you do not see those very often nowadays but some of us are familiar to them from the DOS age). That is the canary, a random value initialised when the program is loaded. Then what it does is to store it at the top of the stack frame:
RSP+0x30 | RET (0x679) RBP --> | RBP | 8 bytes <=== RBP - 0x08 | 8 bytes | 8 bytes | 8 bytes | 8 bytes RSP ---> | 8 bytes <-- buf (RBP-0x30) | EMPTY |--------------------
In this diagram it seems to be an extra position in the stack. That's right and it is because for 64bits x86 processors the stack shall be aligned to 16 bytes (0x10) blocks. We just need 8 bytes for the canary, but the compiler generates 8 more to keep the stack alignment. Well, I hope that is the right explanation, at this level, compilers do many arcane tricks that sometimes looks like something but they are something else.
Then, just before leaving the function, it checks that the canary stored in the stack is still the same than the one in the special address... it checks it is still alive:
6d8: mov rax,QWORD PTR [rbp-0x8]
6dc: xor rax,QWORD PTR fs:0x28
6e5: je 6ec <func+0x42>
6e7: call 570 <__stack_chk_fail@plt>
In case the value doesn't match (the stack has been smashed), the __stack_chk_fail
function is invoked. That is the one showing the message:
*** stack smashing detected ***: <unknown> terminated
Aborted
Canaries are only generated when needed
Let's do a small change to our test program:
#include <string.h>
void func () {
unsigned char buf[32];
memset (buf, 0, 64);
return;
}
int func1 () {
int a = 1;
int b = 2;
return a + b;
}
int main () {
func ();
return 0;
}
The generated code for func1
is:
00000000000006ee <func1>:
6ee: 55 push rbp
6ef: 48 89 e5 mov rbp,rsp
6f2: c7 45 f8 01 00 00 00 mov DWORD PTR [rbp-0x8],0x1
6f9: c7 45 fc 02 00 00 00 mov DWORD PTR [rbp-0x4],0x2
700: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8]
703: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
706: 01 d0 add eax,edx
708: 5d pop rbp
709: c3 ret
As you can see there is no canary used in this one, because it is not possible to smash the stack in this case. In general, gcc won't generate canary code for functions that doesn't declare arrays/buffers as local variables.
How are canaries generated?
In the previous section we have seen how the canary value is retrieved from the address fs:0x28
. But how does that value get there? and where does it come from?.
If we look for this address in our test program, we will just find it on func
. That means that the code to initialise the canary is either in the system or in the standard C library... in some of all that code that gets executed before we get to the main
function. So let's compile a static version of our test program in order to get all the code in, and see what we find:
0000000000400e40 <__libc_start_main>:
(...)
401046: 48 8b 05 53 7a 2b 00 mov rax,QWORD PTR [rip+0x2b7a53] # 6b8aa0 <_dl_random>
40104d: 48 8b 00 mov rax,QWORD PTR [rax]
401050: 30 c0 xor al,al
401052: 64 48 89 04 25 28 00 mov QWORD PTR fs:0x28,rax
401059: 00 00
Looks like it is initialised by the C library. The value is taken from a global symbol named _dl_random
. A quick search will show that this is a random value provided by the kernel ELF loader. We can see in the code above how the less significant byte is deleted before storing the value in the address fs:0x28
...
So the value is generated by the kernel at the time of loading the binary. Then it is also exposed to the application as the auxiliary vector value AT_RANDOM
, that points to the address containing the bytes that will be used by the canary.
All together now
To illustrate everything we have gone trough so far, I have built a simple test program with a function that dumps the stack and allows us to easily modify it to change the control flow or fire the canary detection. It also shows the auxiliary vectors (yes, we could have used LD_SHOW_AUXV=1
but this is just other way).
#include <stdio.h>
#include <sys/auxv.h>
// Let's store this in globals so we keep the stack cleaner
void *ret_addr = NULL;
long *new_ret = NULL;
int func ()
{
char buf[16];
long i;
long *p = (long*) buf;
long *ret = NULL;
long *canary = NULL;
// initialise buffer for easy stack identification
p[0] = 0xbada5500cafebabe;
p[1] = 0xbada550ff1cebabe;
// Dump the stack
for (i = 8; i >= -2; i--) {
printf ("STACK + %p : %16lx", (p+i), p[i]);
if (p[i] == (long)ret_addr) {
ret = p + i; // When we find ret address store the ptr
canary = p + i - 2; // And also the canary one
printf (" <=== RET ADDR\n");
}
else if ( canary && (p + i) == (long*)canary) printf (" <=== CANARY\n");
else printf ("\n");
}
printf( "-----[Let's force jumping into l2]----------------\n");
*ret = (long)new_ret; // Return to l2 instead of l1
//*canary = 0; // This fires the stack protection.
for (i = 8; i >= -2; i--) {
printf ("STACK + %p : %16lx", (p+i), p[i]);
if ( (p + i) == (long*)ret) printf (" <=== RET ADDR\n");
else if ( canary && (p + i) == (long*)canary) printf (" <=== CANARY\n");
else printf ("\n");
}
return 0;
}
int main () {
int i;
ret_addr = &&l1;
new_ret = &&l2;
unsigned char *rnd = (unsigned char*) getauxval (AT_RANDOM);
printf ("Ret address1 : %p\n", ret_addr);
printf ("Ret address2 : %p\n", new_ret);
printf ("AT_RANDOM (%p) : ", rnd);
for (i = 0; i < 16; i++) printf ("%02x ", rnd[i]);
printf ("\n");
func ();
l1:
printf ("Don't see me!\n");
l2:
printf("Normal shutdown!\n");
return 0;
}
The output of this program is shown below (actual values will be different in your machine):
$ ./test Ret address1 : 0x5606f7e0cab3 Ret address2 : 0x5606f7e0cabf AT_RANDOM (0x7ffc43e13139) : 2a 6b 50 bd fe 25 7d 97 9e b1 a5 47 8e c9 20 51 STACK + 0x7ffc43e12d00 : 5606f7e0cae0 STACK + 0x7ffc43e12cf8 : 7ffc43e13139 STACK + 0x7ffc43e12cf0 : 1043e12de0 STACK + 0x7ffc43e12ce8 : 5606f7e0cab3 <=== RET ADDR # l1 STACK + 0x7ffc43e12ce0 : 7ffc43e12d00 # RBP STACK + 0x7ffc43e12cd8 : 977d25febd506b00 <=== CANARY STACK + 0x7ffc43e12cd0 : 5606f7e0c6a0 STACK + 0x7ffc43e12cc8 : bada550ff1cebabe STACK + 0x7ffc43e12cc0 : bada5500cafebabe # buf STACK + 0x7ffc43e12cb8 : 7ffc43e12cc0 STACK + 0x7ffc43e12cb0 : 7ffc43e12cd8 -----[Let's force jumping into l2]---------------- STACK + 0x7ffc43e12d00 : 5606f7e0cae0 STACK + 0x7ffc43e12cf8 : 7ffc43e13139 STACK + 0x7ffc43e12cf0 : 1043e12de0 STACK + 0x7ffc43e12ce8 : 5606f7e0cabf <=== RET ADDR # l2 STACK + 0x7ffc43e12ce0 : 7ffc43e12d00 # RBP STACK + 0x7ffc43e12cd8 : 977d25febd506b00 <=== CANARY STACK + 0x7ffc43e12cd0 : 5606f7e0c6a0 STACK + 0x7ffc43e12cc8 : bada550ff1cebabe STACK + 0x7ffc43e12cc0 : bada5500cafebabe # buf STACK + 0x7ffc43e12cb8 : 7ffc43e12cc0 STACK + 0x7ffc43e12cb0 : 7ffc43e12cd8 Normal shutdown!
Note: the lines preceded by # are annotations I have added to the text, they are not produced by the program
The program, just calls a function that drops the stack showing the return address and the canary, that matches the AT_RANDOM
auxiliary vector (shown at the beginning) except for the last byte (remember that xor al, al
?). The function also changes the return address to jump to l2
so the first print is not shown....
Note that the canary determination by the program is a bit lousy in the sense that it doesn't have to be at two words of the return address. You can recompile the program with optimisations on and the code will be different and the whole stack layout may be different. Anyway, it is a good simple test program you can use to build up your own tests and solve further doubts you may still have.
Before I leave
This looks like a very good technique to prevent buffer overflow exploits. We may think that we are safe enabling the stack protector flag in our compiler and ask it to produce canaries for all potentially exploitable functions. This is not fully true, there is always a way.
Exploiting applications protected with canaries usually requires to figure out the value of the canary. This way when the stack is overwritten the proper canary will also be written and therefore gain control on the execution. So a canary is as good as the applications preventing the exfiltration of its value.
For instance, suppose the application has a format string vulnerability somewhere in the code. Using a format string vulnerability an attacker can dump data from the stack and have a chance to figure out the canary value. Once the value of the canary is know, the buffer overflow can be exploited despite of the existence of the canary. The attacker will just write the canary in the buffer in the right position so the check at the end of the function will pass and effectively modify the flow of the application.
So the only way to have a secure program is to make sure there is no buffer overflows on them.
Conclusions
We have gone through a basic description of how functions are called from high level languages and what needs to be done at assembly level to support them (recursivity, re-entrancy,...). We briefly introduced the stack frame and how a program bug can be used to modify the normal program control flow and potentially exploit it to gain access to a given system.
Then we introduced the canary mechanism to detect stack overflows and looked into its low level implementation.
■