x86-64 Code Injection Explained

Caution

The information on this blog, including the techniques and proof-of-concept code, is strictly for educational purposes. I do not condone using any of it for malicious purposes. I write this blog to consolidate my own learning by teaching it to others.

The Core Challenge

You have a binary file on disk. You want to:

Insert your own code (shellcode) into it
Make it execute before the original program
Return control to the original program seamlessly

Think of it like hijacking a train: you board it at the station, do your thing during the ride, then let the original passengers continue to their destination.

Understanding the Landscape

What You're Working With

When you open an ELF binary, you see:

┌─────────────────────────────────────┐
│  ELF Header (64 bytes)              │  ← Metadata about the file
├─────────────────────────────────────┤
│  Program Headers (56 bytes × N)     │  ← How to load into memory
├─────────────────────────────────────┤
│  .text section (CODE)               │  ← The actual program
│  .rodata section (constants)        │
│  .data section (variables)          │
├─────────────────────────────────────┤
│  Padding/Alignment (0x00 bytes)     │  ← **GOLD MINE FOR INJECTION**
├─────────────────────────────────────┤
│  .got/.plt (dynamic linking)        │
│  .bss (uninitialized data)          │
├─────────────────────────────────────┤
│  Section Headers (64 bytes × N)     │  ← Debug/link info
└─────────────────────────────────────┘

The padding exists because loadable segments are aligned to page boundaries (4096 bytes on typical x86-64 Linux systems). This creates "code caves": unused bytes between segments where we can hide our shellcode.
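
To make the alignment arithmetic concrete, here is a minimal sketch in C. The helper names page_align_up and padding_after are made up for illustration, and the 4096-byte page size is an assumption that holds on typical x86-64 Linux systems but should be queried on a real one.

```c
#include <stdint.h>

/* Assumed page size; a real tool could ask sysconf(_SC_PAGESIZE) instead. */
#define CAVE_PAGE_SIZE UINT64_C(4096)

/* Round value up to the next multiple of the page size (a power of two). */
uint64_t page_align_up(uint64_t value) {
    return (value + CAVE_PAGE_SIZE - 1) & ~(CAVE_PAGE_SIZE - 1);
}

/* Bytes between the end of a segment's file data and the next page boundary. */
uint64_t padding_after(uint64_t p_offset, uint64_t p_filesz) {
    uint64_t end = p_offset + p_filesz;
    return page_align_up(end) - end;
}
```

For a segment at file offset 0x1000 with 0x234 bytes of data, the data ends at 0x1234 and the next page starts at 0x2000, so padding_after reports 0xdcc bytes. This is only an upper bound on the cave; the real gap depends on where the next segment actually starts in the file.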

Step-by-Step Injection Process

Step 1: Find the Code Cave

// Parse the ELF file (mapped_file is a uint8_t * to the mmap'ed binary)
Elf64_Ehdr *elf_header = (Elf64_Ehdr *)mapped_file;
Elf64_Phdr *program_headers = (Elf64_Phdr *)(mapped_file + elf_header->e_phoff);

// Find the first executable LOAD segment
for (int i = 0; i < elf_header->e_phnum; i++) {
    if (program_headers[i].p_type == PT_LOAD &&
        (program_headers[i].p_flags & PF_X)) {

        // Calculate where this segment ends in the file
        uint64_t segment_end = program_headers[i].p_offset +
                               program_headers[i].p_filesz;

        // Where does the NEXT segment start?
        // Guard against reading past the last program header.
        if (i + 1 >= elf_header->e_phnum)
            break;
        uint64_t next_segment_start = program_headers[i + 1].p_offset;

        // The gap is our code cave (there is none if the segments touch)
        if (next_segment_start <= segment_end)
            break;
        uint64_t cave_size = next_segment_start - segment_end;

        printf("Found %lu bytes of space!\n", cave_size);
        return segment_end; // This is where we inject
    }
}
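
The entry at index i + 1 is not guaranteed to be another PT_LOAD segment, so a more defensive variant looks for the closest PT_LOAD segment that starts at or after the end of the current one. The sketch below assumes the program headers are already in memory; cave_size_after is a hypothetical helper name, not something from a library.

```c
#include <elf.h>
#include <stdint.h>

/* Size of the gap in the file after phdrs[idx], or 0 if no PT_LOAD
 * segment starts at or after the end of its file data. */
uint64_t cave_size_after(const Elf64_Phdr *phdrs, int phnum, int idx) {
    uint64_t end = phdrs[idx].p_offset + phdrs[idx].p_filesz;
    uint64_t next = UINT64_MAX;

    for (int i = 0; i < phnum; i++) {
        if (i == idx || phdrs[i].p_type != PT_LOAD)
            continue;                      /* only loadable segments count */
        if (phdrs[i].p_offset >= end && phdrs[i].p_offset < next)
            next = phdrs[i].p_offset;      /* closest following segment so far */
    }
    return next == UINT64_MAX ? 0 : next - end;
}
```

Returning 0 when nothing follows is a deliberate simplification: appending past the last segment is a different technique and is not treated as a cave here.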

Why the executable segment? Because our shellcode needs execute permission. If we inject into a data segment, the CPU will refuse to run it (NX protection).

Step 2: Craft the Shellcode

Your shellcode must be position-independent (work at any memory address) and self-contained (no library calls).

The Anatomy of x86-64 Shellcode

BITS 64
default rel                    ; Use RIP-relative addressing

global _start
_start:
    ; ─────────────────────────────────────────────────────────
    ; PART 1: Print "....WOODY...."
    ; ─────────────────────────────────────────────────────────
    xor     rax, rax           ; Clear rax (zero it out)
    mov     al, 1              ; syscall number: write (1)
    mov     rdi, 1             ; arg1: file descriptor (stdout = 1)
    lea     rsi, [rel msg]     ; arg2: pointer to message (RIP-relative!)
    mov     rdx, 14            ; arg3: message length
    syscall                    ; Invoke kernel
    
    ; ─────────────────────────────────────────────────────────
    ; PART 2: Decrypt the .text section
    ; ─────────────────────────────────────────────────────────
    mov     rsi, [rel text_addr]   ; Load encrypted text address
    mov     rcx, [rel text_size]   ; Load text section size
    lea     rdi, [rel key]         ; Load encryption key address
    
.decrypt_loop:
    lodsb                      ; Load byte from [rsi] into AL, increment rsi
    xor     al, [rdi]          ; XOR with current key byte
    mov     [rsi-1], al        ; Write decrypted byte back
    inc     rdi                ; Move to next key byte
    lea     r8, [rel key]      ; Get key start address
    add     r8, 8              ; Key is 8 bytes long
    cmp     rdi, r8            ; Reached end of key?
    jne     .no_wrap
    lea     rdi, [rel key]     ; Wrap around to key start
.no_wrap:
    loop    .decrypt_loop      ; Decrement rcx, loop if not zero
    
    ; ─────────────────────────────────────────────────────────
    ; PART 3: Jump to original program
    ; ─────────────────────────────────────────────────────────
    mov     rax, [rel original_entry]  ; Load original entry point
    jmp     rax                        ; Jump there (no return!)

; ─────────────────────────────────────────────────────────────
; DATA SECTION (patched by packer at injection time)
; ─────────────────────────────────────────────────────────────
msg:            db "....WOODY....", 10
key:            dq 0x0000000000000000  ; Placeholder for XOR key
text_addr:      dq 0x0000000000000000  ; Placeholder for .text vaddr
text_size:      dq 0x0000000000000000  ; Placeholder for .text size
original_entry: dq 0x0000000000000000  ; Placeholder for real entry

Why `lea rsi, [rel msg]`?

Problem: If you use mov rsi, msg, the assembler generates an absolute address like 0x401000. But when your shellcode is injected, it might load at 0x500000, making the address wrong.

Solution: [rel msg] means "compute the address as an offset from the instruction pointer (RIP)", which during execution holds the address of the next instruction. This works anywhere:

Next instruction at 0x401000:  lea rsi, [rip + 50]  → rsi = 0x401032
Next instruction at 0x500000:  lea rsi, [rip + 50]  → rsi = 0x500032

The relative offset (50 bytes) stays the same!
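
The same arithmetic can be written out in C. One detail worth spelling out: the CPU adds the displacement to the address of the next instruction, so the instruction's own length enters the calculation. The function name rip_relative_target is hypothetical, and the 7-byte length used in the note below is that of lea rsi, [rip + disp32] (REX prefix, opcode, ModRM, 4-byte displacement).

```c
#include <stdint.h>

/* Effective address of a RIP-relative operand:
 * address of the NEXT instruction plus the signed 32-bit displacement. */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len, int32_t disp) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

With a 7-byte lea placed at 0x401000 and a displacement of 50, this gives 0x401039; placed at 0x500000 it gives 0x500039. The distance between the instruction and its target is identical in both cases, which is exactly what makes the code position-independent.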

Step 3: Compile to Raw Bytes

# Assemble to object file
nasm -f elf64 -o shellcode.o shellcode.asm

# Link (if testing standalone)
ld -o shellcode shellcode.o

# Extract ONLY the machine code
objcopy -O binary -j .text shellcode shellcode.bin

# View the raw bytes
xxd shellcode.bin

Result: You get pure machine code like:

48 31 c0 b0 01 bf 01 00 00 00 48 8d 35 0a 00 00
00 ba 0e 00 00 00 0f 05 48 8b 35 00 00 00 00 ...

Step 4: Inject into the Binary

// Open the target binary
int fd = open("target_binary", O_RDWR);
if (fd < 0) { perror("open"); return 1; }
struct stat st;
if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

// Map it to memory for easy manipulation
uint8_t *file = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
if (file == MAP_FAILED) { perror("mmap"); return 1; }

// Load your shellcode
FILE *sc_file = fopen("shellcode.bin", "rb");
if (!sc_file) { perror("fopen"); return 1; }
uint8_t shellcode[1024];
size_t sc_size = fread(shellcode, 1, sizeof(shellcode), sc_file);
fclose(sc_file);

// Find the code cave (from Step 1)
uint64_t cave_offset = find_code_cave(file);

// Calculate where shellcode will be in MEMORY (not file!)
Elf64_Ehdr *ehdr = (Elf64_Ehdr *)file;
Elf64_Phdr *phdr = find_executable_segment(file);

// Virtual address = segment's vaddr + offset within segment
uint64_t shellcode_vaddr = phdr->p_vaddr +
                          (cave_offset - phdr->p_offset);

printf("Shellcode will execute at: 0x%lx\n", shellcode_vaddr);

// Copy shellcode into the cave (sc_size must not exceed the cave size)
memcpy(file + cave_offset, shellcode, sc_size);
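
A bare memcpy trusts that the cave is large enough. A slightly more careful copy, sketched below, checks the bounds first and refuses to overwrite anything that is not zero padding. The function name inject_into_cave and the separate cave_size parameter are assumptions for illustration; the snippets above do not pass a cave size around.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies shellcode into the cave only if it is safe to do so.
 * Returns 0 on success, -1 if a check fails. */
int inject_into_cave(uint8_t *file, size_t file_size,
                     uint64_t cave_offset, uint64_t cave_size,
                     const uint8_t *shellcode, size_t sc_size) {
    /* The cave itself must lie inside the mapped file. */
    if (cave_offset > file_size || cave_size > file_size - cave_offset)
        return -1;
    /* The shellcode must fit inside the cave. */
    if (sc_size > cave_size)
        return -1;
    /* Refuse to overwrite bytes that are not zero padding. */
    for (size_t i = 0; i < sc_size; i++)
        if (file[cave_offset + i] != 0)
            return -1;

    memcpy(file + cave_offset, shellcode, sc_size);
    return 0;
}
```

The zero-byte check is a heuristic: padding is normally zero-filled, so a non-zero byte suggests the computed cave overlaps real data.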

Step 5: Patch the Shellcode Placeholders

Remember those dq 0x0000000000000000 placeholders? Now we fill them:

// Offsets of each placeholder, measured from the 'msg:' label.
// msg is 14 bytes long and each dq placeholder is 8 bytes.
#define KEY_OFFSET         14   // Offset of 'key:' label
#define TEXT_ADDR_OFFSET   22   // Offset of 'text_addr:' label
#define TEXT_SIZE_OFFSET   30   // Offset of 'text_size:' label
#define ENTRY_OFFSET       38   // Offset of 'original_entry:' label
#define DATA_BLOCK_SIZE    46   // msg (14) + four 8-byte placeholders

// The data block sits at the END of the shellcode, after the instructions
uint64_t data_offset = cave_offset + sc_size - DATA_BLOCK_SIZE;

// Generate random encryption key
uint64_t key;
FILE *urandom = fopen("/dev/urandom", "rb");
if (!urandom || fread(&key, sizeof(key), 1, urandom) != 1) {
    perror("/dev/urandom");
    return 1;
}
fclose(urandom);

// Find .text section details
Elf64_Shdr *text_section = find_section(file, ".text");
uint64_t text_vaddr = text_section->sh_addr;
uint64_t text_size = text_section->sh_size;

// Save original entry point
uint64_t original_entry = ehdr->e_entry;

// Patch the shellcode in memory
memcpy(file + data_offset + KEY_OFFSET, &key, 8);
memcpy(file + data_offset + TEXT_ADDR_OFFSET, &text_vaddr, 8);
memcpy(file + data_offset + TEXT_SIZE_OFFSET, &text_size, 8);
memcpy(file + data_offset + ENTRY_OFFSET, &original_entry, 8);
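
One way to avoid hand-maintained offsets is to mirror the four placeholders in a C struct and copy it over the tail of the shellcode in one go. This is a sketch under two assumptions: the four dq values are the very last bytes of shellcode.bin (as in the listing in Step 2), and the packer runs on a little-endian host like the x86-64 target. The names sc_params and patch_params are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the dq placeholders that end the shellcode, in the same order.
 * Four uint64_t members give a 32-byte struct with no padding. */
struct sc_params {
    uint64_t key;
    uint64_t text_addr;
    uint64_t text_size;
    uint64_t original_entry;
};

/* Overwrites the trailing placeholders of a shellcode buffer.
 * Returns 0 on success, -1 if the buffer is too small to contain them. */
int patch_params(uint8_t *sc, size_t sc_size, const struct sc_params *p) {
    if (sc_size < sizeof *p)
        return -1;
    memcpy(sc + sc_size - sizeof *p, p, sizeof *p);
    return 0;
}
```

Patching the local buffer before it is copied into the cave keeps the patch logic independent of where the cave ends up in the file.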

Step 6: Encrypt the .text Section

uint8_t *text_start = file + text_section->sh_offset;
for (size_t i = 0; i < text_size; i++) {
    text_start[i] ^= ((uint8_t *)&key)[i % 8];  // XOR with rotating key
}

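Because the shellcode decrypts these bytes in place at run time, the segment that holds .text must be writable as well as executable: set PF_W in its p_flags (or have the shellcode call mprotect first), otherwise the first write back will fault.

Now the program is broken! If you run it, it will crash because the code is encrypted garbage. That's why we need...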

Step 7: Hijack the Entry Point

// Change where the program starts executing
ehdr->e_entry = shellcode_vaddr;  // Point to OUR code instead!

// Save the modified binary
munmap(file, st.st_size);
close(fd);
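
Before overwriting e_entry it is worth confirming that the file is something this technique can handle. The check below is a sketch, and is_supported_elf is a hypothetical name. It accepts only ET_EXEC files, on the assumption that the shellcode stores absolute addresses (text_addr and original_entry), which would be wrong for a position-independent executable loaded at a random base.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the buffer starts with an ELF header this packer can handle, else 0. */
int is_supported_elf(const uint8_t *file, size_t size) {
    Elf64_Ehdr ehdr;

    if (size < sizeof ehdr)
        return 0;                                   /* too small to hold a header */
    memcpy(&ehdr, file, sizeof ehdr);               /* copy to avoid unaligned access */

    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                                   /* missing "\x7fELF" magic */
    if (ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr.e_ident[EI_DATA] != ELFDATA2LSB)
        return 0;                                   /* not 64-bit little-endian */
    if (ehdr.e_machine != EM_X86_64)
        return 0;                                   /* not x86-64 */
    return ehdr.e_type == ET_EXEC;                  /* fixed-address executables only */
}
```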

What happens now when someone runs the binary?

1. OS loads the ELF into memory
2. OS jumps to e_entry (our shellcode!)
3. Shellcode prints "....WOODY...."
4. Shellcode decrypts .text section
5. Shellcode jumps to original_entry
6. Original program runs normally

Visual Flow Diagram

Before Injection

File:                          Memory:
┌──────────────┐              ┌──────────────┐
│ ELF Header   │              │ ELF Header   │
│ e_entry:     │──────┐       │              │
│  0x401000    │      │       ├──────────────┤
├──────────────┤      │       │              │
│ Segments     │      │       │              │
├──────────────┤      │       ├──────────────┤
│ .text        │◄─────┘       │ .text        │◄── Execution starts here
│ (original    │              │ (original)   │
│  code)       │              │ mov rdi, ... │
├──────────────┤              │ call printf  │
│ [PADDING]    │              │ ...          │
│ 0x00 0x00    │              ├──────────────┤
└──────────────┘              │ .data        │
                              └──────────────┘

After Injection

File:                          Memory:
┌──────────────┐              ┌──────────────┐
│ ELF Header   │              │ ELF Header   │
│ e_entry:     │──────┐       │              │
│  0x402000    │      │       ├──────────────┤
├──────────────┤      │       │              │
│ Segments     │      │       │              │
├──────────────┤      │       ├──────────────┤
│ .text        │      │       │ .text        │
│ (ENCRYPTED!) │      │       │ (ENCRYPTED!) │
│ 0x9f 0x3a... │      │       │ 0x9f 0x3a... │
├──────────────┤      │       ├──────────────┤
│ [SHELLCODE]  │◄─────┘       │ [SHELLCODE]  │◄── Execution starts here!
│ xor rax,rax  │              │ 1. Print msg │
│ syscall...   │              │ 2. Decrypt   │
│ jmp 0x401000 │──────┐       │ 3. Jump orig │
├──────────────┤      │       ├──────────────┤
│ .data        │      │       │ .text        │◄── Then jump here
└──────────────┘      │       │ (DECRYPTED!) │
                      └──────►│ mov rdi, ... │
                              │ call printf  │
                              └──────────────┘

Key Concepts Explained

1. File Offset vs Virtual Address

File offset: Where bytes are in the binary file on disk

$ xxd -s 0x1000 -l 16 binary
00001000: 48 89 e5 48 83 ec 10 89 ...
          ↑
    This is at file offset 0x1000

Virtual address: Where bytes will be in memory when loaded

// The ELF segment says:
p_offset = 0x1000   // Load from file position 0x1000
p_vaddr  = 0x401000 // Place at memory address 0x401000

Conversion:

uint64_t offset_to_vaddr(uint64_t file_offset, Elf64_Phdr *segment) {
    return segment->p_vaddr + (file_offset - segment->p_offset);
}

uint64_t vaddr_to_offset(uint64_t vaddr, Elf64_Phdr *segment) {
    return segment->p_offset + (vaddr - segment->p_vaddr);
}
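
Both helpers assume you already know which segment contains the address. A small lookup, sketched here with the hypothetical name vaddr_to_file_offset, walks the program headers to find it and reports failure for addresses that have no bytes in the file (for example .bss).

```c
#include <elf.h>
#include <stdint.h>

/* Returns the file offset backing vaddr, or UINT64_MAX if no PT_LOAD
 * segment has file data at that address. */
uint64_t vaddr_to_file_offset(const Elf64_Phdr *phdrs, int phnum, uint64_t vaddr) {
    for (int i = 0; i < phnum; i++) {
        const Elf64_Phdr *p = &phdrs[i];
        if (p->p_type == PT_LOAD &&
            vaddr >= p->p_vaddr &&
            vaddr - p->p_vaddr < p->p_filesz)       /* inside the file-backed part */
            return p->p_offset + (vaddr - p->p_vaddr);
    }
    return UINT64_MAX;
}
```

With the example segment above (p_offset = 0x1000, p_vaddr = 0x401000), the address 0x401010 maps to file offset 0x1010.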

2. Why XOR for Encryption?

Properties:

A ⊕ B ⊕ B = A (self-inverse: encrypt and decrypt are the same operation)
Fast (single CPU instruction)
Simple to implement in assembly

In the shellcode:

; Encryption (by packer, in C):
for (i = 0; i < size; i++)
    text[i] ^= key[i % 8];

; Decryption (by shellcode, in assembly):
.loop:
    lodsb              ; al = text[i]
    xor al, [rdi]      ; al = text[i] ^ key[j]
    mov [rsi-1], al    ; text[i] = decrypted_byte
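
The self-inverse property is easy to check on the C side. The function below is a minimal sketch of the rotating-key XOR (xor_rotating is a made-up name); running it twice with the same key should give back the original bytes. It reads the key bytes in host memory order, which matches what the shellcode sees on a little-endian x86-64 machine.

```c
#include <stddef.h>
#include <stdint.h>

/* XORs buf in place with an 8-byte repeating key.
 * Applying it a second time with the same key restores buf. */
void xor_rotating(uint8_t *buf, size_t len, uint64_t key) {
    const uint8_t *k = (const uint8_t *)&key;   /* key bytes in memory order */
    for (size_t i = 0; i < len; i++)
        buf[i] ^= k[i % sizeof key];            /* wrap around every 8 bytes */
}
```

Keep in mind that a repeating 8-byte XOR only obfuscates the code; it is trivial to recover the key from known instruction bytes.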

3. Register Conventions (x86-64 System V ABI)

For syscalls (the kernel returns the result in `rax` and clobbers `rcx` and `r11`):

Register	Purpose
`rax`	Syscall number (1=write, 60=exit)
`rdi`	Argument 1
`rsi`	Argument 2
`rdx`	Argument 3
`r10`	Argument 4
`r8`	Argument 5
`r9`	Argument 6

Example:

; write(1, "hello", 5)
mov rax, 1        ; syscall number
mov rdi, 1        ; fd = stdout
lea rsi, [rel msg] ; buf = "hello" (RIP-relative, stays position-independent)
mov rdx, 5        ; count = 5
syscall

For function calls (if you needed libc):

Register	Purpose
`rdi`	Arg 1
`rsi`	Arg 2
`rdx`	Arg 3
`rcx`	Arg 4
`r8`	Arg 5
`r9`	Arg 6

Common Pitfalls

❌ Using Absolute Addresses

; BAD - hardcoded address
mov rax, 0x401000
jmp rax

; GOOD - relative
lea rax, [rel target]
jmp rax

❌ Forgetting to Update p_filesz

// If you inject shellcode, update the segment size!
// phdr is the executable segment from Step 4 (it is rarely program header 0).
phdr->p_filesz += sc_size;
phdr->p_memsz  += sc_size;
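
Growing the segment is only safe while the new bytes stay inside the cave. A small guard, sketched below with the hypothetical name grow_segment, makes that condition explicit and leaves the header untouched when the shellcode would not fit.

```c
#include <elf.h>
#include <stdint.h>

/* Extends seg by extra bytes if they fit in a cave of cave_size bytes.
 * Returns 0 on success, -1 (header unchanged) otherwise. */
int grow_segment(Elf64_Phdr *seg, uint64_t extra, uint64_t cave_size) {
    if (extra > cave_size)
        return -1;              /* would spill into the next segment */
    seg->p_filesz += extra;     /* bytes read from the file */
    seg->p_memsz  += extra;     /* bytes occupied in memory */
    return 0;
}
```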

❌ Wrong Offset Calculation

// WRONG
shellcode_vaddr = cave_offset;

// RIGHT
shellcode_vaddr = segment->p_vaddr + (cave_offset - segment->p_offset);

❌ Not Preserving Registers

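; The shellcode above clobbers rax, rcx, rdx, rsi, rdi, r8 and r11.
; rdx matters most: at process entry it holds the exit-handler pointer
; that the original _start hands on to libc.
_start:
    push rdx
    push rdi
    push rsi
    ; ... your code ...
    pop rsi
    pop rdi
    pop rdx
    jmp [rel original_entry]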

Testing Your Injection

# 1. Verify shellcode runs standalone
nasm -f elf64 shellcode.asm -o shellcode.o
ld -o shellcode shellcode.o
./shellcode  # Prints "....WOODY....", then crashes: the placeholders are still zero

# 2. Pack a simple binary
echo 'int main() { return 42; }' > test.c
gcc -no-pie -o test test.c  # non-PIE: the shellcode stores absolute addresses
./woody_woodpacker test

# 3. Test packed binary
./woody
echo $?  # Should print "42"

# 4. Check that .text is actually encrypted on disk
objdump -d test > orig.asm
objdump -d woody > packed.asm
diff orig.asm packed.asm  # .text should differ