Windows x64 - Shellcoding (Static)

Prologue
The Assembly Code
The Memory Addresses
Our First Shellcode
Running Our First Shellcode

Prologue

Shellcode is a term used in the realm of cybersecurity and computer hacking to refer to a small piece of code, usually written in assembly language, that is designed to be injected into a vulnerable software application to exploit a system's security vulnerabilities. The primary purpose of shellcode is to provide an attacker with a way to take control of a compromised system by executing arbitrary commands or launching a shell, hence the name "shellcode."

Shellcode is typically used as a payload in various types of attacks, including remote code execution, privilege escalation, and reverse shell attacks. Writing it requires a deep understanding of low-level programming and the target system's architecture. Due to its potential for malicious use, detecting and preventing attempts to execute shellcode is a crucial focus in cybersecurity.

Many automated frameworks can generate shellcode for a specific engagement. One such tool is the Metasploit Framework, which is popular among hackers. Using shellcode from Metasploit, while convenient and powerful, comes with several drawbacks that cybersecurity professionals and ethical hackers should be aware of:

Detection and Signature-Based Defenses: Metasploit is widely known and used, making its payloads, including shellcode, recognizable by security tools and systems. This increases the likelihood of detection and potential blocking by intrusion detection systems (IDS), intrusion prevention systems (IPS), and antivirus solutions that maintain signatures for known Metasploit payloads.
Limited Customization: Metasploit's pre-made shellcode payloads are designed for versatility, covering a range of scenarios. However, this versatility comes at the cost of specificity. Using generic shellcode may not fully align with the unique requirements of a particular exploit or target environment.
Overreliance on Tools: Relying solely on Metasploit's shellcode can lead to a lack of understanding of the underlying exploit techniques and shellcode creation. This can hinder a cybersecurity professional's ability to adapt to new or novel vulnerabilities that may not have a pre-built Metasploit module.
Lack of Stealth: Because Metasploit's shellcode is widely known, it might be easily identifiable by security analysts and administrators who are well-versed in common attack techniques. This diminishes the element of surprise and makes it easier for defenders to take appropriate measures.
Variability in Target Environments: Not all target systems will behave the same way or have the same vulnerabilities. While Metasploit offers a variety of payloads, they might not be perfectly suited to every scenario, potentially resulting in incomplete exploitation or unintended consequences.

A well-rounded cybersecurity professional should strive to understand the underlying concepts, techniques, and methodologies to make informed decisions when it comes to choosing between using pre-made tools and crafting custom solutions.

Shellcode is used in offensive security engagements for several reasons. It can be employed to establish persistence on a compromised system. By injecting shellcode into critical system processes or creating new, hidden processes, attackers can ensure their malicious presence survives system reboots and other maintenance actions. Malicious actors can tailor shellcode to exploit specific vulnerabilities, evade security measures, and execute their intended malicious actions. This level of customization enables them to adapt to different target environments and defenses. Shellcode is also small, which allows it to be injected into memory without raising suspicion, and its concise nature makes it challenging to detect with traditional signature-based antivirus and intrusion detection systems.

The Assembly Code

This post assumes the reader has a good knowledge of assembly language, CPU instructions, and registers. We will compile C/C++ code and study the program flow at a low level in order to create shellcode. Creating shellcode becomes easier when we mimic a program's functionality by reading its assembly code.

Let's build a C++ program that displays a message box using the MessageBoxA Windows API.

#include "Windows.h"

int main()
{
	MessageBoxA(0, "aidenpearce369 was here", "0xhacked", 0);
	return 0;
}

When we run this, a message box pops up on the screen.

MessageBoxPopup

Now let's analyse the PE structure of the program we have built.

PEStudioMessageBox

It can be observed that kernel32.dll appears among the imports. It plays a crucial role in almost all Windows executables, since it wraps low-level system services behind a stable API, allowing applications to be more portable and compatible with various Windows versions and hardware configurations. This abstraction prevents applications from needing to deal directly with hardware-specific details.

Technically, kernel32.dll is how a user-mode program reaches kernel services. Its functions are part of the user-mode API (Application Programming Interface) for Windows, so an application calls them without handling the transition to kernel mode itself; that transition requires higher privileges, is generally more complex, and is carried out by lower layers of the system. Even a simple program has kernel32.dll in its PE imports.

PEStudioSimpleReturn

Similar to kernel32.dll, there is one more special DLL named ntdll.dll. Although it is usually not listed directly in a PE's import table, ntdll.dll is mapped into every user-mode process and contains low-level functions that interact closely with the Windows kernel. It is responsible for issuing system calls and provides functions for memory management and synchronization. We will discuss the importance and weaponization of ntdll.dll in later posts.

Now let's load our MessageBox PE into a debugger and try to understand its disassembly.

WinDbgMain

It can be clearly observed that the MessageBoxA Windows API is called with its arguments in the assembly code. If you are not able to follow what is going on here, let me break it down.

The MessageBoxA function takes 4 arguments to display a message box. This information can be found in the Microsoft Windows API documentation.

int MessageBoxA(
  [in, optional] HWND   hWnd,
  [in, optional] LPCSTR lpText,
  [in, optional] LPCSTR lpCaption,
  [in]           UINT   uType
);

Each argument sets an attribute of the message box. Just as they are initialised in the high-level language, the arguments are initialised in the low-level assembly too. It is straightforward once you understand how the ASM instructions work.

00007ff6`48a1178b 4533c9          xor     r9d,r9d
00007ff6`48a1178e 4c8d051b840000  lea     r8,[MessageBoxShellcode!`string' (00007ff6`48a19bb0)]
00007ff6`48a11795 488d1524840000  lea     rdx,[MessageBoxShellcode!`string' (00007ff6`48a19bc0)]
00007ff6`48a1179c 33c9            xor     ecx,ecx
00007ff6`48a1179e ff15ace90000    call    qword ptr [MessageBoxShellcode!_imp_MessageBoxA (00007ff6`48a20150)]

Things you should be clear about when working with x64 ASM instructions on Windows:

The first four integer or pointer arguments are passed through the RCX, RDX, R8, and R9 registers respectively.
If there are more than four arguments, the remaining ones are placed on the stack, laid out as if pushed in right-to-left order.
The return value of the function is placed in the RAX register for integer or pointer return types.

With this in mind, the assembly code is easier to read. Since MessageBoxA has only 4 arguments, they are stored in the rcx, rdx, r8, and r9 registers respectively. The function is then called and reads its arguments from those registers.

; Set the argument UINT uType = 0
00007ff6`48a1178b 4533c9          xor     r9d,r9d

; Set the argument LPCSTR lpCaption to some string from the address
00007ff6`48a1178e 4c8d051b840000  lea     r8,[MessageBoxShellcode!`string' (00007ff6`48a19bb0)]

; Set the argument LPCSTR lpText to some string from the address
00007ff6`48a11795 488d1524840000  lea     rdx,[MessageBoxShellcode!`string' (00007ff6`48a19bc0)]

; Set the argument HWND hWnd = 0
00007ff6`48a1179c 33c9            xor     ecx,ecx 

; Call the MessageBoxA() WIN API
00007ff6`48a1179e ff15ace90000    call    qword ptr [MessageBoxShellcode!_imp_MessageBoxA (00007ff6`48a20150)]

The string arguments are loaded with the LEA (Load Effective Address) instruction, which places in the register the address where the string data is stored.

0:000> da 00007ff6`48a19bb0
00007ff6`48a19bb0  "0xhacked"
0:000> da 00007ff6`48a19bc0
00007ff6`48a19bc0  "aidenpearce369 was here"

Again, it is as simple as that. All we have to do is understand the ASM instructions and replicate them for our target architecture.

The Memory Addresses

Each time we write code in assembly, we have to keep in mind that addresses differ between runs of a program. They are not the same because of ASLR (Address Space Layout Randomization). ASLR works by randomly arranging the memory addresses used by system libraries, executables, and dynamic link libraries. This randomization makes it much harder for attackers to predict the location of specific functions, variables, or code segments in memory, as the addresses differ each time a program is executed or a system boots up.

But there is a catch. System DLLs like kernel32.dll, ntdll.dll, and user32.dll are assigned a base address once per boot, and that same address is used in every process that maps them, since re-randomizing these shared DLLs for each process would be inefficient. These addresses may differ between machines, or on the same machine after a reboot.

ProcMonAddressLayout

In Windows, dynamic link libraries (DLLs) like kernel32.dll and user32.dll are known as system DLLs or system libraries. These DLLs contain essential functions and routines that provide core functionality to user-mode applications and services. Because these DLLs are fundamental to the Windows operating system, they are loaded at the same memory address in different processes. This behavior is by design, for several reasons:

Shared Functionality
Memory Conservation
Code Reusability
Performance Optimization

Now let's write a simple C++ program to enumerate the base addresses of the DLLs and the addresses of the functions we require.

#include <stdio.h>
#include <Windows.h>

int main() {
	HMODULE kernelDll = LoadLibraryA("kernel32.dll");
	HMODULE userDll = LoadLibraryA("user32.dll");
	printf("[+] Address of kernel32.dll - 0x%p\n", kernelDll);
	printf("[+] Address of user32.dll - 0x%p\n", userDll);
	printf("[+] Address of WinExec - 0x%p\n", GetProcAddress(kernelDll, "WinExec"));
	printf("[+] Address of ExitProcess - 0x%p\n", GetProcAddress(kernelDll, "ExitProcess"));
	printf("[+] Address of GetDesktopWindow - 0x%p\n", GetProcAddress(userDll, "GetDesktopWindow"));
	printf("[+] Address of MessageBoxA - 0x%p\n", GetProcAddress(userDll, "MessageBoxA"));
}

We should get the memory addresses of the components while executing the above code.

AddressEnumeration

It can be observed that kernel32.dll has the same address as before. Likewise, the functions inside these DLLs keep the same addresses until the system reboots, because the base address stays the same.

[+] Address of kernel32.dll - 0x00007FFA4E070000
[+] Address of user32.dll - 0x00007FFA4EE20000
[+] Address of WinExec - 0x00007FFA4E0D7780
[+] Address of ExitProcess - 0x00007FFA4E08E820
[+] Address of GetDesktopWindow - 0x00007FFA4EE2AEB0
[+] Address of MessageBoxA - 0x00007FFA4EE990D0

Now we have the necessary addresses of the DLLs and functions that we are going to use in our shellcode.

Our First Shellcode

Now that we have found the static addresses of the required components, let's craft our first shellcode.

#include <Windows.h>

int main() {
	ExitProcess(0);
	return -1;
}

By running this program, we will exit the flow with exit code 0.

ExitCode0

The documentation for ExitProcess WIN API describes that this function requires only one argument, uExitCode which is the exit code for the process and all the threads in it.

void ExitProcess(
  [in] UINT uExitCode
);

Let's create an asm file that performs the same operation. Since ExitProcess comes directly from kernel32.dll, we do not need to load other DLLs in assembly (loading DLLs will be covered in the next post).

section .text
    global main

main:
    xor rcx, rcx ; UINT uExitCode = 0
    mov rbx, 0x7FFA4E08E820 ; Address of ExitProcess()
    call rbx ; Call ExitProcess(0)

Let's create the object file for this code using nasm with the Windows x64 format.

ubuntu-wsl@ra:/mnt/e/Shellcoding$ nasm -f win64 ExitProcess.asm -o ExitProcess.o
ubuntu-wsl@ra:/mnt/e/Shellcoding$ file ExitProcess.o
ExitProcess.o: Intel amd64 COFF object file, not stripped, 1 section, symbol offset=0x4b, 6 symbols, created Fri Aug 18 15:51:46 2023, 1st section name ".text"

Now let's link the object file using link.exe from Visual Studio.

Microsoft Windows [Version 10.0.19045.3324]
(c) Microsoft Corporation. All rights reserved.

E:\Shellcoding>link /ENTRY:main /MACHINE:X64 /NODEFAULTLIB /SUBSYSTEM:CONSOLE ExitProcess.o
Microsoft (R) Incremental Linker Version 14.00.24247.2
Copyright (C) Microsoft Corporation.  All rights reserved.


E:\Shellcoding>

Let's view the disassembly using objdump.

ubuntu-wsl@ra:/mnt/e/shellcoding$ file ExitProcess.exe
ExitProcess.exe: PE32+ executable (console) x86-64, for MS Windows
ubuntu-wsl@ra:/mnt/e/shellcoding$ objdump -d ExitProcess.exe >dump
ubuntu-wsl@ra:/mnt/e/shellcoding$ cat dump

ExitProcess.exe:     file format pei-x86-64


Disassembly of section .text:

0000000140001000 <.text>:
   140001000:   48 31 c9                xor    %rcx,%rcx
   140001003:   48 bb 20 e8 08 4e fa    movabs $0x7ffa4e08e820,%rbx
   14000100a:   7f 00 00
   14000100d:   ff d3                   call   *%rbx

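We can see some null bytes (\x00), which come from the two most significant bytes of the address value 0x00007FFA4E08E820; in little-endian format they are stored last. Null bytes are a problem when shellcode is delivered through string-handling functions, which stop copying at the first \x00. They can be removed by padding the address with two junk bytes and shifting them out at runtime.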

section .text
    global main

main:
    xor rcx, rcx ; UINT uExitCode = 0
    mov rbx, 0x7FFA4E08E8204141 ; Address of ExitProcess() + AA (For null byte padding)
    shr rbx, 16 ; Byte shift on right for 2 bytes  
    call rbx ; Call ExitProcess(0)

Now let's assemble and link the modified code as before, and view the objdump output.

ubuntu-wsl@ra:/mnt/e/shellcoding$ nasm -f win64 ExitProcess.asm -o ExitProcess.o
ubuntu-wsl@ra:/mnt/e/shellcoding$ file ExitProcess.exe
ExitProcess.exe: PE32+ executable (console) x86-64, for MS Windows
ubuntu-wsl@ra:/mnt/e/shellcoding$ objdump -d ExitProcess.exe >dump
ubuntu-wsl@ra:/mnt/e/shellcoding$ cat dump

ExitProcess.exe:     file format pei-x86-64


Disassembly of section .text:

0000000140001000 <.text>:
   140001000:   48 31 c9                xor    %rcx,%rcx
   140001003:   48 bb 41 41 20 e8 08    movabs $0x7ffa4e08e8204141,%rbx
   14000100a:   4e fa 7f
   14000100d:   48 c1 eb 10             shr    $0x10,%rbx
   140001011:   ff d3                   call   *%rbx

Now there are no null bytes in our dump, so let's extract the shellcode from it. To save time, I recommend using a script to parse the shellcode out of the objdump output (odfhex in the session below).

ubuntu-wsl@ra:/mnt/e/shellcoding$ cat dump

ExitProcess.exe:     file format pei-x86-64


Disassembly of section .text:

0000000140001000 <.text>:
   140001000:   48 31 c9                xor    %rcx,%rcx
   140001003:   48 bb 41 41 20 e8 08    movabs $0x7ffa4e08e8204141,%rbx
   14000100a:   4e fa 7f
   14000100d:   48 c1 eb 10             shr    $0x10,%rbx
   140001011:   ff d3                   call   *%rbx
ubuntu-wsl@ra:/mnt/e/shellcoding$ ./extract dump

Odfhex - object dump shellcode extractor - by steve hanna - v.01
Trying to extract the hex of dump which is 352 bytes long
"\x48\x31\xc9\x48\xbb\x41\x41\x20\xe8\x08\x4e\xfa\x7f\x48\xc1\xeb\x10"\
"\xff\xd3";

19 bytes extracted.

Now we have successfully generated a 19-byte shellcode that performs the ExitProcess(0) operation. Let's test it with a shellcode wrapper.

Running Our First Shellcode

To test our shellcode, the DLLs it calls into must be loaded in the process so that the hardcoded address is valid. The exe we generated doesn't have any imports, so it may not work as intended.

E:\shellcoding>dumpbin /IMPORTS ExitProcess.exe
Microsoft (R) COFF/PE Dumper Version 14.00.24247.2
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file ExitProcess.exe

File Type: EXECUTABLE IMAGE

  Summary

        1000 .rdata
        1000 .text

That's why we extract the hex dump as shellcode and load it into a Windows process that has the required DLLs loaded. In this case we require kernel32.dll to kickstart the shellcode.

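#include <string.h>
#include "Windows.h"

int main()
{
	// Shellcode bytes for ExitProcess(0), as extracted from the objdump output
	unsigned char shellcode[] =
		"\x48\x31\xc9\x48\xbb\x41\x41\x20\xe8\x08\x4e\xfa\x7f\x48\xc1\xeb\x10"\
		"\xff\xd3";

	// Allocate a buffer that is readable, writable and executable
	void* exec = VirtualAlloc(0, sizeof shellcode, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
	if (exec != 0) {
		// Copy the shellcode into the buffer and call it as a function
		memcpy(exec, shellcode, sizeof shellcode);
		((void(*)())exec)();
	}
	// Only reached if the shellcode did not exit the process
	return -1;
}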

Two functions commonly used by malware developers appear here, followed by a call through a function pointer. The program flow is described below.

VirtualAlloc is a Windows API function used to allocate memory in a process's address space. The first argument (0) indicates that the function should determine the base address for the allocated memory. The second argument sizeof shellcode specifies the size of the memory block to allocate. The third argument MEM_COMMIT indicates that the memory should be committed (allocated) immediately. The fourth argument PAGE_EXECUTE_READWRITE sets the memory protection attributes. It allows the memory to be both readable and writable, as well as executable. This permission is important for executing the shellcode.
The memcpy function is used to copy the contents of the shellcode array into the allocated memory pointed to by exec. The first argument exec is the destination memory address. shellcode is the source array containing the actual shellcode to be stored in the buffer. sizeof shellcode determines the number of bytes to copy, which matches the size of the shellcode array.
The line ((void(*)())exec)() is a function call using a function pointer cast. It's used to execute the shellcode that has been copied into the allocated memory. void(*)() is a function pointer declaration. It points to a function with no parameters and no return value. (void(*)())exec casts the memory address exec to the function pointer type. The () at the end calls the function pointed to by the function pointer, effectively executing the shellcode.

Now if we run the program, instead of returning exit code -1, it should execute the shellcode for ExitProcess(0) and exit with code 0.

ShellCodeExecution

That's it: we have created our first static shellcode for the Windows x64 environment. But remember, since we hardcoded static addresses in the shellcode, it is not reusable in another environment, or in the same environment after a reboot.