Reverse Engineering

Cards (24)

  • CPU Architecture
    • Control Unit: The Control Unit fetches instructions from the main memory, which sits outside the CPU. The address of the next instruction to execute is stored in a register called the Instruction Pointer or IP. In 32-bit systems, this register is called EIP, and in 64-bit systems, it is called RIP.
    • Arithmetic Logic Unit: The arithmetic logic unit executes the instruction fetched from the Memory. The results of the executed instruction are then stored in either the Registers or the Memory.
  • CPU Architecture
    • Registers: The Registers are the CPU's storage. Registers are generally much smaller than the Main Memory, which is outside the CPU, and help save time in executing instructions by placing important data in direct access to the CPU.
    • Memory: The Memory, also called Main Memory or Random Access Memory (RAM), contains all the code and data for a program to run. When a user executes a program, its code and data are loaded into the Memory, from where the CPU accesses it one instruction at a time.
  • CPU Architecture
    I/O devices:
    • I/O devices or Input/Output devices are all other devices that interact with a computer. These devices include Keyboards, Mice, Displays, Printers, Mass storage devices like Hard Disks and USB drives, etc.
    • In short, when a program has to be executed, it is loaded into the memory. From there, the Control Unit fetches one instruction at a time using the Instruction Pointer Register, and the Arithmetic Logic Unit executes it. The results are stored in either the Registers or the Memory.
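    The cycle summarized above can be sketched as a toy simulation. This is a minimal illustration, not a model of a real x86 CPU: the instruction names (LOAD, ADD, STORE, HALT), the single acc register, and the run function are all hypothetical.

```python
def run(program, memory):
    """Toy fetch-execute loop; the instruction set and names are hypothetical."""
    registers = {"ip": 0, "acc": 0}
    while True:
        # Control Unit: fetch the instruction the Instruction Pointer refers to
        opcode, operand = program[registers["ip"]]
        registers["ip"] += 1  # IP now holds the address of the next instruction
        # Arithmetic Logic Unit: execute the fetched instruction
        if opcode == "LOAD":
            registers["acc"] = memory[operand]
        elif opcode == "ADD":
            registers["acc"] += memory[operand]
        elif opcode == "STORE":
            memory[operand] = registers["acc"]  # result goes back to memory
        elif opcode == "HALT":
            return registers, memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
registers, memory = run(program, [2, 3, 0])
print(registers["acc"], memory)  # 5 [2, 3, 5]
```

    Each pass through the loop is one fetch followed by one execute, and the result ends up either in a register (acc) or in memory.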
  • Registers
    Registers are the CPU's storage medium. The CPU can access data in the registers faster than in any other storage medium; however, their limited size means they have to be used effectively. For this purpose, the registers are divided into the following types:
    • Instruction Pointer
    • General Purpose Registers
    • Status Flag Registers
    • Segment Registers
  • The Instruction Pointer:
    • The Instruction Pointer is a register that contains the address of the next instruction to be executed by the CPU. It is also called the Program Counter. It was originally a 16-bit register in the Intel 8086 processor (from where the term x86 originated) and was abbreviated as IP.
    • In 32-bit processors, the Instruction Pointer became a 32-bit register called the EIP or the Extended Instruction Pointer. In 64-bit systems, this register became a 64-bit register called RIP (the R here stands for register).
  • x86
    • In 32-bit x86 systems, general-purpose registers are 32 bits wide, while in 64-bit systems, they are extended to 64 bits. These registers are used for various operations during CPU instruction execution.
  • General Purpose Registers
    • EAX or RAX: Known as the Accumulator Register, it stores results of arithmetic operations. In 32-bit systems, it is referred to as EAX, while in 64-bit systems, it's RAX. The lower 16 bits can be accessed using AX, and it can be further divided into 8-bit AL and AH.
    • EBX or RBX: Called the Base Register, it is used to store base addresses for referencing offsets. Similar to EAX/RAX, it can be addressed as RBX (64-bit), EBX (32-bit), BX (16-bit), and BL/BH (8-bit).
  • General Purpose Registers
    • ECX or RCX: The Counter Register, used for counting operations such as loops. It can be accessed as RCX (64-bit), ECX (32-bit), CX (16-bit), and CL/CH (8-bit).
    • EDX or RDX: The Data Register, often used in multiplication and division operations. It can be accessed as RDX (64-bit), EDX (32-bit), DX (16-bit), and DL/DH (8-bit).
    • ESP or RSP: The Stack Pointer register, which points to the top of the stack. It is 32-bit in 32-bit systems (ESP) and 64-bit in 64-bit systems (RSP). Its lower 16 bits can be addressed as SP, but unlike EAX it has no AH-style high-byte form, and it is rarely used through its smaller names.
  • General Purpose Registers
    • EBP or RBP: The Base Pointer, used to access parameters passed by the stack and works with the Stack Segment register. It is 32-bit in 32-bit systems (EBP) and 64-bit in 64-bit systems (RBP).
    • ESI or RSI: Known as the Source Index register, it is used for string operations and works with the Data Segment register as an offset. It is 32-bit in 32-bit systems (ESI) and 64-bit in 64-bit systems (RSI).
  • General Purpose Registers
    • EDI or RDI: The Destination Index register, also used for string operations and works with the Extra Segment register as an offset. It is 32-bit in 32-bit systems (EDI) and 64-bit in 64-bit systems (RDI).
    • R8-R15: These registers are unique to 64-bit systems and can be addressed in 32-bit, 16-bit, and 8-bit modes using suffixes D (for 32-bit), W (for 16-bit), and B (for 8-bit).
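    The way the smaller register names alias the low bits of the full register can be shown with bit masks. This is a minimal sketch using RAX as the example; the sub_registers function and the sample value are hypothetical.

```python
def sub_registers(rax):
    """Split a 64-bit RAX value into its smaller aliases using bit masks."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # full 64 bits
        "EAX": rax & 0xFFFFFFFF,          # lower 32 bits
        "AX": rax & 0xFFFF,               # lower 16 bits
        "AH": (rax >> 8) & 0xFF,          # bits 8-15 (high byte of AX)
        "AL": rax & 0xFF,                 # bits 0-7 (low byte of AX)
    }


views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name}: {value:#x}")
```

    For the sample value, EAX is 0x55667788, AX is 0x7788, AH is 0x77, and AL is 0x88. The same masking idea applies to RBX, RCX, and RDX.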
  • Status Flag Registers
    • The status flags register provides indications about the status of instruction execution. In 32-bit systems, this register is called EFLAGS, while in 64-bit systems, it is extended to 64 bits and called RFLAGS. This register consists of individual single-bit flags that reflect specific conditions during execution.
  • Status Flag Registers: Flags
    • Zero Flag (ZF): This flag is set to 1 if the result of the last executed instruction was zero. For instance, if a subtraction operation results in zero, ZF will be set to 1.
    • Carry Flag (CF): This flag is set when an unsigned arithmetic operation carries out of, or borrows into, the most significant bit, meaning the result does not fit in the destination register. For example, adding 0xFFFFFFFF and 0x00000001 in a 32-bit register wraps around to 0x00000000 and sets CF to 1.
  • Status Flag Registers: Flags
    • Sign Flag (SF): The Sign Flag is set to 1 if the result of an operation is negative or if the most significant bit is 1. Otherwise, it is set to 0.
    • Trap Flag (TF): This flag puts the processor in single-step (debugging) mode. When set, the CPU raises a debug exception after each instruction so that a debugger can execute one instruction at a time. Malware can use this flag to detect if it is being run in a debugger.
    These flags help in determining the outcome and the state of the CPU after executing instructions, playing a crucial role in program control flow and debugging.
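    How ZF, CF, and SF follow from a result can be sketched for a 32-bit addition. This is a simplified illustration that models only these three flags; the add32 function is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Add two 32-bit values and derive ZF, CF, and SF from the result."""
    full = (a & MASK32) + (b & MASK32)  # unbounded sum, may exceed 32 bits
    result = full & MASK32              # what a 32-bit register would hold
    flags = {
        "ZF": int(result == 0),         # result is zero
        "CF": int(full > MASK32),       # carry out of bit 31
        "SF": (result >> 31) & 1,       # most significant bit of the result
    }
    return result, flags


wrapped, wrapped_flags = add32(0xFFFFFFFF, 0x00000001)
signed, signed_flags = add32(0x7FFFFFFF, 0x00000001)
print(hex(wrapped), wrapped_flags)
print(hex(signed), signed_flags)
```

    The first addition wraps to zero, so ZF and CF are both 1. The second produces 0x80000000, whose most significant bit is 1, so only SF is set.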
  • Segment Registers:
    Segment Registers are 16-bit registers that convert the flat memory space into different segments for easier addressing. There are six segment registers, as explained below:
    • Code Segment: The Code Segment (CS) register points to the Code section in the memory.
    • Data Segment: The Data Segment (DS) register points to the program's data section in the memory.
  • Segment Registers:
    • Stack Segment: The Stack Segment (SS) register points to the program's Stack in the memory.
    • Extra Segments (ES, FS, and GS): These extra segment registers point to different data sections. These and the DS register divide the program's memory into four distinct data sections. 
  • Memory
    • When a program is loaded into memory on a Windows Operating System, it sees an abstracted view of the memory, meaning the program only has access to its allocated memory rather than the entire system memory.
    • This abstracted view is more relevant for reverse-engineering malware, so the focus is on how memory appears to the program.
  • Memory: 4 Sections
    • Code: This section contains the program's code, specifically the instructions executed by the CPU. It corresponds to the .text section of a Portable Executable (PE) file and has execute permissions, allowing the CPU to execute the instructions in this section.
    • Data: The Data section holds initialized data, typically global and static variables, whose initial values are fixed in the file before the program runs. It is associated with the .data section in a PE file; read-only constants are usually placed in a separate .rdata section.
  • Memory: 4 Sections
    • Heap: Also known as dynamic memory, the heap stores variables and data that are created and destroyed during runtime. Memory is allocated for variables when they are created and freed when they are deleted, making this section dynamic.
  • Memory: 4 Sections
    • Stack: The stack is crucial for malware analysis, as it contains local variables, arguments passed to functions, and the return address of the calling function. Since the return address determines where execution continues after a function returns, the stack is often targeted for control flow hijacking, such as through buffer overflow attacks.
  • The Stack
    • Malware often exploits the stack to hijack the control flow of the program. Therefore it is important to understand the stack, its layout, and its working.
    • The stack is a Last In First Out (LIFO) memory. This means that the last element pushed onto the stack is the first one to be popped out.
    • For example, if we push A, B, and C onto the stack, they pop off in the order C, B, and then A. The CPU uses two registers to keep track of the stack. One is the Stack Pointer (the ESP or RSP), and the other is the Base Pointer (the EBP or RBP).
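    The LIFO behavior in the A, B, C example can be reproduced with a Python list used as a stack. This is a minimal sketch of the ordering only, not of how the CPU stores the stack.

```python
stack = []
for item in ["A", "B", "C"]:
    stack.append(item)  # push: the newest element goes on top

# pop: the most recently pushed element comes off first
popped = [stack.pop() for _ in range(3)]
print(popped)  # ['C', 'B', 'A']
```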
  • The Stack Pointer:
    • The Stack Pointer points to the top of the stack. When any new element is pushed on the stack, the location of the Stack Pointer changes to consider the new element that was pushed on the stack. Similarly, when an element is popped off the stack, the stack pointer adjusts itself to reflect that change. 
    The Base Pointer:
    • The Base Pointer remains constant for the duration of a function's execution. It is the reference address from which the current function's stack frame locates its local variables and arguments.
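    How the Stack Pointer moves on push and pop can be sketched with a toy 32-bit stack. On x86 the stack grows toward lower addresses, so a push decreases ESP and a pop increases it. The Stack32 class and the starting address 0x1000 are hypothetical.

```python
class Stack32:
    """Toy 32-bit stack; the starting address is an arbitrary assumption."""

    def __init__(self, top=0x1000):
        self.esp = top
        self.memory = {}

    def push(self, value):
        self.esp -= 4  # the stack grows toward lower addresses
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.esp)
        self.esp += 4  # the top of the stack moves back up
        return value


s = Stack32()
s.push(0xAAAA)
s.push(0xBBBB)
esp_after_pushes = s.esp  # two 4-byte pushes below 0x1000
top_value = s.pop()
print(hex(esp_after_pushes), hex(top_value), hex(s.esp))
```

    After two pushes ESP has moved down by 8 bytes, and the pop returns the last value pushed while moving ESP back up by 4.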
  • Stacks
    • Old Base Pointer and Return Address: The old Base Pointer of the calling function (the one that invoked the current function) is saved on the stack at the address the current Base Pointer points to, directly below the local variables when the stack is drawn with its top at the top.
    • Below this, at the next higher memory address, lies the Return Address, which indicates where the Instruction Pointer should return once the current function finishes executing.
    • A technique known as Stack Buffer Overflow can hijack control flow by overflowing a local variable on the stack and overwriting the Return Address, redirecting the program to a malicious address.
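    The layout that makes this possible can be illustrated with a simulated stack frame held in a bytearray. This is a sketch of the concept only and touches no real memory. The 8-byte buffer size, the addresses, and the helper names are all hypothetical.

```python
import struct

BUF_SIZE = 8  # hypothetical size of the local buffer


def build_frame(saved_ebp, return_address):
    """Lay out [buffer | saved EBP | return address] from low to high addresses."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<II", saved_ebp, return_address))


def unchecked_copy(frame, data):
    """Copy data into the buffer with no bounds check, like an unsafe strcpy."""
    frame[0:len(data)] = data


def read_return_address(frame):
    """Read the 4-byte little-endian return address stored after the saved EBP."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


frame = build_frame(saved_ebp=0x0012FF80, return_address=0x00401000)
original = read_return_address(frame)

# 8 bytes fill the buffer, 4 overwrite the saved EBP, 4 overwrite the return address
payload = b"A" * BUF_SIZE + b"B" * 4 + struct.pack("<I", 0xDEADBEEF)
unchecked_copy(frame, payload)
hijacked = read_return_address(frame)
print(hex(original), hex(hijacked))
```

    Because the buffer sits at lower addresses than the saved Base Pointer and the Return Address, writing past its end reaches both of them in turn.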
  • Stacks
    • Arguments: In stack-based calling conventions (such as cdecl on 32-bit x86), the caller pushes the arguments onto the stack before the function begins executing. These arguments are stored just below the Return Address, at higher memory addresses.
    • Function Prologue and Epilogue: The Function Prologue prepares the stack for the function's execution. By the time it runs, the caller has pushed the arguments and the call instruction has pushed the Return Address; the prologue then pushes the Old Base Pointer onto the stack and sets the Base Pointer to the current top of the stack. As the function executes, the Stack Pointer shifts according to the function's needs.
  • Stacks
    • When the function completes, the Function Epilogue restores the stack to its previous state. The Stack Pointer is moved back to the Base Pointer to discard the local variables, the Old Base Pointer is popped back into the Base Pointer, and the Return Address is popped into the Instruction Pointer. The Stack Pointer then points to the top of the caller's stack again.
    • These processes are crucial for managing function calls and returns in a program, and understanding them is essential for tasks like reverse-engineering and exploiting vulnerabilities, such as buffer overflows.
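    The call, prologue, and epilogue sequence can be sketched with a toy model of ESP, EBP, and EIP. It assumes a 32-bit, cdecl-style convention in which the caller pushes the arguments. The Cpu32 class, its method names, and every address are hypothetical.

```python
class Cpu32:
    """Toy model of ESP, EBP, and EIP; all addresses are arbitrary assumptions."""

    def __init__(self):
        self.esp = 0x1000
        self.ebp = 0x2000  # base pointer of the (pretend) calling function
        self.eip = 0
        self.memory = {}

    def push(self, value):
        self.esp -= 4
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory.pop(self.esp)
        self.esp += 4
        return value

    def call(self, target, return_address):
        self.push(return_address)  # call: save where to resume afterwards
        self.eip = target

    def prologue(self, local_bytes):
        self.push(self.ebp)        # push ebp: save the caller's base pointer
        self.ebp = self.esp        # mov ebp, esp: start the new frame
        self.esp -= local_bytes    # sub esp, N: reserve space for locals

    def epilogue(self):
        self.esp = self.ebp        # mov esp, ebp: discard the locals
        self.ebp = self.pop()      # pop ebp: restore the caller's base pointer
        self.eip = self.pop()      # ret: pop the return address into EIP


cpu = Cpu32()
cpu.push(42)  # the caller pushes one argument
cpu.call(target=0x401000, return_address=0x400123)
cpu.prologue(local_bytes=8)
frame_base = cpu.ebp
# Offsets from EBP: +0 old base pointer, +4 return address, +8 first argument
layout = {offset: cpu.memory[frame_base + offset] for offset in (0, 4, 8)}
cpu.epilogue()
print(layout, hex(cpu.ebp), hex(cpu.eip), hex(cpu.esp))
```

    After the epilogue, EBP and EIP hold the caller's values again, and ESP points at the argument, which the caller removes under this convention.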