Reverse Engineering Guide on x86 Assembly, Part 1: Intro to Registers
Learning x86 assembly is critical when you’re analysing malware, deconstructing executable files and developing your own exploits. However, before you embark on this journey, it’s crucial that you’re familiar with C and compilation.
What are registers?
A register is a small storage location inside the CPU that’s much faster to access than RAM. 32-bit x86 CPUs have eight general-purpose registers, each 32 bits wide, and their lower 16 bits can also be accessed on their own. Although all eight are nominally general purpose, some have conventional or reserved roles (ESP, for example, always tracks the top of the stack), while the rest are free for everyday use.
Introduction to the 8 Registers
Here are the 8 registers with their register names (the acronym) and their meaning:
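- EAX (extended accumulator register, used for arithmetic and to hold a function’s return value)
- EBX (extended base register, used for storing data or as a pointer to data)
- ECX (extended counter register, conventionally used as the loop counter, e.g. by the LOOP instruction and the REP prefix)
- EDX (extended data register, often paired with EAX, e.g. holding the upper half of a multiplication result or the remainder of a division)
- ESI (extended source index, points to the source data in string and memory-copy operations)
- EDI (extended destination index, points to the destination in string and memory-copy operations)
- EBP (extended base pointer, marks the base of the current stack frame)
- ESP (extended stack pointer, marks the top of the stack)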
Register Access
Some of these registers can also be accessed in 16-bit or 8-bit subsections rather than as the whole 32-bit register, depending on what the program needs. EAX, EBX, ECX and EDX can each be split into 16-bit and 8-bit parts (AX/AH/AL, BX/BH/BL and so on), while ESI, EDI, EBP and ESP only have 16-bit forms (SI, DI, BP and SP). Taking EAX as an example, it breaks down into these subsections:
- AX (16 least significant bits of EAX)
- AH (8 most significant bits of AX)
- AL (8 least significant bits of AX)
Stack Frames: ESP and EBP Registers
A running program’s memory is typically divided into four main regions: .text, .data, the stack and the heap. The program’s code lives in .text, global data lives in .data, the stack holds local variables and function arguments, and the heap holds dynamically allocated memory (for example, from malloc and calloc calls in C).
You should be familiar with how the stack works: it’s LIFO (last in, first out). As a quick refresher, think of a stack of paper on a table. Each new sheet is placed on top of the most recent one; adding something this way is called pushing. When a sheet is removed, it’s always the one that was placed down last; this is called popping.
On x86, the stack grows downwards: each push moves the top of the stack from a higher memory address to a lower one.
Two registers, ESP and EBP, work closely with the stack. ESP points to the top of the stack, and every push or pop updates it: pushing decrements ESP (because the stack grows from high addresses to low), and popping increments it. EBP points to the base of the current function’s stack frame, giving a fixed reference point for reaching that function’s arguments and local variables while ESP moves around.
FLAGS Register
The FLAGS register holds individual status bits (flags), each either 1 (set) or 0 (clear), that describe the current state of the processor and the result of the last operation. FLAGS is 16 bits wide, its extension EFLAGS is 32 bits and RFLAGS is 64 bits wide. Here are some of the more common flags:
- ZF – zero flag, set when the result of the last operation is zero
- CF – carry flag, set when an unsigned operation carries out of (or borrows into) the most significant bit, i.e. the unsigned result doesn’t fit
- SF – sign flag, set to the most significant bit of the result (1 means negative when the result is read as a signed value)
- OF – overflow flag, set when the result of a signed operation is too large or too small to fit in the destination
- PF – parity flag, set when the lowest byte of the result contains an even number of 1 bits
- DF – direction flag, controls whether string instructions (such as MOVS and STOS) step forwards or backwards through memory
AT&T vs Intel syntax
Depending on the tool you’re using (e.g. radare2, IDA Pro or gdb), disassembly may be displayed in one of two syntaxes:
- Intel: mov eax, 1
- AT&T: mov $1, %eax
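Both instructions move the value 1 into EAX. The main differences are that Intel syntax puts the destination operand first, while AT&T puts the source first and prefixes registers with % and immediate values with $. This guide focuses only on Intel syntax.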
Refresher: Bits, Bytes & Dwords
Bits, bytes, words and dwords are units of data. A bit is either 0 or 1. A byte is 8 bits and can hold an unsigned value from 0 to 255. A word is two bytes (16 bits) and can hold an unsigned value up to and including 65535. A dword (double word) is two words, four bytes or 32 bits, and can hold an unsigned value up to 4,294,967,295.