Download
Understanding VS C++ Assembly Code.pdf
Adobe Acrobat Document 318.9 KB

Understanding x86 Assembly Code

To view the assembly code generated for your program in Visual Studio you need to set a break point at the location you want to examine, then run the application in debug-mode. When it stops at your break point, press Ctrl-F11.

Below is an example of what you see when you call function foo(). One breakpoint was set just before the function call and another at the start of the function implementation. The assembly code window shows you the source code and then the assembly code it generated:

  int varint = 5;

0097BAA9  mov         dword ptr [varint],5 

  int test = foo(varint,3);

0097BAB0  push        3 

0097BAB2  lea         eax,[varint] 

0097BAB5  push        eax 

0097BAB6  call        foo (096648Dh) 

0097BABB  add         esp,8 

0097BABE  mov         dword ptr [test],eax

 

int foo(int & a, int b) {

00F0A940  push        ebp 

00F0A941  mov         ebp,esp 

00F0A943  sub         esp,44h 

00F0A946  push        ebx 

00F0A947  push        esi 

00F0A948  push        edi 

  int orig = a;

00F0A949  mov         eax,dword ptr [a] 

00F0A94C  mov         ecx,dword ptr [eax] 

00F0A94E  mov         dword ptr [orig],ecx 

  a = a + 1;

00F0A951  mov         eax,dword ptr [a] 

00F0A954  mov         ecx,dword ptr [eax] 

00F0A956  add         ecx,1 

00F0A959  mov         edx,dword ptr [a] 

00F0A95C  mov         dword ptr [edx],ecx 

  return orig * b;

00F0A95E  mov         eax,dword ptr [orig] 

00F0A961  imul        eax,dword ptr [b] 

}

00F0A965  pop         edi 

00F0A966  pop         esi 

00F0A967  pop         ebx 

00F0A968  mov         esp,ebp 

00F0A96A  pop         ebp 

00F0A96B  ret 

Instructions are shown in AT&T syntax:  mnemonic source, destination.

Source and destination are operands and can be immediate values, registers, memory addresses, or labels. Immediate values are constants, and on some systems will be prefixed by a $. For instance, $0x5 represents the number 5 in hexadecimal. Register names may be prefixed by a %.

Memory addresses are referenced by name. For any name, name is the address of the memory and [name] is the contents of the memory.

Values have a specified size. A byte is 8-bits, a word is 16 bits, a dword is 32 bits. In assembler these sizes are specified by byte ptr, word ptr and dword ptr.

Registers are data storage locations within the CPU. The width of a CPU’s registers is defined by its architecture. So if you have a 64-bit CPU, your registers will be 64 bits wide.

The x86 family has 8 registers whose prefix depends on the architecture as shown below: 

Note: the * registers must be restored on return from a function call. See below.

In addition to these registers there will be instruction address registers and status registers.

For backwards compatibility, any register can be accessed as a narrower type by using a different register prefix.

The registers of particular interest are:

~ax – The return value from a function is placed here.

~bp - Base Pointer for the current function from which arguments and local variables are offset.

~sp - Stack Pointer – on entry to a function, the base-address for the function.

On return from a function call ~bp must be restored to its previous state so that the base-address is not lost for the calling context. In addition, the C/C++ compiler expects any function to return with ~bx, ~si and ~di in the same state that it got them. It can use them internally, but must restore their values before returning. These registers, marked with * in the diagram above, are known as “untouchables”.

64-bit Intel processors add another 8 64-bit registers, R8-R15, sixteen 128-bit registers, XMM0-XMM15 and eight 80-bit floating-point registers FPR0-FPR7

A Function Call

 

When a function call is encountered, the assembler inserts instructions to push the arguments onto the stack (in reverse order) and then the return address for the calling instruction. The stack pointer now points at the next available stack location. These operations are wrapped in the assembler call instruction. It is then the function’s responsibility to save the current base-pointer (ebp) and set the base pointer to the current stack pointer, as the base for this function. It must also save the other untouchable registers (edi,esi,ebx) by pushing these onto the stack. But before pushing the untouchable registers it will decrement the stack pointer to leave room for the local variables. All these will be accessed with offsets to the base-pointer.

Understanding the code

  int varint = 5;

0097B AA9  mov         dword ptr [varint],5 

mov assigns the value 5, as a dword, to the address pointed to by the name varint.

0097BAB0  push        3 

0097BAB2  lea         eax,[varint] 

0097BAB5  push        eax 

0097BAB6  call        foo (096648Dh) 

The arguments for the function call are pushed onto the stack in reverse order. The first argument is a reference, so the destination address is first obtained by lea (Load Effective Address) which is placed in eax. That address is then pushed onto the stack. Finally the function is called.

int foo(int & a, int b) {

00F0A940  push        ebp 

00F0A941  mov         ebp,esp 

00F0A943  sub         esp,44h 

00F0A946  push        ebx 

00F0A947  push        esi 

00F0A948  push        edi 

Upon entering a function, esp points to the base address for our function.  The arguments and local variables are at known offsets from this pointer. It becomes the base-pointer inside our function. Before we copy the esp into esb we must preserve the old esb on the stack for later reinstating.

Upon entering a function, the value of the “untouchable” registers must be stored. The push instructions save these registers to the stack.

push decrements the stack pointer, esp, and writes its operand to the stack.

mov copies the second argument into the first, here the stack pointer (esp) is copied to the base pointer register. ebp now points to the base of the function’s stack frame.

sub (subtract) adjusts the stack pointer to make room for pushing the other “untouchable” registers and local variables onto the stack.

    int orig = a;

00F0A949  mov         eax,dword ptr [a] 

00F0A94C  mov         ecx,dword ptr [eax] 

00F0A94E  mov         dword ptr [orig],ecx 

The argument is now moved to register eax, specifying its size as dword ptr.

dword ptr [a]  says “’a’ is a dword pointer; give me the value it points to”

But because the argument was passed by reference, we need to follow the argument to its address and get the value there. The value is placed in ecx.

Finally, the value is copied onto the stack at the location reserved for orig.

  a = a + 1;

00F0A951  mov         eax,dword ptr [a] 

00F0A954  mov         ecx,dword ptr [eax] 

00F0A956  add         ecx,1 

00F0A959  mov         edx,dword ptr [a] 

00F0A95C  mov         dword ptr [edx],ecx 

Now the argument is dereferenced again and placed in ecx, then 1 is added to the register.

00F0A959  mov         edx,dword ptr [a] 

00F0A95C  mov         dword ptr [edx],ecx 

Here the result is written back to the argument address because the argument was a referece.  edx is used to hold the address in the first mov. The second mov writes the result from ecx to the address held in edx.

  return orig * b;

00F0A95E  mov         eax,dword ptr [orig] 

00F0A961  imul        eax,dword ptr [b] 

}

At the return point the local variable on the stack at orig is moved into eax. It is then integer-multiplied (imul ) by argument b. The caller of the function recovers the return result from register eax.

Now the final clean-up (epilogue):

00F0A965  pop         edi 

00F0A966  pop         esi 

00F0A967  pop         ebx 

00F0A968  mov         esp,ebp 

00F0A96A  pop         ebp 

00F0A96B  ret 

 

The untouchable registers are restored from the stack and the local variables discarded by moving the stack pointer back to the base with mov   esp,ebp