Today, programming has become a required skill in most fields, whether medicine, business, education, or any other. This is because the meaning of programming has changed: it is now used to solve day-to-day, real-life problems. Gone are the days when we had to write a program in machine language to solve a problem.
High-level languages introduced a layer of abstraction into programming: you no longer have to care how your code is executed and handled by the machine at a lower level. You just write code to carry out a task in a high-level language and it works. This abstraction serves most people who code well, but those who write or analyze very complex code sometimes need to go down to a lower level to understand it better. In this article we will see how to analyze our own code and logic, written in C, at that lower level.
To follow along you will need:
- A Linux OS with the GNU tools gcc and gdb
- Some knowledge of assembly language and CPU registers
First, let’s understand how source code written in a high-level language gets converted into machine language. This process is called compilation.
Compilation is carried out by another program, called a compiler, which checks your code for syntactic or structural errors. If there are none, the compiler produces object code (machine code) as output.
Before the final output (machine code) is produced, our code passes through four phases, as shown in the image above: preprocessing, compilation, assembly, and linking.
To keep this article short, we only need to understand what the assembler does. The assembly code emitted by the compiler goes to the assembler, which produces the object code. You can think of assembly language as a language just one level above machine code. It uses mnemonics to represent opcodes, which makes it easier for us to read and understand our program’s instructions at machine level.
Now we are going to write a simple program in C and try to understand it at the assembly level.
This is a simple snippet written in C that prints the string “Hello World” ten times. We first compile it with gcc and then analyze its assembly code using the gdb debugger.
To compile the program, use the command below:
gcc -g hello.c -o hello
The -g flag embeds debugging information in the output, which gives gdb additional information while debugging the program.
The -o flag specifies the name of the output file (here, the executable hello).
Now we load the compiled program in gdb and set a breakpoint at main (break main). The run command then starts the program, which stops at that breakpoint.
In the image above, the highlighted line is the instruction that RIP (the instruction pointer) points to. Also notice that I have changed the disassembly flavor to Intel (set disassembly-flavor intel), because gdb uses AT&T syntax by default. Before running the instructions one by one, let’s first try to understand the code. I have labeled each line with a line number to make the explanation easier.
In the code above, line 5, where RIP currently points, initializes our loop variable i to 0. This tells us that the variable i from our C program is stored at [rbp-0x4].
The instruction at line 6 is an unconditional jump to address 0x555555555156, which is line 10 in our listing.
The instruction at line 10 is a compare instruction, which compares the value of i (at [rbp-0x4]) with 0x9. Executing it updates the flags register, and the very next instruction, at line 11, uses those flags to decide its conditional jump.
jle means “jump if less than or equal”: only if the value at [rbp-0x4] is less than or equal to 0x9 does execution jump to 0x555555555146 (line 7). There, the lea (load effective address) instruction sets up our “Hello World” string as the parameter for the next instruction, a call to the puts procedure, which prints the string on the screen.
After the string is printed, the instruction at line 9 adds one to i. The comparison of i is then performed again, and the jump depends on its result. If i is greater than 0x9, RIP moves on to line 12, a nop (no operation) instruction, and execution finally reaches the ret instruction. Otherwise, if i is less than or equal to 0x9, the instructions discussed above are executed again and print “Hello World” once more. This is how our for loop works.
Using the gdb command nexti, or ni (next instruction), we can execute the instructions one at a time and check the state of registers and memory after each. Doing that in depth requires learning more about gdb; for now you do not need those commands. Just run ni repeatedly and observe how the loop works.
You can also check the value at [rbp-0x4], that is, the value of i, after each increment using gdb’s examine command (x), for example x/dw $rbp-0x4.
So this is how our for loop works. In the same way, you can write other programs using different built-in C functions and your own logic, including logic that is harder to crack, and analyze them at machine level.
Thank you for reading this article. I hope you liked it and learned something from it.