Of Pointers and Men (2)

This post is an automatic translation from French. You can read the original version here.

And here we go again for a new chapter in our exploration of pointers! Now that you understand what a pointer actually is, I hope all of this scares you much less on the theoretical side… Yes, all that fuss for just that!

Yet the pointer, despite its simplicity, is a fundamental concept of C. It is truly one of the aspects of the language that make it, in my humble opinion, so close to the machine and so fascinating! So I suggest we continue our journey by looking at how pointers are used with functions, through a small example.

A seemingly simple function…

Let’s imagine we write a small program that calls a function that displays an integer, which we will call dis_moi_des_mots_doux (French for “tell me sweet nothings”):

main.c

#include <stdio.h>

void dis_moi_des_mots_doux( int ) ;

int
main()
{
    int A ;
    A = 69 ;
    dis_moi_des_mots_doux( A ) ;
    return 0 ;
}

void
dis_moi_des_mots_doux( int n )
{
   printf("Oh ... un %d ! Comme c'est gentil !!!\n", n ) ;
}

We can verify that our function works perfectly:

$gcc -o thefunk main.c
$./thefunk
Oh ... un 69 ! Comme c'est gentil !!!

The value 69 is correctly passed to the dis_moi_des_mots_doux function, which displays it. So far so good!

Emboldened by this success, we decide to add the remise_a_zero (“reset to zero”) function which, as its name suggests, should reset the variable passed to it to zero. Most beginners will write a function that looks like this:

main_v2.c

#include <stdio.h>

void dis_moi_des_mots_doux( int ) ;
void remise_a_zero(int) ;

int
main()
{
    int A ;
    A = 69 ;
    dis_moi_des_mots_doux( A ) ;    // Display A
    remise_a_zero( A ) ;            // Reset A to zero
    dis_moi_des_mots_doux( A ) ;    // Display A again
    return 0 ;
}

void
dis_moi_des_mots_doux( int n )
{
   printf("Oh ... un %d ! Comme c'est gentil !!!\n", n ) ;
}

void
remise_a_zero( int k )
{
    k = 0 ;
}

Unfortunately, when we test this function, our disappointment is immense:

$gcc -o thefunk_v2 main_v2.c
$./thefunk_v2
Oh ... un 69 ! Comme c'est gentil !!!
Oh ... un 69 ! Comme c'est gentil !!!

A has not changed value.

And this is completely normal!!! A itself is never transmitted when the function is called: only a copy of its value is sent to the function.

To explain what is happening, I offer you two ways to look at it: graphically, and in assembly. You will see that it is very intuitive.

When you call a function, the following operations are performed, in order:

  1. The expression in parentheses is evaluated (i.e., we find its numerical value)
  2. We jump to the function’s location
  3. A local variable is created (here n in the dis_moi_des_mots_doux function)
  4. The variable is initialized with our numerical value.

Graphically, for the first function call, the execution looks like this:

[Figure: diagram of the first function call, dis_moi_des_mots_doux( A )]

The problem, as you might suspect, is with the call to the remise_a_zero function, which tries to modify A.

Here are the next steps of our program’s execution:

[Figure: diagram of the call to remise_a_zero( A )]

Indeed, since only a copy of the value contained in A is passed to remise_a_zero, the function zeroes its own local variable k and A remains unchanged. It cannot work this way! (Tough luck for us!)

Let’s now look at “the real thing” and dive into assembly. If this step scares you, you can skip it without any problem: meet us at the next section, “Pointers and functions”, a bit further down!

Let’s compile, disassemble, and view our program’s code:

$gcc -g -o thefunk_v2 main_v2.c --static
$objdump -S ./thefunk_v2

(If you don’t have the necessary tools, you can get this assembly code here.)

[...]

int
main()
{
  40174d:       55                      push   %rbp
  40174e:       48 89 e5                mov    %rsp,%rbp
  401751:       48 83 ec 10             sub    $0x10,%rsp
    int A ;
    A = 69 ;
  401755:       c7 45 fc 45 00 00 00    movl   $0x45,-0x4(%rbp)
    dis_moi_des_mots_doux( A ) ;    // Display A
  40175c:       8b 45 fc                mov    -0x4(%rbp),%eax
  40175f:       89 c7                   mov    %eax,%edi
  401761:       e8 1b 00 00 00          call   401781 <dis_moi_des_mots_doux>
    remise_a_zero( A ) ;            // Reset A to zero
  401766:       8b 45 fc                mov    -0x4(%rbp),%eax
  401769:       89 c7                   mov    %eax,%edi
  40176b:       e8 35 00 00 00          call   4017a5 <remise_a_zero>
    dis_moi_des_mots_doux( A ) ;    // Display A again
  401770:       8b 45 fc                mov    -0x4(%rbp),%eax
  401773:       89 c7                   mov    %eax,%edi
  401775:       e8 07 00 00 00          call   401781 <dis_moi_des_mots_doux>
    return 0 ;
  40177a:       b8 00 00 00 00          mov    $0x0,%eax
}
  40177f:       c9                      leave
  401780:       c3                      ret

0000000000401781 <dis_moi_des_mots_doux>:

void
dis_moi_des_mots_doux( int n )
{
  401781:       55                      push   %rbp
  401782:       48 89 e5                mov    %rsp,%rbp
  401785:       48 83 ec 10             sub    $0x10,%rsp
  401789:       89 7d fc                mov    %edi,-0x4(%rbp)
   printf("Oh ... un %d ! Comme c'est gentil !!!\n", n ) ;
  40178c:       8b 45 fc                mov    -0x4(%rbp),%eax
  40178f:       89 c6                   mov    %eax,%esi
  401791:       48 8d 3d 70 f8 07 00    lea    0x7f870(%rip),%rdi        # 481008 <_IO_stdin_used+0x8>
  401798:       b8 00 00 00 00          mov    $0x0,%eax
  40179d:       e8 5e 84 00 00          call   409c00 <_IO_printf>
}
  4017a2:       90                      nop
  4017a3:       c9                      leave
  4017a4:       c3                      ret

00000000004017a5 <remise_a_zero>:

void
remise_a_zero( int k )
{
  4017a5:       55                      push   %rbp
  4017a6:       48 89 e5                mov    %rsp,%rbp
  4017a9:       89 7d fc                mov    %edi,-0x4(%rbp)
    k = 0 ;
  4017ac:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
}
  4017b3:       90                      nop
  4017b4:       5d                      pop    %rbp
  4017b5:       c3                      ret

[...]

Since A is a local variable, it is created on the stack. This is done by growing said stack with the following instruction:

401751:       48 83 ec 10             sub    $0x10,%rsp

The stack grows by 16 bytes. On x86-64 it grows toward lower addresses, which is why %rsp (the stack pointer) is decreased with sub. This is more than enough to store an int (4 bytes).

The value 69 (0x45 in hexadecimal) is then stored in A, which is at address %rbp-4 (thus on the stack, and more specifically in main()’s frame):

401755:       c7 45 fc 45 00 00 00    movl   $0x45,-0x4(%rbp)

Let’s now see how the call to the remise_a_zero function is made from the main function:

  401766:       8b 45 fc                mov    -0x4(%rbp),%eax
  401769:       89 c7                   mov    %eax,%edi
  40176b:       e8 35 00 00 00          call   4017a5 <remise_a_zero>

A’s value is first loaded into the %eax register from the stack. It is then copied into the %edi register and execution “jumps” to address 4017a5, where the remise_a_zero function is located.

This detour through %eax and then %edi may seem surprising: why not put A’s value directly into %edi? Simply because the compiler, which was not asked to optimize here, translates the call literally: the expression in parentheses is evaluated first, and only then is the call made.

Thus, you could have things like:

remise_a_zero( 2*A+3 ) ;

In this case, the expression (2*A+3) would first need to be evaluated, then remise_a_zero would be called. The compiler therefore generated two-step code:

  • Evaluation of the expression with %eax (here it is trivial)
  • Calling the function (the calling convention, here the System V AMD64 ABI used on Linux, says the first integer argument must be placed in %edi)

In any case, neither A nor its memory location is transmitted to the function: only a copy of its value!

Pointers and functions

Things are a bit clearer now: the remise_a_zero function does not work because no variable is “transmitted” – only a copy of its value.

This is where pointers come into play: instead of transmitting A’s value, we will transmit its address, so that the called function knows where to write. And we manipulate addresses with the new kind of variable we saw last time: the pointer!

Let’s go ahead and modify main() to transmit A’s address to our function. We will simply use the & operator that we saw last time:

int
main()
{
    int A ;
    A = 69 ;
    dis_moi_des_mots_doux( A ) ;
    remise_a_zero( &A ) ;            // <= The magic is here!!!
    dis_moi_des_mots_doux( A ) ;
    return 0 ;
}

Since we are no longer passing an integer but an address of an integer, we need to adapt remise_a_zero:

void
remise_a_zero( int* k)
{
    *k = 0 ;                        // Don't forget the * operator here!!!
}

The local variable k is now a pointer, which will therefore contain an address.

To write a zero at this address, we use the * operator as seen last time. Once again, this means:

  • I read k
  • k contains an address
  • I write 0 at that address

The complete code looks like this:

main_v3.c

#include <stdio.h>

void dis_moi_des_mots_doux( int ) ;
void remise_a_zero(int*) ;

int
main()
{
    int A ;
    A = 69 ;
    dis_moi_des_mots_doux( A ) ;
    remise_a_zero( &A ) ;
    dis_moi_des_mots_doux( A ) ;
    return 0 ;
}

void
dis_moi_des_mots_doux( int n )
{
   printf("Oh ... un %d ! Comme c'est gentil !!!\n", n ) ;
}

void
remise_a_zero( int* k )
{
    *k = 0 ;
}

What? You want another little drawing? Really?

But come ooooon… It takes forever to makeeeee!!!!!!!!!!!!!!

Fine, since I like you, here it is:

[Figure: diagram of the call to remise_a_zero( &A )]

And it should work! Let’s test it right away:

$gcc -o thefunk_v3 main_v3.c
$./thefunk_v3
Oh ... un 69 ! Comme c'est gentil !!!
Oh ... un 0 ! Comme c'est gentil !!!

I think we can call this a victory!

Once again, let’s look at the assembly code. Those who prefer can skip ahead to the next section :)

[...]

int
main()
{
  40174d:       55                      push   %rbp
  40174e:       48 89 e5                mov    %rsp,%rbp
  401751:       48 83 ec 10             sub    $0x10,%rsp
  401755:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
  40175c:       00 00
  40175e:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  401762:       31 c0                   xor    %eax,%eax
    int A ;
    A = 69 ;
  401764:       c7 45 f4 45 00 00 00    movl   $0x45,-0xc(%rbp)
    dis_moi_des_mots_doux( A ) ;
  40176b:       8b 45 f4                mov    -0xc(%rbp),%eax
  40176e:       89 c7                   mov    %eax,%edi
  401770:       e8 31 00 00 00          call   4017a6 <dis_moi_des_mots_doux>
    remise_a_zero( &A ) ;
  401775:       48 8d 45 f4             lea    -0xc(%rbp),%rax
  401779:       48 89 c7                mov    %rax,%rdi
  40177c:       e8 49 00 00 00          call   4017ca <remise_a_zero>
    dis_moi_des_mots_doux( A ) ;
  401781:       8b 45 f4                mov    -0xc(%rbp),%eax
  401784:       89 c7                   mov    %eax,%edi
  401786:       e8 1b 00 00 00          call   4017a6 <dis_moi_des_mots_doux>
    return 0 ;
  40178b:       b8 00 00 00 00          mov    $0x0,%eax
}
  401790:       48 8b 55 f8             mov    -0x8(%rbp),%rdx
  401794:       64 48 2b 14 25 28 00    sub    %fs:0x28,%rdx
  40179b:       00 00
  40179d:       74 05                   je     4017a4 <main+0x57>
  40179f:       e8 dc 62 04 00          call   447a80 <__stack_chk_fail>
  4017a4:       c9                      leave
  4017a5:       c3                      ret

[...]

void
remise_a_zero( int* k )
{
  4017ca:       55                      push   %rbp
  4017cb:       48 89 e5                mov    %rsp,%rbp
  4017ce:       48 89 7d f8             mov    %rdi,-0x8(%rbp)
    *k = 0 ;
  4017d2:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4017d6:       c7 00 00 00 00 00       movl   $0x0,(%rax)
}
  4017dc:       90                      nop
  4017dd:       5d                      pop    %rbp
  4017de:       c3                      ret
  4017df:       90                      nop

[...]

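While the creation of A has not changed much (A is still on the stack), the stack now also contains a canary: a guard value, read here from %fs:0x28, that the compiler places on the stack to detect overflows. This is not today’s topic, so we will disregard it.

The assignment of the value 69 to A is still performed in the same way, although A now sits at %rbp-0xc instead of %rbp-4: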

401764:       c7 45 f4 45 00 00 00    movl   $0x45,-0xc(%rbp)

The call to the remise_a_zero function, however, has changed significantly:

  remise_a_zero( &A ) ;
401775:       48 8d 45 f4             lea    -0xc(%rbp),%rax
401779:       48 89 c7                mov    %rax,%rdi
40177c:       e8 49 00 00 00          call   4017ca <remise_a_zero>

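We no longer use the mov instruction, but the lea instruction (“load effective address”). Instead of reading the value stored at %rbp-0xc, it computes that address, which is A’s address, and stores it in %rax. The address is then copied into %rdi, the 64-bit form of %edi, since an address takes 8 bytes here. On the other side, remise_a_zero reads that address back into %rax and writes the zero through it with movl $0x0,(%rax).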

As you can see, no more trickery: everything happens exactly as expected!

Final words

This chapter is already quite long, so we will postpone some of the points I wanted to cover here to a future installment. To conclude, I would just like to highlight how clear C can be when it comes to functions.

If you see a function declared like this:

int fonction_mystere ( int A, double* B, int *C ) ;

You know that after a call to this function, the variable passed as the first argument cannot have been modified by it, while the variables whose addresses are passed as the next two arguments could have been! This allows you to “compartmentalize” your variables, and maintain some control even when using functions from obscure libraries.

On the other hand, you will need to be very careful not to use invalid addresses – it is very easy to shoot yourself in the foot! Typedefs, in particular, can hide pointers from your keen eye.

See you soon, and don’t hesitate to send me your questions!

Rancune.

This post is licensed under CC BY 4.0 by the author.