
Of pointers and men (1)

This post is an automatic translation from French. You can read the original version here.

For most students, pointers are one of the scariest concepts in C. And the worst part is that I can’t even blame them: take a quick look at the course materials floating around on the web, and you will find that some are truly awful and the vast majority use vocabulary that would make your head spin. Yet there is nothing simpler! So sit back, grab a coffee, and let’s see if I can do a bit better than the average Professor Bumble!

Everything is memory

Let’s start with a very simple program:

#include <stdio.h>

int main(void) {
    int A ;                        /* reserve room for one int, named A */
    A = 42 ;                       /* store 42 in it */
    printf("A equals %d \n", A ) ;
    return 0 ;
}

Nothing fancy here: we asked the computer to reserve a bit of memory for us, enough to store an int, and we told it that from now on we would refer to this integer by the name A. All of that, in a single instruction:

    int A ;

By the way, how many bytes did we reserve? Easy, just ask the machine using the sizeof() operator:

#include <stdio.h>

int main(void) {
    int A ;
    A = 42 ;
    printf("A equals %d \n", A ) ;
    /* sizeof yields a size_t, which is printed with %zu */
    printf("A occupies %zu bytes \n", sizeof(A) ) ;
    return 0 ;
}

$ gcc -o pouet main.c
$ ./pouet
A equals 42
A occupies 4 bytes

So A occupies 4 bytes! If we remember that our RAM is just a loooooong ribbon of bytes, this simply means that by declaring the variable A, we decided to use 4 of our RAM’s bytes to store our integer.

A little drawing to make things clearer, here is our RAM:

[Figure: RAM diagram]

It is made up of one-byte cells that are numbered. To simplify the diagram, I have shown above a “small” 256-byte RAM. The numbers therefore go in hexadecimal from 0x00 to 0xff. In reality, your memory is much larger, and addresses are represented on 8 bytes (from 0 to 0xffffffffffffffff) on your nice 64-bit machine :)

Ultimately, declaring our variable A is just choosing 4 memory cells (4 bytes) to store the contents of A:

[Figure: RAM diagram]

The variable name, “A”, is only there for the poor humans that we are. For the machine, the integer we call A is “the integer stored in cells 0xA2 and beyond”. Or, if you prefer, “the integer stored at address 0xA2”.

The address of a piece of data, in the end, is just that: the number of the memory cell that holds its first byte.

The & operator

I can see where you’re going: “Okay, that’s all well and good… But show me where A actually is!”. First of all, know that your lack of faith in me hurts me deeply. But since you ask, we can determine it using the “&” operator.

This operator gives you the address of a piece of data in memory. A small example?

#include <stdio.h>

int main(void) {
    int A ;
    A = 42 ;
    printf("A equals %d \n", A ) ;
    printf("A occupies %zu bytes \n", sizeof(A) ) ;
    /* %p expects a void*, hence the cast */
    printf("A is located at memory address %p \n", (void *)&A ) ;
    return 0 ;
}

$ gcc -o pouet main.c
$ ./pouet
A equals 42
A occupies 4 bytes
A is located at memory address 0x7ffeca7b6334

Our variable A has therefore been placed at memory address 0x7ffeca7b6334… Cool, right?

I know the hexadecimal notation may confuse you, but don’t pay too much attention to it. It is just a way for computer scientists to handle numbers in a somewhat more compact form. After all, whether I tell you that A is at 0x7ffeca7b6334 or at cell number 140732295504692, it is strictly the same thing! And I remind you that your computer only speaks binary in the end, anyway!

That said, for now, all of this is not very useful. I don’t quite see how to work it into a conversation over drinks, and for flirting, I’m sure you can find something better :) So let’s try to play around a bit with our addresses.

Manipulating addresses

What could we store an address in? Often, the first idea that comes to mind is “Well, it’s just an integer! Let’s put that in an int!”.

Yes, but no! Because it’s a big integer. Our int, on a PC architecture, is 32 bits, while our addresses are 64! A long int then? Hmm… but that’s not very portable: a long is 64 bits on 64-bit Linux, but only 32 bits on 64-bit Windows!

So C has a dedicated type for this, a type “variable that contains the address of an integer”. In C, that is written as follows:

int * addr ;

The variable addr is a perfectly ordinary variable. It’s just that its type is “int*”, meaning it contains the address of an int.

That’s what a pointer is, plain and simple! A variable that contains an address.

An example? Alright, since you ask so nicely:

#include <stdio.h>

int main(void) {
    int A ;
    int *P ;

    A = 42 ;
    P = &A ;    /* P now contains the address of A */

    printf("A equals %d \n", A ) ;
    printf("A occupies %zu bytes \n", sizeof(A) ) ;
    printf("A is located at memory address %p \n\n", (void *)&A ) ;
    printf("P equals %p \n", (void *)P ) ;
    printf("P occupies %zu bytes \n", sizeof(P) ) ;

    return 0 ;
}

In this little program, we added a variable P which is an int*. Like all variables, P can be filled using the = operator. Here, we stored the address of A in it with the line:

P = &A ;

We say that “P points to A”. Personally, I find that expression rather convoluted. I prefer to say “P contains the address of A”. It’s simpler, and I find it clearer!

Alright, enough talking… let’s run it!

$ gcc -o pouet main.c
$ ./pouet
A equals 42
A occupies 4 bytes
A is located at memory address 0x7ffe78c958fc

P equals 0x7ffe78c958fc
P occupies 8 bytes

As we can see, P, of type int*, is 8 bytes. Which is indeed the size needed to store a memory address. And it contains 0x7ffe78c958fc, which is indeed the address of A. Graphically, this gives us a RAM that looks like this:

[Figure: RAM diagram]

If A had been a double, we would have declared P as double*, to say it contains the address of a double. If A had been an unsigned char, P would have been declared unsigned char*. And so on… You can make a pointer to any variable, regardless of its type!

However, no matter what type of data P points to, this does not change its fundamental nature: It contains an address (a memory cell number if you prefer!)… and so it will always be 8 bytes in size on this machine.

But is a pointer really a variable like any other?

Well, yes! Nothing distinguishes a pointer from all the variables you have been manipulating from the start. If you are struggling with it, take the drama out of it and tell yourself it is just a big integer… a “memory cell number”!

But wait… if it’s a variable… can we ask for its address?????

YES!

In our example, P is necessarily stored somewhere in RAM… so we can use our “&” operator to get its address:

#include <stdio.h>

int main(void) {
    int A ;
    int *P ;

    A = 42 ;
    P = &A ;

    printf("A equals %d \n", A ) ;
    printf("A occupies %zu bytes \n", sizeof(A) ) ;
    printf("A is located at memory address %p \n\n", (void *)&A ) ;
    printf("P equals %p \n", (void *)P ) ;
    printf("P occupies %zu bytes \n", sizeof(P) ) ;
    /* P is a variable too, so it has an address of its own */
    printf("P is located at memory address %p \n", (void *)&P ) ;

    return 0 ;
}

$ gcc -o pouet main.c
$ ./pouet
A equals 42
A occupies 4 bytes
A is located at memory address 0x7ffeea36056c

P equals 0x7ffeea36056c
P occupies 8 bytes
P is located at memory address 0x7ffeea360570

[Figure: RAM diagram]

And if I wanted to manipulate the address of P, what would I store it in?

You guessed it: It’s the address of an int*, so I would store it in an int**!

And we can keep going like this for quite a while! (I don’t even know if there’s a practical limit!). However, if you go beyond three or four stars, it’s probably a sign that your code deserves a second look!

The dereference operator *

One last concept, and I promise I’ll leave you alone: The “*” operator.

It is an operator that lets you access the data located at the address contained in a pointer. Okay, I know, put like that, it’s not clear! But don’t worry, there’s nothing complicated about it at all!

If I go back to the previous example:

    int A ;
    int *P ;

    A = 42 ;
    P = &A ;

P now contains the address of A.

If we type the following instruction:

    *P = 69 ;

Then we are asking to store the value 69 at the address contained in P.

If we break down the process, the PC will:

  • look at P
  • find an address there
  • store the value 69 at that address.

And so… we just modified A, and it is in A that 69 is stored!

Caution, do not confuse this star, which is an operator, with the star used to declare a pointer!!!

I’ll let you test this little example to fully understand:

#include <stdio.h>

int main(void) {
    int A ;
    int *P ;

    A = 42 ;
    P = &A ;    /* P contains the address of A */

    printf("A equals %d \n", A ) ;

    *P = 69 ;   /* store 69 at the address contained in P, that is, in A */

    printf("A now equals %d \n", A ) ;
    return 0 ;
}

The warrior’s rest

That’s already a lot to take in, and the road is still long. I therefore suggest splitting this article into several posts, to take the time to understand and not overwhelm you all at once. I promise, the next article will show you how to use all of this in a very practical way!

On the agenda for the following articles:

  • Episode 2: The NULL pointer, functions, and the return of the revenge of the MMU
  • Episode 3: Pointers & arrays, pointer arithmetic
  • Episode 4: Malloc, free
  • Episode 5: Pointers and structures
  • Episode 6: Function pointers

I’m starting to write volume 2 tonight!

See you soon,

Rancune.

This post is licensed under CC BY 4.0 by the author.