
Of Pointers and Men (4)

This post is an automatic translation from French. You can read the original version here.

Hello! It seems that, vacation time being what it is, this blog is being updated far more often than I expected! Today, I suggest we continue our exploration of pointers by digging into arrays and their relationship with pointers.

Arrays and pointers

In C, the simplest form of array is the fixed-size (or “static”) array. To declare one, you write:

int jolitab[10] ;

With this declaration, the compiler reserves space in RAM, enough to hold 10 integers.

As your C professor surely explained, the cells of our array are numbered starting from zero. Each cell is accessed with a simple syntax: its “cell number” (its index) in brackets.

For example, to put the values 0,1,2…9 in our array, we could write:

int
main()
{
    int jolitab[10] ;
    int i ;

    for (i=0; i<10; i++)
        jolitab[i] = i ;

    return 0 ;
}

In RAM, it looks like this:

[Figure: the ten cells of jolitab, holding the values 0 to 9, stored one after the other in memory]

Each cell of the array is a perfectly ordinary integer, encoded on 4 bytes (the size of an int on a typical PC), and the cells are stored one after the other, with no gaps.

We can easily verify this by slightly modifying our program:

#include <stdio.h>

int
main()
{
    int jolitab[10] ;
    int i ;

    for (i=0; i<10; i++)
        jolitab[i] = i ;

    for (i=0; i<10; i++) {
        printf("Cell %d is at address %p \n", i, (void *) &(jolitab[i])  ) ;
    }

    return 0 ;
}
$ gcc -o test main.c

$ ./test
Cell 0 is at address 0x7ffe7bb73330
Cell 1 is at address 0x7ffe7bb73334
Cell 2 is at address 0x7ffe7bb73338
Cell 3 is at address 0x7ffe7bb7333c
Cell 4 is at address 0x7ffe7bb73340
Cell 5 is at address 0x7ffe7bb73344
Cell 6 is at address 0x7ffe7bb73348
Cell 7 is at address 0x7ffe7bb7334c
Cell 8 is at address 0x7ffe7bb73350
Cell 9 is at address 0x7ffe7bb73354

However, if jolitab[i] is an int… what is jolitab? Yes, yes, jolitab by itself, without brackets? Well, as soon as you use it in an expression, it behaves as a pointer: its value is the address of the first cell of the array.

And I’ll prove it:

#include <stdio.h>

int
main()
{
    int jolitab[10] ;
    int i ;

    for (i=0; i<10; i++)
        jolitab[i] = i ;

    for (i=0; i<10; i++) {
        printf("Cell %d is at address %p \n", i, (void *) &(jolitab[i])  ) ;
    }

    printf("\n => jolitab is: %p\n", (void *) jolitab ) ;

    return 0 ;
}
$ gcc -o test main.c

$ ./test
Cell 0 is at address 0x7fff82b60bb0
Cell 1 is at address 0x7fff82b60bb4
Cell 2 is at address 0x7fff82b60bb8
Cell 3 is at address 0x7fff82b60bbc
Cell 4 is at address 0x7fff82b60bc0
Cell 5 is at address 0x7fff82b60bc4
Cell 6 is at address 0x7fff82b60bc8
Cell 7 is at address 0x7fff82b60bcc
Cell 8 is at address 0x7fff82b60bd0
Cell 9 is at address 0x7fff82b60bd4

 => jolitab is: 0x7fff82b60bb0

Because yes, every time you have used an array, you have been manipulating pointers without knowing it! And this is what you should take away:

An array name, in C, evaluates to nothing more than the address of its first cell!

Normally, at this point in the explanation, something should click in your head. A bit like “yeah, I always thought there was something fishy about this array business…”. And you were right!!!

So, was I lied to?

Yes, just a tiny bit. But don’t hold it against your professor too much – put yourself in their shoes: you needed arrays, it was too early to throw you into pointers, what could they do?

When you pass an array to a function, you can choose between two syntaxes for the function declaration:

void fonction( int* tablo ) ;

or

void fonction( int tablo[] ) ;

The two are strictly equivalent!

You were probably told that C passes arguments “by value”, but that arrays are passed “by reference”, and so you send the array itself, not a copy. Do me a favor: burn that part of your notes. It’s just making things complicated when they are actually so simple!!!!!!

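When you pass an array to a function, you are just passing the address of its first cell. And this works exactly as with any other variable whose address you pass: since the function receives the address, it can read and modify the array’s cells. The address itself, however, is passed by value, like every other argument.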

By the way… since we’re talking about it, what are these brackets we use? That’s also pointer notation!

Writing jolitab[i] is exactly the same as writing *(jolitab+i)

You take what is at the base address “jolitab”, offset by i cells! Elegant, isn’t it?

Look, let me illustrate right away:

#include <stdio.h>

int
main()
{
    int jolitab[10] ;
    int i ;

    for (i=0; i<10; i++)
        jolitab[i] = i ;

    printf("\n => jolitab is: %p\n", (void *) jolitab ) ;
    printf("\n => The address of jolitab[3] is: %p\n", (void *) &(jolitab[3]) ) ;
    printf("\n => jolitab[3] is: %d\n", jolitab[3] ) ;
    printf("\n => *(jolitab+3) is: %d\n", *(jolitab+3) ) ;

    return 0 ;
}
$ gcc -o test main.c

$ ./test

 => jolitab is: 0x7ffc8620eb00

 => The address of jolitab[3] is: 0x7ffc8620eb0c

 => jolitab[3] is: 3

 => *(jolitab+3) is: 3

Whoa. At this point, some comments are necessary, because it’s less obvious than it appears.

If jolitab equals 0x7ffc8620eb00 and the address of jolitab[3] is 0x7ffc8620eb0c, something seems off with the arithmetic: 0x0c is 12, not 3!

Yes, but that’s because we’re doing pointer arithmetic! And I’m going to tell you about it right now…

Pointer arithmetic

Pointers in C are typed. And this matters. Let’s say I have a pointer to an integer, called P:

#include <stdio.h>

int main()
{
    int a[5] = { 12, 21, 14, 13, 121 } ;
    int *P ;

    P = a ;         // Here, we could also have written P = &(a[0])

    printf("P contains the address: %p\n", (void *) P ) ;
    printf("P points to the integer %d\n", *P ) ;

    return 0 ;
}

When I shift P, I want it to point to the next integer, right? So I need to shift it by 4 bytes.

The expression P+1 does exactly that: it yields the address contained in P, increased by sizeof(int) bytes, which is 4 here. Operations on pointers take the pointed-to type into account:

#include <stdio.h>

int main()
{
    int a[5] = { 12, 21, 14, 13, 121 } ;
    int *P ;

    P = a ;

    printf("P contains the address: %p\n", (void *) P ) ;
    printf("P points to the integer %d\n", *P ) ;

    printf("=> P = P + 1\n") ;
    P = P + 1 ;        // moves forward by sizeof(int) bytes

    printf("P contains the address: %p\n", (void *) P ) ;
    printf("P points to the integer %d\n", *P ) ;

    return 0 ;
}
$ gcc -o test main.c

$ ./test
P contains the address: 0x7fff158dfd30
P points to the integer 12
=> P = P + 1
P contains the address: 0x7fff158dfd34
P points to the integer 21

P went from 0x7fff158dfd30 to 0x7fff158dfd34: the address increment in P is indeed 4 bytes, the size of an int.

Graphically:

[Figure: P+1 points 4 bytes further than P, to the next int]

We can verify that this also works for other types like double, for example:

#include <stdio.h>

int main()
{
    double a[5] = { 12, 21, 14, 13, 121 } ;
    double *P ;

    P = a ;

    printf("P contains the address: %p\n", (void *) P ) ;
    printf("P points to the number %lf\n", *P ) ;

    printf("=> P = P + 1\n") ;
    P = P + 1 ;        // moves forward by sizeof(double) bytes

    printf("P contains the address: %p\n", (void *) P ) ;
    printf("P points to the number %lf\n", *P ) ;

    return 0 ;
}
$ gcc -o test main.c

$ ./test
P contains the address: 0x7ffdc445ce60
P points to the number 12.000000
=> P = P + 1
P contains the address: 0x7ffdc445ce68
P points to the number 21.000000

This time, since the pointer is of type double*, we make 8-byte jumps! (the size of a double…)

[Figure: P+1 points 8 bytes further than P, to the next double]

Whoa, I can see where you’re going with this! You’re going to ask: “And what if the pointer is untyped? What if it’s a void*?”

#include <stdio.h>

int main()
{
    int a[5] = { 12, 21, 14, 13, 121 } ;
    void *P ;

    P = (void*) &a ;

    printf("P contains the address: %p\n", P ) ;

    printf("=> P = P + 1\n") ;
    P = P + 1 ;        // GCC extension: arithmetic on void* is not standard C

    printf("P contains the address: %p\n", P ) ;

    return 0 ;
}
$ gcc -o test main.c

$ ./test
P contains the address: 0x7ffe9b6d1eb0
=> P = P + 1
P contains the address: 0x7ffe9b6d1eb1

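Well, with GCC it’s very simple: if the pointer is untyped, incrementing happens byte by byte. Be aware, though, that this is a GCC extension (GCC treats sizeof(void) as 1): standard C does not allow arithmetic on a void*, and compiling with gcc -pedantic or -Wpointer-arith produces a warning.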

You now know the basics of pointer arithmetic. None of this is very complicated, but in my humble opinion it is very cleverly designed: in C, the interplay between arrays and addresses is particularly natural. (A 100% biased opinion from a C enthusiast: I own it!)

Back to arrays

This bracket notation, tab[i], or its equivalent *(tab+i), is ultimately nothing more than an address calculation.

Two consequences are worth highlighting. The first, and not the least, is that this clearly explains why array cells are indexed from 0 to N-1, and not from 1 to N as in some other languages. Cell zero is the address at the beginning of the array… offset by zero, precisely!

The second, less well known, is that you can reverse the notation. If tab[i] is *(tab+i)… then, since addition is commutative, it’s also *(i+tab), agreed? Yes, your wide eyes and sudden spike in blood pressure don’t lie: you’ve figured out where I’m going with this.

It is perfectly legal, in C, to use i[tab] instead of tab[i]. It works the same! And the compiler accepts it!!!!!!

Doubt me? Then compile this:

#include <stdio.h>

int main()
{
    int a[5] = { 12, 21, 14, 13, 121 } ;

    printf("a[3] %d\n", a[3] ) ;
    printf("3[a] %d\n", 3[a] ) ;

    return 0 ;
}

GCC doesn’t even spit out the tiniest warning: it compiles and works perfectly!

However, it is absolutely not readable, and no serious programmer does this sort of thing: I’m just giving you this anecdote to illustrate my point and show you the underlying logic. NEVER DO THIS IN REAL LIFE. Seriously. Otherwise I’ll kill a baby kitten and it’ll be your fault!

Playing with pointers…

This, however, I do allow you to do: cleverly use pointers and types.

Take an int. On our PC, it’s a 4-byte piece of data, agreed?

Now take an array of 4 chars… that’s also a 4-byte piece of data, agreed?

Since an array is just an address, let’s use our knowledge a bit. I’ll let you read the following code:

#include <stdio.h>

int main() {
    int A ;
    unsigned char* tab ;

    A = 115200 ;
    tab = (unsigned char*)  &A ;

    printf("Byte 1 is %x\n", tab[0] ) ;
    printf("Byte 2 is %x\n", tab[1] ) ;
    printf("Byte 3 is %x\n", tab[2] ) ;
    printf("Byte 4 is %x\n", tab[3] ) ;

    return 0 ;

}
$ ./test
Byte 1 is 0
Byte 2 is c2
Byte 3 is 1
Byte 4 is 0

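In this program, A holds a value. If we go through the pointer tab, which contains A’s address, we see these same 4 bytes as an array of 4 one-byte cells. (Reading any object byte by byte through an unsigned char* is explicitly allowed by the C standard.)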

It’s the same data, but interpreted differently!

In hexadecimal, 115200 is written 0x0001C200, i.e. the 4 bytes 00 01 C2 00, from most to least significant. These are indeed the values found in tab’s cells, only in reverse order.

Note: remember that PCs (x86 processors) are little-endian: the least significant byte is stored first, at the lowest address. That’s why the bytes appear “reversed” :)

Final words

It’s getting late (1:20 AM by my watch), so I suggest we stop here. I hope this was clear; please don’t hesitate to tell me what you understood, or didn’t, when reading these lines. There are plenty of exercises on the Web to practice with, and I strongly encourage you to do them: it’s by programming that you’ll truly master all of this!

See you soon,

Rancune.

This post is licensed under CC BY 4.0 by the author.