The Allegro Wiki is migrating to github at https://github.com/liballeg/allegro_wiki/wiki

OOC


Why not use C++?

By all means, use C++. It makes OOP much easier and also offers a lot of other useful things. I used C++ myself until I was somewhat more experienced; only then was I able to do in C the things I would not have known how to do in C at first.

Why are you writing this?

I'm just writing this to sort out my thoughts on some issues; for some reason this helps me understand things, and I don't care if anyone actually reads it. Also, I do not actually know much about any of this. These are just some thoughts about how it could be done, and it is probably the worst possible way to do it all. There are no references, I have never written a C++ compiler or anything like that, and I did not do any tests or research. So if you read this, use what I say only as something to think about: keep the things that make sense and forget the ones that don't. This is a wiki, so feel free to add comments anywhere you like as well.

Objects

What is an object? It is a datatype, with a set of variables describing it, and a set of functions operating on it. In some languages, this is called a class or a type, the functions are often called methods or other things, and the variables are called member variables or properties or similar things.

The idea behind objects is that you can have many of them. Each will have the same variables describing it (just with different values), and the same functions operating on it. Actually, I'm being confusing. You can have a lot of instances of any object type. That is, if you have an object "Sprite", you can create lots and lots of sprites. They will all have the same variables (e.g. x and y), and you can call the same functions on them (e.g. a move method). Note that I use the following terms as synonyms and mix them up:

  • "class", "type", "struct", "object", "object type", "object class", "interface"
  • "instance", "class instance", "object instance", "object"
  • "function", "method"
  • "variable", "member variable", "instance variable", "class variable"

So, make sure you understand what this is, then it should be clear what I mean. Especially note that "object" appears in the first as well as the second list - the meaning gets clear from the context only :)

In C, you can use a struct. For example:

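#include <stdlib.h> /* calloc, free */

/* Lets the code below write "Sprite" instead of "struct Sprite". */
typedef struct Sprite Sprite;

struct Sprite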
{
    float x, y;
    int animation;
};

Sprite *sprite_new(void)
{
    Sprite *self = calloc(1, sizeof *self);
    return self;
}

void sprite_del(Sprite *self)
{
    free(self);
}

void sprite_init(Sprite *self)
{
    self->x = 0;
    self->y = 0;
    self->animation = animation_reference("red ball");
}

void sprite_exit(Sprite *self)
{
    animation_release(self->animation);
}

void sprite_move(Sprite *self, float dx, float dy)
{
    self->x += dx;
    self->y += dy;
}

void sprite_draw(Sprite *self)
{
    draw_animation(self->animation, self->x, self->y);
}

We have an object called "Sprite", with the variables "x", "y" and "animation", and the methods "new", "del", "init", "exit", "move" and "draw".

A possible use could look like this:

Sprite *player = sprite_new();
sprite_init(player);
...
sprite_move(player, 1, 0);
...
sprite_draw(player);
...
sprite_exit(player);
sprite_del(player);

Inheritance

One of the advantages of object orientation is inheritance. You can define a new object and let it simply inherit things from another one. The advantage is that you only need to implement the things that are different and can re-use all the others. For example, say you want a sprite (and assume it is not as simple as before, but contains 100 variables and 100 methods), but every sprite in your game also needs a score. Maybe do it like this:

typedef struct ScoreSprite ScoreSprite;

struct ScoreSprite
{
    Sprite super;
    int score;
};

ScoreSprite *scoresprite_new(void)
{
    ScoreSprite *self = calloc(1, sizeof *self);
    return self;
}

void scoresprite_del(ScoreSprite *self)
{
    free(self);
}

void scoresprite_init(ScoreSprite *self)
{
    sprite_init(&self->super);
    self->score = 0;
}

void scoresprite_exit(ScoreSprite *self)
{
    sprite_exit(&self->super);
}

void scoresprite_addscore(ScoreSprite *self, int score)
{
    self->score += score;
}

That is, we simply added the super (or parent) object into our own sprite object. It already inherits all the variables of the Sprite object. To inherit methods, we have to explicitly pass &self->super to the sprite_* methods. The usage example from before would now look like this:

ScoreSprite *player = scoresprite_new();
scoresprite_init(player);
...
sprite_move(&player->super, 1, 0);
scoresprite_addscore(player, 1);
...
sprite_draw(&player->super);
...
scoresprite_exit(player);
scoresprite_del(player);

There is no risk of ever mixing up sprite and scoresprite, because the C compiler will immediately warn about a wrong type.

Often, you will find that there is a need for multiple inheritance, or multiple interfaces. A possible way to do this is to use ->super, ->super2, ->super3, and so on. Or ->superSprite and so on, i.e. include the name of the parent type, so you need not remember the order. Or anything else you want, maybe simply ->sprite, i.e. just the name of the base type. I would only advise following the same convention everywhere if there are lots of different objects.
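As a rough illustration of the naming idea, here is a minimal sketch with two embedded parents. The Scorable type and the scorable_add function are made up for this example and do not appear elsewhere in the article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base types, invented for this sketch. */
typedef struct Sprite { float x, y; } Sprite;
typedef struct Scorable { int score; } Scorable;

/* "Inherits" from both bases by embedding one member per parent,
   each named after the parent type. */
typedef struct ScoreSprite {
    Sprite superSprite;
    Scorable superScorable;
} ScoreSprite;

void sprite_move(Sprite *self, float dx, float dy)
{
    assert(self != NULL);
    self->x += dx;
    self->y += dy;
}

void scorable_add(Scorable *self, int points)
{
    assert(self != NULL);
    self->score += points;
}
```

A caller would then write sprite_move(&player.superSprite, 1, 0) and scorable_add(&player.superScorable, 5), picking the parent by name instead of by position.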

Polymorphism

Suppose you have a game with lots of different types of sprites. Some are of type "Sprite", some "ScoreSprite", some "FlyingSprite", some "JumpingSprite", some "TrollSpriteWithClub", and so on. All they have in common is that they have a member variable "super" (or "superSprite") of type Sprite.

The sprite example contained this method:

void sprite_draw(Sprite *self)
{
    draw_animation(self->animation, self->x, self->y);
}

Now assume, we want ScoreSprite to look different, so it has its own draw method:

void scoresprite_draw(ScoreSprite *self)
{
    sprite_draw(&self->super);
    draw_number(self->score, self->super.x, self->super.y);
}

So far, this is no problem. If the player is a ScoreSprite, we have:

ScoreSprite *player;
...
scoresprite_draw(player);

If it is a Sprite, we have:

Sprite *player;
...
sprite_draw(player);

The same goes for any other type of sprite: we just need to call the right "draw" method. If we accidentally call the wrong one, the C compiler will immediately tell us, so it is not a problem.

Next, let's say, this is a game with a scrolling map, and we have a list with all the currently visible sprites. For simplicity, assume a fixed array of 100 visible sprites, like this:

Sprite *visible[100];

void sprites_draw(void)
{
    int i;
    for (i = 0; i < 100; i++)
    {
        sprite_draw(visible[i]);
    }
}

It will draw 100 sprites. But now, we want there to be lots of different types of sprites, not just "Sprite". There also will be "ScoreSprite" and "TrollSpriteWithClub" and so on. What should we do? Switch to C++?

Well, in C++, we would have the same problem. Assume e.g. this:

class Sprite
{
...
public:
    void draw()
    {
        ...
    }
...
};

class ScoreSprite : public Sprite
{
...
public:
    void draw()
    {
        ...
    }
...
};

Then later:

Sprite *visible[100];
visible[0] = new Sprite();
visible[1] = new ScoreSprite();

visible[0]->draw();
visible[1]->draw();

This would (without even issuing a warning) draw the ScoreSprite as a Sprite. And it makes perfect sense: how should the C++ compiler know that we want "visible[1]->draw()" to use the "draw" method of ScoreSprite? Maybe we actually want the method of the base Sprite.

The way to solve this in C++ is the "virtual" keyword. If we mark the draw method in the Sprite class as virtual, then "visible[1]->draw()" will no longer call the draw method of Sprite, but the draw method of ScoreSprite. That way, we can simply iterate through a list of visible sprites, and for each one the proper method is called. This explains the name "polymorphism", by the way: the Sprite* pointer can morph into a pointer to whatever it really points to. And of course, this has to happen at runtime, because at compile time it is not known what pointers will be in the list.

Now, remember, we are using C here. So what should we do? We have no "virtual" keyword. A simple approach would be to have a variable "type" for each object. Then whenever sprite_draw is called, examine the real type, and if it is an inherited class, call its draw method instead. Something like:

#define SpriteID 1
#define ScoreSpriteID 2

struct Sprite
{
    int type;
    float x, y;
    int animation;
};

struct ScoreSprite
{
    int type;
    Sprite super;
    int score;
};

void sprite_draw(Sprite *self)
{
    switch(self->type)
    {
        case SpriteID: break;
        case ScoreSpriteID: scoresprite_draw((ScoreSprite *)self); return;
        default: return;
    }
    draw_animation(self->animation, self->x, self->y);
}

The "new" method now needs to fill in the correct type whenever a new instance of an object is created. All pointers of inherited sprites simply would be inserted into the list of sprites, with their type cast to "Sprite *". The new draw method then checks the ->type field, and can cast back to the real type, and pass to the right method.

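If you wonder whether it is safe to access the "type" field of a ScoreSprite through a Sprite* pointer, then the answer is yes. The C standard specifically requires the first member of a struct to have the same address as the struct, so the variable "type" at the start of Sprite and of ScoreSprite can be read through either pointer. Note that this only holds for "type": in the layout above, the other Sprite fields of a ScoreSprite live inside its "super" member, so scoresprite_draw still has to pass &self->super to the base methods, and super.type has to be set to SpriteID.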
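A slightly different layout keeps the type tag only in the base struct and puts the base first in the derived one, so that a pointer to the whole object and a pointer to its base are the same address. The sketch below assumes that variation. The draw functions are stand-ins that only report which one ran, and the names SPRITE_ID, SCORESPRITE_ID and sprite_draw_plain are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { SPRITE_ID = 1, SCORESPRITE_ID = 2 };

typedef struct Sprite {
    int type;      /* filled in by whoever creates the object */
    float x, y;
} Sprite;

typedef struct ScoreSprite {
    Sprite super;  /* must stay the first member for the cast below */
    int score;
} ScoreSprite;

/* Stand-ins for real drawing code: they only report which one ran. */
static int sprite_draw_plain(Sprite *self) { (void)self; return SPRITE_ID; }
static int scoresprite_draw(ScoreSprite *self) { (void)self; return SCORESPRITE_ID; }

/* Looks at the tag and forwards to the method of the real type. */
int sprite_draw(Sprite *self)
{
    assert(self != NULL);
    switch (self->type) {
    case SCORESPRITE_ID:
        /* Valid because self points at the first member of a ScoreSprite. */
        return scoresprite_draw((ScoreSprite *)self);
    default:
        return sprite_draw_plain(self);
    }
}
```

With this layout, a ScoreSprite is put into the list of visible sprites as &player->super, and no second "type" field is needed.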

One problem is that every time a new inherited type is added, sprite_draw must be updated to check for its type and call the appropriate method. We can use something called a vtable instead. It could look like this:

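/* Forward declarations, so the structs below can refer to each other. */
typedef struct Sprite Sprite;
typedef struct SpriteVTable SpriteVTable;

struct SpriteVTable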
{
    void (*draw)(Sprite *self);
};

SpriteVTable sprite_vtable =
{
    sprite_draw
};

struct Sprite
{
    SpriteVTable *vtable;
    float x, y;
    int animation;
};

SpriteVTable scoresprite_vtable =
{
    scoresprite_draw
};

struct ScoreSprite
{
    SpriteVTable *vtable;
    Sprite super;
    int score;
};

void sprite_draw_polymorph(Sprite *self)
{
    self->vtable->draw(self);
}

Instead of filling in a type in the "new" method, each object's "new" method would now fill in the appropriate vtable. Instead of requiring an entry for every member function in the vtable, we could also let NULL simply refer to the base method. In the above example, this would make it easier to choose "sprite_draw" as the name for "sprite_draw_polymorph", but here the separate name makes things clearer. Abstract classes or interfaces can be made in the same way: they would simply specify a vtable, and objects implementing the interface would fill it in.

With this, we have lost the ability to do multiple inheritance again, though. Each class derived from Sprite has a vtable, and the first element in the struct of each must be a pointer to the vtable. One way to get multiple inheritance back is, for example, to have a pointer to a linked list of vtables, where each vtable identifies itself with a type. Then in sprite_draw_polymorph, we would not just use the vtable, but first try to find the vtable with the right type, cast to it, and use it. TODO: a better method. How does C++ do it?
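The NULL idea could be sketched roughly as follows. This is only one possible arrangement: it assumes the base struct is the first member of the derived one, and the draw functions return a number instead of drawing, so that it is visible which one was picked. The name sprite_draw_base is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Sprite Sprite;

typedef struct SpriteVTable {
    int (*draw)(Sprite *self);   /* NULL means "use the base method" */
} SpriteVTable;

struct Sprite {
    const SpriteVTable *vtable;
    float x, y;
};

typedef struct ScoreSprite {
    Sprite super;                /* first member, so Sprite* can be cast back */
    int score;
} ScoreSprite;

/* Stand-ins for drawing: the return value tells which method ran. */
static int sprite_draw_base(Sprite *self) { (void)self; return 1; }

static int scoresprite_draw(Sprite *self)
{
    ScoreSprite *me = (ScoreSprite *)self;
    return 100 + me->score;
}

const SpriteVTable sprite_vtable = { NULL };               /* nothing overridden */
const SpriteVTable scoresprite_vtable = { scoresprite_draw };

int sprite_draw(Sprite *self)
{
    assert(self != NULL && self->vtable != NULL);
    if (self->vtable->draw != NULL)
        return self->vtable->draw(self);
    return sprite_draw_base(self);   /* fall back to the base method */
}
```

Here the public name sprite_draw does the dispatch, and a type that overrides nothing can leave its vtable entry empty.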

Variations

Yet another variation would be to not have the parent classes as member variables in sub classes, but have it the other way around. For each class which allows sub-classing, have a pointer, which can point to the sub-class. E.g:

struct Sprite
{
    SpriteVTable *vtable;
    float x, y;
    int animation;
    void *sub;
};

struct ScoreSprite
{
    SpriteVTable *vtable;
    int score;
};

With this, all sprite methods for all types of sprites can actually use Sprite* pointers, and only need to cast the sub pointer to their own type. The vtable entries then would know how to deal with the sub pointer. It could look like this:

void sprite_draw_polymorph(Sprite *self)
{
    self->vtable->draw(self);
}

void scoresprite_draw(Sprite *self)
{
    sprite_draw(self);
    ScoreSprite *sub = self->sub;
    draw_number(sub->score, self->x, self->y);
}
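Put together, the sub pointer variation might look like the sketch below. It assumes the sub-class data holds only the extra fields (called ScoreData here, a made-up name), and again the draw functions return a number instead of drawing.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Sprite Sprite;

typedef struct SpriteVTable {
    int (*draw)(Sprite *self);
} SpriteVTable;

struct Sprite {
    const SpriteVTable *vtable;
    float x, y;
    void *sub;   /* extra data of the sub-class, or NULL for a plain Sprite */
};

/* Hypothetical sub-class data: only the fields Sprite does not have. */
typedef struct ScoreData {
    int score;
} ScoreData;

static int sprite_draw(Sprite *self) { (void)self; return 1; }

static int scoresprite_draw(Sprite *self)
{
    ScoreData *sub = self->sub;          /* void* converts implicitly in C */
    return sprite_draw(self) + sub->score;
}

const SpriteVTable sprite_vtable = { sprite_draw };
const SpriteVTable scoresprite_vtable = { scoresprite_draw };

int sprite_draw_polymorph(Sprite *self)
{
    assert(self != NULL && self->vtable != NULL);
    return self->vtable->draw(self);
}
```

The price of this layout is a second allocation (or a second object) per instance, in exchange for every method taking a plain Sprite* with no casts of the object itself.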

There are a lot of possible variations; you can combine anything and implement as much as you want. There are also some other methods still. I am going to describe them later, since one of them is the one I'm actually using myself in my latest project.

Containers

Now we know how to have polymorphic objects with multiple inheritance, or at least a small selection of ways to achieve it. There was no mention yet of how to actually work with them: the example with the list of visible sprites used a static array. Here I will tell a bit about how I usually implement containers in C. I will mainly concentrate on two quite general ones, dynamic arrays (like STL vectors) and doubly linked lists (like STL lists).
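To give a rough idea of the first of the two, a growable array of pointers might look like the sketch below. This is not necessarily how the author implements it; all names (PtrArray, ptrarray_push and so on) and the growth policy (start at 8, then double) are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of untyped pointers. */
typedef struct PtrArray {
    void **items;
    size_t count;      /* number of used slots */
    size_t capacity;   /* number of allocated slots */
} PtrArray;

void ptrarray_init(PtrArray *self)
{
    assert(self != NULL);
    self->items = NULL;
    self->count = 0;
    self->capacity = 0;
}

/* Appends item; returns 1 on success, 0 if memory ran out. */
int ptrarray_push(PtrArray *self, void *item)
{
    assert(self != NULL);
    if (self->count == self->capacity) {
        size_t capacity = self->capacity ? self->capacity * 2 : 8;
        void **items = realloc(self->items, capacity * sizeof *items);
        if (items == NULL)
            return 0;   /* the old array is still valid */
        self->items = items;
        self->capacity = capacity;
    }
    self->items[self->count++] = item;
    return 1;
}

/* Frees the array itself, not the objects the items point to. */
void ptrarray_exit(PtrArray *self)
{
    assert(self != NULL);
    free(self->items);
    ptrarray_init(self);
}
```

The fixed "Sprite *visible[100]" from the polymorphism section could then become a PtrArray that the drawing loop walks from 0 to count - 1.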

What now?

For all the things described here, there are much more elaborate and more elegant ways to do them in C. Just look at some good C code, like the Linux kernel source, and you will see.

References

Actually, by now, despite what I said in the introduction, there are some references, even if I didn't know them when I wrote this. Paul Pridham gave me those:

Seems the first one completely supersedes what this article was meant to be :) I think I'll still finish it though, at least I've got different naming conventions.

gcc

Despite declining it in the introduction, I actually did some research:

I disassembled some C++ code generated by gcc, to see how it implements things in C++. Looking at it is quite interesting, so I'll share my findings here by providing a short piece of C++ code and then explaining the generated asm.


This is the C++ code:

int x;

class A
{
    int a[11];
public:
    virtual void print()
    {
        x = 1;
    }
};

class B
{
    int b[12];
public:
    virtual void print()
    {
        x = 2;
    }
};

class C : public A, public B
{
    int c[13];
public:
    void print()
    {
        x = 3;
    }
};

A a;
B b;
C c;

int main()
{
    A *ptr = &c;
    a.print();
    b.print();
    c.print();
    ptr->print();
    return 0;
}

Below is the asm generated by gcc -S -fverbose-asm a.cc. Things not interesting to us have been stripped. Let's dissect it line by line.

x:
   .zero   4

a:
   .zero   48

b:
   .zero   52

c:
   .zero   152

Not surprisingly, the four global variables from the C++ code also exist in the asm. The sizes are interesting: x is an int, and since I used 32-bit output, it gets 4 bytes. a has 11 ints, which would be 44 bytes, but apparently we get 4 extra bytes. b has 12 ints, so it should be 48 bytes, but again it has 4 extra bytes. c has 13 ints (52 bytes) and inherits A and B, so it should have 44 + 48 + 52 = 144 bytes for the int arrays. Apparently, there are 8 additional bytes.

main:
.LFB6:
   pushl   %ebp
.LCFI0:
   movl   %esp, %ebp
.LCFI1:
   subl   $8, %esp
.LCFI2:
   andl   $-16, %esp
   movl   $0, %eax
   subl   %eax, %esp
   movl   $c, -4(%ebp)   #  ptr
   movl   $a, (%esp)
   call   _ZN1A5printEv
   movl   $b, (%esp)
   call   _ZN1B5printEv
   movl   $c, (%esp)
   call   _ZN1C5printEv
   movl   -4(%ebp), %eax   #  ptr
   movl   (%eax), %edx   #  <variable>._vptr.A
   movl   -4(%ebp), %eax   #  ptr
   movl   %eax, (%esp)
   movl   (%edx), %eax
   call   *%eax
   movl   $0, %eax
   leave
   ret
.LFE6:

This is the main function, translated to asm.

First, a pointer to "c" is moved into the local "ptr" at -4(%ebp) on the stack.

Then, the address of "a" is put to the stack, and _ZN1A5printEv is called.

Then, the same for "b" and "c".

What can be seen is how there is a single stack argument, the "this" pointer, with any method call.

Finally, the polymorphic call: ptr is loaded into eax, and the first 4 bytes of the object it points to (here "c") are fetched into edx. It looks like this is a pointer to a vtable. From this vtable, the first 4 bytes are used as a function pointer, and it is called. The parameter given to it is "ptr". What we have seen is that the actual method to call must be read from a vtable, and we now know what the extra 4 bytes we wondered about above are for: they are a vtable pointer at the beginning of each object.


_Z41__static_initialization_and_destruction_0ii:
.LFB8:
   pushl   %ebp
.LCFI3:
   movl   %esp, %ebp
.LCFI4:
   subl   $8, %esp
.LCFI5:
   cmpl   $65535, 12(%ebp)   #  __priority
   jne   .L3
   cmpl   $1, 8(%ebp)   #  __initialize_p
   jne   .L3
   movl   $a, (%esp)
   call   _ZN1AC1Ev
.L3:
   cmpl   $65535, 12(%ebp)   #  __priority
   jne   .L4
   cmpl   $1, 8(%ebp)   #  __initialize_p
   jne   .L4
   movl   $b, (%esp)
   call   _ZN1BC1Ev
.L4:
   cmpl   $65535, 12(%ebp)   #  __priority
   jne   .L2
   cmpl   $1, 8(%ebp)   #  __initialize_p
   jne   .L2
   movl   $c, (%esp)
   call   _ZN1CC1Ev
.L2:
   leave
   ret
.LFE8:

This apparently is a function called before main, which initializes some things. It calls three functions: _ZN1AC1Ev, _ZN1BC1Ev, _ZN1CC1Ev, passing them "a", "b" and "c" respectively. Apparently, the constructors of "A", "B" and "C".

_ZN1AC1Ev:
.LFB12:
   pushl   %ebp
.LCFI6:
   movl   %esp, %ebp
.LCFI7:
   movl   8(%ebp), %eax   #  this
   movl   $_ZTV1A+8, (%eax)   #  <variable>._vptr.A
   popl   %ebp
   ret
.L7:
.LFE12:

This is the constructor of A. The address of $_ZTV1A+8 is moved into the first field of the object. $_ZTV1A+8 must be the location of the vtable.

_ZN1A5printEv:
.LFB13:
   pushl   %ebp
.LCFI8:
   movl   %esp, %ebp
.LCFI9:
   movl   $1, x   #  x
   popl   %ebp
   ret
.LFE13:

This is A's print method.

_ZN1BC1Ev:
.LFB17:
   pushl   %ebp
.LCFI10:
   movl   %esp, %ebp
.LCFI11:
   movl   8(%ebp), %eax   #  this
   movl   $_ZTV1B+8, (%eax)   #  <variable>._vptr.B
   popl   %ebp
   ret
.L11:
.LFE17:

B's constructor. Like A's.

_ZN1B5printEv:
.LFB18:
   pushl   %ebp
.LCFI12:
   movl   %esp, %ebp
.LCFI13:
   movl   $2, x   #  x
   popl   %ebp
   ret
.LFE18:

This is B's print method.

_ZN1CC1Ev:
.LFB22:
   pushl   %ebp
.LCFI14:
   movl   %esp, %ebp
.LCFI15:
   subl   $8, %esp
.LCFI16:
   movl   8(%ebp), %eax   #  this
   movl   %eax, (%esp)
   call   _ZN1AC2Ev
   movl   8(%ebp), %eax   #  this
   addl   $48, %eax
   movl   %eax, (%esp)
   call   _ZN1BC2Ev
   movl   8(%ebp), %eax   #  this
   movl   $_ZTV1C+8, (%eax)   #  <variable>._vptr.A
   movl   8(%ebp), %eax   #  this
   addl   $48, %eax
   movl   $_ZTV1C+20, (%eax)   #  <variable>._vptr.B
   leave
   ret
.L15:
.LFE22:

C's constructor. This one looks considerably different from A's and B's constructors. It calls _ZN1AC2Ev with the this pointer, and _ZN1BC2Ev with a pointer to this + 48. Apparently, class A is put first (the extra 4 bytes for its vtable pointer, then the 11 ints, 48 bytes in total) and a constructor for A is called. B is put immediately after that, at offset 48. Then the vtable pointers of both the A part and the B part are overwritten with _ZTV1C+8 and _ZTV1C+20, so both now point into C's vtable.
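The layout deduced so far can be modelled with plain C structs. In the sketch below, uint32_t stands in for the 32-bit vtable pointer so that the numbers do not depend on the machine compiling it. The struct and member names are made up, and this only mirrors what this particular gcc output shows, not what every compiler must do.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* One vtable pointer slot followed by the int array of each class. */
typedef struct A32 { uint32_t vptr; int32_t a[11]; } A32;
typedef struct B32 { uint32_t vptr; int32_t b[12]; } B32;

/* C holds a complete A, then a complete B, then its own array. */
typedef struct C32 { A32 base_a; B32 base_b; int32_t c[13]; } C32;

/* The sizes reserved by the .zero directives in the asm. */
static_assert(sizeof(A32) == 48, "4 + 11 * 4");
static_assert(sizeof(B32) == 52, "4 + 12 * 4");
static_assert(sizeof(C32) == 152, "48 + 52 + 13 * 4");

/* The B part starts at this + 48, as seen in C's constructor. */
static_assert(offsetof(C32, base_b) == 48, "B follows A");
```

So the 8 extra bytes in c are simply the two vtable pointers it inherits, one at offset 0 and one at offset 48.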

_ZN1C5printEv:
.LFB23:
   pushl   %ebp
.LCFI17:
   movl   %esp, %ebp
.LCFI18:
   movl   $3, x   #  x
   popl   %ebp
   ret
.LFE23:

This is C's print method.

_ZThn48_N1C5printEv:
   addl   $-48, 4(%esp)
   jmp   _ZN1C5printEv

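This is the function referenced by an entry in C's vtable. It calls C's print method, but first offsets the this pointer by -48. So we can conclude it is the entry for B's vtable inside C: a caller holding a B* points 48 bytes into the object, and the pointer has to be moved back to the start of C.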
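The same pointer adjustment can be written by hand in C. The sketch below is a simplified model, not a translation of the gcc output: B carries a single function pointer instead of a full vtable, A has no virtual method at all, and the names c_print_thunk_from_b and c_init are invented.

```c
#include <assert.h>
#include <stddef.h>

typedef struct A { int a[11]; } A;   /* first base, sits at offset 0 */

typedef struct B {
    int (*print)(struct B *self);    /* stands in for B's vtable */
    int b[12];
} B;

typedef struct C { A base_a; B base_b; int c[13]; } C;

/* The "real" method: expects a pointer to the whole C object. */
static int c_print(C *self)
{
    self->c[0] = 3;
    return 3;
}

/* The thunk: has the signature B's callers expect, moves the pointer
   back from the B part to the start of C, then forwards the call. */
static int c_print_thunk_from_b(B *self)
{
    C *whole = (C *)((char *)self - offsetof(C, base_b));
    return c_print(whole);
}

void c_init(C *self)
{
    assert(self != NULL);
    self->base_b.print = c_print_thunk_from_b;
    self->c[0] = 0;
}
```

Code that only knows about B can now call as_b->print(as_b) on the B part of a C and still end up in c_print with a correct pointer. This is one answer to the earlier question of how C++ handles vtables with multiple inheritance.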

_ZTV1C:
   .long   0
   .long   _ZTI1C
   .long   _ZN1C5printEv
   .long   -48
   .long   _ZTI1C
   .long   _ZThn48_N1C5printEv

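This must be C's vtable. Each entry is 4 bytes, so _ZTV1C+8 (stored in the A part of the object) points at the entry for C's print, and _ZTV1C+20 (stored in the B part) points at the entry for the pointer-adjusting version of C's print shown above.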

_ZTV1B:
   .long   0
   .long   _ZTI1B
   .long   _ZN1B5printEv

B's vtable. Simply has B's print method in it.

_ZTV1A:
   .long   0
   .long   _ZTI1A
   .long   _ZN1A5printEv

A's vtable.

_ZN1AC2Ev:
.LFB25:
   pushl   %ebp
.LCFI19:
   movl   %esp, %ebp
.LCFI20:
   movl   8(%ebp), %eax   #  this
   movl   $_ZTV1A+8, (%eax)   #  <variable>._vptr.A
   popl   %ebp
   ret
.L19:
.LFE25:

Another constructor for "A", by the looks of it. It is identical to _ZN1AC1Ev, but it is the one C's constructor calls to set up its A part.

_ZN1BC2Ev:
.LFB26:
   pushl   %ebp
.LCFI21:
   movl   %esp, %ebp
.LCFI22:
   movl   8(%ebp), %eax   #  this
   movl   $_ZTV1B+8, (%eax)   #  <variable>._vptr.B
   popl   %ebp
   ret
.L22:
.LFE26:

Another constructor for "B".

_ZTI1A:
   .long   _ZTVN10__cxxabiv117__class_type_infoE+8
   .long   _ZTS1A

This seems to be type information for A, as referenced by the vtable.

_ZTI1B:
   .long   _ZTVN10__cxxabiv117__class_type_infoE+8
   .long   _ZTS1B

Type information for B.

_ZTI1C:
   .long   _ZTVN10__cxxabiv121__vmi_class_type_infoE+8
   .long   _ZTS1C
   .long   8
   .long   2
   .long   _ZTI1A
   .long   2
   .long   _ZTI1B
   .long   12290
   .zero   8

Type information for C. Apparently it includes the class relations to A and B.

_ZTS1C:
   .string   "1C"

A string identifying the "C" class?

_ZTS1B:
   .string   "1B"

Another string for "B".

_ZTS1A:
   .string   "1A"

And one for "A".

_GLOBAL__I_x:
.LFB28:
   pushl   %ebp
.LCFI23:
   movl   %esp, %ebp
.LCFI24:
   subl   $8, %esp
.LCFI25:
   movl   $65535, 4(%esp)
   movl   $1, (%esp)
   call   _Z41__static_initialization_and_destruction_0ii
   leave
   ret
.LFE28:
   .long   _GLOBAL__I_x

This calls the global initialization function, so we can conclude it is the function actually invoked before main, with the task of initializing global variables.