zsmith.co

Object-Oriented C Programming

Revision 17
© 2013-2019 by Zack Smith. All rights reserved.

Introduction

Object-oriented programming languages (OOPLs) became dominant a few decades ago, alongside procedural languages such as C and Pascal. With these OO languages came fundamental burdens and complexities that are not wanted in every project, which is why some projects, such as the Linux kernel, are still written in C, and why non-OO languages are still being invented.

The case of C++ exemplifies the pitfalls of OO languages. The fact that C++ books are often 1,000 pages long should tell you something is wrong. More specifically, its templates, multiple inheritance and class libraries make software design, implementation, and especially debugging more difficult than they ought to be. While some C++ code is very well written, other C++ code can be an illegible tangle of derived classes and nested templates, known as lasagna code, which can be worse than any goto-based spaghetti code from the dawn of computing.

Because of the problems that certain object-oriented languages bring, a programmer may prefer to use a non-OOP language such as C, Go, or even assembly language, but augment it with object-oriented principles, namely:

  1. Encapsulation: in C this is done by putting member variables and method pointers inside a struct.
  2. Polymorphism: in C this is done by overwriting method pointers in a class struct.
  3. Inheritance: in C this can be done by either:
    • manually calling the parent class's method, e.g. Shape_area(myCircle)
    • using the C preprocessor to embed parent classes' methods in the object struct.
    • locating the parent class's method in its class struct and calling that method.

Techniques of object-oriented C programming

Method calls

Here are some scenarios for making a method call in C:

Call a non-overridable method, i.e. the non-polymorphic case.

 float area = Shape_area (myShape);

Call an overridable method.

If the object itself contains the method pointer, which is not very memory-efficient but may be cache-efficient, the call is easy:

 float area = myShape->computeArea (myShape);

A small object of this kind, which fits in one 64-byte cache line (32 bytes with 64-bit pointers):

 typedef struct shape {
  float width;
  float height;
  struct shape* (*init) (struct shape* self);
  struct shape* (*printMe) (struct shape* self, FILE *output);
  float (*computeArea) (struct shape* self);
 } Shape;

Calling through an embedded pointer in this way is risky. If the computeArea pointer is NULL, the call is undefined behavior and will typically crash with a segmentation fault. Notice also that myShape is written twice, which is tedious and harms readability.

We can make it safer using a macro:

 #define $(OBJ,METHOD,...) ((OBJ) && (OBJ)->METHOD ? (OBJ)->METHOD((OBJ), ##__VA_ARGS__) : 0)

Example method calls:

 float area = $(myShape, computeArea);
 $(myShape, printMe, stdout);
 $(myArray, append, myShape);

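Using this macro, if either the object or its method pointer is NULL, the result is 0 instead of a segmentation fault. The macro accepts methods with zero or more parameters, and because the call still goes through a typed function pointer, the arguments are checked at compile time. Be aware that the $ character in identifiers and the ", ##__VA_ARGS__" comma deletion are GCC/Clang extensions, and that OBJ is evaluated more than once, so it should not be an expression with side effects.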

Ideally, however, each object struct would be as small as possible and contain only a pointer to its class's struct, where all the method pointers reside.

Imagine having 1,000,000,000 objects, and you will understand why OOP languages are getting pushback from some programmers, e.g. game programmers.

Let's put method pointers into a class struct.

If each object contains method pointers that are common to all instances of its class, that's very wasteful.

The more memory-efficient approach is to give each instance object a single pointer to its class struct, the is_a pointer, and to keep all method pointers in that class struct.

 typedef struct shapeClass {
  ObjectClass *parent_class;
  char *name;
  struct shape* (*init) (struct shape* self);
  struct shape* (*printMe) (struct shape* self, FILE *output);
  float (*computeArea) (struct shape* self);
 } ShapeClass;
 //
 typedef struct shape {
  ShapeClass *is_a;
  // Parent class ivars go here.
  float width;
  float height;
 } Shape;

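On a 64-bit system, the object itself is now only 8+4+4 = 16 bytes (the is_a pointer plus two floats), so four Shape objects fit in one typical 64-byte cache line. Presumably the ShapeClass struct is hanging around in the L1 cache as well, since every Shape method call goes through it, and won't be easily displaced.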

We can revise the macro from above to use the is_a pointer thus:

 #define $(OBJ,METHOD,...) ((OBJ) && (OBJ)->is_a && (OBJ)->is_a->METHOD ? (OBJ)->is_a->METHOD((OBJ), ##__VA_ARGS__) : 0)

This does four things:

  1. check the object pointer is non-NULL.
  2. check the object pointer has an is_a pointer.
  3. check the class struct has the method in question.
  4. if all above are true, perform the method call, else provide a 0 result.

Usage is the same:

 float area = $(myShape, computeArea);

What this macro does not do, however, is call an inherited method, a problem I will explain further below.

Calling a non-inherited parent-class method.

Calling any non-polymorphic method, be it inherited or not, is simple. You just call it directly:

 Parent_doSomething (object);

Of course, you will have to cast the object: passing a Shape* where an Object* is expected fails the type check, so you would write e.g. Parent_doSomething((Object *)myShape).

Note also that passing a derived-class object to a parent-class method only works if you lay out your object structs progressively, i.e. the instance variables of derived classes always come after their parent classes' instance variables.

Calling an inherited parent-class method.

Ideally we would let the C compiler perform static binding of not only the derived class's methods but also the parent class(es)' methods.

But this is tricky. Consider the following revised $ macro that reaches into the parent class's struct:

 #define $(OBJ,METHOD,...) (OBJ && OBJ->is_a ? (OBJ->is_a->METHOD ? OBJ->is_a->METHOD(OBJ, ##__VA_ARGS__) : (OBJ->is_a->parent_class->METHOD ? OBJ->is_a->parent_class->METHOD(OBJ, ##__VA_ARGS__) : 0)) : 0)

This looks clever, doesn't it? What this revised macro ought to do is:

  1. check the object pointer is non-NULL, else 0 result.
  2. check the object pointer has an is_a pointer, else 0 result.
  3. check the class struct has the method in question.
    1. if all above are true, perform the method call.
    2. if a method is not present, check the immediate parent for the same method symbol.
      1. if present, call it.
      2. if missing, zero result.

However, this macro has a major flaw:

It names METHOD as a member of both the class struct and the parent's class struct, and the compiler resolves both member names at compile time. That assumes the method can be found in both structs, which is a bad assumption: if either class struct lacks this member, the code won't even compile.

  • Case 1: Parent has method but derived doesn't. Compilation fails.
  • Case 2: Parent lacks method but derived has it. Compilation fails.
  • Case 3: Parent has method and derived does too. Compilation succeeds.

A workaround for Case 1 is to declare the parent class's method pointer in the derived class struct as well, but leave it NULL. (C++ has a similar requirement.)

But this doesn't fix Case 2.

Furthermore, what if the class hierarchy is deep? If the $ macro had to handle not just 2 but 4, 8 or 16 levels of inheritance, every method call in the program would compile into a tangle of if-then checks.

Therefore this macro truly can't be used.

My workaround is to put all method pointers, the parent classes' as well as the derived class's own, into the derived class's struct, like so:

 struct mutable_string; // forward declaration so the prototypes below share one type
 #define DECLARE_OBJECT_POLYMORPHIC_METHODS(CLASS) \
  char *(*name)(CLASS *self);
 #define DECLARE_STRING_POLYMORPHIC_METHODS(CLASS) \
  unsigned (*length)(CLASS *self); \
  void (*print)(CLASS *self); \
  wchar_t (*characterAt)(CLASS *self, unsigned index);
 #define DECLARE_MUTABLE_STRING_POLYMORPHIC_METHODS(CLASS) \
  void (*append)(CLASS *self, wchar_t c); \
  void (*truncate)(CLASS *self, unsigned length); \
  void (*toupper)(CLASS *self);
 //
 typedef struct {
  ObjectClass *parent_class;
  char *class_name;
  DECLARE_OBJECT_POLYMORPHIC_METHODS(struct mutable_string)
  DECLARE_STRING_POLYMORPHIC_METHODS(struct mutable_string)
  DECLARE_MUTABLE_STRING_POLYMORPHIC_METHODS(struct mutable_string)
 } MutableStringClass;

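It's simple and effective: since every method, inherited or not, is a member of the derived class struct, the single-level $ macro can resolve it at compile time, which maintains the needed static binding.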

Object layout

If there will be any inheritance at all, you should lay out your object structs such that instance variables of derived classes come after their parent classes' instance variables.

  1. The is_a pointer.
  2. Ivars from parent class(es).
  3. Ivars from your class.

To facilitate this, each class header file should provide macros that declare its instance variables and methods:

 #define DECLARE_OBJECT_POLYMORPHIC_METHODS(CLASS) \
  char *(*name)(CLASS *self);

 #define DECLARE_STRING_IVARS \
  int stringLength; \
  wchar_t *characters;

For a derived class, layouts of class struct and object struct would look like this:

 struct mutable_string; // forward declaration for the method prototypes
 typedef struct {
  DECLARE_OBJECT_POLYMORPHIC_METHODS(struct mutable_string)
  DECLARE_STRING_POLYMORPHIC_METHODS(struct mutable_string)
  DECLARE_MUTABLE_STRING_POLYMORPHIC_METHODS(struct mutable_string)
 } MutableStringClass;
 //
 typedef struct mutable_string {
  MutableStringClass *is_a;
  DECLARE_OBJECT_IVARS
  DECLARE_STRING_IVARS
  DECLARE_MUTABLE_STRING_IVARS
 } MutableString;

Memory management and debugging

C is known for leaving memory management entirely to the programmer, and it certainly has no garbage collection. Some OS frameworks, such as Apple's Core Foundation with CFRetain and CFRelease, impose reference counting (retain and release) to make code more reliable.

For debugging, it is also sometimes useful to tag each object with a magic number to catch situations where memory gets overwritten: every method can check its object's magic number to detect memory corruption. These magic numbers serve as heap canaries.

A revised object declaration:

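 #include <stdint.h> // uint32_t
 #define MAGIC_SHAPE 0x53484150u // 'SHAP' in ASCII; a multi-character literal like 'SHAP' has an implementation-defined value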
 #define DECLARE_SHAPE_IVARS \
  float width; \
  float height;
 //
 typedef struct shape {
  uint32_t magic; // initialize to MAGIC_SHAPE
  int retain_count;
  ShapeClass *is_a;
  DECLARE_OBJECT_IVARS
  DECLARE_SHAPE_IVARS
 } Shape;

Conclusion

Does this approach really have an advantage over C++ or another OOPL? Yes. It is a fully manual approach, like driving a stick-shift car, where all the OO infrastructure is specified by hand, but as a result there is little question about what is going on under the hood. The hood is transparent.

Advantages:

  • Fast code.
  • It's transparent about what is going on.
  • Requires only a simple C compiler.