zsmith.co

Object-Oriented C Programming

Revision 20
© 2013-2019 by Zack Smith. All rights reserved.

Introduction

Object-oriented programming originated circa 1967 with Simula 67 and with Alan Kay, who coined the term, and OOP languages became dominant in the 1980s and 1990s with the inventions of Objective-C and C++.

With the benefits of object-oriented languages came burdens and complexities that are not necessarily welcome in every project, which is why some projects are still coded in procedural programming languages. A famous example is the Linux kernel, which is written in C. Non-object-oriented languages such as Go are still being invented.

C++

The case of C++ exemplifies the pitfalls of using OO languages: The fact that books explaining C++ are often 1000 pages long should tell you something has gone wrong. A simple idea has undergone a series of developments, some logical and some overly exuberant, and become an ornate and complex system. Complexity is the enemy of security but also of resiliency.

While some programmers are certainly capable of writing very legible, bug-free and secure C++ code, the language and libraries make it all too easy to make mistakes. C++ code can quickly become an illegible, unintelligible tangle of inherited classes and nested templates, sometimes called lasagna code, that is as bad as any goto-fraught spaghetti code from the dawn of computing.

Back to basics

Due to the problems arising from certain object-oriented languages, it may be preferable for a programmer to use a non-OOP language such as C or even assembly language, but to augment it to support object-oriented principles, namely:

  1. Encapsulation: in C this is done by putting member variables and method pointers inside a struct.
  2. Polymorphism: in C this is done by overwriting method pointers in a class struct.
  3. Inheritance: in C this can be done by either:
    • manually calling the parent class method e.g. Shape_area(myCircle)
    • using the C preprocessor to embed parent classes' methods in the object struct.
    • locating the parent class's method in its class struct and calling that method.

In addition, architectures that do not yet have a C++ compiler written for them can still support OOP using this approach to C.

What are some examples of object-oriented C?

GTK+, which is built on GLib and its GObject system, uses a type of object-oriented C.

My own benchmark, bandwidth, now uses my approach to object-oriented C.

Techniques for object-oriented C programming

Objects and classes

An object struct should be as small as possible. If each object struct contains method pointers that are common to all instances of its class, that's very wasteful.

For efficiency, it is best to put method pointers in a class struct that is pointed to by each object using an is_a pointer.

Imagine you will have 1,000,000,000 objects and they need to be as small as possible for speed and efficiency's sake. You can understand why OOP languages are getting pushback from some developers who deal with huge numbers of objects e.g. those who write games.

Method calls

Method pointers in the class struct.

Here is an example of a class struct and an object struct.

 typedef struct shapeClass {
  ObjectClass *parent_class;
  char *name;
  struct shape* (*init) (struct shape* self);
  struct shape* (*printMe) (struct shape* self, FILE *output);
  float (*computeArea) (struct shape* self);
 } ShapeClass;
 //
 typedef struct shape {
  ShapeClass *is_a;
  // Parent class ivars go here.
  float width;
  float height;
 } Shape;

On a typical 64-bit platform the object itself is only 8+4+4=16 bytes (one pointer and two floats), so you can fit 4 Shape objects in one typical 64-byte cache line. Presumably the ShapeClass is hanging around in the L1 cache as well and won't be easily displaced.
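To make the size argument concrete, here is a minimal sketch that compares the two layouts with sizeof. The names SlimShape, SlimShapeClass, FatShape and bytes_saved_per_object are hypothetical and chosen only for this illustration; the exact numbers depend on the platform's pointer size and padding.

```c
#include <stddef.h>

/* Class struct: one copy per class, shared by every instance. */
typedef struct slimShapeClass {
    void *(*init)(void *self);
    void  (*printMe)(void *self);
    float (*computeArea)(void *self);
} SlimShapeClass;

/* Slim object: a single is_a pointer plus the instance variables. */
typedef struct slimShape {
    SlimShapeClass *is_a;
    float width;
    float height;
} SlimShape;

/* Fat object (for comparison only): every instance carries its own
   copy of all three method pointers. */
typedef struct fatShape {
    void *(*init)(void *self);
    void  (*printMe)(void *self);
    float (*computeArea)(void *self);
    float width;
    float height;
} FatShape;

/* Bytes saved per object by moving the method pointers into the class struct. */
size_t bytes_saved_per_object(void) {
    return sizeof(FatShape) - sizeof(SlimShape);
}
```

The saving grows with every method added to the class, while the slim object stays the same size.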

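Now we need a macro that makes a method call through the is_a pointer readable. Note that a $ in an identifier and the ##__VA_ARGS__ idiom are extensions supported by GCC and Clang, not standard C:

 #define $(OBJ,METHOD,...) ((OBJ) && (OBJ)->is_a && (OBJ)->is_a->METHOD ? (OBJ)->is_a->METHOD((OBJ), ##__VA_ARGS__) : 0)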

The macro does several things:

  1. Verify that the object pointer is non-NULL.
  2. Verify that the object pointer has an is_a pointer.
  3. Verify that the class struct has the method in question.
  4. If all above are true, perform the method call, else provide a 0 result.
  5. It supports any number of arguments.

Usage is simple:

 float area = $(myShape, computeArea);
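As a minimal sketch of the macro in use, the following puts it together with a small class. It assumes GCC or Clang, since the $ identifier and ##__VA_ARGS__ are extensions. The scaledArea method, the shapeClass instance and the demo_* functions are hypothetical additions made only to show a call with an extra argument and a call on a NULL object.

```c
#include <stddef.h>

typedef struct shape Shape;

typedef struct shapeClass {
    const char *name;
    float (*computeArea)(Shape *self);
    float (*scaledArea)(Shape *self, float factor);
} ShapeClass;

struct shape {
    ShapeClass *is_a;
    float width;
    float height;
};

/* Checked method call: yields 0 if the object, class or method is missing. */
#define $(OBJ,METHOD,...) \
    ((OBJ) && (OBJ)->is_a && (OBJ)->is_a->METHOD \
        ? (OBJ)->is_a->METHOD((OBJ), ##__VA_ARGS__) : 0)

static float Shape_computeArea(Shape *self) {
    return self->width * self->height;
}

static float Shape_scaledArea(Shape *self, float factor) {
    return factor * self->width * self->height;
}

static ShapeClass shapeClass = { "Shape", Shape_computeArea, Shape_scaledArea };

/* Method call with no extra arguments. */
float demo_call(void) {
    Shape s = { &shapeClass, 10.0f, 12.0f };
    Shape *myShape = &s;
    return $(myShape, computeArea);
}

/* Method call with one extra argument. */
float demo_call_with_arg(void) {
    Shape s = { &shapeClass, 10.0f, 12.0f };
    Shape *myShape = &s;
    return $(myShape, scaledArea, 0.5f);
}

/* Calling through a NULL object falls back to 0 instead of crashing. */
float demo_null_call(void) {
    Shape *nothing = NULL;
    return $(nothing, computeArea);
}
```

Because the macro expands OBJ several times, the object expression should be a plain variable rather than something with side effects.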

What this macro does not do, however, is call an inherited method, which is a problem that I will explain below.

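This macro is also problematic because the fallback value 0 has to be compatible with every method's return type. It is fine for methods that return numbers, and it acts as a null pointer for methods that return pointers, but it is not valid C for methods that return void or a struct. Therefore for production code, it should be simplified to just the call: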

 #ifdef DEBUG
 #define $(OBJ,METHOD,...) ((OBJ) && (OBJ)->is_a && (OBJ)->is_a->METHOD ? (OBJ)->is_a->METHOD((OBJ), ##__VA_ARGS__) : 0)
 #else
 #define $(OBJ,METHOD,...) (OBJ)->is_a->METHOD((OBJ), ##__VA_ARGS__)
 #endif

Calling a non-overridable method, i.e. the non-polymorphic case.

Let's say a method will never be inherited and overridden in a derived class. Then it can be an ordinary function that you call directly:

 Shape* myShape = Shape_new ();
 myShape->width = 10;
 myShape->height = 12;
 float area = Shape_area (myShape);
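The snippet above does not show the functions it calls. A minimal sketch of what they might look like follows. The bodies of Shape_new and Shape_area are assumptions, Shape_destroy is a hypothetical helper added for cleanup, and the is_a pointer is left out because the non-polymorphic case never uses it.

```c
#include <stdlib.h>

typedef struct shape {
    float width;
    float height;
} Shape;

/* Allocate a zero-initialized Shape; returns NULL if allocation fails. */
Shape *Shape_new(void) {
    return calloc(1, sizeof(Shape));
}

/* Non-polymorphic method: an ordinary function, resolved at compile time. */
float Shape_area(const Shape *self) {
    return self ? self->width * self->height : 0.0f;
}

/* Hypothetical counterpart to Shape_new. */
void Shape_destroy(Shape *self) {
    free(self);
}
```

Since no function pointer is involved, the compiler is free to inline such a call.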

Calling a non-polymorphic super-class method.

Calling any non-polymorphic method of the parent class is simple. You just call it directly:

 Parent_doSomething (object);

Of course you will have to cast the object, because passing a Shape* where an Object* is expected fails the compiler's type check.

Also note! Passing a derived class object to a parent class method only works if you lay out your object structs progressively i.e. instance variables of derived classes always come after their parent classes' instance variables.

For instance imagine this 3-level class hierarchy:

  • Bytes 0-31 are Object instance data
  • Bytes 32-47 are Shape instance data
  • Bytes 48-51 are Oval instance data

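You can pass an Oval to an Object method only because the leading bytes of an Oval instance are laid out the same as those of an Object instance.

If the derived class has overridden methods that the parent class's function calls directly rather than through the is_a pointer, the parent's versions will run instead of the overrides, which may cause a problem as well.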
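One way to get this progressive layout is sketched below. It is a variation on the approach in this article: instead of repeating the parent's instance variables, each struct embeds its parent struct as its first member, which C guarantees sits at offset 0, so the cast is well defined. The field names and the demo function are hypothetical.

```c
#include <stddef.h>

typedef struct object {
    int id;
    int retain_count;
} Object;

typedef struct shape {
    Object object;      /* parent data first */
    float width;
    float height;
} Shape;

typedef struct oval {
    Shape shape;        /* parent data first */
    float eccentricity;
} Oval;

/* A non-polymorphic Object method. */
int Object_id(const Object *self) {
    return self->id;
}

/* Pass an Oval to an Object method by casting the pointer. */
int demo_pass_oval_to_object_method(void) {
    Oval oval = { { { 42, 1 }, 3.0f, 2.0f }, 0.5f };
    return Object_id((const Object *)&oval);
}
```

The cast works in one direction only: an Object* may be treated as an Oval* only if the object really was created as an Oval.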

Calling an overridable (i.e. polymorphic) super-class method.

It is necessary to put all methods, both the parent class's and the derived class's, into the derived class's class struct, like so:

 #include <wchar.h> // for wchar_t
 struct mutable_string; // forward declaration, so CLASS can be named below
 // Every method takes the object as its first argument, as the $ macro expects.
 #define DECLARE_OBJECT_POLYMORPHIC_METHODS(CLASS) \
  char *(*name)(CLASS *self);
 #define DECLARE_STRING_POLYMORPHIC_METHODS(CLASS) \
  unsigned (*length)(CLASS *self); \
  void (*print)(CLASS *self); \
  wchar_t (*characterAt)(CLASS *self, unsigned index);
 #define DECLARE_MUTABLE_STRING_POLYMORPHIC_METHODS(CLASS) \
  void (*append)(CLASS *self, wchar_t character); \
  void (*truncate)(CLASS *self, unsigned length); \
  void (*toupper)(CLASS *self);
 //
 typedef struct {
  ObjectClass *parent_struct;
  char *class_name;
  DECLARE_OBJECT_POLYMORPHIC_METHODS(struct mutable_string)
  DECLARE_STRING_POLYMORPHIC_METHODS(struct mutable_string)
  DECLARE_MUTABLE_STRING_POLYMORPHIC_METHODS(struct mutable_string)
 } MutableStringClass;

This is simple and effective. The alternative, jumping from class struct to class struct to reach a desired method pointer, would be slow and doesn't work in practice using just a macro.

Note that any inherited super-class method pointers have to be copied into the derived class's class struct.
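The copying step can be sketched as follows. This is a simplified illustration, not necessarily how any particular project does it: the derived Triangle class adds no new instance variables or methods, so it can reuse the ShapeClass and Shape types, and TriangleClass_init, computeAspectRatio and the two class instances are hypothetical names.

```c
#include <stddef.h>

typedef struct shape Shape;

typedef struct shapeClass {
    const struct shapeClass *parent_class;
    const char *name;
    float (*computeArea)(Shape *self);
    float (*computeAspectRatio)(Shape *self);
} ShapeClass;

struct shape {
    const ShapeClass *is_a;
    float width;
    float height;
};

static float Shape_computeArea(Shape *self) {
    return self->width * self->height;
}

static float Shape_computeAspectRatio(Shape *self) {
    return self->width / self->height;
}

static float Triangle_computeArea(Shape *self) {
    return 0.5f * self->width * self->height;
}

static const ShapeClass shapeClass = {
    NULL, "Shape", Shape_computeArea, Shape_computeAspectRatio
};

static ShapeClass triangleClass;    /* filled in by TriangleClass_init */

/* Build the derived class struct: inherit everything, then override. */
const ShapeClass *TriangleClass_init(void) {
    triangleClass = shapeClass;                        /* copy all inherited method pointers */
    triangleClass.parent_class = &shapeClass;
    triangleClass.name = "Triangle";
    triangleClass.computeArea = Triangle_computeArea;  /* override one method */
    return &triangleClass;
}
```

After initialization, a call through is_a reaches the override for computeArea and the inherited function for computeAspectRatio, while the parent's version of an overridden method remains reachable through parent_class.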

Object struct layout

If there will be any inheritance at all, you should lay out your object structs such that instance variables of derived classes come after their parent classes' instance variables. The order is:

  1. The is_a pointer.
  2. Ivars of parent class(es).
  3. Ivars of derived class.
  4. (Ivars of further subclass here.)

To facilitate this, each class header file should provide macros that declare instance variables and methods:

 #define DECLARE_OBJECT_POLYMORPHIC_METHODS(CLASS) \
  char *(*name)(CLASS *self);

 #define DECLARE_STRING_IVARS \
  int stringLength; \
  wchar_t *characters;

For a derived class, layouts of class struct and object struct would look like this:

 typedef struct {
  DECLARE_OBJECT_POLYMORPHIC_METHODS(struct mutable_string)
  DECLARE_STRING_POLYMORPHIC_METHODS(struct mutable_string)
  DECLARE_MUTABLE_STRING_POLYMORPHIC_METHODS(struct mutable_string)
 } MutableStringClass;
 //
 typedef struct mutable_string {
  MutableStringClass *is_a;
  DECLARE_OBJECT_IVARS
  DECLARE_STRING_IVARS
  DECLARE_MUTABLE_STRING_IVARS
 } MutableString;

Memory management and debugging

C is known for its use of manual memory management e.g. using malloc, free, stack variables, and alloca. It certainly lacks anything sophisticated like garbage collection.

Some systems impose reference counting (retain and release) to make software's use of memory more efficient and reliable.

I would recommend using reference counting for object-oriented C.
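A minimal sketch of retain and release is shown below. The function names, the payload field and the convention that Object_release returns NULL once the object has been freed are all assumptions made for this illustration, and the counter is not thread-safe.

```c
#include <stdlib.h>

typedef struct object {
    int retain_count;
    int payload;
} Object;

/* A new object starts with one owner. */
Object *Object_new(int payload) {
    Object *self = malloc(sizeof *self);
    if (self) {
        self->retain_count = 1;
        self->payload = payload;
    }
    return self;
}

/* Register one more owner. */
Object *Object_retain(Object *self) {
    if (self) {
        self->retain_count++;
    }
    return self;
}

/* Drop one owner; free the object when the last owner lets go.
   Returns self while the object is still alive, NULL once it is freed. */
Object *Object_release(Object *self) {
    if (!self) {
        return NULL;
    }
    if (--self->retain_count > 0) {
        return self;
    }
    free(self);
    return NULL;
}
```

Assigning the result of Object_release back to the pointer being released is one way to avoid keeping a dangling pointer around.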

For debugging it is also useful to tag each object with a magic number that serves as a canary, to help track down bugs that cause memory to be overwritten or otherwise corrupted.

A revised object declaration:

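 #include <stdint.h> // for uint32_t and int32_t
 #define MAGIC_SHAPE 0x53484150u // ASCII "SHAP"; the value of a multi-character constant like 'SHAP' is implementation-defined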
 #define DECLARE_SHAPE_IVARS \
  float width; \
  float height;
 //
 typedef struct shape {
  uint32_t magic; // initialize to MAGIC_SHAPE
  int32_t retain_count;
  ShapeClass *is_a;
  DECLARE_OBJECT_IVARS
  DECLARE_SHAPE_IVARS
 } Shape;
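Checking the canary might look like the following minimal sketch. Shape_init and Shape_isValid are hypothetical helpers, the is_a pointer is omitted to keep the example short, and the magic value is written as a hexadecimal constant spelling "SHAP" in ASCII.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAGIC_SHAPE 0x53484150u  /* ASCII "SHAP" */

typedef struct shape {
    uint32_t magic;         /* set to MAGIC_SHAPE by Shape_init */
    int32_t retain_count;
    float width;
    float height;
} Shape;

/* Initialize an object and stamp it with its class's magic number. */
void Shape_init(Shape *self, float width, float height) {
    self->magic = MAGIC_SHAPE;
    self->retain_count = 1;
    self->width = width;
    self->height = height;
}

/* True only if the pointer is non-NULL and the canary is intact. */
bool Shape_isValid(const Shape *self) {
    return self != NULL && self->magic == MAGIC_SHAPE;
}
```

In a debug build, each method could begin by asserting Shape_isValid(self), so that a corrupted or mistyped object is caught at the first call rather than much later.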

Conclusion

Does this approach really have an advantage over C++ or another OOPL? Yes, because although it is a fully manual approach, like driving a stick-shift car wherein all OO infrastructure is manually specified, there is (or can be) no question about what is going on under the hood. The hood is transparent.

For systems where no C++ compiler exists, this approach also presents a viable means for object-oriented programming.

Advantages:

  • Fast code.
  • It's transparent about what is going on.
  • Requires only a simple C compiler.