
Document revision 0.22
Copyright © 2009-2013.
All rights reserved.

What is C@?

C@ is my experimental, object-oriented computer language and accompanying compiler project. The compiler emits assembly code, which you then run through an assembler (NASM) to produce the final executable. The assembly code that it produces is very readable. The initial focus of C@ is the IA-32 architecture, although my eventual goal is to have it produce code for both Intel64 and RAVM.

Who is designing C@ and writing the compiler?

It is my effort alone. There is no C@ Steering Committee. There is no ISO Standard C@.

What principles does C@ express?

The four basic goals of this project were:
  • Fast execution.
  • Code readability (C@ source and assembly output).
  • Similarity to C and C++.
  • Frugality.

In addition, it is meant to be fun and to encourage thinking outside the box.

Original characteristics

From the four goals followed some unusual characteristics, which were purely experimental.
  1. Two kinds of variables: register variables and globals. No local stack variables yet.
  2. Main-register variables are one size only. Globals, struct members, and array data can be any size.
  3. SIMD: not neglecting the vector processor, as C and C++ do.
  4. Support for single-inheritance OOP.

The principles expressed through this project include:

  1. It is proper to keep in mind the speeds of storage units: registers, caches, main memory. Registers are twice as fast as L1 cache.
  2. In an age of bloat and complexity, careful simplicity still has value.
  3. Readability of code at all levels is beneficial.

No use of the stack for local storage?

Correct. In the original experiment, the stack was not used for:
  • Local variables.
  • Expression evaluation temporary variables.
The stack is only used to pass data to functions and, on rare occasions, to hold temporary values for division.

This may change, because calling malloc is obviously very time-consuming, whereas using the stack for local variables is fast. The stack is, however, slower than registers, so an extra keyword may be required to remind the programmer of the potential delay incurred.
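To illustrate the trade-off in plain C (an analogy only; these function names are hypothetical):

#include <stdlib.h>

/* Two ways to obtain 64 bytes of scratch space. The heap version pays
   for a call into the allocator and a later free; the stack version
   costs little more than one SUB from ESP on IA-32. */
void scratch_heap (void) {
        char *buf = malloc (64);
        if (buf) {
                /* ... use buf ... */
                free (buf);
        }
}

void scratch_stack (void) {
        char buf[64];
        /* ... use buf ... */
        (void) buf;
}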

Stack-based variables are, however, a security risk.

In addition, it is not inconceivable that I will use a separate data stack for this purpose, so as to reduce the chances of the program stack getting corrupted during a hacking attempt.
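A minimal sketch of that idea in C (hypothetical names; C@ would do this in its generated assembly):

#include <stddef.h>
#include <stdint.h>

/* A separate data stack: local data lives here, while return addresses
   stay on the ordinary program stack. An overrun of a buffer allocated
   here cannot clobber a return address. */
static uint8_t data_stack[65536];
static size_t  data_top;

void *ds_alloc (size_t n) {
        void *p = &data_stack[data_top];
        data_top += n;
        return p;
}

void ds_free (size_t n) {
        data_top -= n;
}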

How was SIMD supported?

XMM registers were simply treated as arrays on Intel processors, with five XMM types:
  • quadlong = array of four signed 32-bit values
  • quadulong = array of four unsigned 32-bit values
  • octashort = array of eight signed 16-bit values
  • octaushort = array of eight unsigned 16-bit values
  • quadfloat = array of four floats (32 bits each)
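For comparison, this is roughly what a quadlong amounts to in C with SSE2 intrinsics (an analogy only, not C@ syntax):

#include <emmintrin.h>

/* One XMM register treated as four signed 32-bit lanes, added with a
   single PADDD instruction. */
void add_quadlongs (void) {
        __m128i a = _mm_set_epi32 (4, 3, 2, 1);     /* lanes: 1,2,3,4 */
        __m128i b = _mm_set_epi32 (40, 30, 20, 10);
        __m128i sum = _mm_add_epi32 (a, b);
        int out[4];
        _mm_storeu_si128 ((__m128i *) out, sum);    /* out = {11,22,33,44} */
        (void) out;
}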

Obviously this nomenclature could become tedious; for instance, an XMM register used as a byte array would need to be called a hexadekachar variable. I therefore intend to add a simpler naming system, such as array16.

XMM array access offers no speed advantage, however. My benchmarking has shown that accesses to XMM registers using the PINSRB/W/D and PEXTRB/W/D instructions are no faster than accesses to data on the (L1-cache-backed) stack.

Main registers can serve as simple arrays as well. I've currently implemented arrays of booleans only. I may yet add arrays of nybbles and bytes.

  • bitarray = array of 32 booleans on x86
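In C terms a bitarray is ordinary bit-twiddling on one 32-bit word, which the compiler can keep in a single register (helper names here are hypothetical):

#include <stdint.h>

/* 32 booleans packed into one 32-bit word. */
uint32_t bit_set (uint32_t a, int i) { return a |  (UINT32_C(1) << i); }
uint32_t bit_clr (uint32_t a, int i) { return a & ~(UINT32_C(1) << i); }
int      bit_get (uint32_t a, int i) { return (a >> i) & 1; }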

Only one size of local variables?

In the case of register variables, this makes implementation much easier. If main register set variables were not the same size, conversions would constantly be necessary.

For instance, imagine that you want to add a char variable (8-bit signed) that is in the CH register to an unsigned long in EDX. You'd first have to sign-extend the char into a temporary using MOVSX. You can't use ECX, of course. But what if you don't have a free register for that temporary? Moving onto the stack takes roughly twice as long. Only once the char is sign-extended can you do the add.
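The same situation expressed in C (a hypothetical example): the compiler must widen the char before the addition can happen.

/* Mixing sizes forces a conversion: c must be sign-extended from 8 to
   32 bits (MOVSX on IA-32) before it can be added to x. */
unsigned int add_mixed (signed char c, unsigned int x) {
        return x + (unsigned int)(int) c;
}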

By keeping all main register set variables the same size (int, uint, or pointer) I can achieve faster execution in general because it reduces the need for conversions and temporaries on the stack.

That said, sometimes it is wiser to pack variables into the registers as tightly as possible. That's why I provide arrays within registers e.g. bitarray, which is 32 booleans in one 32-bit register, as well as XMM-based arrays such as quadlong, which is four 32-bit signed values inside an XMM 128-bit register.

How does the OOP work?

C@ presently supports classes with single inheritance. Each class has two required functions: init() and destroy(). Method calls are similar to C++.

class A {
        int a;
        void init ();
        void destroy ();
        void print (int i);
}

void A.init () {
        a = 5;
}

void A.destroy () {
}

void A.print (int i) {
        printf ("%d %d\n", a, i);
}

int main () {
        A* obj = new A ();
        obj->print (567);
        delete obj;
}

No dynamic binding?

Not at present. It would be beneficial to add this feature for uses that are not speed-critical, such as GUI event passing.
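For reference, dynamic binding usually means dispatching through a per-class table of function pointers. A sketch of the standard technique in C (not a committed design for C@):

#include <stdio.h>

/* Each object carries a pointer to its class's table of function
   pointers; a method call becomes one extra indirection. */
struct vtable { void (*print) (void *self, int i); };
struct A      { const struct vtable *vt; int a; };

void A_print (void *self, int i) {
        struct A *obj = self;
        printf ("%d %d\n", obj->a, i);
}

const struct vtable A_vt = { A_print };

void call_print (struct A *obj, int i) {
        obj->vt->print (obj, i);   /* resolved at run time */
}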

Major limitations

  • No floating-point.
  • No "for" loop.
  • No switch/case.
  • No nested structs.
  • No unions.
  • No templates.

Testing

I implemented two approaches to testing:
  • Automated testing: about 55 randomized tests.
  • Hand-checked tests.

Download

C@ works under:
  • Mac OS X
  • Windows/Cygwin
It should also work under Linux but I haven't tested it lately.

Tarball

Release 0.112 changes function call syntax to remove the requirement of parentheses around parameters.

Thus this:
     printf ("Values = %d, %d\n", foo, bar);
becomes this:
     printf "Values = %d, %d\n", foo, bar;
or if you like (it's an option):
     printf: "Values = %d, %d\n", foo, bar;

Compiling its output

C@ produces assembly output. You will need a recent version of nasm to assemble it. On the Mac, build the object file thus:

nasm -f macho myfile.asm

(In newer NASM releases this 32-bit Mach-O output format is named macho32.)

And generate the executable like so:

gcc -m32 myfile.o -o myfile

Future directions

C@ is not a dead project; it's just napping.

For C@ to be useful, it needs to put some local variables on the stack. To be sure, this will allow for less fastidious coding, but if C@ is to avoid being tedious it must not impose unnecessary limitations. Thus stack-based variables will be included in any future version of the C@ compiler.

In addition, a new parser is a top priority.

And finally, support for generating RAVM assembly code (not just x86) is essential, since I personally want a write-once, run-anywhere language to serve as an alternative to Java.