Taz C Compiler
Dave Hein
Posts: 6,347
I've been playing around with writing a C compiler that I call taz. It implements a subset of the C language that only contains the types int, short and char. They can be signed or unsigned. It also supports pointers and one-dimensional arrays. It supports the main C features of if, else, while and for. It currently doesn't support structs or unions, and it doesn't support do, switch, case, goto, enum or typedef.
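As an illustration of the subset described above, here is a small sketch that stays within int and char, pointers, one-dimensional arrays, if, while and for. The function names are made up for this example, and the code has not been checked against taz itself, so treat it as a guess at what taz-style C looks like rather than a tested sample.

```c
/* Illustrative only: int/char, pointers, a one-dimensional array,
   if, while and for. Not verified against the taz compiler. */

/* Count the characters in a NUL-terminated string. */
int str_length(char *s)
{
    int n;
    n = 0;
    while (*s) {
        n = n + 1;
        s = s + 1;
    }
    return n;
}

/* Sum the positive entries of an int array. */
int sum_positive(int *values, int count)
{
    int i;
    int total;
    total = 0;
    for (i = 0; i < count; i = i + 1) {
        if (values[i] > 0)
            total = total + values[i];
    }
    return total;
}
```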
taz compiles code to P2 assembly, and I use the qasm assembler to produce object files. The object files can be linked together using a linker I wrote called qlink. Object files can be copied together to create larger files, or if they use a .a extension they are treated as libraries.
One useful feature of taz is that it can produce position-independent code, which makes the task of relocating and linking object files easier. It also makes it possible to produce executable files that can be executed anywhere in memory. This is useful for reading executable files from an SD card and running them.
taz isn't intended to be a replacement for gcc. It's just a way to write high-level code until gcc and spin are available. Cygwin or Linux is required to build and run taz. The script file, tzc, is a bash script that is used to compile C programs and produce a P2 executable. Look at the readme.txt file in the top directory of the zip file for more information.
EDIT: I should mention that this is still a work in progress. The taz compiler itself is mostly done, but there is still a lot of work to do on the preprocessor and implementing an optimizer. Taz generates highly unoptimized code. At some point I also intend to write a converter that will convert standard C with structs, unions and typedefs into the simpler taz C language.

Comments
A few issues here. Millions of warnings like:
$ ./buildbin
gencode.c: In function ‘EmitLoadReg’:
gencode.c:225:42: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Emit2(" mov reg%d, %s\n", (char *)(reg_indx++), varname);
...
...
Build fails with:
symsubs.c:(.text+0xbaa): undefined reference to `floor'
collect2: error: ld returned 1 exit status
mv: cannot stat ‘qasm’: No such file or directory
convert.c: In function ‘main’:
convert.c:67:5: warning: ‘gets’ is deprecated (declared at /usr/include/stdio.h:638) [-Wdeprecated-declarations]
while (gets(buffer))
...
...
I might have time to look into this in the morning.
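For the gets warning in convert.c, one common fix is to switch to fgets and strip the line ending by hand. The sketch below is not the actual convert.c code. read_line is a hypothetical helper name, and a line longer than the buffer would simply be returned in pieces. The undefined reference to floor usually means the math library was not linked, which with gcc on Linux is normally fixed by adding -lm to the link command.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (at most size-1 characters) and strip the
   trailing CR/LF. Returns 1 on success, 0 on EOF or error.
   read_line is a hypothetical helper, not a function from taz. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';
    return 1;
}
```

A loop such as "while (gets(buffer))" would then become "while (read_line(stdin, buffer, sizeof buffer))", assuming buffer is a char array in scope.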
Does it support in-line assembler ?
Can portions of the ASM out files be pasted into the source, and manually tuned, as a human optimize pass ?
I'm not sure why you are getting a segmentation fault. Maybe you're on a 64-bit machine and things aren't aligned correctly. I should be using stdint to improve portability. I'll look into that also.
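As a sketch of what the stdint approach might look like: the warnings come from casting a 32-bit int straight to a 64-bit pointer, and going through intptr_t makes the width change explicit. The helper names below are hypothetical and do not come from the taz sources. The round trip relies on behaviour that holds on common 32-bit and 64-bit platforms rather than on a strict guarantee from the C standard.

```c
#include <stdint.h>

/* Pack an int into a pointer-sized argument without a size-mismatch
   warning. int_to_arg and arg_to_int are hypothetical helper names. */
void *int_to_arg(int value)
{
    return (void *)(intptr_t)value;
}

/* Recover the int that int_to_arg stored in the pointer. */
int arg_to_int(void *arg)
{
    return (int)(intptr_t)arg;
}
```

A cleaner long-term option would be to make the print routines variadic so that integers never have to travel through pointer arguments at all.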
@jmg, taz does support an inline function that is used to insert assembly code. However, you have to know which registers the variables are located in to use it. Calling parameters are passed in registers and local variables are stored on the stack. The return value is returned through reg0. Do a grep on "inline" in the samples and libsrc directories and you'll see examples of how inline is used.
Something that I want to tweak in assembly is the sdspi.c driver. It is very slow right now. It takes a few seconds to read a 64K file. One approach is to add inline statements. Another approach is to tweak the assembly output from the C compiler, and then link it with the C code. The tzc script file allows mixing C, assembly and object files. So you could do something like this:
tzc main.c drivers.a otherstuff.o
Not sure if you care about it at this point but here's my results building and exec'ing taz on Mac OS X (10.11.2):
Running ./buildbin gives a lot of warnings, but creates executables:
Running tzc on hello.c, displays several errors, but does create a .bin file:
$ tzc hello.c
taz -pic -g hello.c
3: ERROR 1
3: puts("Hello World\n");
Encountered unexpected EOF
qasm -o -c hello.s
ERROR: voidP is undefined
ldl voidP ?
EvaluateExpression: Symbol main is undefined
EvaluateExpression: Symbol _hub_pic_table is undefined
ERROR: voidP is undefined
ldl voidP ?
qlink hello.o /Users/altergator/source/taz001/lib/stdio.a /Users/altergator/source/taz001/lib/string.a /Users/altergator/source/taz001/lib/stdlib.a
ERROR: Expected 1032 bytes, but only read 1028 bytes
The created .bin looks similar to the pre-compiled .bin from your .zip, but fails to run correctly in spinsim094.
I've attached the ./buildbin output, which may give some clues to any issues.
I hope this helps...
dgately
I also noticed on my Linux machine that the script files didn't have the executable bit set. I've added a script file called chmods that will fix this. Of course, you will need to do a "chmod +x chmods" before running the chmods script.
github is waiting for Taz
I'll give it a spin later.
$ taz -pic -g ../samples/hello.c
3: ERROR 1
3: puts("Hello World\n");
Encountered unexpected EOF
Something to do with Mac OS X using just CRs? and not LFCR?
Also, noticed that your filename buffer is only 40 characters long. It was trying to include a long path like: "/Users/myUserName/source/tar002/sample/hello.c"
dgately
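On the 40-character filename buffer, a bounded copy that reports truncation would catch long paths instead of overflowing. This is only a sketch. copy_filename is a hypothetical helper, and how taz actually fills its filename buffer is not shown in this thread.

```c
#include <stdio.h>

/* Copy path into dst without overflowing it.
   Returns 1 if the whole path fit, 0 if it was too long (dst then
   holds a truncated, still NUL-terminated copy).
   copy_filename is a hypothetical helper name. */
int copy_filename(char *dst, size_t dst_size, const char *path)
{
    int needed = snprintf(dst, dst_size, "%s", path);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

The caller can then print an error and stop when the function returns 0, rather than continuing with a corrupted name.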
./taz ../samples/hello.c
1: void main(void)
<void><main><(><void><)>
2: {
<{>
3: puts("Hello World\n");
<puts><(><"Hello World\n"><)><;>
4: }
<}>
5: EOF
<EOF>
Thanks for finding the issue with pathnames. I assumed the filename would only include the name and not the entire path. I'll increase the size of fname1[] like you suggested.
$ taz hello.c
1: void main(void)
<void? @P?><main><(><void><)>
2: {
<{>
3: puts("Hello World\n");
<putsP3@P?><(><"Hello World\n"><)><;>
3: ERROR 1
3: puts("Hello World\n");
4: }
<}>
5: EOF
<EOF>
Encountered unexpected EOF
dgately
$ ls -al | grep hello.c
-rwxr-xr-x@ 1 altergator admin 47 Dec 21 13:46 hello.c
$ hexdump -C hello.c
00000000 76 6f 69 64 20 6d 61 69 6e 28 76 6f 69 64 29 0a |void main(void).|
00000010 7b 0a 20 20 20 20 70 75 74 73 28 22 48 65 6c 6c |{.    puts("Hell|
00000020 6f 20 57 6f 72 6c 64 5c 6e 22 29 3b 0a 7d 0a |o World\n");.}.|
0000002f
OpenFiles: hello.c
Tokenize: len = 4, str = void, sptr = 80068510
Tokenize: len = 4, str = main, sptr = 80068528
Tokenize: len = 1, str = (, sptr = 80068540
Tokenize: len = 4, str = void, sptr = 80068550
Tokenize: len = 1, str = ), sptr = 80068568
1: void main(void)
<void><main><(><void><)>
Tokenize: len = 1, str = {, sptr = 80068510
2: {
<{>
Tokenize: len = 4, str = puts, sptr = 80068510
Tokenize: len = 1, str = (, sptr = 80068528
Tokenize: len = 15, str = "Hello World\n", sptr = 80068538
Tokenize: len = 1, str = ), sptr = 80068558
Tokenize: len = 1, str = ;, sptr = 80068568
3: puts("Hello World\n");
<puts><(><"Hello World\n"><)><;>
Tokenize: len = 1, str = }, sptr = 80068510
4: }
<}>
Tokenize: len = 3, str = EOF, sptr = 80068510
5: EOF
<EOF>
$ git pull
is so much easier than jerking around downloading and unpacking zips or dropping odd files into place.
Just for Christmas ?
That's it... works now!
Thanks,
dgately
@dgately, I'm glad that solved the problem. I'll do another update soon -- maybe on github.
Hi Dave, nice work as usual. My team uses SmartGit as a GIT client, works on all platforms, easy to use and free for non-commercial use!
This looks very interesting. Can you tell us a bit about your ABI? How do you pass parameters to functions? How do you handle register allocation?
Thanks,
David
int testfunc(char *ptr1, char *ptr2, int size)
{
    int x, y, z;
    return 10;
}
The variables x, y and z are stored on the stack. Functions are called using CALLD, such as "calld adra, #testfunc". On entry into the function the return address in adra is stored on the stack, then space is allocated on the stack for the local variables. On return, the return value is put in reg0, the stack space used by the local variables is released, and the return address is popped off of the stack.
Another thing that has to be done when calling a function is to save all of the caller's working registers and its own function parameters on the stack, and then move the calling parameters down to the registers starting at reg0. On return, the working registers and the caller's own parameters are restored from the stack. I may change this later on so that parameters are passed on the stack, but for now they are passed through registers.
Can your compiler compile itself? If not, are you considering that? That's one element of "self hosting" that we haven't talked about. For a true self-hosted environment, you should be able to rebuild the entire environment on the target itself. The old Small C compiler could do that although it paid a pretty high price because it had to implement things like symbol table structures manually due to a lack of support for struct.
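To make the Small C point concrete, here is a rough sketch of a symbol table built without struct, using parallel one-dimensional arrays. All names and sizes are invented for the example. It is not taken from Small C or from taz, and it assumes #define, the && operator and address-of are available, which this thread does not confirm for taz.

```c
#define MAX_SYMS 64   /* assumed capacity */
#define NAME_LEN 16   /* assumed maximum name length, including NUL */

char sym_names[MAX_SYMS * NAME_LEN];  /* flat array: one slot per name */
int  sym_values[MAX_SYMS];            /* value for the same index */
int  sym_count = 0;

/* Return 1 if the two NUL-terminated strings are identical. */
int names_equal(char *a, char *b)
{
    while (*a && *a == *b) {
        a = a + 1;
        b = b + 1;
    }
    return *a == *b;
}

/* Return the index of name, or -1 if it is not in the table. */
int sym_find(char *name)
{
    int i;
    for (i = 0; i < sym_count; i = i + 1) {
        if (names_equal(&sym_names[i * NAME_LEN], name))
            return i;
    }
    return -1;
}

/* Append a symbol; returns its index, or -1 if the table is full.
   Names longer than NAME_LEN - 1 characters are truncated. */
int sym_add(char *name, int value)
{
    int i;
    char *dst;
    if (sym_count >= MAX_SYMS)
        return -1;
    dst = &sym_names[sym_count * NAME_LEN];
    i = 0;
    while (name[i] && i < NAME_LEN - 1) {
        dst[i] = name[i];
        i = i + 1;
    }
    dst[i] = 0;
    sym_values[sym_count] = value;
    sym_count = sym_count + 1;
    return sym_count - 1;
}
```

Every field that a struct would hold becomes its own array, and the index plays the role of the struct pointer. That is the price the comment above refers to.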
At least I got hello.c to compile.
Glad to see that at least some of the examples compile and run under GCC on my PC. So it is actually C syntax.
$ ./buildbin
qasm.c: In function ‘PrintUnexpected’:
qasm.c:64:58: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
PrintError("ERROR: (%d) Unexpected symbol \"%s\"\n", (void *)num, str);
^
qasm.c: In function ‘EncodeAddressField’:
qasm.c:226:57: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
PrintError("ERROR: Bad symbol type - %d\n", (void *)type, 0);
^
qasm.c: In function ‘ParseDat’:
qasm.c:699:90: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
PrintError("ERROR: ORGH address %x less than previous address %x\n", (void *)new_hub_addr, (void *)hub_addr);
^
qasm.c:699:112: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
PrintError("ERROR: ORGH address %x less than previous address %x\n", (void *)new_hub_addr, (void *)hub_addr);
^
strsubs.c: In function ‘Tokenize’:
strsubs.c:63:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
len = (int)ptr - (int)str + 1;
^
strsubs.c:63:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
len = (int)ptr - (int)str + 1;
^
strsubs.c:81:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
len = (int)ptr - (int)str;
^
strsubs.c:81:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
len = (int)ptr - (int)str;
^
strsubs.c: In function ‘ReadString’:
strsubs.c:280:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
if (!unicode) count = (int)fgets(buf, size, infile);
@Heater, I'm glad that it's working for you. Thanks for posting the build output. I'll remove those warnings in the next update. Eventually the example programs should become fully standard as I add more features to the taz compiler.
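The pointer-to-int warnings in strsubs.c can probably be avoided by subtracting the pointers directly, since C already defines the difference of two pointers into the same buffer. A minimal sketch, with span_length as a made-up name standing in for the expression "len = (int)ptr - (int)str + 1":

```c
#include <stddef.h>

/* Number of characters from str up to and including ptr.
   Both pointers must point into the same buffer, with ptr >= str.
   span_length is a hypothetical name, not a taz function. */
int span_length(const char *str, const char *ptr)
{
    ptrdiff_t diff = ptr - str;   /* no pointer-to-int cast needed */
    return (int)diff + 1;
}
```

Casting each pointer to int first truncates the addresses on a 64-bit host. Subtracting first keeps the arithmetic in pointer width and only narrows the small result.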
Sort of like the C preprocessor but more like the original C++ implementation(s).
Hope goto does not sneak in there