Taz C Compiler

I've been playing around with writing a C compiler that I call taz. It implements a subset of the C language that only contains the types int, short and char. They can be signed or unsigned. It also supports pointers and one-dimensional arrays. It supports the main C features of if, else, while and for. It currently doesn't support structs or unions, and it doesn't support do, switch, case, goto, enum or typedef.
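As a rough sketch of what code restricted to this subset might look like (not checked against taz itself; the function names are made up for illustration), the following uses only int, short and char types, a pointer, a one-dimensional array, and if, else, while and for:

```c
/* Illustrative only: written to stay inside the subset described above.
   Whether taz accepts every construct here (e.g. i++) is an assumption. */

/* Sum the first n elements of a one-dimensional array of shorts. */
int sum_array(short *values, int n)
{
    int total;
    int i;

    total = 0;
    for (i = 0; i < n; i++)
        total = total + values[i];
    return total;
}

/* Count the characters in a string using a while loop and a char pointer. */
int string_length(char *str)
{
    int len;

    len = 0;
    while (*str != 0)
    {
        len = len + 1;
        str = str + 1;
    }
    return len;
}

/* Clamp a value to the range of an unsigned char using if/else. */
unsigned char clamp_to_byte(int value)
{
    if (value < 0)
        return 0;
    else if (value > 255)
        return 255;
    return value;
}
```

Since the same source is also ordinary C, it can be checked with gcc on a PC before trying it with taz.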
taz compiles code to P2 assembly, and I use the qasm assembler to produce object files. The object files can be linked together using a linker I wrote called qlink. Object files can be copied together to create larger files, or if they use a .a extension they are treated as libraries.
One useful feature of taz is that it can produce position-independent code, which makes the task of relocating and linking object files easier. It also makes it possible to produce executable files that can be executed anywhere in memory. This is useful for reading executable files from an SD card and running them.
taz isn't intended to be a replacement for gcc. It's just a way to write high-level code until gcc and spin are available. Cygwin or Linux is required to build and run taz. The script file, tzc, is a bash script that is used to compile C programs and produce a P2 executable. Look at the readme.txt file in the top directory of the zip file for more information.
EDIT: I should mention that this is still a work in progress. The taz compiler itself is mostly done, but there is still a lot of work to do on the preprocessor and implementing an optimizer. Taz generates highly unoptimized code. At some point I also intend to write a converter that will convert standard C with structs, unions and typedefs into the simpler taz C language.
Comments
A few issues here. Millions of warnings like:
$ ./buildbin
gencode.c: In function ‘EmitLoadReg’:
gencode.c:225:42: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 Emit2(" mov reg%d, %s\n", (char *)(reg_indx++), varname);
... ...
Build fails with:
symsubs.c:(.text+0xbaa): undefined reference to `floor'
collect2: error: ld returned 1 exit status
mv: cannot stat ‘qasm’: No such file or directory
convert.c: In function ‘main’:
convert.c:67:5: warning: ‘gets’ is deprecated (declared at /usr/include/stdio.h:638) [-Wdeprecated-declarations]
 while (gets(buffer))
... ...
$ taz samples/hello.c
Segmentation fault
I might have time to look into this in the morning.
Does it support in-line assembler ?
Can portions of the ASM out files be pasted into the source, and manually tuned, as a human optimize pass ?
I'm not sure why you are getting a segmentation fault. Maybe you're on a 64-bit machine and things aren't aligned correctly. I should be using stdint to improve portability. I'll look into that also.
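For the "cast to pointer from integer of different size" warnings, one possible stdint-based approach is to route the integer through intptr_t, which is defined to be wide enough to hold a pointer. This is only a sketch: format_two and format_load_reg are hypothetical stand-ins, not the real Emit2 or EmitLoadReg, and the assumption is that the emit helper takes all of its arguments as pointers.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper that, like the emit routine in the warning, receives
   every argument as a pointer. The first argument is really an integer. */
int format_two(char *out, size_t size, const char *fmt, void *arg1, void *arg2)
{
    /* Convert back through intptr_t so the widths match on 64-bit hosts. */
    return snprintf(out, size, fmt, (int)(intptr_t)arg1, (char *)arg2);
}

/* Hypothetical caller: passes a register index through a pointer parameter. */
int format_load_reg(char *out, size_t size, int reg_index, char *varname)
{
    /* int -> intptr_t -> void * avoids the int-to-pointer size mismatch. */
    return format_two(out, size, " mov reg%d, %s\n",
                      (void *)(intptr_t)reg_index, varname);
}
```

A cleaner long-term fix would probably be a variadic emit function, so integers never need to travel as pointers at all.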
@jmg, taz does support an inline function that is used to insert assembly code. However, you have to know which registers the variables are located in to use it. Calling parameters are passed in registers and local variables are stored on the stack. The return value is returned through reg0. Do a grep on "inline" in the samples and libsrc directories and you'll see examples of how inline is used.
Something that I want to tweak in assembly is the sdspi.c driver. It is very slow right now. It takes a few seconds to read a 64K file. One approach is to add inline statements. Another approach is to tweak the assembly output from the C compiler, and then link it with the C code. The tzc script file allows mixing C, assembly and object files. So you could do something like this:
tzc main.c drivers.a otherstuff.o
Not sure if you care about it at this point, but here are my results building and running taz on Mac OS X (10.11.2):
Running ./buildbin gives a lot of warnings, but creates executables:
$ ls -al bin
total 368
drwxr-xr-x@ 11 me me 374 Dec 22 21:52 .
drwxr-xr-x+ 13 me me 442 Dec 22 21:26 ..
-rwxr-xr-x+ 1 me me 8896 Dec 22 21:52 convert
-rwxr-xr-x+ 1 me me 8592 Dec 22 21:52 dumpbin
-rwxr-xr-x+ 1 me me 8732 Dec 22 21:52 dumpobj
-rwxr-xr-x+ 1 me me 8552 Dec 22 21:52 genqasmsym
-rwxr-xr-x+ 1 me me 13688 Dec 22 21:52 prep
-rwxr-xr-x+ 1 me me 46220 Dec 22 21:52 qasm
-rwxr-xr-x+ 1 me me 14260 Dec 22 21:52 qlink
-rwxr-xr-x+ 1 me me 49716 Dec 22 21:52 taz
-rwxr-x--x@ 1 me me 1045 Dec 21 13:11 tzc
Running tzc on hello.c, displays several errors, but does create a .bin file:
$ tzc hello.c
taz -pic -g hello.c
3: ERROR 1
3: puts("Hello World\n");
Encountered unexpected EOF
qasm -o -c hello.s
ERROR: voidP is undefined
ldl voidP ?
EvaluateExpression: Symbol main is undefined
EvaluateExpression: Symbol _hub_pic_table is undefined
ERROR: voidP is undefined
ldl voidP ?
qlink hello.o /Users/altergator/source/taz001/lib/stdio.a /Users/altergator/source/taz001/lib/string.a /Users/altergator/source/taz001/lib/stdlib.a
ERROR: Expected 1032 bytes, but only read 1028 bytes
The created .bin looks similar to the pre-compiled .bin from your .zip, but fails to run correctly in spinsim094.$ hexdump -C hello.bin 00000000 00 f0 0f f2 f8 59 00 56 ff 03 00 af 80 59 04 a6 |.....Y.V.....Y..| 00000010 00 00 20 ff 00 fa 47 f5 00 00 20 ff 00 f6 47 f5 |.. ...G... ...G.| 00000020 58 5a 00 f6 f9 5b 00 f1 2d 5c 48 fb 24 00 90 ad |XZ...[..-\H.$...| 00000030 04 5a 04 f1 2d b0 40 fb f9 b1 00 f1 01 00 00 ff |.Z..-.@.........| 00000040 00 1a 04 f1 01 00 00 ff 00 1c 04 f1 04 5a 04 f1 |.............Z..| 00000050 f8 5d cc f9 57 5a 00 f6 f9 5b 00 f1 2d 5c 48 fb |.]..WZ...[..-\H.| 00000060 04 5a 04 f1 1c 00 90 ad 2d 5e 40 fb f9 5f 00 f1 |.Z......-^@.._..| 00000070 2f 60 48 fb f9 61 00 f1 2f 60 60 fc 04 5a 04 f1 |/`H..a../``..Z..| 00000080 f9 5d cc f9 00 5a 04 f6 f9 5b 00 f1 2d ec a3 fa |.]...Z...[..-...| 00000090 01 5a 60 fd 2d 5c 00 f6 02 5c 64 f0 ff 03 00 ff |.Z`.-\...\d.....| 000000a0 80 5d 04 f1 2e 00 68 fc 03 5a 60 fd 00 00 00 00 |.]....h..Z`.....| 000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 000000f0 00 a4 0c f6 00 a8 54 f2 54 a8 40 f6 01 a4 0c c6 |......T.T.@.....| 00000100 00 aa 54 f2 55 aa 40 f6 01 a4 6c c5 55 aa 40 f6 |..T.U.@...l.U.@.| 00000110 55 a8 10 fd 18 a8 60 fd 54 a8 60 56 31 00 60 fd |U.....`.T.`V1.`.| 00000120 00 a4 0c f6 00 a8 54 f2 54 a8 40 f6 01 a4 0c c6 |......T.T.@.....| 00000130 55 aa 40 f6 55 a8 10 fd 19 a8 60 fd 54 a8 60 56 |U.@.U.....`.T.`V| 00000140 31 00 60 fd 00 00 00 00 00 00 00 00 00 00 00 00 |1.`.............| 00000150 00 00 00 00 00 00 00 00 00 00 00 00 08 04 00 00 |................| 00000160 04 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 000003f0 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 |................| 00000400 01 00 00 f0 00 00 00 00 00 00 00 00 |............| 0000040c
$ hexdump -C hello.bin.ORIG 00000000 00 f0 0f f2 f8 59 00 56 ff 03 00 af 80 59 04 a6 |.....Y.V.....Y..| 00000010 00 00 20 ff 00 fa 47 f5 00 00 20 ff 00 f6 47 f5 |.. ...G... ...G.| 00000020 58 5a 00 f6 f9 5b 00 f1 2d 5c 48 fb 24 00 90 ad |XZ...[..-\H.$...| 00000030 04 5a 04 f1 2d b0 40 fb f9 b1 00 f1 01 00 00 ff |.Z..-.@.........| 00000040 00 1a 04 f1 01 00 00 ff 00 1c 04 f1 04 5a 04 f1 |.............Z..| 00000050 f8 5d cc f9 57 5a 00 f6 f9 5b 00 f1 2d 5c 48 fb |.]..WZ...[..-\H.| 00000060 04 5a 04 f1 1c 00 90 ad 2d 5e 40 fb f9 5f 00 f1 |.Z......-^@.._..| 00000070 2f 60 48 fb f9 61 00 f1 2f 60 60 fc 04 5a 04 f1 |/`H..a../``..Z..| 00000080 f9 5d cc f9 02 00 00 ff 00 5a 04 f6 f9 5b 00 f1 |.].......Z...[..| 00000090 2d ec a3 fa 01 5a 60 fd 2d 5c 00 f6 02 5c 64 f0 |-....Z`.-\...\d.| 000000a0 ff 03 00 ff 80 5d 04 f1 2e 00 68 fc 03 5a 60 fd |.....]....h..Z`.| 000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 000000f0 00 00 00 00 00 a4 0c f6 00 a8 54 f2 54 a8 40 f6 |..........T.T.@.| 00000100 01 a4 0c c6 00 aa 54 f2 55 aa 40 f6 01 a4 6c c5 |......T.U.@...l.| 00000110 55 aa 40 f6 55 a8 10 fd 18 a8 60 fd 54 a8 60 56 |U.@.U.....`.T.`V| 00000120 31 00 60 fd 00 a4 0c f6 00 a8 54 f2 54 a8 40 f6 |1.`.......T.T.@.| 00000130 01 a4 0c c6 55 aa 40 f6 55 a8 10 fd 19 a8 60 fd |....U.@.U.....`.| 00000140 54 a8 60 56 31 00 60 fd 00 00 00 00 00 00 00 00 |T.`V1.`.........| 00000150 00 00 00 00 00 00 00 00 00 00 00 00 a4 05 00 00 |................| 00000160 a0 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00000400 04 58 84 f1 2c ec 63 fc 10 00 10 fe 48 65 6c 6c |.X..,.c.....Hell| 00000410 6f 20 57 6f 72 6c 64 0a 00 00 00 00 f6 5b 00 f6 |o World......[..| 00000420 10 00 10 fe 2c ec 43 fb 04 58 04 f1 2c ec 63 fd |....,.C..X..,.c.| 00000430 00 00 00 00 04 58 84 f1 2c ec 63 fc 2d 5c 00 f6 |.....X..,.c.-\..| 00000440 2e 5c 00 fb 00 5c 0c f2 30 00 90 ad 2d 5c 00 f6 
|.\...\..0...-\..| 00000450 2e 5e 00 f6 01 5e 04 f1 2f 5a 00 f6 2e 5c 00 fb |.^...^../Z...\..| 00000460 04 58 84 f1 2c 5a 60 fc 2e 5a 00 f6 c4 00 10 fe |.X..,Z`..Z......| 00000470 2c 5a 40 fb 04 58 04 f1 c0 ff 9f fd 2c ec 43 fb |,Z@..X......,.C.| 00000480 04 58 04 f1 2c ec 63 fd 00 00 00 00 04 58 84 f1 |.X..,.c......X..| 00000490 2c ec 63 fc 04 58 84 f1 00 5d 04 f6 2d 60 00 f6 |,.c..X...]..-`..| 000004a0 30 5c 40 f5 2e 5a 00 f6 01 5c 04 f6 2d 60 00 f6 |0\@..Z...\..-`..| 000004b0 2e 60 60 f0 30 5a 00 f6 0a 5c 04 f6 00 5e 04 f6 |.``.0Z...\...^..| 000004c0 2c 5e 00 f1 2f 5c 60 fc 1a a6 60 fd 00 60 04 f6 |,^../\`...`..`..| 000004d0 2c 60 00 f1 30 5c 40 fb 2e 5e 00 f6 01 5e 84 f1 |,`..0\@..^...^..| 000004e0 30 5e 60 fc 00 5c 0c f2 2c 00 90 ad 2d 5c 00 f6 |0^`..\..,...-\..| 000004f0 1e 5e 04 f6 2f 5c 60 f0 2e fa 03 f6 b2 a7 84 fa |.^../\`.........| 00000500 24 22 60 fd 01 5c 04 f6 2d 60 00 f6 2e 60 c0 f0 |$"`..\..-`...`..| 00000510 30 5a 00 f6 b4 ff 9f fd 03 00 00 ff d0 5d 04 f6 |0Z...........]..| 00000520 28 5c 60 fd 04 58 04 f1 2c ec 43 fb 04 58 04 f1 |(\`..X..,.C..X..| 00000530 2c ec 63 fd 04 58 84 f1 2c ec 63 fc 2d 5c 00 f6 |,.c..X..,.c.-\..| 00000540 0a 5e 04 f6 2f 5c 08 f2 01 5c 04 a6 00 5c 04 56 |.^../\...\...\.V| 00000550 00 5c 0c f2 1c 00 90 ad 0d 5c 04 f6 04 58 84 f1 |.\.......\...X..| 00000560 2c 5a 60 fc 2e 5a 00 f6 20 ff 1f fe 2c 5a 40 fb |,Z`..Z.. ...,Z@.| 00000570 04 58 04 f1 2d 5c 00 f6 04 58 84 f1 2c 5a 60 fc |.X..-\...X..,Z`.| 00000580 2e 5a 00 f6 04 ff 1f fe 2c 5a 40 fb 04 58 04 f1 |.Z......,Z@..X..| 00000590 2c ec 43 fb 04 58 04 f1 2c ec 63 fd 00 00 00 00 |,.C..X..,.c.....| 000005a0 00 00 00 00 00 00 00 00 |........| 000005a8
I've attached the ./buildbin output, which may give some clues to any issues.
I hope this helps...
dgately
I also noticed on my Linux machine that the script files didn't have the executable bit set. I've added a script file called chmods that will fix this. Of course, you will need to do a "chmod +x chmods" before running the chmods script.
github is waiting for Taz
I'll give it a spin later.
$ taz -pic -g ../samples/hello.c
3: ERROR 1
3: puts("Hello World\n");
Encountered unexpected EOF
Something to do with Mac OS X using just CRs and not CRLF? Also, I noticed that your filename buffer is only 40 characters long. It was trying to include a long path like: "/Users/myUserName/source/tar002/sample/hello.c"
char fname1[120]; // mod from [40] to hold longer pathname, but needs a better solution
dgately
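As for a "better solution" than a larger fixed-size array, one common option is to size the buffer from the path itself. The sketch below is an illustration only; copy_pathname is a made-up name and nothing here is taken from the taz sources.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of path, or NULL if allocation fails.
   The caller is responsible for calling free() on the result. */
char *copy_pathname(const char *path)
{
    size_t len = strlen(path) + 1;   /* +1 for the terminating NUL */
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, path, len);
    return copy;
}
```

This removes the length limit entirely, at the cost of having to free the buffer when the file is closed.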
./taz ../samples/hello.c
1: void main(void)
<void><main><(><void><)>
2: {
<{>
3: puts("Hello World\n");
<puts><(><"Hello World\n"><)><;>
4: }
<}>
5: EOF
<EOF>
Thanks for finding the issue with pathnames. I assumed the filename would only include the name and not the entire path. I'll increase the size of fname1[] like you suggested.
$ taz hello.c
1: void main(void)
<void? @P?><main><(><void><)>
2: {
<{>
3: puts("Hello World\n");
<putsP3@P?><(><"Hello World\n"><)><;>
3: ERROR 1
3: puts("Hello World\n");
4: }
<}>
5: EOF
<EOF>
Encountered unexpected EOF
dgately
$ ls -al | grep hello.c
-rwxr-xr-x@ 1 altergator admin 47 Dec 21 13:46 hello.c
$ hexdump -C hello.c
00000000  76 6f 69 64 20 6d 61 69  6e 28 76 6f 69 64 29 0a  |void main(void).|
00000010  7b 0a 20 20 20 20 70 75  74 73 28 22 48 65 6c 6c  |{.    puts("Hell|
00000020  6f 20 57 6f 72 6c 64 5c  6e 22 29 3b 0a 7d 0a     |o World\n");.}.|
0000002f
OpenFiles: hello.c
Tokenize: len = 4, str = void, sptr = 80068510
Tokenize: len = 4, str = main, sptr = 80068528
Tokenize: len = 1, str = (, sptr = 80068540
Tokenize: len = 4, str = void, sptr = 80068550
Tokenize: len = 1, str = ), sptr = 80068568
1: void main(void)
<void><main><(><void><)>
Tokenize: len = 1, str = {, sptr = 80068510
2: {
<{>
Tokenize: len = 4, str = puts, sptr = 80068510
Tokenize: len = 1, str = (, sptr = 80068528
Tokenize: len = 15, str = "Hello World\n", sptr = 80068538
Tokenize: len = 1, str = ), sptr = 80068558
Tokenize: len = 1, str = ;, sptr = 80068568
3: puts("Hello World\n");
<puts><(><"Hello World\n"><)><;>
Tokenize: len = 1, str = }, sptr = 80068510
4: }
<}>
Tokenize: len = 3, str = EOF, sptr = 80068510
5: EOF
<EOF>
$ git pull
is so much easier than jerking around downloading and unpacking zips or dropping odd files into place.
Just for Christmas ?
That's it... works now!
Thanks,
dgately
@dgately, I'm glad that solved the problem. I'll do another update soon -- maybe on github.
Hi Dave, nice work as usual. My team uses SmartGit as a GIT client, works on all platforms, easy to use and free for non-commercial use!
This looks very interesting. Can you tell us a bit about your ABI? How do you pass parameters to functions? How do you handle register allocation?
Thanks,
David
int testfunc(char *ptr1, char *ptr2, int size)
{
    int x, y, z;
    return 10;
}
The variables x, y and z are stored on the stack. Functions are called using CALLD, such as "calld adra, #testfunc". On entry into the function the return address in adra is stored on the stack, then space is allocated on the stack for the local variables. On return, the return value is put in reg0, the local variable stack space is added back to the stack, and the return address is popped off of the stack.
Another thing that has to be done when calling a function is to save all of the caller's working registers and its own function parameters on the stack, and then move the calling parameters down to reg0. On return, the working registers and the function's calling parameters are restored from the stack. I may change this later on so that parameters are passed on the stack, but for now they are passed through registers.
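To make the sequence of steps easier to follow, here is a toy model of that calling convention written in plain C. It is a sketch under several assumptions and is not generated by taz: the register count, the stack growth direction, the use of ints to stand in for pointers, and the point at which the caller copies the result out of reg0 are all guesses made for illustration.

```c
#define NUM_REGS   8    /* assumed number of working registers */
#define STACK_SIZE 64

int reg[NUM_REGS];      /* reg[0] models reg0, reg[1] models reg1, ... */
int adra;               /* models the return-address register set by CALLD */
int stack[STACK_SIZE];
int sp;                 /* next free slot; growth direction is an assumption */

static void push(int value) { stack[sp++] = value; }
static int pop(void) { return stack[--sp]; }

/* Callee side of testfunc: parameters arrive in reg0..reg2. */
static void testfunc_body(void)
{
    push(adra);         /* on entry, store the return address on the stack */
    sp += 3;            /* allocate stack space for locals x, y and z */
    reg[0] = 10;        /* the return value goes in reg0 */
    sp -= 3;            /* give the local variable space back */
    adra = pop();       /* pop the return address */
}

/* Caller side: save registers, load parameters, call, restore. */
int call_testfunc(int ptr1, int ptr2, int size)
{
    int i;
    int result;

    for (i = 0; i < NUM_REGS; i++)
        push(reg[i]);               /* save working registers and own parameters */
    reg[0] = ptr1;                  /* move calling parameters down to reg0.. */
    reg[1] = ptr2;
    reg[2] = size;
    adra = 1234;                    /* arbitrary token standing in for "calld adra, #testfunc" */
    testfunc_body();
    result = reg[0];                /* pick up the return value before restoring */
    for (i = NUM_REGS - 1; i >= 0; i--)
        reg[i] = pop();             /* restore the caller's registers */
    return result;
}
```

After call_testfunc returns, sp is back where it started and the caller's registers hold their earlier values, which is the invariant the description above implies.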
Can your compiler compile itself? If not, are you considering that? That's one element of "self hosting" that we haven't talked about. For a true self-hosted environment, you should be able to rebuild the entire environment on the target itself. The old Small C compiler could do that although it paid a pretty high price because it had to implement things like symbol table structures manually due to a lack of support for struct.
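For anyone wondering what "manually" means here, a symbol table without struct can be built from parallel one-dimensional arrays. The sketch below is written in that spirit using only int and char; the names and sizes are invented for illustration, it is not taken from Small C or taz, and it has not been compiled with taz.

```c
#define MAX_SYMBOLS 16
#define NAME_LEN    8   /* names longer than NAME_LEN - 1 are truncated */

/* Parallel arrays stand in for an array of { name, value } records.
   Slot i of the name table starts at index i * NAME_LEN. */
char sym_names[MAX_SYMBOLS * NAME_LEN];
int  sym_values[MAX_SYMBOLS];
int  sym_count;

/* Return 1 if the NUL-terminated strings at slot and name are equal. */
int name_matches(char *slot, char *name)
{
    while (*slot != 0 && *slot == *name)
    {
        slot++;
        name++;
    }
    return *slot == *name;
}

/* Return the index of name in the table, or -1 if it is not present. */
int sym_find(char *name)
{
    int i;

    for (i = 0; i < sym_count; i++)
        if (name_matches(&sym_names[i * NAME_LEN], name))
            return i;
    return -1;
}

/* Append a symbol and return its index, or -1 if the table is full. */
int sym_add(char *name, int value)
{
    int i;
    char *slot;

    if (sym_count >= MAX_SYMBOLS)
        return -1;
    slot = &sym_names[sym_count * NAME_LEN];
    for (i = 0; i < NAME_LEN - 1 && name[i] != 0; i++)
        slot[i] = name[i];
    slot[i] = 0;
    sym_values[sym_count] = value;
    sym_count++;
    return sym_count - 1;
}
```

The price is that every "field access" becomes index arithmetic, which is the overhead the Small C approach had to accept.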
At least I got hello.c to compile.
Glad to see that at least some of the examples compile and run under GCC on my PC. So it is actually C syntax.
$ ./buildbin
qasm.c: In function ‘PrintUnexpected’:
qasm.c:64:58: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 PrintError("ERROR: (%d) Unexpected symbol \"%s\"\n", (void *)num, str);
 ^
qasm.c: In function ‘EncodeAddressField’:
qasm.c:226:57: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 PrintError("ERROR: Bad symbol type - %d\n", (void *)type, 0);
 ^
qasm.c: In function ‘ParseDat’:
qasm.c:699:90: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 PrintError("ERROR: ORGH address %x less than previous address %x\n", (void *)new_hub_addr, (void *)hub_addr);
 ^
qasm.c:699:112: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 PrintError("ERROR: ORGH address %x less than previous address %x\n", (void *)new_hub_addr, (void *)hub_addr);
 ^
strsubs.c: In function ‘Tokenize’:
strsubs.c:63:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 len = (int)ptr - (int)str + 1;
 ^
strsubs.c:63:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 len = (int)ptr - (int)str + 1;
 ^
strsubs.c:81:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 len = (int)ptr - (int)str;
 ^
strsubs.c:81:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 len = (int)ptr - (int)str;
 ^
strsubs.c: In function ‘ReadString’:
strsubs.c:280:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 if (!unicode) count = (int)fgets(buf, size, infile);
@Heater, I'm glad that it's working for you. Thanks for posting the build output. I'll remove those warnings in the next update. Eventually the example programs should become fully standard as I add more features to the taz compiler.
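For the pointer-to-int warnings in the build output, one way they could be removed is to subtract the pointers directly and to test the fgets result against NULL instead of casting it. This is a sketch of the idea, not the actual change: token_length and read_line are hypothetical helpers, and the real Tokenize and ReadString functions may need something different.

```c
#include <stddef.h>
#include <stdio.h>

/* Length of a token that spans first..last inclusive.
   Subtracting two char pointers yields a ptrdiff_t, so no cast through
   int is needed and the result is correct on 64-bit hosts. */
int token_length(const char *first, const char *last)
{
    ptrdiff_t diff = last - first;

    return (int)diff + 1;
}

/* Returns 1 if a line was read into buf, 0 on end of file or error.
   fgets returns a pointer, so compare it with NULL rather than casting. */
int read_line(char *buf, int size, FILE *infile)
{
    return fgets(buf, size, infile) != NULL;
}
```

Both forms compile without the -Wpointer-to-int-cast warning because no pointer is ever converted to a narrower integer type.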
Sort of like the C preprocessor but more like the original C++ implementation(s).
Hope goto does not sneak in there