Taz C Compiler

Dave Hein Posts: 5,898
edited 2015-12-23 - 01:19:36 in Propeller 2
I've been playing around with writing a C compiler that I call taz. It implements a subset of the C language that only contains the types int, short and char. They can be signed or unsigned. It also supports pointers and one-dimensional arrays. It supports the main C features of if, else, while and for. It currently doesn't support structs or unions, and it doesn't support do, switch, case, goto, enum or typedef.

taz compiles code to P2 assembly, and I use the qasm assembler to produce object files. The object files can be linked together using a linker I wrote called qlink. Object files can be copied together to create larger files, or if they use a .a extension they are treated as libraries.

One useful feature of taz is that it can produce position-independent code, which makes the task of relocating and linking object files easier. It also makes it possible to produce executable files that can be executed anywhere in memory. This is useful for reading executable files from an SD card and running them.

taz isn't intended to be a replacement for gcc. It's just a way to write high-level code until gcc and spin are available. Cygwin or Linux is required to build and run taz. The script file, tzc, is a bash script that is used to compile C programs and produce a P2 executable. Look at the readme.txt file in the top directory of the zip file for more information.

EDIT: I should mention that this is still a work in progress. The taz compiler itself is mostly done, but there is still a lot of work to do on the preprocessor and implementing an optimizer. Taz generates highly unoptimized code. At some point I also intend to write a converter that will convert standard C with structs, unions and typedefs into the simpler taz C language.

Comments

  • Cool.

    A few issues here. Millions of warnings like:
    $ ./buildbin 
    gencode.c: In function ‘EmitLoadReg’:
    gencode.c:225:42: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
         Emit2("        mov     reg%d, %s\n", (char *)(reg_indx++), varname);
    ...
    ...
    
    Build fails with:
    symsubs.c:(.text+0xbaa): undefined reference to `floor'
    collect2: error: ld returned 1 exit status
    mv: cannot stat ‘qasm’: No such file or directory
    convert.c: In function ‘main’:
    convert.c:67:5: warning: ‘gets’ is deprecated (declared at /usr/include/stdio.h:638) [-Wdeprecated-declarations]
         while (gets(buffer))
    ...
    ...
    
  • Heater. Posts: 21,213
    edited 2015-12-23 - 01:57:40
    But wait, there is a taz binary in bin. But:
    $ taz samples/hello.c
    Segmentation fault
    

    I might have time to look into this in the morning.
  • jmg Posts: 13,453
    Dave Hein wrote: »
    I've been playing around with writing a C compiler that I call taz. It implements a subset of the C language that only contains the types int, short and char. They can be signed or unsigned. It also supports pointers and one-dimensional arrays. It supports the main C features of if, else, while and for. It currently doesn't support structs or unions, and it doesn't support do, switch, case, goto, enum or typedef.
    Sounds nice.
    Does it support in-line assembler ?
    Can portions of the ASM out files be pasted into the source, and manually tuned, as a human optimize pass ?

  • @Heater, my gcc isn't generating the warnings, so its warning level must be set differently. I'll add -Wall to the build and clean up all the warnings. You should be able to add -lm to the build line to fix the undefined reference to floor. I'll add that also.

    I'm not sure why you are getting a segmentation fault. Maybe you're on a 64-bit machine and things aren't aligned correctly. I should be using stdint.h to improve portability. I'll look into that also.

    @jmg, taz does support an inline function that is used to insert assembly code. However, you have to know which registers the variables are located in to use it. Calling parameters are passed in registers and local variables are stored on the stack. The return value is returned through reg0. Do a grep on "inline" in the samples and libsrc directories and you'll see examples of how inline is used.

    Something that I want to tweak in assembly is the sdspi.c driver. It is very slow right now. It takes a few seconds to read a 64K file. One approach is to add inline statements. Another approach is to tweak the assembly output from the C compiler, and then link it with the C code. The tzc script file allows mixing C, assembly and object files. So you could do something like this:

    tzc main.c drivers.a otherstuff.o

  • Dave,

    Not sure if you care about it at this point but here's my results building and exec'ing taz on Mac OS X (10.11.2):

    Running ./buildbin gives a lot of warnings, but creates executables:
    $ ls -al bin
    total 368
    drwxr-xr-x@ 11 me  me    374 Dec 22 21:52 .
    drwxr-xr-x+ 13 me  me    442 Dec 22 21:26 ..
    -rwxr-xr-x+  1 me  me   8896 Dec 22 21:52 convert
    -rwxr-xr-x+  1 me  me   8592 Dec 22 21:52 dumpbin
    -rwxr-xr-x+  1 me  me   8732 Dec 22 21:52 dumpobj
    -rwxr-xr-x+  1 me  me   8552 Dec 22 21:52 genqasmsym
    -rwxr-xr-x+  1 me  me  13688 Dec 22 21:52 prep
    -rwxr-xr-x+  1 me  me  46220 Dec 22 21:52 qasm
    -rwxr-xr-x+  1 me  me  14260 Dec 22 21:52 qlink
    -rwxr-xr-x+  1 me  me  49716 Dec 22 21:52 taz
    -rwxr-x--x@  1 me  me   1045 Dec 21 13:11 tzc
    

    Running tzc on hello.c, displays several errors, but does create a .bin file:
    $ tzc hello.c
    taz -pic -g hello.c
    3: ERROR 1
    3:     puts("Hello World\n");
    Encountered unexpected EOF
    qasm -o -c hello.s
    ERROR: voidP is undefined
      ldl   voidP	?
    EvaluateExpression: Symbol main is undefined
    EvaluateExpression: Symbol _hub_pic_table is undefined
    ERROR: voidP is undefined
      ldl   voidP	?
    qlink hello.o /Users/altergator/source/taz001/lib/stdio.a /Users/altergator/source/taz001/lib/string.a /Users/altergator/source/taz001/lib/stdlib.a
    ERROR: Expected 1032 bytes, but only read 1028 bytes
    
    The created .bin looks similar to the pre-compiled .bin from your .zip, but fails to run correctly in spinsim094.
    $ hexdump -C hello.bin
    00000000  00 f0 0f f2 f8 59 00 56  ff 03 00 af 80 59 04 a6  |.....Y.V.....Y..|
    00000010  00 00 20 ff 00 fa 47 f5  00 00 20 ff 00 f6 47 f5  |.. ...G... ...G.|
    00000020  58 5a 00 f6 f9 5b 00 f1  2d 5c 48 fb 24 00 90 ad  |XZ...[..-\H.$...|
    00000030  04 5a 04 f1 2d b0 40 fb  f9 b1 00 f1 01 00 00 ff  |.Z..-.@.........|
    00000040  00 1a 04 f1 01 00 00 ff  00 1c 04 f1 04 5a 04 f1  |.............Z..|
    00000050  f8 5d cc f9 57 5a 00 f6  f9 5b 00 f1 2d 5c 48 fb  |.]..WZ...[..-\H.|
    00000060  04 5a 04 f1 1c 00 90 ad  2d 5e 40 fb f9 5f 00 f1  |.Z......-^@.._..|
    00000070  2f 60 48 fb f9 61 00 f1  2f 60 60 fc 04 5a 04 f1  |/`H..a../``..Z..|
    00000080  f9 5d cc f9 00 5a 04 f6  f9 5b 00 f1 2d ec a3 fa  |.]...Z...[..-...|
    00000090  01 5a 60 fd 2d 5c 00 f6  02 5c 64 f0 ff 03 00 ff  |.Z`.-\...\d.....|
    000000a0  80 5d 04 f1 2e 00 68 fc  03 5a 60 fd 00 00 00 00  |.]....h..Z`.....|
    000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    000000f0  00 a4 0c f6 00 a8 54 f2  54 a8 40 f6 01 a4 0c c6  |......T.T.@.....|
    00000100  00 aa 54 f2 55 aa 40 f6  01 a4 6c c5 55 aa 40 f6  |..T.U.@...l.U.@.|
    00000110  55 a8 10 fd 18 a8 60 fd  54 a8 60 56 31 00 60 fd  |U.....`.T.`V1.`.|
    00000120  00 a4 0c f6 00 a8 54 f2  54 a8 40 f6 01 a4 0c c6  |......T.T.@.....|
    00000130  55 aa 40 f6 55 a8 10 fd  19 a8 60 fd 54 a8 60 56  |U.@.U.....`.T.`V|
    00000140  31 00 60 fd 00 00 00 00  00 00 00 00 00 00 00 00  |1.`.............|
    00000150  00 00 00 00 00 00 00 00  00 00 00 00 08 04 00 00  |................|
    00000160  04 04 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    000003f0  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 f0  |................|
    00000400  01 00 00 f0 00 00 00 00  00 00 00 00              |............|
    0000040c
    
    $ hexdump -C hello.bin.ORIG 
    00000000  00 f0 0f f2 f8 59 00 56  ff 03 00 af 80 59 04 a6  |.....Y.V.....Y..|
    00000010  00 00 20 ff 00 fa 47 f5  00 00 20 ff 00 f6 47 f5  |.. ...G... ...G.|
    00000020  58 5a 00 f6 f9 5b 00 f1  2d 5c 48 fb 24 00 90 ad  |XZ...[..-\H.$...|
    00000030  04 5a 04 f1 2d b0 40 fb  f9 b1 00 f1 01 00 00 ff  |.Z..-.@.........|
    00000040  00 1a 04 f1 01 00 00 ff  00 1c 04 f1 04 5a 04 f1  |.............Z..|
    00000050  f8 5d cc f9 57 5a 00 f6  f9 5b 00 f1 2d 5c 48 fb  |.]..WZ...[..-\H.|
    00000060  04 5a 04 f1 1c 00 90 ad  2d 5e 40 fb f9 5f 00 f1  |.Z......-^@.._..|
    00000070  2f 60 48 fb f9 61 00 f1  2f 60 60 fc 04 5a 04 f1  |/`H..a../``..Z..|
    00000080  f9 5d cc f9 02 00 00 ff  00 5a 04 f6 f9 5b 00 f1  |.].......Z...[..|
    00000090  2d ec a3 fa 01 5a 60 fd  2d 5c 00 f6 02 5c 64 f0  |-....Z`.-\...\d.|
    000000a0  ff 03 00 ff 80 5d 04 f1  2e 00 68 fc 03 5a 60 fd  |.....]....h..Z`.|
    000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    000000f0  00 00 00 00 00 a4 0c f6  00 a8 54 f2 54 a8 40 f6  |..........T.T.@.|
    00000100  01 a4 0c c6 00 aa 54 f2  55 aa 40 f6 01 a4 6c c5  |......T.U.@...l.|
    00000110  55 aa 40 f6 55 a8 10 fd  18 a8 60 fd 54 a8 60 56  |U.@.U.....`.T.`V|
    00000120  31 00 60 fd 00 a4 0c f6  00 a8 54 f2 54 a8 40 f6  |1.`.......T.T.@.|
    00000130  01 a4 0c c6 55 aa 40 f6  55 a8 10 fd 19 a8 60 fd  |....U.@.U.....`.|
    00000140  54 a8 60 56 31 00 60 fd  00 00 00 00 00 00 00 00  |T.`V1.`.........|
    00000150  00 00 00 00 00 00 00 00  00 00 00 00 a4 05 00 00  |................|
    00000160  a0 05 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00000400  04 58 84 f1 2c ec 63 fc  10 00 10 fe 48 65 6c 6c  |.X..,.c.....Hell|
    00000410  6f 20 57 6f 72 6c 64 0a  00 00 00 00 f6 5b 00 f6  |o World......[..|
    00000420  10 00 10 fe 2c ec 43 fb  04 58 04 f1 2c ec 63 fd  |....,.C..X..,.c.|
    00000430  00 00 00 00 04 58 84 f1  2c ec 63 fc 2d 5c 00 f6  |.....X..,.c.-\..|
    00000440  2e 5c 00 fb 00 5c 0c f2  30 00 90 ad 2d 5c 00 f6  |.\...\..0...-\..|
    00000450  2e 5e 00 f6 01 5e 04 f1  2f 5a 00 f6 2e 5c 00 fb  |.^...^../Z...\..|
    00000460  04 58 84 f1 2c 5a 60 fc  2e 5a 00 f6 c4 00 10 fe  |.X..,Z`..Z......|
    00000470  2c 5a 40 fb 04 58 04 f1  c0 ff 9f fd 2c ec 43 fb  |,Z@..X......,.C.|
    00000480  04 58 04 f1 2c ec 63 fd  00 00 00 00 04 58 84 f1  |.X..,.c......X..|
    00000490  2c ec 63 fc 04 58 84 f1  00 5d 04 f6 2d 60 00 f6  |,.c..X...]..-`..|
    000004a0  30 5c 40 f5 2e 5a 00 f6  01 5c 04 f6 2d 60 00 f6  |0\@..Z...\..-`..|
    000004b0  2e 60 60 f0 30 5a 00 f6  0a 5c 04 f6 00 5e 04 f6  |.``.0Z...\...^..|
    000004c0  2c 5e 00 f1 2f 5c 60 fc  1a a6 60 fd 00 60 04 f6  |,^../\`...`..`..|
    000004d0  2c 60 00 f1 30 5c 40 fb  2e 5e 00 f6 01 5e 84 f1  |,`..0\@..^...^..|
    000004e0  30 5e 60 fc 00 5c 0c f2  2c 00 90 ad 2d 5c 00 f6  |0^`..\..,...-\..|
    000004f0  1e 5e 04 f6 2f 5c 60 f0  2e fa 03 f6 b2 a7 84 fa  |.^../\`.........|
    00000500  24 22 60 fd 01 5c 04 f6  2d 60 00 f6 2e 60 c0 f0  |$"`..\..-`...`..|
    00000510  30 5a 00 f6 b4 ff 9f fd  03 00 00 ff d0 5d 04 f6  |0Z...........]..|
    00000520  28 5c 60 fd 04 58 04 f1  2c ec 43 fb 04 58 04 f1  |(\`..X..,.C..X..|
    00000530  2c ec 63 fd 04 58 84 f1  2c ec 63 fc 2d 5c 00 f6  |,.c..X..,.c.-\..|
    00000540  0a 5e 04 f6 2f 5c 08 f2  01 5c 04 a6 00 5c 04 56  |.^../\...\...\.V|
    00000550  00 5c 0c f2 1c 00 90 ad  0d 5c 04 f6 04 58 84 f1  |.\.......\...X..|
    00000560  2c 5a 60 fc 2e 5a 00 f6  20 ff 1f fe 2c 5a 40 fb  |,Z`..Z.. ...,Z@.|
    00000570  04 58 04 f1 2d 5c 00 f6  04 58 84 f1 2c 5a 60 fc  |.X..-\...X..,Z`.|
    00000580  2e 5a 00 f6 04 ff 1f fe  2c 5a 40 fb 04 58 04 f1  |.Z......,Z@..X..|
    00000590  2c ec 43 fb 04 58 04 f1  2c ec 63 fd 00 00 00 00  |,.C..X..,.c.....|
    000005a0  00 00 00 00 00 00 00 00                           |........|
    000005a8
    

    I've attached the ./buildbin output, which may give some clues to any issues.

    I hope this helps...

    dgately
    Livermore, CA (50 miles SE of San Francisco)
  • I tried taz on my Linux box, and I also got a lot of warnings. I've cleaned up the code so that I no longer get any warnings. I also removed all of the "(char *)intval" type casts that may be causing a problem with 64-bit systems. Please try this version and see if it works better.

    I also noticed on my Linux machine that the script files didn't have the executable bit set. I've added a script file called chmods that will fix this. Of course, you will need to do a "chmod +x chmods" before running the chmods script.
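
The "(char *)intval" casts mentioned above are what triggered the -Wint-to-pointer-cast warnings: on a 64-bit system an int and a pointer have different sizes. One common way to avoid such casts is to pass the values through real varargs. The sketch below assumes a formatting helper that writes into a caller-supplied buffer. The name emit_fmt and its signature are hypothetical and are not taken from the taz sources.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for a fixed-arity Emit2(fmt, char *, char *).
   The arguments travel as real varargs, so an int is passed as an int
   and is never cast to a pointer.
   Returns the value vsnprintf returns: the number of characters the
   formatted text needs, not counting the terminating NUL. */
int emit_fmt(char *out, size_t outsize, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(out, outsize, fmt, ap);
    va_end(ap);
    return n;
}
```

A call such as emit_fmt(buf, sizeof buf, "        mov     reg%d, %s\n", reg_indx, varname) then needs no cast at all.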
  • Any reason for not using Makefiles to build this and save all that chmod business?

    github is waiting for Taz :)

    I'll give it a spin later.
  • Ah yes, the wonderful make syntax. I always have to give myself a refresher course every time I tackle a makefile. I'll look into it. However, tzc is a script file, so that one needs the execute bit set anyhow. Of course, I could make tzc an executable also. It was just easier for me to write it as a bash script.
  • I can't wait to try this! Very cool Dave.
    Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
    Parallax colors simplified: https://forums.parallax.com/discussion/123709/commented-graphics-demo-spin
  • dgately Posts: 1,095
    edited 2015-12-24 - 00:50:24
    taz002 still returns:
    $ taz -pic -g ../samples/hello.c
    3: ERROR 1
    3:     puts("Hello World\n");
    Encountered unexpected EOF
    
    Something to do with Mac OS X using just CRs? and not LFCR?

    Also, noticed that your filename buffer is only 40 characters long. It was trying to include a long path like: "/Users/myUserName/source/tar002/sample/hello.c"
        char fname1[120];        // mod from [40] to hold longer pathname, but needs a better solution
    

    dgately
    Livermore, CA (50 miles SE of San Francisco)
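
Raising the array from 40 to 120 characters only moves the limit. A minimal sketch of one "better solution" is to size the copy from the path itself. The helper name dup_filename is hypothetical and is not part of taz.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of a path of any length,
   or NULL if the input is NULL or the allocation fails.
   The caller releases the copy with free(). */
char *dup_filename(const char *path)
{
    size_t len;
    char *copy;

    if (path == NULL)
        return NULL;
    len = strlen(path) + 1;      /* include the terminating NUL */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, path, len);
    return copy;
}
```

The trade-off is that every copy now has to be freed, which a fixed array avoids.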
  • Dave Hein Posts: 5,898
    edited 2015-12-24 - 01:57:37
    I think the Mac OS X uses the same line terminator as Linux, which is just a LF. However, taz works with MS files as well that terminate with CRLF. The terminator shouldn't matter since I remove any CR and LF characters at the end of the line before processing it. Could you use the attached version of ctokens.c with your build? It will print out the line that is read in, and also the tokens that are extracted. It should produce an output like this:
    ./taz ../samples/hello.c
    1: void main(void)
    <void><main><(><void><)>
    2: {
    <{>
    3:     puts("Hello World\n");
    <puts><(><"Hello World\n"><)><;>
    4: }
    <}>
    5: EOF
    <EOF>
    
    Thanks for finding the issue with pathnames. I assumed the filename would only include the name and not the entire path. I'll increase the size of fname1[] like you suggested.
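
Removing CR and LF characters from the end of each line, as described above, can be sketched as follows. This assumes the line sits in a NUL-terminated buffer (for example after fgets). The function name is hypothetical and this is not the code in ctokens.c.

```c
#include <string.h>

/* Remove any run of trailing '\r' and '\n' characters in place, so that
   LF, CRLF and CR terminated lines all look the same to later stages.
   Returns the new length of the string. */
size_t strip_line_end(char *line)
{
    size_t len = strlen(line);

    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
    return len;
}
```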
  • Definitely getting a different output than you are...
    $ taz hello.c
    1: void main(void)
    <void?
          @P?><main><(><void><)>
    2: {
    <{>
    3:     puts("Hello World\n");
    <putsP3@P?><(><"Hello World\n"><)><;>
    3: ERROR 1
    3:     puts("Hello World\n");
    4: }
    <}>
    5: EOF
    <EOF>
    Encountered unexpected EOF
    

    dgately
    Livermore, CA (50 miles SE of San Francisco)
  • It seems like your version of hello.c has extra characters in it. They appear immediately after two words associated with C -- "void" and "puts". Did you use an editor to view hello.c before compiling it? If so, maybe the editor added the extra characters. Can you look at the size of hello.c by doing an "ls -l hello.c"? It should be 47 bytes long. If it's not 47, could you try extracting hello.c from the ZIP file again, and then check the size again?
  • Nope... That's what I thought as well, but here's hello.c:
    $ ls -al | grep hello.c
    -rwxr-xr-x@  1 altergator  admin     47 Dec 21 13:46 hello.c
    
    $ hexdump -C hello.c
    00000000  76 6f 69 64 20 6d 61 69  6e 28 76 6f 69 64 29 0a  |void main(void).|
    00000010  7b 0a 20 20 20 20 70 75  74 73 28 22 48 65 6c 6c  |{.    puts("Hell|
    00000020  6f 20 57 6f 72 6c 64 5c  6e 22 29 3b 0a 7d 0a     |o World\n");.}. |
    0000002f
    
    Livermore, CA (50 miles SE of San Francisco)
  • Dave Hein Posts: 5,898
    edited 2015-12-24 - 18:11:55
    So there must be something going wrong in the tokenizer. Can you please try this version of ctokens.c? It should produce an output like this:
    OpenFiles: hello.c
    Tokenize: len = 4, str = void, sptr = 80068510
    Tokenize: len = 4, str = main, sptr = 80068528
    Tokenize: len = 1, str = (, sptr = 80068540
    Tokenize: len = 4, str = void, sptr = 80068550
    Tokenize: len = 1, str = ), sptr = 80068568
    1: void main(void)
    <void><main><(><void><)>
    Tokenize: len = 1, str = {, sptr = 80068510
    2: {
    <{>
    Tokenize: len = 4, str = puts, sptr = 80068510
    Tokenize: len = 1, str = (, sptr = 80068528
    Tokenize: len = 15, str = "Hello World\n", sptr = 80068538
    Tokenize: len = 1, str = ), sptr = 80068558
    Tokenize: len = 1, str = ;, sptr = 80068568
    3:     puts("Hello World\n");
    <puts><(><"Hello World\n"><)><;>
    Tokenize: len = 1, str = }, sptr = 80068510
    4: }
    <}>
    Tokenize: len = 3, str = EOF, sptr = 80068510
    5: EOF
    <EOF>
    
  • I think I may have found the problem. I had made some assumptions about the size of StringT that may not be true in certain systems, such as 64-bit systems. Please try this version of ctokens.c.
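
The thread does not show the actual fix, so the following is only a general illustration of this class of bug. Any record that contains a pointer changes size between 32-bit and 64-bit builds, so a hard-coded header size that works on one breaks on the other. The TokenRec layout and new_token below are hypothetical. The real StringT in taz may look quite different.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token record (not the real StringT). The pointer member
   is typically 4 bytes on a 32-bit build and 8 bytes on a 64-bit build,
   so the header size is not the same on every system. */
typedef struct TokenRec {
    struct TokenRec *next;
    int len;
    char str[];             /* token text follows the header (C99) */
} TokenRec;

/* Allocate a record holding the first len characters of text.
   The header size comes from offsetof, never from a literal byte count. */
TokenRec *new_token(const char *text, int len)
{
    TokenRec *t = malloc(offsetof(TokenRec, str) + (size_t)len + 1);

    if (t == NULL)
        return NULL;
    t->next = NULL;
    t->len = len;
    memcpy(t->str, text, (size_t)len);
    t->str[len] = '\0';
    return t;
}
```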
  • Is it OK if you put this code on github or bit bucket?

    $ git pull

    is so much easier than jerking around downloading and unpacking zips or dropping odd files into place.

    Just for Christmas ?

    :)
  • Dave Hein wrote: »
    I think I may have found the problem. I had made some assumptions about the size of StringT that may not be true in certain systems, such as 64-bit systems. Please try this version of ctokens.c.

    That's it... works now!

    Thanks,
    dgately
    Livermore, CA (50 miles SE of San Francisco)
  • @Heater, yes I'll look into using github. I have an account, but I've never used it. Just like I have facebook, twitter and linkedin accounts that I never use, but I'll try using github.

    @dgately, I'm glad that solved the problem. I'll do another update soon -- maybe on github. :)
  • D.P Posts: 790
    edited 2015-12-25 - 01:08:22
    Dave Hein wrote: »
    @Heater, yes I'll look into using github. I have an account, but I've never used it. Just like I have facebook, twitter and linkedin accounts that I never use, but I'll try using github.

    @dgately, I'm glad that solved the problem. I'll do another update soon -- maybe on github. :)

    Hi Dave, nice work as usual. My team uses SmartGit as a GIT client, works on all platforms, easy to use and free for non-commercial use!

  • Hi Dave,

    This looks very interesting. Can you tell us a bit about your ABI? How do you pass parameters to functions? How do you handle register allocation?

    Thanks,
    David
  • Dave Hein Posts: 5,898
    edited 2015-12-25 - 02:29:48
    ABI? After looking that one up I found that it means Application Binary Interface. Function parameters are passed through registers, starting from reg0. The return value is passed in reg0. There are 16 registers defined, going from reg0 to reg15. There is also a stack pointer called sp. Registers are allocated sequentially, starting with the next register after the function parameters. The following example uses reg0, reg1 and reg2 for the calling parameters and reg3 to reg15 for working registers.
    int testfunc(char *ptr1, char *ptr2, int size)
    {
        int x, y, z;
        return 10;
    }
    
    The variables x, y and z are stored on the stack. Functions are called using CALLD, such as "calld adra, #testfunc". On entry into the function the return address in adra is stored on the stack, then space is allocated on the stack for the local variables. On return, the return value is put in reg0, the local variable stack space is added back to the stack, and the return address is popped off of the stack.

    Another thing that has to be done when calling a function is to save all of the caller's working registers and his own function parameters on the stack, and then move the calling parameters down to reg0. On return, the working registers and the function's calling parameters are restored from the stack. I may change this later on so that parameters are passed on the stack, but for now they are passed through registers.
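
The caller-side sequence described above (save, move the parameters down to reg0, call, restore) can be modelled in plain C. This is a toy simulation, not taz output. The Machine type, the upward-growing stack index, and the choice to hand the result back separately before the registers are restored are all assumptions made for illustration.

```c
#define NUM_REGS   16
#define STACK_SIZE 64

/* Toy model of the register file and stack. No overflow checks. */
typedef struct {
    int reg[NUM_REGS];      /* reg0..reg15 */
    int stack[STACK_SIZE];
    int sp;                 /* index of the next free stack slot */
} Machine;

static void push(Machine *m, int v) { m->stack[m->sp++] = v; }
static int  pop(Machine *m)         { return m->stack[--m->sp]; }

/* Example callee: three parameters arrive in reg0..reg2 and the
   return value leaves in reg0. */
static void callee_sum3(Machine *m)
{
    m->reg[0] = m->reg[0] + m->reg[1] + m->reg[2];
}

/* Caller side of a call. "live" is the number of registers the caller
   is using (its own parameters plus its working registers). */
int call_with_save(Machine *m, int live, const int *args, int nargs,
                   void (*callee)(Machine *))
{
    int i, result;

    for (i = 0; i < live; i++)          /* save the caller's registers */
        push(m, m->reg[i]);
    for (i = 0; i < nargs; i++)         /* move parameters down to reg0 */
        m->reg[i] = args[i];
    callee(m);
    result = m->reg[0];                 /* return value comes back in reg0 */
    for (i = live - 1; i >= 0; i--)     /* restore in reverse order */
        m->reg[i] = pop(m);
    return result;
}
```

Every call saves and restores all live registers in this model, which shows why the scheme is simple to generate but costly at run time.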
  • Dave Hein wrote: »
    ABI? After looking that one up I found that it means Application Binary Interface. Function parameters are passed through registers, starting from reg0. The return value is passed in reg0. There are 16 registers defined, going from reg0 to reg15. There is also a stack pointer called sp. Registers are allocated sequentially, starting with the next register after the function parameters. The following example uses reg0, reg1 and reg2 for the calling parameters and reg3 to reg15 for working registers.
    int testfunc(char *ptr1, char *ptr2, int size)
    {
        int x, y, z;
        return 10;
    }
    
    The variables x, y and z are stored on the stack. Functions are called using CALLD, such as "calld adra, #testfunc". On entry into the function the return address in adra is stored on the stack, then space is allocated on the stack for the local variables. On return, the return value is put in reg0, the local variable stack space is added back to the stack, and the return address is popped off of the stack.

    Another thing that has to be done when calling a function is to save all of the caller's working registers and his own function parameters on the stack, and then move the calling parameters down to reg0. On return, the working registers and the function's calling parameters are restored from the stack. I may change this later on so that parameters are passed on the stack, but for now they are passed through registers.
    Sounds pretty standard. Thanks for the description. I've wanted to create a VM that uses registers for function parameter passing but always get hung up when I start reading the register allocation chapters of compiler books. I guess it never occurred to me to just allocate registers sequentially rather than worrying about liveness proofs. I imagine it works well for functions that are not very complex.

    Can your compiler compile itself? If not, are you considering that? That's one element of "self hosting" that we haven't talked about. For a true self-hosted environment, you should be able to rebuild the entire environment on the target itself. The old Small C compiler could do that although it paid a pretty high price because it had to implement things like symbol table structures manually due to a lack of support for struct.
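
Sequential allocation of this kind needs very little bookkeeping: a single counter that starts just past the function parameters. The sketch below assumes the 16 registers Dave described. The RegAlloc type and function names are hypothetical and not taken from taz.

```c
#define NUM_REGS 16     /* reg0..reg15, as described in the thread */

/* Hypothetical allocator state: index of the next unused register. */
typedef struct {
    int next;
} RegAlloc;

/* Parameters occupy reg0..reg(nparams-1), so working registers
   start at reg(nparams). */
void regalloc_init(RegAlloc *ra, int nparams)
{
    ra->next = nparams;
}

/* Hand out the next register in sequence, or -1 when none are left.
   No liveness analysis is done. */
int regalloc_get(RegAlloc *ra)
{
    if (ra->next >= NUM_REGS)
        return -1;
    return ra->next++;
}
```

For a function with three parameters, like testfunc in Dave's example, the first working register handed out is reg3.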

  • Yay it works on 64 bit Debian Linux!

    At least I got hello.c to compile.

    Glad to see that at least some of the examples compile and run under GCC on my PC. So it is actually C syntax.
    $ ./buildbin 
    qasm.c: In function ‘PrintUnexpected’:
    qasm.c:64:58: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
         PrintError("ERROR: (%d) Unexpected symbol \"%s\"\n", (void *)num, str);
                                                              ^
    qasm.c: In function ‘EncodeAddressField’:
    qasm.c:226:57: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
                 PrintError("ERROR: Bad symbol type - %d\n", (void *)type, 0);
                                                             ^
    qasm.c: In function ‘ParseDat’:
    qasm.c:699:90: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
                         PrintError("ERROR: ORGH address %x less than previous address %x\n", (void *)new_hub_addr, (void *)hub_addr);
                                                                                              ^
    qasm.c:699:112: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
                         PrintError("ERROR: ORGH address %x less than previous address %x\n", (void *)new_hub_addr, (void *)hub_addr);
                                                                                                                    ^
    strsubs.c: In function ‘Tokenize’:
    strsubs.c:63:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
                 len = (int)ptr - (int)str + 1;
                       ^
    strsubs.c:63:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
                 len = (int)ptr - (int)str + 1;
                                  ^
    strsubs.c:81:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
                 len = (int)ptr - (int)str;
                       ^
    strsubs.c:81:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
                 len = (int)ptr - (int)str;
                                  ^
    strsubs.c: In function ‘ReadString’:
    strsubs.c:280:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
         if (!unicode) count = (int)fgets(buf, size, infile);
    
  • @David, the compiler cannot compile itself yet, but I plan on making that happen. At some point I'll implement another stage of the compiler that runs between the preprocessor and taz. It will handle structs and other things that taz doesn't understand. I'm hoping to modify cspin to do that task. It will just generate Taz C instead of Spin.

    @Heater, I'm glad that it's working for you. Thanks for posting the build output. I'll remove those warnings in the next update. Eventually the example programs should become fully standard as I add more features to the taz compiler.
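
The -Wpointer-to-int-cast warnings in strsubs.c come from casting both pointers to int before subtracting them. A common fix, sketched here under the assumption that both pointers refer to the same buffer, is to subtract the pointers directly. The function name is hypothetical.

```c
#include <stddef.h>

/* Length of the span from start to end, inclusive, computed by
   subtracting the pointers directly. The result type is ptrdiff_t, so
   nothing is truncated where a pointer is wider than an int.
   Both pointers must point into the same array. */
ptrdiff_t span_len_inclusive(const char *start, const char *end)
{
    return end - start + 1;
}
```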
  • I didn't know that cspin handled structs. How complete is its C syntax handling?
  • cspin handles struct, switch, case, sizeof and typedef, and also multi-dimensional arrays. I just need to add support for enum and union, and taz will handle the rest.
  • Dave Hein wrote: »
    cspin handles struct, switch, case, sizeof and typedef, and also multi-dimensional arrays. I just need to add support for enum and union, and taz will handle the rest.
    Sounds great! Can we cancel the PropGCC project now? :-)

  • That's kind of an interesting approach to building a compiler. A low level language that handles a lot of machine level stuff, one step up from assembler, and a higher level language superset that outputs the base language.

    Sort of like the C preprocessor but more like the original C++ implementation(s).

    Hope goto does not sneak in there :)

  • I think PropGCC will be able to serve a small niche in Prop2 development. :) It would probably be nice to have C++, error checking, floating point, pthreads and a few other things that GCC provides.