Zog - A ZPU processor core for the Prop + GNU C, C++ and FORTRAN. Now replaces S - Page 20 — Parallax Forums


Comments

  • Heater.Heater. Posts: 21,230
    edited 2010-09-24 01:22
    Jazzed,

    Yes, I have made a variant of dhrystone v1.1 that uses the Prop CNT timer. It will overflow after 40 seconds or so, so I reduced the number of loops to a mere 5000.

    That dhrystone is attached here; you will have to recompile it for whatever clock frequency you are using. Some results are in the compiler benchmarks thread.

    I will put out a new ZOG release soon, when I have Lonesock's floating point code working with it and get a Whetstone benchmark result. I'll probably make that ZOG V2.0.
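
    A rough sketch of the timing arithmetic behind that overflow limit, assuming a 32-bit free-running counter like CNT. The helper names are hypothetical and are not taken from the attached dhrystone source, and the clock rates in the note below are examples only:

    ```c
    #include <stdint.h>

    /* Seconds until a 32-bit tick counter wraps at the given clock rate. */
    double cnt_wrap_seconds(uint32_t clk_hz)
    {
        return 4294967296.0 / (double)clk_hz;   /* 2^32 ticks */
    }

    /* Elapsed ticks between two counter samples. Unsigned subtraction is
       modulo 2^32, so one wrap between the two samples is handled. */
    uint32_t elapsed_ticks(uint32_t start, uint32_t end)
    {
        return (uint32_t)(end - start);
    }
    ```

    At 80 MHz the counter wraps after roughly 53.7 seconds, and at 100 MHz after roughly 42.9 seconds. A run longer than one wrap cannot be timed from two CNT samples alone.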
  • jazzedjazzed Posts: 11,803
    edited 2010-09-24 08:04
    Heater, thanks for the image.

    USE_JCACHED_MEMORY SDRAM results are below.
    Running from HUB gives 0 dhrystones/second.
    The results are not very clear to me.

    EDIT ADD: Maybe my ZOG is out of date. Guess I'll wait for 2.0.
    Booting dhrys11.bin
    00000000
    
    Reading image... 4839 Bytes Loaded.
    Done
    
    Clearing bss: ....
    Running Program!
    Dhrystone(1.1) time for 5000 passes = 1
    This machine benchmarks at 5000 dhrystones/second
    
  • Heater.Heater. Posts: 21,230
    edited 2010-09-24 08:20
    That does not look right. 5000 passes cannot possibly happen in 1 second?

    Below is what I got, as previously posted in the compiler benchmark thread.

    I'll try all this again when I get home, in a couple of hours, and try to see what is up.
    ZOG v1.6 (HUB)
    zpu memory at 0000008C
    Dhrystone(1.1) time for 5000 passes = 9
    This machine benchmarks at 555 dhrystones/second
    
    #pc,opcode,sp,top_of_stack,next_on_stack
    #----------
    
    0X0000E53 0X00 0X000017B0 0X00000C9B
    BREAKPOINT
    
    ZOG v1.6 (VM, No SD)
    Dhrystone(1.1) time for 5000 passes = 37
    This machine benchmarks at 135 dhrystones/second
    
    #pc,opcode,sp,top_of_stack,next_on_stack
    #----------
    
    0X0000E53 0X00 0X000017B0 0X00000C9B
    BREAKPOINT
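
    The figures above are consistent with integer division of passes by whole elapsed seconds. A small sketch, assuming that is how the benchmark computes its rate (the helper name is hypothetical):

    ```c
    /* Rate in dhrystones/second from a whole-second elapsed time.
       Returns 0 when no time was measured. */
    unsigned long dhrystones_per_second(unsigned long passes, unsigned long seconds)
    {
        return seconds ? passes / seconds : 0;
    }
    ```

    With 5000 passes this gives 555 for 9 seconds and 135 for 37 seconds. A reported time of 1 second yields 5000, so a timer that under-reports elapsed time inflates the result directly.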
    
  • Heater.Heater. Posts: 21,230
    edited 2010-09-26 05:26
    In case anyone is waiting for a Zog with floating point support, here is what happened today:
    ZOG v1.6 (HUB)
    zpu memory at 0000008C
    Test float32:
    Float32 cog = 3
    __addsf3 OK
    __subsf3 OK
    __mulsf3 OK
    __divsf3 OK
    sqrtf OK
    sinf OK
    asinf OK
    cosf OK
    acosf OK
    tanf OK
    atanf OK
    atan2f OK
    floorf OK
    ceilf OK
    logf OK
    log2f OK
    log10f OK
    expf OK
    exp2f OK
    exp10f OK
    powf OK
    float > OK
    float >= OK
    float < OK
    float <= OK
    The first 100000 terms of the Euler series sum to...164472528
    Euler OK
    All tests complete.
    

    The test program running there does not check accuracy but just that the right float commands are issued to the COG and the results are good to 4 or 5 digits.

    So it looks like Lonesock's new Float32 provides pretty much everything C requires.

    There are some odd things going on with the zpu-gcc compiler though. If you do some arithmetic on floats and ints it can end up coercing things to doubles, at which point it pulls in a load of double precision floating point code that we don't want, bloating the binaries by many kilobytes. I have had to write the tests in such a way that it does not do that.

    Good news is that using the Float32 COG reduces this test's binary from 45 Kbytes when using softfloat routines to 9900 bytes. This includes the 4K of PASM code loaded to the COG and 1K of redundant vector table in the C start up code.

    Speed on the Euler series summation is about half that of Spin.

    Now to get this working from ext RAM as well...
  • lonesocklonesock Posts: 917
    edited 2010-09-26 09:49
    Does "-fallow-single-precision" work for the ZPU version of the tool-chain?

    Jonathan
  • Heater.Heater. Posts: 21,230
    edited 2010-09-26 09:58
    I've just been looking into this.

    There is also -fsingle-precision-constant which looks like it would fix some of the issues I had.

    Then there is -ffloat-store, which man gcc describes with: "This option prevents undesirable excess precision...."

    Then there are things like -mpc32 and -msingle-float but they seem to be x86 specific.
  • Heater.Heater. Posts: 21,230
    edited 2010-09-26 10:17
    OK, none of those options is allowed except -fsingle-precision-constant.

    But the results of using it can be dramatic. For example changing this line:

    c = es * 100000000;

    to this:

    c = es * 100000000.0;

    bloats that test program from 9908 to 18516 bytes.

    Using -fsingle-precision-constant brings it back down again:)
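
    A minimal sketch of why the second form changes the arithmetic, using only standard C typing rules. The function names are hypothetical. An unsuffixed floating constant has type double, so the whole product is computed as a double, whereas an integer constant or an f-suffixed constant leaves the product as a float:

    ```c
    #include <stddef.h>

    /* float * int constant: the int is converted, the result type is float. */
    size_t width_int_const(float es)    { return sizeof(es * 100000000); }

    /* float * unsuffixed constant: es is converted, the result type is double. */
    size_t width_double_const(float es) { return sizeof(es * 100000000.0); }

    /* float * f-suffixed constant: the result type stays float. */
    size_t width_float_const(float es)  { return sizeof(es * 100000000.0f); }
    ```

    Writing the constant as 100000000.0f should avoid the promotion on that line without a compiler flag. Whether that alone keeps all of the double precision library code out of a given binary depends on the rest of the program.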
  • Heater.Heater. Posts: 21,230
    edited 2010-09-26 11:09
    First MandleProp set?
    The first 100000 terms of the Euler series sum to...164472528
    Euler OK
                                                                           *********
                                                                           ********
                                                                          *********
                                                                          *********
                                                                           ********
                                                                            ******
                                                             *      ** * ************ *  *
                                                              * *   ** **************** *
                                                             ***** ********************** *
                                                              *****************************  ****
                                                              ****************************** ****
                                                               **********************************
                                                          * * ***********************************
                                                            ***********************************
                                                        * ***************************************
                                                         *****************************************
                                                           **************************************
                                                          ****************************************** *
                                  *                    **********************************************
                                                         *******************************************
                                     **     *          ********************************************
                                     *** * **  *        ********************************************
                                    ************ *    *********************************************
                                      ************     *********************************************
                                    ****************  *********************************************
                                   *****************  *********************************************
                                   ****************** *********************************************
                                  ******************* ********************************************
                                  *****************************************************************
                              *** ****************************************************************
                              *******************************************************************
                            ********************************************************************
                         * ********************************************************************
                             ********************************************************************
                              *******************************************************************
                             * ** ****************************************************************
                                  ******************* ********************************************
                                   ****************** **********************************************
                                   ****************** *********************************************
                                    ****************  *********************************************
                                     **************    *********************************************
                                     *************    **********************************************
                                      ** **** **        *******************************************
                                           **         *********************************************
                                                         *******************************************
                                                       ***********************************************
                                                          ******************************************
                                                           ***************************************
                                                          ****************************************
                                                          ***************************************
                                                             **********************************
                                                             ************************************
                                                               **********************************
                                                              ****************************** ****
                                                              *****************************  *****
                                                              **** ********************** **  * *
                                                            * *     ******************* **
                                                             *      **  **************   *
                                                                            ******
                                                                           ********
                                                                          **********
                                                                          **********
                                                                           ********
                                                                           *********
                                                                           ********
                                                                             ***
                                                                              ***
                                                                             * *
                                                                               *
    All tests complete.
    
  • David BetzDavid Betz Posts: 14,516
    edited 2010-09-29 19:19
    Is 1.6 still the most recent version of ZOG? I finally got my Basic bytecode compiler split into enough pieces that I think each piece will fit in hub RAM and I'd now like to try to get the whole thing working under ZOG. The first pass needs to be able to read one file at a time so I'm assuming I'll have to interface fsrw to enable file I/O. I'm planning to use the two SPI SRAMs on the C3 as scratch buffers to communicate between the compiler passes and eventually to store the virtual image of the program to run. Later I'll try retargeting the bytecode compiler to ZOG instead of my own VM.

    So, how do I go about doing this? I have four different executable programs that need to run in succession. Is there a way to chain from one program to the next? I need to set up the following pipeline:

    test.bas on an SD card --> tokenize --> parse --> generate --> vm

    What's the best way to do that? Ideally, I'd use some sort of chain("parse") call at the end of the code for tokenize. Alternatively, I guess I could try to put together a simple shell. That would require something like the Unix exec() to launch programs. Does any of this support exist yet for ZOG?

    Thanks,
    David
  • Heater.Heater. Posts: 21,230
    edited 2010-09-29 21:50
    David,

    Yes, v1.6 is still the latest. The next version will probably be V2.0 with floating point support. Lonesock has pretty much finalized F32; I just have to find the time to tidy things up a bit.

    Sounds like you need an operating system. Somewhat out of scope for Zog itself. Currently there is no support in Zog or the supporting Spin debugger for reading/writing files or selecting different ZPU images to run. Even using SPI RAM as a scratch pad will be an issue unless there is a SPI driver that can be used from the running C code.

    I will have to have a little think about what can reasonably be done about this.

    I wonder how far along Bill Henning is with his Largos operating system version that was to use Zog?
  • Heater.Heater. Posts: 21,230
    edited 2010-09-29 22:57
    David,

    Just having a quick look at debug_zog to see what it would take to make a "chain" like system for your compiler.

    Looks like everything needed is there it just needs rearranging a bit.

    1) You said you want to run ZPU code from HUB. Currently debug_zog does not read its ZPU binary from an SD file when doing that, but that is easily changed.

    2) It would be simple to add a little menu system to select which binary to run from SD. Here different phases of the compiler could be run. Eventually that could be automated.

    3) Currently debug_zog runs the ZPU code and normally that ends with a BREAKPOINT instruction. This is caught in "on_break" which hangs in a loop. This could easily be changed so that it returns back to the main start method where we will have the user menu selection again.

    All of this seems quite easy to do. It might be nice to have anyway. There might be an issue that including the file system code may not leave much space in HUB for the ZPU memory space.

    After that we need access to your SPI scratch RAM....
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-09-30 08:13
    Heater. wrote: »
    Sounds like you need an operating system. Somewhat out of scope for Zog itself. Currently there is no support in Zog or the supporting Spin debugger for reading/writing files or selecting different ZPU images to run. Even using SPI RAM as a scratch pad will be an issue unless there is a SPI driver that can be used from the running C code.

    Quickest path may be to run VMCOG+ZOG under SphinxOS, which has simple FAT services available via a "rendezvous" location.

    As far as scratchpad goes, simplest / least work is use VMCOG, and access the SPI ram via the VMCOG mailbox. If memory is tight, give it a one page working set, if you are just streaming out object code, it will still be pretty fast.
    Heater. wrote: »
    I wonder how far along Bill Henning is with his Largos operating system version that was to use Zog?

    Have not had time to work on it. I have to get all my boards out there first...

    I did design a new uSD based file system for Largos on my cruise, based on my unreleased Largos flash file system. Long file names (31 characters), sub directories... and needs *MUCH* less code to implement than FAT. As soon as the rest of the boards are in production, I will tackle the new sdfs, as it is needed for Largos. 64KB clusters, 512 byte sectors, max file size is 4GB, max disk size is (2^48)-1 bytes. Unix like, with unix like API.

    The previous Largos memory map reserved the first 8KB for the OS, however with VMCOG, ZOG and SDFS I might be able to cut that back to a 4KB footprint in the hub :)

    In case anyone is wondering - why a new file system?

    Simple. FAT is a resource hog, and long file names are patent-encumbered. SDFS is designed for a tiny memory footprint, and SD cards.
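
    The SDFS figures quoted above can be checked with a little arithmetic. This is only a sketch of the sums, not SDFS code; the macro and function names are hypothetical:

    ```c
    #include <stdint.h>

    #define SDFS_SECTOR_BYTES   512UL
    #define SDFS_CLUSTER_BYTES  65536UL   /* 64KB */

    /* Sectors in one cluster: 65536 / 512. */
    uint32_t sdfs_sectors_per_cluster(void)
    {
        return (uint32_t)(SDFS_CLUSTER_BYTES / SDFS_SECTOR_BYTES);
    }

    /* Clusters needed to hold a file of the given size, rounding up. */
    uint64_t sdfs_clusters_for(uint64_t file_bytes)
    {
        return (file_bytes + SDFS_CLUSTER_BYTES - 1) / SDFS_CLUSTER_BYTES;
    }
    ```

    Each cluster holds 128 sectors, and a 4GB file spans 65536 clusters.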
  • David BetzDavid Betz Posts: 14,516
    edited 2010-09-30 09:13
    Heater. wrote: »
    After that we need access to your SPI scratch RAM....
    I was assuming I could handle that myself using the syscall interface? I looked at the way the cog functions are implemented and it looks like it shouldn't be too hard to build an interface to some low-level SPI block read/write functions.
  • jazzedjazzed Posts: 11,803
    edited 2010-09-30 10:19
    I'm using Dave Hein's spinix which is very much like unix and supports y-modem at 57600. The size of programs is limited if you want to keep the OS running. The rendezvous area is less than 200 bytes.

    The latest spinix is here http://forums.parallax.com/attachment.php?attachmentid=72555&d=1282015545
  • Heater.Heater. Posts: 21,230
    edited 2010-09-30 11:14
    David,
    I was assuming I could handle that myself using the syscall interface?

    Yep, good idea, the SYSCALL interface is basically a free for all. As long as we don't clash over syscall ids.

    I must get round to hooking up the C open/close/read/write to the FAT FS at some point.
  • David BetzDavid Betz Posts: 14,516
    edited 2010-09-30 11:16
    Heater. wrote: »
    I must get round to hooking up the C open/close/read/write to the FAT FS at some point.
    I was thinking about that but it will be tricky if we're going to use fsrw since it only supports one open file at a time. Or has that restriction been relaxed in newer versions?
  • Heater.Heater. Posts: 21,230
    edited 2010-09-30 11:28
    I'm really not sure.

    I thought I'd heard there was some FAT code around that one can run two instances of in order to have two open files.
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-09-30 11:30
    Kye's code handles multiple files *and* subdirectories
    Heater. wrote: »
    I'm really not sure.

    I thought I'd heard there was some FAT code around that one can run two instances of in order to have two open files.
  • lonesocklonesock Posts: 917
    edited 2010-09-30 11:57
    Both Kye's code and FSRW can handle multiple files at the same time. With both, though, if you want to access the SD card from different cogs, you will need to do some wrapping with a lock (or similar).

    As Bill points out, Kye's code can also handle subdirectories, whereas FSRW can not yet do that.

    Jonathan
  • jazzedjazzed Posts: 11,803
    edited 2010-09-30 13:23
    Dave's spinix uses fsrw and provides sub directory support.
  • Heater.Heater. Posts: 21,230
    edited 2010-10-04 23:07
    In case anyone is wondering...Zog has been held up by a wee bug that has had me tearing my hair out for quite a few days.

    I have been integrating Lonesock's F32 into ZOG for floating point support. Well, not integrating so much, as F32 is being loaded and used without any change to the actual ZOG interpreter.

    Everything was progressing OK with all C float operators working and most functions needed for the standard C math library.

    Until I added just one more function, modf, at which point all sorts of strange things started to happen. Tests were failing; messages printed at the beginning of my test harness, prior to the modf test, were missing; things that had worked stopped working; and the whole thing would fail with an illegal ZPU op.

    This all seemed to be down to memory corruption, as adding or removing code, or moving things around, fixed the problem, only for it to fail again when the next thing was added.

    Was it my C test harness code, was it the ZOG interpreter, ZOG's syscall handler, the debug Spin code, Lonesock's F32 COG? Where to look? Experiment after experiment did not help pin it down.

    This morning I woke up with the idea to turn off optimizations in the zpu-gcc compiler. BINGO, the test now works! It seems the GCC optimization of float code is screwing things up.

    Now that's a little problem as without optimization the code gets a bit bigger, as shown with the following options:
    none   13820    PASS
    -Os    11416    FAIL      This is "optimize for size"
    -O1    11636    FAIL
    -O2    14056    PASS
    -O3    15928    PASS
    
    Note that turning up the optimization level causes the code to get bigger. Perhaps things are getting unrolled and inlined. Sadly, optimizing for binary executable size kills it.

    There have been some fixes to zpu-gcc recently so I will try that soon.

    If anyone is interested I could post what I have working so far. It does not work from external memory yet.
  • lonesocklonesock Posts: 917
    edited 2010-10-05 14:00
    From the current GCC docs (http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options):
    -Os disables the following optimization flags:

    -falign-functions -falign-jumps -falign-loops
    -falign-labels -freorder-blocks -freorder-blocks-and-partition
    -fprefetch-loop-arrays -ftree-vect-loop-version
    I'm betting that disabling any of the -falign-* options would choke Zog.

    Jonathan
  • David BetzDavid Betz Posts: 14,516
    edited 2010-10-05 20:01
    I'm trying to add some simple file I/O to ZOG and am having trouble getting my syscalls working. I'm trying to run the following code but I never seem to get to the _open syscall function.
    #include <unistd.h>
    #include <fcntl.h>
    
    int main(void)
    {
        int ifd, ofd;
    
        ifd = open("in.dat", O_RDONLY);
        ofd = open("out.dat", O_WRONLY | O_CREAT, 0);
    
        close(ifd);
        close(ofd);
    
        return 0;
    }
    

    And here is how I changed the _open and _close functions in debug_zog.spin:
    PRI _open | name, flags
    'Open a file
    'In C, result=_syscall(&t, SYS_open, name, flags, mode);
      name := @zpu_memory[long[framep + 12]] 'Get name ptr from ZPU stack
      flags := long[framep + 16]
      ser.str(string("_open: "))
      ser.str(name)
      ser.crlf
      repeat
    
    PRI _close | fileno
    'Close a file
    'In C, result=_syscall(&t, SYS_close, fileno);
      fileno := long[framep + 16]
      ser.str(string("_close: "))
      ser.dec(fileno)
      ser.crlf
      repeat
    

    Do I have to link with some special library to get the syscall stuff working? I would have expected to get to the _open function and see the string "_open: in.dat" printed out.
  • Heater.Heater. Posts: 21,230
    edited 2010-10-05 22:31
    David,

    The ZPU C library code does not use the SYSCALL instruction by default. And as it has no file system, it will just return an error. Also stdio is actually going via the virtual UART port rather than SYSCALL.

    You can force the library to use SYSCALL by setting the variable "_use_syscall" to a non-zero value. Have a look in "test/hello/hello.c" to see how I have done this before.

    Have a look in the zpu-gcc library sources to see what is going on with SYSCALL. See toolchain/gcc/libgloss/zpu/syscalls.c
  • Heater.Heater. Posts: 21,230
    edited 2010-10-06 00:59
    Lonesock,

    Interesting observation about -Os.

    However, I'm sure -Os is used by Zylin for building small programs. Also I read somewhere on their web pages that zpu-gcc should never emit any unaligned 16 or 32 bit data and if it does that is a bug. Code itself need not be aligned, as instructions are only ever byte wide.

    BUT. Way back in time I had checks for unaligned access in my C version of ZPU. It used to catch one unaligned access somewhere prior to starting the executables main().

    Perhaps it's time to investigate this further...
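
    For reference, a check of that kind can be very small. This is a sketch of the general technique only, with a hypothetical function name; it is not the code from my C version of ZPU:

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* True if an access of `size` bytes (1, 2 or 4) at `addr` is naturally
       aligned, i.e. the address is a multiple of the access size. */
    bool zpu_access_aligned(uint32_t addr, uint32_t size)
    {
        return (addr & (size - 1u)) == 0u;
    }
    ```

    An emulator could call this before every 16 or 32 bit load or store and report the program counter when it returns false.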
  • David BetzDavid Betz Posts: 14,516
    edited 2010-10-14 05:20
    I'm trying to implement a syscall interface to do some simple file I/O and I'm having some endian problems. I am trying to run the following program:
    #include <unistd.h>
    #include <fcntl.h>
    
    extern int _use_syscall;
    
    int main(void)
    {
        int ifd, ofd;
    
        _use_syscall = 1;
    
        ifd = open("in.dat", O_RDONLY);
        ofd = open("out.dat", O_WRONLY | O_CREAT, 0);
    
        close(ifd);
        close(ofd);
    
        return 0;
    }
    

    Here is the makefile I use to build my program:
    # Path to the gcc executable and friends
    TOOLPATH=/cygdrive/c/Users/dbetz/zpu/bin
    
    CC=$(TOOLPATH)/zpu-elf-gcc
    OC=$(TOOLPATH)/zpu-elf-objcopy -O binary
    ECHO=echo
    
    CFLAGS=-g -Wall -Os 
    LDFLAGS=-phi -Wl,--relax -Wl,--gc-sections
    
    filetest.bin:	filetest.elf
    	@$(OC) filetest.elf filetest.bin
    	@objcopy -I binary -O binary --reverse-bytes=4 filetest.bin
    	@$(ECHO) $@
    
    filetest.elf:	filetest.c
    	@$(CC) $(CFLAGS) $(LDFLAGS) filetest.c -o $@
    	@$(ECHO) $@
    
    clean:
    	rm -rf filetest.elf
    

    Here is the code I added to debug_zog.spin to implement the open and close syscalls:
    PRI _open | name, flags, fileno
    'Open a file
    'In C, result=_syscall(&t, SYS_open, name, flags, mode);
      name := @zpu_memory[long[framep + 12]] 'Get name ptr from ZPU stack
      flags := long[framep + 16]
      ser.str(string("_open: "))
      ser.str(name)
      ser.crlf
      if flags == O_RDONLY
        fileno := RDFILE_FILENO
      elseif flags == (O_WRONLY | O_CREAT)
        fileno := WRFILE_FILENO
      else
        fileno := -1
      long[@zpu_memory[0]] := fileno                        'Return value via _mreg (aka R0) at ZPU address 0 !
    
    PRI _close | fileno
    'Close a file
    'In C, result=_syscall(&t, SYS_close, fileno);
      fileno := long[framep + 16]
      ser.str(string("_close: "))
      ser.dec(fileno)
      ser.crlf
      long[@zpu_memory[0]] := 0                             'Return value via _mreg (aka R0) at ZPU address 0 !
    

    And here is the output I get when I run it in the hub memory mode of ZOG:
    zpu memory at 0000008C
    _open: d.ni
    _open: .tuo
    _close: 1
    _close: 1
    
    #pc,opcode,sp,top_of_stack,next_on_stack
    #----------
    
    0X0000792 0X00 0X000067B8 0X00000889
    BREAKPOINT
    

    It looks like the filenames are byte swapped. Am I doing my build wrong? I copied the build instructions from zog_v1_6/test/fibo/Makefile and it does the objcopy call that byte swaps. Is that correct for my program?
  • Heater.Heater. Posts: 21,230
    edited 2010-10-14 07:24
    David,

    On a quick glance your code and build look good.

    Yes there will be endianness problems. The Prop and the ZPU are opposite endian.

    Thing is this. When zpu-gcc compiles a 32 bit constant, say, and plops it into the binary, its bytes are the wrong way around for the Prop. So 0xDEADBEEF in the code will be picked up by a rdlong as $EFBEADDE.

    At some point some endian swapping has to be done. There are two possibilities:

    1) Have the ZPU binary, as produced by zpu-gcc, sitting in Prop memory and arrange that every access to a LONG or WORD is reversed by the ZOG interpreter.

    2) Reverse the byte order of every LONG in the ZPU binary then run it.

    I decided against option 1) as it would impose a huge performance hit on the interpreter. Most data accesses are 32 bits wide.

    So I introduced the objcopy step to reverse the byte order of LONGs in the binary. This then means that all the 16 bit data and strings are scrambled.

    You will notice in the interpreter code that when dealing with BYTES and WORDs I have put in some XOR operations on the addresses which sorts out the endianness of those items. As there is a lot less WORD and BYTE access going on this is a lot less of a performance hit.

    Turning to debug_zog.spin you will see similar XOR operations going on when required. For example in SYS_Write the loop that actually outputs a buffer of bytes to the console port (stdout) looks like:
    repeat i from 0 to long[framep + 20] - 1              'Output nbytes
      UART.tx(byte[p ^ %11])                              'XOR here is an endianness fix.
      p++
    

    Notice the sneaky "^%11" which inverts the two least significant bits of the address pointed at by p such that what goes out to the UART is in the correct order.

    I hope you can do something similar with the file names in SYS_open. Then of course any data buffers read or written will have to be fixed as well....
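
    To make the effect concrete, here is a host-side C sketch of the two steps: reversing every 4 bytes, as the objcopy step does, and then reading bytes back with the address XORed with 3. It is an illustration only, not code from ZOG or debug_zog.spin, and the function names are hypothetical. It assumes the string starts on a 4 byte boundary:

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Reverse the bytes of one 32-bit value, e.g. 0xDEADBEEF -> 0xEFBEADDE. */
    uint32_t swap32(uint32_t v)
    {
        return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
               ((v << 8) & 0x00FF0000u) | (v << 24);
    }

    /* Reverse every 4-byte group in place, like objcopy --reverse-bytes=4.
       len is expected to be a multiple of 4. */
    void reverse_bytes4(uint8_t *buf, size_t len)
    {
        for (size_t i = 0; i + 3 < len; i += 4) {
            uint8_t t;
            t = buf[i];     buf[i]     = buf[i + 3]; buf[i + 3] = t;
            t = buf[i + 1]; buf[i + 1] = buf[i + 2]; buf[i + 2] = t;
        }
    }

    /* Copy a NUL-terminated string out of word-reversed memory, fixing each
       byte address with XOR 3 (the "^ %11" trick). Returns the string length. */
    size_t read_string_xor3(const uint8_t *mem, size_t addr,
                            char *out, size_t out_size)
    {
        size_t n = 0;
        if (out_size == 0)
            return 0;
        while (n + 1 < out_size) {
            char c = (char)mem[(addr + n) ^ 3u];
            if (c == '\0')
                break;
            out[n++] = c;
        }
        out[n] = '\0';
        return n;
    }
    ```

    Starting from the bytes "in.dat" followed by two zero bytes, reverse_bytes4 leaves "d.ni" at the start of the buffer, which is exactly the scrambled name a plain byte-by-byte read prints. Reading the same buffer through read_string_xor3 recovers "in.dat".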
  • David BetzDavid Betz Posts: 14,516
    edited 2010-10-14 08:06
    Thanks for your comments. It looks like I have some more work to do to sort out the endian issues. I suspected that but wanted to make sure I was building this correctly so I didn't waste time solving a problem that didn't really exist. Now I know the path I have to take and it shouldn't be too difficult.
  • David BetzDavid Betz Posts: 14,516
    edited 2010-10-14 19:30
    Heater. wrote: »
    Yes there will be endianness problems. The Prop and the ZPU are opposite endian.
    There are certainly CPUs that can run in either big or little endian mode (MIPS for one). Since we have the sources for the ZPU version of GCC, wouldn't it be relatively easy to make a compiler with the correct endianness to match the Propeller? Have you been able to build GCC from source?
  • Heater.Heater. Posts: 21,230
    edited 2010-10-15 00:22
    zpu-gcc sources can be obtained with
    # git clone git://repo.or.cz/zpugcc.git
    

    I have built the compiler from those sources very easily using the build.sh script included in there.

    As for changing the compiler endianness I have no idea what might be involved and no idea where to even start tackling the issue.

    As it is I quite like Zog binaries to be usable without change in Zog, the ZPU itself, the ZPU emulator (Java), my own ZPU emulator (C) and soon ZPU on the micro-controller that cannot be named here.

    Still, feel free to have a go. If it's doable we can make an opposite ended Zog (Goz?).