Zog - A ZPU processor core for the Prop + GNU C, C++ and FORTRAN.Now replaces S - Page 18 — Parallax Forums



Comments

  • David Betz Posts: 14,511
    edited 2010-09-01 18:42
    heater,

    Can you try the 0.981 I just posted to the vmcog thread? I think it might fix the fibo problem...

    David found a bug - I used an 'andn' where I should have used an 'and' ... sheesh ... plus a performance boost for pre-setting counters!

    Let me know...

    Well, if it really was a bug, it wasn't the one that prevented my Hydra SRAM board from working. It still doesn't work even with this fix. I guess I'll keep looking...
  • Bill Henning Posts: 6,445
    edited 2010-09-01 19:44
    Sorry, this bug had to do with LRU wear-leveling.

    Some bugs are VERY elusive; I am still having trouble tracking down why writes are not working to FlexMem...
    David Betz wrote: »
    Well, if it really was a bug, it wasn't the one that prevented my Hydra SRAM board from working. It still doesn't work even with this fix. I guess I'll keep looking...
  • Bill Henning Posts: 6,445
    edited 2010-09-01 19:47
    Nice...

    I've also been chewing on a very large VM for big SDRAMs... I think I've come up with a way to make a 2-way associative VMCOG that can handle 256MB with only a 512 byte table in the hub!
    jazzed wrote: »
    @Bill. Cool. I'll try to integrate SDRAM code with the new VMCOG tomorrow.

    Meanwhile, I have some new test results. I found a way to cut out a window miss in the SdramCache.spin code and decided to chop out some other things that were giving me a headache.

    Recent changes have shaved 12 minutes off the 16MB psrandom memory test (was 1:22, now 1:10 H:MM). Fibo 20 now runs in 2383ms, down from 2825ms (all tests with an 80MHz system clock). All ZOG code remains the same as before; kernel and memory enhancements are the only differences.

    There are more longs free in zog.spin now. I'll post new versions of zog.spin and SdramCache.spin later.

    --Steve
  • jazzed Posts: 11,803
    edited 2010-09-02 00:12
    I've also been chewing on a very large VM for big SDRAMs... I think I've come up with a way to make a 2-way associative VMCOG that can handle 256MB with only a 512 byte table in the hub!
    Cool. The smaller and faster, the better. After all, some buffer space will be necessary for a TV or VGA driver. A 2-way set would be very good for ZOG. I tried a 4-way set briefly, but the set decision instructions ate up lots of performance in my case.
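    A generic 2-way set-associative lookup, sketched in C below, shows why two ways are cheap: the replacement decision is a single LRU bit per set, whereas four ways need more bookkeeping on every lookup. This is not Bill's planned VMCOG design. The page size, set count and the name `cache_lookup` are assumptions for illustration, and the page fill itself is not modelled.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS 9u            /* hypothetical 512-byte pages */
#define NUM_SETS  8u            /* hypothetical number of sets */

typedef struct {
    uint32_t tag[2];            /* page number cached in each way */
    bool     valid[2];
    uint8_t  lru;               /* index of the least recently used way */
} cache_set;

static cache_set sets[NUM_SETS];

/* Returns the way (0 or 1) holding addr's page; *hit reports whether the
   page was already cached. On a miss the LRU way is chosen as the victim. */
unsigned cache_lookup(uint32_t addr, bool *hit)
{
    uint32_t page = addr >> PAGE_BITS;
    cache_set *s = &sets[page % NUM_SETS];

    for (unsigned way = 0; way < 2; ++way) {
        if (s->valid[way] && s->tag[way] == page) {
            s->lru = (uint8_t)(1u - way);   /* the other way becomes LRU */
            *hit = true;
            return way;
        }
    }

    unsigned victim = s->lru;               /* evict the least recently used way */
    s->tag[victim] = page;
    s->valid[victim] = true;
    s->lru = (uint8_t)(1u - victim);
    *hit = false;
    return victim;
}
```

    With 512-byte pages and 8 sets, addresses 0, 4096 and 8192 all fall in set 0: the first two can coexist, and a third page evicts whichever of them was used least recently.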

    Cheers.
  • Bill Henning Posts: 6,445
    edited 2010-09-02 09:28
    Thanks... I am forcing myself to get FlexMem working, and the elusive ZOG issue fixed, before I do the big VM.
    jazzed wrote: »
    Cool. The smaller and faster, the better. After all, some buffer space will be necessary for a TV or VGA driver. A 2-way set would be very good for ZOG. I tried a 4-way set briefly, but the set decision instructions ate up lots of performance in my case.

    Cheers.
  • lonesock Posts: 917
    edited 2010-09-02 14:04
    I have a new faster version of the division & remainder code, below, but looking at zog brought up this question: in a /0 error, the code jumps to "done"...so does pc stay the same and the program loops forever, continuously popping data off the stack?

    Also, what is the desired return value for /0? Spin sets the return value to 0, but if we just say /0 is "undefined", it's easier to just return the numerator untouched. Any thoughts?

    Jonathan
    {{==    div_flags[0]: store remainder           ==}}
    {{==    if data == 0, tos remains unchanged     ==}}
    fast_div                ' tos = tos / data
                            ' handle the signs, and check for a 0 numerator or divisor
                            abs     tos, tos        wc,wz   ' tos = |tos|, C = original sign, Z = numerator is 0
                 if_z       jmp     #done_and_inc_pc
                            and     div_flags, #1   wz      ' keep only the 0 bit, and remember if it's a 0
                            muxc    div_flags, #2           ' set bit 1 to match sign of tos (quotient or remainder)
                            abs     t1, data        wc      ' was the denominator also negative?
                 if_z_and_c xor     div_flags, #2           ' den was negative, and we're looking for quotient, so invert bit 1
    
                            mov     t2, #32         wz      ' do 32 iterations, and clear the z flag
                            mov     data, #0                ' my remainder is reset to 0
    :align                  shl     tos, #1         wc      ' shift the numerator into the remainder a bit at a time
                  if_nc     djnz    t2, #:align     wz      ' while skipping all the initial bits
                            ' Done aligning...make sure we have some bits left to process (t2 is not 0)
    :loop         if_nz     rcl     data, #1                ' shift the bit from our numerator into our remainder term
                  if_nz     cmpsub  data, t1        wc      ' sub the divisor if possible
                  if_nz     rcl     tos, #1         wc      ' move C into the low bit, and store the high bit
                  if_nz     djnz    t2, #:loop              ' go back for more bits
    
                            ' correct the sign, and select quotient or remainder
                            shr     div_flags, #1   wc,wz
                  if_c      mov     tos, data               ' user wanted the remainder, not the quotient
                            negnz   tos, tos                ' need to invert the result
                            jmp     #done_and_inc_pc
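    A C model of the same shift-and-subtract algorithm can help when testing the PASM against known answers. It follows the sign handling in the code above (the quotient is negative when the signs differ, the remainder takes the numerator's sign, which matches C's truncating / and %), but runs all 32 iterations instead of skipping leading zero bits as :align does. Returning the numerator untouched on /0 is the convention suggested above, assumed here, not necessarily what the PASM produces; fast_div_model is a hypothetical name.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical C model of fast_div: want_rem selects remainder vs quotient. */
int32_t fast_div_model(int32_t num, int32_t den, bool want_rem)
{
    if (den == 0)
        return num;                       /* assumed /0 convention: numerator untouched */

    /* Work on magnitudes, like the abs instructions in the PASM. */
    uint32_t n = num < 0 ? 0u - (uint32_t)num : (uint32_t)num;
    uint32_t d = den < 0 ? 0u - (uint32_t)den : (uint32_t)den;

    /* Remainder follows the numerator's sign; quotient is negative if signs differ. */
    bool negate = want_rem ? (num < 0) : ((num < 0) != (den < 0));

    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);   /* shift next numerator bit into remainder */
        q <<= 1;
        if (r >= d) {                     /* cmpsub: subtract divisor if possible */
            r -= d;
            q |= 1u;                      /* record a 1 bit in the quotient */
        }
    }

    uint32_t result = want_rem ? r : q;
    return negate ? (int32_t)(0u - result) : (int32_t)result;
}
```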
    
  • Heater. Posts: 21,230
    edited 2010-09-02 21:00
    lonesock,

    Wow, great stuff.

    I think what happened is that I was in two minds about what to do for /0: return an undefined result and continue, or hit a breakpoint. Being undecided, I just left it "wrong" as it is :)

    Hitting a break point would be better when working under debug_zog.
  • lonesock Posts: 917
    edited 2010-09-03 08:16
    Gotcha, thanks. So, what do you think of this code?
    zpu_div                 call    #pop                    ' pop sets z for me
                 'if_z      jmp     #div_zero_error         ' hmmm, does this work?
                  if_z      call    #break                  ' fast_div can handle a 0 denominator, just let the user know
                            mov     div_flags, #SPIN_DIV_OP ' signify we want the quotient
                            tjnz    tos, #fast_div          ' if tos <> 0, perform the division: tos = tos / data
                            jmp     #done_and_inc_pc        ' in case tos = 0, we're done already
    
    zpu_mod                 call    #pop                    ' pop sets z for me
                 'if_z      jmp     #div_zero_error         ' hmmm, does this work?
                  if_z      call    #break                  ' fast_div can handle a 0 denominator, just let the user know
                            mov     div_flags, #SPIN_REM_OP ' signify we want the remainder
                            tjnz    tos, #fast_div          ' if tos <> 0, perform the remainder: tos = tos // data
                            jmp     #done_and_inc_pc        ' in case tos = 0, we're done already
    

    tjnz will jump in 4 clocks if tos is non-0, but will take an extra 4 clocks to fall through if tos is 0. If tos is 0, though, falling through is _much_ cheaper than calling the division routine. And I took the tos==0 check out of fast_div.

    Jonathan
  • jazzed Posts: 11,803
    edited 2010-09-03 12:24
    jazzed wrote: »
    Heater, the first generally available 32MB SDRAM board will be Gadget Gangster Propeller Platform compatible. A Propeller Single Board Computer will follow that.
    Got SDRAM on the new board working today. :)

    ZOG v1.6 (CACHE)
    Starting SD driver...0000FFFF
    Mounting SD...00000000
    Booting fibo.bin
    00000000

    Reading image... 17055 Bytes Loaded.
    Done

    Clearing bss: ....
    Running Program!
    fibo(00) = 000000 (00000ms)
    fibo(01) = 000001 (00000ms)
    fibo(02) = 000001 (00000ms)
    fibo(03) = 000002 (00000ms)
    fibo(04) = 000003 (00000ms)
    fibo(05) = 000005 (00001ms)
    fibo(06) = 000008 (00002ms)
    fibo(07) = 000013 (00004ms)
    fibo(08) = 000021 (00007ms)
    fibo(09) = 000034 (00011ms)
    fibo(10) = 000055 (00019ms)
    fibo(11) = 000089 (00031ms)
    fibo(12) = 000144 (00050ms)
    fibo(13) = 000233 (00082ms)
    fibo(14) = 000377 (00132ms)
    fibo(15) = 000610 (00213ms)
    fibo(16) = 000987 (00346ms)
    fibo(17) = 001597 (00561ms)
    fibo(18) = 002584 (00910ms)
    fibo(19) = 004181 (01471ms)
    fibo(20) = 006765 (02376ms)
    fibo(21) = 010946 (03844ms)
    fibo(22) = 017711 (06225ms)

    Updated zog.spin and SdramCache.spin attached.

    This zog.spin uses fewer longs.
    This SdramCache.spin is for latest SDRAM.
  • David Betz Posts: 14,511
    edited 2010-09-04 19:23
    I finally got my Hydra SPI SRAM and microSD card working with ZOG/VMCOG. Here is my first attempt at running fibo.bin. It starts out okay but it looks like it gets lost at fibo(22) and then crashes at fibo(24). I guess I still have some debugging to do but at least things are starting to work.
    ZOG v1.6 (VM)
    Starting SD driver...0000FFFF
    Mounting SD...00000000
    Opening ZPU image...00000000
    Reading image...Done
    fibo(00) = 000000 (00000ms)
    fibo(01) = 000001 (00004ms)
    fibo(02) = 000001 (00001ms)
    fibo(03) = 000002 (00001ms)
    fibo(04) = 000003 (00001ms)
    fibo(05) = 000005 (00002ms)
    fibo(06) = 000008 (00003ms)
    fibo(07) = 000013 (00005ms)
    fibo(08) = 000021 (00009ms)
    fibo(09) = 000034 (00014ms)
    fibo(10) = 000055 (00023ms)
    fibo(11) = 000089 (00036ms)
    fibo(12) = 000144 (00058ms)
    fibo(13) = 000233 (00094ms)
    fibo(14) = 000377 (00152ms)
    fibo(15) = 000610 (00245ms)
    fibo(16) = 000987 (00396ms)
    fibo(17) = 001597 (00642ms)
    fibo(18) = 002584 (01039ms)
    fibo(19) = 004181 (01681ms)
    fibo(20) = 006765 (02720ms)
    fibo(21) = 010946 (04402ms)
    fibo(22) = 018255 (07123ms)
    fibo(23) = 029741 (11525ms)
    fibo(24) =
    #pc,opcode,sp,top_of_stack,next_on_stack
    #----------
    
    0X0000001 0X00 0X0000DD44 0X00000002
    BREAKPOINT
    
  • jazzed Posts: 11,803
    edited 2010-09-04 23:41
    Hi David. Glad to see you making progress.

    Cheers.
  • Heater. Posts: 21,230
    edited 2010-09-05 00:06
    David,

    Excellent progress. It's great to see Zog up and running on another platform.

    Be aware that there is a problem with VMCog. I have similar failures under VMCog on the TriBlade. So you probably have your SPI RAM interface working fine.

    I still had a slight suspicion that the fault could be in the TriBlade memory driver code I put into VMCog. But as you have similar symptoms with your SPI RAM interface, and Jazzed has fibo working fine with his SDRAM cache, it points more to the fault being in the VMCog logic itself.

    Do you have a C application in mind for your SPI RAM system? I'm sure there are some Prop specific features you might want Zog to support.

    Looks like my next big hurdle is getting floating point accelerated with a floating point "coprocessor" COG using a modified float32 object.
  • David Betz Posts: 14,511
    edited 2010-09-05 05:23
    Yeah, I know about the problem with VMCOG. I've been trying to understand the code to help fix it but haven't found the problem yet. I haven't written my own SPI code for VMCOG. I'm using Bill's MORPHEUS1 SPI SRAM driver with the pins changed to match my Hydra board. I'll try to look harder today to see if I can figure out what might be wrong with VMCOG.

    My original plan was to port my simple Basic bytecode interpreter to ZOG for use on Andre LaMothe's C3 board, so I won't have a lot of memory available. It will have to fit in 64k if it is going to work. If I can get my bytecode compiler/interpreter working under ZOG, I may try retargeting the compiler to ZOG opcodes instead of my own VM.
  • David Betz Posts: 14,511
    edited 2010-09-05 05:51
    Heater. wrote: »
    Be aware that there is a problem with VMCog. I have similar failures under VMCog on the TriBlade. So you probably have your SPI RAM interface working fine.
    Okay, I have what is probably a dumb question but how do you know that this problem with fibo under VMCOG isn't related to stack overflow and variables being corrupted by that? The fibo function is heavily recursive.
  • Heater. Posts: 21,230
    edited 2010-09-05 06:28
    David,
    how do you know that this problem with fibo under VMCOG isn't related to stack overflow and variables being corrupted by that?

    Not a dumb question at all. Bill asked it a while back and it had crossed my mind. I think I convinced myself that it is not the problem:

    1) This fibo is not as heavily recursive as it appears. To calculate fibo(26) it has to call fibo(25); to calculate that it calls fibo(24), and so on. So the maximum call depth is only 26 or so.

    2) The fibo binary is 3.5KB in size. It runs from HUB memory quite happily with a zpu_memory size of only 4KB. I just tried again with the special SD-free version I made for Bill. If it were running out of stack, the problem would also show when using such a small HUB memory. Running from 64K of external memory is way more than enough.

    3) I have used small numbers of VMCog pages (4, 8, 10, etc.) so that the VM working memory is far away from any Spin code in HUB. I have no idea how VMCog uses those pages: does it, for example, start using more pages from the top downwards as it needs them, or does it work from the bottom up? Either way there should be no collision with such small sets.

    4) We have seen other, non-recursive programs fail with odd results, such as Dhrystone.
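    Point 1 can be checked on a desktop machine with an instrumented copy of the naive recursive function. This is an assumed reconstruction: it reproduces the values printed in the logs (fibo(0) = 0, fibo(1) = 1), but the actual fibo source is not shown in the thread. The deepest nesting reached by fibo(n) is n calls, and the correct fibo(22) is 17711, not the 018255 in David's VMCOG log.

```c
#include <stdint.h>

static int depth = 0;      /* current call depth */
static int max_depth = 0;  /* deepest nesting observed so far */

/* Naive doubly recursive Fibonacci, instrumented to record call depth. */
int32_t fibo(int32_t n)
{
    int32_t result;

    if (++depth > max_depth)
        max_depth = depth;

    if (n < 2)
        result = n;
    else
        result = fibo(n - 1) + fibo(n - 2);

    --depth;
    return result;
}
```

    The total number of calls grows exponentially with n, but the stack only ever holds one chain of n frames at a time.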

    The fact that you have the same issues eliminates my TriBlade memory interface functions as the source of the problem, I think. There is still the possibility that there is something up with the Zog/VMCog interface but I can't see it.
  • David Betz Posts: 14,511
    edited 2010-09-05 09:38
    Does ZOG always do aligned accesses to memory? It looks like VMCOG ignores the low order bits on WORD and LONG accesses.
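    To illustrate why alignment matters here: a VM that drops the low-order address bits, as described for VMCOG above, silently returns the enclosing long for a misaligned address instead of faulting. The C sketch below models that, together with the kind of alignment check a simulator can apply before each access. The memory contents and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 16-byte memory holding two known longs. */
static uint32_t mem[4] = { 0x11223344u, 0x55667788u, 0u, 0u };

/* Long read that ignores the two low-order address bits. */
uint32_t vm_read_long(uint32_t addr)
{
    return mem[(addr & ~3u) >> 2];      /* misalignment is dropped silently */
}

/* Returns 1 if addr is aligned for an access of size bytes (1, 2 or 4). */
int is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0;
}
```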
  • Heater. Posts: 21,230
    edited 2010-09-05 11:02
    Yep, all ZPU and hence Zog accesses are aligned appropriately for the size.

    The zpu-gcc compiler should only ever generate aligned accesses unless you are doing some funky stuff with pointer casting or unions. But as we only ever write portable code, don't we, that is not a problem.

    This is very fortunate as it makes the fixes required for endianness in Zog much simpler/quicker.
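    One common way to keep aligned accesses cheap on a little-endian host running a big-endian guest like the ZPU is to store each long in native order and flip low address bits for narrower accesses: XOR with 3 for bytes, XOR with 2 for halfwords. The sketch below assumes that scheme; it is not taken from the Zog source, and the helper names are hypothetical. It only works because every access is aligned to its size.

```c
#include <stdint.h>

/* Hypothetical host memory: ZPU longs stored in little-endian byte order. */
static uint8_t hub[8];

/* Store a long at a 4-aligned address in the host's byte order. */
static void write_long(uint32_t addr, uint32_t v)
{
    hub[addr]     = (uint8_t)v;          /* least significant byte first */
    hub[addr + 1] = (uint8_t)(v >> 8);
    hub[addr + 2] = (uint8_t)(v >> 16);
    hub[addr + 3] = (uint8_t)(v >> 24);
}

/* Big-endian ZPU byte read: flip the low two address bits. */
static uint8_t zpu_read_byte(uint32_t addr)
{
    return hub[addr ^ 3u];
}

/* Big-endian ZPU halfword read (addr 2-aligned): flip bit 1, then read
   the two bytes in the host's little-endian order. */
static uint16_t zpu_read_half(uint32_t addr)
{
    uint32_t a = addr ^ 2u;
    return (uint16_t)(hub[a] | (hub[a + 1] << 8));
}
```

    Long accesses need no translation at all; only byte and halfword accesses pay for one extra XOR.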

    That reminds me. My ZPU simulator in C checks for access alignment and was complaining that there was an unaligned access somewhere in the C start up code. I should pin that down and report a bug to Zylin.
  • David Betz Posts: 14,511
    edited 2010-09-14 15:46
    After getting my Hydra SPI SRAM card working with VMCOG, I decided to try creating an SDRamCache-compatible version. It seems to make sense to pass the cache line addresses back to ZOG rather than one byte at a time. I sort of have it working, but the printout is a little odd: there are strange characters here and there. Has anyone else run into this issue? Also, I couldn't get it to work at all until I commented out the BSS clearing in the USE_JCACHED_MEMORY path; the check_bytecode function would fail part way through when BSS was cleared. I also noticed that the USE_VIRTUAL_MEMORY path doesn't clear BSS. Why is it cleared on the JCACHE path, and could commenting that out be what is causing my odd behavior?

    Here are my results:
    ZOG v1.6 (CACHE)
    Starting SD driver...0000FFFF
    Mounting SD...00000000
    Booting fibo.bin
    00000000
    
    Reading image... 17055 Bytes Loaded.
    Done
    
    Waiting 2 seconds before program check...
    
    Restarting SD driver...0000FFFF
    Remounting SD...00000000
    Checking image... 17055 Bytes Checked.
    Program Load OK.
    
    Running Program!
    fibo(0 ) = 000000 (00008ms)
    fibo(0 ) = 000001 (00008ms)
    fibo(0 ) = 000001 (00008ms)
    fibo(0 ) = 000002 (00008ms)
    fibo(0 ) = 000003 (00008ms)
    fibo(0 ) = 000005 (00008ms)
    fibo(0 ) = 000008 (00008ms)
    fibo(0 ) = 000013 (00008ms)
    fibo(0 ) = 000021 (00008ms)
    fibo(0 ) = 000034 (00014ms)
    fibo(10) = 000055 (00023ms)
    fibo(11) = 000089 (00037ms)
    fibo(12) = 000144 (00060ms)
    fibo(13) = 000233 (00097ms)
    fibo(14) = 000377 (00157ms)
    fibo(15) = 000610 (00254ms)
    fibo(16) = 000987 (00414ms)
    fibo(17) = 001597 (00673ms)
    fibo(18) = 002584 ( 1092ms)
    fibo(19) = 004181 ( 1765ms)
    fibo(20) = 006765 ( 2846ms)
    fibo(21) = f10946 ( 4589ms)
    fibo(22) = f17711 ( 7402ms)
    fibo(23) = f28657 (11954ms)
    fibo(24) = f46368 (19329ms)
    fibo(25) = f75025 (31299ms)
    fibo(26) = 121393 ( 9798ms)
    
    #pc,opcode,sp,top_of_stack,next_on_stack
    #----------
    
    0X00034D1 0X00 0X0000FFB8 0X00003822
    BREAKPOINT
    

    Notice that there are leading zeros on the times in the first few entries and there is a leading 'f' on the fibo results toward the end of the table. The funny thing is, the results are correct. Any ideas on what might be going wrong here? I've attached my code but you won't be able to run it without a few changes to debug_zog.spin.
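    For comparison with the garbled lines above, the working runs print each line with fixed-width, zero-padded fields. A C sketch of that layout, assuming the widths read off the logs (2, 6 and 5 digits); the fibo program's own print routine is not shown in the thread, and format_fibo_line is hypothetical:

```c
#include <stdio.h>

/* Formats one result line, e.g. "fibo(22) = 017711 (07402ms)".
   Returns the number of characters written, as snprintf does. */
int format_fibo_line(char *buf, size_t len, int n, long value, long ms)
{
    return snprintf(buf, len, "fibo(%02d) = %06ld (%05ldms)", n, value, ms);
}
```

    In the failing run, "fibo(0 )" and "( 1092ms)" show padding that went missing, and "f10946" a stray character in a field that should have been zero-filled.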
  • jazzed Posts: 11,803
    edited 2010-09-14 19:07
    David Betz wrote: »
    I sort of have it working but the printout is a little odd. There are strange characters here and there. Has anyone else run into this issue?
    David, I saw problems like that when memory was corrupted. I've not seen any problems like that since I removed the code that wrote back only dirty pages. Some day I'll look at that again, but I figure performance is good enough as is. I see your cache performance stacks up pretty good :)
  • David Betz Posts: 14,511
    edited 2010-09-14 19:13
    jazzed wrote: »
    David, I saw problems like that when memory was corrupted. I've not seen any problems like that since I removed the code that wrote back only dirty pages. Some day I'll look at that again, but I figure performance is good enough as is. I see your cache performance stacks up pretty good :)

    Do you think the memory corruption could be due to me commenting out the BSS clearing code? One problem is that I'm using the USE_JCACHED_MEMORY path but my memory is only 64k in size. Maybe there are some dependencies on much larger memories in that build of ZOG?
  • jazzed Posts: 11,803
    edited 2010-09-14 20:25
    David Betz wrote: »
    Do you think the memory corruption could be due to me commenting out the BSS clearing code? One problem is that I'm using the USE_JCACHED_MEMORY path but my memory is only 64k in size. Maybe there are some dependencies on much larger memories in that build of ZOG?
    David, BSS clearing mainly affects uninitialized global variables such as heap_ptr which impacts successful use of malloc for example (mall.c fails without clearing bss). There is a memory size setting for USE_JCACHED_MEMORY which is set by default to 32MB (where the stack starts and grows down) and should be adjusted for your memory size. I've thought briefly that we should use an aliasing algorithm to set the memory size automatically ....
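    The aliasing approach could look roughly like this: write a marker at address 0, then write to successively larger power-of-two addresses until one of those writes lands back on address 0. The RAM model (64KB whose decoder ignores the high address bits), the marker value and detect_ram_size are all hypothetical, and the probe overwrites memory, so it would have to run before the ZPU image is loaded.

```c
#include <stdint.h>

/* Hypothetical RAM model: 64KB of storage whose address decoder
   ignores the high bits, so addresses alias every 64KB. */
#define PHYS_SIZE 0x10000u
static uint32_t ram[PHYS_SIZE / 4];

static void ram_write(uint32_t addr, uint32_t v) { ram[(addr & (PHYS_SIZE - 1u)) >> 2] = v; }
static uint32_t ram_read(uint32_t addr) { return ram[(addr & (PHYS_SIZE - 1u)) >> 2]; }

/* Returns the first power-of-two address that aliases onto address 0,
   i.e. the RAM size, or max_size if no aliasing is seen below it. */
uint32_t detect_ram_size(uint32_t max_size)
{
    const uint32_t marker = 0xA5A55A5Au;

    for (uint32_t size = 4u; size < max_size; size <<= 1) {
        ram_write(0u, marker);
        ram_write(size, ~marker);        /* clobbers address 0 only if it aliases */
        if (ram_read(0u) != marker)
            return size;
    }
    return max_size;
}
```

    With the 32MB default as max_size, this model reports 64KB.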
  • Heater. Posts: 21,230
    edited 2010-09-15 01:45
    David, I have not seen such scrambled output from Zog for a long time. I suspect that memory size setting might be to blame.

    However, the initial stack pointer is set according to the given memory size and I would expect specifying 32M when you have 64K would cause the stack to appear correctly at the top of your real RAM, the top address bits being ignored, and hence work correctly.
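    That expectation can be checked with a little address arithmetic. Assuming the RAM interface simply ignores the upper address bits (so the RAM size is a power of two), and assuming the initial stack pointer is placed 8 bytes below the configured size (the exact offset is an assumption), a configured 32MB (0x2000000) with 64KB (0x10000) of real RAM puts the stack at 0x1FFFFF8, which aliases to 0xFFF8, just below the top of the physical RAM. physical_address is a hypothetical helper.

```c
#include <stdint.h>

/* Address actually reached when the RAM decodes only the low address
   bits; ram_size must be a power of two. */
uint32_t physical_address(uint32_t addr, uint32_t ram_size)
{
    return addr & (ram_size - 1u);
}
```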

    I have been happy with not automatically detecting RAM size.
    Sometimes when hunting interpreter bugs I have had to run tens of thousands of instructions whilst capturing the register dump on each step. That trace is then compared with that produced by the real VHDL ZPU core running under the GHDL simulator on Linux.

    Those two execution traces don't compare (diff) well unless the stack pointer is in the same place. The VHDL model only has 16K RAM or so.

    Also it is possible that not all external RAM is dedicated to Zog. Perhaps we might want to put a RAM disk there or such.
  • David Betz Posts: 14,511
    edited 2010-09-15 03:56
    David Betz wrote: »
    There are strange characters here and there. Has anyone else run into this issue? Also, I couldn't get it to work at all until I commented out the BSS clearing in the USE_JCACHED_MEMORY path. The check_bytecode function would fail part way through when BSS was cleared. I also noticed that the USE_VIRTUAL_MEMORY path doesn't clear BSS. Why is it cleared on the JCACHE path and could commenting that out be what is causing my odd behavior.
    It looks like my strange characters were due to the fact that I removed the code that clears BSS. When I rewrote that code so that it works with my 64k SPI SRAM, all of the strange characters went away and my cache code seems to be working fine. Unfortunately, I don't seem to have gotten any benefit from moving to the cache model from the VMCOG scheme. I would have thought that fewer calls to the VM code would speed things up, but they don't in my case. Maybe it's because both use the same direct-mapped cache replacement algorithm, which may not work as well as Bill's "least often used" scheme.
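    The conflict behaviour speculated about here can be shown with a minimal direct-mapped cache model: each block has exactly one possible slot, so two frequently used addresses that share an index keep evicting each other, which an associative cache with LRU or LFU replacement would avoid. The line size, line count and dm_access are hypothetical, and whether fibo actually triggers such conflicts is not established.

```c
#include <stdint.h>
#include <stdbool.h>

#define LINE_BITS 6u     /* hypothetical 64-byte cache lines */
#define NUM_LINES 16u    /* hypothetical 16-line cache */

static uint32_t line_tag[NUM_LINES];
static bool     line_valid[NUM_LINES];

/* Direct-mapped lookup: returns true on a hit. On a miss the single
   candidate line is replaced unconditionally. */
bool dm_access(uint32_t addr)
{
    uint32_t block = addr >> LINE_BITS;
    uint32_t index = block % NUM_LINES;

    if (line_valid[index] && line_tag[index] == block)
        return true;
    line_tag[index] = block;
    line_valid[index] = true;
    return false;
}
```

    With 16 lines of 64 bytes, addresses 0 and 1024 map to the same line, so alternating between them misses every time.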
  • Heater. Posts: 21,230
    edited 2010-09-15 04:12
    David,
    Unfortunately, I don't seem to have gotten any benefit from moving to the cache model from the VMCOG scheme.
    Are you basing that on just running fibo?

    The fibo test runs very well in very small memory space, 3 or 4 pages in VMCOG is the same speed as 10 or 20. It never has to swap anything whilst running the code that is actually timed.

    We have to try a range of bigger programs that use more data and stack to get a feel for these things.
  • David Betz Posts: 14,511
    edited 2010-09-15 04:36
    Yes, so far I've only tried fibo. Now that I have both the VMCOG and Cache solutions working I'll start trying to compile bigger programs. I did all of this so I could try porting a simple Basic bytecode compiler to the Propeller. I guess I should try that next.
  • jazzed Posts: 11,803
    edited 2010-09-15 06:27
    @David it was an interesting experiment at least and I appreciate the data point. The cache offers 2 benefits today: 1) support for large memory, and 2) room for unrolled read/write burst code best suited for SDRAM and other bursty technology. Thanks to you VMCOG is actually functional so my 3rd reason no longer applies :)

    @Heater I've tried running the Dhrystone tests, but all I get at the end is 0 MIPS. Everything else seems to work fine. I noticed I have to fix the Makefile to build the test code, so I'm not sure things are right with that. Have you run any Dhrystone tests lately?
  • David Betz Posts: 14,511
    edited 2010-09-15 06:49
    jazzed wrote: »
    @David it was an interesting experiment at least and I appreciate the data point. The cache offers 2 benefits today: 1) support for large memory, and 2) room for unrolled read/write burst code best suited for SDRAM and other bursty technology. Thanks to you VMCOG is actually functional so my 3rd reason no longer applies :)
    Thanks but it was actually Bill who fixed the bug. I made an attempt to fix a bug but my fix itself was broken and turned out not to be the biggest problem anyway. I will try some other code later tonight. What sorts of real programs have people built using ZOG? All I've heard about so far are benchmarks.
  • Heater. Posts: 21,230
    edited 2010-09-15 07:03
    Jazzed,

    I never got around to providing Dhrystone with a good timer, after all it did not work anyway. If it runs in less than 40 seconds it could be hacked to use the same CNT based timer as fibo.
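    The 40-second limit follows from the counter width: a 32-bit CNT at an 80MHz system clock wraps every 2^32 / 80,000,000, or about 53.7 seconds, and unsigned subtraction of two readings is only correct across at most one wrap. A minimal sketch of such a timer, assuming the start and end CNT values have already been read (how Zog exposes CNT to C code is not shown here); elapsed_ms is a hypothetical name.

```c
#include <stdint.h>

/* Elapsed milliseconds between two readings of a free-running 32-bit
   cycle counter. Correct across at most one counter wraparound. */
uint32_t elapsed_ms(uint32_t start, uint32_t end, uint32_t clkfreq)
{
    return (uint32_t)(end - start) / (clkfreq / 1000u);
}
```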
  • Heater. Posts: 21,230
    edited 2010-09-15 07:15
    David,
    ...real programs...

    What? We don't do "real programs" around here, only virtual machines, interpreters and compilers. You know, "virtual programs":)

    Basically we've only just got everything working nicely on the various platforms so it's still early days yet.

    Speaking of which, when I have something that works with more than 64K RAM I want to get my C version of the ZPU running under Zog. That is then ZPU on ZPU...

    I'd really like to make a C version of debug_zog for a totally Spin free environment. Problem is I'm not sure any of the FAT drivers in C I have are small enough to fit in HUB. In my mind a file system is essential for any real programs.

    Hmm... there was that idea to use the FAT boot sectors to boot up ZiCog or yZ80; perhaps Zog could do that too.

    What would you have in mind as a "real program"?
  • David Betz Posts: 14,511
    edited 2010-09-15 07:37
    Heater. wrote: »
    What would you have in mind as a "real program"?

    XLISP of course! :-)

    I'm going to try building my Basic compiler and VM tonight. I may have to split the compiler into multiple passes to make it fit though.