Zog - A ZPU processor core for the Prop + GNU C, C++ and FORTRAN.Now replaces S - Page 17 — Parallax Forums



Comments

  • jazzed Posts: 11,803
    edited 2010-08-28 08:12
    lonesock wrote: »
    Do you happen to have a version of mall.c precompiled that I can test in Hub RAM?
    Try this one. Meanwhile I'll try your suggested mmul changes.
  • jazzed Posts: 11,803
    edited 2010-08-28 08:52
    @lonesock, both versions seem to multiply OK for dereferencing the end of the block, but something else is happening. I pulled in an old version of math_F4 just to make sure the file I posted works, for my own comfort.
    testmarch 800 200
    0x4e60 - 0x505c <-- correct now. 0x505c was 0x4e5c before recommended changes.
    W0^ = Write 0 march up.
    W0v = Write 0 march down.
    R0^ = Read  0 march up.
    R0v = Read  0 march down.
    W0^ 0x00004e60 0x00000000 0x55555555
    W0^ 0x00005260 0x00000100 0x55555555
    R0^W1^ 0x00004e60 0xaaaaaaaa
    R0^W1^ 0x00005260 0xaaaaaaaa
    R1vW0v 0x00004e5f  <-- this is wrong though, it should be R1vW0v 0x00004e60
    Error @ 0x4e5f[17f] Expected 0xaaaaaaaa Received 0x00000809
    
    00004e5f: 0x00000809 0x55555555 0x55555555 0x55555555 
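For readers unfamiliar with the notation in the output above, here is a minimal C sketch of a march-style test with the same four element types. It is not jazzed's testmarch source: the function name `march_test`, the choice of 0x55555555 and 0xAAAAAAAA as the "0" and "1" patterns (inferred from the output), and the plain array standing in for SDRAM are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAT0 0x55555555u  /* assumed "0" pattern, as seen in the W0^ lines */
#define PAT1 0xAAAAAAAAu  /* assumed "1" pattern, as seen in the R0^W1^ lines */

/* Runs W0 up, R0/W1 up, R1/W0 down and R0 down over mem[0..n-1].
   Returns -1 on success, or the index of the first mismatch. */
static long march_test(volatile uint32_t *mem, size_t n)
{
    size_t i;

    assert(mem != NULL || n == 0);

    for (i = 0; i < n; i++)          /* W0^ : write 0, marching up */
        mem[i] = PAT0;

    for (i = 0; i < n; i++) {        /* R0^W1^ : read 0, write 1, marching up */
        if (mem[i] != PAT0)
            return (long)i;
        mem[i] = PAT1;
    }

    for (i = n; i-- > 0; ) {         /* R1vW0v : read 1, write 0, marching down */
        if (mem[i] != PAT1)
            return (long)i;
        mem[i] = PAT0;
    }

    for (i = n; i-- > 0; ) {         /* R0v : read 0, marching down */
        if (mem[i] != PAT0)
            return (long)i;
    }
    return -1;
}
```

The down-going loops are written as `for (i = n; i-- > 0; )` so that the first word touched is index n-1 and the unsigned counter never wraps below zero.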
    
    
  • lonesock Posts: 917
    edited 2010-08-28 12:11
    Found it. My version of mult didn't handle the top 32 bits. I'm not sure how to do a 64-bit result without a worst case that is slower than the original (zog 1.3) implementation. I think we should just punt for now, and revert to the 1.3 multiplication code, and I'll look at this entire math block hopefully this coming week.

    Sorry for the confusion,
    Jonathan

    EDIT: Does ZPU even support returning the top 32 bits of a 64 bit multiplication? If not, I can strip out some of the logic in that math block. You would only need 32x32=>32, 32/32=>32, and 32remainder32=>32, right? And it currently seems to be all set for signed multiplication...is that what ZPU expects?
  • Heater. Posts: 21,230
    edited 2010-08-28 12:36
    Hey Guys,

    I have a working v1.5 here now. At least mall(function) and fibo are working from HUB and VMCog.

    Basically I have adopted Jazzed's SDRAM version wholesale, back-fitted the v1.3 math_F4 and tweaked a few details around.

    I want to make some fixes around the ZPU RAM - HUB RAM - I/O mapping in read/write_long then if nothing else comes up that is version 1.5.

    Just now I have to eat some good food and drink some good wine:) Perhaps I can post 1.5 tomorrow afternoon.
  • Heater. Posts: 21,230
    edited 2010-08-28 18:08
    Lonesock:
    Does ZPU even support returning the top 32 bits of a 64 bit multiplication?

    The best reference we have for zpu_mult is ZyLin's simulator Java code. In there MULT is implemented as:
            pushIntStack(popIntStack() * popIntStack());
    
    So, what we are looking at is signed 32 bit multiplication with the top 32 bits discarded. There aren't any longer integer operations in the ZPU.

    The simulator's DIV is:
            int a;
            int b;
            a = popIntStack();
            b = popIntStack();
            if (b == 0)
            {
                throw new CPUException();
            }
            pushIntStack(a / b);
    
    With MOD looking much the same.
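As a rough C rendering of the simulator semantics quoted above (a sketch, not Zog's PASM and not ZyLin's code), the two operations could look like this. The names `zpu_mult` and `zpu_div` and the integer return code standing in for the Java CPUException are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* MULT: 32x32 -> low 32 bits, top 32 bits discarded. Computed in unsigned
   arithmetic because signed overflow is undefined in C, whereas Java's int
   multiply simply wraps. */
static int32_t zpu_mult(int32_t a, int32_t b)
{
    uint32_t low = (uint32_t)a * (uint32_t)b;
    /* Converting back is implementation-defined for values above INT32_MAX;
       it wraps on two's complement compilers such as GCC. */
    return (int32_t)low;
}

/* DIV: returns 0 and stores a / b on success, or -1 in the case where the
   simulator would throw (b == 0). */
static int zpu_div(int32_t a, int32_t b, int32_t *out)
{
    assert(out != 0);
    if (b == 0)
        return -1;
    if (a == INT32_MIN && b == -1) {
        *out = INT32_MIN;   /* Java wraps here; plain C division would be undefined */
        return 0;
    }
    *out = a / b;           /* truncates toward zero, as Java does */
    return 0;
}
```

The low 32 bits of a product are the same whether the operands are read as signed or unsigned, so a multiply that keeps only those bits can serve both kinds of C multiplication.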
  • Heater. Posts: 21,230
    edited 2010-08-28 18:32
    Out of curiosity I had to look at what zpu-gcc produces for integer multiplication, here is the listing:
    int multest(int a, int b)
    {
         622:       ff              im -1
         623:       3d              pushspadd
         624:       0d              popsp
    
    00000625 <.LM17>:
    /home/michael/zog_v1_5/test/hello/hello.c:59
        return(a * b);
         625:       73              loadsp 12
         626:       75              loadsp 20
         627:       29              mult
         628:       80              im 0
         629:       0c              store
    
    0000062a <.LM18>:
    /home/michael/zog_v1_5/test/hello/hello.c:60
    }
         62a:       83              im 3
         62b:       3d              pushspadd
         62c:       0d              popsp
         62d:       04              poppc
    

    Changing to longs instead of ints we get the same code.

    Changing to unsigned we get the same code again.

    Changing to short ints we get:
    short multest(short a, short b)
    {
         622:       fd              im -3
         623:       3d              pushspadd
         624:       0d              popsp
         625:       75              loadsp 20
         626:       90              im 16
         627:       2b              ashiftleft
         628:       77              loadsp 28
         629:       90              im 16
         62a:       2b              ashiftleft
    
    0000062b <.LM17>:
    /home/michael/zog_v1_5/test/hello/hello.c:59
        return(a * b);
         62b:       71              loadsp 4
         62c:       90              im 16
         62d:       2c              ashiftright
         62e:       71              loadsp 4
         62f:       90              im 16
         630:       2c              ashiftright
         631:       29              mult
         632:       70              loadsp 0
         633:       90              im 16
         634:       2b              ashiftleft
         635:       70              loadsp 0
         636:       90              im 16
         637:       2c              ashiftright
         638:       80              im 0
         639:       0c              store
         63a:       51              storesp 4
         63b:       52              storesp 8
         63c:       55              storesp 20
         63d:       53              storesp 12
    
    0000063e <.LM18>:
    /home/michael/zog_v1_5/test/hello/hello.c:60
    }
    

    So 16 bit working on Zog is going to be tedious.

    Note the "IM 0" and "STORE" used to return the results. Results are returned via the fake register at location zero in memory. This is a pain for running from read only memory. I think it may be possible to relocate it though.
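The repeated "im 16 / ashiftleft" and "im 16 / ashiftright" pairs in the short listing narrow a 32-bit stack cell to a signed 16-bit value. A small C sketch of what each pair computes, assuming that reading of the listing, is given here; `sign_extend16` and `short_mul` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent of shifting left by 16 and then arithmetic-shifting right by 16:
   keep the low 16 bits and sign-extend them to 32 bits. Written with
   mask/xor/subtract so it does not depend on how C shifts negative values. */
static int32_t sign_extend16(int32_t x)
{
    return ((x & 0xFFFF) ^ 0x8000) - 0x8000;
}

/* short * short in the style of the listing: narrow both operands, multiply
   in 32 bits, then narrow the result again. The product of two 16-bit
   values always fits in 32 bits, so there is no overflow here. */
static int32_t short_mul(int32_t a, int32_t b)
{
    return sign_extend16(sign_extend16(a) * sign_extend16(b));
}
```

Three narrowing steps around a single mult is where most of the extra instructions in the short version come from.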
  • jazzed Posts: 11,803
    edited 2010-08-28 18:46
    Did you try "long long" ? Many 32 bit machines use that for 64 bits.
  • Heater. Posts: 21,230
    edited 2010-08-28 18:59
    Ah yes, what am I thinking?

    The long long multiplication gets a bit, well, long:
    multest():
    /home/michael/zog_v1_5/test/hello/hello.c:58
    
    long long multest(long long a, long long b)
    {
         622:       f7              im -9
         623:       3d              pushspadd
         624:       0d              popsp
         625:       7b              loadsp 44
    
    00000626 <.LM17>:
    /home/michael/zog_v1_5/test/hello/hello.c:59
        return(a / b);
         626:       7f              loadsp 60
         627:       61              loadsp 68
         628:       59              storesp 36
         629:       55              storesp 20
         62a:       77              loadsp 28
         62b:       56              storesp 24
         62c:       7d              loadsp 52
         62d:       7f              loadsp 60
         62e:       59              storesp 36
         62f:       53              storesp 12
         630:       77              loadsp 28
         631:       54              storesp 16
         632:       8c              im 12
         633:       3d              pushspadd
         634:       f8              im -8
         635:       05              add
         636:       52              storesp 8
         637:       58              storesp 32
         638:       83              im 3
         639:       cf              im -49
         63a:       3f              callpcrel
         63b:       78              loadsp 32
         63c:       7a              loadsp 40
         63d:       58              storesp 32
         63e:       78              loadsp 32
         63f:       0c              store
         640:       76              loadsp 24
         641:       84              im 4
         642:       19              addsp 36
         643:       0c              store
    
    00000644 <.LM18>:
    /home/michael/zog_v1_5/test/hello/hello.c:60
    }
         644:       77              loadsp 28
         645:       80              im 0
         646:       0c              store
         647:       8b              im 11
         648:       3d              pushspadd
         649:       0d              popsp
         64a:       04              poppc
    

    The worst thing we can do is unsigned mod and div. They dive off into some lengthy subroutines.
  • Heater. Posts: 21,230
    edited 2010-08-29 08:42
    Attached is Zog v1.5

    This version has:

    Adopted Jazzed's 32MByte SD RAM cache.
    Adopted Jazzed's use of userdefs.spin for hardware configuration.
    Backed out the math_F4 changes of v1.4, now mall.c works again.
    Moved all user configurable items to nearer the top of debug_zog.spin.

    I did not make any changes to the ZPU memory map, HUB RAM space, I/O space etc. Basically I could not decide on a memory layout.

    I would like to have, moving up memory:

    1) ZPU RAM space (HUB or external, 32K up to 32M)
    2) HUB RAM access space(32KB)
    3) COG RAM access space(512 LONGs, 2KB)
    4) Memory mapped IO (The rest)

    The memory map should be the same for all memory hardware solutions. This way C programs won't need any #defines or such to find HUB, COG, and I/O.

    Currently it means the HUB area would start at 32MByte to accommodate Jazzed's SDRAM solution. This would be so even if the actual ZPU RAM available is less when using HUB or smaller external RAM.

    Are we likely to want to go bigger? Seems unlikely.

    One could define a ZPU RAM address space as, say, 1GByte but then ZPU needs more IM instructions to build the bigger addresses.

    Any ideas?
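To put a number on "more IM instructions", here is a small C helper that counts how many IMs a constant needs. It assumes the usual ZPU encoding, in which the first IM pushes a sign-extended 7-bit immediate and each following IM shifts the value left by 7 and fills in 7 more bits; `im_count` is a hypothetical name, not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Number of IM instructions needed to build `value`. With the encoding
   assumed above, n IMs cover the signed range -2^(7n-1) .. 2^(7n-1)-1. */
static int im_count(int32_t value)
{
    int n;
    for (n = 1; n < 5; n++) {
        int64_t limit = (int64_t)1 << (7 * n - 1);
        if (value >= -limit && value < limit)
            return n;
    }
    return 5;   /* 35 bits: always enough for a 32-bit value */
}
```

Under these assumptions `im_count(0x02000000)` (32MB) is 4 and `im_count(0x40000000)` (1GB) is 5. Because the first IM is sign-extended, a positive address needs one spare bit for the sign, so a candidate base address is better checked with a helper like this than by counting bits alone.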
  • jazzed Posts: 11,803
    edited 2010-08-29 10:47
    Looks good Heater :)

    On memory size: I haven't tested anything other than 32MB yet, but theoretically one SDRAM chip can be 8MB, 16MB, 32MB, 64MB, or even 128MB. Here's a list of what's currently available.
    TSOP 54, 3.3V SDRAM Part Numbers on Digikey
    
    MT48LC8M8A2TG-75    64M   (8M x 8)
    HYB39S128800FE-7    128M (16M x 8)
    HYI39S128800FE-7    128M (16M x 8)
    IS42S81600E-7TL     128M (16M x 8)
    MT48LC32M8A2P-75    256M (32M x 8)
    MT48LC32M8A2P-7E    256M (32M x 8)
    IS42S83200D-7TL     256M (32M x 8) *
    MT48LC64M8A2TG-75   512M (64M x 8)
    
    *Note the IS42S83200D-7TL is the only part I've tested so far.
    

    One thing to note about the USE_JCACHED_MEMORY design is the way cache memory is accessed.

    The cache driver lives in a separate COG and provides buffers. So zog.spin keeps up with the address range represented by the buffer and accesses data with a single HUB transaction on a cache hit.

    I've found one small performance improvement for zog's cache interface, and I think there is another in there.
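The "keep up with the address range of the current buffer" idea can be sketched in C as follows. This is only an illustration of the technique, not the zog.spin code: the 64-byte line size is the figure jazzed gives later in the thread, and `backing`, `request_line` and `cached_read_byte` are hypothetical stand-ins for the external RAM, the request to the cache driver COG, and the interpreter's read path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u                          /* assumed line size */
#define LINE_MASK  (~(uint32_t)(LINE_BYTES - 1u))

static uint8_t  backing[4096];                  /* stand-in for external RAM */
static uint8_t  line_buf[LINE_BYTES];           /* stand-in for the HUB buffer */
static uint32_t line_base = UINT32_MAX;         /* no line loaded yet */
static unsigned refills;                        /* counts trips to the "driver" */

/* Stand-in for asking the cache driver for the line that holds addr. */
static void request_line(uint32_t addr)
{
    line_base = addr & LINE_MASK;
    memcpy(line_buf, &backing[line_base], LINE_BYTES);
    refills++;
}

/* Read one byte: a range check, then a single buffer access on a hit. */
static uint8_t cached_read_byte(uint32_t addr)
{
    assert(addr < sizeof backing);
    if ((addr & LINE_MASK) != line_base)
        request_line(addr);
    return line_buf[addr - line_base];
}
```

Reading addresses 10, 11, 70 and then 10 again costs three trips to `request_line` in this sketch, since only one line is tracked at a time. Here `request_line` stands for any trip to the driver, whether or not the driver then has to touch SDRAM.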
  • Heater. Posts: 21,230
    edited 2010-08-29 11:43
    128MB Wow.

    Question is: Is it really practical to make use of so much RAM. I mean, despite page tables or caches we are seriously lagging behind in performance compared to what a processor with a memory bus can do. Generating large addresses (many IMs) for data access and function calls slows things a bit more.

    As I said, I'd like the memory map of COG, HUB and I/O access to be pretty well fixed for all time and all platforms so that one does not have to "port" C code from place to place.

    So we could settle on 32MB, or 64 or 128 and be done with it forever.

    Of course one day this will all have to work on a Prop II with whatever help it has for external RAM in hardware. So perhaps we should push the envelope a bit on Prop I now.

    What the heck, I'll throw my vote in for putting HUB space up at 128MB. That should be enough for anyone, as Bill Gates famously didn't say. It should be enough to keep uCLinux happy :)
  • Bill Henning Posts: 6,445
    edited 2010-08-29 11:56
    My vote is 256MB max - why?

    4x IM = 28 bit constant

    2^28 = 256MB
    Heater. wrote: »
    128MB Wow.

    Question is: Is it really practical to make use of so much RAM. I mean, despite page tables or caches we are seriously lagging behind in performance compared to what a processor with a memory bus can do. Generating large addresses (many IMs) for data access and function calls slows things a bit more.

    As I said, I'd like the memory map of COG, HUB and I/O access to be pretty well fixed for all time and all platforms so that one does not have to "port" C code from place to place.

    So we could settle on 32MB, or 64 or 128 and be done with it forever.

    Of course one day this will all have to work on a Prop II with whatever help it has for external RAM in hardware. So perhaps we should push the envelope a bit on Prop I now.

    What the heck, I'll throw my vote in for putting HUB space up at 128MB. That should be enough for anyone, as Bill Gates famously didn't say. It should be enough to keep uCLinux happy :)
  • Heater. Posts: 21,230
    edited 2010-08-29 12:23
    Bill, very compelling argument for 256M.

    256MB = $10000000

    The ZPU tool chain uses:

    UART TX = $80000024
    UART RX = $80000028
    TIMER = $80000100

    So it will all fit nicely.
  • jazzed Posts: 11,803
    edited 2010-08-29 14:43
    Thought I would experiment with in-zog one-line cache management as implemented in zog_v1_5 (extra-zog SDRAM cache management is direct-mapped, 128 lines). The result is an improvement of about 200ms in fibo(20), and fibo(18) now runs in under 1 second. Now this could just be tuning for fibo performance, but if my speculation is right, I think other programs would benefit. If nothing else, the small changes save a few longs in zog.spin.
    Starting SD driver...0000FFFF
    Mounting SD...00000000
    Booting fibo.bin
    00000000
    
    Reading image... 17055 Bytes Loaded.
    Done
    
    Clearing bss: ....
    Running Program!
    fibo(00) = 000000 (00000ms)
    fibo(01) = 000001 (00000ms)
    fibo(02) = 000001 (00000ms)
    fibo(03) = 000002 (00000ms)
    fibo(04) = 000003 (00001ms)
    fibo(05) = 000005 (00001ms)
    fibo(06) = 000008 (00002ms)
    fibo(07) = 000013 (00004ms)
    fibo(08) = 000021 (00007ms)
    fibo(09) = 000034 (00012ms)
    fibo(10) = 000055 (00020ms)
    fibo(11) = 000089 (00034ms)
    fibo(12) = 000144 (00055ms)
    fibo(13) = 000233 (00089ms)
    fibo(14) = 000377 (00145ms)
    fibo(15) = 000610 (00234ms)
    fibo(16) = 000987 (00379ms)
    fibo(17) = 001597 (00615ms)
    fibo(18) = 002584 (00996ms)
    fibo(19) = 004181 (01611ms)
    fibo(20) = 006765 (02602ms)
    fibo(21) = 010946 (04209ms)
    fibo(22) = 017711 (06818ms)
    fibo(23) = 028657 (11045ms)
    fibo(24) = 046368 (17871ms)
    fibo(25) = 075025 (28891ms)
    fibo(26) = 121393 (05754ms)
    
    32MB SDRAM with 80MHz Propeller Clock.

    Cheers.
  • Heater. Posts: 21,230
    edited 2010-08-29 22:33
    I have not dared to look into VMCog or SdramCache code much.

    Could someone briefly state the difference between VMCog's virtual working pages and SdramCache's cache lines?

    As I see it ZPU memory access is constantly thrashing between code fetch and stack read/write at substantially different addresses. Then there is access to the programs actual data areas and the mysterious ZPU pseudo registers down at address zero.

    So the smallest program needs at least 4 different buffers (VM pages or cache lines) in order to prevent constant thrashing of buffers between these 4 areas. As the program and data gets bigger further buffers are required to prevent thrashing between different code (and data) areas.

    We should be careful comparing speeds using that fibo test. The execution speed of programs under VMCog depends heavily on the number of pages in the working set. The fibo loop itself is very small. This leads to an interesting effect on execution time:

    fibo(20) 30 pages : 2782ms
    fibo(20) 20 pages : 2782ms
    fibo(20) 4 pages : 2782ms.

    All the same?!!

    Not quite. That fibo program only measures the fibo function's execution time, not the surrounding test loop and result printing. The fibo function is very small and has no data, so it fits in 4 pages and runs fast. The program itself gets visibly much slower with decreasing pages.
  • jazzed Posts: 11,803
    edited 2010-08-30 01:55
    Heater. wrote: »
    Could some one briefly state the difference between VMCog's virtual working pages and SdramCache's cache lines?
    A VMCOG page is 512 bytes. An SdramCache line is 64 bytes.
    VMCOG has a replacement policy, SdramCache does not.

    VMCOG uses 2.5MB/s SDRAM burst read/write.
    SdramCache uses 10MB/s burst read/write.

    VMCOG can read a 512 byte SDRAM page in about 102us.
    SdramCache can read a 64 byte page in less than 7us.

    I chose to implement a separate interface because:
    1) There is no room in VMCOG for 10MB/s burst code.
    2) VMCOG limits usable external memory to 64KB.
    3) VMCOG apparently can not run fibo 26 to completion.
    (surely this can be fixed*).

    I recognized fibo performance testing is not especially useful.
    Once I used a 256 byte cache line for testing. Fibo calculation
    performance improved a bit, but everything else degraded.

    I've run dhrystone tests with SDRAM, but results are not obvious.
  • Heater. Posts: 21,230
    edited 2010-08-30 03:07
    Are you saying that SdramCache only has one 64 byte cache buffer?

    i.e. it has to refill the buffer when activity moves from a code fetch to a stack access for example?
  • jazzed Posts: 11,803
    edited 2010-08-30 06:55
    SdramCache has 256 buffers in a 16KB cache.

    Zog.spin selects the buffer being used. If the current buffer contains the data, one HUB operation fetches the data after some comparisons (around 9 instructions). A new buffer is selected if the required address is not in the current buffer's range (around 29 instructions). When a new buffer is selected, it is up to SdramCache.spin to deliver the buffer intact, and it will swap a buffer if necessary (up to 15us at 80MHz, potentially 9us on reads).

    I've considered exploring alternatives to the current method when I have time just to compare performance. Using a per data item request model with the current SdramCache driver, the best case cache fetch will be 8 instructions + SdramCache overhead of about 16 instructions.

    My cycle counting could be wrong and that's why I'm willing to experiment. However, I don't expect the alternative to perform any better than what's there now.

    Basically it's a choice of always taking 24 instructions -vs- a mix of 9 or 29 instructions on a cache hit.
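The trade-off in that last sentence can be worked out with a few lines of C. This is just arithmetic on the instruction counts quoted in the post (9, 29 and 24), which jazzed himself says may be off; the function names are hypothetical and misses that go all the way to SDRAM are not modelled.

```c
#include <assert.h>

/* Instruction counts quoted in the post. */
enum { SAME_BUFFER = 9, NEW_BUFFER = 29, PER_ITEM = 24 };

/* Average cost of the current scheme, scaled by 100 to stay in integers.
   same_pct is the percentage of accesses that land in the current buffer. */
static int current_cost_x100(int same_pct)
{
    assert(same_pct >= 0 && same_pct <= 100);
    return SAME_BUFFER * same_pct + NEW_BUFFER * (100 - same_pct);
}

/* Non-zero when the current scheme is cheaper on average than a flat
   PER_ITEM instructions for every access. */
static int current_scheme_wins(int same_pct)
{
    return current_cost_x100(same_pct) < PER_ITEM * 100;
}
```

With these figures the break-even point is 25%: 9 x 0.25 + 29 x 0.75 = 24. The current scheme comes out ahead whenever more than a quarter of accesses fall in the buffer already selected.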
  • lonesock Posts: 917
    edited 2010-08-30 17:33
    I took zog 1.5, stripped out the math and put in a faster and smaller version of multiply, and a slightly faster and smaller version of divide & modulus. This version also has the special cases for loadsp and storesp for the high bit, as well as the special cases for when offset = 0.

    I'm also including the file I used for testing the multiply and divide and modulus routines against the SPIN equivalents, so you can verify it separately.

    Jonathan

    EDIT: added the 'and smaller' parts.
  • Heater. Posts: 21,230
    edited 2010-08-31 01:56
    Lonesock: No time to look into your code but a quick run shows almost 4% performance gain running from HUB and a tad more than 2% running from VMCog. Excellent:)

    Jazzed, when are we going to see a self contained 32MB Propeller PC board, DracBlade style?
  • jazzed Posts: 11,803
    edited 2010-08-31 08:50
    Heater, the first generally available 32MB SDRAM board will be Gadget Gangster Propeller Platform compatible. A Propeller Single Board Computer will follow that.

    The Gadget Gangster board is in FAB now and will allow a Propeller Computer solution with Keyboard, Mouse, Serial Port, TV, and uSD card.

    I've spent most of this last week working on the I2C Keyboard/Mouse controller code. Hopefully by the time the FAB comes, I'll have the I2C solution ready. I have plenty of room for the controller code in the atTiny85.

    A VGA connector is available on the FAB as a stuffing option. It is mainly intended for VGA graphics experimentation and is not usable with uSD without cut/jump rework. The VGA and uSD sections could be reworked to provide a black/white or some other grey-scale video with access to uSD. I've considered adding another latch to free more pins, but that's just floating in the air like vapor now.
  • lonesock Posts: 917
    edited 2010-08-31 11:21
    I just noticed something on zpu_*shift*: tos is ANDed with $3F. $3F is 63, so I'm guessing $1F was desired, but, SHL, SHR, and SAR all only use the low 5 bits of S anyway to shift D, so the AND is unnecessary.

    Jonathan
  • Heater. Posts: 21,230
    edited 2010-08-31 12:11
    Interesting.

    In the ZPU Java simulator they have ANDed with 0x3f so I just blindly used it in Zog.

    Are you sure the Prop's shifts only use 5 bits of the src field?
    According to my Prop manual:
    –INSTR– ZCRI –CON– –DEST–    –SRC–
    001011 001i 1111 ddddddddd sssssssss
    
  • lonesock Posts: 917
    edited 2010-08-31 12:22
    Here's my test results using shr in PASM:
    Beginning test:
    -1153374642 >> 0 = -1153374642
    -1153374642 >> 1 = 1570796327
    -1153374642 >> 2 = 785398163
    -1153374642 >> 3 = 392699081
    -1153374642 >> 4 = 196349540
    -1153374642 >> 5 = 98174770
    -1153374642 >> 6 = 49087385
    -1153374642 >> 7 = 24543692
    -1153374642 >> 8 = 12271846
    -1153374642 >> 9 = 6135923
    -1153374642 >> 10 = 3067961
    -1153374642 >> 11 = 1533980
    -1153374642 >> 12 = 766990
    -1153374642 >> 13 = 383495
    -1153374642 >> 14 = 191747
    -1153374642 >> 15 = 95873
    -1153374642 >> 16 = 47936
    -1153374642 >> 17 = 23968
    -1153374642 >> 18 = 11984
    -1153374642 >> 19 = 5992
    -1153374642 >> 20 = 2996
    -1153374642 >> 21 = 1498
    -1153374642 >> 22 = 749
    -1153374642 >> 23 = 374
    -1153374642 >> 24 = 187
    -1153374642 >> 25 = 93
    -1153374642 >> 26 = 46
    -1153374642 >> 27 = 23
    -1153374642 >> 28 = 11
    -1153374642 >> 29 = 5
    -1153374642 >> 30 = 2
    -1153374642 >> 31 = 1
    -1153374642 >> 32 = -1153374642
    -1153374642 >> 33 = 1570796327
    -1153374642 >> 34 = 785398163
    -1153374642 >> 35 = 392699081
    -1153374642 >> 36 = 196349540
    -1153374642 >> 37 = 98174770
    -1153374642 >> 38 = 49087385
    -1153374642 >> 39 = 24543692
    -1153374642 >> 40 = 12271846
    -1153374642 >> 41 = 6135923
    -1153374642 >> 42 = 3067961
    -1153374642 >> 43 = 1533980
    -1153374642 >> 44 = 766990
    -1153374642 >> 45 = 383495
    -1153374642 >> 46 = 191747
    -1153374642 >> 47 = 95873
    -1153374642 >> 48 = 47936
    -1153374642 >> 49 = 23968
    -1153374642 >> 50 = 11984
    -1153374642 >> 51 = 5992
    -1153374642 >> 52 = 2996
    -1153374642 >> 53 = 1498
    -1153374642 >> 54 = 749
    -1153374642 >> 55 = 374
    -1153374642 >> 56 = 187
    -1153374642 >> 57 = 93
    -1153374642 >> 58 = 46
    -1153374642 >> 59 = 23
    -1153374642 >> 60 = 11
    -1153374642 >> 61 = 5
    -1153374642 >> 62 = 2
    -1153374642 >> 63 = 1
    

    Jonathan
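The wrap-around at 32 in the results above is what you get when only the low 5 bits of the shift count are used. A C model of that behaviour, assuming the test output is representative, is shown here; `pasm_shr` is a hypothetical name. The value 3141592654 is the unsigned 32-bit bit pattern of -1153374642.

```c
#include <assert.h>
#include <stdint.h>

/* SHR as the test output suggests the Propeller does it: only the low
   5 bits of the count are used. The mask also keeps the C shift defined,
   since shifting a 32-bit value by 32 or more is undefined in C. */
static uint32_t pasm_shr(uint32_t value, uint32_t count)
{
    return value >> (count & 31u);
}
```

Unlike the PASM instruction, the C version does need the explicit mask, so a port of the interpreter to C would keep an AND there, just with $1F.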
  • Heater. Posts: 21,230
    edited 2010-08-31 12:51
    OK. In it goes. Good catch.
  • Heater. Posts: 21,230
    edited 2010-08-31 15:27
    Attached: Zog v1.6.

    This has Lonesock's multiply and divide routines and his other optimization tweaks.

    I have set up the memory map for ZPU code as:
    $00000000 to $0FFFFFFF Normal RAM (256MB) for ZPU code, data, stack. 
    $10000000 to $10007FFF Maps to  HUB RAM
    $10008000 to $100087FF Maps to ZPU COGs memory.
    $10008800 to $FFFFFFFF Memory mapped IO
    

    Note that there is no checking of memory addresses, so for example an out of range write when running from HUB may well damage any Spin/PASM code running there.

    The map allows access to all COG locations so programs running under ZOG can modify the ZPU instruction set and state at runtime for some interesting effects:)

    Only 30 LONGs left in the ZPU COG now with SdRamCache.
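For C code that wants to reason about the v1.6 map, the four regions can be written down as a small decoder. The base addresses are taken from the table in the post; the enum, the macro names and the `classify` function are hypothetical and are not part of Zog.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REGION_RAM, REGION_HUB, REGION_COG, REGION_IO } zog_region;

#define HUB_BASE 0x10000000u   /* $10000000: start of the HUB RAM window   */
#define COG_BASE 0x10008000u   /* $10008000: start of the ZPU COG window   */
#define IO_BASE  0x10008800u   /* $10008800: start of memory mapped I/O    */

/* Classify a ZPU address according to the v1.6 memory map. */
static zog_region classify(uint32_t addr)
{
    if (addr < HUB_BASE) return REGION_RAM;
    if (addr < COG_BASE) return REGION_HUB;
    if (addr < IO_BASE)  return REGION_COG;
    return REGION_IO;
}
```

The COG window is $800 bytes, which matches 512 LONGs, and the toolchain's I/O addresses at $800000xx fall in the last region.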
  • lonesock Posts: 917
    edited 2010-08-31 19:02
    There are shorter division routines, they're just slower, but I'd be happy to stub one in if you'd like. And if you need, feel free to undefine USE_FASTER_MULT, that will save you 5 longs. (It just means that a*b will not execute at the same speed as b*a, but both will be faster than the old multiply.)

    Jonathan.
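Why a*b and b*a might run at different speeds is easiest to see in a generic shift-and-add multiply that stops as soon as the multiplier runs out of bits. The sketch here is that textbook routine in C, not lonesock's PASM; `shift_add_mul` and its iteration counter are hypothetical, and whether USE_FASTER_MULT works by ordering the operands first is a guess rather than something stated in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiply keeping the low 32 bits of the product.
   The loop runs once per significant bit of b, so a small multiplier
   finishes early. *iterations (optional) reports the loop count. */
static uint32_t shift_add_mul(uint32_t a, uint32_t b, unsigned *iterations)
{
    uint32_t result = 0;
    unsigned n = 0;

    while (b != 0) {
        if (b & 1u)
            result += a;    /* add the shifted multiplicand for each set bit */
        a <<= 1;
        b >>= 1;
        n++;
    }
    if (iterations != 0)
        *iterations = n;
    return result;
}
```

Multiplying 100000 by 3 takes 2 passes through the loop, while 3 by 100000 takes 17, for the same result. Putting the smaller operand in the multiplier position removes that asymmetry at the cost of a few extra instructions.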
  • Heater. Posts: 21,230
    edited 2010-08-31 23:04
    Lonesock,

    I think we can stick with the fast math code unless we have to really scrape for LONGs at some point but that is looking unlikely.

    30 LONGs should be enough to add some handling for the Propeller's LOCKs.

    What other Prop features is Zog missing support for?

    What else would it be useful to squeeze into those remaining LONGs?

    One thing would be unsigned divide and modulus operations. zpu-gcc inserts lengthy subroutines for those ops and it would be better to have the COG do it directly. It would be a bit messy as we would have to use some currently unused opcodes or, preferably, SYSCALL, and then provide some C/assembler functions to use it.

    Zog needs to be able to handle an interrupt. If only for a timer tick. I'd like to see FreeRTOS running one day not to mention uCLinux:)

    I just recently discovered that there is a VHDL implementation of ZPU that includes an interrupt and a new instruction "POPINT". The idea being that an interrupt automatically masks further interrupts. When the handler is done POPINT pops the return address and clears the interrupt mask.

    Floating point support can be added without changes to the Zog interpreter. Just use float32 PASM code with a C wrapper and provide C functions to use it that override those provided by GCC's soft-float library. http://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html Perhaps those unsigned mul, div, mod functions could be included in the soft-float COG.
  • Bill Henning Posts: 6,445
    edited 2010-09-01 16:35
    heater,

    Can you try the 0.981 I just posted to the vmcog thread? I think it might fix the fibo problem...

    David found a bug - I used an 'andn' where I should have used an 'and' ... sheesh ... plus a performance boost for pre-setting counters!

    Let me know...
  • jazzed Posts: 11,803
    edited 2010-09-01 17:57
    @Bill. Cool. I'll try to integrate SDRAM code with the new VMCOG tomorrow.

    Meanwhile, I have some new test results. I found a way to cut out a window miss in the SdramCache.spin code and decided to chop out some other things that were giving me a headache.

    Recent changes have shaved 12 minutes off the 16MB psrandom memory test (was 1:22, now 1:10 H:MM). Fibo 20 now runs in 2383ms and was 2825ms (all tests with an 80MHz system clock). All ZOG code remains the same as before. Kernel and memory enhancements are the only differences.

    There are more longs free in zog.spin now. I'll post new versions of zog.spin and SdramCache.spin later.

    --Steve