Big Spin - is it still a pipedream? - Page 5 — Parallax Forums

Big Spin - is it still a pipedream?


Comments

  • jazzedjazzed Posts: 11,803
    edited 2011-02-08 15:54
    Dave Hein wrote: »
    The only special case where it isn't masked off is the system I/O address of $1234000X, which is used for conio and fileio.
    I noticed this today. One could easily move the HUB access "segment" to $2000_0000 or some other address if necessary.

    The only external memory interface requirement for read/write RAM is a read buffer and a write buffer of size N starting at some address. I suppose a read-only device like flash should also be simulated.

    Is it possible to use a similar scheme for simulated external memory? That is, $12340000 + some offset = base could be a read/write address, base+4 the resulting data pointer, and base+8 the data length. A second memory device could be added for flash. I use the LSB of the address for the read/write flag since I typically use a 32-byte block of data for the interface.
  • David BetzDavid Betz Posts: 14,516
    edited 2011-02-08 18:27
    jazzed wrote: »
    The problem with C3 or any serial RAM is speed. The first time I booted an external memory software solution on C3 I thought the thing didn't load ... astonishingly a minute later the TV screen finally turned blue and printed "Hello, world!".
    Was that with ZOG? I don't see performance that bad on my C3.
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2011-02-08 19:46
    jazzed
    Getting a DracBlade port running should be a high priority. I don't have hardware, but others do.

    With all the amazing work you are doing here, I am thinking about sending you a freebie board in return. Can you pm me your postal address?
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-09 07:44
    jazzed wrote: »
    Is it possible to use a similar scheme for simulated external memory? That is, $12340000 + some offset = base could be a read/write address, base+4 the resulting data pointer, and base+8 the data length. A second memory device could be added for flash. I use the LSB of the address for the read/write flag since I typically use a 32-byte block of data for the interface.
    Steve,

    The SpinSim system I/O addresses are currently defined as follows:
    con
      SYS_COMMAND = $12340000 'word - conio and fileio
      SYS_LOCKNUM = $12340002 'word - conio and fileio
      SYS_PARM    = $12340004 'long - conio and fileio
      SYS_DEBUG   = $12340008 'long - turns debug prints on/off
      SYS_INVALID = $1234000c 'long - invalid hub memory address accesses are routed here
    
    I could change the debug flag to be a word and add a "SYS_EXTMEM" word at $1234000a, which would simulate the external memory control lines and byte address/data mailbox. The Prop program would need to clock in a memory address and read/write the data one byte at a time. I could make the interface identical to the modified HX512 card, except it would use the SYS_EXTMEM mailbox. Does that sound OK, or would you prefer the interface you suggested? If so, could you describe that a bit more? I don't quite understand how it would work.

    Dave
  • jazzedjazzed Posts: 11,803
    edited 2011-02-09 09:07
    Dave, I'm not sure the simulator needs to be cycle accurate. It could though for plug-ins etc....

    Here's what I'm doing now:
    #define SYS_CACHELNLEN 	  32
    #define SYS_XRAM_SIZE  	  33
    #define SYS_XRAM_ADDR  	  34
    #define SYS_XRAM_DATA  	  35
    #define SYS_FLSH_SIZE  	  36
    #define SYS_FLSH_ADDR  	  37
    #define SYS_FLSH_DATA  	  38
    
    // external memory interface
    //
    #define XRAM_MAX (64*1024*1024)
    static int cachelinelen = 0;
    static uint32_t xmemdata = 0;
    static uint32_t xramsize = 0;
    static char *xrambuff = 0;
    static uint32_t flshsize = 0;
    static char *flshbuff = 0;
    

    The idea is just to tell the simulator to malloc a memory block (once of course) at cache startup, then tell the simulator to write/read blocks of data that the SimCache.spin driver can use. I've added a flash block to potentially simulate the C3, but the interpreter will need a small re-write for address mappings.
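The "malloc once, then copy blocks" idea described above could look roughly like the following on the simulator side. This is only a sketch under stated assumptions: the names `xram_init` and `xram_copy` are hypothetical and do not come from SpinSim or SimCache.spin, and the bounds checks are an addition for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XRAM_MAX (64u * 1024u * 1024u)  /* upper bound, as in the post */

/* Hypothetical simulator-side state for one external RAM. */
static uint8_t *xram_buf = NULL;
static uint32_t xram_size = 0;

/* Allocate the backing store once; later calls are no-ops.
   Returns 0 on success, -1 on a bad size or allocation failure. */
static int xram_init(uint32_t size)
{
    if (xram_buf != NULL)
        return 0;                 /* already allocated at cache startup */
    if (size == 0 || size > XRAM_MAX)
        return -1;
    xram_buf = calloc(size, 1);
    if (xram_buf == NULL)
        return -1;
    xram_size = size;
    return 0;
}

/* Copy len bytes between a hub buffer and external address addr.
   to_xram != 0 copies hub -> external, otherwise external -> hub.
   Returns the byte count, or -1 if the range is out of bounds. */
static int xram_copy(uint8_t *hub, uint32_t addr, uint32_t len, int to_xram)
{
    if (xram_buf == NULL || addr > xram_size || len > xram_size - addr)
        return -1;
    if (to_xram)
        memcpy(xram_buf + addr, hub, len);
    else
        memcpy(hub, xram_buf + addr, len);
    return (int)len;
}
```

A cache driver would call `xram_init` once and then move one cache line (for example 32 bytes) per `xram_copy` call.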
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-09 09:52
    The easiest way for me to implement this would be to just add support for SYS_EXMEMREAD and SYS_EXMEMWRITE functions to the conio/fileio commands. These two commands would act like the file read and write commands, except they would access an external memory buffer instead of a file. I could add a -x# command-line option to specify the size of the external memory. Does that sound OK?

    Dave
  • jazzedjazzed Posts: 11,803
    edited 2011-02-09 12:50
    Dave Hein wrote: »
    The easiest way for me to implement this would be to just add support for SYS_EXMEMREAD and SYS_EXMEMWRITE functions to the conio/fileio commands. These two commands would act like the file read and write commands, except they would access an external memory buffer instead of a file. I could add a -x# command-line option to specify the size of the external memory. Does that sound OK?

    Dave
    As long as the memory is addressable I guess it doesn't matter. The design I'm using reads and writes a buffer at a time of a given length. For C3 simulation, two memory spaces are required.

    We can cognew multiple cogs right? I'm having trouble with that ... it could just be my bug though.
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-09 13:12
    jazzed wrote: »
    As long as the memory is addressable I guess it doesn't matter. The design I'm using reads and writes a buffer at a time of a given length. For C3 simulation, two memory spaces are required.

    We can cognew multiple cogs right? I'm having trouble with that ... it could just be my bug though.
    The external memory methods would be something like ExtMemRead(HubAddr, ExtAddr, NumBytes) and ExtMemWrite(HubAddr, ExtAddr, NumBytes). The HubAddr would be between $0000 and $7FFF, and the ExtAddr would be between $00000000 and whatever the max address is for the external memory. The logical mapping of the external memory to some higher address space would be done by the caching routines. The caching routines would also need to manage the writebacks.

    cognew works with all my tests. The program test.binary does several cognews. If you are running without the "-p" option, a cognew of Spin code or a cognew at address $F004 will run the C version of the interpreter. A cognew of any other address, or any cognew with the "-p" option, will run PASM code instead.

    Dave
  • jazzedjazzed Posts: 11,803
    edited 2011-02-09 13:21
    I'm hoping to recycle the cache code implemented in PASM.
    Can I just wrlong val, addr where addr is $12340002 ?
    I could do it in spin, but need another cog for that.
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-09 14:19
    $12340002 is actually the location for the system I/O lock number. The I/O command is at location $12340000 and a single long parameter is at $12340004. If the command requires more than one parameter $12340004 will then contain a pointer to an argument list. The Spin code and the PASM code are shown below. They compile, but I haven't tested them. I'll go ahead and add it to SpinSim, and I'll write a memory test program to check it out.

    Dave
    con
      SYS_COMMAND = $12340000
      SYS_LOCKNUM = $12340002
      SYS_PARM    = $12340004
    
      SYS_EXTMEM_READ   = 17
      SYS_EXTMEM_WRITE  = 18
    
    pub ExtMemRead(HubAddr, ExtMemAddr, NumBytes)
      result := SystemCall(SYS_EXTMEM_READ, @HubAddr)
    
    pub ExtMemWrite(HubAddr, ExtMemAddr, NumBytes)
      result := SystemCall(SYS_EXTMEM_WRITE, @HubAddr)
    
    pri SystemCall(command, parm) | locknum
      locknum := word[SYS_LOCKNUM] - 1
      if locknum == -1
        return -1
        
      repeat until not lockset(locknum)
      long[SYS_PARM] := parm
      word[SYS_COMMAND] := command
      repeat while word[SYS_COMMAND]
      result := long[SYS_PARM]
      lockclr(locknum)
    
    con
      SYS_COMMAND = $12340000
      SYS_LOCKNUM = $12340002
      SYS_PARM    = $12340004
    
      SYS_EXTMEM_READ   = 17
      SYS_EXTMEM_WRITE  = 18
    
    dat
    ExtMemRead              mov     ExtMemCommand, #SYS_EXTMEM_READ
                            jmp     #SystemCall
    
    ExtMemWrite             mov     ExtMemCommand, #SYS_EXTMEM_WRITE
    
                            'Set up the parameter list in hub RAM
    SystemCall              mov     temp, par
                            wrlong  HubAddr, temp
                            add     temp, #4
                            wrlong  ExtMemAddr, temp
                            add     temp, #4
                            wrlong  NumByte, temp
    
                            'Wait for lock not set
    :loop1                  lockset locknum                       wc
            if_c            jmp     #:loop1
            
                            'Write the address of the parameter list
                            wrlong  par, SysParm
                            
                            'Write the external memory command
                            wrword  ExtMemCommand, SysCommand
                            
                            'Wait for the command to be completed
    :loop2                  rdword  ExtMemCommand, SysCommand     wz
            if_nz           jmp     #:loop2
            
                            'Clear the lock
                            lockclr locknum
    ExtMemRead_ret
    ExtMemWrite_ret         ret                     
    
    HubAddr                 long    0
    ExtMemAddr              long    0
    NumByte                 long    0
    ExtMemCommand           long    0
    locknum                 long    0
    temp                    long    0
    SysCommand              long    SYS_COMMAND
    SysLocknum              long    SYS_LOCKNUM
    SysParm                 long    SYS_PARM
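
For readers following the handshake from the simulator's side, here is a rough C sketch of how a request like the one above might be serviced. It is not taken from spinsim.c: the `sys_mailbox` struct, the `service_mailbox` function, the 4 KB `extmem` array, and the choice to return the byte count in SYS_PARM are all assumptions made for illustration. Only the command numbers, the three-long argument list, and the "clear the command when done" convention come from the post.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HUB_SIZE          0x8000u   /* 32 KB of hub RAM */
#define SYS_EXTMEM_READ   17
#define SYS_EXTMEM_WRITE  18

/* Hypothetical model of the mailbox at $12340000..$12340007. */
struct sys_mailbox {
    uint16_t command;   /* SYS_COMMAND: nonzero while a request is pending */
    uint16_t locknum;   /* SYS_LOCKNUM: lock number + 1, or 0 if none */
    uint32_t parm;      /* SYS_PARM: argument pointer in, result out */
};

static uint8_t hub[HUB_SIZE];
static uint8_t extmem[4096];        /* small stand-in for external memory */

/* Read a little-endian long from hub RAM. */
static uint32_t hub_long(uint32_t addr)
{
    return (uint32_t)hub[addr] | (uint32_t)hub[addr + 1] << 8 |
           (uint32_t)hub[addr + 2] << 16 | (uint32_t)hub[addr + 3] << 24;
}

/* Service one pending request, then clear the command so the caller's
   "repeat while word[SYS_COMMAND]" loop can exit. */
static void service_mailbox(struct sys_mailbox *mb)
{
    if (mb->command == 0)
        return;                              /* nothing pending */

    uint32_t args = mb->parm;                /* pointer to the 3-long list */
    uint32_t result = (uint32_t)-1;          /* assumed error value */

    if (args <= HUB_SIZE - 12) {
        uint32_t hub_addr = hub_long(args);
        uint32_t ext_addr = hub_long(args + 4);
        uint32_t nbytes   = hub_long(args + 8);
        int ok = hub_addr <= HUB_SIZE && nbytes <= HUB_SIZE - hub_addr &&
                 ext_addr <= sizeof extmem &&
                 nbytes <= sizeof extmem - ext_addr;
        if (ok && mb->command == SYS_EXTMEM_READ) {
            memcpy(hub + hub_addr, extmem + ext_addr, nbytes);
            result = nbytes;
        } else if (ok && mb->command == SYS_EXTMEM_WRITE) {
            memcpy(extmem + ext_addr, hub + hub_addr, nbytes);
            result = nbytes;
        }
    }
    mb->parm = result;     /* store the result first ... */
    mb->command = 0;       /* ... then signal completion */
}
```

Writing the result before clearing the command matters because the caller reads SYS_PARM as soon as it sees the command go to zero.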
    
  • jazzedjazzed Posts: 11,803
    edited 2011-02-09 15:08
    I was just editing my reply regarding that address :)

    What you have looks OK for a start. I would like to see a way to set the memory size at startup if possible.

    I would much rather "prearrange" the buffer length for performance, but if you have 10 external memories, you would need 10 lengths.
    ExtMemSetMemorySize(BaseAddress, size)
    ExtMemSetBufferSize(BaseAddress, size)
    ExtMemWrite(addrp, datap, numbytes)
    ExtMemRead(addrp, datap, numbytes)
    ExtMemWriteBuffer(addrp, datap)
    ExtMemReadBuffer(addrp, datap)
    

    Thanks for adding whatever external memory access features you deem appropriate.
    --Steve
  • jazzedjazzed Posts: 11,803
    edited 2011-02-09 17:06
    Dr_Acula wrote: »
    jazzed

    With all the amazing work you are doing here, I am thinking about sending you a freebie board in return. Can you pm me your postal address?

    I may give you the address later. My free time and desk space are limited right now.
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-09 18:42
    I can add the functions you suggested, but let me make sure I understand them. Please look at the descriptions below to make sure they're correct.

    ExtMemSetMemorySize(BaseAddress, size)
    Allocate an external memory segment of "size" bytes starting at address given by "BaseAddress".

    ExtMemSetBufferSize(BaseAddress, size)
    Set the buffer size for the memory segment starting at "BaseAddress" to "size".

    ExtMemWrite(addrp, datap, numbytes)
    Copy a block of data of size "numbytes" from hub RAM starting at "addrp" into a memory segment containing the starting address "datap".

    ExtMemRead(addrp, datap, numbytes)
    Copy a block of data of size "numbytes" to hub RAM starting at "addrp" from a memory segment containing the starting address "datap".

    ExtMemWriteBuffer(addrp, datap)
    Copy a block of data from hub RAM starting at "addrp" into a memory segment containing the starting address "datap". The size of the block is determined by an earlier call to ExtMemSetBufferSize for that memory segment.

    ExtMemReadBuffer(addrp, datap)
    Copy a block of data to hub RAM starting at "addrp" from a memory segment containing the starting address "datap". The size of the block is determined by an earlier call to ExtMemSetBufferSize for that memory segment.

    How many different memory segments do you think there should be? I don't understand the need for setting buffer sizes since the ExtMemRead and ExtMemWrite have an explicit buffer size parameter. I don't see how it makes it more efficient to pre-set the buffer size.
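
The phrase "a memory segment containing the starting address" implies a lookup from an external address to one of several allocated segments. A minimal sketch of that lookup follows, assuming a small fixed table; `MAX_SEGMENTS`, `segment_add`, and `segment_find` are hypothetical names, not SpinSim functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGMENTS 4   /* hypothetical limit chosen for the sketch */

struct segment {
    uint32_t base;   /* first external address covered */
    uint32_t size;   /* length in bytes; 0 means the slot is unused */
};

static struct segment segments[MAX_SEGMENTS];

/* Register a segment; returns its index, or -1 if the table is full. */
static int segment_add(uint32_t base, uint32_t size)
{
    for (int i = 0; i < MAX_SEGMENTS; i++) {
        if (segments[i].size == 0) {
            segments[i].base = base;
            segments[i].size = size;
            return i;
        }
    }
    return -1;
}

/* Find the segment that contains [addr, addr + len); NULL if none does.
   The comparisons are arranged so that nothing can overflow. */
static const struct segment *segment_find(uint32_t addr, uint32_t len)
{
    for (int i = 0; i < MAX_SEGMENTS; i++) {
        const struct segment *s = &segments[i];
        if (s->size != 0 && addr >= s->base &&
            addr - s->base <= s->size &&
            len <= s->size - (addr - s->base))
            return s;
    }
    return NULL;
}
```

With a table like this, a per-segment buffer size would simply be one more field in `struct segment`.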
  • jazzedjazzed Posts: 11,803
    edited 2011-02-09 21:42
    That interpretation is reasonable. I guess addrp was supposed to be the physical backstore address (not a pointer) for a cache and datap would be the hub start address to read or write. It doesn't matter much as long as the parameter definitions are clear for the next guy/gal that comes along.
    Dave Hein wrote: »
    How many different memory segments do you think there should be? I don't understand the need for setting buffer sizes since the ExtMemRead and ExtMemWrite have an explicit buffer size parameter. I don't see how it makes it more efficient to pre-set the buffer size.
    Two segments might get it, but I've seen separate packet memory before. Also, it's possible that someone might use part of SDRAM for a giant disk buffer.

    Presetting a buffer size for a cache for example eliminates the need to do it more than once which gives performance advantage in a PASM cache swap driver.
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-10 15:49
    Steve,

    I posted an update to SpinSim that supports up to four external memories. I added three system functions to allocate memory and read or write it to hub RAM. I wasn't clear on how your "Buffer" functions would work, so I didn't implement them. I think that functionality can be done in the caching code. Take a look at memtest.spin and extmem.spin. They demonstrate how external memory is accessed. I also included extmempasm.spin, which has the PASM routines to read and write external memory.

    Dave
  • jazzedjazzed Posts: 11,803
    edited 2011-02-10 16:18
    Attached is a C3 version of the LittleBigSpin .zip I posted before. It still has the memory push-up hack, and requires BST/BSTC or Homespun to compile this time. Really I just want to post the working code before it gets lost some way. Also, since David uses the same interface for DracBlade as he does for C3, theoretically all you should need to do for DracBlade is change the #define at the top of the LittleBigSpin file and make a compatible hello.bin program (make a new userdefs.spin and change the u#TvPin). If it doesn't work, just look for changes around "#ifdef C3" and adjust. I've included copies of dracblade files from David's latest Zog.

    Please use the hello_c3 .zip to create a hello_c3.bin and load it on your SDcard. That is the program that the LittleBigSpin interpreter will load/run.

    @Dave. Thanks for adding all that code. I'll write a compliant PASM driver that uses your block copy methods.
  • jazzedjazzed Posts: 11,803
    edited 2011-02-10 23:05
    @Dave,

    I'm having some trouble with my SPIN writeByte routine not detecting a command done = 0 handshake signal from PASM. I can see in your listing where 0 gets written to the command register, but writeByte waits forever. The problem only occurs after a cache line is swapped on a write-back to simulator external memory on the 33rd byte. On a "cache hit" the done = 0 handshake works with no problem. The code is attached if you want to have a look. The problem happens when trying to writeback the first buffer (see the SimCache.spin::flush routine) well before the SPIN interpreter gets replaced.
    $ ./spinsim.exe LittleBigSpin.binary
    
    Starting ... 448
    Writing Cache ...
     0000 00 B4 C4 04 6F 24 10 00 C0 01 C8 01 1C 00 CC 01
     0010 30 00 02 01 0C 00 00 00 30 00 00 00 01 37 24 38
    

    Please have a look.
    The same algorithm works with real code of course.
    Am I abusing something in the simulator?

    Thanks.


    BTW: Have you ever considered attaching a virtual serial port to your simulator?
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-11 10:49
    Steve,

    I found some problems in the PASM code I wrote. It uses the PAR register to store the three parameters in memory, and I use a register called temp, which you also used for some code. Maybe that code at temp is only called once, so it's OK to overwrite it, but I changed my temp register to temp0. I also added a par0 value that points to its image in hub RAM. It should be OK to re-use this area of hub RAM unless you reload the cog from it or use it for something else. I have attached a version of SimCache.spin with my changes. I've also included spinsim.c with some debug prints added. It looks like your code is never doing a writeback. It only reads from the external memory.

    Dave

    Edit: I also added a putch function to SimCache.spin to print out some debug characters. I print an "R" when we read external memory and a "W" when it writes. The "R" gets printed, but not the "W".
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-11 13:03
    OK, I understand why there weren't any cache writebacks. It's because the cache starts out empty and a writeback isn't needed until we need to fill the cache line with another chunk of memory.
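
The behaviour described here (no writeback until a modified line has to be replaced) is what a write-back cache does. The sketch below shows it with a direct-mapped cache in C; the direct mapping, the sizes, and all names are assumptions for illustration and are not taken from SimCache.spin. The two counters play the role of the "R" and "W" debug prints.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u     /* 32-byte blocks, as mentioned earlier in the thread */
#define NUM_LINES 4u      /* hypothetical cache size */

struct cache_line {
    int      valid;
    int      dirty;
    uint32_t tag;                 /* external address of the line start */
    uint8_t  data[LINE_SIZE];
};

static struct cache_line cache[NUM_LINES];
static uint8_t backing[1024];     /* stand-in for external memory */
static int reads, writebacks;     /* counters, like the "R"/"W" prints */

/* Return the line holding addr, filling it (and evicting) on a miss. */
static struct cache_line *cache_lookup(uint32_t addr)
{
    uint32_t tag = addr & ~(LINE_SIZE - 1u);
    struct cache_line *line = &cache[(tag / LINE_SIZE) % NUM_LINES];

    if (line->valid && line->tag == tag)
        return line;                              /* hit: no traffic */
    if (line->valid && line->dirty) {             /* evict modified data */
        memcpy(backing + line->tag, line->data, LINE_SIZE);
        writebacks++;
    }
    memcpy(line->data, backing + tag, LINE_SIZE); /* fill from backing store */
    reads++;
    line->valid = 1;
    line->dirty = 0;
    line->tag = tag;
    return line;
}

/* Write one byte through the cache and mark its line as modified. */
static void cache_write_byte(uint32_t addr, uint8_t value)
{
    assert(addr < sizeof backing);
    struct cache_line *line = cache_lookup(addr);
    line->data[addr % LINE_SIZE] = value;
    line->dirty = 1;
}
```

Starting from an empty cache, the first writes only cause reads; a writeback appears only when a later address maps onto a line that already holds modified data.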

    Another problem I found is that SpinSim's pread never returns a value less than zero. The test "if result < 0" needs to be changed to "if result =< 0". I'll fix that in the next update. With that change it gets to the point where it prints the "Startup addresses".
    jazzed wrote: »
    BTW: Have you ever considered attaching a virtual serial port to your simulator?
    Steve, what do you mean by a virtual serial port? Are you talking about a memory mapped register where you could read and write characters?
  • jazzedjazzed Posts: 11,803
    edited 2011-02-11 16:21
    Great David. Thanks. I'll give < 1 a try for terminating the pread loop later today. I've been out of the office.

    I've noticed that many of our file objects have different expectations and I guess that return code is a subtle difference. It makes more sense to me to check for <= 0 for end of file rather than < 0, and < 0 should probably mean some kind of an error.
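
The return-code convention discussed here (positive = bytes read, 0 = end of file, negative = error) can be written as a read loop like the one below. This is a generic C sketch, not the Spin loader code from the thread; `read_fn`, `read_all`, and the in-memory `mem_read` source are hypothetical names used only to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader: returns bytes read, 0 at end of file, <0 on error. */
typedef int (*read_fn)(void *ctx, unsigned char *buf, int max);

/* Read until the reader reports end of file or an error.
   Returns the total byte count, or -1 if the reader failed. */
static long read_all(read_fn rd, void *ctx, unsigned char *dst, long cap)
{
    long total = 0;
    while (total < cap) {
        long room = cap - total;
        int n = rd(ctx, dst + total, room > 512 ? 512 : (int)room);
        if (n < 0)
            return -1;      /* error */
        if (n == 0)
            break;          /* end of file: testing only "< 0" would spin here */
        total += n;
    }
    return total;
}

/* Example in-memory source used to exercise read_all. */
struct mem_src { const unsigned char *p; int left; };

static int mem_read(void *ctx, unsigned char *buf, int max)
{
    struct mem_src *s = ctx;
    int n = s->left < max ? s->left : max;
    memcpy(buf, s->p, (size_t)n);
    s->p += n;
    s->left -= n;
    return n;               /* 0 once the source is exhausted */
}
```

A loop that only stops on a negative value never terminates against a reader that signals end of file with 0, which matches the symptom Dave describes.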

    I was thinking of a virtual serial port, such as being able to download to the simulation and talk to it with a serial terminal. It's just an idea that would be a lot like using a real board from one of the GUIs. Guess it was just a crazy thought ... I don't expect you to take it seriously.
  • jazzedjazzed Posts: 11,803
    edited 2011-02-13 13:38
    The serial port is working now with BigSpin ... forgot to stop the loader cog before starting the new interpreter before. I'm not sure what it will take to make the simulation fully functional with the bigspin interpreter.

    I guess the next step is resolving some code issues.

    I'd like to see a windows GUI application to automate the build/download process.
    Maybe the PZST Qt IDE can have a mode to deal with bigspin later.


    I thought a fibo comparison would be interesting. Here are some FIBO* results at 80MHz:
    Hardware  |  Language  |  FIBO(20) time  |  FIBO 0 to 26
    ----------+------------+-----------------+--------------
    C3        |  SPIN      |  547ms          |  30s
    SDRAM     |  SPIN      |  547ms          |  30s
    C3        |  BigSPIN   |  3601ms         |  2m53s
    SDRAM     |  BigSPIN   |  2858ms         |  2m19s
    C3        |  ZOG C     |  3644ms         |  3m18s
    SDRAM     |  ZOG C     |  2773ms         |  2m18s
    ----------+------------+-----------------+--------------
    
    The numbers are interesting because ZOG and BigSpin use similar code, data, and stack storage and access methods where everything is cached on SDRAM. By this "weak" measure, BigSpin is 5 times slower than Spin and fractionally slower than ZOG. There is still room for performance improvement in the BigSpin interpreter. Language implementations that keep data and stack in HUB RAM will be faster.

    *The fibo test is not very good for benchmarking, but it is a fair simple relative test for small code loops. Large programs will have very different results.
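
The benchmark sources were posted as attachments and are not reproduced in the thread. For orientation only, the sketch below assumes the usual doubly recursive definition with fibo(0) = 0 and fibo(1) = 1; the helper `fibo_calls` is an addition that counts how many calls one run makes, which is the quantity a test like this mostly exercises.

```c
#include <assert.h>
#include <stdint.h>

/* Naive doubly recursive Fibonacci: the work is almost entirely
   call/return overhead rather than arithmetic. */
static uint32_t fibo(uint32_t n)
{
    if (n < 2)
        return n;
    return fibo(n - 1) + fibo(n - 2);
}

/* Number of calls fibo(n) makes, including the outermost one. */
static uint32_t fibo_calls(uint32_t n)
{
    if (n < 2)
        return 1;
    return 1 + fibo_calls(n - 1) + fibo_calls(n - 2);
}
```

Under this definition fibo(20) is 6765 and takes 21891 calls, so the FIBO(20) column is essentially a measure of about twenty-two thousand subroutine calls and returns.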
  • RossHRossH Posts: 5,503
    edited 2011-02-13 14:12
    Hi jazzed,

    Good stuff! I'll have to get a move on with Catalina's support for Flash RAM on the C3. Can you post the source you use for fibo for future reference?

    Thanks!

    Ross.
  • jazzedjazzed Posts: 11,803
    edited 2011-02-13 16:40
    RossH wrote: »
    Can you post the source you use for fibo for future reference?

    We all agree a fibo test is not a good benchmark, but it can be used to judge primitive relative value.

    See attachments.

    The spin code will run on either SPIN or BIGSPIN interpreters.
    The zog code will run on C3, DracBlade, or PropellerPlatform SDRAM.

    If you want to run the zog example, you will need the toolchain from here:
    http://opensource.zylin.com/zpudownload.html

    LittleBigSpin files are included which will load/run fibo.bin from SDCARD.
    The DracBlade package is unknown but may work. Can someone please test it?
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-13 19:14
    Steve, I got the hello.bin program to work with the Big Spin interpreter under SpinSim. There's a problem when using the SMALLER cache mode in the interpreter. It uses the cache line address immediately after the cache cog clears the command. However, the cache cog doesn't write the cache line address until three instructions later. The cache cog assumes that the calling cog will already have the address if the correct line is in the cache, so it clears the command early.

    I moved the instruction that clears the command after the cache line address is written. This fixes the problem. I put XXXXXXXXXXXXXXXXXXXXXX lines around my changes. Of course the correct solution is to leave the cog cache code the way it is and fix the problem in the interpreter. I didn't try it with SMALLER undefined, so that version of the code may be correct.
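
The ordering problem can be reproduced with a toy model in C. The sketch assumes a simulator that lets the polling cog look at the mailbox after every cache-cog instruction, with no hub stalls; the struct, the step numbering, and the addresses $1000 and $2000 are made up for the illustration and are not taken from SimCache.spin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mailbox shared by the interpreter and the cache cog. */
struct mailbox { uint32_t command; uint32_t line_addr; };

/* One simulated cache-cog instruction. With early_clear set, the command
   is cleared three instructions before the line address is written. */
static void cache_step(struct mailbox *mb, int step, int early_clear,
                       uint32_t new_addr)
{
    int clear_at = early_clear ? 0 : 3;
    int write_at = early_clear ? 3 : 0;
    if (step == write_at)
        mb->line_addr = new_addr;
    if (step == clear_at)
        mb->command = 0;
}

/* Interleave the two cogs one instruction at a time and return the
   line address the interpreter ends up using. */
static uint32_t run_request(int early_clear)
{
    struct mailbox mb = { 1, 0x1000u };   /* pending request, stale address */
    for (int step = 0; step < 8; step++) {
        cache_step(&mb, step, early_clear, 0x2000u);
        if (mb.command == 0)              /* the poll sees "done" ... */
            return mb.line_addr;          /* ... and uses the address at once */
    }
    return 0;
}
```

In this model `run_request(1)` returns the stale $1000 while `run_request(0)` returns the new $2000, which is the effect of moving the command clear after the address write.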

    I was a bit surprised to see the "Hello World" message print on the screen. I assumed the interpreter would try to map the $1234000x addresses to the cache. Do you make a special case for that address space, or are there certain address modes that you don't map to the cache, such as the absolute addressing mode that uses an absolute address from the stack?

    The cog didn't terminate correctly. It ran to the end of memory, and a debug print in SpinSim shows that it was addressing beyond the external memory. The interpreter normally terminates a program by inserting an $FFF9 return address on the stack. Of course in this case that would require having a cogstop(cogid) spin instruction at $FFF9 in external memory.

    I attached the SimCache.spin file that I used. I also defined a 3-long area at the beginning with the label SysParms that is used for the three system function call parameters.

    Dave
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-13 20:05
    I tried commenting out the "#define SMALLER" and the image was too large, so I'm guessing this mode doesn't work yet. I looked at your other versions for C3 and SDRAM and it seems like they would have the same problem as SpinSim. The cache line address will have an error every time you cross a cache line boundary. Maybe I'm missing something, but that's how it looks to me.

    Dave
  • jazzedjazzed Posts: 11,803
    edited 2011-02-13 20:34
    Hey Dave.

    Thanks for sleuthing that problem in SimCache.spin. I'm aware that what you found was possible, but I've never had a problem with it on real hardware, and just took it for granted. I believe the reason it works as well as it does on the chip is the cog sequence. It is a nice optimization, so it would be hard for me to change it.

    It's great to have another functional platform. Having the simulator working is especially sweet since no hardware is required for general software development. Maybe someday a nicely thought out set of virtual devices could be added to the simulator via a generic interface - gear supports some virtual devices but how that happens is mysterious and as such a dead end.

    Yes, #define SMALLER is the only way the cache works right now. At some point the other one can be done I hope because it is FASTER - up to 15% faster in some measurements with ZOG.

    The reason you saw "Hello World" is because any address >= $10000000 is interpreted as a HUB address in the interpreter. That's the shared memory mechanism.
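
As a small illustration of that rule, the check below classifies an address by the $10000000 threshold quoted above. It is a sketch only: the enum and function names are hypothetical, and how the interpreter turns such an address into an actual hub offset is not stated in the thread, so the sketch does not attempt it.

```c
#include <assert.h>
#include <stdint.h>

#define HUB_WINDOW_BASE 0x10000000u   /* threshold quoted in the post */

enum mem_space { SPACE_CACHED_EXTERNAL, SPACE_HUB };

/* Decide whether an address bypasses the cache and goes to the hub. */
static enum mem_space classify_address(uint32_t addr)
{
    return addr >= HUB_WINDOW_BASE ? SPACE_HUB : SPACE_CACHED_EXTERNAL;
}
```

SpinSim's system I/O addresses at $1234000x fall above the threshold, which is consistent with the "Hello World" output reaching the console.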

    Now, if we could only get a DracBlade volunteer :)

    Thanks.
    --Steve
  • Heater.Heater. Posts: 21,230
    edited 2011-02-13 23:51
    BigSpin and Zog perform a fibo(20) in about 3 seconds. RossH has posted a result for Catalina's fibo(20) on a DracBlade as a tad less than 1 second.

    Should we all pack up and go home:)

    Mind you I believe that Catalina result is with stack/data in HUB RAM.
  • RossHRossH Posts: 5,503
    edited 2011-02-14 01:22
    Heater. wrote: »
    BigSpin and Zog perform a fibo(20) in about 3 seconds. RossH has posted a result for Catalina's fibo(20) on a DracBlade as a tad less than 1 second.

    Should we all pack up and go home:)

    Mind you I believe that Catalina result is with stack/data in HUB RAM.

    Catalina's stack is always in Hub RAM. The data is in Hub RAM when using either the LMM memory model or the XMM SMALL memory model. The data is in XMM RAM when using the XMM LARGE model.

    I'm not really sure what Jazzed's benchmark figures mean. I presume "C3" means executing from SPI RAM on the C3? If so, those times for Zog and BigSpin look pretty good. But SDRAM means ... what?

    Anyway, I just did a quick test with Catalina. On the C3 executing from Hub RAM, FIBO 20 is 306ms, and FIBO 0 to FIBO 26 is about 11 seconds. Faster than Spin, but not lightning fast. This is due to the recursive nature of the FIBO benchmark - it doesn't really matter what language you're using - stack manipulation still takes about the same amount of time.

    But I wouldn't pack up and go home just yet if I were you ... when I tried Catalina on the C3 executing from SPI RAM, I get FIBO 20 of 7386ms, and FIBO 0 to FIBO 26 of 5m50s! Groan!

    I presume having a caching SPI driver makes the big difference here - it seems to double the execution speed! I'll need to sharpen my pencils and get back to work on my own caching Catalina SPI driver.

    Ross.
  • Heater.Heater. Posts: 21,230
    edited 2011-02-14 01:59
    RossH,

    Seems to me that both BigSpin and Zog need an option to get stack into HUB at least. Given that most microcontroller code is not expected to be massively recursive or have huge local variables that would be fine.

    Data in HUB is more problematic if you want "big" programs.
    But SDRAM means ... what?

    Well, there is the Gadget Gangster 32MB SDRAM board with SDRAM cache interface by Jazzed. As far as I can tell that result is obtained with all code/data/stack out there in the 32MB.
    ...a quick test with Catalina. Faster than Spin, but not lightning fast. This is due to the recursive nature of the FIBO benchmark

    Yep. This fibo thing really needs to be shot in the head. It is totally unrepresentative of what software in a typical mcu-application does. It is really only a good test of subroutine calling efficiency.

    We need something with some more normal loops and conditionals and a selection of operators in use.

    It did cross my mind that as I now have implementations of my FFT in C, Spin and PASM that this would make a better benchmark. It is a substantial piece of code with a good selection of loops and operators that takes a nice time to execute. All three versions are written to be as similar as possible in approach and so are more or less directly equivalent line for line. Well not so much the PASM of course, especially since Lonesock optimized it, but that need not concern us. The FFT is a bit short on "if" statements though.
    ...when I tried Catalina on the C3 executing from SPI RAM, I get FIBO 20 of 8237ms,

    Oh goodie, the race is still on:)
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-14 05:32
    jazzed wrote: »
    Thanks for sleuthing that problem in SimCache.spin. I'm aware that what you found was possible, but I've never had a problem with it on real hardware, and just took it for granted. I believe the reason it works as well as it does on the chip is the cog sequence. It is a nice optimization, so it would be hard for me to change it.
    I see how the hub access stalls will prevent the problem on the real hardware. I guess I need to add a cycle-accurate mode in the simulator. I currently don't simulate instruction pipe-lining or the hub access time slots.
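
A cycle-accurate mode would need to model when each cog is allowed to touch the hub. On the Propeller 1 each cog gets one hub window every 16 system clocks; the sketch below computes how long a cog would stall before its next window. The exact placement of each cog's slot (cog n at clock phase 2n) is a modelling assumption made for this sketch, and `hub_wait` is a hypothetical helper, not a SpinSim function.

```c
#include <assert.h>
#include <stdint.h>

#define HUB_PERIOD     16u  /* each cog gets one hub window per 16 clocks */
#define CLOCKS_PER_COG  2u  /* modelling assumption: window moves every 2 clocks */

/* Clocks a cog must wait, starting at `clock`, until its next hub window. */
static uint32_t hub_wait(uint32_t clock, uint32_t cogid)
{
    uint32_t slot  = (cogid * CLOCKS_PER_COG) % HUB_PERIOD;
    uint32_t phase = clock % HUB_PERIOD;
    return (slot + HUB_PERIOD - phase) % HUB_PERIOD;
}
```

A stall of up to 15 clocks on the polling cog's read is enough to hide a result that arrives a few instructions late, which is why real hardware can mask an ordering problem that a simulator without hub timing exposes.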