Catalina 2.9 - Page 15 — Parallax Forums

Catalina 2.9


Comments

  • jazzed Posts: 11,803
    edited 2011-03-20 11:06
    RossH wrote: »
    I've just emailed you both an updated 3.0 pre-release ....
    Got it. I'll be spending some quality time with it today.
  • RossH Posts: 5,547
    edited 2011-03-20 15:22
    David Betz wrote: »
    Hi Ross,

    Thanks for the update! I tried this new version and it was not noticeably faster at running my xbasic test program. The old version of Catalina using the -x3 memory layout took 12 seconds and this version took 11 seconds. That could easily be explained by errors in pushing the stopwatch button on my watch. In any case, it's good to have the cache bug fixed since I'm sure it could have caused problems.

    Thanks,
    David

    Something is not right. On my C3 it is at least twice as fast. I hope I haven't missed something out of the upgrade. Can you post your binary (and your makefile options) and I'll try it when I get home.

    Thanks,

    Ross.
  • David Betz Posts: 14,519
    edited 2011-03-20 15:36
    RossH wrote: »
    Something is not right. On my C3 it is at least twice as fast. I hope I haven't missed something out of the upgrade. Can you post your binary (and your makefile options) and I'll try it when I get home.

    Thanks,

    Ross.

    Here is the binary and the makefile. Hopefully I didn't mess something up!
  • RossH Posts: 5,547
    edited 2011-03-20 15:58
    David Betz wrote: »
    Here is the binary and the makefile. Hopefully I didn't mess something up!
    The makefile looks ok. I'll try running it when I get home.

    Ross.
  • RossH Posts: 5,547
    edited 2011-03-21 00:50
    David Betz wrote: »
    Here is the binary and the makefile. Hopefully I didn't mess something up!

    No, it looks like it was me that messed up. I was working with some experimental changes to the caching algorithm, and I appear to have left them enabled.

    In the file Catalina_SPI_Cache.spin you will find a line (currently commented out) that says:
    '#define DISABLE_HASH
    
    Remove the quote mark (i.e. define the symbol DISABLE_HASH) and try your program again. Note that you also have to recompile both the xmm.binary (in the utilities folder). You should see the program speed double.

    Ross.
  • David Betz Posts: 14,519
    edited 2011-03-21 03:47
    RossH wrote: »
    No, it looks like it was me that messed up. I was working with some experimental changes to the caching algorithm, and I appear to have left them enabled.

    In the file Catalina_SPI_Cache.spin you will find a line (currently commented out) that says:
    '#define DISABLE_HASH
    
    Remove the quote mark (i.e. define the symbol DISABLE_HASH) and try your program again. Note that you also have to recompile both the xmm.binary (in the utilities folder). You should see the program speed double.

    Ross.

    Thanks Ross! As you suggested, defining DISABLE_HASH almost doubled the speed of xbasic. It now takes about 7 seconds to compile and run my test program rather than 11-12. While that is certainly an improvement, it is still too slow to be useful. This is only a 35 line program. This isn't Catalina's fault entirely though. The xbasic bytecode compiler makes three passes over the source code so it is parsing the program three times. I may try compiling xbasic for the PIC24H on Andre' LaMothe's Chameleon PIC board just to see how it performs. It may not be much better. Of course, xbasic runs with blinding speed on my MacBook Pro! :-)
  • RossH Posts: 5,547
    edited 2011-03-21 05:18
    David Betz wrote: »
    Thanks Ross! As you suggested, defining DISABLE_HASH almost doubled the speed of xbasic. It now takes about 7 seconds to compile and run my test program rather than 11-12. While that is certainly an improvement, it is still too slow to be useful. This is only a 35 line program. This isn't Catalina's fault entirely though. The xbasic bytecode compiler makes three passes over the source code so it is parsing the program three times. I may try compiling xbasic for the PIC24H on Andre' LaMothe's Chameleon PIC board just to see how it performs. It may not be much better. Of course, xbasic runs with blinding speed on my MacBook Pro! :-)

    Hi David,

    Additional speed improvements are possible, but it's never going to make the C3 an order of magnitude faster - not while programs have to be executed out of serial memory! At some point someone may make a parallel RAM add-on board for the C3, and that could change things.

    I will keep the caching driver as an option since it also improves performance on other platforms - provided you can afford to sacrifice that much Hub RAM!

    Ross.
  • RossH Posts: 5,547
    edited 2011-03-21 13:45
    RossH wrote: »
    Hi David,

    Additional speed improvements are possible, but it's never going to make the C3 an order of magnitude faster - not while programs have to be executed out of serial memory! At some point someone may make a parallel RAM add-on board for the C3, and that could change things.

    I will keep the caching driver as an option since it also improves performance on other platforms - provided you can afford to sacrifice that much Hub RAM!

    Ross.

    David,

    One more suggestion - why not arrange to load and save the byte-coded format? This was common practice in the "old" days of Basic interpreters (which were all generally pretty slow!). This makes the compilation speed less of an issue.

    Ross.
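
Ross's suggestion of saving the compiled byte code so that later runs can skip the compile step could look roughly like the sketch below. It is plain standard C using only stdio. The 8-byte header (a tag plus a little-endian length), the tag `XBC1`, and the names `save_image` and `load_image` are all invented for illustration; none of them come from xbasic.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 8-byte header: 4-byte tag + 4-byte little-endian length.
   This layout is invented for illustration; it is not xbasic's format. */
#define IMAGE_TAG "XBC1"

/* Write a compiled byte-code image to a file. Returns 0 on success, -1 on error. */
int save_image(const char *path, const unsigned char *code, unsigned long len)
{
    unsigned char hdr[8];
    FILE *fp = fopen(path, "wb");
    int ok;

    if (fp == NULL)
        return -1;
    memcpy(hdr, IMAGE_TAG, 4);
    /* Store the length byte by byte so the file does not depend on host byte order. */
    hdr[4] = (unsigned char)(len & 0xFF);
    hdr[5] = (unsigned char)((len >> 8) & 0xFF);
    hdr[6] = (unsigned char)((len >> 16) & 0xFF);
    hdr[7] = (unsigned char)((len >> 24) & 0xFF);
    ok = fwrite(hdr, 1, 8, fp) == 8 && fwrite(code, 1, len, fp) == len;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read an image back. Returns a malloc'd buffer (caller frees) and stores
   its length in *len_out, or returns NULL if the file is missing or malformed. */
unsigned char *load_image(const char *path, unsigned long *len_out)
{
    unsigned char hdr[8];
    unsigned char *code = NULL;
    unsigned long len;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;
    if (fread(hdr, 1, 8, fp) == 8 && memcmp(hdr, IMAGE_TAG, 4) == 0) {
        len = (unsigned long)hdr[4] | ((unsigned long)hdr[5] << 8) |
              ((unsigned long)hdr[6] << 16) | ((unsigned long)hdr[7] << 24);
        code = malloc(len ? len : 1);
        if (code != NULL && fread(code, 1, len, fp) != len) {
            free(code);      /* truncated file: reject it */
            code = NULL;
        }
        if (code != NULL)
            *len_out = len;
    }
    fclose(fp);
    return code;
}
```

A front end built this way would try `load_image` first and fall back to the three-pass compile (followed by `save_image`) only when no image is found.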
  • David Betz Posts: 14,519
    edited 2011-03-21 14:00
    RossH wrote: »
    David,

    One more suggestion - why not arrange to load and save the byte-coded format? This was common practice in the "old" days of Basic interpreters (which were all generally pretty slow!). This makes the compilation speed less of an issue.

    Ross.

    That is certainly possible. In fact, this basic system started out as a compiler that ran on a PC and a VM that ran on the PIC, AVR, or Propeller. Andre' convinced me that we needed a language that would run on the Propeller without need for a PC so I stripped my compiler down and made it fit on the Propeller with external memory.
  • Dr_Acula Posts: 5,484
    edited 2011-03-21 16:08
    Does xbasic run on the dracblade? Also, do you have a link to a xbasic download by any chance?
  • David Betz Posts: 14,519
    edited 2011-03-21 18:37
    Dr_Acula wrote: »
    Does xbasic run on the dracblade? Also, do you have a link to a xbasic download by any chance?

    It's kind of a work in progress. For instance, I haven't completed the heap manager for dynamic strings yet. It should run on the Dracblade but I haven't tried it. I'll attach the sources to this message if you promise not to laugh too loud when you look at them! :-)
  • Dr_Acula Posts: 5,484
    edited 2011-03-21 21:08
    Thanks for that. So - dumb question here, but is this the same as the xbasic you find when you search google? Or is this something you are writing yourself?
  • RossH Posts: 5,547
    edited 2011-03-22 03:20
    Dr_A,

    I think David's xbasic is different to the one you are probably finding on Google.

    Also, I have added improvements to the way plugins are registered for release 3.0 ...

    This program:
    #include <catalina_plugin.h>
    #include <catalina_hmi.h>
    
    char *name(int type) {
       switch (type) {
          case 0  : return "Kernel";
          case 1  : return "HMI";
          case 2  : return "Library";
          case 3  : return "Float_A";
          case 4  : return "Float_B";
          case 5  : return "Real-Time Clock";
          case 6  : return "SD File System";
          case 7  : return "Serial I/O";
          case 8  : return "Dummy";
          case 9  : return "Graphics";
          case 10 : return "Keyboard";
          case 11 : return "Screen";
          case 12 : return "Mouse";
          case 13 : return "Proxy";
          default : return "Unknown/None";    
       }
    }
    
    void main() {
       int i;
       int type;
       request_t *rqst;
    
       t_string(1, "Press any key to start\n\n");
       k_wait();
       t_printf("Registry Address = %x\n\n", _registry());
       for (i = 0; i < 8; i++) {
          type = REGISTERED_TYPE(i);
          rqst = REQUEST_BLOCK(i);
          t_printf("Cog %d (%x) Type = %s\n", i, (unsigned)rqst, name(type)); 
       }
       t_string(1, "\nPress any key to reboot");
       k_wait();
    }  
    
    produces this output:
    Press any key to start
    
    Registry Address = 00007FD4
    
    Cog 0 (00007F94) Type = Kernel
    Cog 1 (00007F9C) Type = Keyboard
    Cog 2 (00007FA4) Type = Screen
    Cog 3 (00007FAC) Type = HMI
    Cog 4 (00007FB4) Type = Unknown/None
    Cog 5 (00007FBC) Type = Unknown/None
    Cog 6 (00007FC4) Type = Unknown/None
    Cog 7 (00007FCC) Type = Unknown/None
    
    Press any key to reboot
    
    This should much simplify identifying, stopping and re-starting cogs at runtime.

    Ross.
  • RossH Posts: 5,547
    edited 2011-03-23 01:23
    Dr_Acula wrote: »
    Does xbasic run on the dracblade?

    Hi Dr_A,

    I thought David would answer this question, so I didn't.

    Yes, xbasic runs on the DracBlade using the same caching driver as the C3. It is slightly faster than on the C3 - say 5s rather than 6s or 7s to run David's test program.

    I don't think David would regard that as a really significant speed up.

    However, just out of interest, I also tried it on the RamBlade and it runs in about 1.5s - this is partly due to the faster XMM RAM on the RamBlade (I think it is the fastest platform in that respect) and also because the RamBlade clock speed is 100MHz instead of 80MHz. I wonder if David would consider that fast enough for his purposes?

    Ross.
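
As a rough check on how much of the RamBlade result is down to the clock alone, the figures Ross quotes (5s on the DracBlade at 80MHz, 1.5s on the RamBlade at 100MHz) can be split into a clock factor and a remainder. The sketch below assumes run time scales inversely with clock frequency, which is an assumption and not something measured in this thread; the function names are illustrative.

```c
/* Overall speed-up: how many times faster the quicker run is. */
double speedup(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}

/* The part of a speed-up NOT explained by the clock, assuming run time
   scales inversely with clock frequency (an assumption, not a measurement). */
double residual_speedup(double slow_seconds, double fast_seconds,
                        double slow_mhz, double fast_mhz)
{
    return speedup(slow_seconds, fast_seconds) / (fast_mhz / slow_mhz);
}
```

With those numbers the overall speed-up is 5 / 1.5, about 3.3 times. The clock accounts for 100 / 80 = 1.25 times, which leaves roughly 2.7 times for the faster XMM access.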
  • Dr_Acula Posts: 5,484
    edited 2011-03-23 02:27
    David and I have exchanged a few PMs today and maybe it is worth taking this to a discussion here as this is very interesting.

    Cluso's ramblade is definitely the fastest platform around. I think this gives us a benchmark to work from in terms of how fast things can be if you really optimise the code.

    I took another look at the dracblade driver code and there are a few things that could be improved.
    ''Dracblade driver for talking to a ram chip via three latches
    '' Modified code from Cluso's triblade
    ' DoCmd(command_, hub_address, ram_address, block_length)
    ' R - read bytes at address n up (n to n+block_length) where n =0 to 65535 (ie lower 64k of the sram chip)
    ' W - write bytes at address n up
    ' I - initialise
    ' N - Led on
    ' F - Led off
    ' H - set high latch to value in ramaddress A16 to A23 (will include the led)
     
    VAR
    
    ' communication params(5) between cog driver code - only "command" and "errx" are modified by the driver
       long  command, hubaddrs, ramaddrs, blocklen, errx, cog ' rendezvous between spin and assembly (can be used cog to cog)
    '        command  = R, W, N, F, H; =0 when operation completed by cog
    '        hubaddrs = hub address for data buffer
    '        ramaddrs = ram address for data ($0000 to $FFFF)
    '        blocklen = ram buffer length for data transfer
    '        errx     = returns =0 (false=good), else <>0 (true & error code)
    '        cog      = cog no of driver (set by spin start routine)
       
    PUB start : err_
    ' Initialise the Drac Ram driver. No actual changes to ram as the read/write routines handle this
      command := "I"
      cog := 1 + cognew(@tbp2_start, @command)
      if cog == 0
        err_ := $FF                 ' error = no cog
      else
        repeat while command        ' driver cog sets =0 when done
        err_ := errx                ' driver cog sets =0 if no error, else xx = error code
    
    PUB stop
       if cog
          cogstop(cog~ - 1)      
    
    PUB DoCmd(command_, hub_address, ram_address, block_length) : err_
    ' Do the command: R, W, N, F, H
      hubaddrs := hub_address       ' hub address start
      ramaddrs := ram_address       ' ram address start
      blocklen := block_length      ' block length
      command  := command_          ' must be last !!
    ' Wait for command to complete and get status
      repeat while command          ' driver cog sets =0 when done
      err_ := errx                  ' driver cog sets =0 if no error, else xx = error code
    
    PUB rendezvous
      return @command
    
    DAT
    '' +--------------------------------------------------------------------------+
    '' | Dracblade Ram Driver (with grateful acknowledgements to Cluso)           |
    '' +--------------------------------------------------------------------------+
                            org     0
    tbp2_start    ' setup the pointers to the hub command interface (saves execution time later)
                                          '  +-- These instructions are overwritten as variables after start
    comptr                  mov     comptr, par     ' -|  hub pointer to command                
    hubptr                  mov     hubptr, par     '  |  hub pointer to hub address            
    ramptr                  add     hubptr, #4      '  |  hub pointer to ram address            
    lenptr                  mov     ramptr, par     '  |  hub pointer to length                 
    errptr                  add     ramptr, #8      '  |  hub pointer to error status           
    cmd                     mov     lenptr, par     '  |  command  I/R/W/N/F/H                  
    hubaddr                 add     lenptr, #12     '  |  hub address                           
    ramaddr                 mov     errptr, par     '  |  ram address                           
    len                     add     errptr, #16     '  |  length                                
    err                     nop                     ' -+  error status returned (=0=false=good) 
    
    
    ' Initialise hardware (unlike the triblade, just tristates everything and read/write set the pins)
    init                    mov     err, #0                  ' reset err=false=good
                            mov     dira,zero                ' tristate the pins
    
    done                    wrlong  err, errptr             ' status  =0=false=good, else error x
                            wrlong  zero, comptr            ' command =0 (done)
    ' wait for a command (pause short time to reduce power)
    pause                   mov     ctr, delay      wz      ' if =0 no pause
                  if_nz     add     ctr, cnt
                  if_nz     waitcnt ctr, #0                 ' wait for a short time (reduces power)
                            rdlong  cmd, comptr     wz      ' command ?
                  if_z      jmp     #pause                  ' not yet
    ' decode command
                            cmp     cmd, #"R"       wz      ' R = read block
                  if_z      jmp     #rdblock
                            cmp     cmd, #"W"       wz      ' W = write block
                  if_z      jmp     #wrblock
                            cmp     cmd, #"N"       wz      ' N= led on
                  if_z      jmp     #led_turn_on
                            cmp     cmd, #"F"       wz      ' F = led off
                  if_z      jmp     #led_turn_off
                            cmp     cmd, #"H"       wz      ' H sets the high latch
                  if_z      jmp     #sethighlatch
                            mov     err, cmd                ' error = cmd (unknown command)
                            jmp     #done
    
    
    tristate                mov     dira,zero                ' all inputs to zero
                            jmp     #done
    
    ' turn led on
    led_turn_on             or      HighLatch,ledpin        ' set the led pin high
                            jmp     #OutputHighLatch         ' send this out
    
    led_turn_off            andn    HighLatch,ledpin        ' set the led pin low
                            jmp     #OutputHighLatch         ' send this out
    
    ' set high address bytes with command H, pass value in third variable of the DoCmd
    ' 4 bytes - masks off all but bits 16 to 23
    
    sethighlatch            call #ram_open                  ' gets address value in 'address'
                            shr  address,#16                ' shift right by 16 places
                            and  address,#$FF               ' ensure rest of bits zero
                            mov  HighLatch,address          ' put value into HighLatch
                            jmp  #OutputHighLatch           ' and output it
    
    '---------------------------------------------------------------------------------------------------------
    'Memory Access Functions
    
    rdblock                 call    #ram_open               ' get variables from hub variables
    rdloop                  call    #read_memory_byte       ' read byte from address into data_8
                            wrbyte  data_8,hubaddr          ' write data_8 to hubaddr ie copy byte to hub
                            add     hubaddr,#1              ' add 1 to hub address
                            add     address,#1              ' add 1 to ram address
                            djnz    len,#rdloop             ' loop until done
                            jmp     #init                   ' reinitialise
    
    wrblock                 call    #ram_open                        
    wrloop                  rdbyte  data_8, hubaddr         ' copy byte from hub
                            call    #write_memory_byte      ' write byte from data_8 to address
                            add     hubaddr,#1              ' add 1 to hub address
                            add     address,#1              ' add 1 to ram address
                            djnz    len,#wrloop             ' loop until done
                            jmp     #init                   ' reinitialise
    
    ram_open                rdlong  hubaddr, hubptr         ' get hub address
                            rdlong  ramaddr, ramptr         ' get ram address
                            rdlong  len, lenptr             ' get length
                            mov     err, #5                 ' err=5
                            mov     address,ramaddr         ' cluso's variable 'ramaddr' to dracblade variable 'address'
    ram_open_ret            ret
      
    read_memory_byte        call #RamAddress                ' sets up the latches with the correct ram address
                            mov dira,LatchDirection2        ' for reads so P0-P7 tristate till do read
                            mov outa,GateHigh               ' actually ReadEnable but they are the same
                            andn outa,GateHigh              ' set gate low
                            nop                             ' short delay to stabilise
                            nop
                            mov data_8, ina                 ' read SRAM
                            and data_8, #$FF                ' extract 8 bits
                            or  outa,GateHigh               ' set the gate high again
    read_memory_byte_ret    ret
    
    write_memory_byte       call #RamAddress                ' sets up the latches with the correct ram address
                            mov outx,data_8                 ' get the byte to output
                            and outx, #$FF                  ' ensure upper bytes=0
                            or outx,WriteEnable             ' or with correct 138 address
                            mov outa,outx                   ' send it out
                            andn outa,GateHigh              ' set gate low
                            nop                             ' no nop doesn't work, one does, so put in two to be sure
                            nop                             ' another NOP
                            or outa,GateHigh                ' set it high again
    write_memory_byte_ret   ret
    
    RamAddress ' sets up the ram latches. Assumes high latch A16-A18 low so only accesses 64k of ram
                            mov dira,LatchDirection         ' set up the pins for programming latch chips
                            mov outx,address                ' get the address into a temp variable
                            and outx,#$FF                   ' mask the low byte
                            or  outx,LowAddress             ' or with 138 low address
                            mov outa,outx                   ' send it out
                            andn outa,GateHigh              ' set gate low
                                                            ' ?? a NOP
                            or outa,GateHigh                ' set it high again  
                                                            ' now repeat for the middle byte     
                            mov outx,address                ' get the address into a temp variable
                            shr outx,#8                     ' shift right by 8 places
                            and outx,#$FF                   ' mask the low byte
                            or  outx,MiddleAddress          ' or with 138 middle address
                            mov outa,outx                   ' send it out
                            andn outa,GateHigh              ' set gate low
                            or outa,GateHigh                ' set it high again 
    RamAddress_ret          ret
    
    OutputHighLatch ' sends out HighLatch to the 374 that does A16-19, led and the 4 spare outputs
                            mov     dira,latchdirection     ' setup active pins 138 and bus
                            mov     outa,HighLatch          ' send out HighLatch
                            or      outa,HighAddress        ' or with the high address
                            andn    outa,GateHigh           ' set gate low
                            or      outa,GateHigh           ' set the gate high again
    OutputHighLatch_ret     jmp     #tristate               ' set pins tristate
    
    
    
    
    
    delay                   long    80                                    ' waitcnt delay to reduce power (#80 = 1uS approx)
    ctr                     long    0                                     ' used to pause execution (lower power use) & byte counter
    GateHigh                long    %00000000_00000000_00000001_00000000  ' HC138 gate high, all others must be low
    Outx                    long    0                                     ' for temp use, same as n in the spin code
    LatchDirection          long    %00000000_00000000_00001111_11111111 ' 138 active, gate active and 8 data lines active
    LatchDirection2         long    %00000000_00000000_00001111_00000000 ' for reads so data lines are tristate till the read
    LowAddress              long    %00000000_00000000_00000101_00000000 ' low address latch = xxxx010x and gate high xxxxxxx1
    MiddleAddress           long    %00000000_00000000_00000111_00000000 ' middle address latch = xxxx011x and gate high xxxxxxx1
    HighAddress             long    %00000000_00000000_00001001_00000000 ' high address latch = xxxx100x and gate high xxxxxxx1
    'ReadEnable long    %00000000_00000000_00000001_00000000 ' /RD = xxxx000x and gate high xxxxxxx1
                                                            ' commented out as the same as GateHigh
    WriteEnable             long    %00000000_00000000_00000011_00000000 ' /WE = xxxx001x and gate high xxxxxxx1
    Zero                    long    %00000000_00000000_00000000_00000000 ' for tristating all pins
    data_8                  long    %00000000_00000000_00000000_00000000 ' for code compatibility with zicog driver
    address                 long    %00000000_00000000_00000000_00000000 ' address for ram chip
    ledpin                  long    %00000000_00000000_00000000_00001000 ' to turn on led
    HighLatch               long    %00000000_00000000_00000000_00000000 ' static value for the 374 latch that does the led, hA16-A19 and the other 4 outputs
    

    1) there is a deliberate delay
    ' wait for a command (pause short time to reduce power)
    pause                   mov     ctr, delay      wz      ' if =0 no pause
                  if_nz     add     ctr, cnt
                  if_nz     waitcnt ctr, #0                 ' wait for a short time (reduces power)
                            rdlong  cmd, comptr     wz      ' command ?
    

    - maybe save some lines there
    2) Reading in blocks of data. There are 19 address lines on a 512k chip and at the moment these are in two groups - the High group A16 to A18 and the Low and Middle group which are grouped together. This seemed natural for the Z80 emulations with 16 bit addresses.

    But what if we separate out the Low and Middle latches?

    I count 46 instructions to read one byte from external memory. Surely that can be decreased?!!

    First thing might be to leave the middle latch unchanged and just change the lower latch. Maybe do it in groups of 4 bytes, or maybe in groups of 16 or 256?

    I think that can save 8 instructions per byte.

    Also I think by doing things in blocks, you don't have to keep checking for new instructions each byte. Say the requesting program wanted a Long, well then you can skip a whole lot of rechecking code for new requests.

    I think that can halve the number of instructions per byte if you do Longs.

    And then one might think about optimising further. For C, it depends on the probability that an instruction will cause a branch outside a block of n bytes. At the extremes, say you requested byte x and it read in the next 64k of bytes. This will take a lot of time but with a small probability that a jump will go outside this block. Read in 1 long, and that is inefficient too. I'm not sure of the maths, but say the probability of a jump was 10%, then maybe as a guess it might be best to read in 16 bytes as a block?

    The driver code above already has an instruction for reading in blocks, it is just that I think mostly we read in blocks of 1, ie a byte. Ross, a) is that how catalina works and b) where is the source code for the dracblade driver file and what is it called?

    So you might pass an address n=0 to 512k.
    1) is this in the same high/medium latch range as the last request?
    2) If yes, read bytes but only change the low latch.
    3) If no then update the medium and high latches.

    I wonder also about a lookahead cache. The requesting spin code requests a byte at address n. The cog goes and starts reading from this address. I'd need to check speeds, but there is a fairly good chance the cog will be faster than the requesting spin, so the cog will always be ahead of the requesting program, so from the requesting program's point of view, it requests byte n and for the next 256 bytes the values are always correct in a buffer.

    Then there is another variable - how often would the cog code check the passed parameter to see if the calling program wants a different block. Maybe if the probability of a branch in C is 10%, you check only every 10 bytes? If so, that saves even more code.
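
The block-size guess in the post above can be put on a slightly firmer footing with a toy model. Assume execution starts at the first byte of a fetched block, and that after each byte consumed there is an independent chance `p` of jumping somewhere outside the block. Both assumptions and the function names are mine, not Dr_A's. The expected number of bytes actually used from a block of B bytes is then 1 + (1-p) + (1-p)^2 + ... + (1-p)^(B-1).

```c
/* Expected number of bytes consumed from a block of `block` bytes when each
   consumed byte is followed, with independent probability p, by a jump out
   of the block. Toy model only: it ignores loops and jumps that land back
   inside the block, both of which would make larger blocks look better. */
double expected_used(double p, unsigned long block)
{
    double stay = 1.0;   /* probability execution is still in the block */
    double sum = 0.0;
    unsigned long k;

    for (k = 0; k < block; k++) {
        sum += stay;
        stay *= 1.0 - p;
    }
    return sum;
}

/* Fraction of the fetched bytes that are expected to be used. */
double fetch_efficiency(double p, unsigned long block)
{
    return expected_used(p, block) / (double)block;
}
```

For p = 0.1 a 16-byte block is expected to deliver about 8 useful bytes, so roughly half of the fetch is wasted, and no block size can deliver more than 1/p = 10 useful bytes on average under this model. That suggests 16 bytes is a reasonable upper bound for straight-line code, although real programs with loops will reuse cached blocks and behave better than the model predicts.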
  • RossH Posts: 5,547
    edited 2011-03-23 02:41
    Dr_Acula wrote: »
    I took another look at the dracblade driver code and there are a few things that could be improved.

    Hi Dr_A ...

    Yes absolutely - I've not really done any optimization on the original caching driver code yet. In fact it only currently supports the DRACBLADE at all because David's and Jazzed's original driver code already did!

    What I plan to do next is rewrite the interface from the caching driver to use my standard XMM code. That code is already written for all XMM platforms, and is much more optimized (although probably still a long way from being as good as it could be!).

    That's about the last thing I expect to do before I am ready to release Catalina 3.0.

    David, Jazzed ...

    I found a bug in the Catalina SD Card driver initialization code that seems to show up on the C3. I've now fixed it, but if you are having occasional strange problems with programs sometimes not being able to access the SD card (but which work ok when you reload them) then this may be the reason. It may also have affected other platforms - for example I think it is the reason I was having occasional problems with the SD card on the RamBlade (and for which I was - quite unfairly - blaming Cluso!).
  • Dr_Acula Posts: 5,484
    edited 2011-03-23 03:17
    Hi Ross, which is the file with the dracblade code? Is it Catalina_XMM.Spin in the \target directory?
  • RossH Posts: 5,547
    edited 2011-03-23 03:26
    Dr_Acula wrote: »
    Hi Ross, which is the file with the dracblade code? Is it Catalina_XMM.Spin in the \target directory?

    Hi Dr_A,

    All the XMM code for all platforms is now in the file XMM.inc in the target directory. Look for the section marked #elseifdef DRACBLADE

    If you can streamline the DracBlade code, I'll include it in the next release.

    Ross.

    P.S. If you modify the code, try not to use any more longs - the XMM kernel has very few longs to spare!
  • Dr_Acula Posts: 5,484
    edited 2011-03-23 03:45
    Thanks Ross.

    Oh darn. Someone *extremely* clever has already split the middle and lower latch! This XMM driver looks extremely well optimised. I think only caching would improve that, and any improvements due to caching will apply equally to the C3.
    XMM_IncAddr
                  add     XMM_Addr,#1               ' inc sram address
                  mov     outx,XMM_Addr             ' does result of incrementing ...
                  and     outx,#$FF                 ' ... require updating latch 8 - 15 or 16 - 19?
                  tjnz    outx,#XMM_Set0_7          ' if not, just set latch for addr bits 0 - 7
                  call    #XMM_SetAddr              ' otherwise we must set all latches
                  jmp     #XMM_IncAddr_ret          ' done 
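
To see what the split latch buys, here is a small C model of the same idea: after an increment, only the low latch is rewritten unless the low byte has wrapped to zero. It counts latch writes and does not touch any pins. The struct, the function names, and the cost of three latch writes for a full address set are assumptions made for the model; they are not taken from XMM.inc.

```c
/* Bookkeeping model of a three-latch external RAM address
   (bits 0-7, 8-15 and 16-19). Counts latch writes only. */
typedef struct {
    unsigned long addr;          /* current external RAM address */
    unsigned long latch_writes;  /* total latch updates so far */
} xmm_model;

/* Full address set: assumed to rewrite all three latches. */
void xmm_set_addr(xmm_model *m, unsigned long addr)
{
    m->addr = addr;
    m->latch_writes += 3;
}

/* Increment: if the low byte did not wrap to zero, only the low latch
   changes; otherwise fall back to a full address set. */
void xmm_inc_addr(xmm_model *m)
{
    unsigned long next = m->addr + 1;

    if ((next & 0xFF) != 0) {
        m->addr = next;
        m->latch_writes += 1;
    } else {
        xmm_set_addr(m, next);
    }
}

/* Latch writes needed to visit `count` consecutive bytes from `start`. */
unsigned long sequential_cost(unsigned long start, unsigned long count)
{
    xmm_model m = {0, 0};
    unsigned long i;

    if (count == 0)
        return 0;
    xmm_set_addr(&m, start);
    for (i = 1; i < count; i++)
        xmm_inc_addr(&m);
    return m.latch_writes;
}
```

In this model, reading 256 consecutive bytes from a page boundary costs 258 latch writes, against 768 if all three latches were rewritten for every byte.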
    
  • David Betz Posts: 14,519
    edited 2011-03-23 03:49
    RossH wrote: »
    Hi Dr_A,

    I thought David would answer this question, so I didn't.

    Yes, xbasic runs on the DracBlade using the same caching driver as the C3. It is slightly faster than on the C3 - say 5s rather than 6s or 7s to run David's test program.

    I don't think David would regard that as a really significant speed up.

    However, just out of interest, I also tried it on the RamBlade and it runs in about 1.5s - this is partly due to the faster XMM RAM on the RamBlade (I think it is the fastest platform in that respect) and also because the RamBlade clock speed is 100MHz instead of 80MHz. I wonder if David would consider that fast enough for his purposes?

    Ross.

    Sorry I didn't post my reply in here. I was trying not to hijack your thread to discuss xbasic. I guess I should try the RamBlade. I've had one for a long time but have never done anything with it. I guess I stopped when I discovered that you couldn't use the standard pin 31/30 serial I/O. How do you have your RamBlade configured?

    Thanks,
    David
  • RossH Posts: 5,547
    edited 2011-03-23 04:10
    David Betz wrote: »
    Sorry I didn't post my reply in here. I was trying not to hijack your thread to discuss xbasic. I guess I should try the RamBlade. I've had one for a long time but have never done anything with it. I guess I stopped when I discovered that you couldn't use the standard pin 31/30 serial I/O. How do you have your RamBlade configured?

    Thanks,
    David

    Just with the SRAM and SD Card. I just use the normal PropPlug for comms. As shown on the diagram below, you plug it onto the middle 4 pins for programming the EEPROM, and the bottom 4 pins for terminal I/O (and use Catalyst to load programs off the SD Card).

    Ross.

    RamBlade.jpg
    576 x 311 - 58K
  • David BetzDavid Betz Posts: 14,519
    edited 2011-03-23 04:14
    Thanks Ross! I'll have to try that setup. I guess I was put off a bit by the fact that I would have to reconnect the serial interface to reprogram the card. It makes development a bit of a pain. I wonder why he used the high numbered pins for his SRAM interface?
  • RossHRossH Posts: 5,547
    edited 2011-03-23 04:31
    David Betz wrote: »
    Thanks Ross! I'll have to try that setup. I guess I was put off a bit by the fact that I would have to reconnect the serial interface to reprogram the card. It makes development a bit of a pain. I wonder why he used the high numbered pins for his SRAM interface?

    If I know Cluso, it was done for a good reason - most likely because it allowed the SRAM to be used with the least possible number of instructions.

    Ross.

    P.S. In a lot of ways, the RamBlade is my favorite board. If only it could be powered by the USB port, it would be the ideal "portable" Prop platform!
  • David BetzDavid Betz Posts: 14,519
    edited 2011-03-23 04:35
    Yes, it is a nice board. I guess I've just been too lazy to hook mine up. I wonder if he is still selling them?
  • RossHRossH Posts: 5,547
    edited 2011-04-01 17:52
    All,

    Catalina 3.0 has been released. It has a new thread here.

    Ross.
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2011-04-03 05:11
    The cogjects project now has 8 cogjects. These can be used by Spin, and they can also be used by Catalina. I have plans to write drivers for other languages as well.

    What this means in C is that there is no more need for 'inline' PASM code in the C program. Do the debugging in Spin and then, when it works, move it over to C. The following code is for the Serial driver, and I have left the Spin code in as comments because it will be useful when translating Spin in the future.

    From a practical perspective, Spin can only do so much even with cogjects. The SD driver takes about 1/4 of hub, a decent video buffer takes just under 20k, and there is not much space left for code.

    C in XMM on the other hand puts the SD driver into external memory and most of the hub is free for a video buffer.
    /* PASM cogject demonstration, see also cogject example in spin*/
    
    #include <stdio.h>
    #include <stdlib.h>                                                  // exit()
    #include <string.h>                                                  // strlen(), strcpy()
          
    	unsigned long cogarray[511];					// external memory common cog array
    
    // start of C functions
    
    void clearscreen()                                                   // white text on dark blue background
    {
           int i;
           for (i=0;i<40;i++)
           {
                   t_setpos(0,0,i);                                      // move cursor to next line
                   t_color(0,0x08FC);                                    // RRGGBBxx eg dark blue background 00001000 white text 11111100
           }
    }
    
    void sleep(int milliseconds)                                         // sleep function
    {
           _waitcnt(_cnt()+(milliseconds*(_clockfreq()/1000))-4296);
    }
    
    char peek(int address)                                               // function implementation of peek
    {
           return *((char *)address);
    }
    
    void poke(int address, char value)                                   // function implementation of poke
    {
           *((char *)address) = value;
    }
    
    void external_memory_cog_load(int cognumber, unsigned long cogdata[], unsigned long parameters_array[])    	//  load a cog from external memory
    {
    	unsigned long hubcog[511];						// create a local array, this is in hub ram, not external ram	
    	int i;	
    	for(i=0;i<511;i++)							// both arrays hold 511 longs, so stop at 511 to stay in bounds
    	{
    		hubcog[i]=cogdata[i];					// move from external memory to a local array in hub
    	}
     	_coginit((int)parameters_array>>2, (int)hubcog>>2, cognumber);		// load the cog
    }  
    
    unsigned long serial_start(unsigned long rxpin,unsigned long txpin,unsigned long mode, unsigned long baudrate, int cognumber, unsigned long par[], unsigned long cogdata[])
    {
    /*
    PUB start(rxpin, txpin, mode, baudrate) : okay
    
    '' Start serial driver - starts a cog
    '' returns false if no cog available
    ''
    '' mode bit 0 = invert rx
    '' mode bit 1 = invert tx
    '' mode bit 2 = open-drain/source tx
    '' mode bit 3 = ignore tx echo on rx
    
      stop
      longfill(@rx_head, 0, 4)
      longmove(@rx_pin, @rxpin, 3)
      bit_ticks := clkfreq / baudrate
      buffer_ptr := @rx_buffer
      okay := cog := cognew(@entry, @rx_head) + 1
    */
    
    	unsigned long okay = 0;				// placeholder: the cog load result is not checked, so no real status is returned
    	unsigned long bit_ticks;
    	unsigned long buffer_ptr;
    	par[0] = 0;						// rx_head   longfill(@rx_head, 0, 4)
    	par[1] = 0;						// rx_tail
    	par[2] = 0;						// tx_head
    	par[3] = 0;						// tx_tail
    	par[4] = rxpin;					//   longmove(@rx_pin, @rxpin, 3)
    	par[5] = txpin;					// note - if rewrite the pasm code could save a couple of hub longs here
    	par[6] = mode;					// as rxpin and txpin are not used anywhere else
    	bit_ticks = _clockfreq() / baudrate;   		//   bit_ticks := clkfreq / baudrate
    	par[7] = bit_ticks;
    	buffer_ptr = (unsigned long)&par[9];		//   buffer_ptr := @rx_buffer  points to start of circular buffer
    	par[8] = buffer_ptr;					// pointer to the start of the circular buffers
    								// rx buffer is 9 to 12 and tx buffer is 13 to 16 (16 bytes =4 longs)
    	external_memory_cog_load(cognumber,cogdata,par);		// load from external ram
    	// okay returns the cog number or -1 if a fail page 119 manual. Ignored here
    	// printf("par array is at %u \n",(unsigned long)&par[0]);
    	// printf("par array entry 1 is at %u \n",(unsigned long)&par[1]);
    	// printf("par array entry 7 is at %u \n",(unsigned long)&par[7]);
    	// printf("rx_head is at %u \n",(unsigned long)&par[9]);
    	// printf("buffer_ptr is %u \n",par[8]);
    	return okay;
    }
    
    void serial_tx(char tx,unsigned long par[])
    {
    /*
    PUB tx(txbyte)
    '' Send byte (may wait for room in buffer)
      repeat until (tx_tail <> (tx_head + 1) & $F)
      tx_buffer[tx_head] := txbyte
      tx_head := (tx_head + 1) & $F
      if rxtx_mode & %1000
        rx
    */
    	unsigned long tx_head;
    	int address;
    	while ( par[3] == ((par[2] + 1 ) & 0xF)) {} // wait if the head has looped right round and is now one less than the tail
    	tx_head = par[2];				// get the head value
    	address = par[8] + 16 + tx_head;		// location of rx buffer plus 16 to get tx buffer plus the head value
    	poke(address,tx);				// poke the tx byte value to hub ram
    	tx_head = tx_head + 1;			// add one
    	tx_head = tx_head & 0xF; 			// logical and with 15
    	par[2] = tx_head;				// store it back again
    							// need to add the echo mode?
    }
    
    unsigned long serial_rxcheck(unsigned long par[])
    {
    /*
    PUB rxcheck : rxbyte
    '' Check if byte received (never waits)
    '' returns -1 if no byte received, $00..$FF if byte
      rxbyte--
      if rx_tail <> rx_head
        rxbyte := rx_buffer[rx_tail]
        rx_tail := (rx_tail + 1) & $F
    */
    	unsigned long rxbyte;			// actually is a long, so can return -1 FFFFFFFF if nothing and 0-FF if a byte
    	int address;					// hub address
    	rxbyte = 0;					// set explicitly to zero
    	rxbyte = rxbyte - 1;				// return ffffffff if nothing
    	if (par[1] != par[0])
    	{
    		address = par[8] + par[1];		// par[8] is the rx buffer, par[1] is rx_tail
    		rxbyte = (unsigned char)peek(address);	// get the byte from the buffer; the cast stops 0x80..0xFF sign-extending (0xFF would look like -1)
    		par[1] = (par[1] +1) & 0xF;		// add one to tail
    	}
    	return rxbyte;
    }
    
    unsigned long serial_rx(unsigned long par[])
    {
    /*
    PUB rx : rxbyte
    '' Receive byte (may wait for byte)
    '' returns $00..$FF
      repeat while (rxbyte := rxcheck) < 0	
    */
    	unsigned long rxbyte;					// actually is a long, not a byte
    	while ((rxbyte = serial_rxcheck(par)) == -1) {} 	// 0xffffffff and -1 works, but " < 0" gives a compiler error
    	return rxbyte;						// return the value
    }
    
    void serial_rxflush(unsigned long par[])				// flush receive buffer
    {
    	while (serial_rxcheck(par) != -1) {}			// keep checking until buffer clear
    }
    
    unsigned long serial_rxtime(unsigned long ms,unsigned long par[]) // wait ms milliseconds for byte, -1 if nothing
    {
    	unsigned long rxbyte = -1;
    	unsigned long counter = 0;					// start a counter, 10ms ticks
    	ms = ms / 10;							// internal delay for 1ms ticks is too high
    	while (((rxbyte = serial_rxcheck(par)) == -1) && (counter < ms))	// wait until a byte or counter times out
    	{
    		_waitcnt(_cnt()+(10*(_clockfreq()/1000))-4296);		// wait 10 milliseconds
    		counter +=1; 						// add one to counter
    	}
    	return rxbyte;
    }
    
    void serial_str(char lineoftext[],unsigned long par[])		// send out the string
    {
    /*
    '' Send string                    
    
      repeat strsize(stringptr)
        tx(byte[stringptr++])
    */
    	int i;
    	for(i=0; i<strlen(lineoftext);i++)
    	{
    		serial_tx(lineoftext[i],par);			// send out the bytes one at a time
    	}
    }
    
    void serial_dec(signed long value,unsigned long par[])	// send out decimal value - signed
    {
    /*
    '' Print a decimal number
      if value < 0
        -value
        tx("-")
      i := 1_000_000_000
      repeat 10
        if value => i
          tx(value / i + "0")
          value //= i
          result~~
        elseif result or i == 1
          tx("0")
        i /= 10
    */
    	char lineoftext[12] = "";					// enough room for a 32 bit long 2^32 and possibly the minus sign
      	sprintf(lineoftext, "%ld", value);				// convert to a string (%ld matches the signed long argument)
      									// printf  ("lineoftext is now: %s\n", lineoftext);
    	serial_str(lineoftext,par);					// send out the string
    }
    
    void serial_hex(unsigned long value, unsigned long par[])	// send out a hex value
    /*
    '' Print a hexadecimal number
      value <<= (8 - digits) << 2
      repeat digits
        tx(lookupz((value <-= 4) & $F : "0".."9", "A".."F"))
    */
    {
    	char lineoftext[9] = "";					// enough room for FFFFFFFF plus the terminating zero
    	sprintf(lineoftext,"%lx",value);				// convert to hex value (%lx matches the unsigned long argument)
    	serial_str(lineoftext,par);					// send it out
    }
    
    void serial_crlf(unsigned long par[])				// send a crlf
    {
    	serial_tx(13,par);						// cr
    	serial_tx(10,par);						// lf
    }
    
    int EoF (FILE* stream)
    {
      	register int c, status = ((c = fgetc(stream)) == EOF);
      	ungetc(c,stream);
      	return status;
    }
    
    void readcog(char *filename,unsigned long external_cog[])		// read in a .cog file into external memory array 
    {
    	int i;
    	FILE *FP1;
    	i = 0;
    	if((FP1=fopen(filename,"rb"))==0)					// open the file
       	{
      		fprintf(stderr,"Can't open file %s\n",filename);
    		exit(1);
       	}
      	fseek(FP1,0,0);
    	for(i=0;i<24;i++)
    	{
    		getc(FP1);							// read in the first 24 bytes and discard
    	}
    	i = 0;
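      	while(!EoF(FP1) && (i<505))						// run until end of file or 511-6
    	{
    		unsigned long b0 = (unsigned long)(getc(FP1) & 0xFF);	// read the four bytes in separate statements:
    		unsigned long b1 = (unsigned long)(getc(FP1) & 0xFF);	// C does not define the order in which several
    		unsigned long b2 = (unsigned long)(getc(FP1) & 0xFF);	// getc() calls inside one expression are evaluated
    		unsigned long b3 = (unsigned long)(getc(FP1) & 0xFF);
    		external_cog[i] = b0 | (b1<<8) | (b2<<16) | (b3<<24);	// assemble the little-endian long
    		i+=1;
    	}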
    	if(FP1)
           {
         		fclose(FP1);							// close the file
         		FP1=NULL;
       	}
    	printf("external array cog first long = 0x%lx \n",external_cog[0]);	// hex value
    }
    
    void serial_demo(unsigned long serial_parameters[])			// demonstrate the serial cog code
    {
    	int i;
    	unsigned long value = 0x80000000;					// 0x80000000 is -2147483648 as a signed 32 bit value
    	char lineoftext[80];							// for string testing
    	unsigned long received_byte;					// actually a long, not a byte
           clearscreen();							// white on blue vga
           printf("Clock speed %u \n",_clockfreq());                     // see page 28 of the propeller manual for other useful commands
           printf("Catalina running in cog number %i \n",_cogid());      // integer
           readcog("serial.cog",cogarray);					// read into general external memory cog array
           serial_start(31,30,0,38400,7,serial_parameters,cogarray);      // start serial cog pins 31,30, mode 0, cog 7, 38400 baud
           printf("Started serial driver\n");
    	for(i=0; i<10; i++)
    	{
    		serial_tx(65+i,serial_parameters); 			// test sending a byte 10x (delay for starting a serial terminal program)
    		sleep(500);
    		printf("send byte %u \n",65+i);
    	}
    	serial_crlf(serial_parameters);
    	strcpy(lineoftext,"This is a really long string test with a slow baud rate to check buffer overruns");			// store a string
    	serial_str(lineoftext,serial_parameters);				// send it out
    	serial_crlf(serial_parameters);					// new line
    	serial_dec(value,serial_parameters);				// send out a big decimal number
    	serial_crlf(serial_parameters);					// new line
    	serial_str("Hex value is ",serial_parameters);
    	serial_hex(value,serial_parameters);				// send out a hex value
    	serial_crlf(serial_parameters);
    	serial_rxflush(serial_parameters);					// flush the receive buffer
    	printf("Type a character within the next 3 seconds \n");	// test the timeout
    	received_byte = serial_rxtime(3000,serial_parameters);		// get a byte with a timeout
    	printf("character was ascii %ld \n",(long)received_byte);	// signed, so a timeout prints -1
    	printf("type some characters \n");
    	for (i=0;i<10;i++)							// echo 10 characters back
    	{
    		received_byte = serial_rx(serial_parameters);		// get a byte
    		serial_tx(received_byte,serial_parameters);		// echo it back
    		printf("sent back byte %u \n",received_byte);
    	}
    	printf("demo program finished \n");
    }
    
    void main ()
    {
    	unsigned long serial_parameters[17];				// reserve hub space in main: 9 longs of pointers and settings (0 to 8), rx buffer 9 to 12, tx buffer 13 to 16
    	serial_demo(serial_parameters);					// demo routines       
    	while (1); 								// endless loop as prop reboots on exit from main() 
    }
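
    The head/tail arithmetic in serial_tx and serial_rxcheck above can be tried on a desktop compiler before moving it to the Prop. The following is only a sketch of the same 16-byte circular buffer idea; ring_t, ring_put and ring_get are hypothetical names and are not part of the cogject code or of Catalina.

```c
#define BUF_SIZE 16u                 /* must be a power of two for the & mask */
#define BUF_MASK (BUF_SIZE - 1u)

/* Hypothetical host-side model of one 16-byte circular buffer. */
typedef struct {
    unsigned head;                   /* next slot the producer writes */
    unsigned tail;                   /* next slot the consumer reads  */
    unsigned char data[BUF_SIZE];
} ring_t;

/* Returns 1 if the byte was stored, 0 if the buffer is full. */
static int ring_put(ring_t *r, unsigned char b)
{
    unsigned next = (r->head + 1u) & BUF_MASK;
    if (next == r->tail)
        return 0;                    /* full: one slot is always left empty */
    r->data[r->head] = b;
    r->head = next;
    return 1;
}

/* Returns 0..255 for a byte, or -1 if the buffer is empty. */
static int ring_get(ring_t *r)
{
    int b;
    if (r->tail == r->head)
        return -1;                   /* empty */
    b = r->data[r->tail];
    r->tail = (r->tail + 1u) & BUF_MASK;
    return b;
}
```

    Because "full" is detected as head + 1 == tail, a 16-byte buffer holds at most 15 bytes. Where ring_put gives up when the buffer is full, serial_tx spins until the driver cog has moved the tail on.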
    
    

    A quick question. In Spin, a rotate left by one bit is
    n <-= 1

    Is this the C equivalent?

    n = (n << 1) | (n >> 31);
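
    That expression should behave as a rotate as long as n is an unsigned 32-bit value; with a signed n the right shift may copy the sign bit instead of shifting in zeros. A small sketch for any rotate count follows. rotl32 is a made-up helper name, and stdint.h being available is an assumption (on the Prop an unsigned long is 32 bits and could be used instead).

```c
#include <stdint.h>

/* Rotate a 32-bit value left by `bits`, in the manner of Spin's  n <-= bits. */
static uint32_t rotl32(uint32_t n, unsigned bits)
{
    bits &= 31u;                     /* keep the count in the range 0..31 */
    if (bits == 0u)
        return n;                    /* avoid a shift by 32, which C leaves undefined */
    return (n << bits) | (n >> (32u - bits));
}
```

    With this helper, n = rotl32(n, 1); corresponds to the one-bit case asked about.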

    Also - I now have Catalina booting up in text mode, then stopping the VGA drivers and reloading a 160x120 graphics driver. I can change the colors from within C, e.g. screen[0] = 0xffffffff sets 4 pixels to white.

    However, the screen buffer is stored in longs, and I want to access it in bytes. In Spin, the command is
    byte[myarray][number] := n

    But how would you do this in C?

    get the unsigned long, and clear one byte and replace with the new byte?
    or get a pointer to the start of the array, add n bytes, then poke a value into hub ram?
    or another way?
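
    Both of the options listed can be written in plain C. This is a sketch only, assuming a little-endian layout (byte 0 is the low 8 bits of the first long, as on the Prop) and assuming stdint.h is available; the function names are invented for illustration.

```c
#include <stdint.h>

/* Option 1: view the long array as bytes and index it directly,
   much like Spin's byte[myarray][number] := n. */
static void set_byte_ptr(uint32_t *screen, unsigned index, uint8_t value)
{
    ((uint8_t *)screen)[index] = value;
}

static uint8_t get_byte_ptr(const uint32_t *screen, unsigned index)
{
    return ((const uint8_t *)screen)[index];
}

/* Option 2: read the long, clear one byte lane, then OR in the new byte. */
static void set_byte_mask(uint32_t *screen, unsigned index, uint8_t value)
{
    unsigned shift = (index & 3u) * 8u;      /* bit position of the byte lane   */
    uint32_t *word = &screen[index >> 2];    /* the long that contains the byte */
    *word = (*word & ~((uint32_t)0xFFu << shift)) | ((uint32_t)value << shift);
}
```

    The pointer cast is the shorter of the two and involves no read-modify-write of the whole long. The mask version is the one to reach for if the buffer can only be accessed a long at a time.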