Catalina 2.9

jazzed · 2011-03-20 11:06

RossH wrote: »

I've just emailed you both an updated 3.0 pre-release ....

Got it. I'll be spending some quality time with it today.

RossH · 2011-03-20 15:22

David Betz wrote: »

Hi Ross,

Thanks for the update! I tried this new version and it was not noticeably faster at running my xbasic test program. The old version of Catalina using the -x3 memory layout took 12 seconds and this version took 11 seconds. That could easily be explained by errors in pushing the stopwatch button on my watch. In any case, it's good to have the cache bug fixed since I'm sure it could have caused problems.

Thanks,
David

Something is not right. On my C3 it is at least twice as fast. I hope I haven't missed something out of the upgrade. Can you post your binary (and your makefile options) and I'll try it when I get home.

Thanks,

Ross.

David Betz · 2011-03-20 15:36

RossH wrote: »

Something is not right. On my C3 it is at least twice as fast. I hope I haven't missed something out of the upgrade. Can you post your binary (and your makefile options) and I'll try it when I get home.

Thanks,

Ross.

Here is the binary and the makefile. Hopefully I didn't mess something up!

RossH · 2011-03-20 15:58

David Betz wrote: »

Here is the binary and the makefile. Hopefully I didn't mess something up!

The makefile looks ok. I'll try running it when I get home.

Ross.

RossH · 2011-03-21 00:50

David Betz wrote: »

Here is the binary and the makefile. Hopefully I didn't mess something up!

No, it looks like it was me that messed up. I was working with some experimental changes to the caching algorithm, and I appear to have left them enabled.

In the file Catalina_SPI_Cache.spin you will find a line (currently commented out) that says:

'#define DISABLE_HASH

Remove the quote mark (i.e. define the symbol DISABLE_HASH) and try your program again. Note that you also have to recompile both the xmm.binary (in the utilities folder). You should see the program speed double.

Ross.

David Betz · 2011-03-21 03:47

RossH wrote: »
No, it looks like it was me that messed up. I was working with some experimental changes to the caching algorithm, and I appear to have left them enabled.

In the file Catalina_SPI_Cache.spin you will find a line (currently commented out) that says:
'#define DISABLE_HASH
Remove the quote mark (i.e. define the symbol DISABLE_HASH) and try your program again. Note that you also have to recompile both the xmm.binary (in the utilities folder). You should see the program speed double.

Ross.

Thanks Ross! As you suggested, defining DISABLE_HASH almost doubled the speed of xbasic. It now takes about 7 seconds to compile and run my test program rather than 11-12. While that is certainly an improvement, it is still too slow to be useful. This is only a 35 line program. This isn't Catalina's fault entirely though. The xbasic bytecode compiler makes three passes over the source code so it is parsing the program three times. I may try compiling xbasic for the PIC24H on Andre' LaMothe's Chameleon PIC board just to see how it performs. It may not be much better. Of course, xbasic runs with blinding speed on my MacBook Pro! :-)

RossH · 2011-03-21 05:18

David Betz wrote: »

Thanks Ross! As you suggested, defining DISABLE_HASH almost doubled the speed of xbasic. It now takes about 7 seconds to compile and run my test program rather than 11-12. While that is certainly an improvement, it is still too slow to be useful. This is only a 35 line program. This isn't Catalina's fault entirely though. The xbasic bytecode compiler makes three passes over the source code so it is parsing the program three times. I may try compiling xbasic for the PIC24H on Andre' LaMothe's Chameleon PIC board just to see how it performs. It may not be much better. Of course, xbasic runs with blinding speed on my MacBook Pro! :-)

Hi David,

Additional speed improvements are possible, but it's never going to make the C3 an order of magnitude faster - not while programs have to be executed out of serial memory! At some point someone may make a parallel RAM add-on board for the C3, and that could change things.

I will keep the caching driver as an option since it also improves performance on other platforms - provided you can afford to sacrifice that much Hub RAM!

Ross.

RossH · 2011-03-21 13:45

RossH wrote: »

Hi David,

Additional speed improvements are possible, but it's never going to make the C3 an order of magnitude faster - not while programs have to be executed out of serial memory! At some point someone may make a parallel RAM add-on board for the C3, and that could change things.

I will keep the caching driver as an option since it also improves performance on other platforms - provided you can afford to sacrifice that much Hub RAM!

Ross.

David,

One more suggestion - why not arrange to load and save the byte-coded format? This was common practice in the "old" days of Basic interpreters (which were all generally pretty slow!). This makes the compilation speed less of an issue.

Ross.
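A minimal sketch of what saving and reloading the byte-coded form could look like, in plain standard C. The file layout (a tag, a length, then the raw bytecode), the IMAGE_MAGIC value and both function names are assumptions made for illustration; nothing here is taken from the xbasic sources.

```c
#include <assert.h>
#include <stdio.h>

#define IMAGE_MAGIC 0x58424331UL   /* hypothetical file tag ("XBC1") */

/* Write header (tag + length) followed by the bytecode. Returns 0 on success. */
int save_image(const char *path, const unsigned char *code, unsigned long len)
{
    unsigned long header[2];
    FILE *fp;
    int ok;

    assert(path != NULL && code != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    header[0] = IMAGE_MAGIC;
    header[1] = len;
    ok = fwrite(header, sizeof header, 1, fp) == 1 &&
         (len == 0 || fwrite(code, 1, len, fp) == len);
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read an image back into code[]. Returns the bytecode length, or -1 if the
   file is missing, has the wrong tag, or does not fit in the buffer. */
long load_image(const char *path, unsigned char *code, unsigned long capacity)
{
    unsigned long header[2];
    FILE *fp;
    long result = -1;

    assert(path != NULL && code != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    if (fread(header, sizeof header, 1, fp) == 1 &&
        header[0] == IMAGE_MAGIC &&
        header[1] <= capacity &&
        fread(code, 1, header[1], fp) == header[1])
        result = (long)header[1];
    fclose(fp);
    return result;
}
```

With something like this, the three-pass compile only has to run when the source changes; a saved image can be loaded and handed straight to the VM.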

David Betz · 2011-03-21 14:00

RossH wrote: »

David,

One more suggestion - why not arrange to load and save the byte-coded format? This was common practice in the "old" days of Basic interpreters (which were all generally pretty slow!). This makes the compilation speed less of an issue.

Ross.

That is certainly possible. In fact, this basic system started out as a compiler that ran on a PC and a VM that ran on the PIC, AVR, or Propeller. Andre' convinced me that we needed a language that would run on the Propeller without need for a PC so I stripped my compiler down and made it fit on the Propeller with external memory.

Dr_Acula · 2011-03-21 16:08

Does xbasic run on the dracblade? Also, do you have a link to a xbasic download by any chance?

David Betz · 2011-03-21 18:37

Dr_Acula wrote: »

Does xbasic run on the dracblade? Also, do you have a link to a xbasic download by any chance?

It's kind of a work in progress. For instance, I haven't completed the heap manager for dynamic strings yet. It should run on the Dracblade but I haven't tried it. I'll attach the sources to this message if you promise not to laugh too loud when you look at them! :-)

Dr_Acula · 2011-03-21 21:08

Thanks for that. So - dumb question here, but is this the same as the xbasic you find when you search google? Or is this something you are writing yourself?

RossH · 2011-03-22 03:20

Dr_A,

I think David's xbasic is different to the one you are probably finding on Google.

Also, I have added improvements to the way plugins are registered for release 3.0 ...

This program:

#include <catalina_plugin.h>
#include <catalina_hmi.h>

char *name(int type) {
   switch (type) {
      case 0  : return "Kernel";
      case 1  : return "HMI";
      case 2  : return "Library";
      case 3  : return "Float_A";
      case 4  : return "Float_B";
      case 5  : return "Real-Time Clock";
      case 6  : return "SD File System";
      case 7  : return "Serial I/O";
      case 8  : return "Dummy";
      case 9  : return "Graphics";
      case 10 : return "Keyboard";
      case 11 : return "Screen";
      case 12 : return "Mouse";
      case 13 : return "Proxy";
      default : return "Unknown/None";    
   }
}

void main() {
   int i;
   int type;
   request_t *rqst;

   t_string(1, "Press any key to start\n\n");
   k_wait();
   t_printf("Registry Address = %x\n\n", _registry());
   for (i = 0; i < 8; i++) {
      type = REGISTERED_TYPE(i);
      rqst = REQUEST_BLOCK(i);
      t_printf("Cog %d (%x) Type = %s\n", i, (unsigned)rqst, name(type)); 
   }
   t_string(1, "\nPress any key to reboot");
   k_wait();
}

produces this output:

Press any key to start

Registry Address = 00007FD4

Cog 0 (00007F94) Type = Kernel
Cog 1 (00007F9C) Type = Keyboard
Cog 2 (00007FA4) Type = Screen
Cog 3 (00007FAC) Type = HMI
Cog 4 (00007FB4) Type = Unknown/None
Cog 5 (00007FBC) Type = Unknown/None
Cog 6 (00007FC4) Type = Unknown/None
Cog 7 (00007FCC) Type = Unknown/None

Press any key to reboot

This should greatly simplify identifying, stopping and re-starting cogs at runtime.

Ross.
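As a sketch of how the registry might be used to identify a cog, the function below searches for the first cog running a given plugin type. So that it runs on any host, it takes a plain array standing in for the values REGISTERED_TYPE(i) would return; find_cog and TYPE_NONE are hypothetical names, and the real "nothing registered" value is an assumption.

```c
#include <assert.h>

#define COG_COUNT 8
#define TYPE_NONE 255   /* hypothetical value for "no plugin registered" */

/* Return the number of the first cog whose registered type equals `wanted`,
   or -1 if no cog is running that plugin type. `types` stands in for the
   per-cog values read from the registry. */
int find_cog(const int types[COG_COUNT], int wanted)
{
    int i;

    assert(types != 0);
    for (i = 0; i < COG_COUNT; i++) {
        if (types[i] == wanted)
            return i;
    }
    return -1;
}
```

Using the type numbers from the name() function above (0 = Kernel, 1 = HMI, 10 = Keyboard, 11 = Screen), the sample output corresponds to the array {0, 10, 11, 1, none, none, none, none}, and a search for type 1 finds the HMI on cog 3.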

RossH · 2011-03-23 01:23

Dr_Acula wrote: »

Does xbasic run on the dracblade?

Hi Dr_A,

I thought David would answer this question, so I didn't.

Yes, xbasic runs on the DracBlade using the same caching driver as the C3. It is slightly faster than on the C3 - say 5s rather than 6s or 7s to run David's test program.

I don't think David would regard that as a really significant speed up.

However, just out of interest, I also tried it on the RamBlade and it runs in about 1.5s - this is partly due to the faster XMM RAM on the RamBlade (I think it is the fastest platform in that respect) and also because the RamBlade clock speed is 100MHz instead of 80MHz. I wonder if David would consider that fast enough for his purposes?

Ross.
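A rough way to split those numbers into a clock part and a memory part. This assumes that run time scales inversely with clock speed and that the 5s DracBlade timing was taken at 80MHz; both are assumptions, and the function names are made up for the sketch.

```c
#include <assert.h>

/* Overall speedup between two measured run times, in seconds. */
double speedup(double slow_seconds, double fast_seconds)
{
    assert(slow_seconds > 0.0 && fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}

/* Part of the speedup left over once the clock ratio is divided out,
   i.e. the share attributable to everything else (here, faster XMM RAM). */
double residual_speedup(double total_speedup, double clock_ratio)
{
    assert(clock_ratio > 0.0);
    return total_speedup / clock_ratio;
}
```

With 5s versus 1.5s the overall speedup is about 3.3x. The 100MHz / 80MHz clock ratio accounts for 1.25x of that, which leaves roughly 2.7x for the faster memory interface.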

Dr_Acula · 2011-03-23 02:27

David and I have exchanged a few PMs today and maybe it is worth taking this to a discussion here as this is very interesting.

Cluso's ramblade is definitely the fastest platform around. I think this gives us a benchmark to work from in terms of how fast things can be if you really optimise the code.

I took another look at the dracblade driver code and there are a few things that could be improved.

''Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade
' DoCmd(command_, hub_address, ram_address, block_length)
' R - read bytes at address n up (n to n+block_length) where n =0 to 65535 (ie lower 64k of the sram chip)
' W - write bytes at address n up
' I - initialise
' N - Led on
' F - Led off
' H - set high latch to value in ramaddress A16 to A23 (will include the led)
 
VAR

' communication params(5) between cog driver code - only "command" and "errx" are modified by the driver
   long  command, hubaddrs, ramaddrs, blocklen, errx, cog ' rendezvous between spin and assembly (can be used cog to cog)
'        command  = R, W, N, F H =0 when operation completed by cog
'        hubaddrs = hub address for data buffer
'        ramaddrs = ram address for data ($0000 to $FFFF)
'        blocklen = ram buffer length for data transfer
'        errx     = returns =0 (false=good), else <>0 (true & error code)
'        cog      = cog no of driver (set by spin start routine)
   
PUB start : err_
' Initialise the Drac Ram driver. No actual changes to ram as the read/write routines handle this
  command := "I"
  cog := 1 + cognew(@tbp2_start, @command)
  if cog == 0
    err_ := $FF                 ' error = no cog
  else
    repeat while command        ' driver cog sets =0 when done
    err_ := errx                ' driver cog sets =0 if no error, else xx = error code

PUB stop
   if cog
      cogstop(cog~ - 1)      

PUB DoCmd(command_, hub_address, ram_address, block_length) : err_
' Do the command: R, W, N, F, H
  hubaddrs := hub_address       ' hub address start
  ramaddrs := ram_address       ' ram address start
  blocklen := block_length      ' block length
  command  := command_          ' must be last !!
' Wait for command to complete and get status
  repeat while command          ' driver cog sets =0 when done
  err_ := errx                  ' driver cog sets =0 if no error, else xx = error code

PUB rendezvous
  return @command

DAT
'' +--------------------------------------------------------------------------+
'' | Dracblade Ram Driver (with grateful acknowledgements to Cluso)           |
'' +--------------------------------------------------------------------------+
                        org     0
tbp2_start    ' setup the pointers to the hub command interface (saves execution time later
                                      '  +-- These instructions are overwritten as variables after start
comptr                  mov     comptr, par     ' -|  hub pointer to command                
hubptr                  mov     hubptr, par     '  |  hub pointer to hub address            
ramptr                  add     hubptr, #4      '  |  hub pointer to ram address            
lenptr                  mov     ramptr, par     '  |  hub pointer to length                 
errptr                  add     ramptr, #8      '  |  hub pointer to error status           
cmd                     mov     lenptr, par     '  |  command  I/R/W/N/F/H                  
hubaddr                 add     lenptr, #12     '  |  hub address                           
ramaddr                 mov     errptr, par     '  |  ram address                           
len                     add     errptr, #16     '  |  length                                
err                     nop                     ' -+  error status returned (=0=false=good) 


' Initialise hardware (unlike the triblade, just tristates everything and read/write set the pins)
init                    mov     err, #0                  ' reset err=false=good
                        mov     dira,zero                ' tristate the pins

done                    wrlong  err, errptr             ' status  =0=false=good, else error x
                        wrlong  zero, comptr            ' command =0 (done)
' wait for a command (pause short time to reduce power)
pause                   mov     ctr, delay      wz      ' if =0 no pause
              if_nz     add     ctr, cnt
              if_nz     waitcnt ctr, #0                 ' wait for a short time (reduces power)
                        rdlong  cmd, comptr     wz      ' command ?
              if_z      jmp     #pause                  ' not yet
' decode command
                        cmp     cmd, #"R"       wz      ' R = read block
              if_z      jmp     #rdblock
                        cmp     cmd, #"W"       wz      ' W = write block
              if_z      jmp     #wrblock
                        cmp     cmd, #"N"       wz      ' N= led on
              if_z      jmp     #led_turn_on
                        cmp     cmd, #"F"       wz      ' F = led off
              if_z      jmp     #led_turn_off
                        cmp     cmd, #"H"       wz      ' H sets the high latch
              if_z      jmp     #sethighlatch
                        mov     err, cmd                ' error = cmd (unknown command)
                        jmp     #done


tristate                mov     dira,zero                ' all inputs to zero
                        jmp     #done

' turn led on
led_turn_on             or      HighLatch,ledpin        ' set the led pin high
                        jmp     #OutputHighLatch         ' send this out

led_turn_off            andn    HighLatch,ledpin        ' set the led pin low
                        jmp     #OutputHighLatch         ' send this out

' set high address bytes with command H, pass value in third variable of the DoCmd
' 4 bytes - masks off all but bits 16 to 23

sethighlatch            call #ram_open                  ' gets address value in 'address'
                        shr  address,#16                ' shift right by 16 places
                        and  address,#$FF               ' ensure rest of bits zero
                        mov  HighLatch,address          ' put value into HighLatch
                        jmp  #OutputHighLatch           ' and output it

'---------------------------------------------------------------------------------------------------------
'Memory Access Functions

rdblock                 call    #ram_open               ' get variables from hub variables
rdloop                  call    #read_memory_byte       ' read byte from address into data_8
                        wrbyte  data_8,hubaddr          ' write data_8 to hubaddr ie copy byte to hub
                        add     hubaddr,#1              ' add 1 to hub address
                        add     address,#1              ' add 1 to ram address
                        djnz    len,#rdloop             ' loop until done
                        jmp     #init                   ' reinitialise

wrblock                 call    #ram_open                        
wrloop                  rdbyte  data_8, hubaddr         ' copy byte from hub
                        call    #write_memory_byte      ' write byte from data_8 to address
                        add     hubaddr,#1              ' add 1 to hub address
                        add     address,#1              ' add 1 to ram address
                        djnz    len,#wrloop             ' loop until done
                        jmp     #init                   ' reinitialise

ram_open                rdlong  hubaddr, hubptr         ' get hub address
                        rdlong  ramaddr, ramptr         ' get ram address
                        rdlong  len, lenptr             ' get length
                        mov     err, #5                 ' err=5
                        mov     address,ramaddr         ' cluso's variable 'ramaddr' to dracblade variable 'address'
ram_open_ret            ret
  
read_memory_byte        call #RamAddress                ' sets up the latches with the correct ram address
                        mov dira,LatchDirection2        ' for reads so P0-P7 tristate till do read
                        mov outa,GateHigh               ' actually ReadEnable but they are the same
                        andn outa,GateHigh              ' set gate low
                        nop                             ' short delay to stabilise
                        nop
                        mov data_8, ina                 ' read SRAM
                        and data_8, #$FF                ' extract 8 bits
                        or  outa,GateHigh               ' set the gate high again
read_memory_byte_ret    ret

write_memory_byte       call #RamAddress                ' sets up the latches with the correct ram address
                        mov outx,data_8                 ' get the byte to output
                        and outx, #$FF                  ' ensure upper bytes=0
                        or outx,WriteEnable             ' or with correct 138 address
                        mov outa,outx                   ' send it out
                        andn outa,GateHigh              ' set gate low
                        nop                             ' no nop doesn't work, one does, so put in two to be sure
                        nop                             ' another NOP
                        or outa,GateHigh                ' set it high again
write_memory_byte_ret   ret

RamAddress ' sets up the ram latches. Assumes high latch A16-A18 low so only accesses 64k of ram
                        mov dira,LatchDirection         ' set up the pins for programming latch chips
                        mov outx,address                ' get the address into a temp variable
                        and outx,#$FF                   ' mask the low byte
                        or  outx,LowAddress             ' or with 138 low address
                        mov outa,outx                   ' send it out
                        andn outa,GateHigh              ' set gate low
                                                        ' ?? a NOP
                        or outa,GateHigh                ' set it high again  
                                                        ' now repeat for the middle byte     
                        mov outx,address                ' get the address into a temp variable
                        shr outx,#8                     ' shift right by 8 places
                        and outx,#$FF                   ' mask the low byte
                        or  outx,MiddleAddress          ' or with 138 middle address
                        mov outa,outx                   ' send it out
                        andn outa,GateHigh              ' set gate low
                        or outa,GateHigh                ' set it high again 
RamAddress_ret          ret

OutputHighLatch ' sends out HighLatch to the 374 that does A16-19, led and the 4 spare outputs
                        mov     dira,latchdirection     ' setup active pins 138 and bus
                        mov     outa,HighLatch          ' send out HighLatch
                        or      outa,HighAddress        ' or with the high address
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set the gate high again
OutputHighLatch_ret     jmp     #tristate               ' set pins tristate





delay                   long    80                                    ' waitcnt delay to reduce power (#80 = 1uS approx)
ctr                     long    0                                     ' used to pause execution (lower power use) & byte counter
GateHigh                long    %00000000_00000000_00000001_00000000  ' HC138 gate high, all others must be low
Outx                    long    0                                     ' for temp use, same as n in the spin code
LatchDirection          long    %00000000_00000000_00001111_11111111 ' 138 active, gate active and 8 data lines active
LatchDirection2         long    %00000000_00000000_00001111_00000000 ' for reads so data lines are tristate till the read
LowAddress              long    %00000000_00000000_00000101_00000000 ' low address latch = xxxx010x and gate high xxxxxxx1
MiddleAddress           long    %00000000_00000000_00000111_00000000 ' middle address latch = xxxx011x and gate high xxxxxxx1
HighAddress             long    %00000000_00000000_00001001_00000000 ' high address latch = xxxx100x and gate high xxxxxxx1
'ReadEnable long    %00000000_00000000_00000001_00000000 ' /RD = xxxx000x and gate high xxxxxxx1
                                                        ' commented out as the same as GateHigh
WriteEnable             long    %00000000_00000000_00000011_00000000 ' /WE = xxxx001x and gate high xxxxxxx1
Zero                    long    %00000000_00000000_00000000_00000000 ' for tristating all pins
data_8                  long    %00000000_00000000_00000000_00000000 ' so code compatibility with zicog driver
address                 long    %00000000_00000000_00000000_00000000 ' address for ram chip
ledpin                  long    %00000000_00000000_00000000_00001000 ' to turn on led
HighLatch               long    %00000000_00000000_00000000_00000000 ' static value for the 374 latch that does the led, hA16-A19 and the other 4 outputs

1) there is a deliberate delay

' wait for a command (pause short time to reduce power)
pause                   mov     ctr, delay      wz      ' if =0 no pause
              if_nz     add     ctr, cnt
              if_nz     waitcnt ctr, #0                 ' wait for a short time (reduces power)
                        rdlong  cmd, comptr     wz      ' command ?

- maybe save some lines there
2) Reading in blocks of data. There are 19 address lines on a 512k chip and at the moment these are in two groups - the High group A16 to A18 and the Low and Middle group which are grouped together. This seemed natural for the Z80 emulations with 16 bit addresses.

But what if we separate out the Low and Middle latches?

I count 46 instructions to read one byte from external memory. Surely that can be decreased?!!

First thing might be to leave the middle latch unchanged and just change the lower latch. Maybe do it in groups of 4 bytes, or maybe in groups of 16 or 256?

I think that can save 8 instructions per byte.

Also I think by doing things in blocks, you don't have to keep checking for new instructions each byte. Say the requesting program wanted a Long, well then you can skip a whole lot of rechecking code for new requests.

I think that can halve the number of instructions per byte if you do Longs.

And then one might think about optimising further. For C, it depends on the probability that an instruction will cause a branch outside a block of n bytes. At the extremes, say you requested byte x and it read in the next 64k of bytes. This will take a lot of time but with a small probability that a jump will go outside this block. Read in 1 long, and that is inefficient too. I'm not sure of the maths, but say the probability of a jump was 10%, then maybe as a guess it might be best to read in 16 bytes as a block?
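To put a number on that guess, here is a small model in C. It assumes each instruction independently has probability p of branching out of the block, that execution starts at the first instruction of the block, and that instructions are 4 bytes (Propeller longs). All three are simplifying assumptions, and expected_used is a name invented for the sketch.

```c
#include <assert.h>

/* Expected number of instructions actually executed from a block of
   `block_instr` instructions before a branch leaves the block. */
double expected_used(double p, int block_instr)
{
    double used = 0.0;
    double still_inside = 1.0;   /* chance that no earlier branch has left */
    int k;

    assert(p >= 0.0 && p <= 1.0 && block_instr >= 0);
    for (k = 0; k < block_instr; k++) {
        used += still_inside;        /* instruction k runs only if still inside */
        still_inside *= 1.0 - p;     /* it then stays inside with chance 1 - p */
    }
    return used;
}
```

Under this model, with p = 0.1, a 16-byte block (4 instructions) has about 3.4 of its 4 instructions used on average, while a 256-byte block (64 instructions) has only about 10 used. The average run can never exceed 1/p = 10 instructions, or 40 bytes, so blocks much larger than that mostly fetch code that is never executed before the next jump.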

The driver code above already has an instruction for reading in blocks, it is just that I think mostly we read in blocks of 1, ie a byte. Ross, a) is that how catalina works and b) where is the source code for the dracblade driver file and what is it called?

So you might pass an address n=0 to 512k.
1) is this in the same high/medium latch range as the last request?
2) If yes, read bytes but only change the low latch.
3) If no then update the medium and high latches.
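The three steps above can be sketched in C as a check on whether the upper address bits have changed since the last request. The struct and function names are hypothetical, and the sketch only counts latch writes rather than driving any pins.

```c
#include <assert.h>

/* Remembers the address of the previous request. */
typedef struct {
    unsigned long last_addr;
    int valid;               /* 0 until the first request has been seen */
} latch_state_t;

/* Return how many latches must be written for this request:
   1 = low latch only (same middle/high range as last time),
   3 = low, middle and high latches. */
int latches_to_write(latch_state_t *s, unsigned long addr)
{
    int writes;

    assert(s != 0);
    if (s->valid && (s->last_addr >> 8) == (addr >> 8))
        writes = 1;          /* bits 8 and up unchanged: low latch only */
    else
        writes = 3;          /* range changed: reload all three latches */
    s->last_addr = addr;
    s->valid = 1;
    return writes;
}
```

For a sequential read, this means one full latch reload every 256 bytes and a single low-latch write for every other byte.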

I wonder also about a lookahead cache. The requesting Spin code requests a byte at address n. The cog goes and starts reading from this address. I'd need to check speeds, but there is a fairly good chance the cog will be faster than the requesting Spin code, so the cog will always be ahead of the requesting program. From the requesting program's point of view, it requests byte n and for the next 256 bytes the values are always correct in a buffer.
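A single-threaded sketch of the lookahead idea in C. On real hardware the cog would fill the buffer in parallel with the caller; here the refill happens synchronously on a miss, which is enough to show the bookkeeping. The 256-byte window matches the figure above, while lookahead_t, lookahead_read and demo_read (a stand-in for the slow external RAM read) are hypothetical names.

```c
#include <assert.h>

#define WINDOW 256

/* Type of the slow byte read from external RAM (hypothetical). */
typedef unsigned char (*xmm_read_fn)(unsigned long addr);

typedef struct {
    unsigned long base;          /* external address of data[0] */
    int valid;                   /* 0 until the first refill */
    unsigned long refills;       /* how many times the window was reloaded */
    unsigned char data[WINDOW];
} lookahead_t;

/* Stand-in for external RAM so the sketch can run on a PC. */
unsigned char demo_read(unsigned long addr)
{
    return (unsigned char)(addr * 7UL);
}

/* Return the byte at addr, reloading the window from addr onward on a miss. */
unsigned char lookahead_read(lookahead_t *c, xmm_read_fn slow_read,
                             unsigned long addr)
{
    unsigned long i;

    assert(c != 0 && slow_read != 0);
    if (!c->valid || addr < c->base || addr - c->base >= WINDOW) {
        for (i = 0; i < WINDOW; i++)
            c->data[i] = slow_read(addr + i);
        c->base = addr;
        c->valid = 1;
        c->refills++;
    }
    return c->data[addr - c->base];
}
```

Sequential requests after the first one are served from the buffer; only a request outside the current 256-byte window (for example after a jump) forces another slow refill.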

Then there is another variable - how often would the cog code check the passed parameter to see if the calling program wants a different block. Maybe if the probability of a branch in C is 10%, you check only every 10 bytes? If so, that saves even more code.

RossH · 2011-03-23 02:41

Dr_Acula wrote: »

I took another look at the dracblade driver code and there are a few things that could be improved.

Hi Dr_A ...

Yes absolutely - I've not really done any optimization on the original caching driver code yet. In fact it only currently supports the DRACBLADE at all because David's and Jazzed's original driver code already did!

What I plan to do next is rewrite the interface from the caching driver to use my standard XMM code. That code is already written for all XMM platforms, and is much more optimized (although probably still a long way from being as good as it could be!).

That's about the last thing I expect to do before I am ready to release Catalina 3.0.

David, Jazzed ...

I found a bug in the Catalina SD Card driver initialization code that seems to show up on the C3. I've now fixed it, but if you are having occasional strange problems with programs sometimes not being able to access the SD card (but which work ok when you reload them) then this may be the reason. It may also have affected other platforms - for example I think it is the reason I was having occasional problems with the SD card on the RamBlade (and for which I was - quite unfairly - blaming Cluso!).

Dr_Acula · 2011-03-23 03:17

Hi Ross, which is the file with the dracblade code? Is it Catalina_XMM.Spin in the \target directory?

RossH · 2011-03-23 03:26

Dr_Acula wrote: »

Hi Ross, which is the file with the dracblade code? Is it Catalina_XMM.Spin in the \target directory?

Hi Dr_A,

All the XMM code for all platforms is now in the file XMM.inc in the target directory. Look for the section marked #elseifdef DRACBLADE

If you can streamline the DracBlade code, I'll include it in the next release.

Ross.

P.S. If you modify the code, try not to use any more longs - the XMM kernel has very few longs to spare!

Dr_Acula · 2011-03-23 03:45

Thanks Ross.

Oh darn. Someone *extremely* clever has already split the middle and lower latch! This XMM driver looks extremely well optimised. I think only caching would improve that, and any improvements due to caching will apply equally to the C3.

XMM_IncAddr
              add     XMM_Addr,#1               ' inc sram address
              mov     outx,XMM_Addr             ' does result of incrementing ...
              and     outx,#$FF                 ' ... require updating latch 8 - 15 or 16 - 19?
              tjnz    outx,#XMM_Set0_7          ' if not, just set latch for addr bits 0 - 7
              call    #XMM_SetAddr              ' otherwise we must set all latches
              jmp     #XMM_IncAddr_ret          ' done

David Betz · 2011-03-23 03:49

RossH wrote: »

Hi Dr_A,

I thought David would answer this question, so I didn't.

Yes, xbasic runs on the DracBlade using the same caching driver as the C3. It is slightly faster than on the C3 - say 5s rather than 6s or 7s to run David's test program.

I don't think David would regard that as a really significant speed up.

However, just out of interest, I also tried it on the RamBlade and it runs in about 1.5s - this is partly due to the faster XMM RAM on the RamBlade (I think it is the fastest platform in that respect) and also because the RamBlade clock speed is 100MHz instead of 80MHz. I wonder if David would consider that fast enough for his purposes?

Ross.

Sorry I didn't post my reply in here. I was trying not to hijack your thread to discuss xbasic. I guess I should try the RamBlade. I've had one for a long time but have never done anything with it. I guess I stopped when I discovered that you couldn't use the standard pin 31/30 serial I/O. How do you have your RamBlade configured?

Thanks,
David

RossH · 2011-03-23 04:10

David Betz wrote: »

Sorry I didn't post my reply in here. I was trying not to hijack your thread to discuss xbasic. I guess I should try the RamBlade. I've had one for a long time but have never done anything with it. I guess I stopped when I discovered that you couldn't use the standard pin 31/30 serial I/O. How do you have your RamBlade configured?

Thanks,
David

Just with the SRAM and SD Card. I just use the normal PropPlug for comms. As shown on the diagram below, you plug it onto the middle 4 pins for programming the EEPROM, and the bottom 4 pins for terminal I/O (and use Catalyst to load programs off the SD Card).

Ross.

David Betz · 2011-03-23 04:14

Thanks Ross! I'll have to try that setup. I guess I was put off a bit by the fact that I would have to reconnect the serial interface to reprogram the card. It makes development a bit of a pain. I wonder why he used the high numbered pins for his SRAM interface?

RossH · 2011-03-23 04:31

David Betz wrote: »

Thanks Ross! I'll have to try that setup. I guess I was put off a bit by the fact that I would have to reconnect the serial interface to reprogram the card. It makes development a bit of a pain. I wonder why he used the high numbered pins for his SRAM interface?

If I know Cluso, it was done for a good reason - most likely because it allowed the SRAM to be used with the least possible number of instructions.

Ross.

P.S. In a lot of ways, the RamBlade is my favorite board. If only it could be powered by the USB port, it would be the ideal "portable" Prop platform!

David Betz · 2011-03-23 04:35

Yes, it is a nice board. I guess I've just been too lazy to hook mine up. I wonder if he is still selling them?

RossH · 2011-04-01 17:52

All,

Catalina 3.0 has been released. It has a new thread here.

Ross.

Dr_Acula · 2011-04-03 05:11

The cogjects project now has 8 cogjects. These can be used by Spin, and they can also be used by Catalina. I have plans to write drivers for other languages as well.

What this means in C is that no more 'inline' pasm code in the C program. Do the debugging in Spin and then when it works, move it over to C. The following code is for the Serial driver and I have left the Spin code in as this will be useful in translating spin in the future.

From a practical perspective, Spin can only do so much even with cogjects. The SD driver takes about 1/4 of hub, a decent video buffer takes just under 20k, and there is not much space left for code.

C in XMM on the other hand puts the SD driver into external memory and most of the hub is free for a video buffer.

/* PASM cogject demonstration, see also cogject example in spin*/

#include <stdio.h>
      
	unsigned long cogarray[512];					// external memory common cog array (512 longs, matching the copy loop below)

// start of C functions

void clearscreen()                                                   // white text on dark blue background
{
       int i;
       for (i=0;i<40;i++)
       {
               t_setpos(0,0,i);                                      // move cursor to next line
               t_color(0,0x08FC);                                    // RRGGBBxx eg dark blue background 00001000 white text 11111100
       }
}

void sleep(int milliseconds)                                         // sleep function
{
       _waitcnt(_cnt()+(milliseconds*(_clockfreq()/1000))-4296);
}

char peek(int address)                                               // function implementation of peek
{
       return *((char *)address);
}

void poke(int address, char value)                                   // function implementation of poke
{
       *((char *)address) = value;
}

void external_memory_cog_load(int cognumber, unsigned long cogdata[], unsigned long parameters_array[])    	//  load a cog from external memory
{
	unsigned long hubcog[512];						// create a local array, this is in hub ram, not external ram (512 longs so the loop below stays in bounds)
	int i;	
	for(i=0;i<512;i++)								
	{
		hubcog[i]=cogdata[i];					// move from external memory to a local array in hub
	}
 	_coginit((int)parameters_array>>2, (int)hubcog>>2, cognumber);		// load the cog
}  

unsigned long serial_start(unsigned long rxpin,unsigned long txpin,unsigned long mode, unsigned long baudrate, int cognumber, unsigned long par[], unsigned long cogdata[])
{
/*
PUB start(rxpin, txpin, mode, baudrate) : okay

'' Start serial driver - starts a cog
'' returns false if no cog available
''
'' mode bit 0 = invert rx
'' mode bit 1 = invert tx
'' mode bit 2 = open-drain/source tx
'' mode bit 3 = ignore tx echo on rx

  stop
  longfill(@rx_head, 0, 4)
  longmove(@rx_pin, @rxpin, 3)
  bit_ticks := clkfreq / baudrate
  buffer_ptr := @rx_buffer
  okay := cog := cognew(@entry, @rx_head) + 1
*/

	unsigned long okay = 0;				// the coginit result is not captured below, so this is always 0
	unsigned long bit_ticks;
	unsigned long buffer_ptr;
	par[0] = 0;						// rx_head   longfill(@rx_head, 0, 4)
	par[1] = 0;						// rx_tail
	par[2] = 0;						// tx_head
	par[3] = 0;						// tx_tail
	par[4] = rxpin;					//   longmove(@rx_pin, @rxpin, 3)
	par[5] = txpin;					// note - if rewrite the pasm code could save a couple of hub longs here
	par[6] = mode;					// as rxpin and txpin are not used anywhere else
	bit_ticks = _clockfreq() / baudrate;   		//   bit_ticks := clkfreq / baudrate
	par[7] = bit_ticks;
	buffer_ptr = (unsigned long)&par[9];		//   buffer_ptr := @rx_buffer  points to start of circular buffer
	par[8] = buffer_ptr;					// pointer to the start of the circular buffers
								// rx buffer is 9 to 12 and tx buffer is 13 to 16 (16 bytes =4 longs)
	external_memory_cog_load(cognumber,cogdata,par);		// load from external ram
	// okay returns the cog number or -1 if a fail page 119 manual. Ignored here
	// printf("par array is at %u \n",(unsigned long)&par[0]);
	// printf("par array entry 1 is at %u \n",(unsigned long)&par[1]);
	// printf("par array entry 7 is at %u \n",(unsigned long)&par[7]);
	// printf("rx_head is at %u \n",(unsigned long)&par[9]);
	// printf("buffer_ptr is %u \n",par[8]);
	return okay;
}

void serial_tx(char tx,unsigned long par[])
{
/*
PUB tx(txbyte)
'' Send byte (may wait for room in buffer)
  repeat until (tx_tail <> (tx_head + 1) & $F)
  tx_buffer[tx_head] := txbyte
  tx_head := (tx_head + 1) & $F
  if rxtx_mode & %1000
    rx
*/
	unsigned long tx_head;
	int address;
	while ( par[3] == ((par[2] + 1 ) & 0xF)) {} // wait if the head has looped right round and is now one less than the tail
	tx_head = par[2];				// get the head value
	address = par[8] + 16 + tx_head;		// location of rx buffer plus 16 to get tx buffer plus the head value
	poke(address,tx);				// poke the tx byte value to hub ram
	tx_head = tx_head + 1;			// add one
	tx_head = tx_head & 0xF; 			// logical and with 15
	par[2] = tx_head;				// store it back again
							// need to add the echo mode?
}

unsigned long serial_rxcheck(unsigned long par[])
{
/*
PUB rxcheck : rxbyte
'' Check if byte received (never waits)
'' returns -1 if no byte received, $00..$FF if byte
  rxbyte--
  if rx_tail <> rx_head
    rxbyte := rx_buffer[rx_tail]
    rx_tail := (rx_tail + 1) & $F
*/
	unsigned long rxbyte;			// actually is a long, so can return -1 FFFFFFFF if nothing and 0-FF if a byte
	int address;					// hub address
	rxbyte = 0;					// set explicitly to zero
	rxbyte = rxbyte - 1;				// return ffffffff if nothing
	if (par[1] != par[0])
	{
		address = par[8] + par[1];		// par[8] is the rx buffer, par[1] is rx_tail
		rxbyte = (unsigned char)peek(address);	// get the return byte from the buffer (cast so bytes 0x80..0xFF are not sign-extended)
		par[1] = (par[1] +1) & 0xF;		// add one to tail
	}
	return rxbyte;
}

unsigned long serial_rx(unsigned long par[])
{
/*
PUB rx : rxbyte
'' Receive byte (may wait for byte)
'' returns $00..$FF
  repeat while (rxbyte := rxcheck) < 0	
*/
	unsigned long rxbyte;					// actually is a long, not a byte
	while ((rxbyte = serial_rxcheck(par)) == -1) {} 	// 0xffffffff and -1 works, but " < 0" gives a compiler error
	return rxbyte;						// return the value
}

void serial_rxflush(unsigned long par[])				// flush receive buffer
{
	while (serial_rxcheck(par) != -1) {}			// keep checking until buffer clear
}

unsigned long serial_rxtime(unsigned long ms,unsigned long par[]) // wait ms milliseconds for byte, -1 if nothing
{
	unsigned long rxbyte = -1;
	unsigned long counter = 0;					// start a counter, 10ms ticks
	ms = ms / 10;							// internal delay for 1ms ticks is too high
	while (((rxbyte = serial_rxcheck(par)) == -1) && (counter < ms))	// wait until a byte or counter times out
	{
		_waitcnt(_cnt()+(10*(_clockfreq()/1000))-4296);		// wait 10 milliseconds
		counter +=1; 						// add one to counter
	}
	return rxbyte;
}

void serial_str(char lineoftext[],unsigned long par[])		// send out the string
{
/*
'' Send string                    

  repeat strsize(stringptr)
    tx(byte[stringptr++])
*/
	int i;
	for(i=0; i<strlen(lineoftext);i++)
	{
		serial_tx(lineoftext[i],par);			// send out the bytes one at a time
	}
}

void serial_dec(signed long value,unsigned long par[])	// send out decimal value - signed
{
/*
'' Print a decimal number
  if value < 0
    -value
    tx("-")
  i := 1_000_000_000
  repeat 10
    if value => i
      tx(value / i + "0")
      value //= i
      result~~
    elseif result or i == 1
      tx("0")
    i /= 10
*/
	char lineoftext[12] = "";					// enough room for a 32 bit long 2^32 and possibly the minus sign
  	sprintf(lineoftext, "%ld", value);				// convert to a string (%ld because value is a long)
  									// printf  ("lineoftext is now: %s\n", lineoftext);
	serial_str(lineoftext,par);					// send out the string
}

void serial_hex(unsigned long value, unsigned long par[])	// send out a hex value
/*
'' Print a hexadecimal number
  value <<= (8 - digits) << 2
  repeat digits
    tx(lookupz((value <-= 4) & $F : "0".."9", "A".."F"))
*/
{
	char lineoftext[9] = "";					// enough room for FFFFFFFF plus the terminating zero
	sprintf(lineoftext,"%lx",value);				// convert to hex value (%lx because value is an unsigned long)
	serial_str(lineoftext,par);					// send it out
}

void serial_crlf(unsigned long par[])				// send a crlf
{
	serial_tx(13,par);						// cr
	serial_tx(10,par);						// lf
}

int EoF (FILE* stream)
{
  	register int c, status = ((c = fgetc(stream)) == EOF);
  	ungetc(c,stream);
  	return status;
}

void readcog(char *filename,unsigned long external_cog[])		// read in a .cog file into external memory array 
{
	int i;
	FILE *FP1;
	i = 0;
	if((FP1=fopen(filename,"rb"))==0)					// open the file
   	{
  		fprintf(stderr,"Can't open file %s\n",filename);
		exit(1);
   	}
  	fseek(FP1,0,0);
	for(i=0;i<24;i++)
	{
		getc(FP1);							// read in the first 24 bytes and discard
	}
	i = 0;
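  	while(!EoF(FP1) && (i<505))						// run until end of file or 511-6
	{
		unsigned long b0, b1, b2, b3;					// C does not define the order of several getc calls in one expression
		b0 = getc(FP1) & 0xFF;						// so read the four bytes one statement at a time, low byte first
		b1 = getc(FP1) & 0xFF;
		b2 = getc(FP1) & 0xFF;
		b3 = getc(FP1) & 0xFF;
		external_cog[i] = b0 | (b1<<8) | (b2<<16) | (b3<<24);		// assemble the little-endian long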
		i+=1;
	}
	if(FP1)
       {
     		fclose(FP1);							// close the file
     		FP1=NULL;
   	}
	printf("external array cog first long = 0x%lx \n",external_cog[0]);	// hex value
}

void serial_demo(unsigned long serial_parameters[])			// demonstrate the serial cog code
{
	int i;
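	unsigned long value = 0x80000000;					// 80000000 is the most negative 32 bit value (-2147483648), not -1
	char lineoftext[81];							// for string testing (the test string is 80 characters plus the terminating zero)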
	unsigned long received_byte;					// actually a long, not a byte
       clearscreen();							// white on blue vga
       printf("Clock speed %u \n",_clockfreq());                     // see page 28 of the propeller manual for other useful commands
       printf("Catalina running in cog number %i \n",_cogid());      // integer
       readcog("serial.cog",cogarray);					// read into general external memory cog array
       serial_start(31,30,0,38400,7,serial_parameters,cogarray);      // start serial cog pins 31,30, mode 0, cog 7, 38400 baud
       printf("Started serial driver\n");
	for(i=0; i<10; i++)
	{
		serial_tx(65+i,serial_parameters); 			// test sending a byte 10x (delay for starting a serial terminal program)
		sleep(500);
		printf("send byte %u \n",65+i);
	}
	serial_crlf(serial_parameters);
	strcpy(lineoftext,"This is a really long string test with a slow baud rate to check buffer overruns");			// store a string
	serial_str(lineoftext,serial_parameters);				// send it out
	serial_crlf(serial_parameters);					// new line
	serial_dec(value,serial_parameters);				// send out a big decimal number
	serial_crlf(serial_parameters);					// new line
	serial_str("Hex value is ",serial_parameters);
	serial_hex(value,serial_parameters);				// send out a hex value
	serial_crlf(serial_parameters);
	serial_rxflush(serial_parameters);					// flush the receive buffer
	printf("Type a character within the next 3 seconds \n");	// test the timeout
	received_byte = serial_rxtime(3000,serial_parameters);		// get a byte with a timeout
	printf("character was ascii %ld \n",(long)received_byte);	// %ld is signed, so a timeout prints as -1
	printf("type some characters \n");
	for (i=0;i<10;i++)							// read 10 characters; more than 16 are needed to test the buffer wrapping round
	{
		received_byte = serial_rx(serial_parameters);		// get a byte
		serial_tx(received_byte,serial_parameters);		// echo it back
		printf("sent back byte %u \n",received_byte);
	}
	printf("demo program finished \n");
}

void main ()
{
	unsigned long serial_parameters[17];				// reserve hub space in main: 9 parameter longs, then rx buffer (9 to 12) and tx buffer (13 to 16)
	serial_demo(serial_parameters);					// demo routines       
	while (1); 								// endless loop as prop reboots on exit from main() 
}
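
The head/tail arithmetic that serial_tx and serial_rxcheck rely on can be tried on a PC without a Prop attached. This is only a sketch: ring_t, ring_put and ring_get are hypothetical names, not part of Catalina or the serial cogject, and where serial_tx waits for room this version simply reports "full".

```c
#define RING_SIZE 16u   /* same size as the 16-byte rx and tx buffers above */

typedef struct {
    unsigned long head;              /* next slot to write */
    unsigned long tail;              /* next slot to read  */
    unsigned char data[RING_SIZE];
} ring_t;

/* Store one byte. Returns 1 if stored, 0 if the buffer is full.
   One slot is always left empty so that head == tail means "empty". */
int ring_put(ring_t *r, unsigned char byte)
{
    unsigned long next = (r->head + 1) & (RING_SIZE - 1);
    if (next == r->tail)
        return 0;
    r->data[r->head] = byte;
    r->head = next;
    return 1;
}

/* Fetch one byte. Returns 0..255, or -1 if the buffer is empty. */
int ring_get(ring_t *r)
{
    int byte;
    if (r->tail == r->head)
        return -1;
    byte = r->data[r->tail];
    r->tail = (r->tail + 1) & (RING_SIZE - 1);
    return byte;
}
```

Because one slot stays empty, a 16-byte buffer holds at most 15 bytes at a time, which is why the tx test compares the tail against (head + 1) & $F.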

A quick question. In Spin:
n <-= 1

In C, is this the equivalent?

n = (n << 1) | (n >> 31);
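
For what it's worth, that expression does rotate left by one as long as n is an unsigned 32-bit value. With a signed n, the right shift may copy the sign bit, which gives the wrong answer. A minimal sketch of a general rotate follows; rotl32 is a made-up name, not a Catalina function, and the masks are only there so it behaves the same on a PC where long is 64 bits.

```c
/* Rotate the low 32 bits of n left by 'bits' places (like Spin's n <-= bits). */
unsigned long rotl32(unsigned long n, unsigned int bits)
{
    n &= 0xFFFFFFFFUL;        /* keep only 32 bits                      */
    bits &= 31u;              /* rotating by 32 is the same as by 0     */
    if (bits == 0)
        return n;             /* avoids shifting right by a full 32     */
    return ((n << bits) | (n >> (32u - bits))) & 0xFFFFFFFFUL;
}
```

Usage would be n = rotl32(n, 1); in place of the expression above.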

Also, I now have Catalina booting up in text mode, then stopping the VGA drivers and reloading a 160x120 graphics driver. I can change the colors from within C, e.g. screen[0] = 0xffffffff sets 4 pixels to white.

However, the screen buffer is stored in longs, and I want to access it in bytes. In Spin, the command is
byte[myarray][number] := n

but how would you do this in C?

Get the unsigned long, clear one byte and replace it with the new byte?
Or get a pointer to the start of the array, add n bytes, then poke a value into hub RAM?
Or another way?
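
One common way is the second option: view the same memory through an unsigned char pointer, which C allows for any object. A minimal sketch, with poke_byte and peek_byte as made-up helper names and the buffer assumed to be an ordinary array of unsigned long:

```c
#include <stddef.h>

/* Write one byte into a buffer declared as longs,
   roughly what Spin's byte[buffer][index] := value does. */
void poke_byte(unsigned long *buffer, size_t index, unsigned char value)
{
    ((unsigned char *)buffer)[index] = value;
}

/* Read one byte back from the same buffer. */
unsigned char peek_byte(const unsigned long *buffer, size_t index)
{
    return ((const unsigned char *)buffer)[index];
}
```

So poke_byte(screen, number, n) would stand in for byte[screen][number] := n. The Propeller is little-endian, so byte 0 is the low byte of screen[0], the same ordering Spin uses.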
