Thanks for the update! I tried this new version and it was not noticeably faster at running my xbasic test program. The old version of Catalina using the -x3 memory layout took 12 seconds and this version took 11 seconds. That could easily be explained by errors in pushing the stopwatch button on my watch. In any case, it's good to have the cache bug fixed since I'm sure it could have caused problems.
Thanks,
David
Something is not right. On my C3 it is at least twice as fast. I hope I haven't missed something out of the upgrade. Can you post your binary (and your makefile options) and I'll try it when I get home.
Thanks,
Ross.
Here is the binary and the makefile. Hopefully I didn't mess something up!
No, it looks like it was me that messed up. I was working with some experimental changes to the caching algorithm, and I appear to have left them enabled.
In the file Catalina_SPI_Cache.spin you will find a line (currently commented out) that says:
'#define DISABLE_HASH
Remove the quote mark (i.e. define the symbol DISABLE_HASH) and try your program again. Note that you also have to recompile xmm.binary (in the utilities folder). You should see the program speed double.
Ross.
Thanks Ross! As you suggested, defining DISABLE_HASH almost doubled the speed of xbasic. It now takes about 7 seconds to compile and run my test program rather than 11-12. While that is certainly an improvement, it is still too slow to be useful. This is only a 35 line program. This isn't Catalina's fault entirely though. The xbasic bytecode compiler makes three passes over the source code so it is parsing the program three times. I may try compiling xbasic for the PIC24H on Andre' LaMothe's Chameleon PIC board just to see how it performs. It may not be much better. Of course, xbasic runs with blinding speed on my MacBook Pro! :-)
Hi David,
Additional speed improvements are possible, but it's never going to make the C3 an order of magnitude faster - not while programs have to be executed out of serial memory! At some point someone may make a parallel RAM add-on board for the C3, and that could change things.
I will keep the caching driver as an option since it also improves performance on other platforms - provided you can afford to sacrifice that much Hub RAM!
Ross.
David,
One more suggestion - why not arrange to load and save the byte-coded format? This was common practice in the "old" days of Basic interpreters (which were all generally pretty slow!). This makes the compilation speed less of an issue.
Ross.
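A minimal sketch of the load/save idea in C, assuming the compiler leaves its output in a flat byte buffer. The file layout (a 4-byte tag `XBC1`, a 4-byte little-endian length, then the bytecode) and the names `save_image` and `load_image` are hypothetical and are not taken from the xbasic sources.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical image layout: 4-byte tag, 4-byte little-endian length, bytecode. */
#define IMAGE_MAGIC "XBC1"

/* Write len bytes of bytecode to path. Returns 0 on success, -1 on failure. */
int save_image(const char *path, const unsigned char *code, unsigned long len)
{
    unsigned char hdr[8];
    int ok;
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    memcpy(hdr, IMAGE_MAGIC, 4);
    hdr[4] = (unsigned char)(len & 0xFF);
    hdr[5] = (unsigned char)((len >> 8) & 0xFF);
    hdr[6] = (unsigned char)((len >> 16) & 0xFF);
    hdr[7] = (unsigned char)((len >> 24) & 0xFF);
    ok = fwrite(hdr, 1, 8, fp) == 8 && fwrite(code, 1, len, fp) == len;
    if (fclose(fp) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Read an image back into code (capacity max bytes).
   Returns the bytecode length, or -1 if the file is missing, has the
   wrong tag, is truncated, or does not fit in the buffer. */
long load_image(const char *path, unsigned char *code, unsigned long max)
{
    unsigned char hdr[8];
    unsigned long len;
    long result = -1;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    if (fread(hdr, 1, 8, fp) == 8 && memcmp(hdr, IMAGE_MAGIC, 4) == 0) {
        len = (unsigned long)hdr[4] | ((unsigned long)hdr[5] << 8) |
              ((unsigned long)hdr[6] << 16) | ((unsigned long)hdr[7] << 24);
        if (len <= max && fread(code, 1, len, fp) == len)
            result = (long)len;
    }
    fclose(fp);
    return result;
}
```

With something like this, the three compiler passes only run when the source changes; a run of an unchanged program would go straight from `load_image` to the VM.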
That is certainly possible. In fact, this basic system started out as a compiler that ran on a PC and a VM that ran on the PIC, AVR, or Propeller. Andre' convinced me that we needed a language that would run on the Propeller without need for a PC so I stripped my compiler down and made it fit on the Propeller with external memory.
Does xbasic run on the DracBlade? Also, do you have a link to an xbasic download by any chance?
It's kind of a work in progress. For instance, I haven't completed the heap manager for dynamic strings yet. It should run on the Dracblade but I haven't tried it. I'll attach the sources to this message if you promise not to laugh too loud when you look at them! :-)
Thanks for that. So - dumb question here, but is this the same as the xbasic you find when you search Google? Or is this something you are writing yourself?
I think David's xbasic is different to the one you are probably finding on Google.
Also, I have added improvements to the way plugins are registered for release 3.0 ...
This program:
#include <catalina_plugin.h>
#include <catalina_hmi.h>

char *name(int type) {
    switch (type) {
        case 0 : return "Kernel";
        case 1 : return "HMI";
        case 2 : return "Library";
        case 3 : return "Float_A";
        case 4 : return "Float_B";
        case 5 : return "Real-Time Clock";
        case 6 : return "SD File System";
        case 7 : return "Serial I/O";
        case 8 : return "Dummy";
        case 9 : return "Graphics";
        case 10 : return "Keyboard";
        case 11 : return "Screen";
        case 12 : return "Mouse";
        case 13 : return "Proxy";
        default : return "Unknown/None";
    }
}

void main() {
    int i;
    int type;
    request_t *rqst;

    t_string(1, "Press any key to start\n\n");
    k_wait();
    t_printf("Registry Address = %x\n\n", _registry());
    for (i = 0; i < 8; i++) {
        type = REGISTERED_TYPE(i);
        rqst = REQUEST_BLOCK(i);
        t_printf("Cog %d (%x) Type = %s\n", i, (unsigned)rqst, name(type));
    }
    t_string(1, "\nPress any key to reboot");
    k_wait();
}
produces this output:
Press any key to start

Registry Address = 00007FD4

Cog 0 (00007F94) Type = Kernel
Cog 1 (00007F9C) Type = Keyboard
Cog 2 (00007FA4) Type = Screen
Cog 3 (00007FAC) Type = HMI
Cog 4 (00007FB4) Type = Unknown/None
Cog 5 (00007FBC) Type = Unknown/None
Cog 6 (00007FC4) Type = Unknown/None
Cog 7 (00007FCC) Type = Unknown/None

Press any key to reboot
This should greatly simplify identifying, stopping and re-starting cogs at runtime.
I thought David would answer this question, so I didn't.
Yes, xbasic runs on the DracBlade using the same caching driver as the C3. It is slightly faster than on the C3 - say 5s rather than 6s or 7s to run David's test program.
I don't think David would regard that as a really significant speed up.
However, just out of interest, I also tried it on the RamBlade and it runs in about 1.5s - this is partly due to the faster XMM RAM on the RamBlade (I think it is the fastest platform in that respect) and also because the RamBlade clock speed is 100MHz instead of 80MHz. I wonder if David would consider that fast enough for his purposes?
David and I have exchanged a few PMs today and maybe it is worth taking this to a discussion here as this is very interesting.
Cluso's RamBlade is definitely the fastest platform around. I think this gives us a benchmark to work from in terms of how fast things can be if you really optimise the code.
I took another look at the dracblade driver code and there are a few things that could be improved.
'' Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade
' DoCmd(command_, hub_address, ram_address, block_length)
' R - read bytes at address n up (n to n+block_length) where n =0 to 65535 (ie lower 64k of the sram chip)
' W - write bytes at address n up
' I - initialise
' N - Led on
' F - Led off
' H - set high latch to value in ramaddress A16 to A23 (will include the led)

VAR
' communication params(5) between cog driver code - only "command" and "errx" are modified by the driver
  long command, hubaddrs, ramaddrs, blocklen, errx, cog ' rendezvous between spin and assembly (can be used cog to cog)
' command = R, W, N, F H =0 when operation completed by cog
' hubaddrs = hub address for data buffer
' ramaddrs = ram address for data ($0000 to $FFFF)
' blocklen = ram buffer length for data transfer
' errx = returns =0 (false=good), else <>0 (true & error code)
' cog = cog no of driver (set by spin start routine)

PUB start : err_
' Initialise the Drac Ram driver. No actual changes to ram as the read/write routines handle this
  command := "I"
  cog := 1 + cognew(@tbp2_start, @command)
  if cog == 0
    err_ := $FF ' error = no cog
  else
    repeat while command ' driver cog sets =0 when done
    err_ := errx ' driver cog sets =0 if no error, else xx = error code

PUB stop
  if cog
    cogstop(cog~ - 1)

PUB DoCmd(command_, hub_address, ram_address, block_length) : err_
' Do the command: R, W, N, F, H
  hubaddrs := hub_address ' hub address start
  ramaddrs := ram_address ' ram address start
  blocklen := block_length ' block length
  command := command_ ' must be last !!
' Wait for command to complete and get status
  repeat while command ' driver cog sets =0 when done
  err_ := errx ' driver cog sets =0 if no error, else xx = error code

PUB rendezvous
  return @command

DAT
'' +--------------------------------------------------------------------------+
'' | Dracblade Ram Driver (with grateful acknowlegements to Cluso) |
'' +--------------------------------------------------------------------------+
                            org     0
tbp2_start                  ' setup the pointers to the hub command interface (saves execution time later
                            ' +-- These instructions are overwritten as variables after start
comptr                      mov     comptr, par  ' -| hub pointer to command
hubptr                      mov     hubptr, par  ' | hub pointer to hub address
ramptr                      add     hubptr, #4  ' | hub pointer to ram address
lenptr                      mov     ramptr, par  ' | hub pointer to length
errptr                      add     ramptr, #8  ' | hub pointer to error status
cmd                         mov     lenptr, par  ' | command I/R/W/G/P/Q
hubaddr                     add     lenptr, #12  ' | hub address
ramaddr                     mov     errptr, par  ' | ram address
len                         add     errptr, #16  ' | length
err                         nop  ' -+ error status returned (=0=false=good)

' Initialise hardware (unlike the triblade, just tristates everything and read/write set the pins)
init                        mov     err, #0  ' reset err=false=good
                            mov     dira, zero  ' tristate the pins
done                        wrlong  err, errptr  ' status =0=false=good, else error x
                            wrlong  zero, comptr  ' command =0 (done)

' wait for a command (pause short time to reduce power)
pause                       mov     ctr, delay wz  ' if =0 no pause
                      if_nz add     ctr, cnt
                      if_nz waitcnt ctr, #0  ' wait for a short time (reduces power)
                            rdlong  cmd, comptr wz  ' command ?
                      if_z  jmp     #pause  ' not yet

' decode command
                            cmp     cmd, #"R" wz  ' R = read block
                      if_z  jmp     #rdblock
                            cmp     cmd, #"W" wz  ' W = write block
                      if_z  jmp     #wrblock
                            cmp     cmd, #"N" wz  ' N= led on
                      if_z  jmp     #led_turn_on
                            cmp     cmd, #"F" wz  ' F = led off
                      if_z  jmp     #led_turn_off
                            cmp     cmd, #"H" wz  ' H sets the high latch
                      if_z  jmp     #sethighlatch
                            mov     err, cmd  ' error = cmd (unknown command)
                            jmp     #done

tristate                    mov     dira, zero  ' all inputs to zero
                            jmp     #done

' turn led on
led_turn_on                 or      HighLatch, ledpin  ' set the led pin high
                            jmp     #OutputHighLatch  ' send this out
led_turn_off                andn    HighLatch, ledpin  ' set the led pin low
                            jmp     #OutputHighLatch  ' send this out

' set high address bytes with command H, pass value in third variable of the DoCmd
' 4 bytes - masks off all but bits 16 to 23
sethighlatch                call    #ram_open  ' gets address value in 'address'
                            shr     address, #16  ' shift right by 16 places
                            and     address, #$FF  ' ensure rest of bits zero
                            mov     HighLatch, address  ' put value into HighLatch
                            jmp     #OutputHighLatch  ' and output it

'---------------------------------------------------------------------------------------------------------
'Memory Access Functions
rdblock                     call    #ram_open  ' get variables from hub variables
rdloop                      call    #read_memory_byte  ' read byte from address into data_8
                            wrbyte  data_8, hubaddr  ' write data_8 to hubaddr ie copy byte to hub
                            add     hubaddr, #1  ' add 1 to hub address
                            add     address, #1  ' add 1 to ram address
                            djnz    len, #rdloop  ' loop until done
                            jmp     #init  ' reinitialise

wrblock                     call    #ram_open
wrloop                      rdbyte  data_8, hubaddr  ' copy byte from hub
                            call    #write_memory_byte  ' write byte from data_8 to address
                            add     hubaddr, #1  ' add 1 to hub address
                            add     address, #1  ' add 1 to ram address
                            djnz    len, #wrloop  ' loop until done
                            jmp     #init  ' reinitialise

ram_open                    rdlong  hubaddr, hubptr  ' get hub address
                            rdlong  ramaddr, ramptr  ' get ram address
                            rdlong  len, lenptr  ' get length
                            mov     err, #5  ' err=5
                            mov     address, ramaddr  ' cluso's variable 'ramaddr' to dracblade variable 'address'
ram_open_ret                ret

read_memory_byte            call    #RamAddress  ' sets up the latches with the correct ram address
                            mov     dira, LatchDirection2  ' for reads so P0-P7 tristate till do read
                            mov     outa, GateHigh  ' actually ReadEnable but they are the same
                            andn    outa, GateHigh  ' set gate low
                            nop  ' short delay to stabilise
                            nop
                            mov     data_8, ina  ' read SRAM
                            and     data_8, #$FF  ' extract 8 bits
                            or      outa, GateHigh  ' set the gate high again
read_memory_byte_ret        ret

write_memory_byte           call    #RamAddress  ' sets up the latches with the correct ram address
                            mov     outx, data_8  ' get the byte to output
                            and     outx, #$FF  ' ensure upper bytes=0
                            or      outx, WriteEnable  ' or with correct 138 address
                            mov     outa, outx  ' send it out
                            andn    outa, GateHigh  ' set gate low
                            nop  ' no nop doesn't work, one does, so put in two to be sure
                            nop  ' another NOP
                            or      outa, GateHigh  ' set it high again
write_memory_byte_ret       ret

RamAddress                  ' sets up the ram latches. Assumes high latch A16-A18 low so only accesses 64k of ram
                            mov     dira, LatchDirection  ' set up the pins for programming latch chips
                            mov     outx, address  ' get the address into a temp variable
                            and     outx, #$FF  ' mask the low byte
                            or      outx, LowAddress  ' or with 138 low address
                            mov     outa, outx  ' send it out
                            andn    outa, GateHigh  ' set gate low
                            ' ?? a NOP
                            or      outa, GateHigh  ' set it high again
                            ' now repeat for the middle byte
                            mov     outx, address  ' get the address into a temp variable
                            shr     outx, #8  ' shift right by 8 places
                            and     outx, #$FF  ' mask the low byte
                            or      outx, MiddleAddress  ' or with 138 middle address
                            mov     outa, outx  ' send it out
                            andn    outa, GateHigh  ' set gate low
                            or      outa, GateHigh  ' set it high again
RamAddress_ret              ret

OutputHighLatch             ' sends out HighLatch to the 374 that does A16-19, led and the 4 spare outputs
                            mov     dira, latchdirection  ' setup active pins 138 and bus
                            mov     outa, HighLatch  ' send out HighLatch
                            or      outa, HighAddress  ' or with the high address
                            andn    outa, GateHigh  ' set gate low
                            or      outa, GateHigh  ' set the gate high again
OutputHighLatch_ret         jmp     #tristate  ' set pins tristate

delay                       long    80  ' waitcnt delay to reduce power (#80 = 1uS approx)
ctr                         long    0  ' used to pause execution (lower power use) & byte counter
GateHigh                    long    %00000000_00000000_00000001_00000000  ' HC138 gate high, all others must be low
Outx                        long    0  ' for temp use, same as n in the spin code
LatchDirection              long    %00000000_00000000_00001111_11111111  ' 138 active, gate active and 8 data lines active
LatchDirection2             long    %00000000_00000000_00001111_00000000  ' for reads so data lines are tristate till the read
LowAddress                  long    %00000000_00000000_00000101_00000000  ' low address latch = xxxx010x and gate high xxxxxxx1
MiddleAddress               long    %00000000_00000000_00000111_00000000  ' middle address latch = xxxx011x and gate high xxxxxxx1
HighAddress                 long    %00000000_00000000_00001001_00000000  ' high address latch = xxxx100x and gate high xxxxxxx1
'ReadEnable long %00000000_00000000_00000001_00000000 ' /RD = xxxx000x and gate high xxxxxxx1
' commented out as the same as GateHigh
WriteEnable                 long    %00000000_00000000_00000011_00000000  ' /WE = xxxx001x and gate high xxxxxxx1
Zero                        long    %00000000_00000000_00000000_00000000  ' for tristating all pins
data_8                      long    %00000000_00000000_00000000_00000000  ' so code compatability with zicog driver
address                     long    %00000000_00000000_00000000_00000000  ' address for ram chip
ledpin                      long    %00000000_00000000_00000000_00001000  ' to turn on led
HighLatch                   long    %00000000_00000000_00000000_00000000  ' static value for the 374 latch that does the led, hA16-A19 and the other 4 outputs

1) there is a deliberate delay
' wait for a command (pause short time to reduce power)
pause                       mov     ctr, delay wz  ' if =0 no pause
                      if_nz add     ctr, cnt
                      if_nz waitcnt ctr, #0  ' wait for a short time (reduces power)
                            rdlong  cmd, comptr wz  ' command ?
- maybe save some lines there
2) Reading in blocks of data. There are 19 address lines on a 512k chip and at the moment these are in two groups - the High group A16 to A18 and the Low and Middle group which are grouped together. This seemed natural for the Z80 emulations with 16 bit addresses.
But what if we separate out the Low and Middle latches?
I count 46 instructions to read one byte from external memory. Surely that can be decreased?!!
First thing might be to leave the middle latch unchanged and just change the lower latch. Maybe do it in groups of 4 bytes, or maybe in groups of 16 or 256?
I think that can save 8 instructions per byte.
Also I think by doing things in blocks, you don't have to keep checking for new instructions each byte. Say the requesting program wanted a Long, well then you can skip a whole lot of rechecking code for new requests.
I think that can halve the number of instructions per byte if you do Longs.
And then one might think about optimising further. For C, it depends on the probability that an instruction will cause a branch outside a block of n bytes. At the extremes, say you requested byte x and it read in the next 64k of bytes. This will take a lot of time but with a small probability that a jump will go outside this block. Read in 1 long, and that is inefficient too. I'm not sure of the maths, but say the probability of a jump was 10%, then maybe as a guess it might be best to read in 16 bytes as a block?
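The block-size guess can be put into a rough model. The sketch below is only an illustration in C: it assumes every byte consumed has the same independent probability `p` of jumping out of the block, and it charges a fixed `setup` cost per request plus `per_byte` for each byte fetched. The function names and the split of the 46 instructions into 30 fixed plus 16 per byte are made up for the example, not measured on the driver.

```c
/* Expected number of bytes actually consumed from a block of `block` bytes,
   if after each consumed byte there is probability p of jumping elsewhere. */
double expected_used(double p, int block)
{
    double used = 0.0;
    double still_here = 1.0;    /* chance that no jump has happened yet */
    int k;
    for (k = 0; k < block; k++) {
        used += still_here;     /* byte k is consumed only if we are still in the block */
        still_here *= 1.0 - p;
    }
    return used;
}

/* Instructions spent per byte that is actually consumed. */
double cost_per_used_byte(double p, int block, double setup, double per_byte)
{
    return (setup + per_byte * block) / expected_used(p, block);
}

/* Try power-of-two block sizes up to max_block and return the cheapest one. */
int best_block(double p, double setup, double per_byte, int max_block)
{
    int b;
    int best = 1;
    for (b = 2; b <= max_block; b *= 2) {
        if (cost_per_used_byte(p, b, setup, per_byte) <
            cost_per_used_byte(p, best, setup, per_byte))
            best = b;
    }
    return best;
}
```

With the assumed costs and p = 10% this model favours 4-byte blocks, with 8 bytes a close second. It ignores the fact that a cache keeps a block around for loops that come back to it, which would push the answer towards larger blocks, so treat it as a way to explore the trade-off rather than a result.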
The driver code above already has an instruction for reading in blocks, it is just that I think mostly we read in blocks of 1, ie a byte. Ross, a) is that how catalina works and b) where is the source code for the dracblade driver file and what is it called?
So you might pass an address n=0 to 512k.
1) is this in the same high/medium latch range as the last request?
2) If yes, read bytes but only change the low latch.
3) If no then update the medium and high latches.
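The three steps above can be sketched as a small C model that just counts latch writes. This is not the driver code: `latch_state`, `set_address` and `read_block` are hypothetical names, and the middle and high latches are lumped together as "everything above bit 7".

```c
/* Hypothetical model of the address latches: the low latch holds A0-A7,
   the middle and high latches together hold A8 and above. */
typedef struct {
    unsigned long upper;          /* address bits 8 and up currently latched */
    int valid;                    /* 0 until the upper latches have been written once */
    unsigned long low_writes;     /* how many times the low latch was written */
    unsigned long upper_writes;   /* how many times the upper latches were written */
} latch_state;

/* Present one address to the RAM, rewriting the upper latches only when
   the address has moved to a different 256-byte page. */
void set_address(latch_state *s, unsigned long addr)
{
    unsigned long upper = addr >> 8;
    if (!s->valid || upper != s->upper) {
        s->upper = upper;
        s->valid = 1;
        s->upper_writes++;
    }
    s->low_writes++;              /* the low latch changes on every byte */
}

/* Read len consecutive bytes starting at addr (only the latch traffic is modelled). */
void read_block(latch_state *s, unsigned long addr, unsigned long len)
{
    while (len--)
        set_address(s, addr++);
}
```

In this model a 16-byte read that stays inside one 256-byte page costs one upper-latch update and 16 low-latch writes, instead of updating the middle latch on all 16 bytes.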
I wonder also about a lookahead cache. The requesting spin code requests a byte at address n. The cog goes and starts reading from this address. I'd need to check speeds, but there is a fairly good chance the cog will be faster than the requesting spin, so the cog will always be ahead of the requesting program. From the requesting program's point of view, it requests byte n and for the next 256 bytes the values are always correct in a buffer.
Then there is another variable - how often would the cog code check the passed parameter to see if the calling program wants a different block. Maybe if the probability of a branch in C is 10%, you check only every 10 bytes? If so, that saves even more code.
Hi Dr_A ...
Yes absolutely - I've not really done any optimization on the original caching driver code yet. In fact it only currently supports the DRACBLADE at all because David's and Jazzed's original driver code already did!
What I plan to do next is rewrite the interface from the caching driver to use my standard XMM code. That code is already written for all XMM platforms, and is much more optimized (although probably still a long way from being as good as it could be!).
That's about the last thing I expect to do before I am ready to release Catalina 3.0.
David, Jazzed ...
I found a bug in the Catalina SD Card driver initialization code that seems to show up on the C3. I've now fixed it, but if you are having occasional strange problems with programs sometimes not being able to access the SD card (but which work ok when you reload them) then this may be the reason. It may also have affected other platforms - for example I think it is the reason I was having occasional problems with the SD card on the RamBlade (and for which I was - quite unfairly - blaming Cluso!).
Oh darn. Someone *extremely* clever has already split the middle and lower latch! This XMM driver looks extremely well optimised. I think only caching would improve that, and any improvements due to caching will apply equally to the C3.
XMM_IncAddr
                            add     XMM_Addr, #1  ' inc sram address
                            mov     outx, XMM_Addr  ' does result of incrementing ...
                            and     outx, #$FF  ' ... require updating latch 8 - 15 or 16 - 19?
                            tjnz    outx, #XMM_Set0_7  ' if not, just set latch for addr bits 0 - 7
                            call    #XMM_SetAddr  ' otherwise we must set all latches
                            jmp     #XMM_IncAddr_ret  ' done
Sorry I didn't post my reply in here. I was trying not to hijack your thread to discuss xbasic. I guess I should try the RamBlade. I've had one for a long time but have never done anything with it. I guess I stopped when I discovered that you couldn't use the standard pin 31/30 serial I/O. How do you have your RamBlade configured?
Thanks,
David
Just with the SRAM and SD Card. I just use the normal PropPlug for comms. As shown on the diagram below, you plug it onto the middle 4 pins for programming the EEPROM, and the bottom 4 pins for terminal I/O (and use Catalyst to load programs off the SD Card).
Thanks Ross! I'll have to try that setup. I guess I was put off a bit by the fact that I would have to reconnect the serial interface to reprogram the card. It makes development a bit of a pain. I wonder why he used the high numbered pins for his SRAM interface?
If I know Cluso, it was done for a good reason - most likely because it allowed the SRAM to be used with the least possible number of instructions.
Ross.
P.S. In a lot of ways, the RamBlade is my favorite board. If only it could be powered by the USB port, it would be the ideal "portable" Prop platform!
The cogjects project now has 8 cogjects. These can be used by Spin, and they can also be used by Catalina. I have plans to write drivers for other languages as well.
What this means in C is no more 'inline' PASM code in the C program. Do the debugging in Spin and then, when it works, move it over to C. The following code is for the Serial driver, and I have left the Spin code in as this will be useful in translating Spin in the future.
From a practical perspective, Spin can only do so much even with cogjects. The SD driver takes about 1/4 of hub, a decent video buffer takes just under 20k, and there is not much space left for code.
C in XMM on the other hand puts the SD driver into external memory and most of the hub is free for a video buffer.
/* PASM cogject demonstration, see also cogject example in spin */
#include <stdio.h>
#include <stdlib.h>                     // exit()
#include <string.h>                     // strlen(), strcpy()

unsigned long cogarray[512];            // external memory common cog array (512 longs, matching the copy loop below)

// start of C functions

void clearscreen()                      // white text on dark blue background
{
    int i;
    for (i=0;i<40;i++)
    {
        t_setpos(0,0,i);                // move cursor to next line
        t_color(0,0x08FC);              // RRGGBBxx eg dark blue background 00001000 white text 11111100
    }
}

void sleep(int milliseconds)            // sleep function
{
    _waitcnt(_cnt()+(milliseconds*(_clockfreq()/1000))-4296);
}

char peek(int address)                  // function implementation of peek
{
    return *((char *)address);
}

void poke(int address, char value)      // function implementation of poke
{
    *((char *)address) = value;
}

void external_memory_cog_load(int cognumber, unsigned long cogdata[], unsigned long parameters_array[]) // load a cog from external memory
{
    unsigned long hubcog[512];          // create a local array, this is in hub ram, not external ram (512 so hubcog[511] is valid)
    int i;
    for(i=0;i<512;i++)
    {
        hubcog[i]=cogdata[i];           // move from external memory to a local array in hub
    }
    _coginit((int)parameters_array>>2, (int)hubcog>>2, cognumber); // load the cog
}

unsigned long serial_start(unsigned long rxpin,unsigned long txpin,unsigned long mode, unsigned long baudrate, int cognumber, unsigned long par[], unsigned long cogdata[])
{
/*
PUB start(rxpin, txpin, mode, baudrate) : okay
'' Start serial driver - starts a cog
'' returns false if no cog available
''
'' mode bit 0 = invert rx
'' mode bit 1 = invert tx
'' mode bit 2 = open-drain/source tx
'' mode bit 3 = ignore tx echo on rx
  stop
  longfill(@rx_head, 0, 4)
  longmove(@rx_pin, @rxpin, 3)
  bit_ticks := clkfreq / baudrate
  buffer_ptr := @rx_buffer
  okay := cog := cognew(@entry, @rx_head) + 1
*/
    unsigned long okay = 0;             // never updated: external_memory_cog_load() does not pass back the coginit result
    unsigned long bit_ticks;
    unsigned long buffer_ptr;
    par[0] = 0;                         // rx_head longfill(@rx_head, 0, 4)
    par[1] = 0;                         // rx_tail
    par[2] = 0;                         // tx_head
    par[3] = 0;                         // tx_tail
    par[4] = rxpin;                     // longmove(@rx_pin, @rxpin, 3)
    par[5] = txpin;                     // note - if rewrite the pasm code could save a couple of hub longs here
    par[6] = mode;                      // as rxpin and txpin are not used anywhere else
    bit_ticks = _clockfreq() / baudrate; // bit_ticks := clkfreq / baudrate
    par[7] = bit_ticks;
    buffer_ptr = (unsigned long)&par[9]; // buffer_ptr := @rx_buffer points to start of circular buffer
    par[8] = buffer_ptr;                // pointer to the start of the circular buffers
    // rx buffer is 9 to 12 and tx buffer is 13 to 16 (16 bytes =4 longs)
    external_memory_cog_load(cognumber,cogdata,par); // load from external ram
    // okay returns the cog number or -1 if a fail page 119 manual. Ignored here
    // printf("par array is at %u \n",(unsigned long)&par[0]);
    // printf("par array entry 1 is at %u \n",(unsigned long)&par[1]);
    // printf("par array entry 7 is at %u \n",(unsigned long)&par[7]);
    // printf("rx_head is at %u \n",(unsigned long)&par[9]);
    // printf("buffer_ptr is %u \n",par[8]);
    return okay;
}

void serial_tx(char tx,unsigned long par[])
{
/*
PUB tx(txbyte)
'' Send byte (may wait for room in buffer)
  repeat until (tx_tail <> (tx_head + 1) & $F)
  tx_buffer[tx_head] := txbyte
  tx_head := (tx_head + 1) & $F
  if rxtx_mode & %1000
    rx
*/
    unsigned long tx_head;
    int address;
    while ( par[3] == ((par[2] + 1 ) & 0xF)) {} // wait if the head has looped right round and is now one less than the tail
    tx_head = par[2];                   // get the head value
    address = par[8] + 16 + tx_head;    // location of rx buffer plus 16 to get tx buffer plus the head value
    poke(address,tx);                   // poke the tx byte value to hub ram
    tx_head = tx_head + 1;              // add one
    tx_head = tx_head & 0xF;            // logical and with 15
    par[2] = tx_head;                   // store it back again
    // need to add the echo mode?
}

unsigned long serial_rxcheck(unsigned long par[])
{
/*
PUB rxcheck : rxbyte
'' Check if byte received (never waits)
'' returns -1 if no byte received, $00..$FF if byte
  rxbyte--
  if rx_tail <> rx_head
    rxbyte := rx_buffer[rx_tail]
    rx_tail := (rx_tail + 1) & $F
*/
    unsigned long rxbyte;               // actually is a long, so can return -1 FFFFFFFF if nothing and 0-FF if a byte
    int address;                        // hub address
    rxbyte = 0;                         // set explicitly to zero
    rxbyte = rxbyte - 1;                // return ffffffff if nothing
    if (par[1] != par[0])
    {
        address = par[8] + par[1];      // par[8] is the rx buffer, par[1] is rx_tail
        rxbyte = (unsigned char)peek(address); // get the return byte from the buffer (cast so $80..$FF are not sign-extended where char is signed)
        par[1] = (par[1] +1) & 0xF;     // add one to tail
    }
    return rxbyte;
}

unsigned long serial_rx(unsigned long par[])
{
/*
PUB rx : rxbyte
'' Receive byte (may wait for byte)
'' returns $00..$FF
  repeat while (rxbyte := rxcheck) < 0
*/
    unsigned long rxbyte;               // actually is a long, not a byte
    while ((rxbyte = serial_rxcheck(par)) == -1) {} // 0xffffffff and -1 works, but " < 0" gives a compiler error
    return rxbyte;                      // return the value
}

void serial_rxflush(unsigned long par[]) // flush receive buffer
{
    while (serial_rxcheck(par) != -1) {} // keep checking until buffer clear
}

unsigned long serial_rxtime(unsigned long ms,unsigned long par[]) // wait ms milliseconds for byte, -1 if nothing
{
    unsigned long rxbyte = -1;
    unsigned long counter = 0;          // start a counter, 10ms ticks
    ms = ms / 10;                       // internal delay for 1ms ticks is too high
    while (((rxbyte = serial_rxcheck(par)) == -1) && (counter < ms)) // wait until a byte or counter times out
    {
        _waitcnt(_cnt()+(10*(_clockfreq()/1000))-4296); // wait 10 milliseconds
        counter +=1;                    // add one to counter
    }
    return rxbyte;
}

void serial_str(char lineoftext[],unsigned long par[]) // send out the string
{
/*
'' Send string
  repeat strsize(stringptr)
    tx(byte[stringptr++])
*/
    int i;
    for(i=0; i<strlen(lineoftext);i++)
    {
        serial_tx(lineoftext[i],par);   // send out the bytes one at a time
    }
}

void serial_dec(signed long value,unsigned long par[]) // send out a signed decimal value
{
/*
'' Print a decimal number
  if value < 0
    -value
    tx("-")
  i := 1_000_000_000
  repeat 10
    if value => i
      tx(value / i + "0")
      value //= i
      result~~
    elseif result or i == 1
      tx("0")
    i /= 10
*/
    char lineoftext[12] = "";           // enough room for a 32 bit long 2^32 and possibly the minus sign
    sprintf(lineoftext, "%d", value);   // convert to a string
    // printf ("lineoftext is now: %s\n", lineoftext);
    serial_str(lineoftext,par);         // send out the string
}

void serial_hex(unsigned long value, unsigned long par[]) // send out a hex value
/*
'' Print a hexadecimal number
  value <<= (8 - digits) << 2
  repeat digits
    tx(lookupz((value <-= 4) & $F : "0".."9", "A".."F"))
*/
{
    char lineoftext[9] = "";            // enough room for FFFFFFFF plus the terminating zero
    sprintf(lineoftext,"%x",value);     // convert to hex value
    serial_str(lineoftext,par);         // send it out
}

void serial_crlf(unsigned long par[])   // send a crlf
{
    serial_tx(13,par);                  // cr
    serial_tx(10,par);                  // lf
}

int EoF(FILE* stream)
{
    register int c, status = ((c = fgetc(stream)) == EOF);
    ungetc(c,stream);
    return status;
}

void readcog(char *filename,unsigned long external_cog[]) // read in a .cog file into external memory array
{
    int i;
    FILE *FP1;
    i = 0;
    if((FP1=fopen(filename,"rb"))==0)   // open the file
    {
        fprintf(stderr,"Can't open file %s\n",filename);
        exit(1);
    }
    fseek(FP1,0,0);
    for(i=0;i<24;i++)
    {
        getc(FP1);                      // read in the first 24 bytes and discard
    }
    i = 0;
    while(!EoF(FP1) && (i<505))         // run until end of file or 511-6
    {
        // read the four bytes one statement at a time: C does not define the order
        // in which several getc() calls inside a single expression are evaluated
        unsigned long b0 = getc(FP1) & 0xFF;
        unsigned long b1 = getc(FP1) & 0xFF;
        unsigned long b2 = getc(FP1) & 0xFF;
        unsigned long b3 = getc(FP1) & 0xFF;
        external_cog[i] = b0 | (b1<<8) | (b2<<16) | (b3<<24); // get the long, low byte first
        i+=1;
    }
    if(FP1)
    {
        fclose(FP1);                    // close the file
        FP1=NULL;
    }
    printf("external array cog first long = 0x%x \n",external_cog[0]); // hex value
}

void serial_demo(unsigned long serial_parameters[]) // demonstrate the serial cog code
{
    int i;
    unsigned long value = 0x80000000;   // 80000000 is the most negative 32 bit value (-2147483648)
    char lineoftext[81];                // for string testing (the test string below is 80 characters plus the terminating zero)
    unsigned long received_byte;        // actually a long, not a byte
    clearscreen();                      // white on blue vga
    printf("Clock speed %u \n",_clockfreq()); // see page 28 of the propeller manual for other useful commands
    printf("Catalina running in cog number %i \n",_cogid()); // integer
    readcog("serial.cog",cogarray);     // read into general external memory cog array
    serial_start(31,30,0,38400,7,serial_parameters,cogarray); // start serial cog pins 31,30, mode 0, cog 7, 38400 baud
    printf("Started serial driver\n");
    for(i=0; i<10; i++)
    {
        serial_tx(65+i,serial_parameters); // test sending a byte 10x (delay for starting a serial terminal program)
        sleep(500);
        printf("send byte %u \n",65+i);
    }
    serial_crlf(serial_parameters);
    strcpy(lineoftext,"This is a really long string test with a slow baud rate to check buffer overruns"); // store a string
    serial_str(lineoftext,serial_parameters); // send it out
    serial_crlf(serial_parameters);     // new line
    serial_dec(value,serial_parameters); // send out a big decimal number
    serial_crlf(serial_parameters);     // new line
    serial_str("Hex value is ",serial_parameters);
    serial_hex(value,serial_parameters); // send out a hex value
    serial_crlf(serial_parameters);
    serial_rxflush(serial_parameters);  // flush the receive buffer
    printf("Type a character within the next 3 seconds \n"); // test the timeout
    received_byte = serial_rxtime(3000,serial_parameters); // get a byte with a timeout
    printf("character was ascii %d \n",received_byte); // %d is signed
    printf("type some characters \n");
    for (i=0;i<10;i++)                  // test 19 times, so tests buffer restarting
    {
        received_byte = serial_rx(serial_parameters); // get a byte
        serial_tx(received_byte,serial_parameters); // echo it back
        printf("sent back byte %u \n",received_byte);
    }
    printf("demo program finished \n");
}

void main()
{
    unsigned long serial_parameters[17]; // reserve hub space in main for buffer, head tail pointers (longs 0 to 16 are used)
    serial_demo(serial_parameters);     // demo routines
    while (1);                          // endless loop as prop reboots on exit from main()
}
A quick question
In spin
n <-= 1
in C, is this
n = (n << 1) | (n >> 31);
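As far as I can tell that expression does match a one-bit rotate left, provided n is an unsigned 32-bit value (with a signed n the right shift may copy the sign bit down). A minimal sketch of a general helper, assuming the value is held in an unsigned long; rotl32 is a made-up name for illustration, not a Catalina library function:

```c
/* Rotate the low 32 bits of n left by `bits` places, like Spin's <- operator.
   rotl32 is a hypothetical helper name. The masks keep the result to 32 bits
   even on a host where unsigned long is wider than 32 bits. */
unsigned long rotl32(unsigned long n, unsigned bits)
{
    n &= 0xFFFFFFFFUL;
    bits &= 31;                 /* keep the shift count in the range 0..31 */
    if (bits == 0)
        return n;               /* avoids shifting by 32, which C leaves undefined */
    return ((n << bits) | (n >> (32 - bits))) & 0xFFFFFFFFUL;
}
```

With this, `n = rotl32(n, 1);` would stand in for `n <-= 1`.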
Also - I now have catalina booting up in text mode, then stopping the vga drivers and reloading a graphics driver 160x120. I can change the colors from within C eg screen[0] = 0xffffffff sets 4 pixels to white.
However, the screen buffer is stored in longs, and I want to access it in bytes. In spin, the command is
byte[myarray][number] := n
but how would you do this in C?
get the unsigned long, and clear one byte and replace with the new byte?
or get a pointer to the start of the array, add n bytes, then poke a value into hub ram?
or another way?
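Both of the approaches in the question can probably be made to work. A minimal sketch, assuming the screen buffer is an ordinary array of unsigned long in hub RAM; set_byte, get_byte and replace_byte are hypothetical names, not Catalina functions. The first pair treats the array as bytes through a char pointer, which is the closest thing to Spin's byte[myarray][number]. The last one does the "clear one byte and replace it" version on a single long, with byte 0 as the least significant byte (the layout on the little-endian Propeller):

```c
/* Write one byte into a buffer declared as an array of longs.
   A char pointer is allowed to address the bytes of any object in C. */
void set_byte(unsigned long *buffer, unsigned index, unsigned char value)
{
    unsigned char *bytes = (unsigned char *)buffer;
    bytes[index] = value;
}

/* Read one byte back out of the same buffer. */
unsigned char get_byte(const unsigned long *buffer, unsigned index)
{
    const unsigned char *bytes = (const unsigned char *)buffer;
    return bytes[index];
}

/* Mask-and-shift alternative: replace byte `lane` (0..3, 0 = least
   significant) of a 32-bit word and return the new word. */
unsigned long replace_byte(unsigned long word, unsigned lane, unsigned char value)
{
    unsigned shift = (lane & 3) * 8;
    return (word & ~(0xFFUL << shift)) | ((unsigned long)value << shift);
}
```

For example, `set_byte(screen, number, n);` would play the role of `byte[screen][number] := n`.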
Comments
Hi David,
Additional speed improvements are possible, but it's never going to make the C3 an order of magnitude faster - not while programs have to be executed out of serial memory! At some point someone may make a parallel RAM add-on board for the C3, and that could change things.
I will keep the caching driver as an option since it also improves performance on other platforms - provided you can afford to sacrifice that much Hub RAM!
Ross.
David,
One more suggestion - why not arrange to load and save the byte-coded format? This was common practice in the "old" days of Basic interpreters (which were all generally pretty slow!). This makes the compilation speed less of an issue.
Ross.
That is certainly possible. In fact, this basic system started out as a compiler that ran on a PC and a VM that ran on the PIC, AVR, or Propeller. Andre' convinced me that we needed a language that would run on the Propeller without need for a PC so I stripped my compiler down and made it fit on the Propeller with external memory.
It's kind of a work in progress. For instance, I haven't completed the heap manager for dynamic strings yet. It should run on the Dracblade but I haven't tried it. I'll attach the sources to this message if you promise not to laugh too loud when you look at them! :-)
I think David's xbasic is different to the one you are probably finding on Google.
Also, I have added improvements to the way plugins are registered for release 3.0 ...
This program:
#include <catalina_plugin.h>
#include <catalina_hmi.h>

char *name(int type) {
   switch (type) {
      case 0 : return "Kernel";
      case 1 : return "HMI";
      case 2 : return "Library";
      case 3 : return "Float_A";
      case 4 : return "Float_B";
      case 5 : return "Real-Time Clock";
      case 6 : return "SD File System";
      case 7 : return "Serial I/O";
      case 8 : return "Dummy";
      case 9 : return "Graphics";
      case 10 : return "Keyboard";
      case 11 : return "Screen";
      case 12 : return "Mouse";
      case 13 : return "Proxy";
      default : return "Unknown/None";
   }
}

void main() {
   int i;
   int type;
   request_t *rqst;

   t_string(1, "Press any key to start\n\n");
   k_wait();
   t_printf("Registry Address = %x\n\n", _registry());
   for (i = 0; i < 8; i++) {
      type = REGISTERED_TYPE(i);
      rqst = REQUEST_BLOCK(i);
      t_printf("Cog %d (%x) Type = %s\n", i, (unsigned)rqst, name(type));
   }
   t_string(1, "\nPress any key to reboot");
   k_wait();
}
produces this output:
Press any key to start

Registry Address = 00007FD4

Cog 0 (00007F94) Type = Kernel
Cog 1 (00007F9C) Type = Keyboard
Cog 2 (00007FA4) Type = Screen
Cog 3 (00007FAC) Type = HMI
Cog 4 (00007FB4) Type = Unknown/None
Cog 5 (00007FBC) Type = Unknown/None
Cog 6 (00007FC4) Type = Unknown/None
Cog 7 (00007FCC) Type = Unknown/None

Press any key to reboot
This should greatly simplify identifying, stopping and restarting cogs at runtime.
Ross.
Hi Dr_A,
I thought David would answer this question, so I didn't.
Yes, xbasic runs on the DracBlade using the same caching driver as the C3. It is slightly faster than on the C3 - say 5s rather than 6s or 7s to run David's test program.
I don't think David would regard that as a really significant speed up.
However, just out of interest, I also tried it on the RamBlade and it runs in about 1.5s - this is partly due to the faster XMM RAM on the RamBlade (I think it is the fastest platform in that respect) and also because the RamBlade clock speed is 100MHz instead of 80MHz. I wonder if David would consider that fast enough for his purposes?
Ross.
Cluso's ramblade is definitely the fastest platform around. I think this gives us a benchmark to work from in terms of how fast things can be if you really optimise the code.
I took another look at the dracblade driver code and there are a few things that could be improved.
''Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade
' DoCmd(command_, hub_address, ram_address, block_length)
' R - read bytes at address n up (n to n+block_length) where n =0 to 65535 (ie lower 64k of the sram chip)
' W - write bytes at address n up
' I - initialise
' N - Led on
' F - Led off
' H - set high latch to value in ramaddress A16 to A23 (will include the led)

VAR
' communication params(5) between cog driver code - only "command" and "errx" are modified by the driver
  long command, hubaddrs, ramaddrs, blocklen, errx, cog ' rendezvous between spin and assembly (can be used cog to cog)
' command = R, W, N, F H =0 when operation completed by cog
' hubaddrs = hub address for data buffer
' ramaddrs = ram address for data ($0000 to $FFFF)
' blocklen = ram buffer length for data transfer
' errx = returns =0 (false=good), else <>0 (true & error code)
' cog = cog no of driver (set by spin start routine)

PUB start : err_
' Initialise the Drac Ram driver. No actual changes to ram as the read/write routines handle this
  command := "I"
  cog := 1 + cognew(@tbp2_start, @command)
  if cog == 0
    err_ := $FF ' error = no cog
  else
    repeat while command ' driver cog sets =0 when done
    err_ := errx ' driver cog sets =0 if no error, else xx = error code

PUB stop
  if cog
    cogstop(cog~ - 1)

PUB DoCmd(command_, hub_address, ram_address, block_length) : err_
' Do the command: R, W, N, F, H
  hubaddrs := hub_address ' hub address start
  ramaddrs := ram_address ' ram address start
  blocklen := block_length ' block length
  command := command_ ' must be last !!
' Wait for command to complete and get status
  repeat while command ' driver cog sets =0 when done
  err_ := errx ' driver cog sets =0 if no error, else xx = error code

PUB rendezvous
  return @command

DAT
'' +--------------------------------------------------------------------------+
'' | Dracblade Ram Driver (with grateful acknowlegements to Cluso) |
'' +--------------------------------------------------------------------------+
                org 0
tbp2_start ' setup the pointers to the hub command interface (saves execution time later
           ' +-- These instructions are overwritten as variables after start
comptr          mov comptr, par ' -| hub pointer to command
hubptr          mov hubptr, par ' | hub pointer to hub address
ramptr          add hubptr, #4 ' | hub pointer to ram address
lenptr          mov ramptr, par ' | hub pointer to length
errptr          add ramptr, #8 ' | hub pointer to error status
cmd             mov lenptr, par ' | command I/R/W/G/P/Q
hubaddr         add lenptr, #12 ' | hub address
ramaddr         mov errptr, par ' | ram address
len             add errptr, #16 ' | length
err             nop ' -+ error status returned (=0=false=good)
' Initialise hardware (unlike the triblade, just tristates everything and read/write set the pins)
init            mov err, #0 ' reset err=false=good
                mov dira,zero ' tristate the pins
done            wrlong err, errptr ' status =0=false=good, else error x
                wrlong zero, comptr ' command =0 (done)
' wait for a command (pause short time to reduce power)
pause           mov ctr, delay wz ' if =0 no pause
        if_nz   add ctr, cnt
        if_nz   waitcnt ctr, #0 ' wait for a short time (reduces power)
                rdlong cmd, comptr wz ' command ?
        if_z    jmp #pause ' not yet
' decode command
                cmp cmd, #"R" wz ' R = read block
        if_z    jmp #rdblock
                cmp cmd, #"W" wz ' W = write block
        if_z    jmp #wrblock
                cmp cmd, #"N" wz ' N= led on
        if_z    jmp #led_turn_on
                cmp cmd, #"F" wz ' F = led off
        if_z    jmp #led_turn_off
                cmp cmd, #"H" wz ' H sets the high latch
        if_z    jmp #sethighlatch
                mov err, cmd ' error = cmd (unknown command)
                jmp #done
tristate        mov dira,zero ' all inputs to zero
                jmp #done
' turn led on
led_turn_on     or HighLatch,ledpin ' set the led pin high
                jmp #OutputHighLatch ' send this out
led_turn_off    andn HighLatch,ledpin ' set the led pin low
                jmp #OutputHighLatch ' send this out
' set high address bytes with command H, pass value in third variable of the DoCmd
' 4 bytes - masks off all but bits 16 to 23
sethighlatch    call #ram_open ' gets address value in 'address'
                shr address,#16 ' shift right by 16 places
                and address,#$FF ' ensure rest of bits zero
                mov HighLatch,address ' put value into HighLatch
                jmp #OutputHighLatch ' and output it
'---------------------------------------------------------------------------------------------------------
'Memory Access Functions
rdblock         call #ram_open ' get variables from hub variables
rdloop          call #read_memory_byte ' read byte from address into data_8
                wrbyte data_8,hubaddr ' write data_8 to hubaddr ie copy byte to hub
                add hubaddr,#1 ' add 1 to hub address
                add address,#1 ' add 1 to ram address
                djnz len,#rdloop ' loop until done
                jmp #init ' reinitialise
wrblock         call #ram_open
wrloop          rdbyte data_8, hubaddr ' copy byte from hub
                call #write_memory_byte ' write byte from data_8 to address
                add hubaddr,#1 ' add 1 to hub address
                add address,#1 ' add 1 to ram address
                djnz len,#wrloop ' loop until done
                jmp #init ' reinitialise
ram_open        rdlong hubaddr, hubptr ' get hub address
                rdlong ramaddr, ramptr ' get ram address
                rdlong len, lenptr ' get length
                mov err, #5 ' err=5
                mov address,ramaddr ' cluso's variable 'ramaddr' to dracblade variable 'address'
ram_open_ret    ret
read_memory_byte call #RamAddress ' sets up the latches with the correct ram address
                mov dira,LatchDirection2 ' for reads so P0-P7 tristate till do read
                mov outa,GateHigh ' actually ReadEnable but they are the same
                andn outa,GateHigh ' set gate low
                nop ' short delay to stabilise
                nop
                mov data_8, ina ' read SRAM
                and data_8, #$FF ' extract 8 bits
                or outa,GateHigh ' set the gate high again
read_memory_byte_ret ret
write_memory_byte call #RamAddress ' sets up the latches with the correct ram address
                mov outx,data_8 ' get the byte to output
                and outx, #$FF ' ensure upper bytes=0
                or outx,WriteEnable ' or with correct 138 address
                mov outa,outx ' send it out
                andn outa,GateHigh ' set gate low
                nop ' no nop doesn't work, one does, so put in two to be sure
                nop ' another NOP
                or outa,GateHigh ' set it high again
write_memory_byte_ret ret
RamAddress ' sets up the ram latches. Assumes high latch A16-A18 low so only accesses 64k of ram
                mov dira,LatchDirection ' set up the pins for programming latch chips
                mov outx,address ' get the address into a temp variable
                and outx,#$FF ' mask the low byte
                or outx,LowAddress ' or with 138 low address
                mov outa,outx ' send it out
                andn outa,GateHigh ' set gate low
                ' ?? a NOP
                or outa,GateHigh ' set it high again
                ' now repeat for the middle byte
                mov outx,address ' get the address into a temp variable
                shr outx,#8 ' shift right by 8 places
                and outx,#$FF ' mask the low byte
                or outx,MiddleAddress ' or with 138 middle address
                mov outa,outx ' send it out
                andn outa,GateHigh ' set gate low
                or outa,GateHigh ' set it high again
RamAddress_ret  ret
OutputHighLatch ' sends out HighLatch to the 374 that does A16-19, led and the 4 spare outputs
                mov dira,latchdirection ' setup active pins 138 and bus
                mov outa,HighLatch ' send out HighLatch
                or outa,HighAddress ' or with the high address
                andn outa,GateHigh ' set gate low
                or outa,GateHigh ' set the gate high again
OutputHighLatch_ret jmp #tristate ' set pins tristate
delay           long 80 ' waitcnt delay to reduce power (#80 = 1uS approx)
ctr             long 0 ' used to pause execution (lower power use) & byte counter
GateHigh        long %00000000_00000000_00000001_00000000 ' HC138 gate high, all others must be low
Outx            long 0 ' for temp use, same as n in the spin code
LatchDirection  long %00000000_00000000_00001111_11111111 ' 138 active, gate active and 8 data lines active
LatchDirection2 long %00000000_00000000_00001111_00000000 ' for reads so data lines are tristate till the read
LowAddress      long %00000000_00000000_00000101_00000000 ' low address latch = xxxx010x and gate high xxxxxxx1
MiddleAddress   long %00000000_00000000_00000111_00000000 ' middle address latch = xxxx011x and gate high xxxxxxx1
HighAddress     long %00000000_00000000_00001001_00000000 ' high address latch = xxxx100x and gate high xxxxxxx1
'ReadEnable     long %00000000_00000000_00000001_00000000 ' /RD = xxxx000x and gate high xxxxxxx1 ' commented out as the same as GateHigh
WriteEnable     long %00000000_00000000_00000011_00000000 ' /WE = xxxx001x and gate high xxxxxxx1
Zero            long %00000000_00000000_00000000_00000000 ' for tristating all pins
data_8          long %00000000_00000000_00000000_00000000 ' so code compatability with zicog driver
address         long %00000000_00000000_00000000_00000000 ' address for ram chip
ledpin          long %00000000_00000000_00000000_00001000 ' to turn on led
HighLatch       long %00000000_00000000_00000000_00000000 ' static value for the 374 latch that does the led, hA16-A19 and the other 4 outputs
1) there is a deliberate delay
' wait for a command (pause short time to reduce power)
pause           mov ctr, delay wz ' if =0 no pause
        if_nz   add ctr, cnt
        if_nz   waitcnt ctr, #0 ' wait for a short time (reduces power)
                rdlong cmd, comptr wz ' command ?
- maybe save some lines there
2) Reading in blocks of data. There are 19 address lines on a 512k chip and at the moment these are in two groups - the High group A16 to A18 and the Low and Middle group which are grouped together. This seemed natural for the Z80 emulations with 16 bit addresses.
But what if we separate out the Low and Middle latches?
I count 46 instructions to read one byte from external memory. Surely that can be decreased?!!
First thing might be to leave the middle latch unchanged and just change the lower latch. Maybe do it in groups of 4 bytes, or maybe in groups of 16 or 256?
I think that can save 8 instructions per byte.
Also I think by doing things in blocks, you don't have to keep checking for new instructions each byte. Say the requesting program wanted a Long, well then you can skip a whole lot of rechecking code for new requests.
I think that can halve the number of instructions per byte if you do Longs.
And then one might think about optimising further. For C, it depends on the probability that an instruction will cause a branch outside a block of n bytes. At the extremes, say you requested byte x and it read in the next 64k of bytes. This will take a lot of time but with a small probability that a jump will go outside this block. Read in 1 long, and that is inefficient too. I'm not sure of the maths, but say the probability of a jump was 10%, then maybe as a guess it might be best to read in 16 bytes as a block?
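One rough way to put numbers on that guess is a toy model: assume every byte executed has the same independent chance of jumping out of the block, and ask how many bytes of a block get used on average before that happens. This is only a sketch under that assumption (real code branches per instruction, and loops come back to the same block); expected_bytes_used is a made-up name:

```c
/* Expected number of bytes actually used from a block of block_size bytes,
   assuming (hypothetically) that after each byte there is an independent
   probability p_jump of branching out of the block. */
double expected_bytes_used(double p_jump, int block_size)
{
    double still_inside = 1.0;   /* probability that byte k is reached at all */
    double expected = 0.0;
    int k;

    for (k = 0; k < block_size; k++) {
        expected += still_inside;          /* byte k contributes if it is reached */
        still_inside *= (1.0 - p_jump);    /* chance of surviving one more byte */
    }
    return expected;
}
```

Under this model the result can never exceed 1/p_jump, so with a 10% jump probability no block size yields more than 10 useful bytes, and a 16 byte block already delivers a little over 8 of them. That is at least consistent with the guess that something around 16 bytes is a sensible block size.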
The driver code above already has an instruction for reading in blocks, it is just that I think mostly we read in blocks of 1, ie a byte. Ross, a) is that how catalina works and b) where is the source code for the dracblade driver file and what is it called?
So you might pass an address n=0 to 512k.
1) is this in the same high/medium latch range as the last request?
2) If yes, read bytes but only change the low latch.
3) If no then update the medium and high latches.
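The three steps above can be sketched in C to count how many latch writes each request costs. This is only a model of the idea, not driver code: latch_state, latch_init and latch_set_address are hypothetical names, and the 19-bit mask assumes the 512k chip mentioned earlier.

```c
#define LATCH_UNKNOWN 0xFFFFFFFFUL   /* no address has been latched yet */

struct latch_state {
    unsigned long upper;    /* address bits A8 and above held in the middle and high latches */
    unsigned long writes;   /* running total of latch writes, for comparing strategies */
};

void latch_init(struct latch_state *s)
{
    s->upper = LATCH_UNKNOWN;   /* forces a full update on the first request */
    s->writes = 0;
}

/* Present an address to the latches and return how many latches were written. */
int latch_set_address(struct latch_state *s, unsigned long address)
{
    unsigned long upper = (address & 0x7FFFFUL) >> 8;   /* 19 address lines, drop A0-A7 */
    int written;

    if (upper == s->upper) {
        written = 1;            /* same range as last time: only the low latch changes */
    } else {
        s->upper = upper;
        written = 3;            /* new range: low, middle and high latches all written */
    }
    s->writes += written;
    return written;
}
```

Reading 256 consecutive bytes that start on a 256 byte boundary would then cost 3 + 255 = 258 latch writes instead of 768 if all three latches were set for every byte.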
I wonder also about a lookahead cache. The requesting spin code requests a byte at address n. The cog goes and starts reading from this address. I'd need to check speeds, but there is a fairly good chance the cog will be faster than the requesting spin, so the cog will always be ahead of the requesting program. From the requesting program's point of view, it requests byte n and for the next 256 bytes the values are always correct in a buffer.
Then there is another variable - how often would the cog code check the passed parameter to see if the calling program wants a different block. Maybe if the probability of a branch in C is 10%, you check only every 10 bytes? If so, that saves even more code.
Hi Dr_A ...
Yes absolutely - I've not really done any optimization on the original caching driver code yet. In fact it only currently supports the DRACBLADE at all because David's and Jazzed's original driver code already did!
What I plan to do next is rewrite the interface from the caching driver to use my standard XMM code. That code is already written for all XMM platforms, and is much more optimized (although probably still a long way from being as good as it could be!).
That's about the last thing I expect to do before I am ready to release Catalina 3.0.
David, Jazzed ...
I found a bug in the Catalina SD Card driver initialization code that seems to show up on the C3. I've now fixed it, but if you are having occasional strange problems with programs sometimes not being able to access the SD card (but which work ok when you reload them) then this may be the reason. It may also have affected other platforms - for example I think it is the reason I was having occasional problems with the SD card on the RamBlade (and for which I was - quite unfairly - blaming Cluso!).
Hi Dr_A,
All the XMM code for all platforms is now in the file XMM.inc in the target directory. Look for the section marked #elseifdef DRACBLADE
If you can streamline the DracBlade code, I'll include it in the next release.
Ross.
P.S. If you modify the code, try not to use any more longs - the XMM kernel has very few longs to spare!
Oh darn. Someone *extremely* clever has already split the middle and lower latch! This XMM driver looks extremely well optimised. I think only caching would improve that, and any improvements due to caching will apply equally to the C3.
XMM_IncAddr     add XMM_Addr,#1 ' inc sram address
                mov outx,XMM_Addr ' does result of incrementing ...
                and outx,#$FF ' ... require updating latch 8 - 15 or 16 - 19?
                tjnz outx,#XMM_Set0_7 ' if not, just set latch for addr bits 0 - 7
                call #XMM_SetAddr ' otherwise we must set all latches
                jmp #XMM_IncAddr_ret ' done
Sorry I didn't post my reply in here. I was trying not to hijack your thread to discuss xbasic. I guess I should try the RamBlade. I've had one for a long time but have never done anything with it. I guess I stopped when I discovered that you couldn't use the standard pin 31/30 serial I/O. How do you have your RamBlade configured?
Thanks,
David
Just with the SRAM and SD Card. I just use the normal PropPlug for comms. As shown on the diagram below, you plug it onto the middle 4 pins for programming the EEPROM, and the bottom 4 pins for terminal I/O (and use Catalyst to load programs off the SD Card).
Ross.
If I know Cluso, it was done for a good reason - most likely because it allowed the SRAM to be used with the least possible number of instructions.
Ross.
P.S. In a lot of ways, the RamBlade is my favorite board. If only it could be powered by the USB port, it would be the ideal "portable" Prop platform!
Catalina 3.0 has been released. It has a new thread here.
Ross.