That is odd. Can you try a different SD Card? If possible, try one that has just been formatted with a cluster size of 32k and an MBR?
Ross.
Hmmm... This is even more odd. I took a different SD card and formatted it for FAT/32k and wrote test.bas on it. Now when I type "load test.bas" I get an infinite string of error messages complaining about an unknown character code. It seems it can mount the card and open the file but it reads garbage from the file. I put the card back into my PC and verified that the data was written correctly to the card.
Both cards work fine with xbasic compiled with ZOG running on the same C3.
Okay. Zog presumably uses the SD access code in the original caching driver, whereas Catalina uses separate SD code, and needs to synchronize the two.
I've run lots of programs that use the SD card and not seen a problem, but a possible explanation is that the timing of SD cards varies and my code just happens to work with the card I use.
I'll try a few different cards tonight and see if I can get it to fail.
Nice to see that you've integrated the new homespun features.
Yes, homespun's new features allow me to simplify a lot of things internally. I am trying to make 3.0 a lot "cleaner" than previous versions. I still have some way to go!
I think I have identified (but not solved) the problem. It seems to be a problem with the integer version of the library. When I compile with -lcx everything seems to work. When I compile with -lcix it resets the C3 when you try and do a "load".
Attached is a binary (compiled with libcx) that works on my C3. Here are the makefile options:
Well, I just rebuilt all the libraries, and the problem seems to have disappeared - i.e. xbasic now works whether compiled with -lcix or -lcx.
In my 3.0 pre-release "upgrade" I tried to include only the changed library files, but I must have done something wrong. I have emailed you both a complete new compiled set of library files which you should unzip over the upgrade. Or you can just rebuild the library from source.
David, let me know if this fixes your problem,
Ross,
Yes, the new libraries fix my SD card problem. Thanks!
The Catalina version of xbasic compiles a lot faster than the ZOG version but takes far longer to load. One reason, of course, is that it is almost twice as big but I think payload is also slower than zogload. I wonder if I could modify zogload to load Catalina binaries? As far as execution speed goes, both are very slow running entirely from external memory but they are close enough that I'll have to use a stopwatch to determine which is faster. Anyway, thanks for the help with Catalina!
Loadable COG code - two options now working:
1) Compile cog code to an array, include the array in the C program (ends up in external memory), then load and run OR
2) Compile cog code to a binary file, rename to a .cog file, download to sd card, load into external ram, load and run
I need to tidy this up and have just one "serial_start". Probably with two functions - one loads the cog from the array in hub, and one loads it from the sd card. There are merits in both systems so all the differences between the two systems ought to come down to calling one load function or the other.
But it does work, and this is pretty cool!
Where do you think a logical place would be for the array that contains the cog code - either at the beginning of the program all in a group, or would you put each array after its associated pasm code? I can see merits in both options so I'm not sure about that.
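One way the "single serial_start with two load functions" idea might look is to give both sources the same loader signature, so the start code only ever sees a filled hub buffer. This is a minimal host-side sketch, not Catalina code: `cog_loader_fn`, `struct cog_array`, `load_from_array`, `load_from_file` and `cog_image_load` are hypothetical names, and the final `_coginit` call is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical common signature: fill dest with at most max_longs longs and
   return the number of longs stored, or -1 on failure. */
typedef int (*cog_loader_fn)(unsigned long *dest, int max_longs, const void *source);

/* Option 1 source: a cog image compiled into the program as an array. */
struct cog_array {
    const unsigned long *data;
    int count;
};

static int load_from_array(unsigned long *dest, int max_longs, const void *source)
{
    const struct cog_array *img = (const struct cog_array *)source;
    int n = img->count < max_longs ? img->count : max_longs;
    memcpy(dest, img->data, (size_t)n * sizeof dest[0]);
    return n;
}

/* Option 2 source: a .cog file; source is the file name. Bytes are read in a
   fixed order and assembled as little-endian longs. A trailing partial long
   is ignored. */
static int load_from_file(unsigned long *dest, int max_longs, const void *source)
{
    unsigned char b[4];
    int n = 0;
    FILE *fp = fopen((const char *)source, "rb");
    if (fp == NULL)
        return -1;
    while (n < max_longs && fread(b, 1, 4, fp) == 4) {
        dest[n++] = (unsigned long)b[0] | ((unsigned long)b[1] << 8) |
                    ((unsigned long)b[2] << 16) | ((unsigned long)b[3] << 24);
    }
    fclose(fp);
    return n;
}

/* Single entry point: the only difference between the two systems is which
   loader and which source the caller passes in. */
static int cog_image_load(unsigned long *hub_buffer, int max_longs,
                          cog_loader_fn loader, const void *source)
{
    assert(hub_buffer != NULL && loader != NULL && max_longs >= 0);
    return loader(hub_buffer, max_longs, source);
}
```

With something like this, a serial start routine could take a loader and a source as its last two arguments and stay otherwise identical for both options.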
/* PASM demo for use with Catalina IDE Precompiler and Compiler */
#include <stdio.h>
#include <stdlib.h> // for exit() used in readcog
unsigned long external_serial[511]; // external memory store for .cog file
/* PASM Code for compilation using SpinC and inclusion as a data file
PASM Start mycogject.spin
CON
' add your values here
_clkfreq = 80_000_000 ' 5Mhz Crystal
_clkmode = xtal1 + pll16x ' x 16
' Start of C hub constants
' End of C constants
PUB Main
coginit(1,@entry,0) ' cog 1, cogstart, dummy value
DAT
'***********************************
'* Assembly language serial driver *
'***********************************
org
'
'
' Entry
'
entry mov t1,par 'get structure address
add t1,#4 << 2 'skip past heads and tails
rdlong t2,t1 'get rx_pin
mov rxmask,#1
shl rxmask,t2
add t1,#4 'get tx_pin
rdlong t2,t1
mov txmask,#1
shl txmask,t2
add t1,#4 'get rxtx_mode
rdlong rxtxmode,t1
add t1,#4 'get bit_ticks
rdlong bitticks,t1
add t1,#4 'get buffer_ptr
rdlong rxbuff,t1
mov txbuff,rxbuff
add txbuff,#16
test rxtxmode,#%100 wz 'init tx pin according to mode
test rxtxmode,#%010 wc
if_z_ne_c or outa,txmask
if_z or dira,txmask
mov txcode,#transmit 'initialize ping-pong multitasking
'
'
' Receive
'
receive jmpret rxcode,txcode 'run a chunk of transmit code, then return
test rxtxmode,#%001 wz 'wait for start bit on rx pin
test rxmask,ina wc
if_z_eq_c jmp #receive
mov rxbits,#9 'ready to receive byte
mov rxcnt,bitticks
shr rxcnt,#1
add rxcnt,cnt
:bit add rxcnt,bitticks 'ready next bit period
:wait jmpret rxcode,txcode 'run a chunk of transmit code, then return
mov t1,rxcnt 'check if bit receive period done
sub t1,cnt
cmps t1,#0 wc
if_nc jmp #:wait
test rxmask,ina wc 'receive bit on rx pin
rcr rxdata,#1
djnz rxbits,#:bit
shr rxdata,#32-9 'justify and trim received byte
and rxdata,#$FF
test rxtxmode,#%001 wz 'if rx inverted, invert byte
if_nz xor rxdata,#$FF
rdlong t2,par 'save received byte and inc head
add t2,rxbuff
wrbyte rxdata,t2
sub t2,rxbuff
add t2,#1
and t2,#$0F
wrlong t2,par
jmp #receive 'byte done, receive next byte
'
'
' Transmit
'
transmit jmpret txcode,rxcode 'run a chunk of receive code, then return
mov t1,par 'check for head <> tail
add t1,#2 << 2
rdlong t2,t1
add t1,#1 << 2
rdlong t3,t1
cmp t2,t3 wz
if_z jmp #transmit
add t3,txbuff 'get byte and inc tail
rdbyte txdata,t3
sub t3,txbuff
add t3,#1
and t3,#$0F
wrlong t3,t1
or txdata,#$100 'ready byte to transmit
shl txdata,#2
or txdata,#1
mov txbits,#11
mov txcnt,cnt
:bit test rxtxmode,#%100 wz 'output bit on tx pin according to mode
test rxtxmode,#%010 wc
if_z_and_c xor txdata,#1
shr txdata,#1 wc
if_z muxc outa,txmask
if_nz muxnc dira,txmask
add txcnt,bitticks 'ready next cnt
:wait jmpret txcode,rxcode 'run a chunk of receive code, then return
mov t1,txcnt 'check if bit transmit period done
sub t1,cnt
cmps t1,#0 wc
if_nc jmp #:wait
djnz txbits,#:bit 'another bit to transmit?
jmp #transmit 'byte done, transmit next byte
'
'
' Uninitialized data
'
t1 res 1
t2 res 1
t3 res 1
rxtxmode res 1
bitticks res 1
rxmask res 1
rxbuff res 1
rxdata res 1
rxbits res 1
rxcnt res 1
rxcode res 1
txmask res 1
txbuff res 1
txdata res 1
txbits res 1
txcnt res 1
txcode res 1
PASM End
*/
void mycogject(int cognumber, unsigned long *parameters_array) // this name copied from the .spin name in the pasm section - names must match eg void mycogject matches mycogject.spin. Also first code after this must be the .h array file. Put your code after the };
{
/**
* @file mycogject_array.h
* Created with spin.binary PASM to C Array Converter.
* Copyright (c) 2011, John Doe
*/
unsigned long mycogject_array[] =
{
0xa0bca9f0, 0x80fca810, 0x08bcaa54, 0xa0fcb201,
0x2cbcb255, 0x80fca804, 0x08bcaa54, 0xa0fcbe01,
0x2cbcbe55, 0x80fca804, 0x08bcae54, 0x80fca804,
0x08bcb054, 0x80fca804, 0x08bcb454, 0xa0bcc05a,
0x80fcc010, 0x627cae04, 0x617cae02, 0x689be85f,
0x68abec5f, 0xa0fcc833, 0x5cbcbc64, 0x627cae01,
0x613cb3f2, 0x5c640016, 0xa0fcb809, 0xa0bcba58,
0x28fcba01, 0x80bcbbf1, 0x80bcba58, 0x5cbcbc64,
0xa0bca85d, 0x84bca9f1, 0xc17ca800, 0x5c4c001f,
0x613cb3f2, 0x30fcb601, 0xe4fcb81e, 0x28fcb617,
0x60fcb6ff, 0x627cae01, 0x6cd4b6ff, 0x08bcabf0,
0x80bcaa5a, 0x003cb655, 0x84bcaa5a, 0x80fcaa01,
0x60fcaa0f, 0x083cabf0, 0x5c7c0016, 0x5cbcc85e,
0xa0bca9f0, 0x80fca808, 0x08bcaa54, 0x80fca804,
0x08bcac54, 0x863caa56, 0x5c680033, 0x80bcac60,
0x00bcc256, 0x84bcac60, 0x80fcac01, 0x60fcac0f,
0x083cac54, 0x68fcc300, 0x2cfcc202, 0x68fcc201,
0xa0fcc40b, 0xa0bcc7f1, 0x627cae04, 0x617cae02,
0x6ce0c201, 0x29fcc201, 0x70abe85f, 0x7497ec5f,
0x80bcc658, 0x5cbcc85e, 0xa0bca863, 0x84bca9f1,
0xc17ca800, 0x5c4c004d, 0xe4fcc446, 0x5c7c0033
};
printf("first long is %u \n",mycogject_array[0]);
_coginit((int)parameters_array>>2, (int)mycogject_array>>2, cognumber); // array name built from spin file name
}
void external_cog_load(int cognumber, unsigned long cogdata[], unsigned long *parameters_array) // load a cog from external memory
{
unsigned long hubcog[511];
int i;
for(i=0;i<511;i++) // 511 entries: stay inside hubcog[511] and the 511-long external array
{
hubcog[i]=cogdata[i]; // move from external memory to a local array in hub
}
_coginit((int)parameters_array>>2, (int)hubcog>>2, cognumber); // load the cog
}
void clearscreen() // white text on dark blue background
{
int i;
for (i=0;i<40;i++)
{
t_setpos(0,0,i); // move cursor to next line
t_color(0,0x08FC); // RRGGBBxx eg dark blue background 00001000 white text 11111100
}
}
void sleep(int milliseconds) // sleep function
{
_waitcnt(_cnt()+(milliseconds*(_clockfreq()/1000))-4296);
}
char peek(int address) // function implementation of peek
{
return *((char *)address);
}
void poke(int address, char value) // function implementation of poke
{
*((char *)address) = value;
}
unsigned long serial_start(unsigned long rxpin,unsigned long txpin,unsigned long mode, unsigned long baudrate, unsigned long par[])
{
/*
PUB start(rxpin, txpin, mode, baudrate) : okay
'' Start serial driver - starts a cog
'' returns false if no cog available
''
'' mode bit 0 = invert rx
'' mode bit 1 = invert tx
'' mode bit 2 = open-drain/source tx
'' mode bit 3 = ignore tx echo on rx
stop
longfill(@rx_head, 0, 4)
longmove(@rx_pin, @rxpin, 3)
bit_ticks := clkfreq / baudrate
buffer_ptr := @rx_buffer
okay := cog := cognew(@entry, @rx_head) + 1
*/
unsigned long okay = 0; // the cog start result is not captured yet; initialised so the return value is defined
unsigned long bit_ticks;
unsigned long buffer_ptr;
par[0] = 0; // rx_head longfill(@rx_head, 0, 4)
par[1] = 0; // rx_tail
par[2] = 0; // tx_head
par[3] = 0; // tx_tail
par[4] = rxpin; // longmove(@rx_pin, @rxpin, 3)
par[5] = txpin; // note - if rewrite the pasm code could save a couple of hub longs here
par[6] = mode; // as rxpin and txpin are not used anywhere else
bit_ticks = _clockfreq() / baudrate; // bit_ticks := clkfreq / baudrate
par[7] = bit_ticks;
buffer_ptr = (unsigned long)&par[9]; // buffer_ptr := @rx_buffer points to start of circular buffer
par[8] = buffer_ptr; // pointer to the start of the circular buffers
// rx buffer is 9 to 12 and tx buffer is 13 to 16 (16 bytes =4 longs)
mycogject(7,par); // pass the packaged up array
// okay returns the cog number or -1 if a fail page 119 manual. Ignored here
printf("par array is at %u \n",(unsigned long)&par[0]);
printf("par array entry 1 is at %u \n",(unsigned long)&par[1]);
printf("par array entry 7 is at %u \n",(unsigned long)&par[7]);
printf("rx_head is at %u \n",(unsigned long)&par[9]);
printf("buffer_ptr is %u \n",par[8]);
return okay;
}
unsigned long new_serial_start(unsigned long rxpin,unsigned long txpin,unsigned long mode, unsigned long baudrate, unsigned long par[], unsigned long cogdata[])
{
/*
PUB start(rxpin, txpin, mode, baudrate) : okay
'' Start serial driver - starts a cog
'' returns false if no cog available
''
'' mode bit 0 = invert rx
'' mode bit 1 = invert tx
'' mode bit 2 = open-drain/source tx
'' mode bit 3 = ignore tx echo on rx
stop
longfill(@rx_head, 0, 4)
longmove(@rx_pin, @rxpin, 3)
bit_ticks := clkfreq / baudrate
buffer_ptr := @rx_buffer
okay := cog := cognew(@entry, @rx_head) + 1
*/
unsigned long okay = 0; // the cog start result is not captured yet; initialised so the return value is defined
unsigned long bit_ticks;
unsigned long buffer_ptr;
par[0] = 0; // rx_head longfill(@rx_head, 0, 4)
par[1] = 0; // rx_tail
par[2] = 0; // tx_head
par[3] = 0; // tx_tail
par[4] = rxpin; // longmove(@rx_pin, @rxpin, 3)
par[5] = txpin; // note - if rewrite the pasm code could save a couple of hub longs here
par[6] = mode; // as rxpin and txpin are not used anywhere else
bit_ticks = _clockfreq() / baudrate; // bit_ticks := clkfreq / baudrate
par[7] = bit_ticks;
buffer_ptr = (unsigned long)&par[9]; // buffer_ptr := @rx_buffer points to start of circular buffer
par[8] = buffer_ptr; // pointer to the start of the circular buffers
// rx buffer is 9 to 12 and tx buffer is 13 to 16 (16 bytes =4 longs)
//mycogject(7,par); // pass the packaged up array
external_cog_load(7,cogdata,par); // load from external ram
// okay returns the cog number or -1 if a fail page 119 manual. Ignored here
printf("par array is at %u \n",(unsigned long)&par[0]);
printf("par array entry 1 is at %u \n",(unsigned long)&par[1]);
printf("par array entry 7 is at %u \n",(unsigned long)&par[7]);
printf("rx_head is at %u \n",(unsigned long)&par[9]);
printf("buffer_ptr is %u \n",par[8]);
return okay;
}
void serial_tx(char tx,unsigned long par[])
{
/*
PUB tx(txbyte)
'' Send byte (may wait for room in buffer)
repeat until (tx_tail <> (tx_head + 1) & $F)
tx_buffer[tx_head] := txbyte
tx_head := (tx_head + 1) & $F
if rxtx_mode & %1000
rx
*/
unsigned long tx_head;
int address;
while ( par[3] == ((par[2] + 1 ) & 0xF)) {} // wait while the buffer is full; par[2] is tx_head, par[3] is tx_tail (== binds tighter than &, so the mask needs its own parentheses)
tx_head = par[2]; // get the head value
address = par[8] + 16 + tx_head; // location of rx buffer plus 16 to get tx buffer plus the head value
poke(address,tx); // poke the tx byte value to hub ram
tx_head = tx_head + 1; // add one
tx_head = tx_head & 0xF; // logical and with 15
par[2] = tx_head; // store it back again
// need to add the echo mode?
}
unsigned long serial_rxcheck(unsigned long par[])
{
/*
PUB rxcheck : rxbyte
'' Check if byte received (never waits)
'' returns -1 if no byte received, $00..$FF if byte
rxbyte--
if rx_tail <> rx_head
rxbyte := rx_buffer[rx_tail]
rx_tail := (rx_tail + 1) & $F
*/
unsigned long rxbyte; // actually is a long, so can return -1 FFFFFFFF if nothing and 0-FF if a byte
int address; // hub address
rxbyte = 0; // set explicitly to zero
rxbyte = rxbyte - 1; // return ffffffff if nothing
if (par[1] != par[0])
{
address = par[8] + par[1]; // par[8] is the rx buffer, par[1] is rx_tail
rxbyte = (unsigned char)peek(address); // get the return byte from the buffer; the cast keeps bytes >= 0x80 in the range 0-FF if char is signed
par[1] = (par[1] +1) & 0xF; // add one to tail
}
return rxbyte;
}
unsigned long serial_rx(unsigned long par[])
{
/*
PUB rx : rxbyte
'' Receive byte (may wait for byte)
'' returns $00..$FF
repeat while (rxbyte := rxcheck) < 0
*/
unsigned long rxbyte; // actually is a long, not a byte
while ((rxbyte = serial_rxcheck(par)) == -1) {} // 0xffffffff and -1 works, but " < 0" gives a compiler error
return rxbyte; // return the value
}
int EoF (FILE* stream)
{
register int c, status = ((c = fgetc(stream)) == EOF);
ungetc(c,stream);
return status;
}
void readcog(char *filename,unsigned long external_cog[]) // read in a .cog file into external memory array
{
int i;
FILE *FP1;
i = 0;
if((FP1=fopen(filename,"rb"))==0) // open the file
{
fprintf(stderr,"Can't open file %s\n",filename);
exit(1);
}
fseek(FP1,0,0);
while(!EoF(FP1) && (i<511)) // run until end of file or until the array is full
{
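unsigned long b0 = (unsigned long)(getc(FP1) & 0xFF); // read the four bytes in separate statements:
unsigned long b1 = (unsigned long)(getc(FP1) & 0xFF); // C does not define the evaluation order of the
unsigned long b2 = (unsigned long)(getc(FP1) & 0xFF); // operands of |, so four getc calls in one
unsigned long b3 = (unsigned long)(getc(FP1) & 0xFF); // expression may be read in any order
external_cog[i] = b0 | (b1<<8) | (b2<<16) | (b3<<24); // assemble the little-endian long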
// printf("%u ",external_cog[i]);
i+=1;
}
if(FP1)
{
fclose(FP1); // close the file
FP1=NULL;
}
printf("cog data 0 = %lu \n",external_cog[0]);
}
void main ()
{
int i;
unsigned long received_byte;
unsigned long serial_parameters[17]; // reserve hub space for head/tail pointers, pins, mode, bit ticks, buffer pointer (0 to 8) and the rx/tx buffers (9 to 16)
clearscreen();
printf("Clock speed %u \n",_clockfreq()); // see page 28 of the propeller manual for other useful commands
printf("Catalina running in cog number %i \n",_cogid()); // integer
readcog("serial.cog",external_serial); // read in serial.cog to external memory
// serial_start(31,30,0,1200,serial_parameters); // start serial cog pins 31,30, mode 0, 1200 baud
new_serial_start(31,30,0,1200,serial_parameters,external_serial); // start serial cog pins 31,30, mode 0, 1200 baud
printf("Started serial driver\n");
for(i=0; i<10; i++)
{
serial_tx(65+i,serial_parameters); // test sending a byte
sleep(500);
printf("send byte %u \n",65+i);
}
/*
printf("type some characters, will return that character plus 1 \n");
for (i=0;i<19;i++) // test 19 times, so tests buffer restarting
{
received_byte = serial_rx(serial_parameters); // get a byte
received_byte = received_byte + 1; // add one and send it back
serial_tx(received_byte,serial_parameters);
printf("sent back byte %u \n",received_byte);
}
*/
printf("program finished \n");
while (1); // Prop reboots on exit from main()
}
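The head/tail arithmetic in serial_tx and serial_rxcheck above is a 16-byte circular buffer. The sketch below models just that index logic on a host, without the cog or hub memory, mainly to show the "buffer full" test with explicit parentheses. The names `ring16`, `ring16_put` and `ring16_get` are made up for illustration and are not part of the driver.

```c
#include <assert.h>

#define RING_SIZE 16                 /* matches the 16-byte buffers in the listing */
#define RING_MASK (RING_SIZE - 1)

struct ring16 {
    unsigned long head;              /* next slot to write */
    unsigned long tail;              /* next slot to read  */
    unsigned char data[RING_SIZE];
};

/* Returns 1 if the byte was stored, 0 if the buffer is full.
   One slot is always left empty so that head == tail means "empty". */
static int ring16_put(struct ring16 *r, unsigned char byte)
{
    unsigned long next;
    assert(r != 0);
    next = (r->head + 1) & RING_MASK;    /* mask applied before any comparison */
    if (next == r->tail)
        return 0;
    r->data[r->head] = byte;
    r->head = next;
    return 1;
}

/* Returns the byte (0..255), or -1 if the buffer is empty. */
static int ring16_get(struct ring16 *r)
{
    int byte;
    assert(r != 0);
    if (r->tail == r->head)
        return -1;
    byte = r->data[r->tail];
    r->tail = (r->tail + 1) & RING_MASK;
    return byte;
}
```

Because one slot stays empty, a 16-byte buffer holds at most 15 pending bytes, which is also how the Spin original behaves.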
I just timed running my xbasic test program on the C3 under both Catalina C and GCC/ZOG. Here are the results:
Catalina: 12 seconds
GCC/ZOG: 17 seconds
This isn't a completely fair comparison because the Catalina version was using the hires TV driver and the ZOG version was using the lowres TV driver. For some reason the hires driver isn't working under ZOG at the moment.
Of course, both of these results are horrible. This is a fairly simple program and shouldn't take this long to compile into bytecodes and run. I need to test both C compilers with data in SRAM and stack in hub memory. That should speed up execution significantly. Unfortunately, neither C compiler supports that mode yet. I could also try data/stack in hub. That mode is supported in Catalina and could probably be added to ZOG fairly easily. It isn't done yet though so the test will have to wait.
In case you're interested, here is my test program:
10 dim z$(3) = { "foo", "bar", "silly" }
20 dim q$ = "abc"
30 def greeting$ = "Hello, world!"
40 testargs(greeting$, 42)
50 for x = 1 to 10
60 print x, square(x)
70 next x
80 r$ = "abd"
90 s$ = "cdf"
100 t$ = "cde"
110 if q$ < r$ then print "ok"
120 if s$ < t$ then print "bad"
130 print "q$ is '"; q$; "'"
140 print "length of q$ is "; len(q$)
150 print "first character of q$ is "; asc(q$)
160 for x = 0 to 2
170 print z$(x)
180 next x
190 print left$(greeting$, 5)
200 print right$(greeting$, 6)
210 print mid$(greeting$, 2, 6)
211 xx = 4
212 yy = 3
215 print "4 - 3 = " + str$(xx - yy)
220 def square(n)
230 return n * n
240 end def
250 def testargs(a$, i)
260 print a$, i
270 end def
That code looks interesting David. I wonder what a C version would run at?
Some musings re self contained cog programs:
1) Define the cog arrays at the beginning of the program -eg unsigned long serial_cog[511] and then all the data. The data corresponds to the pasm code. If you want to load a .cog off an sd card, it will overwrite this array with the sd card data. If you want to use the pure sd card system, don't put any data in this array (and you could leave out the pasm part too once .cog objects become standardised enough).
2) The equivalent of adding Spin objects I think becomes a merge of two files. Merge all the .cog arrays at the beginning so they end up one after the other. Then merge all the pasm comment code, one after the other. Then merge all the functions, one after the other. Then merge the main programs. This could be a command line program - two source files and a destination file. Then repeat for more objects. I think this might allow the release of C objects for use in a non object oriented language (and this ought to work the same for BCX basic and maybe PropBasic as well?). The merge program would need to know the start and end of the blocks but this is pretty easy with comment lines.
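As a rough illustration of how a merge program could find its blocks from comment lines, the sketch below pulls out the text between two marker lines. It uses the "PASM Start" / "PASM End" markers from the listing earlier in the thread; `extract_block` is a hypothetical helper, and a real merge tool would need matching markers for the array, function and main sections as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the lines between the line containing start_marker and the line
   containing end_marker into out (NUL-terminated). Returns the number of
   characters copied, or -1 if a marker is missing or out is too small. */
static long extract_block(const char *text, const char *start_marker,
                          const char *end_marker, char *out, size_t out_size)
{
    const char *start, *end, *body;
    size_t len;

    assert(text && start_marker && end_marker && out);
    start = strstr(text, start_marker);
    if (start == NULL)
        return -1;
    body = strchr(start, '\n');            /* skip the rest of the marker line */
    if (body == NULL)
        return -1;
    body++;
    end = strstr(body, end_marker);
    if (end == NULL)
        return -1;
    while (end > body && end[-1] != '\n')  /* back up to the start of the end-marker line */
        end--;
    len = (size_t)(end - body);
    if (len + 1 > out_size)
        return -1;
    memcpy(out, body, len);
    out[len] = '\0';
    return (long)len;
}
```

Running this once per section on each source file, then writing the extracted pieces out group by group, would give the "arrays, then PASM, then functions, then main" ordering described above.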
I just tried building xbasic with -x4 to put the data in hub memory and the resulting binary didn't work. Nothing appeared on the TV when I loaded it. I also tried -DPC to use the serial HMI and got no response from the serial port. Is there anything else I need to do to build for this memory model other than using -x4?
I think you said at one point that you had not used my combined C3 SPI driver for Catalina but instead used separate drivers for SRAM/flash and the SD card. How do you coordinate between those two drivers? I'm wondering what I would have to do to use the SPI SRAM in my C code if I'm using the -x4 memory model that doesn't use it for Catalina itself.
I just compiled xbasic to use the -x4 mode. You have to remember that with everything except code in Hub RAM, you probably cannot afford to dedicate 8K just as cache. I used a 2K cache and it works. You may be able to use a 4K cache - I didn't try that.
Here are the make options I used:
CFLAGS=-x4
LDFLAGS=$(CFLAGS) -DC3 -M256k -DCACHED_2K -D PC -DCR_ON_LF -lcix -y
Also note that you need to recompile the "utilities" folder using a command like:
build C3 CACHED_2K
I used the Parallax Serial Terminal program and everything seems to work OK. It doesn't seem to be any faster, although it's difficult to tell with this program. You need to come up with a more compute-intensive BASIC program to compare the two: all this program does is a small amount of I/O, and I wouldn't expect much difference there.
I need to test both C compilers with data in SRAM and stack in hub memory. That should speed up execution significantly. Unfortunately, neither C compiler supports that mode yet.
Hi David,
You appear to be describing what Catalina does in the -x3 mode - i.e. all the code is in SPI Flash. All the global data (and the heap) is in SPI RAM. The stack and all local data is in Hub.
I'm impressed with what you've achieved so far - I hope you will keep it up. This weekend I have to spend working on school projects with my kids - but I hope next week to try out your IDE and try out some 'cogjects'!
Ross.
Yes, that's exactly what I meant. I'll have to try that. Thanks! Do I have to rebuild the utilities to try that memory model?
Thanks,
David
David,
The coordination is done using the XMM API functions XMM_Activate and XMM_TriState (or equivalents). We are not really tristating any pins on the C3 (the terminology is a hangover from a platform where this was in fact the case - on the C3 we are instead assigning logical control of the SPI bus), but the kernel has to make sure that it calls XMM_TriState before requesting a service from another cog that might need to use the SPI bus, and those cogs must do the equivalent of XMM_Activate to gain access to the SPI bus and XMM_TriState before returning control to the kernel.
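The ordering described above can be modelled on a host as a simple ownership handoff. This is only a model of the sequence, not Catalina code: the real XMM_Activate and XMM_TriState live in the kernel and drivers, while `bus_owner`, `bus_acquire`, `bus_release` and `request_sd_service` are invented here. The last step, where the kernel takes the bus back, is an assumption that is implied by the description rather than stated in it.

```c
#include <assert.h>

enum bus_owner { BUS_FREE, BUS_KERNEL, BUS_DRIVER };

/* The kernel normally owns the SPI bus, since it fetches code over it. */
static enum bus_owner spi_bus = BUS_KERNEL;

/* Model of the "TriState" step: the current owner gives up the bus.
   Returns 1 on success, 0 if the caller did not own the bus. */
static int bus_release(enum bus_owner who)
{
    if (spi_bus != who)
        return 0;
    spi_bus = BUS_FREE;
    return 1;
}

/* Model of the "Activate" step: take the bus if nobody holds it.
   Returns 1 on success, 0 if the bus is still owned. */
static int bus_acquire(enum bus_owner who)
{
    if (spi_bus != BUS_FREE)
        return 0;
    spi_bus = who;
    return 1;
}

/* One service request, in the order given in the post. Returns 1 if every
   step of the handoff succeeded. */
static int request_sd_service(void)
{
    if (!bus_release(BUS_KERNEL)) return 0;  /* kernel: XMM_TriState before the request */
    if (!bus_acquire(BUS_DRIVER)) return 0;  /* driver cog: equivalent of XMM_Activate */
    /* ... the driver would talk to the SD card here ... */
    if (!bus_release(BUS_DRIVER)) return 0;  /* driver cog: XMM_TriState before returning */
    return bus_acquire(BUS_KERNEL);          /* assumed: kernel resumes XMM access */
}
```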
To access the SPI RAM from C, I don't see why you don't just use the Catalina -x3 mode. That's what it's for.
Because it's very slow. I figured my code would run faster if I put the code in flash, data in hub memory and just used the SRAM as a buffer for the editor and for scratch space for the compiler.
When you use the -x3 mode, that's essentially what you are doing. If you want data in Hub RAM then declare it as local (e.g. local to the main function). If you want data in SPI RAM then declare it as global (or allocate it on the heap).
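In C terms, the placement rule described above comes down to where a variable is declared. The sketch below is plain standard C with hypothetical names (`edit_buffer`, `checksum_line`, `LINE_CHARS`). The comments about SPI RAM and Hub RAM only restate the description of -x3 given here, and nothing in the code itself controls placement. Whether copying into a local first is actually faster has not been measured.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_CHARS 80

/* File-scope data: under -x3 this would be placed in SPI RAM. */
static char edit_buffer[1024];

/* Copies one line out of the global buffer into a local array, then works on
   the local copy. Locals live on the stack, which under -x3 is in Hub RAM. */
static unsigned long checksum_line(size_t offset)
{
    char line[LINE_CHARS + 1];       /* local: stack */
    unsigned long sum = 0;           /* local: stack */
    size_t i;

    assert(offset < sizeof edit_buffer);

    /* One pass over the global buffer to fill the local copy. */
    for (i = 0; i < LINE_CHARS && offset + i < sizeof edit_buffer
                && edit_buffer[offset + i] != '\0'; i++)
        line[i] = edit_buffer[offset + i];
    line[i] = '\0';

    /* Any repeated work then touches only the local copy. */
    for (i = 0; line[i] != '\0'; i++)
        sum += (unsigned char)line[i];
    return sum;
}
```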
For xbasic I didn't notice much speed difference between -x3 and -x4, so I wouldn't expect much of a performance gain in any case.
No, you only have to rebuild the utilities if you change cache size.
Ross.
Comments
Hmmm... This is even more odd. I took a different SD card and formatted it for FAT/32k and wrote test.bas on it. Now when I type "load test.bas" I get an infinite string of error messages complaining about an unknown character code. It seems it can mount the card and open the file but it reads garbage from the file. I put the card back into my PC and verified that the data was written correctly to the card.
Could be a timing problem in my SD card handling, or it could be a problem with your C3.
Have you successfully used the SD card on your C3 for anything else?
Ross.
Thanks,
David
Okay - I'll do that when I get home.
Ross.
I think I have identified (but not solved) the problem. It seems to be a problem with the integer version of the library. When I compile with -lcx everything seems to work. When I compile with -lcix it resets the C3 when you try and do a "load".
Attached is a binary (compiled with libcx) that works on my C3. Here are the makefile options:
Can you confirm this works on your C3? Note you will need to use a version of xmm.binary compiled with both the C3 and CACHED option.
Ross.
Thanks Ross! I'll have to give that a try tomorrow.
Edit: Ummm... I must be losing my mind. The -x3 layout is what I started out with. I hadn't realized that it was putting the stack and locals in hub memory. I thought that they were being placed in SRAM along with the rest of the R/W data. So I guess I've pretty much gotten as much as I can out of Catalina. My only hope for better performance is to get ZOG working with the stack and locals in hub memory. That won't be easy though since the ZPU is big-endian and the Propeller is little-endian and there are endless conflicts trying to resolve that difference.
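For readers unfamiliar with the endianness issue mentioned above: a 32-bit value stored by a big-endian CPU has its bytes in the opposite order from what a little-endian CPU expects, so one side has to convert. The sketch below shows only the basic conversion; it is not how ZOG handles it, and `swap32` and `load_be32` are illustrative names.

```c
#include <assert.h>

/* Reverses the byte order of a 32-bit value held in an unsigned long. */
static unsigned long swap32(unsigned long v)
{
    v &= 0xFFFFFFFFUL;   /* ignore any upper bits on hosts with 64-bit long */
    return ((v & 0x000000FFUL) << 24) | ((v & 0x0000FF00UL) << 8) |
           ((v & 0x00FF0000UL) >> 8)  | ((v & 0xFF000000UL) >> 24);
}

/* Reads a 32-bit big-endian value byte by byte, so the result does not
   depend on the byte order of the machine running the code. */
static unsigned long load_be32(const unsigned char *p)
{
    assert(p != 0);
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}
```

The awkward part in practice is that every access shared between the two views of memory needs this treatment, and byte and word accesses need different handling from long accesses.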
I just tried -x4 as you described above except with the standard HIRES_TV HMI and it sort of works but loading test.bas fails. Is that likely because I ran out of memory?
That's what I've been trying to tell you!
However, I think there is scope for improvement - I've just been running some benchmarks, and the cache is not behaving nearly as well as it should. I think there may be something wrong in the cache hit detection - i.e. it is not detecting a cache hit when it should, and is reloading from SPI too often.
I'll let you know what I find out - but my time this weekend is fairly limited.
Ross.
Most likely. Try reducing the cache size (I suppose 1K is all that's left!) or use the LORES_TV option.
Ross.
You mean it can go lower than 2K? I tried it again with -DPC and now I can load test.bas but it crashes when I type RUN.