Prop performance question

cmd · 2015-03-27 16:44

I'm modifying Chip Gracey's "VGA High-Res Text Demo v1.0" code to read data from a FIFO connected to the Propeller and poke incoming data into the 2000-character screen buffer. The Prop is connected to a 5.5296 MHz clock signal on XI with XO left unconnected, and I'm using the 16x PLL to get a CPU speed of 88.47 MHz.

I modified the program as follows:

' CON section
  _clkmode = xinput + pll16x              ' Specify external clock input
  _xinfreq = 5_529_600

' Main loop: clock FIFO, check flag, store byte to screen buffer
  repeat
    OUTA[11] := 0                         ' Clock FIFO RCLK to update empty flag state
    OUTA[11] := 1
    cmd := INA[7..0]                      ' Read FIFO word (may be invalid)
    if INA[14] == 1                       ' Check if empty flag is false
      screen.byte[write_address] := cmd   ' Write word to buffer
      write_address++                     ' Bump buffer position
      if write_address == 2000            ' Wrap buffer position
        write_address := 0

Here pin 11 is the FIFO read clock, pin 14 is the FIFO empty flag, and pins 0-7 are the FIFO data.
When I measure how long one pass of the polling loop takes, the case where there is no data in the FIFO takes a whopping 35.91 us, and the longer case where there is data to be written takes 60.41 us. It seems like the Prop is running much slower than expected.

Does this sound correct? Maybe I've misconfigured something and the cogs are not being clocked at the expected frequency.

The other CPU filling the FIFO only runs at 5 MHz (no internal multiplier like the Prop) and it can write data in just 5.4us. I assumed at 88 MHz the Prop could read data at a significantly higher rate.
Or is this normal for interpreted Spin code and I'd need to rewrite it in assembly to get better performance?
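
For scale, the measured times can be converted into system clocks. This is only a rough sketch: it assumes the PLL really is locked at the configured 16x, and the helper name `clocks_in` is made up for illustration.

```python
# Clock settings taken from the post: 5.5296 MHz on XI, 16x PLL.
XIN_HZ = 5_529_600
PLL_MULT = 16
SYSCLK_HZ = XIN_HZ * PLL_MULT  # 88_473_600 Hz (assumes the PLL is locked)


def clocks_in(microseconds):
    """Return the number of system clocks in the given time, rounded."""
    return round(microseconds * SYSCLK_HZ / 1_000_000)


empty_pass = clocks_in(35.91)    # measured loop time, FIFO empty
data_pass = clocks_in(60.41)     # measured loop time, byte stored
writer_period = clocks_in(5.4)   # time the other CPU needs per write

print(empty_pass, data_pass, writer_period)
```

That works out to roughly 3177 and 5345 clocks per pass of the Spin loop, against about 478 clocks between writes from the other CPU.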

Bean · 2015-03-27 16:51

Yeah, Spin is not fast enough to read the data in 5.4 us.

Bean

Peter Jakacki · 2015-03-27 16:53

cmd wrote: »

The other CPU filling the FIFO only runs at 5 MHz (no internal multiplier like the Prop) and it can write data in just 5.4us. I assumed at 88 MHz the Prop could read data at a significantly higher rate.
Or is this normal for interpreted Spin code and I'd need to rewrite it in assembly to get better performance?

You answered yourself in the last sentence. By the way, the cogs execute instructions at 1/4 of the clock speed, so 22 MIPS at the most in your case.
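
The 22 MIPS figure follows directly from the clock settings in the question. A quick sketch of the arithmetic, assuming the usual 4 clocks per instruction (hub accesses take longer, so this is an upper bound); the variable names are only illustrative:

```python
SYSCLK_HZ = 5_529_600 * 16     # 88_473_600 Hz from _xinfreq and pll16x
CLOCKS_PER_INSTRUCTION = 4     # typical PASM instruction; hub accesses take longer

# Peak instruction rate of one cog.
instructions_per_second = SYSCLK_HZ // CLOCKS_PER_INSTRUCTION

# How many such instructions fit into the 5.4 us the other CPU needs per write.
budget = 5.4e-6 * instructions_per_second

print(instructions_per_second)
print(int(budget))
```

So a PASM loop has on the order of a hundred instructions to spend per incoming byte, which is plenty for a loop this small.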

The PASM code for your loop would be very simple, but you should read the inputs only once per loop. As written there is a timing error: you read cmd and then read the inputs again just for the empty flag. You could read all 32 bits into cmd and test bit 14; the byte write would only use the 8 LSBs anyway. Your PASM loop could use a simple DJNZ to determine when it needs to wrap the buffer, and perhaps a WAITPNE/WAITPEQ on the "empty" flag, in which case the loop will be very much faster.
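
To make the "read once" idea concrete, here is a small model of the logic in Python rather than PASM. It is only a sketch: `store_if_valid` and the constant names are hypothetical, `pins` stands for a single 32-bit snapshot of INA, and a high level on pin 14 is taken to mean "not empty", as in the original Spin code.

```python
EMPTY_FLAG_BIT = 14   # pin 14: high means the FIFO word is valid (not empty)
BUFFER_SIZE = 2000    # characters in the screen buffer


def store_if_valid(pins, screen, write_address):
    """Handle one 32-bit pin snapshot and return the next write address.

    The flag and the data byte come from the same snapshot, so they
    cannot disagree the way two separate reads of the pins could.
    """
    if (pins >> EMPTY_FLAG_BIT) & 1:
        screen[write_address] = pins & 0xFF   # only the 8 LSBs are stored
        write_address += 1
        if write_address == BUFFER_SIZE:      # wrap at the end of the buffer
            write_address = 0
    return write_address


screen = bytearray(BUFFER_SIZE)
addr = store_if_valid((1 << EMPTY_FLAG_BIT) | 0x41, screen, 0)
print(addr, hex(screen[0]))
```

In PASM the same shape would be one read of INA into a register, a TEST against a bit-14 mask, and a WRBYTE of that register, with DJNZ handling the wrap.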

EDIT: I'm not quite sure whether your empty flag is also a ready flag (if it is, please ignore my timing remark). Also, the PASM loop may be too fast for the other CPU, so you may need to slow the Prop down.

cmd · 2015-03-27 17:32

Peter Jakacki wrote: »

1/4 of the clock speed, so 22 MIPS at the most in your case

Aha, that explains a lot! Thanks for your input -- I'll rewrite it in assembly with your suggestions in mind. Not sure where I got the idea all the cogs were running at full speed.

davidsaunders · 2015-03-30 05:02

cmd wrote: »

Aha, that explains a lot! Thanks for your input -- I'll rewrite it in assembly with your suggestions in mind. Not sure where I got the idea all the cogs were running at full speed.

All cogs are running at full speed. It just takes 4 clock cycles to execute one instruction.
