For comparison, I tried the good old way: Spin plus optimized PASM code for the LED string driver.
The PASM code uses the counter trick to speed up the transfer to 20 Mbit/sec. The RGB color data is still defined inside the Spin code, so this is not harder to use than the C or Forth versions.
I don't have such LED strings, so I couldn't test it. But on the scope the data and clock signals look good, and I have used code with the same timing to write to SPI RAMs. If it doesn't work with the LEDs, it may be too fast and need some delays here and there.
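The SCLK rate comes from counter B in NCO mode, whose output frequency is clkfreq × FRQB / 2^32. The small Python sketch below is only a host-side calculation, not Propeller code, and the helper names are made up for illustration. It checks that the `freq4` value $4000_0000 gives 20 MHz at the 80 MHz system clock set in the CON block (5 MHz crystal × pll16x), and what FRQB a half-speed clock would need. Lowering FRQB alone would probably not be enough, though: each `shl phsa,#1` takes 4 system clocks, which is exactly one SCLK period at 20 MHz, so at 10 MHz every shift (and the zero-byte `nop` run) would also need one extra `nop` to keep data and clock aligned. That is an untested suggestion, and the phase should be rechecked on the scope.

```python
# Hypothetical helper names; plain arithmetic for the NCO counter mode.
SYS_CLOCK_HZ = 5_000_000 * 16   # _xinfreq = 5_000_000 with pll16x -> 80 MHz

def nco_frq(target_hz: int, clkfreq_hz: int = SYS_CLOCK_HZ) -> int:
    """FRQx value for an NCO-mode counter: f_out = clkfreq * FRQx / 2**32."""
    return (target_hz << 32) // clkfreq_hz   # integer math, truncates

def nco_hz(frq: int, clkfreq_hz: int = SYS_CLOCK_HZ) -> float:
    """Output frequency of an NCO-mode counter for a given FRQx value."""
    return clkfreq_hz * frq / 2**32

print(hex(nco_frq(20_000_000)))          # 0x40000000 -> the freq4 constant
print(hex(nco_frq(10_000_000)))          # 0x20000000 -> half speed
print(nco_hz(0x4000_0000) / 1e6, "MHz")  # 20.0 MHz
```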
The benchmark result for this Spin+PASM version is 388 µs (microseconds, not milliseconds!) for updating all 264 LED chips with one color, including the zero byte at the beginning.
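Where do the 388 µs go? A rough cycle count of the PASM listing, done here as plain Python arithmetic, puts the cog's own transfer time at about 370 µs, assuming 4 clocks for every non-hub instruction at 80 MHz (and 8 clocks for the final `djnz` that falls through). By that estimate the remaining ~18 µs would be hub access and the Spin side of the measurement (the CNT reads and the `repeat until` polling loop), but that split is a guess, not a measurement. The count also shows that the per-LED loop overhead brings the average rate down to about 17 Mbit/s, even though each 24-bit burst runs at 20 Mbit/s.

```python
# Rough cycle count of the PASM listing (an estimate, not a measurement).
CLKFREQ = 80_000_000     # 5 MHz crystal x pll16x
CLOCKS_PER_INSTR = 4     # ordinary (non-hub) PASM instructions
NUM_LEDS = 264           # numLEDs in the CON block
BITS_PER_LED = 24

# One pass through `leds`: mov phsa, mov phsb, mov ctrb, 23 x shl phsa,
# mov ctrb,#0 and djnz (4 clocks when it jumps back).
instr_per_led = 3 + 23 + 1 + 1
clocks_per_led = instr_per_led * CLOCKS_PER_INSTR

# Before the loop: mov lcnt, shl t1, mov phsa, mov phsb, mov ctrb,
# 7 x nop, mov ctrb,#0 -> 13 instructions (this sends the zero byte).
header_clocks = 13 * CLOCKS_PER_INSTR

# The last djnz falls through and takes 8 clocks instead of 4.
total_clocks = header_clocks + NUM_LEDS * clocks_per_led + 4

print(f"{clocks_per_led} clocks per LED")
print(f"transfer time ~ {total_clocks / CLKFREQ * 1e6:.1f} us")
print(f"average rate  ~ {BITS_PER_LED * CLKFREQ / clocks_per_led / 1e6:.1f} Mbit/s")
```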
This is over 3600 times faster than the PropForth version in the first post and about 8 times faster than the fastest candidate so far.
Here is the code:
CON
  _clkmode = xtal1 + pll16x             '5 MHz crystal x 16 = 80 MHz
  _xinfreq = 5_000_000

  LEDdata  = 0                          'data pin
  LEDclock = 1                          'clock pin
  numLEDs  = 264                        'number of LED chips (max 511, used as an immediate)

OBJ
  term : "FullDuplexSerial"

VAR
  long rgbdata                          'mailbox shared with the PASM cog

PUB Main
  term.start(31,30,0,115200)
  cognew(@fspi,@rgbdata)                'start the LED cog, PAR = @rgbdata
  waitcnt(clkfreq*2 + cnt)              'give the cog (and the terminal) time to start
  result := CNT
  rgbdata := $808080                    'send one color to all LEDs
  repeat until rgbdata == 0             'the cog clears rgbdata when done
  result := CNT - result
  rgbdata := $80FFFF                    'another color
  waitcnt(clkfreq + cnt)                'some delay
  rgbdata := $FF80FF                    'another color
  repeat until rgbdata == 0
  term.str(string(13,"time = "))
  term.dec(result/80)                   'clock ticks -> us at 80 MHz
  term.str(string(" us",13))
  repeat

DAT
fspi    mov     dira,outmask
        mov     outa,#0
        mov     ctra,modeMux            'NCO mode: PHSA[31] drives LEDdata
        mov     frqb,freq4              'SCLK = sysclock/4 = 20 MHz
loop    mov     t1,#0
        wrlong  t1,par                  'clear mailbox = ready for next color
waitd   rdlong  t1,par  wz              'wait for a non-zero RGB value
  if_z  jmp     #waitd
        mov     lcnt,#numLEDs           'number of LED chips
        shl     t1,#8                   'move the 24 RGB bits to bits 31..8
        mov     phsa,#0                 'ctra for mux+shiftreg
        mov     phsb,#0                 'ctrb for 20MHz sclk
        mov     ctrb,modeClk            'start clock
        nop                             '$00 at begin (8 bits)
        nop
        nop
        nop
        nop
        nop
        nop
        mov     ctrb,#0                 'stop clock
leds    mov     phsa,t1                 'send the RGB bytes (24 bits)
        mov     phsb,#0
        mov     ctrb,modeClk
        shl     phsa,#1                 'next bit to PHSA[31], 23 times
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        shl     phsa,#1
        mov     ctrb,#0                 'stop clock
        djnz    lcnt,#leds              'repeat for all LED chips
        jmp     #loop

modeClk long    %00100<<26 + LEDclock   'counter mode for Clock
modeMux long    %00100<<26 + LEDdata    'counter mode for shiftreg/Mux
freq4   long    $4000_0000              'frequency = sysclock/4
outmask long    1<<LEDclock | 1<<LEDdata
t1      res     1
lcnt    res     1
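Why the shift trick works: counter A runs in NCO mode, and assuming FRQA is 0 (the code never writes it), PHSA never advances on its own, so its bit 31 simply drives the LEDdata pin. After `shl t1,#8` the 24 RGB bits sit in bits 31..8, `mov phsa,t1` puts the first bit on the pin, and each `shl phsa,#1` moves the next one into bit 31, one instruction (4 clocks) per bit, matching the 20 MHz clock from counter B. The Python function below is only a host-side model of that data path (the function name is made up for illustration). It shows that the bits go out MSB first and that anything above the low 24 bits of rgbdata is dropped.

```python
# Hypothetical host-side model of the `leds` data path (not Propeller code).
MASK32 = 0xFFFF_FFFF

def leds_loop_bits(rgbdata: int) -> list[int]:
    """LEDdata level in each SCLK period for one pass through `leds`.

    Assumes FRQA == 0, so PHSA only changes through the shl instructions,
    and NCO mode, so the pin shows PHSA[31].
    """
    phsa = (rgbdata << 8) & MASK32     # shl t1,#8 ; mov phsa,t1
    bits = [phsa >> 31]                # first bit is on the pin right away
    for _ in range(23):                # 23 x shl phsa,#1
        phsa = (phsa << 1) & MASK32
        bits.append(phsa >> 31)
    return bits

bits = leds_loop_bits(0xFF80FF)
print("".join(map(str, bits)))         # 111111111000000011111111
```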
You just write the desired RGB data to the rgbdata variable. The PASM cog notices it, sends the data to the LEDs, and then clears the variable. You can wait in Spin until the variable is zero, but normally your code has a delay longer than 388 µs after a color change anyway.
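The handshake is a one-long mailbox in hub RAM: PAR points at rgbdata, and the cog clears it, polls until it is non-zero, sends, and clears it again. Below is a hypothetical Python model of that protocol, with a thread standing in for the cog (the class and function names are made up; it is not Propeller code). Two properties of the design show up in it. First, 0 is the "idle" marker, so a color of exactly $000000 can never be sent. Second, the cog clears the mailbox only after it has finished sending, so a value written during a transfer would be overwritten and lost. That is why the Spin code either waits for zero or waits long enough (the 1 s waitcnt) before writing the next color.

```python
# Hypothetical model of the rgbdata mailbox handshake (not Propeller code).
import threading
import time

class Mailbox:
    """Stands in for the single hub long `rgbdata` whose address is in PAR."""
    def __init__(self) -> None:
        self.rgbdata = 0

def cog_driver(box: Mailbox, sent: list, ready: threading.Event,
               stop: threading.Event) -> None:
    """Model of the PASM loop: clear, poll until non-zero, send, repeat."""
    while not stop.is_set():
        box.rgbdata = 0                  # loop: wrlong t1,par with t1 = 0
        ready.set()
        value = 0
        while value == 0:                # waitd: rdlong t1,par wz / if_z jmp
            if stop.is_set():
                return
            value = box.rgbdata
            time.sleep(0)                # yield; the real cog just spins
        sent.append(value)               # stands in for shifting out all LEDs

def set_color(box: Mailbox, rgb: int) -> None:
    """Spin side: write the color, then `repeat until rgbdata == 0`."""
    box.rgbdata = rgb
    while box.rgbdata != 0:
        time.sleep(0)

box, sent = Mailbox(), []
ready, stop = threading.Event(), threading.Event()
cog = threading.Thread(target=cog_driver, args=(box, sent, ready, stop), daemon=True)
cog.start()
ready.wait()                             # like the 2 s waitcnt before the first write
set_color(box, 0x808080)
set_color(box, 0xFF80FF)
stop.set()
cog.join()
print([hex(c) for c in sent])            # ['0x808080', '0xff80ff']
```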
Andy
I just knew that someone would pull this trick out of the hat, and it had to be you, Andy.
The Tachyon time is 2.66 ms, so your PASM counter code is almost 7 times faster. I know that I could code it this way too, but the challenge was to do it in an HLL. I know my version could be faster if I used the counter method, but I'm interested in running more than snippets of code, which can always seem to perform well when they "hog a cog". Still, your counter code would of course make a good object for this device, and when I get a round TUIT I might incorporate it into my resident run-time OBEX, which will live in upper EEPROM or on SD. The idea there is that I can load objects at run time (or is that fun time?) and reuse cogs as needed.