I added a level of command buffering to the streamer, so that you can feed it two initial commands without any delay. It then triggers an interrupt when the command buffer is empty, giving you time to feed it the next command before it runs dry. In other words, video via interrupts. This transfer-buffer-empty event can also be polled and waited for.
I also added another event to track when the streamer finishes and shuts off.
Having the streamer commands double-buffered like this makes it certain that we won't run into any timing pinch with SDRAM, since we can give it a command to effectively stall for some clocks before it executes the next command, which tells it to capture or output some number of words over the I/O pins. Not having a way to ensure some delay had been really bothering me.
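To make the double-buffering idea concrete, here is a toy host-side model in Python. It is only a sketch under stated assumptions: the class name, the event names, and the rule that a command fed to an idle streamer is promoted immediately (raising the buffer-empty event) are all illustrative guesses, not the actual hardware behavior.

```python
class Streamer:
    """Toy model (assumption): one executing command plus one buffered command."""

    def __init__(self):
        self.active = None     # [name, clocks_remaining] or None
        self.buffered = None   # the next command, or None
        self.events = []       # (clock, event_name) log
        self.clock = 0

    def _promote(self):
        # The buffered command starts executing, so the buffer is empty again.
        self.active, self.buffered = self.buffered, None
        self.events.append((self.clock, "buffer_empty"))

    def feed(self, name, clocks):
        """Try to queue a command; False means the buffer slot is still full."""
        if self.buffered is not None:
            return False
        self.buffered = [name, clocks]
        if self.active is None:
            self._promote()
        return True

    def tick(self):
        self.clock += 1
        if self.active is None:
            return
        self.active[1] -= 1
        if self.active[1] == 0:
            self.active = None
            if self.buffered is not None:
                self._promote()
            else:
                self.events.append((self.clock, "finished"))


def run(commands, total_clocks):
    """Feed commands (each lasting >= 1 clock) whenever the buffer has room."""
    s = Streamer()
    pending = list(commands)
    while pending and s.feed(*pending[0]):
        pending.pop(0)                  # two initial commands go in with no waiting
    for _ in range(total_clocks):
        s.tick()
        if pending and s.buffered is None:
            s.feed(*pending.pop(0))     # what an ISR or polling loop would do
    return s.events


print(run([("stall", 4), ("capture", 8), ("stall", 2)], 20))
```

In this model the stall command guarantees four clocks of delay before the capture command begins, and the final event marks the streamer shutting off once nothing is left in the buffer.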
Will it still work as it does now, with this as an addition or a mode?
Yeah OZ, the WAITVID was double buffered.
This is nice, in that the "driver" then, gets packed into some subroutine somewhere, and just runs. Overruns won't disturb the stream. This is more robust and easier to debug for video things for sure, but likely most other things too. When doing the character pixel data fetch on the 8x8 driver, I would lose sync when asking for too much during a stream. With this change, it should just not all get done and I can see that it wasn't all done. Nice.
And see? Those empty directive slots were empty the whole time, just waiting for the right "event"
It works just as before, but you can now feed it TWO instructions, initially, without any waiting. It's like always being one month ahead on your mortgage payment.
The cog that handles the SDRAM will probably do only that. It will stream data between hub RAM and I/O pins connected to SDRAM.
Sounds great. Can this all be tested on a P123?
Does this work for all streamer widths?
(I guess the smart pins will do 8b -> 4b DDR and 16b -> 8b DDR?)
I'm thinking of QuadSPI-DDR and the new 8b-wide PSRAM from Spansion/Micron/Winbond, etc.
For the highest speeds, I think those new parts have a CLKEcho design that gives a copy of the SCL back with the data.
That is used to clock in on read, for tighter timing tolerances; i.e. it is not fully async, but the Rx input is first sampled by the externally looped clock, with a few ns of phase delay.
I think at low bus speeds that clock echo can be ignored, but it is less clear at what MHz this starts to matter.
180nm will have significant delays. If a true external CLK-in is too complex, maybe you could at least pick up the read clock from the physical CLKOUT pin, to remove part-internal delays.
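One rough way to frame the "at what MHz does this start to matter" question: without a clock echo, the sampler has to tolerate the whole spread of round-trip delay, so the usable sampling window is the bit time minus that spread. The Python sketch below does that arithmetic. The delay figures are placeholders chosen for illustration only; they are not measurements of any memory part or of the 180nm pads.

```python
def bit_time_ns(clock_mhz, ddr=False):
    """Time one data bit stays on the bus; DDR moves a bit on each clock edge."""
    period = 1000.0 / clock_mhz
    return period / 2 if ddr else period


def sampling_window_ns(clock_mhz, delay_min_ns, delay_max_ns, ddr=False):
    """Window in which data is valid across the whole delay spread.

    delay_min_ns / delay_max_ns are hypothetical best-case and worst-case
    round-trip delays (clock out, device access, data back in).
    """
    return bit_time_ns(clock_mhz, ddr) - (delay_max_ns - delay_min_ns)


# Placeholder spread of 6..14 ns, for illustration only.
for mhz, ddr in ((50, False), (100, False), (100, True)):
    print(mhz, "MHz", "DDR" if ddr else "SDR",
          sampling_window_ns(mhz, 6.0, 14.0, ddr), "ns window")
```

A window that shrinks toward zero (or goes negative) is the point where a fixed internal sample phase stops working and an echoed clock, which travels with the data, starts to earn its pin.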
All the SDRAM stuff can be tested on the newer Prop123-A9 board, since it has the SDRAM.
The streamer handles 8/16/32-bit wide data, so no smart pins are needed there.
Hmmm.... data input hold time issues are possible with an external clock. Could be remedied, though, in the pad, itself.
And 4b?
What about DDR, where data is output on both clock edges?
I thought that would be a 2:1 MUX at the pin area, or is that better in the streamer, so the width is easier to manage in one place?
Chip has indicated he wants to include narrower data widths.
There ain't a clock on the I/O pins as it stands. I am intrigued as to what Chip has up his sleeve for clocking SDRAMs. Maybe it's just emulated in the data pattern, in which case DDR is as easy as SDR, I think.
Hmm. SW-based clocking rather defeats the idea of a streamer; it needs 4x data preparation, and can at most burst data at SysCLK/4 in DDR.
SW/data-based clocking also implies/requires a 9th or 17th bit, which the streamer does not currently support.
Gets messy and inefficient, very quickly.
There must be a related clock pin, in the SDRAM / QSPI / LCD Parallel Data Streaming plans.
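To show why a clock emulated in the data pattern needs an extra bit and extra data preparation, here is a small Python sketch of one possible scheme. It is an assumption for illustration, not anything Chip has described: each data byte is held for two output samples and the clock bit changes in the middle of the byte. Other schemes (and other expansion factors) are possible.

```python
CLOCK_BIT = 8  # hypothetical: clock carried as a 9th bit above 8 data bits


def expand_with_clock(data_bytes, ddr=False):
    """Return 9-bit pin samples: bit 8 is the emulated clock, bits 7..0 are data."""
    samples = []
    for i, d in enumerate(data_bytes):
        if ddr:
            first = i % 2         # clock level carried over from the previous byte
            second = 1 - first    # toggle mid-byte, so every edge sees stable data
        else:
            first, second = 0, 1  # rising edge mid-byte, back low for the next byte
        samples.append((first << CLOCK_BIT) | d)
        samples.append((second << CLOCK_BIT) | d)
    return samples


sdr = expand_with_clock([0xAA, 0x55])
ddr = expand_with_clock([0xAA, 0x55], ddr=True)
print([hex(v) for v in sdr])
print([hex(v) for v in ddr])
```

In this particular scheme every byte costs two samples in both modes, the emulated DDR clock repeats every four samples (SysCLK/4 at one sample per clock), and any sample with the clock high no longer fits in 8 bits.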
All the DDR chips signal at 1.8V. Only SDRAM works at 3.3V.
Smartpins will be able to output repeating signals, like clocks.
I got the streamer-empty interrupt proven with an interrupt-driven NTSC demo. The streamer ISR never takes more than several instructions, leaving lots of time for the main program:
'*******************************
'* NTSC 256 x 192 x 8bpp-lut *
'* Interrupt-driven *
'*******************************
CON
f_color = 3_579_545.0 'colorburst frequency
f_scanline = f_color / 227.5 'scanline frequency
f_pixel = f_scanline * 400.0 'pixel frequency for 400 pixels per scanline
f_clock = 80_000_000.0 'clock frequency
f_xfr = f_pixel / f_clock * float($7FFF_FFFF)
f_csc = f_color / f_clock * float($7FFF_FFFF) * 2.0
s = 84 'scales DAC output (s = 0..128)
r = s * 78 / 128 'adjusts for modulator expansion
mody = ((+38*s/128) & $FF) << 24 + ((+75*s/128) & $FF) << 16 + ((+15*s/128) & $FF) << 8 + (110*s/128 & $FF)
modi = ((+76*r/128) & $FF) << 24 + ((-35*r/128) & $FF) << 16 + ((-41*r/128) & $FF) << 8 + (100*s/128 & $FF)
modq = ((+27*r/128) & $FF) << 24 + ((-67*r/128) & $FF) << 16 + ((+40*r/128) & $FF) << 8 + 128
DAT org
'
'
' Setup
'
rdfast #0,##$1000-$400 'load .bmp palette into lut
rep @.end,#$100
rflong y
shl y,#8
wrlut y,x
add x,#1
.end
rdfast ##256*192/64,##$1000 'set rdfast to wrap on bitmap
setxfrq ##round(f_xfr) 'set transfer frequency
setcfrq ##round(f_csc) 'set colorspace converter frequency
setcy ##mody 'set colorspace converter coefficients
setci ##modi
setcq ##modq
setcmod #%11_1_0000 'set colorspace converter to YIQ mode (composite)
mov ijmp1,#field 'set up streamer-empty interrupt
setint1 #9
xcont #10,#0 'do streamer instruction to start interrupt sequence
'
'
' Main program
'
or dira,#1 'make p0 output
.loop xor outa,#1 'keep toggling p0
jmp #.loop
'
'
' Field loop via interrupts - issue next streamer command and then resume
'
field mov x,#27+192+35 'set blank+visible+blank lines
.line xcont m_bs,#1 'horizontal sync
resi1
xcont m_sn,#2
resi1
xcont m_bc,#1
resi1
xcont m_cb,c_cb
resi1
xcont m_bv,#1
resi1
cmp x,#27+192+1 wc 'blank line or visible line?
if_c cmpr x,#27 wc
if_nc xcont m_vi,#0 'blank line
if_c xcont m_rf,#0 'visible line
resi1
djnz x,#.line
mov x,#6 'high vertical syncs
.vlow xcont m_bs,#1
resi1
xcont m_hs,#2
resi1
xcont m_hl,#1
resi1
djnz x,#.vlow
mov x,#6 'low vertical syncs
.vhigh xcont m_bs,#1
resi1
xcont m_hl,#2
resi1
xcont m_hs,#1
resi1
djnz x,#.vhigh
mov x,#6 'high vertical syncs
.vlow2 xcont m_bs,#1
resi1
xcont m_hs,#2
resi1
xcont m_hl,#1
resi1
djnz x,#.vlow2
jmp #field 'loop
'
'
' Initialized data
'
m_bs long $CF000000+50 'before sync
m_sn long $CF000000+29 'sync
m_bc long $CF000000+7 'before colorburst
m_cb long $CF000000+18 'colorburst
m_bv long $CF000000+40 'before visible
m_vi long $CF000000+256 'visible
m_rf long $7F000000+256 'visible rflong 8bpp lut
m_hs long $CF000000+20 'vertical sync short
m_hl long $CF000000+130 'vertical sync long
c_cb long $507000_01 'colorburst reference color
c_vw long $FFFF00_00 'white
c_vb long $000000_00 'black
x res 1
y res 1
'
' Bitmap
'
orgh $1000 - $436 'justify pixels at $1000, palette at $1000-$400
file "bitmap.bmp"
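As a cross-check on the listing's timing constants, the arithmetic can be redone on a host in Python. This is a sketch that assumes the value added to each streamer command word (the +50, +29, and so on) is a count of pixel periods; that reading comes from the comments in the listing, not from any streamer documentation.

```python
F_COLOR = 3_579_545.0          # colorburst frequency in Hz, from the CON block
F_SCANLINE = F_COLOR / 227.5   # scanline frequency
F_PIXEL = F_SCANLINE * 400.0   # 400 pixel periods per scanline
F_CLOCK = 80_000_000.0         # system clock

# Pixel counts taken from the low part of each command long (assumed meaning).
LINE = {"m_bs": 50, "m_sn": 29, "m_bc": 7, "m_cb": 18, "m_bv": 40, "m_vi": 256}
HALF_LINE = {"m_bs": 50, "m_hs": 20, "m_hl": 130}

pixels_per_line = sum(LINE.values())
pixels_per_half_line = sum(HALF_LINE.values())

full_lines = 27 + 192 + 35     # blank + visible + blank
sync_half_lines = 6 + 6 + 6    # the three vertical-sync loops
lines_per_field = full_lines + sync_half_lines * pixels_per_half_line / pixels_per_line

field_rate_hz = F_SCANLINE / lines_per_field
clocks_per_pixel = F_CLOCK / F_PIXEL

print("pixels per line :", pixels_per_line)
print("lines per field :", lines_per_field)
print("field rate (Hz) :", round(field_rate_hz, 3))
print("clocks per pixel:", round(clocks_per_pixel, 3))
```

Under that assumption the horizontal segments add up to exactly the 400 pixel periods used for f_pixel, and each vertical-sync iteration is exactly half a line.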
I added resume-from-interrupt instructions (via PNut), which are like return-from-interrupts, except they update their ISR vector, in order to have execution resume at the next instruction in the ISR when the next interrupt occurs. Here are the RETI's and the RESI's:
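As an analogy only (this is not the instruction listing referred to above), the resume-from-interrupt behavior looks a lot like a coroutine: each interrupt runs the ISR up to its next resume point, and the following interrupt picks up right after it. A Python generator models that; the command names and the function names are made up for illustration.

```python
def field_isr(log):
    """Generator standing in for an ISR whose steps each end in a resume-from-interrupt."""
    while True:
        for cmd in ("before_sync", "sync", "before_burst"):
            log.append(cmd)   # stands in for issuing one streamer command
            yield             # stands in for RESI1: return, resume here next time


def simulate(interrupts):
    log = []
    isr = field_isr(log)
    for _ in range(interrupts):
        # The main program would run here, between interrupts.
        next(isr)             # interrupt fires: the ISR resumes where it left off
    return log


print(simulate(4))
```

This is why the NTSC demo's field loop can be written as straight-line code with a resume after every streamer command, instead of as a state machine.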
I've been working on the pixel mixer from Prop2-Hot, getting it adapted to the current chip. I generalized it, somewhat, so that it does operations on all 4 bytes in a long, straight across. It started out taking 281 ALM's on the Cyclone V, but I've got it down to only 71 ALM's, including 51 registers. It's quite small now, but does all the fun stuff for pixels, as well as configurable sum-of-products operations on the four byte sets between D and S.
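For readers unfamiliar with byte-lane operations, here is a Python sketch of the two kinds of operation mentioned: the same operation applied straight across all four bytes of a long, and a sum of products over the four byte pairs of D and S. The specific operations (a saturating add and an unsigned dot product) and the function names are assumptions for illustration; the mixer's actual modes are not described in this post.

```python
def lanes(x):
    """Split a 32-bit long into its four bytes, lowest byte first."""
    return [(x >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def pack(byte_values):
    """Reassemble four bytes (lowest first) into a 32-bit long."""
    return sum(b << (8 * i) for i, b in enumerate(byte_values))


def add_saturate(d, s):
    """Per-byte add that clamps at $FF instead of carrying into the next byte."""
    return pack(min(a + b, 0xFF) for a, b in zip(lanes(d), lanes(s)))


def sum_of_products(d, s):
    """Multiply matching bytes of D and S and add the four products together."""
    return sum(a * b for a, b in zip(lanes(d), lanes(s)))


print(hex(add_saturate(0xF0108001, 0x20109002)))
print(sum_of_products(0x01020304, 0x01010101))
```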
I'm taking Sunday off, but will be back on Monday morning. I see there's a lot of postings I've fallen behind on!
Comments
This will be in the next FPGA release.
In some ways, that should be similar to interfacing with LCD and other chips with parallel bus interface...
BTW: I seem to recall streamer command buffering from the P2-Hot days...
In the first post of the original "The New 16-Cog, 512KB, 64 analog I/O Propeller Chip" thread, in the place where we used to find the latest P2 pin-out definition, there is now a bluish "Attachment not found".
Referring to it, Chip stated:
"Here is the pin-out, as posted earlier in another thread:"
Until the missing link can be fixed, could someone point me to the original, or at least to another good copy of it?
Thanks in advance
Henrique
You are likely thinking of SDRAM DDR, but there are a number of 3.3V DDR solutions, starting with QuadSPI.
A good link with summaries and command comments is here:
http://www.spansion.com/Support/Application Notes/High_Density_SPI_Core_Command_Sets_AN.pdf
The new HyperBUS and Micron buses also use DDR, with 3.3V models.
See this thread for info and links on Spansion HyperFLASH and XTRMFlash from Micron:
http://forums.parallax.com/discussion/157695/spansion-hyperbus-for-p1-p1v-p2
These parts are essentially Dual QuadSPI-DDR, with a clock-feedback signal for delay closure at >100MHz(?) speeds.
That reminds me. In P2-hot, there was a MACC register. Will something similar be available in this iteration?