The New 16-Cog, 512KB, 64 analog I/O Propeller Chip - Part 2 - Page 18 — Parallax Forums


Comments

  • rjo__ Posts: 2,114
    wow
  • Cool beans! Now we will have something for both schools of debug. And it seems lean and unobtrusive.

  • cgracey Posts: 14,241
    I added a level of command buffering to the streamer, so that you can feed it two initial commands without any delays, and then trigger an interrupt when the command buffer is empty, giving you time to feed it the next command before it runs dry. In other words, video via interrupts. This transfer-buffer-empty event can also be polled and waited for.

    I also added another event to track when the streamer finishes and shuts off.

    Having the streamer commands double-buffered like this makes it certain that we won't run into any timing pinch with SDRAM, since we can give it a command to effectively stall for some clocks before it executes the next command, which tells it to capture or output some number of words over the I/O pins. That we didn't have a way to ensure some delay had been really bothering me.

    This will be in the next FPGA release.
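
A rough host-side model of the double-buffering described above may help picture it. This is only a sketch in Python, not the hardware: the `Streamer` class, the `run` helper, and the event names are made up for illustration, and it assumes the buffer-empty event fires at the moment the buffered command is taken up for execution.

```python
class Streamer:
    """Toy model: one executing command plus a one-deep command buffer."""

    def __init__(self):
        self.remaining = 0    # clocks left in the executing command
        self.pending = None   # buffered command (its duration in clocks)
        self.clock = 0
        self.log = []         # (clock, event name) pairs

    def _load(self):
        # Take the buffered command if the streamer is idle (assumed event point).
        if self.remaining == 0 and self.pending is not None:
            self.remaining, self.pending = self.pending, None
            self.log.append((self.clock, "buffer_empty"))

    def issue(self, clocks):
        if self.pending is not None:
            raise RuntimeError("command buffer full")
        self.pending = clocks
        self._load()

    def tick(self):
        self.clock += 1
        if self.remaining:
            self.remaining -= 1
            if self.remaining == 0:
                if self.pending is None:
                    self.log.append((self.clock, "finished"))
                self._load()


def run(durations):
    """Issue two commands up front, then one more per buffer-empty event."""
    s = Streamer()
    queue = list(durations)      # needs at least two positive durations
    s.issue(queue.pop(0))        # starts executing at once
    s.issue(queue.pop(0))        # waits in the buffer, no delay for the caller
    seen = len(s.log)
    while s.remaining or s.pending is not None:
        s.tick()
        for _, name in s.log[seen:]:          # stands in for the ISR
            if name == "buffer_empty" and queue:
                s.issue(queue.pop(0))
        seen = len(s.log)
    return s.log


print(run([3, 2, 4]))
```

In this model a "stall" is just another command whose duration is the number of clocks to wait, so the SDRAM delay mentioned above would be one entry in the list ahead of the transfer command.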
  • Very nice work Chip!
  • Sounds like you're making huge progress! Just be careful not to blow your cerebral cortex.
  • rjo__ Posts: 2,114
    SDRAM:)
  • Rayman Posts: 14,803
    edited 2015-11-12 16:37
    I was curious if SDRAM was going to be used with streamer... Interested to see how that is going to work.

    In some ways, that should be similar to interfacing with LCD and other chips with parallel bus interface...

    BTW: I seem to recall streamer command buffering from the P2-Hot days...
  • potatohead Posts: 10,261
    edited 2015-11-12 17:06
    That is great Chip!

    Will it work as it does now, and this is an addition or mode?

    Yeah OZ, the WAITVID was double buffered.

    This is nice, in that the "driver" then, gets packed into some subroutine somewhere, and just runs. Overruns won't disturb the stream. This is more robust and easier to debug for video things for sure, but likely most other things too. When doing the character pixel data fetch on the 8x8 driver, I would lose sync when asking for too much during a stream. With this change, it should just not all get done and I can see that it wasn't all done. Nice.

    And see? Those empty directive slots were empty the whole time, just waiting for the right "event"
  • Rayman Posts: 14,803
    I recently saw an lcd driver chip that used external sdram for screen buffers. Would be great if that works here...
  • cgracey Posts: 14,241
    potatohead wrote: »
    That is great Chip!

    Will it work as it does now, and this is an addition or mode?

    Yeah OZ, the WAITVID was double buffered.

    This is nice, in that the "driver" then, gets packed into some subroutine somewhere, and just runs. Overruns won't disturb the stream. This is more robust and easier to debug for video things for sure, but likely most other things too. When doing the character pixel data fetch on the 8x8 driver, I would lose sync when asking for too much during a stream. With this change, it should just not all get done and I can see that it wasn't all done. Nice.

    And see? Those empty directive slots were empty the whole time, just waiting for the right "event"

    It works just as before, but you can now feed it TWO instructions, initially, without any waiting. It's like always being one month ahead on your mortgage payment.
  • Cool. That's easy, and takes response pressure off the ISR, if used.
  • cgracey Posts: 14,241
    Rayman wrote: »
    I was curious if SDRAM was going to be used with streamer... Interested to see how that is going to work.

    In some ways, that should be similar to interfacing with LCD and other chips with parallel bus interface...

    BTW: I seem to recall streamer command buffering from the P2-Hot days...

    The cog that handles the SDRAM will probably do only that. It will stream data between hub RAM and I/O pins connected to SDRAM.
  • jmg Posts: 15,183
    cgracey wrote: »
    Having the streamer commands double-buffered like this makes it certain that we won't run into any timing pinch with SDRAM, since we can give it a command to effectively stall for some clocks before it executes the next command, which tells it to capture or output some number of words over the I/O pins.
    ....
    cgracey wrote: »
    The cog that handles the SDRAM will probably do only that. It will stream data between hub RAM and I/O pins connected to SDRAM.

    Sounds great - this can all be tested on a P123 ?

    Does this work to all streamer widths ?
    ( I guess the smart pins will do
    8b -> 4b DDR & 16b -> 8b DDR ?)

    I'm thinking of QuadSPI-DDR and the new 8b wide PSRAM from Spansion/Micron/Winbond etc

    For highest speeds, I think those new parts have a CLKEcho design that gives a copy of the SCL back with the data.
    That is used to Clock-in on Read, for tighter timing tolerances. ie it is not fully async, but the Rx in is first sampled by the externally looped clock with a few ns phase delay.

    I think at low bus speeds, that clock echo can be ignored, but less clear is at what MHz does this start to matter ?

    180nm will have significant delays, maybe if a true external CLK-in is too complex, you could at least pick up the read clock from the physical CLKOUT pin, to remove part-internal delays.


  • cgracey Posts: 14,241
    edited 2015-11-12 19:58
    jmg wrote: »
    cgracey wrote: »
    Having the streamer commands double-buffered like this makes it certain that we won't run into any timing pinch with SDRAM, since we can give it a command to effectively stall for some clocks before it executes the next command, which tells it to capture or output some number of words over the I/O pins.
    ....
    cgracey wrote: »
    The cog that handles the SDRAM will probably do only that. It will stream data between hub RAM and I/O pins connected to SDRAM.

    Sounds great - this can all be tested on a P123 ?

    Does this work to all streamer widths ?
    ( I guess the smart pins will do
    8b -> 4b DDR & 16b -> 8b DDR ?)

    I'm thinking of QuadSPI-DDR and the new 8b wide PSRAM from Spansion/Micron/Winbond etc

    For highest speeds, I think those new parts have a CLKEcho design that gives a copy of the SCL back with the data.
    That is used to Clock-in on Read, for tighter timing tolerances. ie it is not fully async, but the Rx in is first sampled by the externally looped clock with a few ns phase delay.

    I think at low bus speeds, that clock echo can be ignored, but less clear is at what MHz does this start to matter ?

    180nm will have significant delays, maybe if a true external CLK-in is too complex, you could at least pick up the read clock from the physical CLKOUT pin, to remove part-internal delays.


    All the SDRAM stuff can be tested on the newer Prop123-A9 board, since it has the SDRAM.

    The streamer handles 8/16/32-bit wide data, so no smart pins are needed there.

    Hmmm.... data input hold time issues are possible with an external clock. Could be remedied, though, in the pad, itself.
  • Nice one Chip, this should be a great addition, especially for helping learners who are new to the chip!
  • jmg Posts: 15,183
    cgracey wrote: »
    jmg wrote: »
    ( I guess the smart pins will do
    8b -> 4b DDR & 16b -> 8b DDR ?)

    The streamer handles 8/16/32-bit wide data, so no smart pins are needed there.

    and 4b ?

    What about DDR, where data is output on both clock edges ?
    I thought that would be a 2:1 MUX at the pin-area, or is that better in the streamer, so the width is easier to manage in one place ?

  • evanh Posts: 16,088
    edited 2015-11-12 22:13
    Chip has indicated he wants to include narrower data widths.

    There ain't a clock on the I/O pins as it stands. I am intrigued as to what Chip has up his sleeve for clocking SDRAMs. Maybe just emulated in the data pattern, in which case DDR is as easy as SDR, I think.
  • jmg Posts: 15,183
    evanh wrote: »
    Chip has indicated he wants to include narrower data widths.
    I know, & I thought 4b was already there, for output at least, and I thought 4b in was 'coming' ?
    evanh wrote: »
    There ain't a clock on the I/O pins as it stands. I am intrigued as to what Chip has up his sleeve for clocking SDRAMs. Maybe just emulated in the data pattern, in which case DDR is as easy as SDR, I think.
    Hmm. SW-based clocking rather defeats the idea of a streamer, needs 4x data preparation, and can at most burst data at SysCLK/4 in DDR.

    SW/data-based clocking also implies/requires a 9th or 17th bit, which the streamer does not currently support.
    Gets messy and inefficient, very quickly.

    There must be a related clock pin, in the SDRAM / QSPI / LCD Parallel Data Streaming plans.


  • cgracey Posts: 14,241
    jmg wrote: »
    cgracey wrote: »
    jmg wrote: »
    ( I guess the smart pins will do
    8b -> 4b DDR & 16b -> 8b DDR ?)

    The streamer handles 8/16/32-bit wide data, so no smart pins are needed there.

    and 4b ?

    What about DDR, where data is output on both clock edges ?
    I thought that would be a 2:1 MUX at the pin-area, or is that better in the streamer, so the width is easier to manage in one place ?

    All the DDR chips signal at 1.8V. Only SDRAM works at 3.3V.
  • cgracey Posts: 14,241
    jmg wrote: »
    evanh wrote: »
    Chip has indicated he wants to include narrower data widths.
    I know, & I thought 4b was already there, for output at least, and I thought 4b in was 'coming' ?
    evanh wrote: »
    There ain't a clock on the I/O pins as it stands. I am intrigued as to what Chip has up his sleeve for clocking SDRAMs. Maybe just emulated in the data pattern, in which case DDR is as easy as SDR, I think.
    Hmm. SW-based clocking rather defeats the idea of a streamer, needs 4x data preparation, and can at most burst data at SysCLK/4 in DDR.

    SW/data-based clocking also implies/requires a 9th or 17th bit, which the streamer does not currently support.
    Gets messy and inefficient, very quickly.

    There must be a related clock pin, in the SDRAM / QSPI / LCD Parallel Data Streaming plans.


    Smartpins will be able to output repeating signals, like clocks.

  • cgracey Posts: 14,241
    edited 2015-11-12 23:24
    I got the streamer-empty interrupt proven with an interrupt-driven NTSC demo. The streamer ISR never takes more than several instructions, leaving lots of time for the main program:
    '*******************************
    '*  NTSC 256 x 192 x 8bpp-lut  *
    '*      Interrupt-driven       *
    '*******************************
    
    CON
    
      f_color	= 3_579_545.0		'colorburst frequency
      f_scanline	= f_color / 227.5	'scanline frequency
      f_pixel	= f_scanline * 400.0	'pixel frequency for 400 pixels per scanline
    
      f_clock	= 80_000_000.0		'clock frequency
    
      f_xfr		= f_pixel / f_clock * float($7FFF_FFFF)
      f_csc		= f_color / f_clock * float($7FFF_FFFF) * 2.0
    
      s		= 84			'scales DAC output (s = 0..128)
      r		= s * 78 / 128		'adjusts for modulator expansion
    
      mody		= ((+38*s/128) & $FF) << 24 + ((+75*s/128) & $FF) << 16 + ((+15*s/128) & $FF) << 8 + (110*s/128 & $FF)
      modi		= ((+76*r/128) & $FF) << 24 + ((-35*r/128) & $FF) << 16 + ((-41*r/128) & $FF) << 8 + (100*s/128 & $FF)
      modq		= ((+27*r/128) & $FF) << 24 + ((-67*r/128) & $FF) << 16 + ((+40*r/128) & $FF) << 8 + 128
    
    
    DAT		org
    '
    '
    ' Setup
    '
    		rdfast	#0,##$1000-$400		'load .bmp palette into lut
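    		rep	@.end,#$100		'repeat the block up to .end 256 times
    		rflong	y			'read next palette long via the fast hub read
    		shl	y,#8			'shift the palette entry left by 8 bits
    		wrlut	y,x			'write it to lut entry x
    		add	x,#1			'advance to the next lut entry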
    .end
    		rdfast	##256*192/64,##$1000	'set rdfast to wrap on bitmap
    
    		setxfrq ##round(f_xfr)		'set transfer frequency
    		setcfrq	##round(f_csc)		'set colorspace converter frequency
    
    		setcy	##mody			'set colorspace converter coefficients
    		setci	##modi
    		setcq	##modq
    
    		setcmod	#%11_1_0000		'set colorspace converter to YIQ mode (composite)
    
    		mov	ijmp1,#field		'set up streamer-empty interrupt
    		setint1	#9
    
    		xcont	#10,#0			'do streamer instruction to start interrupt sequence
    '
    '
    ' Main program
    '
    		or	dira,#1			'make p0 output
    
    .loop		xor	outa,#1			'keep toggling p0
    		jmp	#.loop
    '
    '
    ' Field loop via interrupts - issue next streamer command and then resume
    '
    field		mov	x,#27+192+35		'set blank+visible+blank lines
    .line		xcont	m_bs,#1			'horizontal sync
    		resi1
    		xcont	m_sn,#2
    		resi1
    		xcont	m_bc,#1
    		resi1
    		xcont	m_cb,c_cb
    		resi1
    		xcont	m_bv,#1
    		resi1
    		cmp	x,#27+192+1	wc	'blank line or visible line?
    	if_c	cmpr	x,#27		wc
    	if_nc	xcont	m_vi,#0			'blank line
    	if_c	xcont	m_rf,#0			'visible line
    		resi1
    		djnz	x,#.line
    
    		mov	x,#6			'high vertical syncs
    .vlow		xcont	m_bs,#1
    		resi1
    		xcont	m_hs,#2
    		resi1
    		xcont	m_hl,#1
    		resi1
    		djnz	x,#.vlow
    
    		mov	x,#6			'low vertical syncs
    .vhigh		xcont	m_bs,#1
    		resi1
    		xcont	m_hl,#2
    		resi1
    		xcont	m_hs,#1
    		resi1
    		djnz	x,#.vhigh
    
    		mov	x,#6			'high vertical syncs
    .vlow2		xcont	m_bs,#1
    		resi1
    		xcont	m_hs,#2
    		resi1
    		xcont	m_hl,#1
    		resi1
    		djnz	x,#.vlow2
    
    		jmp	#field			'loop
    '
    '
    ' Initialized data
    '
    m_bs		long	$CF000000+50		'before sync
    m_sn		long	$CF000000+29		'sync
    m_bc		long	$CF000000+7		'before colorburst
    m_cb		long	$CF000000+18		'colorburst
    m_bv		long	$CF000000+40		'before visible
    m_vi		long	$CF000000+256		'visible
    
    m_rf		long	$7F000000+256		'visible rflong 8bpp lut
    
    m_hs		long	$CF000000+20		'vertical sync short
    m_hl		long	$CF000000+130		'vertical sync long
    
    c_cb		long	$507000_01		'colorburst reference color
    c_vw		long	$FFFF00_00		'white
    c_vb		long	$000000_00		'black
    
    x		res	1
    y		res	1
    
    '
    ' Bitmap
    '
    		orgh	$1000 - $436	'justify pixels at $1000, palette at $1000-$400
    		file	"bitmap.bmp"
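
The timing constants in the listing above can be sanity-checked on a PC. The following is only a sketch in Python, not part of the demo: it assumes the number added to each `m_*` command long is that segment's length in pixel clocks, which matches the listing's "400 pixels per scanline" comment. All Python names are made up for illustration.

```python
# Constants copied from the CON section of the listing.
F_COLOR = 3_579_545.0            # colorburst frequency, Hz
F_SCANLINE = F_COLOR / 227.5     # scanline frequency, Hz
F_PIXEL = F_SCANLINE * 400.0     # pixel frequency, Hz
F_CLOCK = 80_000_000.0           # system clock, Hz

# Segment lengths taken from the m_* longs (assumed to be pixel counts).
LINE_SEGMENTS = {"m_bs": 50, "m_sn": 29, "m_bc": 7,
                 "m_cb": 18, "m_bv": 40, "m_vi": 256}
HALF_LINE_SEGMENTS = {"m_bs": 50, "m_hs": 20, "m_hl": 130}

pixels_per_line = sum(LINE_SEGMENTS.values())
pixels_per_half_line = sum(HALF_LINE_SEGMENTS.values())

# The field loop runs 27+192+35 full lines, then three groups of six half lines.
full_lines = 27 + 192 + 35
half_lines = 3 * 6
lines_per_field = full_lines + half_lines * pixels_per_half_line / pixels_per_line
field_rate = F_SCANLINE / lines_per_field

# System clocks available while the shortest segment is being streamed.
clocks_per_pixel = F_CLOCK / F_PIXEL
shortest_clocks = min(LINE_SEGMENTS.values()) * clocks_per_pixel

print(pixels_per_line, pixels_per_half_line, lines_per_field)
print(round(field_rate, 3), round(shortest_clocks, 1))
```

Under those assumptions the segments add up to 400 pixels per line and 200 per half line, a field comes to 263 lines at roughly 59.8 fields per second, and the shortest segment (`m_bc`, 7 pixels) lasts about 89 system clocks.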
    

    I added resume-from-interrupt instructions (via PNut), which are like return-from-interrupts, except they update their ISR vector, in order to have execution resume at the next instruction in the ISR when the next interrupt occurs. Here are the RETI's and the RESI's:
    RETI0                           =       CALLD   INB,INB     WC,WZ
    RETI1                           =       CALLD   INB,$1F5    WC,WZ
    RETI2                           =       CALLD   INB,$1F3    WC,WZ
    RETI3                           =       CALLD   INB,$1F1    WC,WZ
    
    RESI0                           =       CALLD   INA,INB     WC,WZ
    RESI1                           =       CALLD   $1F4,$1F5   WC,WZ
    RESI2                           =       CALLD   $1F2,$1F3   WC,WZ
    RESI3                           =       CALLD   $1F0,$1F1   WC,WZ
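
To see what the difference between RETI and RESI means for the NTSC demo, here is a small model of the vector update. It is a hedged sketch in Python, not PASM: the `Cog` class and the tuple-based "ISR program" are invented for illustration, and `ijmp`/`iret` merely stand in for the $1F4/$1F5 registers used by RESI1.

```python
class Cog:
    """Toy model of one interrupt vector (ijmp) and its return register (iret)."""

    def __init__(self, isr_entry):
        self.ijmp = isr_entry   # where the next interrupt enters the ISR
        self.iret = None        # where the interrupted code resumes
        self.issued = []        # streamer commands issued so far

    def interrupt(self, main_pc, isr):
        """Take one interrupt and return the address the main code resumes at."""
        self.iret = main_pc
        pc = self.ijmp
        while True:
            op, arg = isr[pc]
            if op == "resi":
                self.ijmp = pc + 1   # resume: save the next ISR address in the vector
                return self.iret
            if op == "reti":
                return self.iret     # plain return: vector is left unchanged
            if op == "jmp":
                pc = arg
                continue
            self.issued.append(arg)  # "xcont": hand one command to the streamer
            pc += 1


def build_isr(ret_op):
    # A cut-down version of the field loop: three commands, then loop.
    return [("xcont", "m_bs"), (ret_op, None),
            ("xcont", "m_sn"), (ret_op, None),
            ("xcont", "m_bc"), (ret_op, None),
            ("jmp", 0)]


with_resi = Cog(isr_entry=0)
with_reti = Cog(isr_entry=0)
for _ in range(4):
    with_resi.interrupt(main_pc=100, isr=build_isr("resi"))
    with_reti.interrupt(main_pc=100, isr=build_isr("reti"))

print(with_resi.issued)   # walks through the sequence
print(with_reti.issued)   # restarts at the top every time
```

With the resume form, each interrupt picks up where the previous one left off, so the ISR can be written as straight-line code; with the plain return form, every interrupt would start again at the first command.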
    
  • Yanomani Posts: 1,524
    edited 2015-11-13 02:19
    Hi

    At the first post of the original "The New 16-Cog, 512KB, 64 analog I/O Propeller Chip", in the place we used to find the latest P2's pin-out definition, now there is a bluish "Attachment not found".

    Referring to it, Chip has stated:

    "Here is the pin-out, as posted earlier in another thread:"

    Until the missing link can be fixed, could someone point me to the original, or at least to another good copy of it?

    Thanks in advance

    Henrique
  • jmg Posts: 15,183
    edited 2015-11-13 02:18
    cgracey wrote: »
    jmg wrote: »
    What about DDR, where data is output on both clock edges ?
    I thought that would be a 2:1 MUX at the pin-area, or is that better in the streamer, so the width is easier to manage in one place ?

    All the DDR chips signal at 1.8V. Only SDRAM works at 3.3V.

    You are likely thinking of DDR SDRAM, but there are a number of 3.3V DDR solutions, starting with QuadSPI.
    A good link with summaries and command comments is here:

    http://www.spansion.com/Support/Application Notes/High_Density_SPI_Core_Command_Sets_AN.pdf

    The new HyperBUS and Micron buses also use DDR, with 3.3V models:

    See this thread for info and links on Spansion HyperFLASH, & XTRMFlash from Micron
    http://forums.parallax.com/discussion/157695/spansion-hyperbus-for-p1-p1v-p2

    These parts are essentially Dual QuadSPI.DDR, with a Clock-feed-back signal for delay closure at > 100MHz? speeds.
  • cgracey Posts: 14,241
    I've been working on the pixel mixer from Prop2-Hot, getting it adapted to the current chip. I generalized it, somewhat, so that it does operations on all 4 bytes in a long, straight across. It started out taking 281 ALM's on the Cyclone V, but I've got it down to only 71 ALM's, including 51 registers. It's quite small now, but does all the fun stuff for pixels, as well as configurable sum-of-products operations on the four byte sets between D and S.

    I'm taking Sunday off, but will be back on Monday morning. I see there's a lot of postings I've fallen behind on!
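
For anyone wondering what "operations on all 4 bytes in a long, straight across" and "sum-of-products" could look like, here is a small sketch in Python. The post does not spell out the mixer's actual operations, so the saturating add and the unsigned dot product below are stand-ins chosen for illustration, and every function name is made up.

```python
def lanes(value):
    """Split a 32-bit long into four bytes, least significant byte first."""
    return [(value >> (8 * i)) & 0xFF for i in range(4)]


def pack(byte_values):
    """Pack four bytes (least significant first) back into a 32-bit long."""
    return sum((b & 0xFF) << (8 * i) for i, b in enumerate(byte_values))


def add_saturate(d, s):
    """Add the four byte lanes of D and S independently, clamping each at 255."""
    return pack(min(a + b, 255) for a, b in zip(lanes(d), lanes(s)))


def sum_of_products(d, s):
    """Multiply matching byte lanes of D and S and add up the four products."""
    return sum(a * b for a, b in zip(lanes(d), lanes(s)))


print(hex(add_saturate(0x10F08001, 0x20207FFF)))
print(sum_of_products(0x01020304, 0x05060708))
```

The point of the lane-wise form is that no carry crosses a byte boundary, which is what makes it suitable for packed pixel components.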
  • evanh Posts: 16,088
    Enjoy that Chip. It's already Monday morning here.
  • rjo__ Posts: 2,114
    Still Sunday here:) Have a great day off.
  • Things are in good hands....many of your testing minions play on Sunday!!!
  • That's awesome Chip :D
  • "... sum-of-products..."

    That reminds me. In P2-hot, there was a MACC register. Will something similar be available in this iteration?
  • Cluso99 Posts: 18,069
    Enjoy your day off Chip - you certainly deserve it!