NTSC Spiral Demo - Now with HDMI and VGA output

cgracey Posts: 12,461
edited 2019-12-07 - 09:24:16 in Propeller 2
Here is a short program that displays a rotating spiral on an NTSC monitor via P16 (pin can be changed).

It uses the CORDIC for Cartesian-to-polar conversion and the streamer's RGBI8 mode, in which the top 3 bits of each pixel byte select RGB and the lower 5 bits set the intensity.
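
As a rough model of what the per-pixel loop in the listing computes, here is a Python sketch. It is not the PASM2 code: `qvector_model` and `spiral_pixel` are hypothetical names, and the floating-point math only approximates the CORDIC result. The bit manipulation follows the comments in the `.pixel` loop.

```python
import math

def qvector_model(x, y):
    """Approximate QVECTOR: (rho, theta), theta as a 32-bit fraction of a full turn."""
    rho = round(math.hypot(x, y))
    theta = round(math.atan2(y, x) / (2 * math.pi) * 2**32) & 0xFFFFFFFF
    return rho, theta

def spiral_pixel(x, y, z, width=256, height=192):
    """Model one pass of the .pixel loop and return an RGBI8 byte."""
    rho, theta = qvector_model(x - width // 2, y - height // 2)
    v = (theta >> (32 - 9)) + rho + z   # 9 MSBs of theta, twisted by rho, spun by z
    ramp = v & 0x3F                     # take the 6 LSBs ...
    if ramp & 0x20:                     # ... and fold them into a 5-bit up/down ramp
        ramp ^= 0x3F
    ramp &= 0x1F
    rgb = (v >> 1) & 0xE0               # bits 8..6 of v become RGB select bits 7..5
    return rgb | ramp
```

Stepping `z` once per frame shifts every pixel's phase by one, which is what makes the spiral appear to rotate.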

An MP4 is below.

Here is the code, NTSC driver first, then spiral program at the end:

'**********************
'*  NTSC Spiral Demo  *
'**********************

DAT		org

        	hubset  ##%1_000001_0000011000_1111_10_00       'config PLL, 20MHz/2*25*1 = 250MHz
                waitx   ##20_000_000 / 200                      'allow crystal+PLL 5ms to stabilize
                hubset  ##%1_000001_0000011000_1111_10_11       'switch to PLL

		coginit	#1,##@pgm_ntsc	'launch video cog
		coginit	#0,##@pgm_bmap	'launch bitmap cog


'*********************************
'*  NTSC 256 x 192 x 8bpp rgbi8  *
'*********************************

CON

  f_color	= 3_579_545.0		'colorburst frequency
  f_scanline	= f_color / 227.5	'scanline frequency
  f_pixel	= f_scanline * 400.0	'pixel frequency for 400 pixels per scanline

  f_clock	= 250_000_000.0		'clock frequency

  f_xfr		= f_pixel / f_clock * float($7FFF_FFFF)
  f_csc		= f_color / f_clock * float($7FFF_FFFF) * 2.0

  s		= 84			'scale DAC output (s = 0..128)
  r		= s * 1000 / 1646	'precompensate for modulator expansion of 1.646

  mody		= ((+38*s/128) & $FF) << 24 + ((+75*s/128) & $FF) << 16 + ((+15*s/128) & $FF) << 8 + (110*s/128 & $FF)
  modi		= ((+76*r/128) & $FF) << 24 + ((-35*r/128) & $FF) << 16 + ((-41*r/128) & $FF) << 8 + (100*s/128 & $FF)
  modq		= ((+27*r/128) & $FF) << 24 + ((-67*r/128) & $FF) << 16 + ((+40*r/128) & $FF) << 8 + 128

  video_pin	= 16

  ntsc_map	= $1000

DAT		org

' Setup

pgm_ntsc	rdfast	##256*192/64,##ntsc_map	'set rdfast to wrap on bitmap

		setxfrq ##round(f_xfr)		'set transfer frequency
		setcfrq	##round(f_csc)		'set colorspace converter frequency

		setcy	##mody			'set colorspace converter coefficients
		setci	##modi
		setcq	##modq

		setcmod	#%11_1_0000		'set colorspace converter to YIQ mode (composite)

		cogid	.x			'enable dac mode in pin
		setnib	.dacmode,.x,#2
		wrpin	.dacmode,#video_pin
		drvl	#video_pin

' Field loop

.field          mov	.x,#35			'top blanks
		call	#.blank

                mov     .x,#192			'set visible lines
.line	        call	#.hsync			'do horizontal sync
		xcont	.m_rf,#0		'visible line
		xcont	.m_av,#1		'after visible spacer
		djnz    .x,#.line           	'another line?

                mov     .x,#27			'bottom blanks
		call	#.blank

		mov	.x,#6			'high vertical syncs
.vlow		xcont	.m_hl,#2
		xcont	.m_hh,#1
		djnz	.x,#.vlow

		mov	.x,#6			'low vertical syncs
.vhigh		xcont	.m_ll,#2
		xcont	.m_lh,#1
		djnz	.x,#.vhigh

		mov	.x,#6			'high vertical syncs
.vlow2		xcont	.m_hl,#2
		xcont	.m_hh,#1
		djnz	.x,#.vlow2

                jmp     #.field                 'loop

' Subroutines

.blank		call	#.hsync			'blank lines
		xcont	.m_vi,#0
		xcont	.m_av,#1
	_ret_	djnz	.x,#.blank

.hsync		xcont	.m_sn,#2		'horizontal sync
		xcont	.m_bc,#1
		xcont	.m_cb,.c_cb
	_ret_	xcont	.m_ac,#1

' Data

.dacmode	long	%0000_0000_000_1011100000000_01_00000_0

.m_sn		long	$7F010000+29		'sync
.m_bc		long	$7F010000+7		'before colorburst
.m_cb		long	$7F010000+18		'colorburst
.m_ac		long	$7F010000+40		'after colorburst
.m_vi		long	$7F010000+256		'visible
.m_av		long	$7F010000+50		'after visible (400 total)

.m_rf		long	$BF030000+256		'visible rfbyte 8bpp rgbi8

.m_hl		long	$7F010000+15		'vertical sync high low 
.m_hh		long	$7F010000+185		'vertical sync high high (200 total)

.m_ll		long	$7F010000+171		'vertical sync low low
.m_lh		long	$7F010000+29		'vertical sync low high (200 total)

.c_cb		long	$507000_01		'colorburst reference color

.x		res	1
.y		res	1


'**************************************
'*  Make spirals in 256 x 192 bitmap  *
'**************************************

		org

pgm_bmap	wrfast	##256*192/64,##ntsc_map	'set wrfast to wrap on bitmap

.pixel		mov	.px,.x			'translate (x,y) to (x-256/2,y-192/2)
		sub	.px,#256/2
		mov	.py,.y
		sub	.py,#192/2
		qvector	.px,.py			'convert (x,y) to polar (rho,theta)
		getqx	.px
		getqy	.py

		shr	.py,#32-9		'get 9 MSBs of theta
		add	.py,.px			'add rho to twist it
		add	.py,.z			'add z to slowly spin it

		mov	.px,.py			'convert 6 LSBs to 5-bit up/down ramp
		test	.px,#$20	wc
	if_c	xor	.px,#$3F
		and	.px,#$1F

		shr	.py,#1			'apply 3 MSBs to RGB bits
		and	.py,#$E0
		or	.px,.py

		wfbyte	.px			'write rgbi8 pixel to bitmap

		incmod	.x,#256-1	wc	'step x
	if_c	incmod	.y,#192-1	wc	'step y
	if_c	add	.z,#1			'step z
		jmp	#.pixel

.x		long	0
.y		long	0
.z		res	1
.px		res	1
.py		res	1
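
The timing constants in the driver can be checked with a little arithmetic. This Python sketch assumes the low 16 bits of each streamer command long are its duration in pixel periods, as the "(400 total)" and "(200 total)" comments suggest; the variable names are mine, not from the listing.

```python
F_COLOR = 3_579_545.0          # colorburst frequency, from the CON block
F_CLOCK = 250_000_000.0        # system clock, from the CON block

f_scanline = F_COLOR / 227.5   # scanline frequency, about 15.734 kHz
f_pixel = f_scanline * 400.0   # 400 pixel periods per scanline

# Durations making up one visible scanline: sync, before burst, burst,
# after burst, visible, after visible
line_pixels = 29 + 7 + 18 + 40 + 256 + 50

# Each vertical-sync pulse pair lasts half a scanline
half_line = 15 + 185

# Top blanks + visible + bottom blanks + 3 groups of 6 half-line pulses
lines_per_field = 35 + 192 + 27 + (3 * 6 * half_line) / line_pixels

field_rate = f_scanline / lines_per_field
clocks_per_pixel = F_CLOCK / f_pixel
```

Under those assumptions the driver produces 400 pixel periods per line and 263 lines per field, for a field rate of about 59.8 Hz.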

Comments

  • cgracey Posts: 12,461
    edited 2019-11-21 - 03:48:16
    I adapted this spiral demo to HDMI, but it was very slow - only 7.8 fps.

    I made three optimizations:

    1) Overlapped the CORDIC instructions to get 16 pixels through at once
    2) Used the LUT for fast lookup of RGBI8 pixel values
    3) Unrolled the loop for 16 pixels at a time

    Now, it's running 4.7 times as fast, so it's watchable.
    '*****************
    '*  HDMI Spiral  *
    '*****************
    
    CON		hdmi_base	= 8		'must be a multiple of 8
    
    		freq		= 250_000_000.0	'system clock frequency must be 250 MHz for HDMI
    
    		fast		= 1		'0 for small code (7.8 fps), 1 for fast code (36.6 fps)
    
    		bitmap		= $400		'HDMI bitmap (300 KB)
    
    DAT		org
    
                    hubset  ##%1_000001_0000011000_1111_10_00       'config PLL, 20MHz/2*25*1 = 250MHz
                    waitx   ##20_000_000 / 200                      'allow crystal+PLL 5ms to stabilize
                    hubset  ##%1_000001_0000011000_1111_10_11       'switch to PLL
    
    		setq	##($7FFFF - @end_of_pgm)/4		'clear hub RAM
    		wrlong	#0,##@end_of_pgm
    
    		coginit	#2,##@pgm_hdmi		'launch HDMI
    		coginit	#0,##@pgm_bmap		'launch bitmap cog
    
    
    '*********************************
    '*  HDMI 640 x 480 x 8bpp rgbi8  *
    '*********************************
    
    DAT             org
    
    pgm_hdmi        setcmod #$100                   'enable HDMI mode
                    drvl    #7<<6 + hdmi_base       'enable HDMI pins
                    wrpin   ##%001001<<8,#7<<6 + hdmi_base  'set 1k-ohm drive on HDMI pins
    
                    setxfrq ##$0CCCCCCC+1           'set streamer freq to 1/10th clk (25 MHz)
    
                    rdfast  ##640*480/64,##bitmap   'set rdfast to wrap on bitmap
    
    ' Field loop
    
    field           mov     hsync0,sync_000         'vsync off
                    mov     hsync1,sync_001
    
                    callpa  #10,#blank              'top blanks
    
                    mov     i,#480                  'set visible lines
    line            call    #hsync                  'do horizontal sync
                    xcont   m_rf,#0		        'do visible line
                    djnz    i,#line                 'another line?
    
                    callpa  #33,#blank              'bottom blanks
    
                    mov     hsync0,sync_222         'vsync on
                    mov     hsync1,sync_223
    
                    callpa  #2,#blank               'vertical sync blanks
    
                    jmp     #field                  'loop
    
    ' Subroutines
    
    blank           call    #hsync                  'blank lines
                    xcont   m_vi,hsync0
            _ret_   djnz    pa,#blank
    
    hsync           xcont   m_bs,hsync0             'horizontal sync
                    xzero   m_sn,hsync1
            _ret_   xcont   m_bv,hsync0
    
    ' Data
    
    sync_000        long    %1101010100_1101010100_1101010100_10    '
    sync_001        long    %1101010100_1101010100_0010101011_10    '        hsync
    sync_222        long    %0101010100_0101010100_0101010100_10    'vsync
    sync_223        long    %0101010100_0101010100_1010101011_10    'vsync + hsync
    
    m_bs            long    $70810000 + hdmi_base<<17 + 16          'before sync
    m_sn            long    $70810000 + hdmi_base<<17 + 96          'sync
    m_bv            long    $70810000 + hdmi_base<<17 + 48          'before visible
    m_vi            long    $70810000 + hdmi_base<<17 + 640         'visible
    m_rf            long    $B0830000 + hdmi_base<<17 + 640         'visible rfbyte rgbi8
    
    i               res     1
    hsync0          res     1
    hsync1          res     1
    
    
    '**************************************
    '*  Make spirals in 640 x 480 bitmap  *
    '**************************************
    
    		org
    
    pgm_bmap	wrfast	##640*480/64,##bitmap	'set wrfast to wrap on bitmap
    
    		modc	fast * %1111	wc	'fast or slow code?
    	if_nc	jmp	#.pixel
    
    
    ' Fast code (36.6 fps) 4.7x the speed of slow code
    
    .lut		mov	.px,.z			'make lookup table for fast translation
    		test	.px,#$20	wc	'convert 6 LSBs to 5-bit up/down ramp
    	if_c	xor	.px,#$3F
    		and	.px,#$1F
    		mov	.py,.z
    		shr	.py,#1			'apply 3 MSBs to RGB bits
    		and	.py,#$E0
    		or	.px,.py
    		wrlut	.px,.z
    		incmod	.z,#$1FF	wc
    	if_nc	jmp	#.lut
    
    
    .pixels		qvector	.x,.y	'0 in		do overlapped QVECTOR ops for 16 pixels
    
    		add	.x,#1	'1 in
    		qvector	.x,.y
    
    		add	.x,#1	'2 in
    		qvector	.x,.y
    
    		add	.x,#1	'3 in
    		qvector	.x,.y
    
    		add	.x,#1	'4 in
    		qvector	.x,.y
    
    		add	.x,#1	'5 in
    		qvector	.x,.y
    
    		add	.x,#1	'6 in
    		qvector	.x,.y
    
    		add	.x,#1	'7 in
    		qvector	.x,.y
    
    		getqx	.px+0	'0 out
    		getqy	.py+0
    
    		add	.x,#1	'8 in
    		qvector	.x,.y
    
    		getqx	.px+1	'1 out
    		getqy	.py+1
    
    		add	.x,#1	'9 in
    		qvector	.x,.y
    
    		getqx	.px+2	'2 out
    		getqy	.py+2
    
    		add	.x,#1	'10 in
    		qvector	.x,.y
    
    		getqx	.px+3	'3 out
    		getqy	.py+3
    
    		add	.x,#1	'11 in
    		qvector	.x,.y
    
    		getqx	.px+4	'4 out
    		getqy	.py+4
    
    		add	.x,#1	'12 in
    		qvector	.x,.y
    
    		getqx	.px+5	'5 out
    		getqy	.py+5
    
    		add	.x,#1	'13 in
    		qvector	.x,.y
    
    		getqx	.px+6	'6 out
    		getqy	.py+6
    
    		add	.x,#1	'14 in
    		qvector	.x,.y
    
    		getqx	.px+7	'7 out
    		getqy	.py+7
    
    		add	.x,#1	'15 in
    		qvector	.x,.y
    
    		getqx	.px+8	'8 out
    		getqy	.py+8
    
    		shr	.py+0,#32-9		'get 9 MSBs of theta (stuff code between GETQx ops)
    		add	.py+0,.px+0		'add rho to twist it
    
    		getqx	.px+9	'9 out
    		getqy	.py+9
    
    		shr	.py+1,#32-9
    		add	.py+1,.px+1
    
    		getqx	.px+10	'10 out
    		getqy	.py+10
    
    		shr	.py+2,#32-9
    		add	.py+2,.px+2
    
    		getqx	.px+11	'11 out
    		getqy	.py+11
    
    		shr	.py+3,#32-9
    		add	.py+3,.px+3
    
    		getqx	.px+12	'12 out
    		getqy	.py+12
    
    		shr	.py+4,#32-9
    		add	.py+4,.px+4
    
    		getqx	.px+13	'13 out
    		getqy	.py+13
    
    		shr	.py+5,#32-9
    		add	.py+5,.px+5
    
    		getqx	.px+14	'14 out
    		getqy	.py+14
    
    		shr	.py+6,#32-9
    		add	.py+6,.px+6
    
    		getqx	.px+15	'15 out
    		getqy	.py+15
    
    
    		add	.py+0,.z		'add z to slowly spin it
    		rdlut	.py+0,.py+0		'lookup rgbi8 color
    		wfbyte	.py+0			'write rgbi8 pixel to bitmap
    
    		add	.py+1,.z
    		rdlut	.py+1,.py+1
    		wfbyte	.py+1
    
    		add	.py+2,.z
    		rdlut	.py+2,.py+2
    		wfbyte	.py+2
    
    		add	.py+3,.z
    		rdlut	.py+3,.py+3
    		wfbyte	.py+3
    
    		add	.py+4,.z
    		rdlut	.py+4,.py+4
    		wfbyte	.py+4
    
    		add	.py+5,.z
    		rdlut	.py+5,.py+5
    		wfbyte	.py+5
    
    		add	.py+6,.z
    		rdlut	.py+6,.py+6
    		wfbyte	.py+6
    
    		shr	.py+7,#32-9
    		add	.py+7,.px+7
    		add	.py+7,.z
    		rdlut	.py+7,.py+7
    		wfbyte	.py+7
    
    		shr	.py+8,#32-9
    		add	.py+8,.px+8
    		add	.py+8,.z
    		rdlut	.py+8,.py+8
    		wfbyte	.py+8
    
    		shr	.py+9,#32-9
    		add	.py+9,.px+9
    		add	.py+9,.z
    		rdlut	.py+9,.py+9
    		wfbyte	.py+9
    
    		shr	.py+10,#32-9
    		add	.py+10,.px+10
    		add	.py+10,.z
    		rdlut	.py+10,.py+10
    		wfbyte	.py+10
    
    		shr	.py+11,#32-9
    		add	.py+11,.px+11
    		add	.py+11,.z
    		rdlut	.py+11,.py+11
    		wfbyte	.py+11
    
    		shr	.py+12,#32-9
    		add	.py+12,.px+12
    		add	.py+12,.z
    		rdlut	.py+12,.py+12
    		wfbyte	.py+12
    
    		shr	.py+13,#32-9
    		add	.py+13,.px+13
    		add	.py+13,.z
    		rdlut	.py+13,.py+13
    		wfbyte	.py+13
    
    		shr	.py+14,#32-9
    		add	.py+14,.px+14
    		add	.py+14,.z
    		rdlut	.py+14,.py+14
    		wfbyte	.py+14
    
    		shr	.py+15,#32-9
    		add	.py+15,.px+15
    		add	.py+15,.z
    		rdlut	.py+15,.py+15
    		wfbyte	.py+15
    
    		incmod	.x,#640/2-1	wc	'check if x at limit
    	if_c	neg	.x,#640/2
    	if_c	incmod	.y,#480/2-1	wc	'step y
    	if_c	neg	.y,#480/2
    	if_c	sub	.z,#1			'step z
    	if_c	drvnot	#56			'toggle P56 for speed check
    		jmp	#.pixels
    
    
    ' Slow code (7.8 fps)
    
    .pixel		qvector	.x,.y			'convert (x,y) to polar (rho,theta)
    		getqx	.px
    		getqy	.py
    
    		shr	.py,#32-9		'get 9 MSBs of theta
    		add	.py,.px			'add rho to twist it
    		add	.py,.z			'add z to slowly spin it
    
    		mov	.px,.py			'convert 6 LSBs to 5-bit up/down ramp
    		test	.px,#$20	wc
    	if_c	xor	.px,#$3F
    		and	.px,#$1F
    
    		shr	.py,#1			'apply 3 MSBs to RGB bits
    		and	.py,#$E0
    		or	.px,.py
    
    		wfbyte	.px			'write rgbi8 pixel to bitmap
    
    		incmod	.x,#640/2-1	wc	'step x
    	if_c	neg	.x,#640/2
    	if_c	incmod	.y,#480/2-1	wc	'step y
    	if_c	neg	.y,#480/2
    	if_c	sub	.z,#1			'step z
    	if_c	drvnot	#56			'toggle P56 for speed check
    		jmp	#.pixel
    
    ' Data
    
    .x		long	-640/2
    .y		long	-480/2
    .z		long	0
    .px		res	16
    .py		res	16
    
    end_of_pgm
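
The LUT optimization works because the RGBI8 byte depends only on the low 9 bits of the summed phase value. A Python sketch of that equivalence follows; the function names are hypothetical, and masking with `0x1FF` stands in for the assumption that RDLUT uses only the low 9 bits of its address.

```python
def rgbi8_from_phase(v):
    """Slow path: fold the 6 LSBs into a 5-bit ramp and place bits 8..6 in the RGB field."""
    ramp = v & 0x3F
    if ramp & 0x20:
        ramp ^= 0x3F
    ramp &= 0x1F
    return ((v >> 1) & 0xE0) | ramp

# Built once at startup, like the .lut loop that fills $200 entries
LUT = [rgbi8_from_phase(i) for i in range(0x200)]

def rgbi8_fast(v):
    """Fast path: one table lookup replaces the test/xor/and/shr/and/or sequence."""
    return LUT[v & 0x1FF]
```

Both paths return the same byte for any phase value, so the lookup table changes the speed and not the picture.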
    

  • That's cool. Only one cog for the maths. The binary is a mere 960 bytes, uncompressed and unsupported. :)
  • Like wow man! :cool:
  • Chip... things are sure hopping around here.

    The NTSC works fine, but I am not able to get any of your recent HDMI samples to run... two different cables, two different Visio monitors.
    Signal not detected.
  • Oops... forgot to set the resolution on the monitor:) TRYING NOW
  • Impressive or what! :sunglasses: :sunglasses: :sunglasses: :sunglasses: :sunglasses:
  • even more so if you remember to set the resolution of your monitor.

    I have no 640x480 setting, but 800x600 works just fine.
  • Try this variant for a great "HD trippy" time
    Adjust the kaleidoscope constant for interesting effects; 22 is also good.
    '*****************
    '*  HDMI Spiral  *
    '*****************
    
    CON		hdmi_base	= 24		'must be a multiple of 8
    
    		freq		= 250_000_000.0	'system clock frequency must be 250 MHz for HDMI
    
    		fast		= 1		'0 for small code (7.8 fps), 1 for fast code (36.6 fps)
    
    		bitmap		= $400		'HDMI bitmap (300 KB)
    		kaleidoscope		= 18
    DAT		org
    
                    hubset  ##%1_000001_0000011000_1111_10_00       'config PLL, 20MHz/2*25*1 = 250MHz
                    waitx   ##20_000_000 / 200                      'allow crystal+PLL 5ms to stabilize
                    hubset  ##%1_000001_0000011000_1111_10_11       'switch to PLL
    
    		setq	##($7FFFF - @end_of_pgm)/4		'clear hub RAM
    		wrlong	#0,##@end_of_pgm
    
    		coginit	#2,##@pgm_hdmi		'launch HDMI
    		coginit	#0,##@pgm_bmap		'launch bitmap cog
    
    
    '*********************************
    '*  HDMI 640 x 480 x 8bpp rgbi8  *
    '*********************************
    
    DAT             org
    
    pgm_hdmi        setcmod #$100                   'enable HDMI mode
                    drvl    #7<<6 + hdmi_base       'enable HDMI pins
                    wrpin   ##%001001<<8,#7<<6 + hdmi_base  'set 1k-ohm drive on HDMI pins
    
                    setxfrq ##$0CCCCCCC+1           'set streamer freq to 1/10th clk (25 MHz)
    
                    rdfast  ##640*480/64,##bitmap   'set rdfast to wrap on bitmap
    
    ' Field loop
    
    field           mov     hsync0,sync_000         'vsync off
                    mov     hsync1,sync_001
    
                    callpa  #10,#blank              'top blanks
    
                    mov     i,#480                  'set visible lines
    line            call    #hsync                  'do horizontal sync
                    xcont   m_rf,#0		        'do visible line
                    djnz    i,#line                 'another line?
    
                    callpa  #33,#blank              'bottom blanks
    
                    mov     hsync0,sync_222         'vsync on
                    mov     hsync1,sync_223
    
                    callpa  #2,#blank               'vertical sync blanks
    
                    jmp     #field                  'loop
    
    ' Subroutines
    
    blank           call    #hsync                  'blank lines
                    xcont   m_vi,hsync0
            _ret_   djnz    pa,#blank
    
    hsync           xcont   m_bs,hsync0             'horizontal sync
                    xzero   m_sn,hsync1
            _ret_   xcont   m_bv,hsync0
    
    ' Data
    
    sync_000        long    %1101010100_1101010100_1101010100_10    '
    sync_001        long    %1101010100_1101010100_0010101011_10    '        hsync
    sync_222        long    %0101010100_0101010100_0101010100_10    'vsync
    sync_223        long    %0101010100_0101010100_1010101011_10    'vsync + hsync
    
    m_bs            long    $70810000 + hdmi_base<<17 + 16          'before sync
    m_sn            long    $70810000 + hdmi_base<<17 + 96          'sync
    m_bv            long    $70810000 + hdmi_base<<17 + 48          'before visible
    m_vi            long    $70810000 + hdmi_base<<17 + 640         'visible
    m_rf            long    $B0830000 + hdmi_base<<17 + 640         'visible rfbyte rgbi8
    
    i               res     1
    hsync0          res     1
    hsync1          res     1
    
    
    '**************************************
    '*  Make spirals in 640 x 480 bitmap  *
    '**************************************
    
    		org
    
    pgm_bmap	wrfast	##640*480/64,##bitmap	'set wrfast to wrap on bitmap
    
    		modc	fast * %1111	wc	'fast or slow code?
    	if_nc	jmp	#.pixel
    
    
    ' Fast code (36.6 fps) 4.7x the speed of slow code
    
    .lut		mov	.px,.z			'make lookup table for fast translation
    		test	.px,#$20	wc	'convert 6 LSBs to 5-bit up/down ramp
    	if_c	xor	.px,#$3F
    		and	.px,#$1F
    		mov	.py,.z
    		shr	.py,#1			'apply 3 MSBs to RGB bits
    		and	.py,#$E0
    		or	.px,.py
    		wrlut	.px,.z
    		incmod	.z,#$1FF	wc
    	if_nc	jmp	#.lut
    
    
    .pixels		qvector	.x,.y	'0 in		do overlapped QVECTOR ops for 16 pixels
    
    		add	.x,#1	'1 in
    		qvector	.x,.y
    
    		add	.x,#1	'2 in
    		qvector	.x,.y
    
    		add	.x,#1	'3 in
    		qvector	.x,.y
    
    		add	.x,#1	'4 in
    		qvector	.x,.y
    
    		add	.x,#1	'5 in
    		qvector	.x,.y
    
    		add	.x,#1	'6 in
    		qvector	.x,.y
    
    		add	.x,#1	'7 in
    		qvector	.x,.y
    
    		getqx	.px+0	'0 out
    		getqy	.py+0
    
    		add	.x,#1	'8 in
    		qvector	.x,.y
    
    		getqx	.px+1	'1 out
    		getqy	.py+1
    
    		add	.x,#1	'9 in
    		qvector	.x,.y
    
    		getqx	.px+2	'2 out
    		getqy	.py+2
    
    		add	.x,#1	'10 in
    		qvector	.x,.y
    
    		getqx	.px+3	'3 out
    		getqy	.py+3
    
    		add	.x,#1	'11 in
    		qvector	.x,.y
    
    		getqx	.px+4	'4 out
    		getqy	.py+4
    
    		add	.x,#1	'12 in
    		qvector	.x,.y
    
    		getqx	.px+5	'5 out
    		getqy	.py+5
    
    		add	.x,#1	'13 in
    		qvector	.x,.y
    
    		getqx	.px+6	'6 out
    		getqy	.py+6
    
    		add	.x,#1	'14 in
    		qvector	.x,.y
    
    		getqx	.px+7	'7 out
    		getqy	.py+7
    
    		add	.x,#1	'15 in
    		qvector	.x,.y
    
    		getqx	.px+8	'8 out
    		getqy	.py+8
    
    		shr	.py+0,#32-kaleidoscope		'get top 'kaleidoscope' bits of theta (stuff code between GETQx ops)
    		add	.py+0,.px+0		'add rho to twist it
    
    		getqx	.px+9	'9 out
    		getqy	.py+9
    
    		shr	.py+1,#32-kaleidoscope
    		add	.py+1,.px+1
    
    		getqx	.px+10	'10 out
    		getqy	.py+10
    
    		shr	.py+2,#32-kaleidoscope
    		add	.py+2,.px+2
    
    		getqx	.px+11	'11 out
    		getqy	.py+11
    
    		shr	.py+3,#32-kaleidoscope
    		add	.py+3,.px+3
    
    		getqx	.px+12	'12 out
    		getqy	.py+12
    
    		shr	.py+4,#32-kaleidoscope
    		add	.py+4,.px+4
    
    		getqx	.px+13	'13 out
    		getqy	.py+13
    
    		shr	.py+5,#32-kaleidoscope
    		add	.py+5,.px+5
    
    		getqx	.px+14	'14 out
    		getqy	.py+14
    
    		shr	.py+6,#32-kaleidoscope
    		add	.py+6,.px+6
    
    		getqx	.px+15	'15 out
    		getqy	.py+15
    
    
    		add	.py+0,.z		'add z to slowly spin it
    		rdlut	.py+0,.py+0		'lookup rgbi8 color
    		wfbyte	.py+0			'write rgbi8 pixel to bitmap
    
    		add	.py+1,.z
    		rdlut	.py+1,.py+1
    		wfbyte	.py+1
    
    		add	.py+2,.z
    		rdlut	.py+2,.py+2
    		wfbyte	.py+2
    
    		add	.py+3,.z
    		rdlut	.py+3,.py+3
    		wfbyte	.py+3
    
    		add	.py+4,.z
    		rdlut	.py+4,.py+4
    		wfbyte	.py+4
    
    		add	.py+5,.z
    		rdlut	.py+5,.py+5
    		wfbyte	.py+5
    
    		add	.py+6,.z
    		rdlut	.py+6,.py+6
    		wfbyte	.py+6
    
    		shr	.py+7,#32-kaleidoscope
    		add	.py+7,.px+7
    		add	.py+7,.z
    		rdlut	.py+7,.py+7
    		wfbyte	.py+7
    
    		shr	.py+8,#32-kaleidoscope
    		add	.py+8,.px+8
    		add	.py+8,.z
    		rdlut	.py+8,.py+8
    		wfbyte	.py+8
    
    		shr	.py+9,#32-kaleidoscope
    		add	.py+9,.px+9
    		add	.py+9,.z
    		rdlut	.py+9,.py+9
    		wfbyte	.py+9
    
    		shr	.py+10,#32-kaleidoscope
    		add	.py+10,.px+10
    		add	.py+10,.z
    		rdlut	.py+10,.py+10
    		wfbyte	.py+10
    
    		shr	.py+11,#32-kaleidoscope
    		add	.py+11,.px+11
    		add	.py+11,.z
    		rdlut	.py+11,.py+11
    		wfbyte	.py+11
    
    		shr	.py+12,#32-kaleidoscope
    		add	.py+12,.px+12
    		add	.py+12,.z
    		rdlut	.py+12,.py+12
    		wfbyte	.py+12
    
    		shr	.py+13,#32-kaleidoscope
    		add	.py+13,.px+13
    		add	.py+13,.z
    		rdlut	.py+13,.py+13
    		wfbyte	.py+13
    
    		shr	.py+14,#32-kaleidoscope
    		add	.py+14,.px+14
    		add	.py+14,.z
    		rdlut	.py+14,.py+14
    		wfbyte	.py+14
    
    		shr	.py+15,#32-kaleidoscope
    		add	.py+15,.px+15
    		add	.py+15,.z
    		rdlut	.py+15,.py+15
    		wfbyte	.py+15
    
    		incmod	.x,#640/2-1	wc	'check if x at limit
    	if_c	neg	.x,#640/2
    	if_c	incmod	.y,#480/2-1	wc	'step y
    	if_c	neg	.y,#480/2
    	if_c	sub	.z,#1			'step z
    	if_c	drvnot	#56			'toggle P56 for speed check
    		jmp	#.pixels
    
    
    ' Slow code (7.8 fps)
    
    .pixel		qvector	.x,.y			'convert (x,y) to polar (rho,theta)
    		getqx	.px
    		getqy	.py
    
    		shr	.py,#32-kaleidoscope		'get top 'kaleidoscope' bits of theta
    		add	.py,.px			'add rho to twist it
    		add	.py,.z			'add z to slowly spin it
    
    		mov	.px,.py			'convert 6 LSBs to 5-bit up/down ramp
    		test	.px,#$20	wc
    	if_c	xor	.px,#$3F
    		and	.px,#$1F
    
    		shr	.py,#1			'apply 3 MSBs to RGB bits
    		and	.py,#$E0
    		or	.px,.py
    
    		wfbyte	.px			'write rgbi8 pixel to bitmap
    
    		incmod	.x,#640/2-1	wc	'step x
    	if_c	neg	.x,#640/2
    	if_c	incmod	.y,#480/2-1	wc	'step y
    	if_c	neg	.y,#480/2
    	if_c	sub	.z,#1			'step z
    	if_c	drvnot	#56			'toggle P56 for speed check
    		jmp	#.pixel
    
    ' Data
    
    .x		long	-640/2
    .y		long	-480/2
    .z		long	0
    .px		res	16
    .py		res	16
    
    end_of_pgm
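
One way to see what the `kaleidoscope` constant does: the color lookup only depends on the low 9 bits of the phase, so keeping more than 9 bits of theta makes the color cycle repeat around the circle. A small Python sketch under that reading (the function names are hypothetical):

```python
LUT_BITS = 9   # the lookup table has $200 entries

def theta_index(theta32, kaleidoscope):
    """Top `kaleidoscope` bits of a 32-bit angle, as left by SHR #32-kaleidoscope."""
    return (theta32 & 0xFFFFFFFF) >> (32 - kaleidoscope)

def angular_repeats(kaleidoscope):
    """How many 512-step color cycles fit into one full turn of theta."""
    return 2.0 ** (kaleidoscope - LUT_BITS)
```

With the original 9 bits there is one color cycle per turn. With 18 bits there are 512, which is far more than a 640 x 480 bitmap can resolve, so the picture is probably dominated by aliasing patterns.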
    
  • @cgracey You have to rotate your screen driver by 90° :lol:
  • Played with this last night via NTSC. Adjusting the theta calculation resulted in some cool visuals. Ordered an HDMI-to-DVI cable to try playing with the HDMI version. My monitor has VGA, DVI, Component, Composite, and S-Video, but no HDMI.

    This is a fun little demo Chip!
  • Mesmerizing/Hypnotizing: "You are getting sleepy...." Wonder if a VGA (640x480) version is doable.
  • That's very cool! What is Chip adjusting in the colors to make them look so bright? :)
  • Mesmerizing/Hypnotizing: "You are getting sleepy...." Wonder if a VGA (640x480) version is doable.

    Here you go:
    '****************
    '*  VGA Spiral  *
    '****************
    
    CON		vga_base	= 8		'must be a multiple of 8
    
    		intensity	= 90	'0..128
    
    		fclk		= 320_000_000.0
    		fpix		= 25_000_000.0
    		fset		= (fpix / fclk * 2.0) * float($4000_0000)
    
    		fast		= 1		'0 for small code (10 fps), 1 for fast code (47 fps)
    
    		bitmap		= $400		'rgbi8 bitmap (300 KB)
    
    
    DAT		org
    
                    hubset  ##%1_000000_0000001111_1111_10_00       'config PLL, 20MHz/1*16*1 = 320MHz
                    waitx   ##20_000_000 / 100                      'allow crystal+PLL 10ms to stabilize
                    hubset  ##%1_000000_0000001111_1111_10_11       'switch to PLL
    
    		coginit	#1,##@pgm_vga		'launch vga
    		coginit	#0,##@pgm_bmap		'launch bitmap cog
    
    
    '********************************
    '*  VGA 640 x 480 x 8bpp rgbi8  *
    '********************************
    
    DAT             org
    
    pgm_vga         setxfrq ##round(fset)		'set transfer frequency to fpix
    
    		setcy	##intensity << 24	'r	set colorspace for rgb
    		setci	##intensity << 16	'g
    		setcq	##intensity << 08	'b
    		setcmod	#%01_0_000_0		'enable colorspace conversion
    
    		wrpin	dacmode_hsy,#0<<6 + vga_base + 0	'enable dac mode in pin 0 for hsync
    		wrpin	dacmode_rgb,#2<<6 + vga_base + 1	'enable dac modes in pins 1..3 for rgb
    		drvl	#3<<6 + vga_base			'enable dac outputs
    
                    rdfast  ##640*480/64,##bitmap   'set rdfast to wrap on bitmap
    
    ' Field loop
    
    field           callpa  #10,#blank              'top blanks
    
                    mov     i,#480                  'set visible lines
    line            call    #hsync                  'do horizontal sync
                    xcont   m_rf,#0		        'do visible line
                    djnz    i,#line                 'another line?
    
                    callpa  #33,#blank              'bottom blanks
    
    		drvnot	#vga_base+4		'vsync on
    
                    callpa  #2,#blank               'vsync blanks
    
    		drvnot	#vga_base+4		'vsync off
    
                    jmp     #field                  'loop
    
    ' Subroutines
    
    blank           call    #hsync                  'blank lines
                    xcont   m_vi,#0
            _ret_   djnz    pa,#blank
    
    hsync           xcont   m_bs,#0			'horizontal sync
                    xzero   m_sn,#1
            _ret_   xcont   m_bv,#0
    
    ' Data
    
    dacmode_hsy	long	%0000_0000_000_1011000000001_01_00000_0	'123-ohm 3.3V, cog 1 dac channels
    dacmode_rgb	long	%0000_0000_000_1011100000001_01_00000_0	'75-ohm 2.0V, cog 1 dac channels
    
    m_bs            long    $7F010000 + 16          'before sync
    m_sn            long    $7F010000 + 96          'sync
    m_bv            long    $7F010000 + 48          'before visible
    m_vi            long    $7F010000 + 640         'visible
    m_rf            long    $BF030000 + 640         'visible rfbyte rgbi8
    
    i               res     1
    
    
    '**************************************
    '*  Make spirals in 640 x 480 bitmap  *
    '**************************************
    
    		org
    
    pgm_bmap	wrfast	##640*480/64,##bitmap	'set wrfast to wrap on bitmap
    
    		modc	fast * %1111	wc	'fast or slow code?
    	if_nc	jmp	#.pixel
    
    
    ' Fast code (47 fps) 4.7x the speed of slow code
    
    .lut		mov	.px,.z			'make lookup table for fast translation
    		test	.px,#$20	wc	'convert 6 LSBs to 5-bit up/down ramp
    	if_c	xor	.px,#$3F
    		and	.px,#$1F
    		mov	.py,.z
    		shr	.py,#1			'apply 3 MSBs to RGB bits
    		and	.py,#$E0
    		or	.px,.py
    		wrlut	.px,.z
    		incmod	.z,#$1FF	wc
    	if_nc	jmp	#.lut
    
    
    .pixels		qvector	.x,.y	'0 in		do overlapped QVECTOR ops for 16 pixels
    
    		add	.x,#1	'1 in
    		qvector	.x,.y
    
    		add	.x,#1	'2 in
    		qvector	.x,.y
    
    		add	.x,#1	'3 in
    		qvector	.x,.y
    
    		add	.x,#1	'4 in
    		qvector	.x,.y
    
    		add	.x,#1	'5 in
    		qvector	.x,.y
    
    		add	.x,#1	'6 in
    		qvector	.x,.y
    
    		add	.x,#1	'7 in
    		qvector	.x,.y
    
    		getqx	.px+0	'0 out
    		getqy	.py+0
    
    		add	.x,#1	'8 in
    		qvector	.x,.y
    
    		getqx	.px+1	'1 out
    		getqy	.py+1
    
    		add	.x,#1	'9 in
    		qvector	.x,.y
    
    		getqx	.px+2	'2 out
    		getqy	.py+2
    
    		add	.x,#1	'10 in
    		qvector	.x,.y
    
    		getqx	.px+3	'3 out
    		getqy	.py+3
    
    		add	.x,#1	'11 in
    		qvector	.x,.y
    
    		getqx	.px+4	'4 out
    		getqy	.py+4
    
    		add	.x,#1	'12 in
    		qvector	.x,.y
    
    		getqx	.px+5	'5 out
    		getqy	.py+5
    
    		add	.x,#1	'13 in
    		qvector	.x,.y
    
    		getqx	.px+6	'6 out
    		getqy	.py+6
    
    		add	.x,#1	'14 in
    		qvector	.x,.y
    
    		getqx	.px+7	'7 out
    		getqy	.py+7
    
    		add	.x,#1	'15 in
    		qvector	.x,.y
    
    		getqx	.px+8	'8 out
    		getqy	.py+8
    
    		shr	.py+0,#32-9		'get 9 MSBs of theta (stuff code between GETQx ops)
    		add	.py+0,.px+0		'add rho to twist it
    
    		getqx	.px+9	'9 out
    		getqy	.py+9
    
    		shr	.py+1,#32-9
    		add	.py+1,.px+1
    
    		getqx	.px+10	'10 out
    		getqy	.py+10
    
    		shr	.py+2,#32-9
    		add	.py+2,.px+2
    
    		getqx	.px+11	'11 out
    		getqy	.py+11
    
    		shr	.py+3,#32-9
    		add	.py+3,.px+3
    
    		getqx	.px+12	'12 out
    		getqy	.py+12
    
    		shr	.py+4,#32-9
    		add	.py+4,.px+4
    
    		getqx	.px+13	'13 out
    		getqy	.py+13
    
    		shr	.py+5,#32-9
    		add	.py+5,.px+5
    
    		getqx	.px+14	'14 out
    		getqy	.py+14
    
    		shr	.py+6,#32-9
    		add	.py+6,.px+6
    
    		getqx	.px+15	'15 out
    		getqy	.py+15
    
    
    		add	.py+0,.z		'add z to slowly spin it
    		rdlut	.py+0,.py+0		'lookup rgbi8 color
    		wfbyte	.py+0			'write rgbi8 pixel to bitmap
    
    		add	.py+1,.z
    		rdlut	.py+1,.py+1
    		wfbyte	.py+1
    
    		add	.py+2,.z
    		rdlut	.py+2,.py+2
    		wfbyte	.py+2
    
    		add	.py+3,.z
    		rdlut	.py+3,.py+3
    		wfbyte	.py+3
    
    		add	.py+4,.z
    		rdlut	.py+4,.py+4
    		wfbyte	.py+4
    
    		add	.py+5,.z
    		rdlut	.py+5,.py+5
    		wfbyte	.py+5
    
    		add	.py+6,.z
    		rdlut	.py+6,.py+6
    		wfbyte	.py+6
    
    		shr	.py+7,#32-9
    		add	.py+7,.px+7
    		add	.py+7,.z
    		rdlut	.py+7,.py+7
    		wfbyte	.py+7
    
    		shr	.py+8,#32-9
    		add	.py+8,.px+8
    		add	.py+8,.z
    		rdlut	.py+8,.py+8
    		wfbyte	.py+8
    
    		shr	.py+9,#32-9
    		add	.py+9,.px+9
    		add	.py+9,.z
    		rdlut	.py+9,.py+9
    		wfbyte	.py+9
    
    		shr	.py+10,#32-9
    		add	.py+10,.px+10
    		add	.py+10,.z
    		rdlut	.py+10,.py+10
    		wfbyte	.py+10
    
    		shr	.py+11,#32-9
    		add	.py+11,.px+11
    		add	.py+11,.z
    		rdlut	.py+11,.py+11
    		wfbyte	.py+11
    
    		shr	.py+12,#32-9
    		add	.py+12,.px+12
    		add	.py+12,.z
    		rdlut	.py+12,.py+12
    		wfbyte	.py+12
    
    		shr	.py+13,#32-9
    		add	.py+13,.px+13
    		add	.py+13,.z
    		rdlut	.py+13,.py+13
    		wfbyte	.py+13
    
    		shr	.py+14,#32-9
    		add	.py+14,.px+14
    		add	.py+14,.z
    		rdlut	.py+14,.py+14
    		wfbyte	.py+14
    
    		shr	.py+15,#32-9
    		add	.py+15,.px+15
    		add	.py+15,.z
    		rdlut	.py+15,.py+15
    		wfbyte	.py+15
    
    		incmod	.x,#640/2-1	wc	'check if x at limit
    	if_c	neg	.x,#640/2
    	if_c	incmod	.y,#480/2-1	wc	'step y
    	if_c	neg	.y,#480/2
    	if_c	sub	.z,#1			'step z
    	if_c	drvnot	#56			'toggle P56 for speed check
    		jmp	#.pixels
    
    
    ' Slow code (10 fps)
    
    .pixel		qvector	.x,.y			'convert (x,y) to polar (rho,theta)
    		getqx	.px
    		getqy	.py
    
    		shr	.py,#32-9		'get 9 MSBs of theta
    		add	.py,.px			'add rho to twist it
    		add	.py,.z			'add z to slowly spin it
    
    		mov	.px,.py			'convert 6 LSBs to 5-bit up/down ramp
    		test	.px,#$20	wc
    	if_c	xor	.px,#$3F
    		and	.px,#$1F
    
    		shr	.py,#1			'apply 3 MSBs to RGB bits
    		and	.py,#$E0
    		or	.px,.py
    
    		wfbyte	.px			'write rgbi8 pixel to bitmap
    
    		incmod	.x,#640/2-1	wc	'step x
    	if_c	neg	.x,#640/2
    	if_c	incmod	.y,#480/2-1	wc	'step y
    	if_c	neg	.y,#480/2
    	if_c	sub	.z,#1			'step z
    	if_c	drvnot	#56			'toggle P56 for speed check
    		jmp	#.pixel
    
    ' Data
    
    .x		long	-640/2
    .y		long	-480/2
    .z		long	0
    .px		res	16
    .py		res	16
    
  • That's very cool! What is Chip adjusting in the colors to make them look so bright? :)

    It's using the streamer's RGBI8 mode, which is a byte per pixel. The top 3 bits select the color and the lower 5 bits select the intensity. No palette needed.
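As a cross-check on that bit layout, here is a small Python model (not PASM2, and not part of Chip's listing) of what the "slow code" path above does with the value it builds from theta, rho and z. Only the low 9 bits matter: the 6 LSBs become a 5-bit up/down intensity ramp and the 3 MSBs land in the color-select bits. The function name is made up for illustration.

```python
def rgbi8_from_phase(v):
    """Model of the slow-path pixel math: 9-bit phase -> RGBI8 byte."""
    ramp = v & 0x3F            # 6 LSBs of the phase
    if ramp & 0x20:            # upper half of the ramp: count back down
        ramp ^= 0x3F
    ramp &= 0x1F               # 5-bit intensity
    color = (v >> 1) & 0xE0    # phase bits 8..6 move to byte bits 7..5
    return color | ramp

if __name__ == "__main__":
    for v in (0x000, 0x01F, 0x020, 0x03F, 0x040, 0x1FF):
        print(f"{v:03X} -> {rgbi8_from_phase(v):02X}")
```

The ramp rises from 0 to 31 and falls back to 0 over each run of 64 phase values, so every one of the eight color bands fades in and out without needing a palette.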
  • JRetSapDoogJRetSapDoog Posts: 836
    edited 2019-12-08 - 08:20:06
    Wow! Thanks, Chip, for making/posting a VGA version. All one had to do was ask (Ye have not because ye have not asked). I know from your past comments that you like NTSC and VGA. So, when you posted NTSC and HDMI versions, I thought, "What? No love for VGA?" But now the family is complete. I also appreciate that you posted a video for those without a board or not in their shops. But running it in person is sweet.

    I just got my EVAL board four days ago. So I can finally move out of the peanut gallery and do some actual testing/coding/playing. Yay! But it will take some time for me to get up to speed (meaning a slow walk, in my case).

    In playing with the VGA version, I noticed that the LED for P56 rapidly flickers, which it doesn't do when running the NTSC version. I haven't tried to figure out why (and likely couldn't if I did), but I assume that it's some kind of "artifact" from the code. I did comment out the launch of the bitmap cog (Cog 0) and I didn't see any activity on any of the LEDs when I did that, so I assume that something in the code to drive the spiral is somehow affecting P56.

    A funny thing happened when I commented out the coginit line for the bitmap cog and recompiled and ran it. The colorful spiral image appeared on the screen in all its beauty. And at first, I thought it was moving, but moving very slowly. So I thought, "What? How can that be? I killed the cog!" But the perception of movement was just my eyes playing tricks on me from having stared too much at the moving spiral.

    Then I thought to myself, "But wait a second. Why is the spiral there at all, moving or not?" Well, I hadn't powered down between launches of the moving version and the static version. So, apparently, the hub RAM doesn't get cleared on a reset. I may have read that somewhere back during my peanut gallery days, but it caught me by surprise.

    Anyway, I manually hit reset on the board and then re-ran the static version to see if the screen would be black. But it actually displayed a somewhat ghostly-looking version of the screen that shows up from a program that I stored in the flash memory. So, that program actually briefly loaded and put some data in the hub, which was then retained when I quickly relaunched the static spiral version. It makes sense, but it's interesting.

    I then set the FLASH dip switch to off (such that my program in flash would not run on reset) to see what would happen. And rather than a black screen, I got a screen filled with random "static," kind of like white noise. So that's interesting, too.

    I then re-enabled the bitmap cog (Cog 0) such that it would put up the spiral and fill the corresponding area in the hub. I then commented out the bitmap cog again and cut the power ever so briefly and reloaded the board. I was trying to see if the hub SRAM elements might retain any data, but they didn't (I got snow/static again). I believe that I recall reading that data in a DRAM (oddly enough with its refreshing needs) can persist for several seconds.

    Moving on, I modified the original VGA version to allow me to pass in the video base_pin as a parameter (using setq before coginit). I then launched seven instances of the VGA program with separate base pins (0, 8, 16, 24, 32, 40 and 48), such that all eight cogs were running (with the bitmap cog occupying Cog 0). I wanted to see how hot the chip got to the touch. And it did get warm but not hot, much to my (pleasant) surprise, especially with the clock frequency set to 320 MHz for the VGA version (instead of 250 MHz for the NTSC one).

    Unfortunately, I only made up a single VGA adapter board to plug into the headers, so I could only look at one cog's output at a time, but I moved the adapter around to confirm that all cogs were outputting on their respective headers (though I cut the power between each move of the adapter board just to be a bit safer when moving it around). Anyway, that pretty much confirms that all the cogs are running concurrently, with Cogs 1..7 all pulling data from the same bitmap cog (Cog 0).

    I wouldn't be surprised if the bitmap cog is using the most power, as it uses the CORDIC. Then again, the VGA cogs do have to access the memory slices at a pretty good clip. I haven't wrapped my head around RDFAST entirely yet, so I don't know if a video cog can "rest" a bit with the streamer handling video data transfers in the background. So, I'm not really sure how a video cog's power consumption compares to that of the bitmap cog.

    But I did monkey around with the fclk constant to try lower values than 320 MHz. It seemed to work for everything that I tried down to 243 MHz. The monitor sometimes had to shift the image a couple of times as it tried to lock on to the signals, but it took everything that I threw at it down to 243 MHz. Below that, my little 7" monitor protested "not support" (without the "ed"). But that's pretty impressive.

    By the way, things seem to run equally fast at 320 MHz as at 243 MHz. It took about 10 seconds for an arm of the spiral to go all the way around (though I didn't get out a stopwatch). There seems to be another limiting factor involved other than the fclk setting. Perhaps it's due to waiting for the cordic to finish in the bitmap cog. Anyway, I'm not clear on why Chip used 320 MHz for the VGA version (It's not even an even multiple of 25 MHz). But it seems to work well, which is to say that my monitor didn't need to shift the screen to lock on to it.

    For all I know (which is practically nothing at this point) perhaps some change could be made in the video cog code to accommodate running at a lower frequency (without any shifting occurring when locking on to the signal). But even if so, eventually such lowering of the clock frequency would cut into the rate at which the bitmap cog is able to rotate the spiral.

    I do know from running Chip's (?) program to display a bitmap (said program being modified by rayman to run on the Eval B version), that I could take the clock frequency all the way down to 23 MHz (Yes, 2 MHz below 25 MHz, not 250 MHz) and the image (bird or whatever) still displayed rock solid.

    Incidentally, I know that I once stated my lack of comfort with using 25 MHz for VGA (640x480) instead of 25.175 MHz or something a bit closer to it than 25 MHz. But it seems to work fine on my little monitor (no complaints from it). And 25 MHz and multiples thereof (such as 200 and 250 MHz) are pretty convenient to work with when setting the clock speed.

    Still on the To-Do list for my play with this VGA spiral program is to investigate why the base_pin supposedly must be a multiple of 8 instead of 4. I presume that some instruction(s) are so limited in this particular code design (probably for speed). But theoretically, the code could be written such that the base_pin would work on multiples of 4, or am I wrong? But I haven't thought about this much, partly because the VGA adapter board that I built the other day only works on multiples of 8. I guess that I should build one that works for VGA base_pins of 4, 12, 20, 28, 36, 44 and 52 (I likely would not try for pin 60).

    Anyway, sorry for the long post, but maybe it's understandable considering that I've spent so much time in the peanut gallery. And many thanks to ersmith, rayman and ozpropdev, et al., as I've used their compilers, GUIs and loaders to play with the P2. And thanks again, Chip, for providing the VGA version of your spiral program (not to mention the design of the P2 chip). Cheers! --Jim
  • Wow, we have Spud mk2!

    pin56 toggle is diagnostics for monitoring the rendering frame rate.

  • potatoheadpotatohead Posts: 9,989
    edited 2019-12-08 - 08:45:23
    Big grin over here.

    Go @JRetSapDoog !!
  • Running the program at 320MHz just has the effect of running the animation faster. The video displays at the same rate.
  • evanhevanh Posts: 8,813
    edited 2019-12-08 - 09:16:11
    320 MHz sys-clock produces 28% speed up in rendering frame rate and rotation speed. Matches expectation.

    Power consumption will be relatively small for the video out cogs. Even the hubram bandwidth is not very high. The renderer is the only one doing any hard work.
  • JRetSapDoogJRetSapDoog Posts: 836
    edited 2019-12-09 - 02:07:26
    Thanks for the responses, folks. Gee, I really "spiraled out of control" with that last post. No, I don't want to challenge potatohead for length-of-post supremacy (sorry if I encroached on your territory, potatohead, although it seems that you welcome the company). Potatohead is wordy and nerdy, whereas I'm just wordy.

    As for the flicker on P56, yes, I see the pair of "toggle P56 for speed check" lines in the code now. Thanks, Evan. I'm sorry that I did not look myself earlier (that would have taken less time than posting about it).

    As for the rotation rate of the spiral, on first thought, it makes sense that it would be faster at 320 MHz than 243 MHz. That's what I expected, but it just seemed like the rates were the same from a cursory look when I ran things yesterday. Too bad that I couldn't run two versions side-by-side to watch (I'd need another P2 board for that).

    Anyway, I figured that I had just misjudged the rates. So I went back again today to have another look. This time, I compared running at 375 MHz with running at 250 MHz, the former being 50% faster than the latter. But I still wasn't sure if my eye would be able to reliably detect any difference from successive runs. For this, I only changed the fclk constant in the code, nothing else.

    I recorded both runs on video for one minute using a countdown timer. For both of them, the whitish spiral arms "hit" the top of the screen about 11.2 times over the course of a minute. I was expecting the run at 250 MHz to be about a third less (say between 7 and 8 hits). But it wasn't (or at least it didn't seem to be). So, I'm still open to the idea that there might be some other limiting factor, even though that runs counter to expectations.

    I apologize in advance if there was something wrong with my test/measurements described above. Perhaps someone else can do a check (or if a code warrior, analyze the code for any possible limiting factor, but I'd see if there's really a difference in the rate first). As for me, I'll likely go on to some other things as I break in (but hopefully not break) this new board.

    By the way, as an alternative to running 7 video cogs, I tried the opposite: I launched 7 instances of the bitmap manipulation (spiral animation) code. I assumed that they would kind of clobber each other and produce strange output (after the video cog displayed the bitmap data). But the spiral ran fine (and at the same speed).

    As to why they didn't step on each other, maybe the 7 instances were all in lockstep. Or not. They all use the same pipelined CORDIC, I realize. Well, in all likelihood, it's a case of "last man standing wins," with the final cog clobbering any earlier hub writes (likely with the same or very similar data). Or maybe I botched the code to launch them or they weren't really contending for the same memory for some reason.

    Anyway, I was mostly trying to see if I could get the P2 chip to run warmer. But I couldn't tell if it ran any warmer (using my fingertip) than the 7-cog video version.

    Actually, I still need to open up the P2 docs to see if there's any reason why the video cog is not running flat out (like I assume that the bitmap cog is), despite the streamer having more than enough time to supply the data (from the bitmap cog). If there's an automatic mechanism to make or allow it to "rest" (and thus take less power), I'm not aware of it yet, and I don't see any wait instructions in this particular VGA video generation code, either.

    Anyway, as far as heat generation and dissipation goes, so far, so good. But the EVAL is a four-layer board, with two-ounce (or maybe it's four-ounce?) copper on the bottom layer. I recall that someone (evanh?) got all eight cores/cogs running flat out and was able to consume upwards of 2 amps at 320 MHz (for Vdd). So, I guess I should try that code if I really want to heat things up. But based on the results so far, it doesn't sound like I'll be roasting any marshmallows over the chip any time soon.

    Update: Okay, I just ran the speed check described above (375 MHz vs. 250 MHz) again just using a countdown timer (there was no real need to make a video). Again, the whitish spiral arms "hit" the border at the top of the screen a tad over 11 times per minute either way, so it still seems like there's some other limiting factor for the max speed of the bitmap cog. But what? It doesn't seem to make sense. I see references in the code to 10 and 47 frames per second, but I assume those are measured results, not imposed/calculated results. Assuming that Chip is not limiting things, you'd think the faster the clock, the faster the animation. Hmm, could some logic handling the spiral rotation code be directly tied to the 20 MHz crystal speed instead of the 320 MHz (or whatever) clock speed? Hope not. Or is the code imposing an FPS limit (such as 47 frames per second)? Again, sorry if I've made some mistake in measuring the rotation speeds.
  • All is welcome. Seriously. I am here to learn, have fun and enjoy everyone's antics.
  • evanhevanh Posts: 8,813
    edited 2019-12-09 - 04:03:33
    Whitish? That's not the same demo I'm running. There are eight arms, each a different colour.

    At 250 MHz sys-clock:
    P56 is pulsing at 18.495 Hz (36.99 frames per second). Rotation is moving at about 34.5 arms per minute, or about 14 seconds for one rotation.
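For anyone checking these numbers: P56 is toggled once per rendered frame, so the frame rate is twice the frequency measured on the pin. Assuming one full trip through the eight colors corresponds to 512 steps of z (a 9-bit phase, which is a reading of the slow-path code rather than something stated in the thread), the arm rate follows from the frame rate. A rough Python sketch with made-up helper names:

```python
def fps_from_toggle_hz(pin_hz):
    # One DRVNOT per frame gives one full square-wave period every two frames.
    return 2.0 * pin_hz

def arms_per_minute(fps, steps_per_rotation=512, arms=8):
    # z moves one step per frame; each arm spans steps_per_rotation / arms steps.
    return fps * 60.0 * arms / steps_per_rotation

fps = fps_from_toggle_hz(18.495)
print(round(fps, 2))                    # 36.99
print(round(arms_per_minute(fps), 1))   # 34.7
print(round(512 / fps, 1))              # 13.8 seconds per rotation
```

Under that assumption the model lands close to the measured 34.5 arms per minute and roughly 14 seconds per rotation.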

  • Maybe tweak the math to make position easier to discern?
  • JRetSapDoog, by increasing that frequency value at the top of the program, you're not actually changing the speed of the processor, since it is only set by the HUBSET instructions. What you are doing is slowing down the video output rate, since it's using that frequency value at the top of the program to compute its 25 megahertz pixel output rate.

    If you ran six cogs for rendering and had each one handle 80 lines, based on its COGID, then you'd see a 6x speed increase, making a frame rate of 6 x 42 fps = 252 fps. That's way faster than the VGA could keep up with.
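The line split described here could look something like the following. It is a Python sketch of the bookkeeping only (the real thing would be PASM keyed off COGID), and it assumes the rendering cogs are indexed 0..5 and the frame is 480 lines tall; the function name is hypothetical.

```python
LINES = 480
RENDER_COGS = 6

def line_range(cog_index, lines=LINES, cogs=RENDER_COGS):
    """Return (first_line, last_line), inclusive, for one rendering cog."""
    per_cog = lines // cogs            # 480 / 6 = 80 lines each
    first = cog_index * per_cog
    return first, first + per_cog - 1

for cog in range(RENDER_COGS):
    print(cog, line_range(cog))

# Ideal scaling if the cogs don't get in each other's way:
print(RENDER_COGS * 42, "fps")         # 252 fps
```

Each cog would presumably also need its own write pointer into the bitmap, starting at its first line, and a shared notion of z so the bands stay in step.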
  • JRetSapDoogJRetSapDoog Posts: 836
    edited 2019-12-09 - 03:09:01
    Evan, yes, 8 colored arms. I'm seeing a total of 45 arms per minute at 250, 320 and 375 MHz. UPDATE: I wasn't changing the actual P2 sys-clock frequency. Sorry, Guys!
  • Chip, oh, I see. Good!!! I just assumed that you were using that value when you set the hub. Sorry about that!
  • JRetSapDoogJRetSapDoog Posts: 836
    edited 2019-12-09 - 03:07:15
    Chip, I had been looking at other people's code who create lots of constants and then derive the hubset values from that. I just assumed that that's what you were doing. But now I see (after you pointed it out) that you've coded the value directly in one line. I couldn't see the forest for the trees. Nothing to see here, folks. Go on home.
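To make the distinction concrete, here is a hedged Python sketch of the two separate things going on. The first function decodes the PLL fields of the HUBSET word from the NTSC listing at the top of the thread; the field layout used is an assumption that happens to agree with the comment there (20MHz/2*25*1 = 250MHz). The second shows why changing only the frequency constant alters the pixel rate: the streamer's rate is computed as a fraction of the assumed clock but runs off the real one. Function names and the 320/375 MHz example figures are illustrative only.

```python
def pll_clock_hz(hubset_value, xtal_hz=20_000_000):
    """Decode a P2 HUBSET clock word (assumed layout: DDDDDD at bits 23..18,
    MMMMMMMMMM at bits 17..8, PPPP at bits 7..4)."""
    d = (hubset_value >> 18) & 0x3F      # crystal divider minus 1
    m = (hubset_value >> 8) & 0x3FF      # VCO multiplier minus 1
    p = (hubset_value >> 4) & 0xF        # post-divider select
    post = 1 if p == 15 else (p + 1) * 2
    return xtal_hz // (d + 1) * (m + 1) // post

def actual_pixel_hz(nominal_pixel_hz, assumed_clock_hz, real_clock_hz):
    # The NCO step is nominal/assumed of full scale, but it is added once
    # per real clock, so the output scales by real/assumed.
    return nominal_pixel_hz * real_clock_hz / assumed_clock_hz

print(pll_clock_hz(0b1_000001_0000011000_1111_10_00))           # 250000000
print(round(actual_pixel_hz(25e6, 375e6, 320e6) / 1e6, 2))      # 21.33
```

So raising the constant while leaving HUBSET alone leaves the renderer at the same speed and only lowers the pixel clock the monitor sees.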
  • JRetSapDoogJRetSapDoog Posts: 836
    edited 2019-12-09 - 03:13:38
    Chip, yes, I can see how six cogs applied to the rotation math could speed things up. But it spins at a pretty satisfying rate as it is. Perhaps the NTSC version is faster (and spins counterclockwise), but all three versions are good. Thanks for making them.
  • The NTSC version is way faster because there are only 1/6th the pixels to compute. If we were to give it that pipelined CORDIC treatment, it would really fly.
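The "1/6th the pixels" figure can be checked from the bitmap sizes in the listings (256 x 192 for NTSC, 640 x 480 for VGA/HDMI). A quick Python check:

```python
ntsc_pixels = 256 * 192      # NTSC bitmap from the first listing
vga_pixels = 640 * 480       # VGA/HDMI bitmap
print(ntsc_pixels, vga_pixels, vga_pixels / ntsc_pixels)   # 49152 307200 6.25
```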
  • I see. All three versions make for nice demos. Thanks for the help, Chip/All.