Overlapping CORDIC commands to maximize throughput — Parallax Forums

Overlapping CORDIC commands to maximize throughput

cgracey Posts: 14,223
edited 2018-10-12 07:27 in Propeller 2
I did a test today where I overlapped CORDIC commands and result fetches for the single most complex CORDIC operation:
	setq	y	'2 clocks
	qrotate	x,angle	'2
	getqy	y	'2
	getqx	x	'2	= 8 clocks (minimum)

SETQ+QROTATE rotates a signed 32-bit (x,y) coordinate pair by a 32-bit angle, then returns the resultant (x,y) via GETQX and GETQY.
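
For readers who want to check expected values on a PC, here is a rough floating-point model of what SETQ+QROTATE computes. It is only a sketch: `qrotate_model` is a made-up name, the rotation direction (positive angle = counter-clockwise) is an assumption, and the rounding will not be bit-exact with the CORDIC hardware.

```python
import math

def qrotate_model(x, y, angle):
    """Rotate signed (x, y) by a 32-bit angle, where 2**32 is one full turn.

    Floating-point approximation only; direction and rounding are assumptions.
    """
    theta = (angle & 0xFFFFFFFF) / 2**32 * 2.0 * math.pi
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return round(xr), round(yr)

# A quarter turn moves a point on the x axis onto the y axis.
print(qrotate_model(1000, 0, 0x40000000))
```

Starting from (x, 0) and rotating by the same angle again and again makes y trace out a sine, which is what the demo below relies on.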

With minimum timing, achieved by overlapping command issue and result fetches, a cog in an 8-cog P2 implementation can complete one of these operations every 8 clocks. That leaves no time for hub memory accesses or indirection within the SETQ+QROTATE+GETQY+GETQX sequence: every instruction must be hard-coded with the register addresses its inputs are read from and its outputs are stored to.

In my test, I did 32 operations, of which the middle 16 were completely overlapped. It took 312 clocks to complete (32*8 + 56). That comes to 9.75 clocks per operation (312/32). The more operations you overlap, the closer you get to 8 clocks per operation.
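
The 56 extra clocks can be read off the clock annotations in the listing further down: the 7 lead-in QROTATEs and the 7 trailing GETQYs marked "4w" each wait 4 clocks (14 * 4 = 56). The small Python model below reproduces the numbers above. Extending it to other batch sizes is an assumption (same 8-cog timing, at least 8 operations per batch), not something that was measured.

```python
CLOCKS_PER_OP = 8          # SETQ + QROTATE + GETQY + GETQX, 2 clocks each
STALL_CLOCKS = 4           # each "4w" wait annotated in the listing
STALLED_INSTRUCTIONS = 14  # 7 lead-in QROTATEs + 7 trailing GETQYs

def total_clocks(n_ops):
    """Clocks for one batch of n_ops overlapped rotations (model, n_ops >= 8)."""
    if n_ops < 8:
        raise ValueError("model assumes at least 8 operations per batch")
    return n_ops * CLOCKS_PER_OP + STALL_CLOCKS * STALLED_INSTRUCTIONS

def clocks_per_op(n_ops):
    """Average clocks per operation for a batch of n_ops."""
    return total_clocks(n_ops) / n_ops

for n in (32, 64, 128):
    print(n, total_clocks(n), clocks_per_op(n))
```

The fixed 56-clock cost is spread over more operations as the batch grows, which is why the average approaches 8.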

The program I wrote rotates 32 sets of (x,y) coordinate pairs by ascending angle values and outputs the y values to 16-bit DACs on P0..P31. I'm running the P2 at 250MHz off a 20MHz crystal.
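
As a cross-check on the 250MHz figure, the HUBSET value used in the listing can be decoded on a PC. This sketch assumes the HUBSET field layout as I understand it from the P2 documentation (PLL enable bit, 6-bit crystal divider, 10-bit multiplier, 4-bit post-divider, then oscillator and clock-select bits); treat that layout, and the name `pll_frequency`, as assumptions.

```python
def pll_frequency(hubset_value, xtal_hz):
    """Decode a P2 HUBSET clock word into a PLL output frequency in Hz.

    Assumed layout: bit 24 = PLL on, bits 23..18 = divider-1,
    bits 17..8 = multiplier-1, bits 7..4 = post-divider code.
    """
    if not (hubset_value >> 24) & 1:
        raise ValueError("PLL is not enabled in this HUBSET value")
    divider = ((hubset_value >> 18) & 0x3F) + 1
    multiplier = ((hubset_value >> 8) & 0x3FF) + 1
    code = (hubset_value >> 4) & 0xF
    post_divider = 1 if code == 15 else (code + 1) * 2
    return xtal_hz * multiplier // (divider * post_divider)

# Value from the listing's second HUBSET, with a 20MHz crystal.
print(pll_frequency(0b1_000001_0000011000_1111_10_11, 20_000_000))
```

Under these assumptions the crystal is divided by 2, multiplied by 25, and not post-divided.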

P0..P31 output 990-ohm 3.3V sine waves from 10.0KHz..13.1KHz in 100Hz increments. The whole loop takes 2.33us, so the update rate is 428KHz.
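
Each pass through the loop rotates every (x,y) pair by its own fixed angle, so each pin behaves like a phase accumulator: the output frequency is the angle step as a fraction of a full turn (2**32), times the loop update rate. A minimal sketch of that relationship, with `tone_hz` as a made-up helper name and the update rate left as an input you measure yourself:

```python
F = 0x0010C400  # the 'f' constant from the listing

def tone_hz(angle_step, update_hz):
    """Frequency produced by advancing a 32-bit angle by angle_step per update."""
    return (angle_step % 2**32) / 2**32 * update_hz

# Synthetic check: a quarter turn per update at 1000 updates/s gives 250 Hz.
print(tone_hz(2**30, 1000.0))

# Angle steps used for P0 and P1 in the listing; their frequencies differ
# by the contribution of one multiple of f, whatever the update rate is.
print(100 * F, 101 * F)
```

Because all 32 angles are integer multiples of f and share the same update rate, the pins are evenly spaced in frequency.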

Here is the code:
'
' CORDIC overlapping command demo
'
' - outputs 32 sine waves of increasing frequency on P0..P31 using 990-ohm DACs
' - uses SETQ+QROTATE+GETQY+GETQX, the most-in/out-intensive CORDIC command
'
con	f = $0010C400			'base frequency = ~10KHz, 100Hz steps

dat	org

	hubset	##%1_000001_0000011000_1111_10_00	'enable crystal+PLL, stay in 20MHz+ mode
	waitx	##20_000_000/100			'wait ~10ms for crystal+PLL to stabilize
	hubset	##%1_000001_0000011000_1111_10_11	'now switch to PLL running at 250MHz

	rep	@.r,#32			'ready to set P0..P31 to DAC mode
	wrpin	dacmode,i		'set 16-bit noise-dither DAC mode
	wxpin	#1,i			'set period to 1 for anytime updating
	dirh	i			'enable smart pin
	incmod	i,#31			'inc index, wrap to 0
.r
'
'
' Rotate 32 sets of (x,y) coordinates at different rates
' by overlapping CORDIC commands and result fetches
'
'					'clk	sum
'					'w=wait	!=cordic tick
'
loop	setq	y+00			'2	?	begin first 8 commands
	qrotate	x+00,a+00		'?w+2	2!

	setq	y+01			'2	4
	qrotate	x+01,a+01		'4w+2	10!

	setq	y+02			'2	12
	qrotate	x+02,a+02		'4w+2	18!

	setq	y+03			'2	20
	qrotate	x+03,a+03		'4w+2	26!

	setq	y+04			'2	28
	qrotate	x+04,a+04		'4w+2	34!

	setq	y+05			'2	36
	qrotate	x+05,a+05		'4w+2	42!

	setq	y+06			'2	44
	qrotate	x+06,a+06		'4w+2	50!

	setq	y+07			'2	52
	qrotate	x+07,a+07		'4w+2	58!	result 00 is ready at 54!!!

	getqy	y+00			'2	60	get result 00, no waiting!!!
	getqx	x+00			'2	62

	setq	y+08			'2	64	begin overlapping commands and results
	qrotate	x+08,a+08		'2	66!

	getqy	y+01			'2	68
	getqx	x+01			'2	70

	setq	y+09			'2	72
	qrotate	x+09,a+09		'2	74!

	getqy	y+02			'2	76
	getqx	x+02			'2	78

	setq	y+10			'2	80
	qrotate	x+10,a+10		'2	82!

	getqy	y+03			'2	84
	getqx	x+03			'2	86

	setq	y+11			'2	88
	qrotate	x+11,a+11		'2	90!

	getqy	y+04			'2	92
	getqx	x+04			'2	94

	setq	y+12			'2	96
	qrotate	x+12,a+12		'2	98!

	getqy	y+05			'2	100
	getqx	x+05			'2	102

	setq	y+13			'2	104
	qrotate	x+13,a+13		'2	106!

	getqy	y+06			'2	108
	getqx	x+06			'2	110

	setq	y+14			'2	112
	qrotate	x+14,a+14		'2	114!

	getqy	y+07			'2	116
	getqx	x+07			'2	118

	setq	y+15			'2	120
	qrotate	x+15,a+15		'2	122!

	getqy	y+08			'2	124
	getqx	x+08			'2	126

	setq	y+16			'2	128
	qrotate	x+16,a+16		'2	130!

	getqy	y+09			'2	132
	getqx	x+09			'2	134

	setq	y+17			'2	136
	qrotate	x+17,a+17		'2	138!

	getqy	y+10			'2	140
	getqx	x+10			'2	142

	setq	y+18			'2	144
	qrotate	x+18,a+18		'2	146!

	getqy	y+11			'2	148
	getqx	x+11			'2	150

	setq	y+19			'2	152
	qrotate	x+19,a+19		'2	154!

	getqy	y+12			'2	156
	getqx	x+12			'2	158

	setq	y+20			'2	160
	qrotate	x+20,a+20		'2	162!

	getqy	y+13			'2	164
	getqx	x+13			'2	166

	setq	y+21			'2	168
	qrotate	x+21,a+21		'2	170!

	getqy	y+14			'2	172
	getqx	x+14			'2	174

	setq	y+22			'2	176
	qrotate	x+22,a+22		'2	178!

	getqy	y+15			'2	180
	getqx	x+15			'2	182

	setq	y+23			'2	184
	qrotate	x+23,a+23		'2	186!

	getqy	y+16			'2	188
	getqx	x+16			'2	190

	setq	y+24			'2	192
	qrotate	x+24,a+24		'2	194!

	getqy	y+17			'2	196
	getqx	x+17			'2	198

	setq	y+25			'2	200
	qrotate	x+25,a+25		'2	202!

	getqy	y+18			'2	204
	getqx	x+18			'2	206

	setq	y+26			'2	208
	qrotate	x+26,a+26		'2	210!

	getqy	y+19			'2	212
	getqx	x+19			'2	214

	setq	y+27			'2	216
	qrotate	x+27,a+27		'2	218!

	getqy	y+20			'2	220
	getqx	x+20			'2	222

	setq	y+28			'2	224
	qrotate	x+28,a+28		'2	226!

	getqy	y+21			'2	228
	getqx	x+21			'2	230

	setq	y+29			'2	232
	qrotate	x+29,a+29		'2	234!

	getqy	y+22			'2	236
	getqx	x+22			'2	238

	setq	y+30			'2	240
	qrotate	x+30,a+30		'2	242!

	getqy	y+23			'2	244
	getqx	x+23			'2	246

	setq	y+31			'2	248
	qrotate	x+31,a+31		'2	250!

	getqy	y+24			'2	252	get 8 trailing results
	getqx	x+24			'2	254

	getqy	y+25			'4w+2	260
	getqx	x+25			'2	262

	getqy	y+26			'4w+2	268
	getqx	x+26			'2	270

	getqy	y+27			'4w+2	276
	getqx	x+27			'2	278

	getqy	y+28			'4w+2	284
	getqx	x+28			'2	286

	getqy	y+29			'4w+2	292
	getqx	x+29			'2	294

	getqy	y+30			'4w+2	300
	getqx	x+30			'2	302

	getqy	y+31			'4w+2	308
	getqx	x+31			'2	310
'
'
' Output y[00..31] (sines) to P0..P31
'
	rep	@.r,#32			'ready to update 32 DACs
	alts	i,#y			'substitute y[i] as S of the next instruction (i is incremented by INCMOD below)
	getword	j,0,#1			'get upper word of y
	bitnot	j,#15			'convert signed word to unsigned word for DAC output
	wypin	j,i			'update DAC output value
	incmod	i,#31			'inc index, wrap to 0
.r
	drvnot	#32			'toggle P32 on each iteration

	jmp	#loop
'
'
' Data
'
dacmode	long	%10100_00000000_01_00010_0	'990-ohm DAC + randomly-dithered 16-bit DAC mode

i	long	0			'index
j	long	0			'misc

x	long	$7F000000[32]		'initial (x,y) coordinates
y	long	$00000000[32]

a	long	100*f,101*f,102*f,103*f,104*f,105*f,106*f,107*f		'ascending frequencies
	long	108*f,109*f,110*f,111*f,112*f,113*f,114*f,115*f
	long	116*f,117*f,118*f,119*f,120*f,121*f,122*f,123*f
	long	124*f,125*f,126*f,127*f,128*f,129*f,130*f,131*f

EDIT: Found a bug in the program where P0 DAC was not updating. Fixed it.
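
The DAC update step in the listing (GETWORD to take the upper word of y, then BITNOT on bit 15) converts a signed sample to the unsigned 16-bit value the DAC expects. The same conversion can be sketched on a PC as follows; `dac_word` is a made-up name, and y is treated as a 32-bit two's-complement value.

```python
def dac_word(y):
    """Map a signed 32-bit sample to an unsigned 16-bit DAC value."""
    upper = (y >> 16) & 0xFFFF  # like GETWORD j,y,#1 (upper word of y)
    return upper ^ 0x8000       # like BITNOT j,#15 (signed -> offset binary)

print(hex(dac_word(0)))            # mid-scale
print(hex(dac_word(0x7F000000)))   # positive peak of the demo's amplitude
print(hex(dac_word(-0x7F000000)))  # negative peak
```

Flipping the sign bit shifts the range so that zero lands at mid-scale and the two peaks sit symmetrically around it.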

Comments

  • cgracey Posts: 14,223
    edited 2018-10-11 04:26
    Here is a picture of one of the 32 pins outputting a sine wave. This is not nearly as fast as the DDS via the Goertzel circuit, since it's outputting a new sample to each pin only once every 2.33us, but the smart pin is dithering the 8-bit DAC with 16-bit data, getting another 8 bits of resolution out of it.

    Most importantly here, by batching up CORDIC operations in any mixture, a cog can approach a mere 8 clocks per operation, for even the most data-intensive CORDIC function.
  • Cluso99 Posts: 18,069
    It will be nice to see what Phil can come up with for decoding/processing RF signals :)