LUT out of tune

kbash · 2019-02-22 23:57

After several weeks of “almost” getting there trying to use the LUT to share information between two processes, I've given up and gone back to sharing data through hub RAM only.

I'd get everything working and think I was finally starting to get a handle on how LUT sharing worked. Then I'd change something that SHOULDN'T have affected anything and it would all fall to pieces again. Seriously: add a NOP, insert a WAITX, etc., and the data shared between cogs would be corrupted. Nothing like this happens using hub RAM.

I'm still using a couple of flags in the LUT to trigger functions between two cogs. This seems to work fine with a WAITX #6 after setting one or the other of the flags, but once I start to pass substantial amounts of information it breaks.

Is anyone else having difficulty using LUT sharing? Is there some timing trick I'm missing? Odds are good that I'm having issues with overwriting memory while it's being read. I've tried to establish “Locks” to prevent this but still no luck.

I can use the LUT as “scratchpad memory” for one cog with no problem, but between cogs is another matter.

I'm running one function in cog 2 and the other in cog 3. Is there some other combination that might work better?

At startup I am using this sequence:


   coginit   cognum1, ##@onemove   ' init cog 2 for movement
   coginit   cognum2, #@mset       ' init cog 3 for the setup functions
   cogstop   #0                    ' terminate the startup cog

cognum1 long 2
cognum2 long 3

Is this incorrect for using LUT sharing between cogs? If so... the fact that it ALMOST works has been throwing me off.

I am NOT using:

   COGINIT #%1_1_0001,addr 'start a pair of free cogs (lookup RAM sharing)

because I know cogs 2 and 3 are free and I can't figure out how to get my “onemove” function running in one cog and “mset” running in the other without starting them individually.

I realize this COULD be my problem. If so... how do I start two cogs with COGINIT #%1_1_0001 and then get two different programs running in those “matched” cogs?

jmg · 2019-02-23 00:02

Did you try a read-until-same code snippet, to avoid any issues with a read that lands on the same clock as a write not returning valid data?
Or does your code guarantee by design that it can never read on the same clock as the other cog writes?
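A minimal sketch of the read-until-same idea, written as a Python model rather than PASM2: keep reading the shared long until two consecutive reads agree, and only then use the value. The `read_word` callable stands in for an RDLUT of the shared location and is an assumption for illustration; on the P2 this would be two RDLUTs and a compare in a loop.

```python
from typing import Callable


def read_until_same(read_word: Callable[[], int], max_tries: int = 100) -> int:
    """Read a shared word until two consecutive reads return the same value.

    A value that was caught mid-update (or changed between the two reads)
    is discarded and read again.
    """
    previous = read_word()
    for _ in range(max_tries):
        current = read_word()
        if current == previous:
            return current
        previous = current
    raise RuntimeError("shared value never settled")


# Usage: a simulated read sequence with one glitched value in it.
reads = iter([0x1234, 0xFFFF, 0x5678, 0x5678])
value = read_until_same(lambda: next(reads))
```

The first three reads all differ from their predecessor, so `value` ends up as `0x5678`, the first value seen twice in a row.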

Mark_T · 2019-02-23 00:31

I implemented a producer/consumer pair using event counters, works reliably.

CON
    OSCMODE = $010c3f04
    FREQ = 160_000_000
    BAUD = 2*115200

    ' buffer in LUT from start of LUT
    BUFSIZE  = 250
    ' event counter addresses in LUT
    INSERT_EVCOUNT = 250
    EXTRACT_EVCOUNT = 251

    ' DAC setup
    dither = %0000_0000_000_10100_00000000_01_00010_0    ' remember to wypin
    DACpin = 44  ' output DAC

VAR	
    long scratch [20]
OBJ
    ser: "SmartSerial.spin2"
	

PUB  Demo
    clkset (OSCMODE, FREQ)
    ser.start (63, 62, 0, BAUD)
    ser.str (string ("LUT share prod consume"))
    ser.tx(13)
    ser.tx(10)

    resvec := @scratch
  
    coginit (%0_1_0001, @workers)  ' start pair of cogs

    repeat
        pausems (10000)
	ser.hex (scratch[0], 8)  ' print out checksums after they get set
        ser.tx(32)
	ser.hex (scratch[1], 8)
        ser.tx(13)
        ser.tx(10)

DAT
		ORG	0
workers
		wrlut	#0, #INSERT_EVCOUNT   ' LUT not cleared at cog start, so must init variables in it
		wrlut	#0, #EXTRACT_EVCOUNT  ' either cog might get here first

		LUTSON  ' LUT sharing enabled

		mov     counter, ##10_000_000   ' producer run count, larger takes longer
		mov	checksum, #0

		cogid	cognum    ' cogs have to agree which is producer and which consumer
		testb	cognum, #0  wz
	if_z	jmp	#producer


' consumer truncates items from the buffer to 16 bits and output on DAC pin
consumer	wrpin	##dither, #DACpin
.loop		call	#extract
		add	checksum, item
		zerox   item, #15
		wypin	item, #DACpin
		waitx	##$30   ' ensure producer can get ahead sometimes
		djnz    counter, #.loop

		jmp	#report_back

' producer generates random samples with random delays
producer	
.loop 		getrnd	item
		mov	del, item
		shr	del, #24
		waitx	del
		add	checksum, item
		call	#insert
		djnz    counter, #.loop
		
report_back	cogid	cognum
		and	cognum, #1
		shl	cognum, #2
		add	cognum, resvec
		wrlong	checksum, cognum
		cogid	cognum
		cogstop cognum


' event-counter style producer consumer pair using buffer at start of LUT ram

insert		sub	ins_count, #BUFSIZE
.insert_wait	RDLUT	ext_count, #EXTRACT_EVCOUNT
		cmp	ins_count, ext_count  wz
	if_z	jmp	#.insert_wait  ' wait on buffer full (inptr == outptr+BUFSIZE)

		WRLUT   item, wrptr    ' currently restricted to start of LUT ram by using incmod
		incmod	wrptr, #BUFSIZE-1

		add	ins_count, #BUFSIZE+1
		WRLUT	ins_count, #INSERT_EVCOUNT   ' signal new item
insert_ret	ret

wrptr		long	0   ' wrapping pointer for adding items
ins_count	long	0   ' insert event counter


extract
.extract_wait   RDLUT	ins_count, #INSERT_EVCOUNT
		cmp	ext_count, ins_count  wz
	if_z	jmp	#.extract_wait ' wait on buffer empty (inptr == outptr)

		RDLUT	item, rdptr    ' currently restricted to start of LUT ram by using incmod
		incmod	rdptr, #BUFSIZE-1

		add	ext_count, #1
		WRLUT	ext_count, #EXTRACT_EVCOUNT  ' signal item taken
extract_ret	ret

rdptr		long	0   ' wrapping pointer for removing items
ext_count	long	0   ' extract event counter
resvec		long	0
checksum	long	0

counter		res	1
cognum		res	1
item		res	1
del		res	1

		FIT	$1F0

The two threads synchronize with event counters, which, as it turns out, is immune to the LUT sharing glitch bug. Stick a 'scope on pin 44 and see the random waveform that's sent through the shared
buffer.
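Below is a rough Python model of the protocol in the listing above, to show why it holds up: each event counter has exactly one writer, the data slot is written before the counter that announces it, and full/empty are decided only by comparing the two counters. It is a single-threaded random interleaving, not a cycle-accurate P2 simulation. `SharedLut`, `Producer`, `Consumer` and `run` are hypothetical names; `BUFSIZE` and the full/empty conditions follow the PASM2 code.

```python
import random
from typing import Optional

BUFSIZE = 250        # buffer slots, as in the PASM2 listing
MASK = 0xFFFF_FFFF   # counters wrap like 32-bit cog registers


class SharedLut:
    """Stand-in for the shared LUT: buffer slots plus the two event counters."""

    def __init__(self) -> None:
        self.buf = [0] * BUFSIZE
        self.insert_evcount = 0   # written only by the producer
        self.extract_evcount = 0  # written only by the consumer


class Producer:
    def __init__(self, lut: SharedLut) -> None:
        self.lut = lut
        self.wrptr = 0
        self.ins_count = 0

    def try_insert(self, item: int) -> bool:
        # Full when the insert count is exactly BUFSIZE ahead of the extract count.
        if ((self.ins_count - BUFSIZE) & MASK) == self.lut.extract_evcount:
            return False
        self.lut.buf[self.wrptr] = item            # store the data first ...
        self.wrptr = (self.wrptr + 1) % BUFSIZE    # like incmod wrptr,#BUFSIZE-1
        self.ins_count = (self.ins_count + 1) & MASK
        self.lut.insert_evcount = self.ins_count   # ... then publish the new count
        return True


class Consumer:
    def __init__(self, lut: SharedLut) -> None:
        self.lut = lut
        self.rdptr = 0
        self.ext_count = 0

    def try_extract(self) -> Optional[int]:
        # Empty when both counters are equal.
        if self.ext_count == self.lut.insert_evcount:
            return None
        item = self.lut.buf[self.rdptr]
        self.rdptr = (self.rdptr + 1) % BUFSIZE
        self.ext_count = (self.ext_count + 1) & MASK
        self.lut.extract_evcount = self.ext_count  # signal "item taken"
        return item


def run(n_items: int, seed: int = 1, p_produce: float = 0.5):
    """Interleave producer and consumer steps at random; return sent and received items."""
    rng = random.Random(seed)
    lut = SharedLut()
    prod, cons = Producer(lut), Consumer(lut)
    sent, received = [], []
    while len(received) < n_items:
        if len(sent) < n_items and rng.random() < p_produce:
            item = rng.getrandbits(32)
            if prod.try_insert(item):
                sent.append(item)
        else:
            item = cons.try_extract()
            if item is not None:
                received.append(item)
    return sent, received
```

With any seed and any `p_produce`, `run` should return two identical lists: items come out in the order they went in, and the producer stalls rather than overwriting a slot when the buffer is full.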

ozpropdev · 2019-02-23 02:00

For COGINIT, use %1_0001 to launch a pair of free cogs.
Then test for an odd-numbered cog and reload it with your alternate code.
For example:

dat		org

		coginit	#%1_0001,##@even_cog		'start pair of cogs
		cogstop	#0

'*********************************************

		org
even_cog	cogid	pa
		testb	pa,#0 wc
	if_c	coginit	pa,##@odd_cog		'if odd cog reload

		lutson
		wrlut	#0,#0
.loop		rdlut	pa,#0
		rcr	pa,#1 wc
		drvc	#56
		jmp	#.loop

'*********************************************
		org
odd_cog		lutson
.loop		waitx	##25_000_000
		rdlut	pa,#0
		add	pa,#1
		wrlut	pa,#0
		jmp	#.loop
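The even/odd dispatch in the example above comes down to testing bit 0 of the cog ID. Here is a small Python model of that start-up sequence; `role_for_cog` and `start_pair` are hypothetical names, and the strings simply mirror the `even_cog`/`odd_cog` labels in the PASM2 code.

```python
def role_for_cog(cog_id: int) -> str:
    """Same test as `testb pa,#0 wc`: an odd cog ID sets carry."""
    return "odd_cog" if cog_id & 1 else "even_cog"


def start_pair(even_id: int) -> dict:
    """Model COGINIT on a free pair: both cogs begin at the same entry point."""
    running = {even_id: "even_cog", even_id + 1: "even_cog"}
    for cog_id in running:
        if role_for_cog(cog_id) == "odd_cog":
            # models `if_c coginit pa,##@odd_cog`: the odd cog restarts itself
            running[cog_id] = "odd_cog"
    return running
```

For a pair started on cogs 2 and 3, `start_pair(2)` ends with cog 2 still running `even_cog` and cog 3 running `odd_cog`.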

ozpropdev · 2019-02-23 02:26

BTW, the LUT glitch bug has been fixed in FPGA testing and the fix will be in the next respin of the silicon.

kbash · 2019-02-23 03:00

I tried several different combinations of setting flags (just setting a value to 1 or 0) in the LUT to keep the two from trying to use the LUT at the same time. Nothing seemed to work. I turned LEDs on and off, which seemed to show that the two processes were working at separate times, but data still got corrupted.

I'm mainly wondering whether LUT sharing SHOULD work OK with cogs that were not started with the paired form of COGINIT:

    COGINIT #%1_1_0001,addr 'start a pair of free cogs (lookup RAM sharing)

And by the way... what happens when cogs 2 and 3 are sharing the LUT and another cog (1 or 4) is started and executes LUTSON? How do the cogs know that THEY are paired and others are excluded? The docs say the cogs must be adjacent, but does it make a difference whether it's 1 and 2 or 2 and 3?

The docs say: "the COGINIT instruction has a mechanism for finding an even/odd pair and then starting them both with the same parameters. It will be necessary for the program to differentiate between even and odd cogs and possibly restart one, or both, with the final, intended program. To have COGINIT find and start two adjacent cogs, use %x_1_xxx1 for the D/# operand."

Are there any examples out there on how to do these things?

Same parameters? As in the same program start address?
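One way to check a COGINIT operand against the `%x_1_xxx1` pattern in the quoted docs is to test only bit 4 and bit 0 and treat every other bit as don't-care. A small Python sketch; the helper name is hypothetical, and it deliberately does not decode the remaining COGINIT fields.

```python
PAIR_BITS = 0b01_0001  # bit 4 and bit 0, the two fixed 1s in %x_1_xxx1


def requests_free_pair(d: int) -> bool:
    """True when the D operand matches %x_1_xxx1 (x = don't care)."""
    return (d & PAIR_BITS) == PAIR_BITS
```

All three forms used in this thread (%1_1_0001, %0_1_0001 and %1_0001) match the pattern, while a plain cog number such as 2 or 3 does not.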

kbash · 2019-02-23 03:02

Thanks guys, I didn't see your posts until after I finished my dinner and the write-up.

jmg · 2019-02-23 03:22

kbash wrote:

... The docs say that cogs must be adjacent but does it make a difference if it's 1 and 2 or 2 and 3?

I think they are adjacent in the physical and binary sense, so 0 & 1 are possible, but 1 & 2 are not, i.e. they are hardwired as pairs? (Fixed for base-0 counting.)

evanh · 2019-02-23 04:02

Yes, as even/odd pairs. Four pairs numbered 0/1, 2/3, 4/5, 6/7. And as Oz said, they both get identical copies of running code so have to decide at runtime which is going to do which job.

Mark_T · 2019-02-23 04:09

To quote the documentation, which is very precise:

LOOKUP RAM SHARING BETWEEN PAIRED COGS

Adjacent cogs whose ID numbers differ by only the LSB (cogs 0 and 1, 2 and 3, 4 and 5, etc.) can each allow their lookup RAMs to be written by the other cog via its local lookup RAM writes
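Since paired cogs differ only in the LSB, the partner of any cog is its ID with bit 0 flipped. A short Python sketch, assuming eight cogs (the four pairs listed earlier in the thread); the function names are hypothetical.

```python
def partner(cog_id: int) -> int:
    """Paired cogs differ only in bit 0, so flipping it gives the partner."""
    return cog_id ^ 1


def are_paired(a: int, b: int) -> bool:
    """True only for the hardwired pairs 0/1, 2/3, 4/5 and 6/7."""
    return partner(a) == b


PAIRS = [(c, partner(c)) for c in range(0, 8, 2)]
```

So cogs 2 and 3 form a pair, while cogs 1 and 2 do not, even though their numbers are adjacent.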

evanh · 2019-02-23 05:05

One detail to get straight is what the LUTSON instruction means. It means you are enabling writes from the paired cog to you, not from you to it.
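A simplified Python model of that rule, assuming a 512-long LUT per cog and ignoring timing (and the glitch bug): a WRLUT always updates the writer's own LUT, and also updates the partner's LUT only if the partner has executed LUTSON. Class and method names are hypothetical; `wrlut(value, addr)` follows PASM2's `WRLUT D,S` order of data, then address.

```python
from typing import Optional


class Cog:
    def __init__(self, cog_id: int) -> None:
        self.cog_id = cog_id
        self.lut = [0] * 512
        self.accept_partner_writes = False  # set by LUTSON
        self.partner: Optional["Cog"] = None

    def lutson(self) -> None:
        # Allow the *partner's* LUT writes to land in *this* cog's LUT.
        self.accept_partner_writes = True

    def wrlut(self, value: int, addr: int) -> None:
        self.lut[addr] = value
        p = self.partner
        if p is not None and p.accept_partner_writes:
            p.lut[addr] = value

    def rdlut(self, addr: int) -> int:
        return self.lut[addr]


def make_pair(even_id: int):
    a, b = Cog(even_id), Cog(even_id + 1)
    a.partner, b.partner = b, a
    return a, b
```

This is why both cogs in the examples above execute LUTSON: each one has to opt in before the other cog's writes reach it.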
