LUT out of tune

After several weeks of “almost” getting there while trying to use the LUT to share information between two processes, I've given up and gone back to sharing data through the Hub RAM only.

I'd get everything working and think I was finally starting to get a handle on how the LUT sharing worked. Then I'd change something that SHOULDN'T have affected anything and it would all fall to pieces again. Seriously, add a NOP, insert a WAITX etc. and the data shared between cogs would be corrupted. Nothing like this happens using the Hub Ram.

I'm still using a couple of flags in the LUT to trigger functions between two cogs. This seems to work fine with a waitx #6 after setting one or the other of the flags, but once I start to pass substantial amounts of information it breaks.

Is anyone else having difficulty using LUT sharing? Is there some timing trick I'm missing? Odds are good that I'm having issues with overwriting memory while it's being read. I've tried to establish “Locks” to prevent this but still no luck.

I can use the LUT as “scratchpad memory” for one cog with no problem, but sharing between cogs is another matter.

I'm running one function in cog 2 and the other in cog 3. Is there some other combination that might work better?

At startup I am using this sequence:
   coginit   cognum1, ##@onemove   ' init  cog 2 for movement

   coginit   cognum2, #@mset   ' init  cog 3 for the setup functions.

   cogstop   #0   ' terminate the startup cog.

  cognum1 long 2
  cognum2 long 3


Is this incorrect for using LUT sharing between cogs? If so... the fact that it ALMOST works has been throwing me off.

I am NOT using:
   COGINIT #%1_1_0001,addr 'start a pair of free cogs (lookup RAM sharing)
because I know cogs 2 and 3 are free and I can't figure out how to get my “onemove” function running in one cog and “mset” running in the other without starting them individually.

I realize this COULD be my problem. If so... how do I start two cogs with COGINIT #%1_1_0001 and then get two different programs running in those “matched” cogs?

Comments

  • jmg Posts: 13,611
    Did you try a read-until-same code snippet, to avoid any issues with a write-while-read not being valid on the same clock?
    Or does your code guarantee by design that it can never read on the same clock as the other cog writes?
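    A minimal sketch of what a read-until-same loop could look like in P2 assembly. The names read_stable, lutaddr, val1 and val2 are made up for illustration and do not come from the original poster's code. It only protects a single long; a multi-long block would still need a handshake such as a sequence counter, and the thread does not confirm whether this alone is enough to mask the sharing glitch.

    ```
    ' Hypothetical helper: re-read one shared LUT long until two
    ' consecutive reads agree. Call with: call #read_stable
    read_stable     rdlut   val1, lutaddr       ' first sample
                    rdlut   val2, lutaddr       ' second sample
                    cmp     val1, val2  wz      ' Z is set when both samples match
            if_nz   jmp     #read_stable        ' mismatch: a write may have been in flight, retry
                    ret                         ' val1 now holds the agreed value

    lutaddr         long    0                   ' assumed LUT address of the shared long
    val1            res     1
    val2            res     1
    ```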
  • Mark_T Posts: 1,981
    edited 2019-02-23 - 00:33:09
    I implemented a producer/consumer pair using event counters, works reliably.
    CON
        OSCMODE = $010c3f04
        FREQ = 160_000_000
        BAUD = 2*115200
    
        ' buffer in LUT from start of LUT
        BUFSIZE  = 250
        ' event counter addresses in LUT
        INSERT_EVCOUNT = 250
        EXTRACT_EVCOUNT = 251
    
        ' DAC setup
        dither = %0000_0000_000_10100_00000000_01_00010_0    ' remember to wypin
        DACpin = 44  ' output DAC
    
    VAR	
        long scratch [20]
    OBJ
        ser: "SmartSerial.spin2"
    	
    
    PUB  Demo
        clkset (OSCMODE, FREQ)
        ser.start (63, 62, 0, BAUD)
        ser.str (string ("LUT share prod consume"))
        ser.tx(13)
        ser.tx(10)
    
        resvec := @scratch
      
        coginit (%0_1_0001, @workers)  ' start pair of cogs
    
        repeat
            pausems (10000)
            ser.hex (scratch[0], 8)  ' print out checksums after they get set
            ser.tx(32)
            ser.hex (scratch[1], 8)
            ser.tx(13)
            ser.tx(10)
    
    DAT
    		ORG	0
    workers
    		wrlut	#0, #INSERT_EVCOUNT   ' LUT not cleared at cog start, so must init variables in it
    		wrlut	#0, #EXTRACT_EVCOUNT  ' either cog might get here first
    
    		LUTSON  ' LUT sharing enabled
    
    		mov     counter, ##10_000_000   ' producer run count, larger takes longer
    		mov	checksum, #0
    
    		cogid	cognum    ' cogs have to agree which is producer and which consumer
    		testb	cognum, #0  wz
    	if_z	jmp	#producer
    
    
    ' consumer truncates items from the buffer to 16 bits and output on DAC pin
    consumer	wrpin	##dither, #DACpin
    .loop		call	#extract
    		add	checksum, item
    		zerox   item, #15
    		wypin	item, #DACpin
    		waitx	##$30   ' ensure producer can get ahead sometimes
    		djnz    counter, #.loop
    
    		jmp	#report_back
    
    ' producer generates random samples with random delays
    producer	
    .loop 		getrnd	item
    		mov	del, item
    		shr	del, #24
    		waitx	del
    		add	checksum, item
    		call	#insert
    		djnz    counter, #.loop
    		
    report_back	cogid	cognum
    		and	cognum, #1
    		shl	cognum, #2
    		add	cognum, resvec
    		wrlong	checksum, cognum
    		cogid	cognum
    		cogstop cognum
    
    
    ' event-counter style producer consumer pair using buffer at start of LUT ram
    
    insert		sub	ins_count, #BUFSIZE
    .insert_wait	RDLUT	ext_count, #EXTRACT_EVCOUNT
    		cmp	ins_count, ext_count  wz
    	if_z	jmp	#.insert_wait  ' wait on buffer full (inptr == outptr+BUFSIZE)
    
    		WRLUT   item, wrptr    ' currently restricted to start of LUT ram by using incmod
    		incmod	wrptr, #BUFSIZE-1
    
    		add	ins_count, #BUFSIZE+1
    		WRLUT	ins_count, #INSERT_EVCOUNT   ' signal new item
    insert_ret	ret
    
    wrptr		long	0   ' wrapping pointer for adding items
    ins_count	long	0   ' insert event counter
    
    
    extract
    .extract_wait   RDLUT	ins_count, #INSERT_EVCOUNT
    		cmp	ext_count, ins_count  wz
    	if_z	jmp	#.extract_wait ' wait on buffer empty (inptr == outptr)
    
    		RDLUT	item, rdptr    ' currently restricted to start of LUT ram by using incmod
    		incmod	rdptr, #BUFSIZE-1
    
    		add	ext_count, #1
    		WRLUT	ext_count, #EXTRACT_EVCOUNT  ' signal item taken
    extract_ret	ret
    
    rdptr		long	0   ' wrapping pointer for removing items
    ext_count	long	0   ' extract event counter
    resvec		long	0
    checksum	long	0
    
    counter		res	1
    cognum		res	1
    item		res	1
    del		res	1
    
    		FIT	$1F0
    

    The two threads synchronize with event counters, which, as it turns out, is immune to the LUT sharing
    glitch bug. Stick a 'scope on pin 44 to see the random waveform that's sent through the shared
    buffer.
  • For coginit, use %1_0001 to launch a pair of free cogs.
    Then test for an odd-numbered cog and reload it with your alternate code.
    For example:
    dat		org
    
    		coginit	#%1_0001,##@even_cog		'start pair of cogs
    		cogstop	#0
    
    '*********************************************
    
    		org
    even_cog	cogid	pa
    		testb	pa,#0 wc
    	if_c	coginit	pa,##@odd_cog		'if odd cog reload
    
    		lutson
    		wrlut	#0,#0
    .loop		rdlut	pa,#0
    		rcr	pa,#1 wc
    		drvc	#56
    		jmp	#.loop
    
    '*********************************************
    		org
    odd_cog		lutson
    .loop		waitx	##25_000_000
    		rdlut	pa,#0
    		add	pa,#1
    		wrlut	pa,#0
    		jmp	#.loop
    
    Melbourne, Australia
  • BTW
    The LUT glitch bug has been fixed in FPGA testing and will be in the next respin of the silicon.

  • I tried several different combinations of setting flags (just setting a value to 1 or 0) in the LUT to keep the two cogs from trying to use the LUT at the same time. Nothing seemed to work. I turned LEDs on and off, which seemed to show that the two processes were working at separate times, but data still got corrupted.

    I'm mainly wondering whether LUT sharing SHOULD work OK with cogs that were not started with:
        COGINIT #%1_1_0001,addr 'start a pair of free cogs (lookup RAM sharing)

    And by the way... what happens when cogs 2 and 3 are sharing the LUT and another cog (1 or 4) is started and uses LUTSON? How do the cogs know that THEY are paired and others are excluded? The docs say that cogs must be adjacent, but does it make a difference if it's 1 and 2 or 2 and 3?

    The docs say: "the COGINIT instruction has a mechanism for finding an even/odd pair and then starting them both with the same parameters. It will be necessary for the program to differentiate between even and odd cogs and possibly restart one, or both, with the final, intended program. To have COGINIT find and start two adjacent cogs, use %x_1_xxx1 for the D/# operand."

    Are there any examples out there on how to do these things?

    Same parameters? As in the same program start address?


  • Thanks guys, I didn't see your posts until after I finished my dinner and the write-up.
  • jmg Posts: 13,611
    edited 2019-02-23 - 04:19:42
    kbash wrote: »
    ... The docs say that cogs must be adjacent but does it make a difference if it's 1 and 2 or 2 and 3?
    I think they are adjacent in the physical and binary sense, so 0 & 1 are possible, but 1 & 2 are not, i.e. they are hardwired as pairs? (Fixed for base-0 counting.)

  • evanh Posts: 7,285
    edited 2019-02-23 - 04:04:51
    Yes, as even/odd pairs. Four pairs numbered 0/1, 2/3, 4/5, 6/7. And as Oz said, they both get identical copies of the running code, so they have to decide at runtime which one is going to do which job.
  • To quote the documentation, which is very precise:
    LOOKUP RAM SHARING BETWEEN PAIRED COGS

    Adjacent cogs whose ID numbers differ by only the LSB (cogs 0 and 1, 2 and 3, 4 and 5, etc.) can each allow their lookup RAMs to be written by the other cog via its local lookup RAM writes
  • One detail to get straight is what the LUTSON instruction means. It means you are enabling writes from the paired cog to you, not from you to it.
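    A minimal sketch of that direction rule, assuming the pair is cogs 2 and 3 and using the LUTSON mnemonic from earlier in the thread. The labels receiver and sender, the mailbox at LUT address 0, and the startup delay are all illustrative assumptions; each routine would need its own ORG section, as in the even/odd example above.

    ```
    ' Assumed to run in cog 2 (receiver): open MY LUT to the partner's writes
    receiver        wrlut   #0, #0              ' LUT is not cleared at cog start, so clear the mailbox
                    lutson                      ' cog 3's WRLUTs now also land in this cog's LUT
    .wait           rdlut   pa, #0              ' poll the mailbox in my own LUT
                    cmp     pa, #0  wz
            if_z    jmp     #.wait              ' still zero: nothing received yet
                    drvh    #56                 ' value arrived: drive pin 56 high
    .done           jmp     #.done

    ' Assumed to run in cog 3 (sender): no LUTSON needed for one-way traffic
    sender          waitx   ##25_000_000        ' crude delay so the receiver clears its mailbox first
                    wrlut   #1, #0              ' lands in cog 3's LUT and, since cog 2 did LUTSON, in cog 2's
    .done           jmp     #.done
    ```

    For two-way traffic, both cogs would each execute LUTSON so that each accepts the other's writes.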