Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i - Page 155 — Parallax Forums

Comments

  • evanh Posts: 15,126
    edited 2019-01-15 04:49
    Yep, two unrelated problems that I'd bumped into with the one code loop. Actually there was also the branch at last line of REP block problem too, but Chip has clarified that that is considered an illegal operation.

    The corruption problem, #2, is as I've said. It continuously occurs when synchronously reading at a particular phase to the writes. That problem doesn't occur otherwise. I don't have other examples.

  • evanh Posts: 15,126
    edited 2019-01-20 00:24
    Chip,
    Got another tiny improvement request: For async receive, smartpin mode %11111, I'd like RDPIN reg,pin WC to make C same as IN state. Currently, C is always zero I think.
  • cgracey Posts: 14,133
    Keep cutting
    evanh wrote: »
    Chip,
    Got another tiny improvement request: For async receive, smartpin mode %11111, I'd like RDPIN reg,pin WC to make C same as IN state. Currently, C is always zero I think.

    Ok. I will look into that tonight.
  • evanh Posts: 15,126
    Chip,
    I've started trying to duplicate the lut sharing issue with a single-pass test by aligning the read and write on the same CT number.

    I've found something else: There's a one clock difference between the v32i FPGA and P2ES silicon when reading a written lutRAM location. Both tested at 20 MHz.
    '----- test lut sharing dual-port glitch -----
    		getct   ticks
    		add     ticks, sec
    
    		lutson
    		setq    ticks
    		coginit #1, ##@start_lut_test
    
    		addct1  ticks, #3     '<-------------- HERE --- #3 is needed for P2ES, #2 is needed for FPGA
    		waitct1
    
    		getct   ticks
    		rdlut   pa, #$1ff
    		rdlut   i, #$1ff
    		rdlut   j, #$1ff
    		rdlut   k, #$1ff
    		getct   temp
    ...
    
    
    
    '============================
    ORG
    start_lut_test
    		wrlut   #0, #$1ff
    		mov     pa, ##$deadbeef
    		addct1  ptra, #4
    		waitct1
    		wrlut   pa, #$1ff
    
    		cogid   pa
    		cogstop pa
    '============================
    
  • evanh Posts: 15,126
    edited 2019-01-26 16:16
    Oh, that's a revelation. It depends on setting vs resetting of the bits. When resetting bits, both timings are the same at #2, so not really a logic difference at all.

    Okay, resetting bits is when the glitching shows its face the most. The bit-mashing is actually worse on the P2ES silicon than on the FPGA.

    EDIT: Here's output I get from the P2ES board:
    lut $1ff = ffffffff   4a41febf   00000000   00000000    14
    

    Attached is full source (need to uncomment the HUBSET's in Set Xtal for P2ES silicon):
  • jmg Posts: 15,140
    Trying to follow this - I think you are saying
    The shared access is ok when Old-> New is =\_ (same on FPGA and P2ES)
    When Old -> New is _/= there is corruption, and an extra clock delay is needed on P2ES ?

    If P2ES reads again, is the data correct (ie is it read-side corruption, or write-side corruption ?)
    Is any data not being read affected (cross address corruption) ?
    Does this behave the same for all COG pairs ?
    20MHz should be well inside timing, even on FPGA, but you could try 10MHz & 40MHz on P2ES to see if they change.
    Does heat affect this ?
  • evanh Posts: 15,126
    JMG,
    Give that code a run.
  • ozpropdev Posts: 2,791
    edited 2019-01-28 06:52
    @evanh
    Confirming your findings, here's what I found in my own testing.
    The glitch occurs when the RDLUT is +2 clocks after the WRLUT.
    Offset  Original New
    -4      FFFFFFFF FFFFFFFF
    -3      FFFFFFFF FFFFFFFF
    -2      FFFFFFFF FFFFFFFF
    -1      FFFFFFFF FFFFFFFF
    +0      FFFFFFFF FFFFFFFF
    +1      FFFFFFFF FFFFFFFF
    +2      FFFFFFFF 09009DFF	'glitch
    +3      FFFFFFFF 00000000
    +4      FFFFFFFF 00000000
    
    Offset  Original New
    -4      00000000 00000000
    -3      00000000 00000000
    -2      00000000 00000000
    -1      00000000 00000000
    +0      00000000 00000000
    +1      00000000 00000000
    +2      00000000 00000000
    +3      00000000 FFFFFFFF
    +4      00000000 FFFFFFFF
    
    Offset  Original New
    -4      55555555 55555555
    -3      55555555 55555555
    -2      55555555 55555555
    -1      55555555 55555555
    +0      55555555 55555555
    +1      55555555 55555555
    +2      55555555 01005555	'glitch
    +3      55555555 AAAAAAAA
    +4      55555555 AAAAAAAA
    
    Offset  Original New
    -4      AAAAAAAA AAAAAAAA
    -3      AAAAAAAA AAAAAAAA
    -2      AAAAAAAA AAAAAAAA
    -1      AAAAAAAA AAAAAAAA
    +0      AAAAAAAA AAAAAAAA
    +1      AAAAAAAA AAAAAAAA
    +2      AAAAAAAA 88288AAA	'glitch
    +3      AAAAAAAA 55555555
    +4      AAAAAAAA 55555555
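
One way to read the three glitched rows, as a rough sketch: split the bits that differ from the original value into 1->0 and 0->1 transitions. The sample values are copied from the tables above; the name `classify` is made up for this illustration. In these samples every bit that has changed at +2 is a 1->0 transition and no 0->1 transition has shown up yet, which lines up with the earlier remark about setting versus resetting bits.

```python
MASK = 0xFFFFFFFF

# (value before the write, value written, value read back at the +2 offset),
# copied from the tables above.
SAMPLES = [
    (0xFFFFFFFF, 0x00000000, 0x09009DFF),
    (0x00000000, 0xFFFFFFFF, 0x00000000),
    (0x55555555, 0xAAAAAAAA, 0x01005555),
    (0xAAAAAAAA, 0x55555555, 0x88288AAA),
]

def classify(old, seen):
    """Return (fell, rose): bits that read 1->0 and 0->1 relative to `old`."""
    fell = old & ~seen & MASK   # was 1, already reads as 0
    rose = ~old & seen & MASK   # was 0, already reads as 1
    return fell, rose

if __name__ == "__main__":
    for old, new, seen in SAMPLES:
        fell, rose = classify(old, seen)
        print(f"{old:08X} -> {new:08X}  read {seen:08X}  "
              f"fell {fell:08X}  rose {rose:08X}")
```

Four samples are far too few to prove anything about the silicon; this only shows the pattern that is present in the posted numbers.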
    
  • evanh Posts: 15,126
    Thanks Brian. I see the $55/$AA combos show it even more.
  • Here's the code I used for the above tests.
    I used COGATN/WAITATN to sync the shared lut activity.
  • evanh Posts: 15,126
    I had no idea CALLPA could do that! I'm thieving it! :)
  • evanh wrote: »
    I had no idea CALLPA could do that! I'm thieving it! :)

    :lol: Send royalty payment to cgracey @ Red Bluff!
  • evanh Posts: 15,126
    edited 2019-01-27 04:33
    Bugger, I just realised I'm always using long calls. CALLPA can't do those directly. :(

    EDIT: And fastspin has no error/warning for it either.
  • jmg Posts: 15,140
    ozpropdev wrote: »
    @evanh
    Confirming your findings, here's what I found in my own testing.
    The glitch occurs when the RDLUT is +2 clocks after the WRLUT.
    Because this looks more like an aperture effect, it's less a 'glitch' and more a transitional ambiguity.
    Does that transitional ambiguity occur (largely?) independently of SysCLK speed ?
  • jmg wrote: »
    ozpropdev wrote: »
    @evanh
    Confirming your findings, here's what I found in my own testing.
    The glitch occurs when the RDLUT is +2 clocks after the WRLUT.
    Because this looks more like an aperture effect, it's less a 'glitch' and more a transitional ambiguity.
    Does that transitional ambiguity occur (largely?) independently of SysCLK speed ?
    Ran from 20 MHz up to 350 MHz, issue remains the same at +2 clock offset.

  • jmg Posts: 15,140
    ozpropdev wrote: »
    Ran from 20 MHz up to 350 MHz, issue remains the same at +2 clock offset.
    Cool thanks, that does not sound like a hard-to-nail-down delay-difference effect, just an aperture effect where the change occurs on the same edge/window as the sample.
    It becomes the same as a group of D-FFs that fail tsu/th (setup/hold), so some capture new data and some capture old data.

    I'm not sure how much Chip can vary here, and whether there is enough margin to shift the RD/WR to opposite clock edges, or whether that might need a latch that holds over a change.
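
A toy model of that aperture effect, offered only as a sketch: each bit of the word gets its own settling time, and a bank of flip-flops samples the word at one instant. The function name, the per-bit delays, and the time units are all invented for illustration and are not measured P2 values.

```python
def sample_register(old, new, bit_delays, sample_time, width=32):
    """Model `width` flip-flops sampling a word that switches from old to new.

    bit_delays[i] is a hypothetical settling time for bit i (arbitrary units).
    A bit reads as its new value only if it settled strictly before
    sample_time; otherwise the flip-flop still captures the old value.
    """
    result = 0
    for i in range(width):
        source = new if bit_delays[i] < sample_time else old
        result |= source & (1 << i)
    return result

if __name__ == "__main__":
    delays = [i % 4 for i in range(32)]          # made-up delays: 0, 1, 2, 3, ...
    for t in (0, 2, 4):
        word = sample_register(0xFFFFFFFF, 0x00000000, delays, t)
        print(f"sample_time={t}: {word:08X}")
```

Sampling well before or well after the change returns a clean old or new word; only a sample that lands inside the spread of settling times returns a mix, which is the single-offset behaviour described above.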

  • @cgracey
    I know you've explained it before somewhere on the forum but what was the reason behind WRLUT taking 2 clocks and RDLUT taking 3 clocks?
    Is this somehow related to this LUT share issue?
  • evanh wrote: »
    Bugger, I just realised I'm always using long calls. CALLPA can't do those directly. :(

    EDIT: And fastspin has no error/warning for it either.

    Could you give me an example? When I tried to reproduce this with:
    dat
    	org 0
    	callpa #1, #faraway
    	callpa #2, #\faraway
    	orgh $400
    	callpa #20, #faraway
    	long 0[512]
    faraway
    	jmp	#faraway
    
    fastspin gave me:
    foo.spin2(3) error: Source out of range for relative branch callpa
    foo.spin2(4) error: Absolute address not valid for callpa
    foo.spin2(6) error: Source out of range for relative branch callpa
    
  • evanh Posts: 15,126
    edited 2019-01-27 16:47
    I seem to have found the one hole of hubexec calling lutexec:
    dat
    org 0
    	callpa #1, #faraway
    	callpa #2, #\faraway
    	callpa #3, #lutaway
    
    orgh $400
    	callpa #20, #faraway
    	callpa #21, #lutaway
    	long 0[512]
    faraway
    	jmp	#$
    
    org $200
    lutaway
    	jmp	#$
    

    The final CALLPA is illegal but gives no error:
    Version 3.9.15 Compiled on: Jan 21 2019
    callpa.spin2
    callpa.spin2(3) error: Source out of range for relative branch callpa
    callpa.spin2(4) error: Absolute address not valid for callpa
    callpa.spin2(5) error: Source out of range for relative branch callpa
    callpa.spin2(8) error: Source out of range for relative branch callpa
    
  • evanh Posts: 15,126
    Correction, it seems to be all down to a relative calculation thing:
    dat
    org 0
    	callpa #1, #faraway
    	callpa #2, #\faraway
    cogaway
    	callpa #3, #lutaway
    
    orgh $400
    	callpa #20, #faraway
    	callpa #21, #cogaway
    	callpa #22, #lutaway
    	long 0[512]
    faraway
    	callpa #30, #cogaway
    	callpa #31, #lutaway
    
    org $200
    lutaway
    	callpa #40, #faraway
    	callpa #41, #cogaway
    

    Here, only lines 10 and 11 don't create an error:
    callpa.spin2(3) error: Source out of range for relative branch callpa
    callpa.spin2(4) error: Absolute address not valid for callpa
    callpa.spin2(6) error: Source out of range for relative branch callpa
    callpa.spin2(9) error: Source out of range for relative branch callpa
    callpa.spin2(14) error: Source out of range for relative branch callpa
    callpa.spin2(15) error: Source out of range for relative branch callpa
    callpa.spin2(19) error: Source out of range for relative branch callpa
    callpa.spin2(20) error: Source out of range for relative branch callpa
    
  • evanh Posts: 15,126
    edited 2019-01-27 17:28
    Of note, relative addressing is illegal when crossing domains. Therefore all the out-of-range errors are kind of wrong too.
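
A minimal sketch of the kind of check being asked for, and not fastspin's actual logic. It assumes the immediate CALLPA target is a 9-bit signed offset counted in instructions from the following instruction (-256..+255), that cog and LUT exec share one long-addressed space, and that hubexec uses byte addresses with 4 bytes per instruction. The function name and the messages are invented.

```python
def check_rel9(src, src_domain, dst, dst_domain):
    """Return None if a 9-bit relative branch can encode src -> dst, else a reason.

    Domains: "cog" covers cog and LUT exec (long addresses), "hub" is hubexec
    (byte addresses). The domain split and the -256..+255 instruction range
    are assumptions of this sketch.
    """
    if src_domain != dst_domain:
        return "relative branch crosses cog/hub boundary"
    step = 4 if src_domain == "hub" else 1   # address units per instruction
    delta = dst - (src + step)               # relative to the next instruction
    if delta % step:
        return "target not instruction-aligned"
    if not -256 <= delta // step <= 255:
        return "out of range for relative branch"
    return None

if __name__ == "__main__":
    # hubexec at $404 calling cogaway at cog address 2 (the silent case above)
    print(check_rel9(0x404, "hub", 0x002, "cog"))
    # cogaway at cog address 2 calling lutaway at $200
    print(check_rel9(0x002, "cog", 0x200, "cog"))
```

Testing the domain first means a cross-domain branch is reported as such, instead of falling through to an offset calculation that mixes long and byte addresses.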
  • evanh Posts: 15,126
    edited 2019-01-27 18:18
    ozpropdev wrote: »
    @cgracey
    I know you've explained it before somewhere on the forum but what was the reason behind WRLUT taking 2 clocks and RDLUT taking 3 clocks?
    Is this somehow related to this LUT share issue?

    No would be the short answer.

    The SRAM dual-porting function should be handled independently of processor operations. The two cogs are accessing the lutRAM on separate buses. Or at least are supposed to be afaik.

  • evanh wrote: »
    Here, only lines 10 and 11 don't create an error:

    Wow, that is very weird. Well, it's definitely a bug -- thanks for finding it and sending the reproducer. I'll try to figure out what's going on.
  • cgracey Posts: 14,133
    ozpropdev wrote: »
    @cgracey
    I know you've explained it before somewhere on the forum but what was the reason behind WRLUT taking 2 clocks and RDLUT taking 3 clocks?
    Is this somehow related to this LUT share issue?

    The RDLUT takes three clocks because it must do the read command, the data latch, and the result mux, which each take a clock.

    The WRLUT takes only two clocks because it must do the write command and then take one more clock to finish the minimum instruction cycle.

    This doesn't have anything to do with LUT sharing.
  • cgracey Posts: 14,133
    edited 2019-01-28 05:32
    So, is there a problem with LUT sharing, where writing on one port will always cause a read corruption when reading on the other port?

    Sorry I'm behind the curve here.
  • jmg Posts: 15,140
    edited 2019-01-28 05:39
    cgracey wrote: »
    So, is there a problem with LUT sharing, where writing on one port will always cause a read corruption when reading on the other port?

    Sorry I'm behind the curve here.

    Yes, see the post with test results tabulated http://forums.parallax.com/discussion/comment/1462738/#Comment_1462738
    There is a single clock cycle critical timing alignment where this occurs, largely independent of SysCLK.
    Present on both FPGA and P2 silicon.
    Sounds like data is being changed on the same clk edge it is being read, without enough tsu/th margin.
  • evanh Posts: 15,126
    edited 2019-01-28 07:08
    Chip,
    I remember bringing this up some months back. There is a setting in the Altera megafunction ALTSYNCRAM that looks like it needs to be set up correctly to sort this issue. I'm guessing ALTSYNCRAM is the building block you've used in the FPGA design.

    Parameter name is: READ_DURING_WRITE_MODE_MIXED_PORTS
    Description is: What's the expected output when reading and writing at the same address through different ports? Values are "OLD_DATA" or "DONT_CARE" (default)

    And as you can see the default is DONT_CARE. I'm guessing it still needs to be changed to OLD_DATA for all your dual-port RAMs.

    And presumably this parameter is also used by OnSemi.
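
To make the difference between the two settings concrete, here is a small behavioural model. It is a sketch only: the class and method names are invented, it is not Altera's implementation, and it returns `None` to stand for "unspecified" where real hardware could return any mix of old and new bits.

```python
class DualPortRam:
    """Toy dual-port RAM: port A writes, port B reads, one shared clock."""

    def __init__(self, depth, mixed_port_mode="DONT_CARE"):
        if mixed_port_mode not in ("OLD_DATA", "DONT_CARE"):
            raise ValueError(mixed_port_mode)
        self.mem = [0] * depth
        self.mode = mixed_port_mode

    def clock(self, wr_addr=None, wr_data=None, rd_addr=None):
        """Apply one clock edge and return the read data (or None).

        None means either no read was requested or the result is unspecified
        (same-address read during write in DONT_CARE mode).
        """
        rd_data = None
        if rd_addr is not None:
            collision = wr_addr is not None and wr_addr == rd_addr
            if not (collision and self.mode == "DONT_CARE"):
                rd_data = self.mem[rd_addr]   # contents before this edge's write
        if wr_addr is not None:
            self.mem[wr_addr] = wr_data
        return rd_data

if __name__ == "__main__":
    for mode in ("OLD_DATA", "DONT_CARE"):
        ram = DualPortRam(512, mode)
        ram.clock(wr_addr=0x1FF, wr_data=0xFFFFFFFF)
        print(mode, ram.clock(wr_addr=0x1FF, wr_data=0, rd_addr=0x1FF))
```

In OLD_DATA mode the colliding read returns the previous contents and the next read returns the new word, so the reader never sees a half-updated value.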
  • evanh Posts: 15,126
    Actually, someone else might have suggested SRAM configuration first but I do remember looking it up and pondering on here about it and as to whether it would affect OnSemi.
  • evanh Posts: 15,126
    Chip,
    I've got a setup here with sync serial output responding within 3 clocks of an external clock source. I've been using another smartpin to produce the clock source so that on the scope the timing is about 3 clocks. That's for 20 - 60 MHz on the FPGA. However ...

    The reaffirming issue here is that with the latest v33i FPGA at 80 MHz it still leaps another clock the same way as my much older measurements using software and GETCT.

    I've carefully looked at it on the scope. The output timing of a non-registering pin stays within a nanosecond of the phase of a pad-ring registered pin. To me this says the output circuit is not any issue.

    So the problem has to be early in the input path, I'm guessing between the pad-ring and first verilog register stage.

    PS: Turning on pad-ring registering doesn't help at all. It just adds two sysclocks to the lag.
  • evanh Posts: 15,126
    Here's two screenshots of the scope with registered and unregistered output on the blue trace with the FPGA operating at 80 MHz.

    Green trace is transition smartpin mode without registering. It is the clock input for the sync serial smartpin.
    Orange trace is sync serial smartpin mode also without registering.
    Blue trace is OTHER inverted output paired to the pin of the green trace. This probe is missing its ground clip so the trace wobbles way more than the other two.

    Unregistered lag from green rising to blue falling is about 3 ns.
    Registered lag from green rising to blue falling is about 14.5 ns.
    The difference is 11.5 ns, which is 1.0 ns short of the ideal 12.5 ns (one sysclock at 80 MHz). No issue there.
    [Two scope screenshots attached, 640 x 480 each]
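
The arithmetic behind those figures, spelled out. The only assumption is that "ideal" means output registering adds exactly one sysclock period; the variable names are made up.

```python
SYSCLK_HZ = 80e6                      # FPGA clock used for the scope captures

period_ns = 1e9 / SYSCLK_HZ           # one sysclock period
unregistered_ns = 3.0                 # green rising to blue falling, unregistered
registered_ns = 14.5                  # same measurement with registering on

added_ns = registered_ns - unregistered_ns   # lag added by registering
shortfall_ns = period_ns - added_ns          # distance from one full period

if __name__ == "__main__":
    print(f"period {period_ns} ns, added {added_ns} ns, short by {shortfall_ns} ns")
```

A 1 ns difference is about the reading resolution quoted for these traces, so the output path looks fine.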