Yep, two unrelated problems that I'd bumped into with the one code loop. Actually, there was also the problem of a branch on the last line of a REP block, but Chip has clarified that that is considered an illegal operation.
The corruption problem, #2, is as I've said. It continuously occurs when synchronously reading at a particular phase to the writes. That problem doesn't occur otherwise. I don't have other examples.
Chip,
Got another tiny improvement request: For async receive, smartpin mode %11111, I'd like RDPIN reg,pin WC to make C same as IN state. Currently, C is always zero I think.
Chip,
I've started trying to duplicate the lut sharing issue with a single-pass test by aligning the read and write on the same CT number.
I've found something else: There's a one clock difference between the v32i FPGA and P2ES silicon when reading a written lutRAM location. Both tested at 20 MHz.
'----- test lut sharing dual-port glitch -----
        getct   ticks
        add     ticks, sec
        lutson
        setq    ticks                   'start time for cog 1 (arrives in its PTRA)
        coginit #1, ##@start_lut_test
        addct1  ticks, #3               '<--- HERE: #3 is needed for P2ES, #2 is needed for FPGA
        waitct1
        getct   ticks
        rdlut   pa, #$1ff               'four back-to-back reads of the shared location
        rdlut   i, #$1ff
        rdlut   j, #$1ff
        rdlut   k, #$1ff
        getct   temp
        ...
'============================
        ORG
start_lut_test
        wrlut   #0, #$1ff               'preset the shared location
        mov     pa, ##$deadbeef
        addct1  ptra, #4                'PTRA holds the SETQ'd ticks value
        waitct1
        wrlut   pa, #$1ff               'timed write that the other cog reads across
        cogid   pa
        cogstop pa
'============================
Oh, that's a revelation. It depends on setting vs resetting of the bits. When resetting bits, both timings are the same at #2, so not really a logic difference at all.
Okay, resetting bits is when the glitching shows its face the most. It's actually got a worse case of bit-mashing on the P2ES silicon than the FPGA.
EDIT: Here's output I get from the P2ES board:
lut $1ff = ffffffff 4a41febf 00000000 00000000 14
Attached is the full source (the HUBSETs in Set Xtal need to be uncommented for P2ES silicon):
Trying to follow this - I think you are saying
The shared access is ok when Old-> New is =\_ (same on FPGA and P2ES)
When Old -> New is _/= there is corruption, and an extra clock delay is needed in P2ES ?
If P2ES reads again, is the data correct (ie is it read-side corruption, or write-side corruption ?)
Is any data not being read affected (cross address corruption) ?
Does this behave the same for all COG pairs ?
20MHz should be well inside timing, even on FPGA, but you could try 10MHz & 40MHz on P2ES to see if they change.
Does heat affect this ?
@evanh
Confirming your findings, here's what I found in my own testing.
The glitch occurs when the RDLUT is +2 clocks after the WRLUT.
Because this looks more like an aperture effect, it's less a 'glitch' and more a transitional ambiguity.
Does that transitional ambiguity occur independent (largely?) of SysCLK speed ?
Ran from 20 MHz up to 350 MHz, issue remains the same at +2 clock offset.
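For scale, here is a quick sketch of what a fixed +2 clock offset means in absolute time at the two ends of that range. It is plain Python on a PC, not P2 code, and `offset_ns` is just an illustrative helper name.

```python
def offset_ns(clocks, sysclk_hz):
    """Convert an offset given in system clocks to nanoseconds."""
    return clocks * 1e9 / sysclk_hz


if __name__ == "__main__":
    # The two ends of the tested range: 20 MHz and 350 MHz.
    for mhz in (20, 350):
        gap = offset_ns(2, mhz * 1e6)
        print(f"{mhz:3d} MHz: +2 clocks = {gap:.2f} ns")
```

The absolute gap between write and read shrinks a lot across that range while the failing offset stays at the same clock count, which points at something tied to the clock edge rather than to a fixed analog delay.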
Cool, thanks. That does not sound like a hard-to-nail-down differential delay effect, just an aperture effect where the change occurs on the same edge/window as the sample.
It becomes the same as a group of D-FFs with failing tsu/th, so some capture new data and some capture old data.
I'm not sure how much Chip can vary here, and whether there is enough margin to shift the RD/WR to opposite clock edges, or whether it might need a latch that holds over a change.
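A rough way to check a captured value against that "some bits old, some bits new" picture is sketched below in Python. The helper names are made up for illustration, and treating ffffffff as the old value and 00000000 as the new value for the P2ES output line above is an assumption read off that output, not something the test source confirms.

```python
MASK32 = 0xFFFFFFFF


def is_bitwise_mix(old, new, observed):
    """True if every bit of observed equals the old bit or the new bit."""
    agree = ~(old ^ new) & MASK32          # bits where old and new are identical
    return ((observed ^ old) & agree) == 0  # those bits must not have changed


def split_bits(old, new, observed):
    """Among bits that differ between old and new, count (took_new, took_old)."""
    differ = (old ^ new) & MASK32
    took_new = differ & ~(observed ^ new) & MASK32
    took_old = differ & ~(observed ^ old) & MASK32
    return bin(took_new).count("1"), bin(took_old).count("1")


if __name__ == "__main__":
    # Assumed reading of the P2ES line: old = ffffffff, new = 00000000.
    old, new, seen = 0xFFFFFFFF, 0x00000000, 0x4A41FEBF
    print(is_bitwise_mix(old, new, seen), split_bits(old, new, seen))
```

A value that fails `is_bitwise_mix` would hint at something other than a plain setup/hold race, for example cross-address corruption.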
@cgracey
I know you've explained it before somewhere on the forum but what was the reason behind WRLUT taking 2 clocks and RDLUT taking 3 clocks?
Is this somehow related to this LUT share issue?
Bugger, I just realised I'm always using long calls. CALLPA can't do those directly.
EDIT: And fastspin has no error/warning for it either.
Could you give me an example? When I tried to reproduce this with:
dat
        org     0
        callpa  #1, #faraway
        callpa  #2, #\faraway
        orgh    $400
        callpa  #20, #faraway
        long    0[512]
faraway
        jmp     #faraway
fastspin gave me:
foo.spin2(3) error: Source out of range for relative branch callpa
foo.spin2(4) error: Absolute address not valid for callpa
foo.spin2(6) error: Source out of range for relative branch callpa
Version 3.9.15 Compiled on: Jan 21 2019
callpa.spin2
callpa.spin2(3) error: Source out of range for relative branch callpa
callpa.spin2(4) error: Absolute address not valid for callpa
callpa.spin2(5) error: Source out of range for relative branch callpa
callpa.spin2(8) error: Source out of range for relative branch callpa
callpa.spin2(3) error: Source out of range for relative branch callpa
callpa.spin2(4) error: Absolute address not valid for callpa
callpa.spin2(6) error: Source out of range for relative branch callpa
callpa.spin2(9) error: Source out of range for relative branch callpa
callpa.spin2(14) error: Source out of range for relative branch callpa
callpa.spin2(15) error: Source out of range for relative branch callpa
callpa.spin2(19) error: Source out of range for relative branch callpa
callpa.spin2(20) error: Source out of range for relative branch callpa
@cgracey
I know you've explained it before somewhere on the forum but what was the reason behind WRLUT taking 2 clocks and RDLUT taking 3 clocks?
Is this somehow related to this LUT share issue?
No would be the short answer.
The SRAM dual-porting function should be handled independently of processor operations. The two cogs are accessing the lutRAM on separate buses. Or at least are supposed to be afaik.
The RDLUT takes three clocks because it must do the read command, the data latch, and the result mux, which each take a clock.
The WRLUT takes only two clocks because it must do the write command and then take one more clock to finish the minimum instruction cycle.
This doesn't have anything to do with LUT sharing.
So, is there a problem with LUT sharing, where writing on one port will always cause a read corruption when reading on the other port?
Sorry I'm behind the curve here.
Yes, see the post with test results tabulated http://forums.parallax.com/discussion/comment/1462738/#Comment_1462738
There is a single clock cycle critical timing alignment where this occurs, largely independent of SysCLK.
Present on both FPGA and P2 silicon.
Sounds like data is being changed on the same clk edge it is being read, without enough tsu/th margin.
Chip,
I remember bringing this up some months back. There is a setting in the Altera megafunction ALTSYNCRAM that looks like it needs to be set up correctly to sort this issue. I'm guessing ALTSYNCRAM is the building block you've used in the FPGA design.
Parameter name is: READ_DURING_WRITE_MODE_MIXED_PORTS
Description is: What's the expected output when reading and writing at the same address through different ports? Values are "OLD_DATA" or "DONT_CARE" (default).
And as you can see, the default is DONT_CARE. I'm guessing it still needs to be changed to OLD_DATA for all your dual-port RAMs.
And presumably this parameter is also used by OnSemi.
Actually, someone else might have suggested SRAM configuration first but I do remember looking it up and pondering on here about it and as to whether it would affect OnSemi.
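To make the difference between the two settings concrete, here is a toy behavioural model in Python. It is only a sketch: the class and method names are invented, and modelling DONT_CARE as a random per-bit mix of old and new data is an assumption for illustration, not Altera's documented behaviour (the real output is simply undefined).

```python
import random

MASK32 = 0xFFFFFFFF


class ToyDualPortRam:
    """Hypothetical model: port A writes, port B reads, one shared clock."""

    def __init__(self, depth, mixed_port_mode="DONT_CARE", seed=0):
        if mixed_port_mode not in ("OLD_DATA", "DONT_CARE"):
            raise ValueError("unknown mixed-port mode")
        self.mem = [0] * depth
        self.mode = mixed_port_mode
        self.rng = random.Random(seed)

    def clock(self, write=None, read_addr=None):
        """One clock edge. write is (addr, data) or None. Returns read data or None."""
        result = None
        if read_addr is not None:
            old = self.mem[read_addr]
            collision = write is not None and write[0] == read_addr
            if collision and self.mode == "DONT_CARE":
                # Assumption: undefined output shown as a random old/new bit mix.
                new = write[1] & MASK32
                pick_new = self.rng.getrandbits(32)
                result = (old & ~pick_new | new & pick_new) & MASK32
            else:
                result = old  # OLD_DATA, or no same-address collision
        if write is not None:
            self.mem[write[0]] = write[1] & MASK32
        return result


if __name__ == "__main__":
    for mode in ("OLD_DATA", "DONT_CARE"):
        ram = ToyDualPortRam(512, mode)
        ram.clock(write=(0x1FF, 0xFFFFFFFF))
        clash = ram.clock(write=(0x1FF, 0), read_addr=0x1FF)
        after = ram.clock(read_addr=0x1FF)
        print(mode, f"{clash:08x} {after:08x}")
```

In this model the write itself always lands, so only the colliding read is suspect, which is one way to picture read-side rather than write-side corruption.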
Chip,
I've got a setup here with sync serial output responding within 3 clocks of an external clock source. I've been using another smartpin to produce the clock source so that on the scope the timing is about 3 clocks. That's for 20 - 60 MHz on the FPGA. However ...
The reaffirming issue here is that with the latest v33i FPGA at 80 MHz it still leaps another clock the same way as my much older measurements using software and GETCT.
I've carefully looked at it on the scope. The output timing of a non-registering pin stays within a nanosecond of the phase of a pad-ring registered pin. To me this says the output circuit is not any issue.
So the problem has to be early in the input path, I'm guessing between the pad-ring and the first Verilog register stage.
PS: Turning on pad-ring registering doesn't help at all. It just adds two sysclocks to the lag.
Here's two screenshots of the scope with registered and unregistered output on the blue trace with the FPGA operating at 80 MHz.
Green trace is transition smartpin mode without registering. It is the clock input for the sync serial smartpin.
Orange trace is sync serial smartpin mode also without registering.
Blue trace is OTHER inverted output paired to the pin of the green trace. This probe is missing its ground clip so the trace wobbles way more than the other two.
Unregistered lag from green rising to blue falling is about 3 ns.
Registered lag from green rising to blue falling is about 14.5 ns.
The difference is 11.5 ns, which is 1.0 ns short of the ideal 12.5 ns (one sysclock at 80 MHz). No issue there.
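The arithmetic behind that comparison, as a small Python sketch. The function names are illustrative only, and the inputs are the approximate scope readings quoted above.

```python
def clock_period_ns(sysclk_hz):
    """Length of one system clock in nanoseconds."""
    return 1e9 / sysclk_hz


def registering_shortfall_ns(unregistered_lag_ns, registered_lag_ns, sysclk_hz):
    """How far the extra lag from output registering falls short of one clock."""
    added = registered_lag_ns - unregistered_lag_ns
    return clock_period_ns(sysclk_hz) - added


if __name__ == "__main__":
    # Scope readings at 80 MHz: about 3 ns unregistered, about 14.5 ns registered.
    print(clock_period_ns(80e6))
    print(registering_shortfall_ns(3.0, 14.5, 80e6))
```

With readings taken by eye off a scope, a 1 ns gap against a 12.5 ns clock is well within measurement slop.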
Comments
Ok. I will look into that tonight.
Give that code a run.
I used COGATN/WAITATN to sync the shared lut activity.
Send royalty payment to cgracey @ Red Bluff!
The final CALLPA is illegal but gives no error:
Here, only lines 10 and 11 don't create an error:
Wow, that is very weird. Well, it's definitely a bug -- thanks for finding it and sending the reproducer. I'll try to figure out what's going on.