I just talked to Wendy at ON Semi about this, though, and she is looking into what we must do to ensure that random data is not returned on a READ during a simultaneous write to the same location from the other port. She is going to call me back soon about this. If it's doable, I'll update the FPGA images, accordingly.
Thanks for bringing this up!!!
If there is a problem, having now seen the speed of the P2, I would be happy to forgo the dual porting if need be.
If there is a bugfix, any possibility of a JMPRET P1 style (or partial)?
The missing part is the CALL D,#/S where we want to write the 9-bit return address without the C & Z bits, so that it can be placed into a JMP absolute instruction. A 20-bit return would work, but it's the C & Z bits that destroy the return instruction.
Why forgo dual porting?
The issue is one of same-clock access, so a SW workaround exists: read until the same answer occurs twice, if your design allows this collision to occur.
I think this could be fixed in HW, as the apertures of registers are very small in P2.
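For anyone who wants to try the software workaround, here is a minimal PASM2 sketch of the read-until-two-match idea. The label and register names (read_stable, addr, val, chk) are made up for illustration and do not come from any posted code. It also assumes the other cog is not rewriting the location so often that two consecutive reads never agree; if that can happen, the loop needs a retry limit.

```spin2
' Hypothetical helper: re-read a shared LUT location until two
' consecutive reads return the same value.
' Set 'addr' to the LUT address (0..511) before calling.

read_stable     rdlut   val, addr               'first read of the shared LUT location
.again          mov     chk, val                'keep the previous read for comparison
                rdlut   val, addr               'read the same location again
                cmp     val, chk        wz      'Z=1 when the two reads match
        if_nz   jmp     #.again                 'mismatch: a write may have collided, retry
                ret                             'val now holds a value seen twice in a row

addr            long    0                       'LUT address to read (illustrative)
val             res     1                       'result
chk             res     1                       'previous read
```

Usage would be along the lines of loading addr and then doing call #read_stable. This only filters out a one-off bad read; it does not tell you whether you got the old or the new data.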
What are the minimum / maximum Quartus versions for the 1-2-3 A9? I think 15.0 was safe.
If you want to flash this FPGA image to a P123-A9 board you can't use Quartus.
That board uses a custom Parallax loader (PX.EXE).
Set switch to "PGM"
px Prop123_A9_Prop2_v33g.rbf /p /4
Set switch to "RUN" when complete.
Thanks Brian. I'm new to the FPGA world. Just got the last 1-2-3 A9.
@cgracey
Loaded v33g.rbf. Blinkey works fine. PNut only reports 4 COGS. I was under the assumption the new A9 version was 8 COGs.
EDIT: Same response Pnut v32j. Eight green LEDs are blinking.
My mistake. I was assuming I had left off with version "A" in the ROM file, which would have indicated 8 cogs. As long as the memory is understood to be 512KB, though, there should be no problem.
In talking to Wendy some more, it was apparent that our needs for not glitching LUT reads during LUT writes were beyond what she could address via clock inversion and timing constraints. So, I made some Verilog changes to detect these r/w conflicts and pass the write data to the read port of the otherwise-victim.
I was able to produce the glitch condition on the current FPGA image, so it enabled me to write a work-around which has verified okay and Wendy now has the latest Verilog code.
I also slipped in the C=C change for 'GETCT reg WC'.
I even tried to replicate the JMP-event within a REP block, but couldn't get this case to fail. I need to find the link someone left about the cases which fail.
This program works okay, though:
dat             org
                getct   t
.loop           addct1  t,#100
                rep     @.r,#0          'infinite REP block
                jct1    #.out           'JCT1 happens every 100 clocks
                drvnot  #0
.r
                drvnot  #1              'never gets here
                jmp     #.loop
.out            drvnot  #2              'gets here every 100 clocks
                jmp     #.loop
t               res     1
I'm hoping some of you will be able to verify that the LUT-sharing bug is now gone. This fix is good for the streamer, too, as it allows live updating of palette and DDS data without glitching.
I also slipped in the C=C change for 'GETCT reg WC'.
I really like this new C=C, don't know why exactly. Just out of interest, was it easier to (a) actually write C to C, or (b) disable write to C for GETCT?
This could be a path to extra functionality for certain instructions in the future, by using an otherwise redundant opcode bit without any side-effects.
It was much easier to just copy C, than to make the instruction not write C.
In talking to Wendy some more, it was apparent that our needs for not glitching LUT reads during LUT writes were beyond what she could address via clock inversion and timing constraints. So, I made some Verilog changes to detect these r/w conflicts and pass the write data to the read port of the otherwise-victim.
Good stuff! It explains why the default is not OLD_DATA.
I even tried to replicate the JMP-event within a REP block, but couldn't get this case to fail. I need to find the link someone left about the cases which fail.
This program works okay, though:
So it does! I'm blown away. More test cases to come I guess ... EDIT: Yay, I see Chip has found a reason for it.
The main benefit of this change for these instruction pairings is that together they become like a 3-operand arrangement, because the D field of the second instruction is still valid for its result address.
PS: Which is also why the idea of generalising it for the prop3 came up.
Thanks, Evanh. I looked all that over. I also looked at the Verilog code. I don't feel like this would be worth doing, at this point. Thanks for bringing it up, again, though.
I never got round to testing OUT to IN speed on the real chip. That was something I wasn't happy with in the FPGA. Forgot about it till now.
The pin drivers and input buffers on the FPGA are only a few nanoseconds combined, but there were very long lags of maybe 30 ns of asynchronous transition coming back to IN from the prior OUT, in addition to the clocked stages.
The other area of Pin-core delay that needs to be checked, is the Xtal Buffer to PFD detector.
Ideally, that non-clocked path should be matched with an equal-gate-delay path in the counter feedback (so they track), to avoid the PFD moving with temperature across a SysCLK threshold.
That mechanism may explain the observed temperature 'hot zones' for jitter issues.
On P2, as you mention, I have seen similar ten+ ns movement in Xtal to SysCLK pin vs temperature sweeps.
If the PFD paths are matched, that also means external clocks will (mostly) keep pin-relative placement, and that will be important for applications that clock the P2 from a master clock and expect P2 pins to keep exact relative time.
Comments
https://forums.parallax.com/discussion/169438/rep-blocks-and-branching-issue
https://forums.parallax.com/discussion/169438/rep-blocks-and-branching-issue/p1
evanh's code for the bug is here:
https://forums.parallax.com/discussion/comment/1458393/#Comment_1458393
There was an idea or two I had but they weren't of much significance, or too big.
Yes. I looked into this. It's doable, but I wasn't convinced of its benefit. Could you please refresh me on this? A link would do. Thanks, Evanh.
Tony felt it was a good idea - https://forums.parallax.com/discussion/comment/1461517/#Comment_1461517
Sounds like you're almost there with the Verilog.
How many days before the component values (adc cap etc) get tweaked?
That has to happen soon, maybe within the next week.
The pin drivers and input buffers on the FPGA are only a few nanoseconds combined, but there were very long lags of maybe 30 ns of asynchronous transition coming back to IN from the prior OUT, in addition to the clocked stages.
EDIT: I've found the prior effort - https://forums.parallax.com/discussion/comment/1439248/#Comment_1439248
and https://forums.parallax.com/discussion/comment/1430499/#Comment_1430499
and where it all started from: http://forums.parallax.com/discussion/comment/1426018/#Comment_1426018
If you can, please verify that the LUT-sharing bug is fixed, as well as the JMP-event-within-REP bug.
Thanks.
V33i LUT sharing tests Ok here.
Thanks, Brian. And this is a difference in behavior from before, right?
Here are the silicon results showing the "glitch":