New FPGA files for next silicon version - 5th/final release - contains new ROM!!

Cluso99 · 2019-01-31 00:14

cgracey wrote: »

evanh wrote: »

Chip,
Have you set the dual-port SRAM parameter, READ_DURING_WRITE_MODE_MIXED_PORTS? See https://forums.parallax.com/discussion/comment/1462814/#Comment_1462814

I forgot!

I just talked to Wendy at ON Semi about this, though, and she is looking into what we must do to ensure that random data is not returned on a READ during a simultaneous write to the same location from the other port. She is going to call me back soon about this. If it's doable, I'll update the FPGA images, accordingly.

Thanks for bringing this up!!!

If there is a problem, having now seen the speed of the P2, I would be happy to forego the dual porting if needs be.

Cluso99 · 2019-01-31 00:19

If there is a bugfix, any possibility of a JMPRET P1 style (or partial)?

The missing part is CALL D,#/S, where we want to write the 9-bit return address without the C & Z bits so that it can be placed into a JMP absolute instruction. A 20-bit return would work, but it's the C & Z bits that destroy the return instruction.

ozpropdev · 2019-01-31 00:25

evanh wrote: »

Well, the problem with event branching, Jxx, instructions within a REP block counts as a design bug. I never saw any fix mentioned for those. See https://forums.parallax.com/discussion/comment/1459273/#Comment_1459273

See some extra notes here
https://forums.parallax.com/discussion/169438/rep-blocks-and-branching-issue

evanh · 2019-01-31 00:34

Thanks Brian, I'd lost that one. I'll add it to the traps post ...

jmg · 2019-01-31 00:40

Cluso99 wrote: »

If there is a problem, having now seen the speed of the P2, I would be happy to forego the dual porting if needs be.

Why forgo dual porting?
The issue is one of same-clock access, so a SW work-around exists: if your design allows a read and a write to hit the same location in the same clock, keep reading until the same answer occurs twice.
I think this could be fixed in HW, as the apertures of registers are very small in P2.
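
A minimal sketch of the read-until-stable idea above, written as a Python model rather than P2 code. The names `read_stable` and `make_reader` are hypothetical, and the sketch assumes a glitched value does not repeat on back-to-back reads.

```python
from typing import Callable, Iterable


def read_stable(read: Callable[[], int], max_tries: int = 8) -> int:
    """Re-read until two consecutive reads agree, then return that value."""
    prev = read()
    for _ in range(max_tries):
        cur = read()
        if cur == prev:
            return cur
        prev = cur
    raise RuntimeError("value did not settle")


def make_reader(values: Iterable[int]) -> Callable[[], int]:
    """Hypothetical stand-in for the hardware read: replays a fixed sequence."""
    it = iter(values)
    return lambda: next(it)


# First read is corrupted, the following two agree, so 0 is returned.
reader = make_reader([0x09009DFF, 0x00000000, 0x00000000])
result = read_stable(reader)
```

The cost is at least one extra read per access, which is why a hardware fix is preferable if one is available.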

ozpropdev · 2019-01-31 00:40

Publison wrote: »

What is the Minimum / Maximum Quartus versions for the 1-2-3 A9? I think 15.0 was safe.

If you want to flash this FPGA image to a P123-A9 board you can't use Quartus.
That board uses a custom Parallax loader (PX.EXE).

Set switch to "PGM"

px Prop123_A9_Prop2_v33g.rbf /p /4

Set switch to "RUN" when complete.

Publison · 2019-01-31 14:18

ozpropdev wrote: »
Publison wrote: »

What is the Minimum / Maximum Quartus versions for the 1-2-3 A9? I think 15.0 was safe.

If you want to flash this FPGA image to a P123-A9 board you can't use Quartus.
That board uses a custom Parallax loader (PX.EXE).

Set switch to "PGM"
px Prop123_A9_Prop2_v33g.rbf /p /4
Set switch to "RUN" when complete.

Thanks Brian. I'm new to the FPGA world. Just got the last 1-2-3 A9.

@cgracey

Loaded v33g.rbf. Blinkey works fine. PNut only reports 4 COGS. I was under the assumption the new A9 version was 8 COGs.

EDIT: Same response with PNut v32j. Eight green LEDs are blinking.

cgracey · 2019-01-31 16:56

Publison wrote: »
ozpropdev wrote: »
Publison wrote: »

What is the Minimum / Maximum Quartus versions for the 1-2-3 A9? I think 15.0 was safe.

If you want to flash this FPGA image to a P123-A9 board you can't use Quartus.
That board uses a custom Parallax loader (PX.EXE).

Set switch to "PGM"
px Prop123_A9_Prop2_v33g.rbf /p /4
Set switch to "RUN" when complete.
Thanks Brian. I'm new to the FPGA world. Just got the last 1-2-3 A9.

@cgracey

Loaded v33g.rbf. Blinkey works fine. PNut only reports 4 COGS. I was under the assumption the new A9 version was 8 COGs.

EDIT: Same response with PNut v32j. Eight green LEDs are blinking.

My mistake. I was assuming I had left off with version "A" in the ROM file, which would have indicated 8 cogs. As long as the memory is understood to be 512KB, though, there should be no problem.

Publison · 2019-01-31 17:01

Thanks Chip.

cgracey · 2019-01-31 20:11

In talking to Wendy some more, it was apparent that our needs for not glitching LUT reads during LUT writes were beyond what she could address via clock inversion and timing constraints. So, I made some Verilog changes to detect these r/w conflicts and pass the write data to the read port of the otherwise-victim.

I was able to produce the glitch condition on the current FPGA image, which enabled me to write a work-around. It has verified okay, and Wendy now has the latest Verilog code.
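
The conflict handling described above (detect a read and a write to the same address in the same clock, and hand the write data to the read port) can be sketched as a toy Python model. This is only an illustration under assumptions, not the actual Verilog: the class `DualPortRam`, its `forward` switch, and the use of `None` to stand for an undefined read are all hypothetical.

```python
class DualPortRam:
    """Toy model of a dual-port RAM, clocked once per call to clock()."""

    def __init__(self, size, forward=True):
        self.mem = [0] * size
        self.forward = forward  # True models the read/write conflict bypass

    def clock(self, read_addr, write_addr=None, write_data=None):
        """One clock: optional write on one port, read on the other."""
        conflict = write_addr is not None and write_addr == read_addr
        if conflict and self.forward:
            out = write_data           # bypass: reader sees the incoming data
        elif conflict:
            out = None                 # stands for "undefined" without bypass
        else:
            out = self.mem[read_addr]  # no conflict: ordinary read
        if write_addr is not None:
            self.mem[write_addr] = write_data
        return out


ram = DualPortRam(16)
same_clock = ram.clock(read_addr=2, write_addr=2, write_data=0xFFFFFFFF)
next_clock = ram.clock(read_addr=2)
```

With forwarding enabled, both `same_clock` and `next_clock` hold the written value, so the reader never observes a mixed word.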

I also slipped in the C=C change for 'GETCT reg WC'.

I even tried to replicate the JMP-event within a REP block, but couldn't get this case to fail. I need to find the link someone left about the cases which fail.

This program works okay, though:

dat		org

		getct	t
.loop		addct1	t,#100

		rep	@.r,#0		'infinite REP block
		jct1	#.out		'JCT1 happens every 100 clocks
		drvnot	#0
.r
		drvnot	#1		'never gets here
		jmp	#.loop

.out		drvnot	#2		'gets here every 100 clocks
		jmp	#.loop


t		res	1

cgracey · 2019-01-31 20:20

I'm recompiling some FPGA images.

I'm hoping some of you will be able to verify that the LUT-sharing bug is now gone. This fix is good for the streamer, too, as it allows live updating of palette and DDS data without glitching.

Rayman · 2019-01-31 20:55

Here's ozpropdev's code for the REP bug:
https://forums.parallax.com/discussion/169438/rep-blocks-and-branching-issue/p1

evanh's code for the bug is here:
https://forums.parallax.com/discussion/comment/1458393/#Comment_1458393

TonyB_ · 2019-01-31 21:12

cgracey wrote: »

I also slipped in the C=C change for 'GETCT reg WC'.

I really like this new C=C, don't know why exactly. Just out of interest, was it easier to (a) actually write C to C, or (b) disable write to C for GETCT?

This could be a path to extra functionality for certain instructions in the future, by using an otherwise redundant opcode bit without any side-effects.

cgracey · 2019-01-31 21:26

TonyB_ wrote: »

cgracey wrote: »

I also slipped in the C=C change for 'GETCT reg WC'.

I really like this new C=C, don't know why exactly. Just out of interest, was it easier to (a) actually write C to C, or (b) disable write to C for GETCT?

This could be a path to extra functionality for certain instructions in the future, by using an otherwise redundant opcode bit without any side-effects.

It was much easier to just copy C, than to make the instruction not write C.

evanh · 2019-01-31 22:02

cgracey wrote: »

In talking to Wendy some more, it was apparent that our needs for not glitching LUT reads during LUT writes were beyond what she could address via clock inversion and timing constraints. So, I made some Verilog changes to detect these r/w conflicts and pass the write data to the read port of the otherwise-victim.

Good stuff! It explains why the default is not OLD_DATA.

I even tried to replicate the JMP-event within a REP block, but couldn't get this case to fail. I need to find the link someone left about the cases which fail.

This program works okay, though:

So it does! I'm blown away. More test cases to come I guess ... EDIT: Yay, I see Chip has found a reason for it.

cgracey · 2019-01-31 22:23

Does anyone remember any other bugs that haven't been fixed yet?

evanh · 2019-01-31 22:50

Other things I thought I found were just my own mistakes.

There was an idea or two I had but they weren't of much significance, or too big.

evanh · 2019-01-31 23:18

evanh wrote: »

There was an idea or two I had but they weren't of much significance, or too big.

One idea that would be nice to have is changing the XORO32 and SCA results to feed the next D input instead of the next S input.

cgracey · 2019-01-31 23:35

evanh wrote: »

evanh wrote: »

There was an idea or two I had but they weren't of much significance, or too big.

One idea that would be nice to have is changing the XORO32 and SCA results to feed the next D input instead of the next S input.

Yes. I looked into this. It's doable, but I wasn't convinced of its benefit. Could you please refresh me on this? A link would do. Thanks, Evanh.

evanh · 2019-01-31 23:45

Here's the topic - https://forums.parallax.com/discussion/169585/xoro32-scrambler-output/p1

Tony felt it was a good idea - https://forums.parallax.com/discussion/comment/1461517/#Comment_1461517

evanh · 2019-01-31 23:54

The main benefit of this change for these instruction pairings is that together they become like a 3-operand arrangement, because the D field of the second instruction is still valid for its result address.

PS: Which is also why the idea of generalising it for the prop3 came up.

Tubular · 2019-01-31 23:59

Hey Chip
Sounds like you're almost there with the Verilog
How many days before the component values (adc cap etc) get tweaked?

cgracey · 2019-02-01 00:07

evanh wrote: »

Here's the topic - https://forums.parallax.com/discussion/169585/xoro32-scrambler-output/p1

Tony felt it was a good idea - https://forums.parallax.com/discussion/comment/1461517/#Comment_1461517

Thanks, Evanh. I looked all that over. I also looked at the Verilog code. I don't feel like this would be worth doing, at this point. Thanks for bringing it up, again, though.

cgracey · 2019-02-01 00:09

Tubular wrote: »

Hey Chip
Sounds like you're almost there with the Verilog
How many days before the component values (adc cap etc) get tweaked?

That has to happen soon, maybe within the next week.

evanh · 2019-02-01 00:34

I never got round to testing OUT to IN speed on the real chip. That was something I wasn't happy with in the FPGA. Forgot about it till now.

The pin drivers and input buffers on the FPGA are only a few nanoseconds combined, but there were very long lags of maybe 30 ns of asynchronous transition coming back to IN from the prior OUT, in addition to the clocked stages.

EDIT: I've found the prior effort - https://forums.parallax.com/discussion/comment/1439248/#Comment_1439248
and https://forums.parallax.com/discussion/comment/1430499/#Comment_1430499
and where it all started from: http://forums.parallax.com/discussion/comment/1426018/#Comment_1426018

jmg · 2019-02-01 02:17

evanh wrote: »

I never got round to testing OUT to IN speed on the real chip. That was something I wasn't happy with in the FPGA. Forgot about it till now.

The pin drivers and input buffers on the FPGA are only a few nanoseconds combined, but there were very long lags of maybe 30 ns of asynchronous transition coming back to IN from the prior OUT, in addition to the clocked stages.

The other area of Pin-core delay that needs to be checked is the Xtal Buffer to PFD detector.
Ideally, that non-clocked path should be matched with an equal-gate-delay path in the counter feedback (so they track), to avoid the PFD moving with temperature across a SysCLK threshold.
That mechanism may explain the observed temperature 'hot zones' for jitter issues.

On P2, as you mention, I have seen similar ten+ ns movement in Xtal to SysCLK pin vs temperature sweeps.

If PFD paths are matched, that also means external clocks will (mostly) keep pin-relative placement, and that will be important for applications that clock P2 from a master clock and expect P2 pins to keep exact relative time.

cgracey · 2019-02-01 04:26

I just posted a new set of FPGA files at the top of the thread - v33i. Please try them out.

If you can, please verify that the LUT-sharing bug is fixed, as well as the JMP-event-within-REP bug.

Thanks.

ozpropdev · 2019-02-01 05:28

Chip
V33i LUT sharing tests Ok here.

Offset  Original New
+0      FFFFFFFF FFFFFFFF
+1      FFFFFFFF FFFFFFFF
+2      FFFFFFFF 00000000
+3      FFFFFFFF 00000000
+4      FFFFFFFF 00000000

Offset  Original New
+0      00000000 00000000
+1      00000000 00000000
+2      00000000 FFFFFFFF
+3      00000000 FFFFFFFF
+4      00000000 FFFFFFFF

Offset  Original New
+0      55555555 55555555
+1      55555555 55555555
+2      55555555 AAAAAAAA
+3      55555555 AAAAAAAA
+4      55555555 AAAAAAAA

Offset  Original New
+0      AAAAAAAA AAAAAAAA
+1      AAAAAAAA AAAAAAAA
+2      AAAAAAAA 55555555
+3      AAAAAAAA 55555555
+4      AAAAAAAA 55555555

cgracey · 2019-02-01 05:36

ozpropdev wrote: »

Chip
V33i LUT sharing tests Ok here.

Offset  Original New
+0      FFFFFFFF FFFFFFFF
+1      FFFFFFFF FFFFFFFF
+2      FFFFFFFF 00000000
+3      FFFFFFFF 00000000
+4      FFFFFFFF 00000000

Offset  Original New
+0      00000000 00000000
+1      00000000 00000000
+2      00000000 FFFFFFFF
+3      00000000 FFFFFFFF
+4      00000000 FFFFFFFF

Offset  Original New
+0      55555555 55555555
+1      55555555 55555555
+2      55555555 AAAAAAAA
+3      55555555 AAAAAAAA
+4      55555555 AAAAAAAA

Offset  Original New
+0      AAAAAAAA AAAAAAAA
+1      AAAAAAAA AAAAAAAA
+2      AAAAAAAA 55555555
+3      AAAAAAAA 55555555
+4      AAAAAAAA 55555555

Thanks, Brian. And this is a difference in behavior from before, right?

ozpropdev · 2019-02-01 05:51

cgracey wrote: »

And this is a difference in behavior from before, right?

That's right Chip.
Here's the Silicon results showing the "glitch"

Offset  Original New
+0      FFFFFFFF FFFFFFFF
+1      FFFFFFFF FFFFFFFF
+2      FFFFFFFF 09009DFF	'glitch
+3      FFFFFFFF 00000000
+4      FFFFFFFF 00000000

Offset  Original New
+0      00000000 00000000
+1      00000000 00000000
+2      00000000 00000000
+3      00000000 FFFFFFFF
+4      00000000 FFFFFFFF

Offset  Original New
+0      55555555 55555555
+1      55555555 55555555
+2      55555555 01005555	'glitch
+3      55555555 AAAAAAAA
+4      55555555 AAAAAAAA

Offset  Original New
+0      AAAAAAAA AAAAAAAA
+1      AAAAAAAA AAAAAAAA
+2      AAAAAAAA 88288AAA	'glitch
+3      AAAAAAAA 55555555
+4      AAAAAAAA 55555555
