Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i - Page 160 — Parallax Forums


Comments

  • evanh Posts: 15,091
    cgracey wrote: »
    Evanh, here's the spoiler: the ASIC tools optimize the logic cell placement and routing only to meet the timing goal, so that signals with plenty of slack get routed around the hot spots, losing their slack, while the signals needing speed get the prime placement and shortest routes. In the end, hundreds of thousands of paths are stacked against the timing wall, forming a cliff, where the chip fails systemically if the clock period becomes too short. So, while in theory some things take less time than others, the implementation is a blob of nearly identically-timed paths that affords no possibility of speed-up via selective clock cycle shortening. When you hit the speed limit, everything fails at once.
    I know, I didn't present the gap as a speed-up feature; it was just indicative of where the propagations end. The propagations must complete ahead of the flops' setup timing requirements.
  • cgracey Posts: 14,131
    Roy Eltham wrote: »
    @cgracey aren't the new chips due soon?

    The new wafers came out of the fab on the 13th and are now being packaged into 10 glob tops. The rest of the dice will be packaged in the real Amkor ePad package, but they won't be shipped for several weeks.

    ON Semi has been promising delivery of the 10 glob tops on August 1, but they always under-promise and over-deliver, so I'm thinking we may have chips sometime next week. Man, I really hope the design is okay. If it works as planned, it's going to be great. We'll have a really nice chip.
  • Cluso99 Posts: 18,066
    Fingers doubly crossed. With all the work you've done Chip, it needs to be a rewarded success!

    I am off to the UK for 4 weeks, so I am taking my ES board with me to work on while there B)
  • cgracey Posts: 14,131
    Cluso99 wrote: »
    Fingers doubly crossed. With all the work you've done Chip, it needs to be a rewarded success!

    I am off to the UK for 4 weeks, so I am taking my ES board with me to work on while there B)

    We've all done a lot of work on this. This chip is a distillation of a lot of fantastic ideas that people have had over the last 13 years. We packed at least 15 pounds into the 5 pound bag.
  • Cluso99 Posts: 18,066
    cgracey wrote: »
    Cluso99 wrote: »
    Fingers doubly crossed. With all the work you've done Chip, it needs to be a rewarded success!

    I am off to the UK for 4 weeks, so I am taking my ES board with me to work on while there B)

    We've all done a lot of work on this. This chip is a distillation of a lot of fantastic ideas that people have had over the last 13 years. We packed at least 15 pounds into the 5 pound bag.

    More like 50 pounds into that 5 pound bag B)
    There is something for everyone in the P2!
  • Excellent news Chip!
    Validation with the 10 glob tops in a couple of weeks, then the proper packaged ones in late September or early October (on the new rev of the eval board) for us! Yay!
  • jmg Posts: 15,140
    cgracey wrote: »
    The new wafers came out of the fab on the 13th and are now being packaged into 10 glob tops. The rest of the dice will be packaged in the real Amkor ePad package, but they won't be shipped for several weeks.

    ON Semi has been promising delivery of the 10 glob tops on August 1, but they always under-promise and over-deliver, so I'm thinking we may have chips sometime next week....

    Have those 10 passed the test program you were working on? Has ON Semi indicated yields on this run yet?

  • cgracey Posts: 14,131
    jmg wrote: »
    cgracey wrote: »
    The new wafers came out of the fab on the 13th and are now being packaged into 10 glob tops. The rest of the dice will be packaged in the real Amkor ePad package, but they won't be shipped for several weeks.

    ON Semi has been promising delivery of the 10 glob tops on August 1, but they always under-promise and over-deliver, so I'm thinking we may have chips sometime next week....

    Have those 10 passed the test program you were working on? Has ON Semi indicated yields on this run yet?

    I'm not sure about the test program status for the new silicon. It would be all new files within ON Semi, since the logic was resynthesized.
  • Chip,
    He meant the PASM2 code that you wrote to test things. You indicated that it was for ON Semi to use for testing the chips.
  • cgracey Posts: 14,131
    Roy Eltham wrote: »
    Chip,
    He meant the PASM2 code that you wrote to test things. You indicated that it was for ON Semi to use for testing the chips.

    Our analog test program is the same, but they must make new digital test programs.
  • evanh Posts: 15,091
    Chip,
    This is cool! I finally understand QFRAC and why you've included it! I had the following code working; it seemed perfect. The purpose is to scale up the remainder of (clk_freq / asyn_baud) to make use of the full 16.6-bit format of the clock divider in smartpin asynchronous serial mode.
    		qdiv	##$ffff_ffff, asyn_baud
    


    But there was a small doubt I was missing something, so I went to ask you about it: whether it really was as perfect as it seemed, and whether you had any recommendations. As I was typing the question, it dawned on me that there was this other CORDIC divide I'd never understood ... QFRAC. So I tried it and, blow me down, it did the job even better because it didn't need the large constant.
    		qfrac	#1, asyn_baud
    
    Same answer. :)
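A quick way to see why the two instructions give the same answer: per the P2 docs, QFRAC produces the fractional quotient (x << 32) / y, while the QDIV form divides the constant $FFFF_FFFF by the baud rate. A minimal Python model (the function names and the sample baud value here are illustrative, not from the post):

```python
# Toy model of the two P2 CORDIC divides discussed above.
# Assumption (per the P2 docs): QDIV yields the 32-bit unsigned quotient
# x // y, and QFRAC yields the fractional quotient (x << 32) // y.

def qdiv(x, y):
    # QDIV quotient result (read back with GETQX), modeled as unsigned divide
    return (x // y) & 0xFFFF_FFFF

def qfrac(x, y):
    # QFRAC: treat x as a fraction of y, scaled up to 32 bits
    return ((x << 32) // y) & 0xFFFF_FFFF

asyn_baud = 115_200  # illustrative baud rate

a = qdiv(0xFFFF_FFFF, asyn_baud)  # qdiv  ##$ffff_ffff, asyn_baud
b = qfrac(1, asyn_baud)           # qfrac #1, asyn_baud
print(a, b)  # 37282 37282
```

The two agree whenever asyn_baud does not divide 2^32 exactly; in the rare case that it does, the QDIV form comes out one count low, which is another reason the QFRAC form is the cleaner choice.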
  • evanh Posts: 15,091
    edited 2019-09-22 17:24
    Chip,
    Is there a way to release the FIFO back to hubexec after a RD/WRFAST? Grr, silly question, it's not what I wanted anyway.

    I've been testing the timing of hubRAM accesses vs non-hubRAM hub-ops like COGID and cordic commands. I've found that issuing WRFAST seems to produce peculiar responses and I'm wondering, to get something sensible, if I need to reset the FIFO ops back to idle in some fashion.

    I've tried issuing a RDFAST in between each WRFAST test but that's not particularly effective. The numbers shuffle around a little but are still just as weird.

    Here's the table of results. The values in the middle are the execution duration of the second instruction, in sysclocks. The column labels are the hubRAM addresses of the first instruction's data access, and the right-hand column is the address that produced the shortest execution.
                         0   28   24   20   16   12    8    4
    --------------------------------------------------------------------
     RDLONG  QMUL        9    2    3    4    5    6    7    8   28
     WRLONG  QMUL        7    8    9    2    3    4    5    6   20
     RDFAST  QMUL        9    2    3    4    5    6    7    8   28
     WRFAST  QMUL        4    7    3    7    3    3    7    3   24
     RDLONG  COGID      11    4    5    6    7    8    9   10   28
     WRLONG  COGID       9   10   11    4    5    6    7    8   20
     RDFAST  COGID      11    4    5    6    7    8    9   10   28
     WRFAST  COGID      11    9    5    9    5    5    9    5   24
     RDLONG  COGID WC   11    4    5    6    7    8    9   10   28
     WRLONG  COGID WC    9   10   11    4    5    6    7    8   20
     RDFAST  COGID WC   11    4    5    6    7    8    9   10   28
     WRFAST  COGID WC    7    9    5    9    5    5    9    5   24
     RDLONG  LOCKRET     9    2    3    4    5    6    7    8   28
     WRLONG  LOCKRET     7    8    9    2    3    4    5    6   20
     RDFAST  LOCKRET     9    2    3    4    5    6    7    8   28
     WRFAST  LOCKRET     2    7    3    7    7    7    7    3    0
    

    Ignoring the WRFASTs, you can see a regular pattern in each row: the execution duration increases by one with each column of the table.

    WRFAST doesn't even slightly follow that pattern. Any idea why?
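For what it's worth, the staircase in the non-WRFAST rows is what a rotating "egg beater" hub would produce: consecutive longs live in consecutive slices ((addr >> 2) & 7), and a given slice services a cog once every 8 sysclocks. Here is a toy Python model fitted to the RDLONG/QMUL row; the base and phase constants are fitted to the table above, not derived from the silicon:

```python
# Fitted toy model of the egg-beater hub rotation behind the staircase
# in the non-WRFAST rows: consecutive longs sit in consecutive slices,
# and each slice comes around once every 8 sysclocks.
# `base` and `phase` are fitted constants, not taken from the docs.

def second_op_duration(addr, base=2, phase=0):
    hub_slice = (addr >> 2) & 7              # which of the 8 slices holds this long
    return ((phase - hub_slice - 1) % 8) + base

row = [second_op_duration(a) for a in (0, 28, 24, 20, 16, 12, 8, 4)]
print(row)  # [9, 2, 3, 4, 5, 6, 7, 8] -- matches the RDLONG/QMUL row
```

The COGID rows are the same staircase shifted up by 2 (i.e. base=4), consistent with the "+2 if result" note discussed further down; only the WRFAST rows break the pattern.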

    Here's the critical measuring code
    		...
    inst1		nop
    		getct	tickstart		'measure time
    inst2		nop
    		getct	pa			'measure time
    		rdfast	#0, #0
    		...
    

    And the instruction tables that fill those two NOPs
    hubram_tab
    		rdlong	inb, phase
    		byte	13,10," RDLONG ",0
    		wrlong	inb, phase
    		byte	13,10," WRLONG ",0
    		rdfast	#0, phase
    		byte	13,10," RDFAST ",0
    		wrfast	#0, phase
    		byte	13,10," WRFAST ",0
    
    hubop_tab
    		qmul	tickstart, #37
    		byte	" QMUL    ",0
    		cogid	inb
    		byte	" COGID   ",0
    		cogid	#15	wc
    		byte	" COGID WC",0
    		lockret	#0
    		byte	" LOCKRET ",0
    
  • evanh Posts: 15,091
    Huh, it comes right when I do a three-instruction variant of the above. "inst4" and "inst5" are hubram and hubop respectively.
    		...
    inst3		wrlong	inb, phase2
    		getct	tickstart		'measure time
    inst4		nop
    inst5		nop
    		getct	pa			'measure time
    		rdfast	#0, #0
    		...
    
  • evanh Posts: 15,091
    Ah, got it! WRFAST isn't the controlling factor because it always takes 2 clocks unless it is blocked by a prior WRFAST flushing. Lol, that took a while to sink in. I guess it is 6:00 AM now, time to hit the sack.
  • cgracey Posts: 14,131
    edited 2019-09-22 19:32
    evanh wrote: »
    Ah, got it! WRFAST isn't the controlling factor because it always takes 2 clocks unless it is blocked by a prior WRFAST flushing. Lol, that took a while to sink in. I guess it is 6:00 AM now, time to hit the sack.

    I had forgotten that WRFAST takes only 2 clocks.
  • cgracey Posts: 14,131
    WRFAST will take more than two clocks if a prior WRFAST has not finished and queued data still needs to be written.
  • evanh Posts: 15,091
    edited 2019-09-23 02:10
    EDIT: Err, it's 3 clocks for WRFAST normally. I'd guessed 2.

    Okay, those results indicate a discrepancy with the docs:
    The COGID entry in the execution-time tables says "2...9, +2 if result". I'm always seeing a minimum of 4 (2+2), with no apparent way to get down to 2 clocks.

  • cgracey Posts: 14,131
    evanh wrote: »
    EDIT: Err, it's 3 clocks for WRFAST normally. I'd guessed 2.

    Okay, those results indicate a discrepancy with the docs:
    The COGID entry in the execution-time tables says "2...9, +2 if result". I'm always seeing a minimum of 4 (2+2), with no apparent way to get down to 2 clocks.

    Yes, COGID must always have a result. I will update the sheets.
  • evanh Posts: 15,091
    Bumped into a couple of out-of-date names in the doc:

    The section on branch addressing talks about the JMPREL instruction but the listed encoding line right below labels it as JMP only:
    EEEE 1101011 00L DDDDDDDDD 000110000 JMP {#}D

    The section on interrupts has this line with SCLU and SCL instead of SCA and SCAS respectively:
    ALTxx / CRCNIB / SCLU / SCL / GETXACC / SETQ / SETQ2 / XORO32 / XBYTE must not be executing
  • cgracey Posts: 14,131
    evanh wrote: »
    Bumped into a couple of out-of-date names in the doc:

    The section on branch addressing talks about the JMPREL instruction but the listed encoding line right below labels it as JMP only:
    EEEE 1101011 00L DDDDDDDDD 000110000 JMP {#}D

    The section on interrupts has this line with SCLU and SCL instead of SCA and SCAS respectively:
    ALTxx / CRCNIB / SCLU / SCL / GETXACC / SETQ / SETQ2 / XORO32 / XBYTE must not be executing

    Thanks, Evanh. I will get those cleaned up.
  • cgracey Posts: 14,131
    edited 2019-11-20 10:10
    This thread can be un-stuck. There is a newer thread somewhere for the current-silicon FPGA files.
  • Unstuck.

    The latest (and archived) FPGA files, along with many other resources including IDEs and sample code, can be found here: https://propeller.parallax.com/