New FPGA files for next silicon version - 5th/final release - contains new ROM!!

cgracey · 2019-01-30 06:34

5th Release

New ROM with updated SD booter and TAQOZ.

Extra register on each IN signal from pins to guard against metastability.

Fixes r/w glitch during LUT sharing.
Fixes JMP-event-within-REP bug.
'GETCT reg WC' doesn't change C.

This is for anyone who wants to try the next version of silicon, including the new ROM:

https://drive.google.com/file/d/1dOe3JPTZvcKvdE9SDOUSdMqM7BJ8Ixqk/view?usp=sharing

	           cogs	  smart pins	RAM	Freq	CORDIC	Filename
	         +-------------------------------------------------------------------------
Prop123-A9       |  8	  0-39,56-63	512k *	80MHz	Yes	Prop123_A9_Prop2_v33k.rbf
BeMicro-A9       |  8	  0-39,56-63 	512k *	80MHz	Yes	BeMicro_A9_Prop2_v33k.jic **
Prop123-A7       |  4	  0-15,62-63	512k	80MHz	Yes	Prop123_A7_Prop2_v33k.rbf
DE2-115          |  4	  0-7,60-63	256k	80MHz	Yes	DE2_115_Prop2_v33k.pof

 * Allows loading up to $FFFFF to rewrite ROM.

** A file got overwritten, and I don't think the SD card pins are mapped properly
   to P[61:58] on the BeMicro-A9 image anymore.

Here are the differences between the current silicon and these next-silicon FPGA images:

RDLUT and WRLUT now support PTRA/PTRB expressions. This means immediate LUT addresses are limited to $000..$0FF, unless ## is used.

PTRA/PTRB expressions are now encoded slightly differently to allow wider address ranging. These are used by RDBYTE/RDWORD/RDLONG/WRBYTE/WRWORD/WRLONG/WMLONG, and now also RDLUT/WRLUT. The version of PNut.exe included in the .zip file handles all this. You don't need to do anything. PNut.exe will assemble proper object code from your PASM source code.
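A minimal host-side sketch of the RDLUT/WRLUT immediate rule, assuming (as with the hub RDxxxx/WRxxxx instructions) that a 9-bit #S value with bit 8 set is now read as a PTRA/PTRB expression, which is why only $000..$0FF fit as a plain immediate. The helper names are hypothetical, and the exact PTRx bit layout is deliberately not modeled.

```python
# Hypothetical helpers; they only model which immediate prefix a constant
# LUT address needs. The PTRx expression bit layout is NOT modeled.

LUT_SIZE = 0x200  # LUT RAM holds 512 longs: addresses $000..$1FF


def immediate_prefix_for_lut(addr: int) -> str:
    """Return '#' or '##' for a constant RDLUT/WRLUT address."""
    if not 0 <= addr < LUT_SIZE:
        raise ValueError(f"LUT address ${addr:03X} out of range")
    # $000..$0FF fits in the 9-bit S field with bit 8 clear.
    # $100..$1FF would set bit 8, which now selects a PTRx expression,
    # so the full address has to be supplied via '##' (AUGS).
    return "#" if addr <= 0xFF else "##"


def s_field_is_ptr_expression(s9: int) -> bool:
    """True if a 9-bit immediate S field would be taken as a PTRx expression."""
    return bool(s9 & 0x100)


for a in (0x000, 0x0FF, 0x100, 0x1FF):
    print(f"rdlut x, {immediate_prefix_for_lut(a)}${a:03X}")
```

PNut.exe already makes this choice for you; the sketch just shows why addresses in the upper half of the LUT cost an extra AUGS long.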

The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.
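A short host-side sketch of what the two reads give you: the halves from 'GETCT high WC' and 'GETCT low' are simply concatenated into one 64-bit tick count. The function names and the 80MHz figure (taken from the FPGA table above) are illustrative.

```python
# Combine the halves returned by
#   getct  high wc   ' top 32 bits
#   getct  low       ' bottom 32 bits
# into one 64-bit tick count. Names are illustrative only.

MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def ct64(high: int, low: int) -> int:
    """Join the top and bottom halves of CT into a 64-bit value."""
    return ((high & MASK32) << 32) | (low & MASK32)


def elapsed_ticks(start: int, end: int) -> int:
    """Ticks between two 64-bit CT readings (modulo 2**64)."""
    return (end - start) & MASK64


SYSCLK_HZ = 80_000_000  # FPGA images run at 80MHz

wrap32_s = (1 << 32) / SYSCLK_HZ  # how long a 32-bit CT lasts
wrap64_years = (1 << 64) / SYSCLK_HZ / (365.25 * 24 * 3600)

print(f"32-bit CT wraps after {wrap32_s:.1f} s")
print(f"64-bit CT wraps after {wrap64_years:,.0f} years")
```

At 80MHz the 32-bit counter wraps in under a minute, while the 64-bit counter effectively never does, so differences between two 64-bit readings need no wrap handling in practice.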

There are two new instructions which set up and read the scope mode: 'SETSCP D/#' and 'GETSCP D'. SETSCP points the scope mux to a set of four pins starting at (D[5:0] AND $3C), with D[6]=1 to enable scope operation. Any time GETSCP is executed, the lower bytes of those four pins' RDPIN values are returned in D. This feature will mainly be useful on the next silicon, as the FPGAs don't have ADC-capable pins.
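The operand layout described above can be sketched as follows. One assumption is not stated in the notes: that GETSCP puts the byte for the lowest pin of the group in D[7:0], the next pin in D[15:8], and so on. The helper names are hypothetical.

```python
# Sketch of the SETSCP/GETSCP operand values described above.
# ASSUMPTION: GETSCP packs the lowest pin of the group into D[7:0],
# the next into D[15:8], etc. The notes do not state the byte order.


def setscp_value(first_pin: int, enable: bool = True) -> int:
    """Build the D operand for SETSCP."""
    if not 0 <= first_pin <= 63:
        raise ValueError("pin must be 0..63")
    base = first_pin & 0x3C                 # hardware uses D[5:0] AND $3C
    return base | (0x40 if enable else 0)   # D[6]=1 enables scope operation


def scope_pins(d: int) -> list[int]:
    """Pins the scope mux selects for a given SETSCP operand."""
    base = d & 0x3C
    return [base + i for i in range(4)]


def pack_getscp(rdpin_values: list[int]) -> int:
    """Model GETSCP: the low byte of each of four RDPIN values, packed."""
    if len(rdpin_values) != 4:
        raise ValueError("need exactly four RDPIN values")
    result = 0
    for i, value in enumerate(rdpin_values):
        result |= (value & 0xFF) << (8 * i)
    return result


d = setscp_value(17)  # pin 17 rounds down to the group starting at pin 16
print(hex(d), scope_pins(d))
```

Because of the AND with $3C, any pin number selects the aligned group of four that contains it.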

Lastly, the USB smart pin modes have changed. There used to be four different USB modes spanning %110xx. USB mode is now just %11011. WXPIN bits 13..0 still set the NCO frequency, as before, and bits 15 and 14, which were always '0' anyway, now select the sub-mode: bit 15 = 0 for device mode or 1 for host mode, and bit 14 = 0 for low-speed mode or 1 for full-speed mode.
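A minimal sketch of building and decoding the new WXPIN value for USB mode %11011. The NCO value is passed in raw, since the notes don't give its formula; the function names are hypothetical.

```python
# Sketch of the new USB WXPIN layout (smart pin mode %11011):
#   bit 15     : 0 = device, 1 = host
#   bit 14     : 0 = low-speed, 1 = full-speed
#   bits 13..0 : NCO frequency value (computed elsewhere, passed in raw)

USB_MODE = 0b11011


def usb_wxpin(host: bool, full_speed: bool, nco: int) -> int:
    """Build the X value written with WXPIN for a USB smart pin."""
    if not 0 <= nco <= 0x3FFF:
        raise ValueError("NCO value must fit in 14 bits")
    return (int(host) << 15) | (int(full_speed) << 14) | nco


def decode_usb_wxpin(x: int) -> dict:
    """Split a USB WXPIN value back into its fields."""
    return {
        "host": bool((x >> 15) & 1),
        "full_speed": bool((x >> 14) & 1),
        "nco": x & 0x3FFF,
    }


x = usb_wxpin(host=True, full_speed=True, nco=0x1234)
print(hex(x), decode_usb_wxpin(x))
```

Code that only ever wrote a 14-bit NCO value gets device/low-speed by default, since bits 15 and 14 stay '0'.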

Smart pin modes %1100x are SINC2/SINC3/raw ADC modes, while smart pin mode %11010 is Scope mode. These aren't very useful until the next silicon exists, so there's no need to elaborate, yet.

I think those are the only changes.

Wait, better review this list of changes, too, since all instructions that affect bits can now affect a RANGE of bits. You'll need to make sure you're not inadvertently affecting more than one bit, unless you intend to:

https://forums.parallax.com/discussion/169282/list-of-changes-in-next-p2-silicon/p1
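To see how an operation can inadvertently hit more than one bit, here is a host-side model of a bit instruction such as BITH. It assumes the layout given in the linked list of changes, where S[4:0] is the first bit and S[9:5] is the number of ADDITIONAL bits; ranges running past bit 31 are rejected here rather than guessed at. The helper names are hypothetical.

```python
# Model of the bit-range rule, ASSUMING S[4:0] = first bit and
# S[9:5] = number of additional bits. Wrap past bit 31 is not modeled.

MASK32 = 0xFFFF_FFFF


def bit_range_mask(s: int) -> int:
    """Mask of the bits a bit instruction would touch for operand S."""
    first = s & 0x1F
    extra = (s >> 5) & 0x1F
    if first + extra > 31:
        raise ValueError("range runs past bit 31; wrap behavior not modeled")
    return ((1 << (extra + 1)) - 1) << first


def bith(d: int, s: int) -> int:
    """Model BITH D,S: set every bit in the selected range."""
    return (d | bit_range_mask(s)) & MASK32


intended = 3             # bit 3 only
stale = 3 | (1 << 5)     # a register with junk in S[9:5]
print(bin(bith(0, intended)), bin(bith(0, stale)))
```

On the current silicon both operands set only bit 3; under the new rule the second one also sets bit 4, which is exactly the kind of surprise worth auditing for when S comes from a register rather than a small immediate.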

The .spin2 files in the .zip have all been modified to take advantage, where possible, of the new bit/pin-range operations.

cgracey · 2019-01-30 06:42

PeterJakacki and Cluso99,

My ROM Booter has not changed, at all, so you can use my prior code when putting the whole image together, which includes your code.

We are going to verify through simulation that the race condition is gone, so there's no need for me to change my code around to handle DIR differently.

It's critical, of course, that this latest version of PNut be used to assemble your programs, so that PTRx expressions are assembled correctly.

The mechanism for overwriting the ROM is in place, as during the last development period.

I'm hoping we can get this together in the next few days. And Thanks!!!

evanh · 2019-01-30 07:52

Chip,
Have you set the dual-port SRAM parameter, READ_DURING_WRITE_MODE_MIXED_PORTS? See https://forums.parallax.com/discussion/comment/1462814/#Comment_1462814

msrobots · 2019-01-30 09:16

cgracey wrote: »

...
RDLUT and WRLUT now support PTRA/PTRB expressions. This means immediate LUT addresses are limited to $000..$0FF, unless ## is used.

PTRA/PTRB expressions are now encoded slightly differently to allow wider address ranging. These are used by RDBYTE/RDWORD/RDLONG/WRBYTE/WRWORD/WRLONG/WMLONG, and now also RDLUT/WRLUT.
…

very nice to have

cgracey wrote: »

...
The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.
...

this is just cool. Thank you very much

cgracey wrote: »

…
Wait, better review this list of changes, too, since now all instructions that affect bits can now affect a RANGE of bits. You'll need to make sure you're not inadvertently affecting more than one bit, unless you intend to:

https://forums.parallax.com/discussion/169282/list-of-changes-in-next-p2-silicon/p1
…

And now you got me worried, must read...

Enjoy!

Mike

ozpropdev · 2019-01-30 09:45

Chip
Flashed all 4 images of "33g" to relevant FPGA boards.
All running Ok, will throw some more code at them tomorrow.

ersmith · 2019-01-30 11:20

cgracey wrote: »

It's critical, of course, that this latest version of PNut be used to assemble your programs, so that PTRx expressions are assembled correctly.

Wasn't the last proposal for the PTRx encoding backward compatible? It'd be really nice if we didn't have to have separate sets of tools for the P2ES and the next chip.

TonyB_ · 2019-01-30 12:01

ersmith wrote: »

cgracey wrote: »

It's critical, of course, that this latest version of PNut be used to assemble your programs, so that PTRx expressions are assembled correctly.

Wasn't the last proposal for the PTRx encoding backward compatible? It'd be really nice if we didn't have to have separate sets of tools for the P2ES and the next chip.

Yes, my idea (B2) for binary compatibility including the Verilog change is in the first post on this page:
http://forums.parallax.com/discussion/169243/rdlut-wrlut-with-auto-incrementing-address/p5

If implemented, an index of -16..+15 would encode the same in rev B as rev A.

Rayman · 2019-01-30 12:33

Is HDMI still going to be added?

Dave Hein · 2019-01-30 13:02

cgracey wrote: »

The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.

Does hardware handle the case where the lower 32 bits wrap after reading the upper 32 bits? If software has to handle it we would need to do something like this:

        getct  high wc ' Read high at cycle N
        getct  low    ' Read low at cycle N+2
        shr    low, #1 wz ' Test if low is 0 or 1
 if_z   add     high, #1   ' Increment high if low wrapped

EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

Cluso99 · 2019-01-30 13:23

From what I understood, the low count will be latched internally when the high count is read for an immediate following instruction to read.

Mark_T · 2019-01-30 13:23

If you don't want to clobber low, you'd avoid shifting it right!

        getct  high wc ' Read high at cycle N
        getct  low    ' Read low at cycle N+2
        cmp     low, #1  wcz
 if_le  add     high, #1

Dave Hein · 2019-01-30 13:29

Cluso99 wrote: »

From what I understood, the low count will be latched internally when the high count is read for an immediate following instruction to read.

So there would be a time difference of 2 cycles if I read low immediately after reading high versus reading low by itself.

@Mark_T, thanks for correcting my code. That's what I get for trying to write code early in the morning.

TonyB_ · 2019-01-30 13:33

Dave Hein wrote: »
Does hardware handle the case where the lower 32 bits wrap after reading the upper 32 bits? If software has to handle it we would need to do something like this:
        getct  high wc ' Read high at cycle N
        getct  low    ' Read low at cycle N+2
        shr    low, #1 wz ' Test if low is 0 or 1
 if_z   add     high, #1   ' Increment high if low wrapped
EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

Here's the original discussion, which went increasingly off-topic in the later pages:
https://forums.parallax.com/discussion/169267/cnt-extension-to-64-bit/p1
I think Chip has implemented GETCT slightly differently now but the method of sync'ing high and low counts is probably the same.

A suggestion: could 'wc' for high count copy CT[32] to C instead of clearing it?

Cluso99 · 2019-01-30 14:00

TonyB_ wrote: »
A suggestion: could 'wc' for high count copy CT[32] to C instead of clearing it?

Why?
It would be better not to change C at all, but it's likely more silicon, and not intuitive unless a pseudo-opcode such as GETCTH D was used.

Publison · 2019-01-30 15:53

What are the minimum/maximum Quartus versions for the 1-2-3 A9? I think 15.0 was safe.

jmg · 2019-01-30 18:33

Dave Hein wrote: »

Cluso99 wrote: »

From what I understood, the low count will be latched internally when the high count is read for an immediate following instruction to read.

So there would be a time difference of 2 cycles if I read low immediately after reading high versus reading low by itself.

Usually, latched opcodes grab both fields on the first opcode, and the second opcode merely reads the stored value.
There should be no rollover handling needed; the only time difference should be that the 64-bit code is one opcode larger/slower than the 32-bit code, but the capture instant should not move.

Dave Hein wrote: »

EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

Usually such details are hidden from the user, so it appears like a 'seamless 64b counter'.
e.g., on the P2 a true 64-bit counter would run too slowly, so the actual logic will generate a terminal count on -1 ($FFFF_FFFF in the low half), which clock-enables the second 32-bit counter.
When running, all bits roll over to 0000 on the same clock.
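The cascaded scheme described above can be sketched as a behavioral model (not the actual Verilog): the low 32-bit counter runs every clock, and its terminal count clock-enables the high counter, so the two halves roll over on the same clock. The class name is hypothetical.

```python
# Behavioral model of a 64-bit counter built from two 32-bit counters.
# The low half's terminal count (low == $FFFF_FFFF) enables the high half,
# so the carry lands on the same clock edge as the low half's rollover.

MASK32 = 0xFFFF_FFFF


class CascadedCounter:
    def __init__(self, high: int = 0, low: int = 0) -> None:
        self.high = high & MASK32
        self.low = low & MASK32

    def clock(self) -> None:
        terminal_count = self.low == MASK32  # decoded before the edge
        self.low = (self.low + 1) & MASK32
        if terminal_count:
            self.high = (self.high + 1) & MASK32

    def value(self) -> int:
        return (self.high << 32) | self.low


c = CascadedCounter(high=0, low=0xFFFF_FFFE)
for _ in range(3):
    c.clock()
    print(f"${c.high:08X}_{c.low:08X}")
```

Because the enable is decoded from the low half's current value, the high half never needs a 64-bit carry chain, which is what keeps the counter fast.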

Dave Hein · 2019-01-30 19:25

It might be a good idea to run the following code as a check:

        getct  high wc
        getct  low1
        getct  low2

Assuming no interrupts, (low2 - low1) should be 2, not 4.
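The check can be modeled on the host under two hypotheses: "latched" (GETCT WC captures all 64 bits and the next GETCT returns the stored low half) and "look-ahead" (GETCT WC returns the high half as it will be two clocks later, and the next GETCT reads the live low half). The model assumes every GETCT takes two clocks and the first one issues at cycle n; the function name is hypothetical.

```python
# Host-side model of: getct high wc / getct low1 / getct low2
# ASSUMPTION: each GETCT takes 2 clocks; the first issues at cycle n.

MASK32 = 0xFFFF_FFFF


def run_check(n: int, model: str) -> tuple[int, int, int]:
    """Return (high, low1, low2) under the given timing hypothesis."""
    if model == "latched":
        high = (n >> 32) & MASK32
        low1 = n & MASK32                 # low half stored at cycle n
    elif model == "look-ahead":
        high = ((n + 2) >> 32) & MASK32   # high half, two clocks ahead
        low1 = (n + 2) & MASK32           # live read at cycle n+2
    else:
        raise ValueError(model)
    low2 = (n + 4) & MASK32               # live read at cycle n+4
    return high, low1, low2


for model in ("latched", "look-ahead"):
    high, low1, low2 = run_check(0xFFFF_FFFE, model)
    print(model, (low2 - low1) & MASK32, f"${high:08X}_{low1:08X}")
```

Both hypotheses yield a coherent high/low pair across the rollover; only the difference low2 - low1 (4 versus 2) tells them apart, which is what makes this a useful test on the FPGA.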

TonyB_ · 2019-01-30 20:05

Cluso99 wrote: »
Why?
It would be better not to change C at all, but it's likely more silicon, and not intuitive unless a pseudo-opcode such as GETCTH D was used.

It seems that using wc is the easiest way to read the high count. Clearing C is not the only possible option, and I suggested an alternative, but here's another one: copy C to C!

Cluso99 · 2019-01-30 20:19

Yes, Chip extended GETCT to return the high value if WC is used.
IMHO it would be better if the C flag were not changed, as that lets user code keep C intact. But that may cost a few extra gates.

evanh · 2019-01-30 20:42

It would be simpler for both hardware and software not to use the C flag, but it would mean the 2-clock IRQ blocking always happens.

I'd be happy with that as standard baggage, given there are already many instruction pairings that do this. AUGx/ALTx/SETQ come to mind.

evanh · 2019-01-30 20:46

I suspect the hidden Q register is used to hold the second half of CT.

cgracey · 2019-01-30 21:38

The top 32 bits of CT read two clocks ahead of the bottom 32 bits.

Here is a program I made to verify rollover behavior on the next silicon:

con		t = 0	't=0 for $00000000_FFFFFFFF or t=1 for $00000001_00000000

dat		org

		hubset	#$FF			'select 80MHz on FPGA

.msb		getct	lo			'wait for ct msb
		tjns	lo,#.msb

		addct1	x,#0			'set ct target near rollover

		waitct1				'wait for target

		getct	hi	wc		'capture upper ct
		getct	lo			'capture lower ct

		cmp	lo,##$FFFF_FFFF+t wz	'check 64-bit ct value
	if_z	cmp	hi,##$0000_0000+t wz

		drvz	#32			'good on p32
		drvh	#33			'done on p33

		jmp	#$


x		long	$FFFF_FFF9+t		'$FFFF_FFF9 gets to $0000_0000_FFFF_FFFF

lo		res	1
hi		res	1
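The values this program checks can be reproduced on the host. The 6-clock offset from the CT1 target to the 'getct lo' read is taken from the program's own comment ("$FFFF_FFF9 gets to $0000_0000_FFFF_FFFF"), not derived independently, and the sketch assumes the high half of CT is still 0 while waiting (i.e. the test runs in the first 32-bit period after reset). The function name is hypothetical.

```python
# Reproduce the rollover test's expected hi/lo values for t = 0 and t = 1.
# ASSUMPTIONS: the 6-clock offset comes from the program's comment, and
# CT's high half is 0 while waiting for the target.

MASK32 = 0xFFFF_FFFF
OFFSET = 0xFFFF_FFFF - 0xFFFF_FFF9  # 6 clocks from target to 'getct lo'


def expected(t: int) -> tuple[int, int]:
    target = 0xFFFF_FFF9 + t   # x, the CT1 target
    ct = target + OFFSET       # 64-bit CT seen by 'getct lo'
    hi = ct >> 32              # top half reads two clocks ahead, so it
    lo = ct & MASK32           # matches the instant of the low read
    return hi, lo


for t in (0, 1):
    hi, lo = expected(t)
    # Same checks as the program: lo == $FFFF_FFFF+t (32-bit) and hi == t
    print(t, f"${hi:08X}_{lo:08X}", lo == (0xFFFF_FFFF + t) & MASK32, hi == t)
```

With t=1 the low read lands exactly on the rollover, so a pass on p32 shows the high half has already advanced when the low half reads zero.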

cgracey · 2019-01-30 22:15

evanh wrote: »

Chip,
Have you set the dual-port SRAM parameter, READ_DURING_WRITE_MODE_MIXED_PORTS? See https://forums.parallax.com/discussion/comment/1462814/#Comment_1462814

I forgot!

I just talked to Wendy at ON Semi about this, though, and she is looking into what we must do to ensure that random data is not returned on a READ during a simultaneous write to the same location from the other port. She is going to call me back soon about this. If it's doable, I'll update the FPGA images, accordingly.

Thanks for bringing this up!!!

cgracey · 2019-01-30 22:17

msrobots wrote: »

And now you got me worried, must read...

The new bit/pin-field capability lets you condense a series of bit/pin operations into one two-clock instruction. Keeps code small and fast.

cgracey · 2019-01-30 22:17

ozpropdev wrote: »

Chip
Flashed all 4 images of "33g" to relevant FPGA boards.
All running Ok, will throw some more code at them tomorrow.

Thanks, Brian!

cgracey · 2019-01-30 22:19

ersmith wrote: »

cgracey wrote: »

It's critical, of course, that this latest version of PNut be used to assemble your programs, so that PTRx expressions are assembled correctly.

Wasn't the last proposal for the PTRx encoding backward compatible? It'd be really nice if we didn't have to have separate sets of tools for the P2ES and the next chip.

I implemented what was simplest in logic, because the PTRx computation circuitry was near critical-path and I didn't want to possibly slow things down.

I need to document what the new scheme is, though you can run PNut and see the output for different expressions. There's not a whole lot to it, and I don't know exactly where it breaks compatibility. Need to look at it.

cgracey · 2019-01-30 22:25

Rayman wrote: »

Is HDMI still going to be added?

Yes! It's in there.

cgracey · 2019-01-30 22:29

TonyB_ wrote: »
A suggestion: could 'wc' for high count copy CT[32] to C instead of clearing it?

It COULD have, but I already sent the code off to ON Semi. If we wind up doing a bug fix because one of you guys detect a problem, I will make C=CT[32].

cgracey · 2019-01-30 22:30

Cluso99 wrote: »
Why?
It would be better not to change C at all, but it's likely more silicon, and not intuitive unless a pseudo-opcode such as GETCTH D was used.

Ah, I could have had it make C = current C. That would have been so simple, and better.

cgracey · 2019-01-30 22:34

TonyB_ wrote: »
It seems that using wc is the easiest way to read the high count. Clearing C is not the only possible option and I suggested an alternative but here's another one: copy C to C!

Yes! I've made a note in the source to make that change if we submit more code, due to a bug or timing fix.

evanh · 2019-01-30 23:37

Well, the problem with event-branching (Jxx) instructions within a REP block counts as a design bug. I never saw any fix mentioned for those. See https://forums.parallax.com/discussion/comment/1459273/#Comment_1459273
