New FPGA files for next silicon version - 5th/final release - contains new ROM!!

cgracey Posts: 11,012
edited 2019-02-20 - 07:53:32 in Propeller 2
5th Release

New ROM with updated SD booter and TAQOZ.

Extra register on each IN signal from the pins to guard against metastability.

Fixes r/w glitch during LUT sharing.
Fixes JMP-event-within-REP bug.
'GETCT reg WC' doesn't change C.


This is for anyone who wants to try the next version of silicon, including the new ROM:

https://drive.google.com/file/d/1dOe3JPTZvcKvdE9SDOUSdMqM7BJ8Ixqk/view?usp=sharing
Board            | cogs  smart pins    RAM     Freq   CORDIC  Filename
-----------------+----------------------------------------------------------------------
Prop123-A9       |  8    0-39,56-63    512k *  80MHz  Yes     Prop123_A9_Prop2_v33k.rbf
BeMicro-A9       |  8    0-39,56-63    512k *  80MHz  Yes     BeMicro_A9_Prop2_v33k.jic **
Prop123-A7       |  4    0-15,62-63    512k    80MHz  Yes     Prop123_A7_Prop2_v33k.rbf
DE2-115          |  4    0-7,60-63     256k    80MHz  Yes     DE2_115_Prop2_v33k.pof

 * Allows loading up to $FFFFF to rewrite ROM.

** I had a file overwrite, so I don't think the SD card pins are mapped properly to
   P[61:58] in the BeMicro-A9 image anymore.

Here are the differences between the current silicon and these next-silicon FPGA images:

RDLUT and WRLUT now support PTRA/PTRB expressions. This means immediate LUT addresses are limited to $000..$0FF, unless ## is used.
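A minimal host-side sketch of the new immediate-address rule, in Python. It assumes the LUT holds 512 longs ($000..$1FF) and simply reports whether an address fits a plain #immediate or needs the ## form. The function name is hypothetical and nothing here is part of PNut.

```python
# Hypothetical helper, not part of PNut: models the RDLUT/WRLUT immediate rule.
LUT_LONGS = 0x200            # assumed LUT RAM size: 512 longs, $000..$1FF
MAX_PLAIN_IMMEDIATE = 0x0FF  # plain #imm is now limited to $000..$0FF


def lut_immediate_form(addr: int) -> str:
    """Return '#' if addr fits a plain immediate, or '##' if it must be augmented."""
    if not 0 <= addr < LUT_LONGS:
        raise ValueError(f"LUT address ${addr:03X} is outside $000..$1FF")
    return "#" if addr <= MAX_PLAIN_IMMEDIATE else "##"


for a in (0x000, 0x0FF, 0x100, 0x1FF):
    print(f"${a:03X} -> {lut_immediate_form(a)}")
```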

PTRA/PTRB expressions are now encoded slightly differently to allow a wider address range. These are used by RDBYTE/RDWORD/RDLONG/WRBYTE/WRWORD/WRLONG/WMLONG, and now also by RDLUT/WRLUT. The version of PNut.exe included in the .zip file handles all of this, so you don't need to do anything: PNut.exe will assemble proper object code from your PASM source code.

The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, leaves C unchanged (earlier releases cleared C), and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.
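To illustrate why the extra 32 bits matter, here is a small Python model (not P2 code) that joins the two GETCT results into one 64-bit value. It assumes the 80MHz clock from the table above; at that rate the lower 32 bits wrap roughly every 53.7 seconds, so an interval longer than that cannot be recovered from the low half alone. The helper names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
SYSCLK_HZ = 80_000_000  # assumed: the 80MHz FPGA clock from the table


def ct64(high: int, low: int) -> int:
    """Join the 'GETCT high WC' and 'GETCT low' results into one 64-bit count."""
    return ((high & MASK32) << 32) | (low & MASK32)


def elapsed64(start: int, end: int) -> int:
    """Cycles between two 64-bit readings (no wrap for over 7,000 years at 80MHz)."""
    return (end - start) % (1 << 64)


def elapsed32(start_low: int, end_low: int) -> int:
    """What a 32-bit-only difference reports (it aliases every 2**32 cycles)."""
    return (end_low - start_low) & MASK32


WRAP32_SECONDS = (1 << 32) / SYSCLK_HZ  # about 53.7 s between low-half wraps

# Two readings exactly two low-half wraps apart:
start = ct64(0, 0x10)
end = ct64(2, 0x10)
print(elapsed64(start, end))   # 8589934592 cycles
print(elapsed32(0x10, 0x10))   # 0: the 32-bit view loses both wraps
```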

There are two new instructions which set up and read the scope mode: 'SETSCP D/#' and 'GETSCP D'. SETSCP points the scope mux to a set of four pins starting at (D[5:0] AND $3C), with D[6]=1 to enable scope operation. Any time GETSCP is executed, the lower bytes of those four pins' RDPIN values are returned in D. This feature will mainly be useful on the next silicon, as the FPGAs don't have ADC-capable pins.
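The SETSCP operand and GETSCP result can be modelled with a few bit operations. This Python sketch follows the description above (base pin = D[5:0] AND $3C, enable = D[6]). The byte order of the GETSCP result (byte 0 = first pin of the group) is an assumption, as are the function names.

```python
def setscp_operand(first_pin: int, enable: bool = True) -> int:
    """D value for SETSCP: D[5:0] AND $3C picks the base pin, D[6]=1 enables the scope."""
    if not 0 <= first_pin <= 63:
        raise ValueError("pin must be 0..63")
    return (first_pin & 0x3C) | (int(enable) << 6)


def scope_pins(d: int) -> list:
    """The four pins that a SETSCP operand selects."""
    base = d & 0x3C
    return [base + i for i in range(4)]


def getscp_bytes(d: int) -> list:
    """Split a GETSCP result into four bytes (assumed: byte 0 = base pin)."""
    return [(d >> (8 * i)) & 0xFF for i in range(4)]


op = setscp_operand(13)           # pin 13 falls in the group starting at pin 12
print(hex(op), scope_pins(op))    # 0x4c [12, 13, 14, 15]
print(getscp_bytes(0x44332211))   # [17, 34, 51, 68]
```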

Lastly, the USB smart pin modes have changed. There used to be four different USB modes, %11000..%11011 (%110xx). USB mode is now %11011 only, with WXPIN bits 15 and 14 selecting the sub-mode and bits 13..0 setting the NCO frequency, as before. Bits 15 and 14 were always '0' anyway. Now, bit 15 = 0 for device mode or 1 for host mode, and bit 14 = 0 for low-speed mode or 1 for full-speed mode.
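A Python sketch of packing and unpacking the new USB WXPIN value from the bit assignments above. The NCO value is passed in rather than computed, since its formula isn't given here; the function names and the example NCO value 0x1234 are hypothetical placeholders.

```python
def usb_wxpin_value(host: bool, full_speed: bool, nco: int) -> int:
    """WXPIN value for smart pin mode %11011 (USB).

    bit 15:     0 = device, 1 = host
    bit 14:     0 = low-speed, 1 = full-speed
    bits 13..0: NCO frequency (supplied by the caller)
    """
    if not 0 <= nco <= 0x3FFF:
        raise ValueError("NCO value must fit in bits 13..0")
    return (int(host) << 15) | (int(full_speed) << 14) | nco


def usb_wxpin_fields(x: int) -> dict:
    """Decode a USB WXPIN value back into its sub-mode fields."""
    return {
        "host": bool((x >> 15) & 1),
        "full_speed": bool((x >> 14) & 1),
        "nco": x & 0x3FFF,
    }


x = usb_wxpin_value(host=True, full_speed=True, nco=0x1234)  # placeholder NCO
print(hex(x))                # 0xd234
print(usb_wxpin_fields(x))
```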

Smart pin modes %1100x are SINC2/SINC3/raw ADC modes, while smart pin mode %11010 is Scope mode. These aren't very useful until the next silicon exists, so there's no need to elaborate, yet.

I think those are the only changes.

Wait, better review this list of changes, too, since all instructions that affect bits can now affect a RANGE of bits. Make sure you're not inadvertently affecting more than one bit, unless you intend to:

https://forums.parallax.com/discussion/169282/list-of-changes-in-next-p2-silicon/p1
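To see how a stray value can now touch a range of bits, here is a Python model of a bit-field S operand. It assumes the layout produced by Spin2's ADDBITS operator (S[4:0] = first bit, S[9:5] = number of additional bits) and that the current silicon only looked at S[4:0]; check the linked list of changes for the authoritative encoding. Fields that run past bit 31 are rejected rather than modelled, and the function names are hypothetical.

```python
def addbits(first_bit: int, extra_bits: int) -> int:
    """Build a bit-field S operand: S[4:0] = first bit, S[9:5] = additional bits (assumed)."""
    return (first_bit & 0x1F) | ((extra_bits & 0x1F) << 5)


def affected_mask(s: int) -> int:
    """Mask of register bits a BITx-style instruction would touch for operand S."""
    first = s & 0x1F
    count = ((s >> 5) & 0x1F) + 1
    if first + count > 32:
        raise ValueError("field runs past bit 31; wrap-around is not modelled here")
    return ((1 << count) - 1) << first


# A register holding 37 used to mean "bit 5" if only S[4:0] mattered,
# but S[9:5] = 1 now widens the field to bits 5..6.
print(hex(affected_mask(37)))             # 0x60
print(hex(affected_mask(addbits(5, 0))))  # 0x20, a single bit
```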


The .spin2 files in the .zip have all been modified to take advantage, where possible, of the new bit/pin-range operations.

Comments

  • cgracey Posts: 11,012
    edited 2019-01-30 - 06:44:20
    PeterJakacki and Cluso99,

    My ROM Booter has not changed, at all, so you can use my prior code when putting the whole image together, which includes your code.

    We are going to verify through simulation that the race condition is gone, so there's no need for me to change my code around to handle DIR differently.

    It's critical, of course, that this latest version of PNut be used to assemble your programs, so that PTRx expressions are assembled correctly.

    The mechanism for overwriting the ROM is in place, as during the last development period.

    I'm hoping we can get this together in the next few days. And Thanks!!!
  • Chip,
    Have you set the dual-port SRAM parameter, READ_DURING_WRITE_MODE_MIXED_PORTS? See https://forums.parallax.com/discussion/comment/1462814/#Comment_1462814

    "There's no huge amount of massive material
    hidden in the rings that we can't see,
    the rings are almost pure ice."
  • cgracey wrote: »
    ...
    RDLUT and WRLUT now support PTRA/PTRB expressions. This means immediate LUT addresses are limited to $000..$0FF, unless ## is used.

    PTRA/PTRB expressions are now encoded slightly differently to allow wider address ranging. These are used by RDBYTE/RDWORD/RDLONG/WRBYTE/WRWORD/WRLONG/WMLONG, and now also RDLUT/WRLUT.

    very nice to have
    cgracey wrote: »
    ...
    The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.
    ...

    this is just cool. Thank you very much
    cgracey wrote: »

    Wait, better review this list of changes, too, since now all instructions that affect bits can now affect a RANGE of bits. You'll need to make sure you're not inadvertently affecting more than one bit, unless you intend to:

    https://forums.parallax.com/discussion/169282/list-of-changes-in-next-p2-silicon/p1

    And now you got me worried, must read...

    Enjoy!

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • Chip
    Flashed all 4 images of "33g" to relevant FPGA boards.
    All running Ok, will throw some more code at them tomorrow.
    Melbourne, Australia
  • cgracey wrote: »
    It's critical, of course, that this latest version of PNut be used to assemble your programs, so that PTRx expressions are assembled correctly.

    Wasn't the last proposal for the PTRx encoding backward compatible? It'd be really nice if we didn't have to have separate sets of tools for the P2ES and the next chip :(.
  • ersmith wrote: »
    cgracey wrote: »
    It's critical, of course, that this latest version of PNut be used to assemble your programs, so that PTRx expressions are assembled correctly.

    Wasn't the last proposal for the PTRx encoding backward compatible? It'd be really nice if we didn't have to have separate sets of tools for the P2ES and the next chip :(.

    Yes, my idea (B2) for binary compatibility including the Verilog change is in the first post on this page:
    http://forums.parallax.com/discussion/169243/rdlut-wrlut-with-auto-incrementing-address/p5

    If implemented, an index of -16..+15 would encode the same in rev B as rev A.
    Formerly known as TonyB
  • Is hdmi still going to be added?
    Prop Info and Apps: http://www.rayslogic.com/
  • Dave Hein Posts: 5,898
    edited 2019-01-30 - 13:18:41
    cgracey wrote: »
    The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.

    Does hardware handle the case where the lower 32 bits wrap after reading the upper 32 bits? If software has to handle it we would need to do something like this:
            getct  high wc ' Read high at cycle N
            getct  low    ' Read low at cycle N+2
            shr    low, #1 wz ' Test if low is 0 or 1
     if_z   add     high, #1   ' Increment high if low wrapped
    

    EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

  • From what I understood, the low count will be latched internally when the high count is read, for the immediately following instruction to read.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • If you don't want to clobber low, you'd avoid shifting it right!
            getct  high wc ' Read high at cycle N
            getct  low    ' Read low at cycle N+2
            cmp     low, #1  wcz
     if_le  add     high, #1
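Mark_T's correction can be checked with a small Python model. It assumes the case Dave asked about, where the high half is sampled at cycle N and the low half two clocks later with no hardware alignment. (Chip explains further down the thread that the top 32 bits are actually read two clocks ahead, so on the new silicon this fix-up should not be needed.) The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def naive_read(ct: int) -> tuple:
    """High half sampled at cycle N (CT = ct), low half sampled at N+2, no alignment."""
    high = (ct >> 32) & MASK32
    low = (ct + 2) & MASK32
    return high, low


def fixed_up(high: int, low: int) -> int:
    """Mark_T's fix: a low of 0 or 1 means the low half wrapped between the reads."""
    if low <= 1:                      # cmp low,#1 wcz / if_le add high,#1
        high = (high + 1) & MASK32
    return (high << 32) | low


# Sweep across a low-half rollover: the fixed-up value always equals CT at cycle N+2.
for ct in range((1 << 32) - 4, (1 << 32) + 4):
    high, low = naive_read(ct)
    assert fixed_up(high, low) == ct + 2
print("fix-up matches CT at N+2 across the rollover")
```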
    
  • Cluso99 wrote: »
    From what I understood, the low count will be latched internally when the high count is read, for the immediately following instruction to read.

    So there would be a time difference of 2 cycles if I read low immediately after reading high versus reading low by itself.

    @Mark_T, thanks for correcting my code. That's what I get for trying to write code early in the morning.
  • TonyB_ Posts: 1,200
    edited 2019-01-30 - 13:34:01
    Dave Hein wrote: »
    cgracey wrote: »
    The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.

    Does hardware handle the case where the lower 32 bits wrap after reading the upper 32 bits? If software has to handle it we would need to do something like this:
            getct  high wc ' Read high at cycle N
            getct  low    ' Read low at cycle N+2
            shr    low, #1 wz ' Test if low is 0 or 1
     if_z   add     high, #1   ' Increment high if low wrapped
    

    EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

    Here's the original discussion, which went increasingly off-topic in the later pages:
    https://forums.parallax.com/discussion/169267/cnt-extension-to-64-bit/p1
    I think Chip has implemented GETCT slightly differently now but the method of sync'ing high and low counts is probably the same.

    A suggestion: could 'wc' for high count copy CT[32] to C instead of clearing it?
  • TonyB_ wrote: »
    Dave Hein wrote: »
    cgracey wrote: »
    The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.

    Does hardware handle the case where the lower 32 bits wrap after reading the upper 32 bits? If software has to handle it we would need to do something like this:
            getct  high wc ' Read high at cycle N
            getct  low    ' Read low at cycle N+2
            shr    low, #1 wz ' Test if low is 0 or 1
     if_z   add     high, #1   ' Increment high if low wrapped
    

    EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

    Here's the original discussion, which went increasingly off-topic in the later pages:
    https://forums.parallax.com/discussion/169267/cnt-extension-to-64-bit/p1
    I think Chip has implemented GETCT slightly differently now but the method of sync'ing high and low counts is probably the same.

    A suggestion: could 'wc' for high count copy CT[32] to C instead of clearing it?
    Why?
    It would be better to not change C at all, but it’s likely more silicon, and not intuitive unless a pseudo opcode of GETCTH D was used.
  • Publison Posts: 10,783
    edited 2019-01-30 - 15:54:19
    What are the minimum/maximum Quartus versions for the 1-2-3 A9? I think 15.0 was safe.
    Infernal Machine
  • jmg Posts: 13,448
    Dave Hein wrote: »
    Cluso99 wrote: »
    From what I understood, the low count will be latched internally when the high count is read, for the immediately following instruction to read.

    So there would be a time difference of 2 cycles if I read low immediately after reading high versus reading low by itself.

    Usually, latched opcodes grab both fields on the first opcode, and the second opcode merely reads the stored value.
    There should be no rollover handling needed; the only difference should be that the 64-bit code is one opcode larger/slower than the 32-bit code, but the capture instant should not move.

    Dave Hein wrote: »
    EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?
    Usually such details are hidden from the user, so it appears like a seamless 64-bit counter.
    E.g. on P2, a true 64-bit counter would run too slow, so the actual logic will generate a terminal count on -1 ($FFFF_FFFF), which clock-enables the second 32-bit counter.
    When running, all bits roll over to 0 on the same clock.
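jmg's description of a cascaded counter can be sketched in Python. This is a behavioural model of the idea as described in this post, not the actual P2 Verilog: the low 32-bit counter's terminal count at $FFFF_FFFF enables the high 32-bit counter, so both halves roll over on the same clock.

```python
MASK32 = 0xFFFF_FFFF


class CascadedCounter:
    """Two 32-bit counters; the low half's terminal count enables the high half."""

    def __init__(self, value: int = 0):
        self.low = value & MASK32
        self.high = (value >> 32) & MASK32

    def clock(self) -> None:
        enable_high = self.low == MASK32      # terminal count at -1 ($FFFF_FFFF)
        self.low = (self.low + 1) & MASK32
        if enable_high:                       # same edge: high steps as low wraps
            self.high = (self.high + 1) & MASK32

    def value(self) -> int:
        return (self.high << 32) | self.low


c = CascadedCounter(0x0000_0000_FFFF_FFFE)
for _ in range(3):
    c.clock()
    print(hex(c.value()))   # 0xffffffff, then 0x100000000, then 0x100000001
```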

  • Dave Hein Posts: 5,898
    edited 2019-01-30 - 19:30:56
    It might be a good idea to run the following code as a check:
    getct high wc
    getct low1
    getct low2
    
    Assuming no interrupts (low2 - low1) should be 2 and not 4.
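The arithmetic behind this check, as a tiny Python model. It assumes each GETCT takes 2 clocks, so the three instructions issue at cycles n, n+2 and n+4, and compares the two hypotheses raised in this thread: low latched together with high, or low read live by its own instruction. The function name is hypothetical.

```python
def low_delta(hypothesis: str, n: int = 1000) -> int:
    """low2 - low1 for: GETCT high WC (cycle n), GETCT low1 (n+2), GETCT low2 (n+4)."""
    if hypothesis == "latched":     # low captured together with high at cycle n
        low1 = n
    elif hypothesis == "live":      # low read by its own instruction at cycle n+2
        low1 = n + 2
    else:
        raise ValueError(hypothesis)
    low2 = n + 4                    # a plain GETCT is always read live
    return low2 - low1


print(low_delta("latched"), low_delta("live"))   # 4 2
```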
  • Cluso99 wrote: »
    TonyB_ wrote: »
    Dave Hein wrote: »
    cgracey wrote: »
    The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.

    Does hardware handle the case where the lower 32 bits wrap after reading the upper 32 bits? If software has to handle it we would need to do something like this:
            getct  high wc ' Read high at cycle N
            getct  low    ' Read low at cycle N+2
            shr    low, #1 wz ' Test if low is 0 or 1
     if_z   add     high, #1   ' Increment high if low wrapped
    

    EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

    Here's the original discussion, which went increasingly off-topic in the later pages:
    https://forums.parallax.com/discussion/169267/cnt-extension-to-64-bit/p1
    I think Chip has implemented GETCT slightly differently now but the method of sync'ing high and low counts is probably the same.

    A suggestion: could 'wc' for high count copy CT[32] to C instead of clearing it?
    Why?
    It would be better to not change C at all, but it’s likely more silicon, and not intuitive unless a pseudo opcode of GETCTH D was used.

    It seems that using wc is the easiest way to read the high count. Clearing C is not the only possible option and I suggested an alternative but here's another one: copy C to C!
  • Yes, Chip extended the GETCT to return the high value if WC is used.
    IMHO it would be better if the C flag was not changed as this allows user code to keep C unchanged. But that may be a few gates extra.
  • It would be simpler for both hardware and software to not use the C flag. But it would mean the 2-clock IRQ blocking always happens.

    I'd be happy with this as standard baggage, given there are already many instruction pairings that do this. AUGx/ALTx/SETQ come to mind.
  • I suspect the hidden Q register is likely used for the CT second half copy.
  • cgracey Posts: 11,012
    edited 2019-01-30 - 21:40:01
    The top 32 bits of CT read two clocks ahead of the bottom 32 bits.

    Here is a program I made to verify rollover behavior on the next silicon:
    con		t = 0	't=0 for $00000000_FFFFFFFF or t=1 for $00000001_00000000
    
    dat		org
    
    		hubset	#$FF			'select 80MHz on FPGA
    
    .msb		getct	lo			'wait for ct msb
    		tjns	lo,#.msb
    
    		addct1	x,#0			'set ct target near rollover
    
    		waitct1				'wait for target
    
    		getct	hi	wc		'capture upper ct
    		getct	lo			'capture lower ct
    
    		cmp	lo,##$FFFF_FFFF+t wz	'check 64-bit ct value
    	if_z	cmp	hi,##$0000_0000+t wz
    
    		drvz	#32			'good on p32
    		drvh	#33			'done on p33
    
    		jmp	#$
    
    
    x		long	$FFFF_FFF9+t		'$FFFF_FFF9 gets to $0000_0000_FFFF_FFFF
    
    lo		res	1
    hi		res	1
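The expected values in this test can be reproduced with a Python model. It assumes the test runs within the first 2^32 clocks after reset (so the top half is still 0 when the target is set), and takes the 6-clock distance between the CT1 target and the 'getct lo' sample from the comment on x ('$FFFF_FFF9 gets to $0000_0000_FFFF_FFFF'). Both halves come from the same CT value, following the note that the top 32 bits are read two clocks ahead. The names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
READ_OFFSET = 6   # inferred from the comment: target $FFFF_FFF9 -> 'getct lo' sees $FFFF_FFFF


def captured(t: int) -> tuple:
    """(lo, hi) that the rollover program should capture for t = 0 or t = 1."""
    target = 0xFFFF_FFF9 + t            # x, with the top 32 bits assumed still 0
    ct = target + READ_OFFSET           # 64-bit CT value when 'getct lo' executes
    # The top half is read two clocks ahead, so both halves come from this one CT value.
    return ct & MASK32, (ct >> 32) & MASK32


def passes(t: int) -> bool:
    """Mirror of the program's check (32-bit wrap assumed for $FFFF_FFFF+t)."""
    lo, hi = captured(t)
    return lo == ((0xFFFF_FFFF + t) & MASK32) and hi == t


for t in (0, 1):
    lo, hi = captured(t)
    print(t, f"${hi:08X}_{lo:08X}", passes(t))
```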
    
  • evanh wrote: »
    Chip,
    Have you set the dual-port SRAM parameter, READ_DURING_WRITE_MODE_MIXED_PORTS? See https://forums.parallax.com/discussion/comment/1462814/#Comment_1462814

    I forgot!

    I just talked to Wendy at ON Semi about this, though, and she is looking into what we must do to ensure that random data is not returned on a READ during a simultaneous write to the same location from the other port. She is going to call me back soon about this. If it's doable, I'll update the FPGA images, accordingly.

    Thanks for bringing this up!!!
  • msrobots wrote: »
    cgracey wrote: »
    ...
    RDLUT and WRLUT now support PTRA/PTRB expressions. This means immediate LUT addresses are limited to $000..$0FF, unless ## is used.

    PTRA/PTRB expressions are now encoded slightly differently to allow wider address ranging. These are used by RDBYTE/RDWORD/RDLONG/WRBYTE/WRWORD/WRLONG/WMLONG, and now also RDLUT/WRLUT.

    very nice to have
    cgracey wrote: »
    ...
    The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.
    ...

    this is just cool. Thank you very much
    cgracey wrote: »

    Wait, better review this list of changes, too, since now all instructions that affect bits can now affect a RANGE of bits. You'll need to make sure you're not inadvertently affecting more than one bit, unless you intend to:

    https://forums.parallax.com/discussion/169282/list-of-changes-in-next-p2-silicon/p1

    And now you got me worried, must read...

    Enjoy!

    Mike

    The new bit/pin-field capability lets you condense a series of bit/pin operations into one two-clock instruction. Keeps code small and fast.
  • ozpropdev wrote: »
    Chip
    Flashed all 4 images of "33g" to relevant FPGA boards.
    All running Ok, will throw some more code at them tomorrow.

    Thanks, Brian!
  • ersmith wrote: »
    cgracey wrote: »
    It's critical, of course, that this latest version of PNut be used to assemble your programs, so that PTRx expressions are assembled correctly.

    Wasn't the last proposal for the PTRx encoding backward compatible? It'd be really nice if we didn't have to have separate sets of tools for the P2ES and the next chip :(.

    I implemented what was simplest in logic, because the PTRx computation circuitry was near critical-path and I didn't want to possibly slow things down.

    I need to document what the new scheme is, though you can run PNut and see the output for different expressions. There's not a whole lot to it, and I don't know exactly where it breaks compatibility. Need to look at it.
  • Rayman wrote: »
    Is hdmi still going to be added?

    Yes! It's in there.
  • TonyB_ wrote: »
    Dave Hein wrote: »
    cgracey wrote: »
    The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.

    Does hardware handle the case where the lower 32 bits wrap after reading the upper 32 bits? If software has to handle it we would need to do something like this:
            getct  high wc ' Read high at cycle N
            getct  low    ' Read low at cycle N+2
            shr    low, #1 wz ' Test if low is 0 or 1
     if_z   add     high, #1   ' Increment high if low wrapped
    

    EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

    Here's the original discussion, which went increasingly off-topic in the later pages:
    https://forums.parallax.com/discussion/169267/cnt-extension-to-64-bit/p1
    I think Chip has implemented GETCT slightly differently now but the method of sync'ing high and low counts is probably the same.

    A suggestion: could 'wc' for high count copy CT[32] to C instead of clearing it?

    It COULD have, but I already sent the code off to ON Semi. If we wind up doing a bug fix because one of you guys detect a problem, I will make C=CT[32].
  • Cluso99 wrote: »
    TonyB_ wrote: »
    Dave Hein wrote: »
    cgracey wrote: »
    The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.

    Does hardware handle the case where the lower 32 bits wrap after reading the upper 32 bits? If software has to handle it we would need to do something like this:
            getct  high wc ' Read high at cycle N
            getct  low    ' Read low at cycle N+2
            shr    low, #1 wz ' Test if low is 0 or 1
     if_z   add     high, #1   ' Increment high if low wrapped
    

    EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

    Here's the original discussion, which went increasingly off-topic in the later pages:
    https://forums.parallax.com/discussion/169267/cnt-extension-to-64-bit/p1
    I think Chip has implemented GETCT slightly differently now but the method of sync'ing high and low counts is probably the same.

    A suggestion: could 'wc' for high count copy CT[32] to C instead of clearing it?
    Why?
    It would be better to not change C at all, but it’s likely more silicon, and not intuitive unless a pseudo opcode of GETCTH D was used.

    Ah, I could have had it make C = current C. That would have been so simple, and better.
  • TonyB_ wrote: »
    Cluso99 wrote: »
    TonyB_ wrote: »
    Dave Hein wrote: »
    cgracey wrote: »
    The system counter (CT) has been extended to 64 bits. 'GETCT reg WC' returns the top 32 bits of the 64-bit system counter, clears C, and shields the next instruction from interrupts, so that a time-aligned reading of both halves can be made by following 'GETCT high WC' with 'GETCT low'.

    Does hardware handle the case where the lower 32 bits wrap after reading the upper 32 bits? If software has to handle it we would need to do something like this:
            getct  high wc ' Read high at cycle N
            getct  low    ' Read low at cycle N+2
            shr    low, #1 wz ' Test if low is 0 or 1
     if_z   add     high, #1   ' Increment high if low wrapped
    

    EDIT: I'm guessing that the high cycles look ahead by 2 cycles or the low cycles are delayed by 2 cycles so they are in sync, correct?

    Here's the original discussion, which went increasingly off-topic in the later pages:
    https://forums.parallax.com/discussion/169267/cnt-extension-to-64-bit/p1
    I think Chip has implemented GETCT slightly differently now but the method of sync'ing high and low counts is probably the same.

    A suggestion: could 'wc' for high count copy CT[32] to C instead of clearing it?
    Why?
    It would be better to not change C at all, but it’s likely more silicon, and not intuitive unless a pseudo opcode of GETCTH D was used.

    It seems that using wc is the easiest way to read the high count. Clearing C is not the only possible option and I suggested an alternative but here's another one: copy C to C!

    Yes! I've made a note in the source to make that change if we submit more code, due to a bug or timing fix.
  • evanh Posts: 6,949
    edited 2019-01-30 - 23:39:07
    Well, the problem with event-branching (Jxx) instructions within a REP block counts as a design bug. I never saw any fix mentioned for those. See https://forums.parallax.com/discussion/comment/1459273/#Comment_1459273