Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

Peter Jakacki · 2017-11-23 12:22

Thanks for that info, Chip. I ended up removing the nop after I figured out that it was safe to perform a testp after the clock's falling edge. So this works at 80MHz, clocking data at 10MHz.

SPIRD		rep	@.end,#8		' 8 bits
		outnot 	sck			' clock (low high)
		outnot	sck
		testp	miso wc			' read data from card
		rcl	tos,#1			' shift in msb first
.end		ret
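
As a rough check of the 80MHz / 10MHz figures: the REP body is four instructions, and if each takes two system clocks (an assumption, consistent with the 2-clock instruction timings quoted elsewhere in this thread), one bit costs eight clocks. The Python sketch below only does that arithmetic; the function name and parameters are illustrative and not part of any Prop2 tool.

```python
def spi_bit_rate(sysclk_hz, instructions_per_bit, clocks_per_instruction=2):
    """Bit rate of a bit-banged loop, assuming fixed-length instructions."""
    clocks_per_bit = instructions_per_bit * clocks_per_instruction
    return sysclk_hz / clocks_per_bit

# Four instructions per bit (outnot, outnot, testp, rcl) at 80 MHz.
print(spi_bit_rate(80_000_000, 4))  # 10000000.0
```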

cgracey · 2017-11-23 12:34

Peter, that wound up being the same as my ROM code:

'
'
' SPI byte in
'
spi_in		rep	@.in,#8			'ready to input a byte
		outh	#spi_ck			'clock pin high
		outl	#spi_ck			'clock pin low
		testp	#spi_dq		wc	'sample data pin ('testp' is from before 'outh')
		rcl	x,#1			'save data bit
.in
		ret

evanh · 2017-11-23 12:44

Modifying that a little, and I'm not sure how much this would be valued, but here's a slight unrolling to even out the clock duty:

'
'
' SPI byte in
'
spi_in		outh	#spi_ck			'clock pin high
		rep	@.in,#7			'ready to input a byte
		outl	#spi_ck			'clock pin low
		testp	#spi_dq		wc	'sample data pin ('testin' is from before 'outl')
		outh	#spi_ck			'clock pin high
		rcl	x,#1			'save data bit
.in
		outl	#spi_ck			'clock pin low
		testp	#spi_dq		wc	'sample data pin ('testin' is from before 'outl')
		rcl	x,#1			'save data bit
		ret
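
To put a number on "even out the clock duty", here is a small Python sketch that compares the two loop orderings. It assumes every instruction takes two clocks and that every OUT reaches the pin after the same fixed delay, so the delay cancels out of the duty calculation. The function and the instruction lists are illustrative only.

```python
def clock_duty(loop, clocks_per_instruction=2):
    """Fraction of one loop period that the clock pin spends high.

    `loop` is the repeating instruction sequence. The pin is treated as
    taking the level of the most recent outh/outl (fixed delay ignored).
    """
    # Level carried in from the previous pass through the loop.
    level = None
    for op in loop:
        if op in ("outh", "outl"):
            level = op
    high = 0
    for op in loop:
        if op in ("outh", "outl"):
            level = op
        if level == "outh":
            high += clocks_per_instruction
    return high / (len(loop) * clocks_per_instruction)

rom_loop = ["outh", "outl", "testp", "rcl"]        # ordering in the ROM code
unrolled_loop = ["outl", "testp", "outh", "rcl"]   # ordering in the unrolled version

print(clock_duty(rom_loop))       # 0.25
print(clock_duty(unrolled_loop))  # 0.5
```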

evanh · 2017-11-23 12:47

And it's two system clocks shorter too.

cgracey · 2017-11-23 12:58

Looks good, Evanh. The SPI memory outputs new data on the falling edge of the clock, so you'd want to sample just before the clock drops, which all our code does in these snippets.

evanh · 2017-11-23 13:05

Has to achieve a high first though!

Peter Jakacki · 2017-11-23 13:10

My SPI code just toggles the clock so that I can enter with it high or low to cater for various chips. The data that I'm reading though is already available before I do any clocking and so the clock is simply outputting the next data bit.

Chip, do you think it's a good idea to use smartpins for SPI for the boot code? I will probably try this out next just so I have the options.

evanh · 2017-11-23 13:18

cgracey wrote: »

Looks good, Evanh. The SPI memory outputs new data on the falling edge of the clock, so you'd want to sample just before the clock drops, which all our code does in these snippets.

The SPI slave has to be clocking out on the rising clock edge! Otherwise the first bit is not present for the first shift.

jmg · 2017-11-23 19:18

cgracey wrote: »

The extra clock cycle needed at 80MHz is due to the unconstrained timing paths not allowing same-clock feedback from a register, through logic, out to a pin, back in from a pin, through more logic, then back into a register, all in 12.5ns.

Peter also reported issues at 40MHz, so those delays look to be significant.

Your notes for IN say "registered from physical pin", so I'm unclear where this slack is sneaking in ...

cgracey wrote: »

There's no need to suppose that some variable number of NOPs might be needed, based on frequency. It is safest to write code for the 80MHz timing situation, as it will always work.

Problem is, the real world does not quite work that way.

As long as the silicon can add an arbitrary extra clock, based on SysCLK and PVT, it will bite users.

In reality, the user codes until it works, and then later someone else may adjust the clock speed.

Even if they strive to carefully count cycles, there is no proof they have that right.
As your numbers show, even counting opcodes is too coarse, as the sampling can be mid-opcode.

So long as the silicon has the risk, proving it is field safe is very hard indeed.
i.e. you may just have a fluked 'good' part.

jmg · 2017-11-23 19:26

cgracey wrote: »

I went through the Verilog code to determine the IN/OUT paths to/from the cogs:

OUT		registered in cog on 'go' (on last clock of instruction)
		registered in hub after OR'ing all cogs' OUT signals *
		goes through smart pin logic to physical pin
		(total delay = 1 clock after last clock of instruction)

		DRVC	#30	'2 (+1!)	updates after 1 clock after 2-clock instruction


IN for D/S	registered from physical pin
		goes through smart pin mux and logic/filtering
		registered in hub for fan-out to all cogs *
		registered in cog on 'go' (last clock of prior instruction)
		(total delay = 3 clocks)

		TESTB	INB,#31	'(?3+) 2	samples 3 clocks before 2-clock instruction


IN for TESTP{N}	registered from physical pin
		goes through smart pin mux and logic/filtering
		registered in hub for fan-out to all cogs *
		registered in cog on 'get' (first clock of instruction)
		(total delay = 2 clocks, since it arrives in first clock of instruction)

		TESTP	#31	'(?2+) 2	samples 2 clocks before 2-clock instruction


* Can *maybe* be eliminated in ASIC

As you can see, TESTP/TESTPN get IN data that is one clock fresher than instructions which read INA/INB via D or S.
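
One way to read those numbers is as a simple timeline. The Python sketch below assumes 2-clock instructions, that an OUT edge appears on the pin 1 clock after the instruction ends, and that TESTP sees the pin as it was 2 clocks before the instruction starts. The clock numbering is an interpretation of the notes above, not something taken from the Verilog, and the names are illustrative.

```python
CLOCKS = 2       # clocks per instruction (assumed)
OUT_DELAY = 1    # pin changes 1 clock after the instruction's last clock
TESTP_LEAD = 2   # TESTP samples 2 clocks before the instruction starts

def timeline(loop):
    """Return (edges, samples): pin edge times and TESTP sample times, in clocks."""
    edges, samples = [], []
    for i, op in enumerate(loop):
        start = i * CLOCKS
        if op in ("outh", "outl"):
            edges.append((start + CLOCKS + OUT_DELAY, op))
        elif op == "testp":
            samples.append(start - TESTP_LEAD)
    return edges, samples

edges, samples = timeline(["outh", "outl", "testp", "rcl"])
print(edges)    # [(3, 'outh'), (5, 'outl')]
print(samples)  # [2]
```

Under these assumptions the sample at clock 2 lands before the rising edge at clock 3, which agrees with the "'testp' is from before 'outh'" comment in the ROM code.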

I'm not easily seeing a long path anywhere there, especially with the "registered from physical pin" on Pin in ?

In terms of *possible* ASIC speed-ups, the IN path seems safer than OUT.
The OUT includes a wide OR, (slow), and you want the OUT pin to be close to the clock edge, to allow other external parts setup times. Smart pin bypass sounds a fast path.

For IN, what slack is 'registered in hub for fan-out to all cogs' swallowing, there is no wide-OR on IN ?
Could that be easier to remove than the OUT case ?

What exactly is "goes through smart pin mux and logic/filtering" ?
The mux here is a simple bypass for no-smart-pin, and the filtering is default-off right ?

evanh · 2017-11-23 20:56

JMG,
I read it as the issue is on the output. Input is good, albeit with a fixed lag.

It looks like the big OR is hidden behind a register in the Hub. So that leaves delays from long route from the Hub and also any mux'ing with associated Smartpin. The Smartpin propagation should be very short.

I suspect that, without any changes, the final silicon will be a lot improved simply because the long route won't have all the FPGA's selectable interconnect.

I can almost bet the problem is just the FPGA ... but I won't.

jmg · 2017-11-23 22:08

evanh wrote: »

JMG,
I read it as the issue is on the output. Input is good, albeit with a fixed lag.

That's a good point, a scope should show which of the OUT, or IN, paths gets the bonus added clock.

Triggering the scope could be tricky. Maybe trigger from a pin toggled by a Smart Pin and compare it with the software-driven pin? That would presume the Smart Pin output never has this effect, and so could not move.

An alternative could be to add a SysClk/N scope pin to the FPGA build, that is eg /128 using a fast sync counter and then the SW can pace-itself to the same total clocks. Such a fixed ref avoids all Smart Pin complexities, and hopefully 'fails last'.

That fast-test-pin could also check IN effects, by wiring to a Smart Pin.

Cluso99 · 2017-11-24 00:31

From my understanding, one of the gotchas with FPGA is that the "OR"ing of the various "OUT"s come from a number of places...

1. each cog (ultimately the OUTx registers by way of various instructions)
2. smart pins
3. streamer

This results in a big OR gate. In the FPGA, only 4 or 6 lines can be "or"ed at a time. So cascading OR gates needs to be done. On top of that, we have the routing lines adding delays.

In the real silicon, the "OR" gate can have as many inputs as required, resulting in a single OR gate with less routing delays.
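
A quick Python sketch of the cascading argument: it counts how many levels of k-input gates are needed to OR together a given number of signals. The source counts used below are hypothetical, chosen only to show the trend, and the model ignores routing delay entirely.

```python
def or_tree_depth(num_inputs, gate_inputs):
    """Levels of gate_inputs-wide OR gates needed to combine num_inputs signals."""
    levels = 0
    remaining = num_inputs
    while remaining > 1:
        # Ceiling division: each gate absorbs up to gate_inputs signals.
        remaining = -(-remaining // gate_inputs)
        levels += 1
    return levels

# Hypothetical example: 16 sources feeding one pin.
print(or_tree_depth(16, 4))   # 2
print(or_tree_depth(16, 6))   # 2
print(or_tree_depth(16, 16))  # 1
```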

Cluso99 · 2017-11-24 01:10

evanh wrote: »

cgracey wrote: »

Looks good, Evanh. The SPI memory outputs new data on the falling edge of the clock, so you'd want to sample just before the clock drops, which all our code does in these snippets.

The SPI slave has to be clocking out on the rising clock edge! Otherwise the first bit is not present for the first shift.

In SPI Mode0 (the preferred SD mode) & Mode3 (Flash supports modes 0 & 3):
Data is output on CLK falling edge (must be valid within 6-8ns after the falling edge)
Data is sampled on CLK rising edge (must be valid tds>2ns before, and thd>2ns after the rising edge)

evanh · 2017-11-24 02:01

Cluso99 wrote: »

In SPI Mode0 (the preferred SD mode) & Mode3 (Flash supports modes 0 & 3):
Data is output on CLK falling edge (must be valid within 6-8ns after the falling edge)
Data is sampled on CLK rising edge (must be valid tds>2ns before, and thd>2ns after the rising edge)

Ah, thanks for the nudge, I've just had a nosy at a Microchip SPI flash datasheet. The detail I hadn't understood is that the last clock (rising) of the read command and address is also the first clock (falling) of the data reply.

The first data bit is always present for reading back before the SPI-In-routine clock even gets raised ... so only 7 data clocks will be needed in all those examples above.
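
A toy Python model of that observation, assuming a mode 0 slave whose first data bit is already on MISO when the read routine starts and which advances to the next bit on each falling clock edge. The class and function names are made up for illustration. It only covers a single byte; a multi-byte read would still need a further clock to bring out the next byte's first bit.

```python
class Mode0Slave:
    """Toy SPI mode 0 slave with one byte already queued for output."""

    def __init__(self, byte):
        self.byte = byte & 0xFF
        self.bit = 7             # MSB is on MISO before any data clock
        self.falling_edges = 0

    @property
    def miso(self):
        return (self.byte >> self.bit) & 1

    def clock_pulse(self):
        """One high-then-low pulse; MISO advances on the falling edge."""
        self.falling_edges += 1
        if self.bit > 0:
            self.bit -= 1


def read_byte(slave):
    value = slave.miso                      # first bit needs no clock
    for _ in range(7):
        slave.clock_pulse()
        value = (value << 1) | slave.miso   # shift in MSB first
    return value


slave = Mode0Slave(0xA5)
print(hex(read_byte(slave)), slave.falling_edges)  # 0xa5 7
```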

evanh · 2017-11-25 05:52

Chip,
I just bumped into a small group of TESTP/TESTPN instructions that are explicitly encoded for the pin I/O bits. I note there is also a general case of these instructions that looks to be 100% compatible. The similarity is such that the {#}S bit selection field from the general cases is also listed in the pin I/O cases but these ones don't actually use S field at all.

I'm thinking the pin I/O versions should just be aliases of the general versions.

cgracey · 2017-11-25 06:35

evanh wrote: »

Chip,
I just bumped into a small group of TESTP/TESTPN instructions that are explicitly encoded for the pin I/O bits. I note there is also a general case of these instructions that looks to be 100% compatible. The similarity is such that the {#}S bit selection field from the general cases is also listed in the pin I/O cases but these ones don't actually use S field at all.

I'm thinking the pin I/O versions should just be aliases of the general versions.

For immediate pin numbers, you could use 'TESTB INx,#pin', but for variable pin numbers, you'd need extra code to resolve INx. That's where TESTP/TESTPN are needed.

evanh · 2017-11-25 07:21

According to the doc, S can be a register direct addressing mode too.

I do now, however, see one difference. That is the bit range is 0-63 for the TESTPx instructions versus 0-31 for the TESTBx instructions.

EDIT: Oh, lol, I've kind of noted the same thing in a different way. You can't just runtime alias the two INx registers. Understood now.

evanh · 2017-11-25 07:30

Ha, I've probably asked the same thing in the past and forgotten the answer... it's a touch of deja vu just looking down the instruction list and seeing this virtually identical group of instructions.

cgracey · 2017-11-25 07:51

The same thing happens to me sometimes.

TonyB_ · 2017-11-25 13:34

A few questions/comments regarding some D-only instructions in Instructions v27:

1. TESTPx are shown with ,{#}S but there is no S. Is this a copy-and-paste error from TESTB?

2. TESTPx and the following DIRx instructions have identical opcodes.

EEEE 1101011 CZL DDDDDDDDD 001000110	TESTP   D,{#}S     XORC/XORZ
EEEE 1101011 CZL DDDDDDDDD 001000110	DIRRND  {#}D           {WCZ}

3. SPLITB to REGEXP descriptions mention S, but D must be S and it would be less confusing to use D only.

cgracey · 2017-11-25 18:32

TonyB_ wrote: »
A few questions/comments regarding some D-only instructions in Instructions v27:

1. TESTPx are shown with ,{#}S but there is no S. Is this a copy-and-paste error from TESTB?

2. TESTPx and the following DIRx instructions have identical opcodes.
EEEE 1101011 CZL DDDDDDDDD 001000110	TESTP   D,{#}S     XORC/XORZ
EEEE 1101011 CZL DDDDDDDDD 001000110	DIRRND  {#}D           {WCZ}
3. SPLITB to REGEXP descriptions mention S, but D must be S and it would be less confusing to use D only.

Thanks! That "D,{#}S" was a mistake. I changed it to "{#}D", like it should have been.

There is a bunch of overlap between the TESTPx and the DIRx/OUTx/FLTx/DRVx encodings. The CZ bits differentiate the two sets. If the CZ bits are %01 or %10, it's TESTPx. If the CZ bits are %00 or %11, it's DIRx/OUTx/FLTx/DRVx. In other words, if one flag is affected, it's TESTPx. If neither or both flags are affected, it's the others.
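
The rule can be sketched in a few lines of Python. The bit positions are an assumption read off the "EEEE 1101011 CZL DDDDDDDDD SSSSSSSSS" layout (C at bit 20, Z at bit 19), and the helper names are illustrative. The classifier assumes the caller has already matched the shared opcode and S field.

```python
def encode(cz, d, s, l=1, eeee=0b1111, opcode=0b1101011):
    """Pack fields in the order EEEE / opcode / CZ / L / D / S (assumed layout)."""
    return (eeee << 28) | (opcode << 21) | (cz << 19) | (l << 18) | (d << 9) | s

def classify(instruction):
    """Exactly one of C/Z set means TESTPx; neither or both means the other set."""
    c = (instruction >> 20) & 1
    z = (instruction >> 19) & 1
    return "TESTPx" if c != z else "DIRx/OUTx/FLTx/DRVx"

S_FIELD = 0b001000110   # shared by TESTP (XOR forms) and DIRRND
for cz in (0b00, 0b01, 0b10, 0b11):
    print(format(cz, "02b"), classify(encode(cz, 5, S_FIELD)))
```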

Cluso99 · 2017-11-25 23:48

V27a problems...

I have V26 working and my program works.

So I retried programming V27a (BeMicro_A9_Prop2_v27.jic 11/11/2017 2:39AM) and it worked.
But when I repower (replug USB programming port) there are no Cog LEDs lit (shouldn't Cog 0 be lit?).
I have an RGB LED connected to P5, P7 & P9 via resistors to GND. These are ON, indicating these pins are High. On v26 the RGB LEDs are OFF.

PNut_v27a.exe 19/11/2017 8:31AM (dd/mm/yyyy format) cannot find the USB port for downloading - ie it cannot find the prop.

Do I have the correct versions? Am I missing something?

ozpropdev · 2017-11-26 00:02

Cluso99 wrote: »

V27a problems...

I have V26 working and my program works.

So I retried programming V27a (BeMicro_A9_Prop2_v27.jic 11/11/2017 2:39AM) and it worked.
But when I repower (replug USB programming port) there are no Cog LEDs lit (shouldn't Cog 0 be lit?).
I have an RGB LED connected to P5, P7 & P9 via resistors to GND. These are ON, indicating these pins are High. On v26 the RGB LEDs are OFF.

PNut_v27a.exe 19/11/2017 8:31AM (dd/mm/yyyy format) cannot find the USB port for downloading - ie it cannot find the prop.

Do I have the correct versions? Am I missing something?

V27 is a flat liner for the Bemicro-CVA9. You need V27z.
https://drive.google.com/file/d/1omGhklqFgAEEoR0jrSNupT_UUxO7Cye8/view?usp=sharing

ozpropdev · 2017-11-26 00:20

cgracey wrote: »
TonyB_ wrote: »
A few questions/comments regarding some D-only instructions in Instructions v27:

1. TESTPx are shown with ,{#}S but there is no S. Is this a copy-and-paste error from TESTB?

2. TESTPx and the following DIRx instructions have identical opcodes.
EEEE 1101011 CZL DDDDDDDDD 001000110	TESTP   D,{#}S     XORC/XORZ
EEEE 1101011 CZL DDDDDDDDD 001000110	DIRRND  {#}D           {WCZ}
3. SPLITB to REGEXP descriptions mention S, but D must be S and it would be less confusing to use D only.
Thanks! That "D,{#}S" was a mistake. I changed it to "{#}D", like it should have been.

There is a bunch of overlap between the TESTPx and the DIRx/OUTx/FLTx/DRVx encodings. The CZ bits differentiate the two sets. If the CZ bits are %01 or %10, it's TESTPx. If the CZ bits are %00 or %11, it's DIRx/OUTx/FLTx/DRVx. In other words, if one flag is affected, it's TESTPx. If neither or both flags are affected, it's the others.

Here's the overlapping opcodes expanded for clarity

EEEE 0100000 00I DDDDDDDDD SSSSSSSSS        BITL    D,S/#
EEEE 0100000 01I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       WZ
EEEE 0100000 10I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       WC
EEEE 0100000 11I DDDDDDDDD SSSSSSSSS        BITL    D,S/#       WCZ

EEEE 0100001 00I DDDDDDDDD SSSSSSSSS        BITH    D,S/#
EEEE 0100001 01I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       WZ
EEEE 0100001 10I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       WC
EEEE 0100001 11I DDDDDDDDD SSSSSSSSS        BITH    D,S/#       WCZ

EEEE 0100010 00I DDDDDDDDD SSSSSSSSS        BITC    D,S/#
EEEE 0100010 01I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       ANDZ
EEEE 0100010 10I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       ANDC
EEEE 0100010 11I DDDDDDDDD SSSSSSSSS        BITC    D,S/#       WCZ

EEEE 0100011 00I DDDDDDDDD SSSSSSSSS        BITNC   D,S/#
EEEE 0100011 01I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       ANDZ
EEEE 0100011 10I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       ANDC
EEEE 0100011 11I DDDDDDDDD SSSSSSSSS        BITNC   D,S/#       WCZ

EEEE 0100100 00I DDDDDDDDD SSSSSSSSS        BITZ    D,S/#
EEEE 0100100 01I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       ORZ
EEEE 0100100 10I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       ORC
EEEE 0100100 11I DDDDDDDDD SSSSSSSSS        BITZ    D,S/#       WCZ

EEEE 0100101 00I DDDDDDDDD SSSSSSSSS        BITNZ   D,S/#
EEEE 0100101 01I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       ORZ
EEEE 0100101 10I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       ORC
EEEE 0100101 11I DDDDDDDDD SSSSSSSSS        BITNZ   D,S/#       WCZ

EEEE 0100110 00I DDDDDDDDD SSSSSSSSS        BITRND  D,S/#
EEEE 0100110 01I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       XORZ
EEEE 0100110 10I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       XORC
EEEE 0100110 11I DDDDDDDDD SSSSSSSSS        BITRND  D,S/#       WCZ

EEEE 0100111 00I DDDDDDDDD SSSSSSSSS        BITNOT  D,S/#
EEEE 0100111 01I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       XORZ
EEEE 0100111 10I DDDDDDDDD SSSSSSSSS        TESTBN  D,S/#       XORC
EEEE 0100111 11I DDDDDDDDD SSSSSSSSS        BITNOT  D,S/#       WCZ




EEEE 1101011 00L DDDDDDDDD 001000000        DIRL    D/#
EEEE 1101011 01L DDDDDDDDD 001000000        TESTP   D/#         WZ
EEEE 1101011 10L DDDDDDDDD 001000000        TESTP   D/#         WC
EEEE 1101011 11L DDDDDDDDD 001000000        DIRL    D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000001        DIRH    D/#
EEEE 1101011 01L DDDDDDDDD 001000001        TESTPN  D/#         WZ
EEEE 1101011 10L DDDDDDDDD 001000001        TESTPN  D/#         WC
EEEE 1101011 11L DDDDDDDDD 001000001        DIRH    D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000010        DIRC    D/#
EEEE 1101011 01L DDDDDDDDD 001000010        TESTP   D/#         ANDZ
EEEE 1101011 10L DDDDDDDDD 001000010        TESTP   D/#         ANDC
EEEE 1101011 11L DDDDDDDDD 001000010        DIRC    D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000011        DIRNC   D/#
EEEE 1101011 01L DDDDDDDDD 001000011        TESTPN  D/#         ANDZ
EEEE 1101011 10L DDDDDDDDD 001000011        TESTPN  D/#         ANDC
EEEE 1101011 11L DDDDDDDDD 001000011        DIRNC   D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000100        DIRZ    D/#
EEEE 1101011 01L DDDDDDDDD 001000100        TESTP   D/#         ORZ
EEEE 1101011 10L DDDDDDDDD 001000100        TESTP   D/#         ORC
EEEE 1101011 11L DDDDDDDDD 001000100        DIRZ    D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000101        DIRNZ   D/#
EEEE 1101011 01L DDDDDDDDD 001000101        TESTPN  D/#         ORZ
EEEE 1101011 10L DDDDDDDDD 001000101        TESTPN  D/#         ORC
EEEE 1101011 11L DDDDDDDDD 001000101        DIRNZ   D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000110        DIRRND  D/#
EEEE 1101011 01L DDDDDDDDD 001000110        TESTP   D/#         XORZ
EEEE 1101011 10L DDDDDDDDD 001000110        TESTP   D/#         XORC
EEEE 1101011 11L DDDDDDDDD 001000110        DIRRND  D/#         WCZ

EEEE 1101011 00L DDDDDDDDD 001000111        DIRNOT  D/#
EEEE 1101011 01L DDDDDDDDD 001000111        TESTPN  D/#         XORZ
EEEE 1101011 10L DDDDDDDDD 001000111        TESTPN  D/#         XORC
EEEE 1101011 11L DDDDDDDDD 001000111        DIRNOT  D/#         WCZ
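
The DIRx/TESTPx listing above is regular enough to be decoded mechanically. The Python sketch below does so under the assumption that the pattern seen in the listing holds: the low three S bits select the DIRx variant, bit 0 of that selector picks TESTP versus TESTPN, and the upper two selector bits pick W/AND/OR/XOR. It covers only the S = 001000ooo group shown here, and the names are illustrative.

```python
DIR_NAMES = ["DIRL", "DIRH", "DIRC", "DIRNC", "DIRZ", "DIRNZ", "DIRRND", "DIRNOT"]
FLAG_OPS = ["W", "AND", "OR", "XOR"]

def decode(cz, s_field):
    """Mnemonic for opcode 1101011 with S = 001000ooo, per the listing."""
    if s_field >> 3 != 0b001000:
        raise ValueError("S field is outside the DIRx/TESTPx group")
    sel = s_field & 0b111
    if cz in (0b00, 0b11):
        # Neither or both flags written: the DIRx form.
        return DIR_NAMES[sel] + (" WCZ" if cz == 0b11 else "")
    # Exactly one flag written: the TESTP/TESTPN form.
    name = "TESTPN" if sel & 1 else "TESTP"
    flag = "C" if cz == 0b10 else "Z"
    return f"{name} {FLAG_OPS[sel >> 1]}{flag}"

print(decode(0b01, 0b001000000))  # TESTP WZ
print(decode(0b10, 0b001000011))  # TESTPN ANDC
print(decode(0b11, 0b001000111))  # DIRNOT WCZ
```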

Cluso99 · 2017-11-26 21:56

ozpropdev wrote: »

Cluso99 wrote: »

V27a problems...

I have V26 working and my program works.

So I retried programming V27a (BeMicro_A9_Prop2_v27.jic 11/11/2017 2:39AM) and it worked.
But when I repower (replug USB programming port) there are no Cog LEDs lit (shouldn't Cog 0 be lit?).
I have an RGB LED connected to P5, P7 & P9 via resistors to GND. These are ON, indicating these pins are High. On v26 the RGB LEDs are OFF.

PNut_v27a.exe 19/11/2017 8:31AM (dd/mm/yyyy format) cannot find the USB port for downloading - ie it cannot find the prop.

Do I have the correct versions? Am I missing something?

V27 is a flat liner for the Bemicro-CVA9. You need V27z.
https://drive.google.com/file/d/1omGhklqFgAEEoR0jrSNupT_UUxO7Cye8/view?usp=sharing

Thanks Brian. v27z jic is working. Are the SD pins mapped on this version, and if so, are they the P3x or P6x sets and is SW1 used?

I also found v27zz which has fixed SD mapping but I missed where the SD is mapped.

TonyB_ · 2017-11-27 22:44

ozpropdev wrote: »
cgracey wrote: »
TonyB_ wrote: »
A few questions/comments regarding some D-only instructions in Instructions v27:

1. TESTPx are shown with ,{#}S but there is no S. Is this a copy-and-paste error from TESTB?

2. TESTPx and the following DIRx instructions have identical opcodes.
EEEE 1101011 CZL DDDDDDDDD 001000110	TESTP   D,{#}S     XORC/XORZ
EEEE 1101011 CZL DDDDDDDDD 001000110	DIRRND  {#}D           {WCZ}
3. SPLITB to REGEXP descriptions mention S, but D must be S and it would be less confusing to use D only.
Thanks! That "D,{#}S" was a mistake. I changed it to "{#}D", like it should have been.

There is a bunch of overlap between the TESTPx and the DIRx/OUTx/FLTx/DRVx encodings. The CZ bits differentiate the two sets. If the CZ bits are %01 or %10, it's TESTPx. If the CZ bits are %00 or %11, it's DIRx/OUTx/FLTx/DRVx. In other words, if one flag is affected, it's TESTPx. If neither or both flags are affected, it's the others.
Here's the overlapping opcodes expanded for clarity
EEEE 0100000 00I DDDDDDDDD SSSSSSSSS        BITL    D,S/#
EEEE 0100000 01I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       WZ
EEEE 0100000 10I DDDDDDDDD SSSSSSSSS        TESTB   D,S/#       WC
EEEE 0100000 11I DDDDDDDDD SSSSSSSSS        BITL    D,S/#       WCZ
<snip>

Thanks Chip & ozpropdev.

Cluso99 · 2017-11-29 03:25

Peter J,
I am not finding any lockups on v27z. I've run some code outputting to serial for >1hr so far.
Haven't yet figured out if SD is on ~P38 or ~P60.

Update: now been running for more than 6 hours, no lockups.

Rayman · 2017-12-01 15:27

Anybody remember when and why multi-tasking was removed?
Last mention of "settask" instruction seems to be circa 2014...

I guess I didn't pay a whole lot of attention back then because there were 16 cores...

evanh · 2017-12-01 16:21

The feature disappeared along with the Prop2-Hot.

When HubExec was devised, coherent integration with the time-sliced threads was troublesome. HubExec was clearly the favourite of the two and, as you've noted, we suddenly had 16 true cores to play with so no-one felt losing the threads was a significant loss.
