The reasoning behind requesting a single-bit CRC instruction is the possibility of accumulating the CRC as each bit of the byte/block is transmitted/received.
Often there is insufficient time at the end of a block to perform the CRC calculation for the last/all bytes and get a reply out (ACK or NAK) in the required time. This is the case with USB and P1.
When I asked for this instruction 4+ years ago, I had spent a lot of time understanding the USB protocol. I previously spent many years designing hardware and writing synchronous communications code in the '80s and '90s.
Please, just leave it as a single-bit CRC instruction. A lookup table can be used if you want byte-wise CRC calculation.
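For anyone who hasn't seen the table approach: a minimal C sketch of a byte-wise, table-driven CRC16 (the CCITT polynomial 0x1021 is assumed here for illustration; the table-build loop is just the single-bit step done 256 times up front):
    #include <stdint.h>
    static uint16_t crc_table[256];
    /* Build the table once: each entry is the CRC of one byte value,
       computed with the ordinary single-bit step (MSB-first, poly 0x1021). */
    static void crc16_init(void)
    {
        for (int b = 0; b < 256; b++) {
            uint16_t crc = (uint16_t)(b << 8);
            for (int i = 0; i < 8; i++)
                crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                     : (uint16_t)(crc << 1);
            crc_table[b] = crc;
        }
    }
    /* Then each data byte costs one table lookup instead of eight bit steps. */
    static uint16_t crc16_update(uint16_t crc, uint8_t byte)
    {
        return (uint16_t)((crc << 8) ^ crc_table[((crc >> 8) ^ byte) & 0xFF]);
    }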
But it's the Smartpin logic that receives the single bits, so do you even have access to them?
At the end of a received packet there are 2 CRC bytes, which are not part of the CRC calculation for the packet data, so you should have enough time to calculate the last byte of the data while receiving the two CRC bytes. Then it's just a compare to decide if the CRC is correct and whether you have to send an ACK or a NAK.
On transmit you can calculate the CRC one byte ahead, while sending. So you have the CRC value ready when you reach the end of the data.
The case on P2 is totally different from P1, where you have to do every bit with bitbanging at the right time.
Andy
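In C-like terms, Andy's scheme looks something like the sketch below. get_byte() and crc_update() are hypothetical stand-ins for whatever the Smartpin/driver layer actually provides; the point is that byte n is folded into the CRC during the byte time of byte n+1:
    #include <stdint.h>
    extern uint8_t  get_byte(void);                  /* hypothetical: blocks one byte time */
    extern uint16_t crc_update(uint16_t, uint8_t);   /* table or bitwise, as above */
    uint16_t receive_packet(uint8_t *buf, int len)   /* len = data bytes, excl. the 2 CRC bytes */
    {
        uint16_t crc = 0xFFFF;                       /* init value is protocol-dependent */
        uint8_t prev = buf[0] = get_byte();
        for (int i = 1; i < len; i++) {
            buf[i] = get_byte();                     /* wait for byte i...            */
            crc = crc_update(crc, prev);             /* ...and CRC byte i-1 meanwhile */
            prev = buf[i];
        }
        /* The last data byte folds in while the two CRC bytes are still
           arriving, leaving only a compare before the ACK/NAK deadline. */
        crc = crc_update(crc, prev);
        return crc;
    }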
I don't know if it matters for the bit/byte crc discussion, but there are 12-bit CRCs out there, and the one I worked with couldn't be implemented with a table.
Interesting. That would need some hardware, then.
I'm still thinking about how to approach this.
Doing 8 bits at once might take too many gates and too much time, but I'm thinking that 4 bits at once might be a good balance. That would accommodate a 12-bit CRC gracefully.
There could be two 2-clock CRC instructions:
CRCBIT crc,poly 'use C as the input
CRCNIB crc,poly 'use Q[31:28] as the input, Q shifts left by 4
CRCNIB is shielded from interrupts, as is SETQ. Here is a 12-bit CRC operation:
SHL data,#32-12
SETQ data
CRCNIB crc,poly
CRCNIB crc,poly
CRCNIB crc,poly
...and to make a 5-bit CRC operation...
SHL data,#32-4 WC
CRCBIT crc,poly
SETQ data
CRCNIB crc,poly
...and an 8-bit CRC operation in 8 clocks...
SHL data,#32-8
SETQ data
CRCNIB crc,poly
CRCNIB crc,poly
How about that? It might seem a little piecemeal, but it adds no flipflops.
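To make the mechanics concrete, here is one plausible C model of the two proposed instructions. The CRC shift direction and the order in which CRCNIB consumes Q[31:28] are assumptions on my part (the thread doesn't pin them down); the point is how SETQ stages the data and how a nibble step composes from four bit steps:
    #include <stdint.h>
    /* CRCBIT crc,poly with the data bit in C, modeled here as a
       right-shifting (reflected) step: */
    static uint32_t crcbit(uint32_t crc, uint32_t poly, uint32_t c)
    {
        uint32_t fb = (crc ^ c) & 1;      /* feedback = C XOR crc LSB */
        crc >>= 1;
        return fb ? (crc ^ poly) : crc;
    }
    /* CRCNIB crc,poly: consume Q[31:28] (assumed Q[31] first), then Q <<= 4. */
    static uint32_t crcnib(uint32_t crc, uint32_t poly, uint32_t *q)
    {
        for (int i = 0; i < 4; i++) {
            crc = crcbit(crc, poly, *q >> 31);
            *q <<= 1;
        }
        return crc;
    }
    /* The 12-bit sequence above, in this model: */
    static uint32_t crc12(uint32_t crc, uint32_t poly, uint32_t data)
    {
        uint32_t q = data << (32 - 12);   /* SHL data,#32-12 + SETQ data */
        crc = crcnib(crc, poly, &q);      /* three CRCNIBs = 12 bits     */
        crc = crcnib(crc, poly, &q);
        crc = crcnib(crc, poly, &q);
        return crc;
    }
The 5-bit variant works the same way: the SHL ... WC parks the odd leading bit in C for one CRCBIT, and the remaining nibble goes through CRCNIB.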
Could overflow of the 8-deep stack be made to trigger an interrupt?
What are people expecting to use the hardware stack for? I don't think it will be used by either C or Spin, since it isn't big enough and doesn't provide the flexibility to set up stack frames. It will certainly be good for temporaries, but then I would think a depth of 8 would be sufficient. Also, any code that uses it would have to disable interrupts if the interrupt service routines also use the HW stack.
I do use call and ret though, which use the stack. I think I've gotten to about 5 deep in the stack. Maybe I can't get to 8, but it is something that I feel I have to keep in the back of my mind...
I'm still a bit fuzzy on calculating CRC5 and CRC16 in byte chunks, since there can be an odd number of bytes (and bits, in the case of token and start-of-frame packets) of data to calculate. There will be a need for additional "house-keeping" code to ensure that the accumulated CRC value stays within the domain of the polynomial, right? Sorry if this is a stupid question, but my math sucks
For a time-wise comparison using lookup tables: CRC16 takes 13 clocks/byte, and CRC5 takes ~32 clocks for the 11-bit token and start-of-frame data.
Also, contrary to my earlier statement, USB CRC calcs are done LSb->MSb
The CRCBIT instruction will handle the various CRC versions. And the LSb-first order wouldn't be a problem: just use a REV instead of a SHL.
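On the odd-bit-count worry: a bit-at-a-time CRC needs no housekeeping at all; you simply run one bit step per bit, eleven times for a token. A minimal C sketch of USB-style CRC5, assuming the usual published USB parameters (poly x^5+x^2+1, i.e. 0x14 in reflected form, init 0x1F, data LSb first, result complemented):
    #include <stdint.h>
    /* CRC5 over the 11 bits of a USB token (address + endpoint), LSb first. */
    static uint8_t usb_crc5(uint16_t token11)
    {
        uint8_t crc = 0x1F;                       /* init: all ones            */
        for (int i = 0; i < 11; i++) {            /* 11 bits, no padding games */
            uint8_t fb = (uint8_t)((crc ^ (token11 >> i)) & 1);
            crc >>= 1;
            if (fb)
                crc ^= 0x14;                      /* reflected x^5+x^2+1       */
        }
        return (uint8_t)((crc ^ 0x1F) & 0x1F);    /* USB sends the complement  */
    }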
Remember that there are two mainstream CRC16s in common use...
The original IBM version and the CCITT version. Then Microcom implemented the MNP protocol, but because they misunderstood how the CRC16 worked, they implemented it incorrectly - a bug! But it became yet another standard, although it used one of the two CRC16 polynomials.
The variations are mainly to do with the initial value, MSB or LSB first, whether the result is inverted, and which byte (for a 16-bit CRC) comes first. All these variations are covered by selecting the polynomial and the initial value, reversing the result, and inverting the result. So you can calculate all these CRCs by the one method.
Can these CRC instructions accommodate these differences?
I think that, at the heart, they are all the same. You may have to pre-reverse your data if it is LSB first.
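As an illustration of that point, here is a hedged C sketch of the usual parameterized model: one bitwise routine, with the variant differences reduced to parameters. The example values in the trailing comment are the commonly published ones and worth double-checking against the relevant spec:
    #include <stdint.h>
    static uint16_t reflect16(uint16_t v)
    {
        uint16_t r = 0;
        for (int i = 0; i < 16; i++)
            r = (uint16_t)((r << 1) | ((v >> i) & 1));
        return r;
    }
    /* Generic bit-at-a-time CRC16: polynomial, init, reflect-in/out, final XOR. */
    static uint16_t crc16(const uint8_t *p, int n, uint16_t poly, uint16_t init,
                          int refin, int refout, uint16_t xorout)
    {
        uint16_t crc = init;
        while (n--) {
            uint8_t b = *p++;
            for (int i = 0; i < 8; i++) {
                /* take bits LSb first when reflected, MSb first otherwise */
                uint16_t bit = refin ? (uint16_t)((b >> i) & 1)
                                     : (uint16_t)((b >> (7 - i)) & 1);
                uint16_t fb  = (uint16_t)(((crc >> 15) ^ bit) & 1);
                crc = (uint16_t)(crc << 1);
                if (fb) crc ^= poly;
            }
        }
        if (refout) crc = reflect16(crc);
        return (uint16_t)(crc ^ xorout);
    }
    /* e.g. CRC-16/ARC (IBM):  crc16(p,n, 0x8005, 0x0000, 1,1, 0x0000)
            CRC-16/XMODEM:     crc16(p,n, 0x1021, 0x0000, 0,0, 0x0000) */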
On the Prop1, if you have a way to get each bit into a flag already, you can compute a CRC with only 2 additional instructions per bit. For example, if you have the bit in question stored in Z you can do
shr crc, #1 wc
IF_C_NE_Z xor crc, crc_poly
Jonathan
That's a nice way to do it. Almost makes a CRC instruction look silly.
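For reference, here is the C equivalent of those two P1 instructions, with the data bit d in {0,1}. Note that crc_poly must already be the bit-reversed (reflected) polynomial, since the register shifts right:
    #include <stdint.h>
    /* shr crc,#1 wc            -> C gets the old LSB of crc
       if_c_ne_z xor crc,poly   -> XOR in the poly when C differs from the data bit */
    static uint32_t crc_step(uint32_t crc, uint32_t crc_poly, uint32_t d)
    {
        uint32_t c = crc & 1;
        crc >>= 1;
        if (c != d)
            crc ^= crc_poly;
        return crc;
    }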
It's only silly if you have the time for the extra instructions. That is my point.
However, it's the same deal with those other TJx and DJx instructions.
I have a similar NRZI instruction request but I haven't had time to re-verify my request of years ago.
The combination of the NRZI and CRC is essential for bit-banging FS USB or other similar protocols.
While we do have USB with SmartPins, it's possible there are other protocols where the specific SmartPins functions won't work. For the little work/silicon involved, it seems prudent to have the flexible options these two instructions would provide. I would not have asked if these two weren't imperative for the bit-bang approach. This could also cover any shortcomings/bugs in the SmartPins, should we be unlucky enough to find some.
I got the CRC worked out. We have CRCBIT (uses C) and CRCNIB (uses Q[31:28], shifts Q left by 4 bits, shields interrupts). It generates correct results! You can do bits or nibbles at a time. Nibble operations can be stacked, handling four bits every two clocks.
While testing the new SD driver together with improved Ethernet drivers (W5500 block speeds 6x faster = 1 MB/s), I suddenly ran into a problem with my PNut-compiled kernel itself. It started acting strange: it wouldn't load Forth code, and it would keep switching back to binary input mode even though there wasn't anything there telling it to. The exact same version had worked well earlier in the day, so I tried to track down the bug, which made no sense until I remembered the extra nop before the coginits that I had used before but had now disabled. As soon as I enabled the nop the problem went away, and as soon as I disable it the problem reappears.
But the exact same version on the exact same V29 has been working. It really seems to be some marginal timing problem, perhaps only with the A9s, but I have had funny problems for no real reason on previous FPGA versions, as you know. I will try to pin down exactly what is happening while it is still playing up, but I think it has something to do with the cogid I do with every coginit that runs from hubexec, so I will look there.
EDIT: I moved the nop to just before the cogid in hubexec and the problem went away. Here are the sections of code in question:
dat
orgh 0
org
clkset #$FF 'switch to 80MHz (if pll, else 50MHz)
reboot
' nop ' seems to need delay after clkset (otherwise next coginit ids incorrectly)
coginit #7,#@RESET
coginit #6,#@RESET
coginit #5,#@RESET
coginit #4,#@RESET
coginit #3,#@RESET
coginit #2,#@RESET 'vgarun (when DACs are made available)
coginit #1,#@rxcog
coginit #0,#@RESET
org 0
RESET call #INITCOG ' run non-time critical init from hubexec
jmp #doNEXT
'***************************************** HUB CODE ***************************
'
dat
orgh
version long vernum,vertime
vername byte "V28 BOOT"
INITCOG
loc PTRA,#@IDLE
nop ' !!! added the nop here instead and the bug is gone
cogid X 'only cog 0 uses the serial port by default
tjnz X,#INITSTKS
I suspect that this bug has something to do with the hub FIFO interface underflowing in some cases.
I will make a new version which adds a few levels to the FIFO buffer and I will increase the 'full' level. We'll see if this fixes the problem.
I had made a FIFO simulator for the 16-cog version that revealed how many levels were needed. Perhaps my reduction for 8 cogs was too simplistic, or my simulation model was incomplete.
I am interested to know what happens from the prior branch (which resets the FIFO) to the failure point.
Are there any Prop123-A9 boards left for sale? (And if yes: how do I get one to Germany?) Would be great!
.... until I remembered the extra nop before the coginits that I had used before but was now disabled. As soon as I enabled the nop the problem went away, and as soon as I disable the problem reappears.
... EDIT: I moved the nop to just before the cogid in hubexec and the problem went away. Here are the sections of code in question:
Checking what you are saying here - there are (at least?) two locations where a NOP can fix the problem, and either one works?
Both seem to prefix COGxx opcodes? One comes right after CLKSET.
If the issue were only CLKSET-related, should the 2nd NOP have any effect?
What is the exact relative timing of those two lines, as in how many sysclks separate them?
We have some in stock. They are part #60065. They are $475.00.
If you email Chantal at Parallax, she can get the order going: cwoods@parallax.com.
Welcome aboard!!!
I have some further information about that startup bug I'm seeing. Pretty much adding a nop almost anywhere in the startup path fixes the problem. Now if I add three nops instead of one the problem comes back.
INITCOG
nop
nop
nop
loc PTRA,#@IDLE ' default startup into Instruction Pointer
cogid X
tjnz X,#INITSTKS
My console prompt includes a radix symbol which should be # for decimal but this changes to something else as a symptom of the problem.
TAQOZ#
may instead startup as:
TAQOZ%
for instance, but that is only a symptom.
In your post above, does the COGID return the wrong value?
INITCOG
cogid X
nop
wrbyte X,#$0F0
loc PTRA,#@IDLE ' default startup into Instruction Pointer
cogid X
tjnz X,#INITSTKS
Then I examine location $F0, which should be 0 for the console on cog 0, but it's 2. So I comment out the nop and I get a value of 3. This sure is interesting...
edit: I think maybe that test is not that reliable, since it relies upon the streamer and whatever sequence it is in for that cog, so whatever comes last wins; I will try something a little different.
Whoa!!!
Maybe COGID is being stalled by a hubexec fetch, winding up with a stale ID from the hub. I think this is it.
Let me recompile...
Maybe I made a mistake in writing the value to the same location in hub, since it relied upon the streamer. So I wrote to the cog's RAM instead with this code, which produces the radix prompt symptom but indicates that it is cog #0. However, I have a feeling that another cog also thinks it is supposed to be the console cog, so I will check further.
INITCOG
nop
cogid X
mov clkdly,X
or clkdly,#$80
loc PTRA,#@IDLE ' default startup into Instruction Pointer
tjnz X,#INITSTKS
Even though the prompt shows $ instead of #, it seems it correctly registered as cog #0 (OR'd with the $80 check pattern) at Tachyon internal register location #28 (clkdly).
TAQOZ$ #28 COG@ .BYTE 80 ok
The radix or base is stored in "user registers" for each cog in hub RAM, but each user register area should be unique. The base for cog #0 is at $800.
It reports correctly for all cogs except cog #1, which is a serial cog. But this sure looks funny, because they should all remain zeroed. So leave it to me as I will check further; cogid seems to be fine.