Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

TonyB_ · 2017-12-14 20:57

cgracey wrote: »

TonyB_ wrote: »

As the stack is now 32-bit would it be better for C and Z to be bits 31 and 30 for CALLs and POPs?

The CALLD instructions use the same bit locations to store the flags, too. If they all changed, it could pave the way for future program counter expansion.

Is that a yes, then?

cgracey · 2017-12-14 20:58

TonyB_ wrote: »

cgracey wrote: »

TonyB_ wrote: »

As the stack is now 32-bit would it be better for C and Z to be bits 31 and 30 for CALLs and POPs?

The CALLD instructions use the same bit locations to store the flags, too. If they all changed, it could pave the way for future program counter expansion.

Is that a yes, then?

Well, I don't know. You're talking maybe 5 minutes of work here.

David Betz · 2017-12-14 21:44

cgracey wrote: »

David Betz wrote: »

Since so many changes are being made again I'm wondering if you might add a TLB so we can do virtual memory.

These changes are just minor refinements, not big things that would require deep re-thinking.

Yeah, I knew that would be the answer but I'm surprised that things are still in flux. Wasn't it a few months ago when we were told that the design was frozen except for the ROM?

Seairth · 2017-12-14 21:49

cgracey wrote: »

Seairth wrote: »

Now that the stack has been widened to 32 bits, I expect more data to be put in it. Is 8 levels enough? I know it's not meant to be a general-purpose stack, but it does have the advantage of being a "standard location" for passing parameters and return values between third-party code.

The problem is that the stack is currently built from flipflops. To get any real size increase, we would have to go to a RAM. That might be a big change with OnSemi. I think we would have to buffer the top of the stack with 32 flipflops.

Ahh. I have to say I miss the P2-hot LUT stack pointers.

cgracey · 2017-12-14 22:19

David Betz wrote: »

...Wasn't it a few months ago when we were told that the design was frozen except for the ROM?

That sounds right.

Yanomani · 2017-12-14 22:25

Hi Chip

If I understood it correctly, having 32 flip-flops to buffer the top level of the stack means that you've gained at least two clock cycles to access the stack RAM and store/retrieve each pushed/popped item, while still having direct access to the buffered top level.

This also means that the stack RAM size could increase at will, limited only by physical space availability.

Based on the assumption that stack overflow/underflow is mostly a software concern, it means you could maintain two independent binary pointers that are only realigned (zeroed) at each cog start.

Also, because it's not a dual ported Fifo, meant for simultaneous read and write access, all the complexity of Hamming coded empty/full counters/detectors and synced read/write clocks could be avoided.

Hope I've figured it right!

P.S. Sorry, I didn't figure it right.

Stacks are LIFOs by nature, so a single read/write pointer suffices for accessing them.

If it is meant to be directly accessible only by call/push and ret/pop, its addressing pointer doesn't need to be initialized to any particular value at all.

Sometimes, drinking a lot of coffee has the same effects as drinking a lot of beer or wine. My fault.

Henrique

cgracey wrote: »

Seairth wrote: »

Now that the stack has been widened to 32 bits, I expect more data to be put in it. Is 8 levels enough? I know it's not meant to be a general-purpose stack, but it does have the advantage of being a "standard location" for passing parameters and return values between third-party code.

The problem is that the stack is currently built from flipflops. To get any real size increase, we would have to go to a RAM. That might be a big change with OnSemi. I think we would have to buffer the top of the stack with 32 flipflops.
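
Henrique's corrected point (one pointer is enough for a LIFO, and its starting value does not matter) can be checked with a small software model. This is only a sketch: the class name `HwStack`, the 8-entry depth and the wrap-around behaviour are assumptions made for illustration, not a description of the actual P2 logic.

```python
# Hypothetical model of a small RAM-backed LIFO with a single pointer.
# Names and wrap-around behaviour are illustrative assumptions only.
class HwStack:
    def __init__(self, depth=8, start_ptr=0):
        self.depth = depth
        self.ram = [0] * depth
        self.ptr = start_ptr % depth            # any starting value works

    def push(self, value):
        self.ram[self.ptr] = value & 0xFFFFFFFF  # 32-bit entries
        self.ptr = (self.ptr + 1) % self.depth   # wraps, no full/empty logic

    def pop(self):
        self.ptr = (self.ptr - 1) % self.depth
        return self.ram[self.ptr]


def round_trip(start_ptr):
    """Push three values and pop them again from an arbitrary start pointer."""
    s = HwStack(start_ptr=start_ptr)
    for v in (0x111, 0x222, 0x333):
        s.push(v)
    return [s.pop() for _ in range(3)]
```

As long as pushes and pops are balanced, `round_trip` returns the values in reverse order whatever `start_ptr` is. In this model, pushing more than `depth` items silently overwrites the oldest entries, which is the overflow concern raised for an 8-level stack.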

Cluso99 · 2017-12-14 22:39

cgracey wrote: »

Cluso99 wrote: »

cgracey wrote: »

Thanks, Cluso. I will try to get my head around that tomorrow.

We need to be able to put whole bytes into it, right?

That wasn't my intention. My idea was a single bit at a time, and any width CRC with any formula.

A REP loop could perform a shift instruction followed by the CRCBIT instruction.

My thought was to make the instruction as generic as possible to allow for all the various CRC incarnations.

But how wide must the rotator be? 16 bits with options for 8 and 5 bits?

I was hoping it might be 32 bits, but 16 would be fine. There is no need to have width options as the unused upper bits are just "AND"ed out by user code after the whole byte is done.

cgracey · 2017-12-14 22:41

Cluso99 wrote: »

cgracey wrote: »

Cluso99 wrote: »

cgracey wrote: »

Thanks, Cluso. I will try to get my head around that tomorrow.

We need to be able to put whole bytes into it, right?

That wasn't my intention. My idea was a single bit at a time, and any width CRC with any formula.

A REP loop could perform a shift instruction followed by the CRCBIT instruction.

My thought was to make the instruction as generic as possible to allow for all the various CRC incarnations.

But how wide must the rotator be? 16 bits with options for 8 and 5 bits?

I was hoping it might be 32 bits, but 16 would be fine. There is no need to have width options as the unused upper bits are just "AND"ed out by user code after the whole byte is done.

But this thing would need a 32-bit rotator, right?

TonyB_ · 2017-12-14 23:45

There's a spare D,{#}S {WC/WZ/WCZ} instruction slot after ANYB. Could S be the (up to) 32-bit CRC polynomial?

cgracey · 2017-12-14 23:48

Can anyone think of any gotcha's involving putting the C and Z flags into bits 31 and 30, instead of into bits 21 and 20 for return-address storage in stack and registers?

I've got all the Verilog lines bookmarked that would need modification to implement this change, but before I do it, I'm wondering if this will compromise anything.

I can think of this: Having bits 31..22 free, as they are now, means that programmers could freely use the upper byte of the long for some purpose associated with the return address. By moving C and Z to bits 31 and 30, we clobber this option.
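
To make the two layouts concrete, here is a minimal sketch of how a return long could be packed and unpacked in each case. The helper names (`pack_return`, `unpack_return`, `max_addr_bits`) are hypothetical, and placing C in the higher bit of each pair simply follows the order used in the post; neither is taken from the Verilog.

```python
ADDR_BITS = 20  # bits 19..0 hold the return address in both layouts discussed


def pack_return(addr, c, z, c_bit, z_bit):
    """Build the 32-bit long: return address plus the C and Z flags."""
    assert 0 <= addr < (1 << ADDR_BITS)
    return addr | (c << c_bit) | (z << z_bit)


def unpack_return(value, c_bit, z_bit):
    """Split a stored long back into (address, C, Z)."""
    addr = value & ((1 << ADDR_BITS) - 1)
    return addr, (value >> c_bit) & 1, (value >> z_bit) & 1


def max_addr_bits(c_bit, z_bit):
    """Widest contiguous address field that fits below the flag bits."""
    return min(c_bit, z_bit)


OLD = dict(c_bit=21, z_bit=20)  # current layout: bits 31..22 are free
NEW = dict(c_bit=31, z_bit=30)  # proposed layout: bits 29..20 are free
```

Both layouts leave ten spare bits. The difference is where they sit: `max_addr_bits(**OLD)` is 20, while `max_addr_bits(**NEW)` is 30, which is the room for future program counter expansion mentioned earlier in the thread.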

cgracey · 2017-12-14 23:50

TonyB_ wrote: »

There's a spare D,{#}S {WC/WZ/WCZ} instruction slot after ANYB. Could S be the (up to) 32-bit CRC polynomial?

That would be where I'd put it, yes. But, if this circuit involves a 32-bit rotator, forget it. That's a LOT of logic that we don't have room for.

Seairth · 2017-12-14 23:53

cgracey wrote: »

Can anyone think of any gotcha's involving putting the C and Z flags into bits 31 and 30, instead of into bits 21 and 20 for return-address storage in stack and registers?

I've got all the Verilog lines bookmarked that would need modification to implement this change, but before I do it, I'm wondering if this will compromise anything.

I can think of this: Having bits 31..22 free, as they are now, means that programmers could freely use the upper byte of the long for some purpose associated with the return address. By moving C and Z to bits 31 and 30, we clobber this option.

Why not just widen the stack to 34 bits?

cgracey · 2017-12-14 23:54

Seairth wrote: »

cgracey wrote: »

Can anyone think of any gotcha's involving putting the C and Z flags into bits 31 and 30, instead of into bits 21 and 20 for return-address storage in stack and registers?

I've got all the Verilog lines bookmarked that would need modification to implement this change, but before I do it, I'm wondering if this will compromise anything.

I can think of this: Having bits 31..22 free, as they are now, means that programmers could freely use the upper byte of the long for some purpose associated with the return address. By moving C and Z to bits 31 and 30, we clobber this option.

Why not just widen the stack to 34 bits?

Because then it would not be possible to manipulate the C and Z storage.

TonyB_ · 2017-12-14 23:55

cgracey wrote: »

TonyB_ wrote: »

There's a spare D,{#}S {WC/WZ/WCZ} instruction slot after ANYB. Could S be the (up to) 32-bit CRC polynomial?

That would be where I'd put it, yes. But, if this circuit involves a 32-bit rotator, forget it. That's a LOT of logic that we don't have room for.

What does the CRC code look like now in PASM2?

cgracey · 2017-12-14 23:59

TonyB_ wrote: »

cgracey wrote: »

TonyB_ wrote: »

There's a spare D,{#}S {WC/WZ/WCZ} instruction slot after ANYB. Could S be the (up to) 32-bit CRC polynomial?

That would be where I'd put it, yes. But, if this circuit involves a 32-bit rotator, forget it. That's a LOT of logic that we don't have room for.

What does the CRC code look like now in PASM2?

So far, it looks like this:

Peter Jakacki · 2017-12-15 00:01

I have some old products that have been selling well for years and I know that if I just tweak them just a little bit that they could be more useful. Instead I leave them exactly as they are, they have been proven and if I tweak them they may fall apart somehow even after much verification. Instead I put all these features into new products.

David Betz is rightly alarmed. Even though these are "minor" tweaks, I can definitely see them snowballing yet again, and they all need verification and time to test; otherwise there is the danger that P2 might fall apart somehow.

You may have noticed during the whole long P2 saga that I have practically never put my hand up to request features. This is my mindset: If my wife was baking a pie (I wish she did) I would leave her well enough alone so I could have some of that pie. There are other days for other pies.

Let's be practical and enjoy some pie. Hopefully there will be other days for other pies.

evanh · 2017-12-15 00:04

cgracey wrote: »

TonyB_ wrote: »

There's a spare D,{#}S {WC/WZ/WCZ} instruction slot after ANYB. Could S be the (up to) 32-bit CRC polynomial?

That would be where I'd put it, yes. But, if this circuit involves a 32-bit rotator, forget it. That's a LOT of logic that we don't have room for.

If it's only a rotate by 1 for each iteration then that's not huge. However, I haven't tried to do my own CRC so don't know if this is the case or not.

TonyB_ · 2017-12-15 00:05

cgracey wrote: »

Seairth wrote: »

cgracey wrote: »

Can anyone think of any gotcha's involving putting the C and Z flags into bits 31 and 30, instead of into bits 21 and 20 for return-address storage in stack and registers?

I've got all the Verilog lines bookmarked that would need modification to implement this change, but before I do it, I'm wondering if this will compromise anything.

I can think of this: Having bits 31..22 free, as they are now, means that programmers could freely use the upper byte of the long for some purpose associated with the return address. By moving C and Z to bits 31 and 30, we clobber this option.

Why not just widen the stack to 34 bits?

Because then it would not be possible to manipulate the C and Z storage.

Would C and Z at bits 1 and 0 be mad?

cgracey · 2017-12-15 00:06

TonyB_ wrote: »

cgracey wrote: »

Seairth wrote: »

cgracey wrote: »

Can anyone think of any gotcha's involving putting the C and Z flags into bits 31 and 30, instead of into bits 21 and 20 for return-address storage in stack and registers?

I've got all the Verilog lines bookmarked that would need modification to implement this change, but before I do it, I'm wondering if this will compromise anything.

I can think of this: Having bits 31..22 free, as they are now, means that programmers could freely use the upper byte of the long for some purpose associated with the return address. By moving C and Z to bits 31 and 30, we clobber this option.

Why not just widen the stack to 34 bits?

Because then it would not be possible to manipulate the C and Z storage.

Would C and Z at bits 1 and 0 be mad?

It used to be that way when long addresses' two LSBs were don't-care. At this point, it would be a mess, yes.

cgracey · 2017-12-15 00:08

Peter Jakacki wrote: »

I have some old products that have been selling well for years and I know that if I just tweak them just a little bit that they could be more useful. Instead I leave them exactly as they are, they have been proven and if I tweak them they may fall apart somehow even after much verification. Instead I put all these features into new products.

David Betz is rightly alarmed. Even though these are "minor" tweaks, I can definitely see them snowballing yet again, and they all need verification and time to test; otherwise there is the danger that P2 might fall apart somehow.

You may have noticed during the whole long P2 saga that I have practically never put my hand up to request features. This is my mindset: If my wife was baking a pie (I wish she did) I would leave her well enough alone so I could have some of that pie. There are other days for other pies.

Let's be practical and enjoy some pie. Hopefully there will be other days for other pies.

I generally agree, Peter. Until OnSemi requests Verilog, we can still tweak things. I feel like we are rapidly cleaning up little warts that have bugged me for a while. I think it could be done right now. I must look into the CRC thing, though, because that is rather important for USB.

Peter Jakacki · 2017-12-15 00:21

And generally I do agree too. It's just that we are coming across the warning signs which we should be all too familiar with unfortunately.

jmg · 2017-12-15 00:29

cgracey wrote: »

Can anyone think of any gotcha's involving putting the C and Z flags into bits 31 and 30, instead of into bits 21 and 20 for return-address storage in stack and registers?

I've got all the Verilog lines bookmarked that would need modification to implement this change, but before I do it, I'm wondering if this will compromise anything.

I can think of this: Having bits 31..22 free, as they are now, means that programmers could freely use the upper byte of the long for some purpose associated with the return address. By moving C and Z to bits 31 and 30, we clobber this option.

Bits 31/30 are more future-proof, and allow a linear address to mean something.
You do not really clobber choices with 21.20 -> 31.30, as the number of spare bits is exactly the same. Some mask & shift is needed in both cases.

What you gain is that the address can be an unbroken 30 bits, which is more useful than some left-justified 8 bits.
If a future P2 adds the hardware for XIP Serial Flash, you do not want bits stuck at 21.20 breaking that.

TonyB_ · 2017-12-15 00:39

jmg wrote: »

cgracey wrote: »

Can anyone think of any gotcha's involving putting the C and Z flags into bits 31 and 30, instead of into bits 21 and 20 for return-address storage in stack and registers?

I've got all the Verilog lines bookmarked that would need modification to implement this change, but before I do it, I'm wondering if this will compromise anything.

I can think of this: Having bits 31..22 free, as they are now, means that programmers could freely use the upper byte of the long for some purpose associated with the return address. By moving C and Z to bits 31 and 30, we clobber this option.

Bits 31/30 are more future-proof, and allow a linear address to mean something.
You do not really clobber choices with 21.20 -> 31.30, as the number of spare bits is exactly the same. Some mask & shift is needed in both cases.

What you gain is that the address can be an unbroken 30 bits, which is more useful than some left-justified 8 bits.
If a future P2 adds the hardware for XIP Serial Flash, you do not want bits stuck at 21.20 breaking that.

I agree with jmg.

evanh · 2017-12-15 00:42

What advantage did expanding the stack width bring?

evanh · 2017-12-15 00:50

The hardware stack is intentionally there just for CALL and RET. Using it for PUSH and POP is a disadvantage simply because it will overflow all too easily.

Adding more hardware stack depth is overkill. We already have indirection for LUT and CogRAM and we also have PTRA/B for HubRAM.

Cluso99 · 2017-12-15 02:50

Here is the 1-bit CRC calculation instruction...

CRCBIT D,[#]S
where D = CRC Register, C (carry flag) = current data bit, [#]S = polynomial
The CRCBIT instruction performs the following...
(1) X := C XOR D[0]
(2) D := D >> 1
(3) if X == 1 then D := D XOR POLY

So a CRC16 update over a full 8-bit byte would be (using the CRC16 with initial=$0000, polynomial=$8005)

' calculate the CRC16 to include the 8-bit DATA byte
        REPS    #2,    #8           '\\ 2 instructions x 8 loops
        SHR     DATA,  #1     WC    '\\ C:=DATA[0]
        CRCBIT  CRC16, POLY         '// accumulate 1bit into crc

DATA    long    0                   ' data byte to be added to the CRC calculation          
CRC16   long    $0000               ' current CRC16 calculation (initially $0000)          
POLY    long    $8005               ' polynomial

Now, rather than use a full instruction slot, it would be possible to store the polynomial (or the CRC, or both) into an internal register such as the SETQ, SETQ2, X, Y, etc, or a new one.
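
The three CRCBIT steps and the eight-iteration loop above can be modelled in software as a sanity check. This is a sketch under stated assumptions: the function names are made up for illustration, and the default polynomial is 0xA001, the bit-reversed form of 0x8005. A right-shifting step like this is normally paired with the reversed polynomial; that default is an assumption of the sketch, whereas the listing in the post uses $8005 directly.

```python
def crcbit(crc, c, poly):
    """One CRCBIT step as described in the post; returns the new CRC register."""
    x = c ^ (crc & 1)   # (1) X := C XOR D[0]
    crc >>= 1           # (2) D := D >> 1
    if x:               # (3) if X == 1 then D := D XOR POLY
        crc ^= poly
    return crc


def crc_update_byte(crc, byte, poly):
    """Eight SHR/CRCBIT pairs: feed one byte, least significant bit first."""
    for _ in range(8):
        c = byte & 1    # SHR DATA,#1 WC moves DATA[0] into C
        byte >>= 1
        crc = crcbit(crc, c, poly)
    return crc


def crc16(data, poly=0xA001, init=0x0000):
    """Run the per-byte update over a bytes object."""
    crc = init
    for b in data:
        crc = crc_update_byte(crc, b, poly)
    return crc
```

With these defaults the parameters match the published CRC-16/ARC variant, so `crc16(b"123456789")` should give that variant's check value, 0xBB3D.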

ozpropdev · 2017-12-15 03:11

Is it that tight for time that you couldn't just do this?

	rep	#4,#8
	shr	data,#1 wc
	testb	crc16,#0 xorc
	shr	crc16,#1
if_c	xor	crc16,poly
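
For readers following the flag usage, the four-instruction loop body can be traced one instruction at a time. This is only a sketch: `crc_rep4` is a hypothetical name, and the model assumes SHR ... WC leaves the shifted-out bit in C and TESTB ... XORC XORs the tested bit into C.

```python
def crc_rep4(crc, data, poly, numbits=8):
    """Trace the four-instruction REP body, tracking the C flag explicitly."""
    for _ in range(numbits):   # rep   #4,#8
        c = data & 1           # shr   data,#1 wc     (C := data[0])
        data >>= 1
        c ^= crc & 1           # testb crc16,#0 xorc  (C := C XOR crc16[0])
        crc >>= 1              # shr   crc16,#1       (C unchanged)
        if c:                  # if_c  xor crc16,poly
            crc ^= poly
    return crc
```

Each pass performs the same three steps as the single CRCBIT instruction described earlier, so the proposed instruction would save three instructions per bit.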

cgracey · 2017-12-15 03:16

Cluso,

That looks really good.

After looking at your explanation above, I tried to make a more compact instruction, but the problem is we have three inputs and one working register:

NUMBITS (input)
DATA (input)
POLY (input)
CRC (working register)

This is much easier to implement with little logic if it's broken up like you've done above.

So, we need this CRCBIT instruction. I'm on it.

cgracey · 2017-12-15 03:18

ozpropdev wrote: »
Is it that tight for time that you couldn't just do this?
	rep	#4,#8
	shr	data,#1 wc
	testb	crc16,#0 xorc
	shr	crc16,#1
if_c	xor	crc16,poly

In playing with the same ideas, I had just written the same code:

		rep	#4,numbits
		shr	data,#1		wc
		testb	crc,#0		xorc
		shr	crc,#1
	if_c	xor	crc,poly

cgracey · 2017-12-15 03:24

Maybe something like this. CRCQ would just have to inhibit interrupts:

		setq	data
		crcq	crc,poly
		crcq	crc,poly
		crcq	crc,poly
		crcq	crc,poly
		crcq	crc,poly
		crcq	crc,poly
		crcq	crc,poly
		crcq	crc,poly

Q would be shifted right on each CRCQ. NUMBITS is expressed by the number of contiguous CRCQ's.

This would be very fast, anyway.
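
A rough model of the SETQ/CRCQ idea is shown below, on the assumption that each CRCQ takes its data bit from Q[0], applies the same shift-and-XOR step as CRCBIT, and then shifts Q right. The `Cog` class and its method names are hypothetical stand-ins for the hardware state, not anything from the design.

```python
class Cog:
    """Hypothetical state for the proposed SETQ/CRCQ pairing."""

    def __init__(self):
        self.q = 0

    def setq(self, data):
        self.q = data & 0xFFFFFFFF

    def crcq(self, crc, poly):
        x = (self.q ^ crc) & 1   # data bit comes from Q[0] instead of C
        self.q >>= 1             # Q shifts right on every CRCQ
        crc >>= 1
        return crc ^ poly if x else crc


def crc_byte_with_crcq(cog, crc, byte, poly):
    """One SETQ followed by eight contiguous CRCQs, as in the listing above."""
    cog.setq(byte)
    for _ in range(8):           # NUMBITS is just the number of CRCQs issued
        crc = cog.crcq(crc, poly)
    return crc
```

In this model the data shift moves into the instruction itself, so one byte costs nine instructions (one SETQ plus eight CRCQs) with no loop overhead.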
