Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

Rayman · 2017-02-11 14:44

USB mouse didn't work with V15, not sure why...

The TESTIN and TESTNIN still function the same, right?
I don't see right away what it could be...

Garryj's 0.8c code only has one REP and it's in a decimal out function, so not the problem.
Tried changing all TESTIN to TESTNIN, but that didn't work either...

Rayman · 2017-02-11 17:14

Was playing around in Photoshop and noticed a row-order flip option when saving as BMP.
This is perfect for the VGA demos because it lets the image look not flipped when viewed on both PC and P2-VGA output. Had to tweak the offsets to data and palette for some reason though...

garryj · 2017-02-11 17:31

Rayman wrote: »

USB mouse didn't work with V15, not sure why...

The v14 update improved the USB smart pin efficiency, which upset the v0.8c code's end-of-packet detection for both low and full speed. I got things working again on v14a, but I've been diverted to other things over the last few weeks and haven't had a chance to verify that it still works with the v15 FPGA image.

I'll try to get to that over the weekend.

cgracey · 2017-02-11 19:01

Rayman wrote: »

USB mouse didn't work with V15, not sure why...

The TESTIN and TESTNIN still function the same, right?
I don't see right away what it could be...

Garryj's 0.8c code only has one REP and it's in a decimal out function, so not the problem.
Tried changing all TESTIN to TESTNIN, but that didn't work either...

TESTIN and TESTNIN recently switched encodings. PNut handles it properly, but other tools will need to be changed.

garryj · 2017-02-11 19:52

USB low/full speed keyboard/mouse v0.9 "demo" and "minimal footprint host" sources updated for the P2 v15 FPGA image.

- End-of-packet detection changed and a few timing tweaks to work with the USB improvements Chip introduced in FPGA image v14.
- Bus transaction error retry logic has been improved.
- Some host routines that are not time-sensitive have been moved to hub exec to free up cog space.
- PNut v15 did its job in regard to REP and TESTIN/TESTNIN.

Rayman · 2017-02-11 20:50

Thanks Garryj!

evanh · 2017-02-12 02:05

Rayman wrote: »

Was playing around in Photoshop and noticed a row-order flip option when saving as BMP.
This is perfect for the VGA demos because it lets the image look not flipped when viewed on both PC and P2-VGA output. Had to tweak the offsets to data and palette for some reason though...

Neat, this works on my KDE desktop and the viewers seem happy too ... examining the file, I see the reason is that a negative value (-480) is stored in the image height parameter. See attached. This appears to be a hack, but it seems to be well supported for loading and displaying, even if not so much for saving from paint packages.

As for the bitmap offset variation, that looks to be something specific to Photoshop. I note the official method of handling this is to read the value stored at offset 0x0A to find the start of the bitmap.
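A minimal sketch of the two header checks discussed above, assuming the common layout of a 14-byte file header followed by a 40-byte BITMAPINFOHEADER. In that layout the 32-bit pixel-data offset sits at byte 0x0A and the signed height at byte 0x16. The names `bmp_layout` and `make_header` are hypothetical helpers made up for this illustration.

```python
import struct

def bmp_layout(header: bytes):
    """Return (pixel_offset, width, rows, top_down) parsed from a BMP header."""
    if header[:2] != b"BM":
        raise ValueError("not a BMP file")
    # 32-bit little-endian offset to the pixel array, stored at byte 0x0A.
    (pixel_offset,) = struct.unpack_from("<I", header, 0x0A)
    # Signed 32-bit width at 0x12 and height at 0x16 (BITMAPINFOHEADER).
    width, height = struct.unpack_from("<ii", header, 0x12)
    # A negative height marks a top-down image (first stored row is the top row).
    return pixel_offset, width, abs(height), height < 0

def make_header(width: int, height: int, pixel_offset: int) -> bytes:
    """Build a hypothetical 54-byte header for an 8-bit image (test data only)."""
    file_header = struct.pack("<2sIHHI", b"BM", 0, 0, 0, pixel_offset)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 8, 0, 0, 0, 0, 0, 0)
    return file_header + info_header

# 640x480 top-down image, 256-entry palette placed right after the headers.
layout = bmp_layout(make_header(640, -480, 54 + 256 * 4))
```

Reading the offset from the header, rather than hard-coding it, avoids having to tweak the data offset for files written by different tools.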

PS: I still marvel at how BMP format went backwards like that. Someone's retribution for being subjected to little-endian maybe ...

cgracey · 2017-02-16 10:40

I got the instruction spreadsheet done. Also, the color space converter is now documented. Both of these are linked in the top post of this thread.

potatohead · 2017-02-16 14:56

Thanks Chip!

Rayman · 2017-02-16 17:25

Good to see some details for the instructions.

I was looking at GETRND the other day... Had to google PRNG, but that's clear now.
But, I'm not sure what is meant by each cog being unique. Also, not sure how to seed it...

cgracey · 2017-02-16 21:47

Rayman wrote: »

Good to see some details for the instructions.

I was looking at GETRND the other day... Had to google PRNG, but that's clear now.
But, I'm not sure what is meant by each cog being unique. Also, not sure how to seed it...

There's one central 32-bit LFSR. Each cog gets a different bit order and static XOR mask of it. It's seeded on reset. You just use it.
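A rough sketch of the arrangement described above: one shared 32-bit LFSR, with each cog seeing the same state through its own bit order and XOR mask. Everything specific here is an assumption made for illustration: the feedback taps, the use of a rotation as the per-cog bit order, and the per-cog mask values are all hypothetical and are not the P2's actual wiring.

```python
MASK32 = 0xFFFFFFFF
TAPS = 0x80200003  # hypothetical feedback taps, for illustration only

def lfsr_step(state: int) -> int:
    """Advance the single shared LFSR by one step (Galois form, shifting right)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state

def cog_view(state: int, cog: int) -> int:
    """Value one cog would read: shared state, reordered and XOR-masked per cog."""
    rot = cog % 32  # hypothetical bit order: rotate left by the cog number
    reordered = ((state << rot) | (state >> (32 - rot))) & MASK32
    return reordered ^ ((cog * 0x11111111) & MASK32)  # hypothetical static mask

shared = lfsr_step(0x00000001)
views = [cog_view(shared, cog) for cog in range(4)]
```

The point of the structure is that no cog needs its own generator or seed: all cogs read the same state, yet each reads a different number.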

evanh · 2017-02-18 01:22

Chip,
Reading your functional descriptions I'm lost trying to interpret your shift instructions. Here's Shift Right as example:

D = lower ({32'b0, D[31:0]} >> S[4:0])

Explain that step by step please.

evanh · 2017-02-18 01:26

PS: I don't know Verilog so I imagine that's probably a factor.
PPS: I know that 32'b0 means a 32bit constant seeded with binary value 0.
PPPS: I know that >> S[4:0] means shift right by a 5bit value taken from either field S or register S.
PPPPS: I take it that D[31:0] and D are the same register but a cycle, or two, apart?

Seairth · 2017-02-18 02:03

evanh wrote: »

PS: I don't know Verilog so I imagine that's probably a factor.
PPS: I know that 32'b0 means a 32bit constant seeded with binary value 0.
PPPS: I know that >> S[4:0] means shift right by a 5bit value taken from either field S or register S.
PPPPS: I take it that D[31:0] and D are the same register but a cycle, or two, apart?

You're just about there. In Verilog, curly braces mean concatenation, so:

{32'b0, D[31:0]}

basically means "a 64-bit value with 32 zeros, followed by the 32 bits in D." I'm guessing the "lower()" just means take the lowest 32 bits (after the shift).
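The same expression can be modelled step by step in Python. This is only a sketch of the result written to D; it does not model the flags, and the function name `shr` is made up for the example.

```python
MASK32 = 0xFFFFFFFF

def shr(d: int, s: int) -> int:
    """Model of D = lower({32'b0, D[31:0]} >> S[4:0])."""
    wide = (0 << 32) | (d & MASK32)  # {32'b0, D[31:0]}: 64 bits, upper half zero
    count = s & 0x1F                 # S[4:0]: only the low five bits of S are used
    return (wide >> count) & MASK32  # lower(...): keep the low 32 bits

result = shr(0x80000000, 4)
```

Because only S[4:0] is used, a shift count of 36 behaves like a shift count of 4.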

evanh · 2017-02-18 02:10

Ah, lol, so simple. Thanks Seairth.

cgracey · 2017-02-18 02:20

evanh wrote: »

Chip,
Reading your functional descriptions I'm lost trying to interpret your shift instructions. Here's Shift Right as example:

D = lower ({32'b0, D[31:0]} >> S[4:0])

Explain that step by step please.

I changed it to this:

Rotate right. D = [31:0] of ({D[31:0], D[31:0]} >> S[4:0]). C = last bit shifted out if S[4:0] > 0, else D[0]. *

Seairth explained it correctly. I know it's kind of cryptic. Not much space and I'm leaning in recent syntax.
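The rotate-right description, including the C rule as worded above, can be sketched the same way. The function name `ror` is made up for the example.

```python
MASK32 = 0xFFFFFFFF

def ror(d: int, s: int):
    """Model of D = [31:0] of ({D[31:0], D[31:0]} >> S[4:0]); returns (D, C)."""
    d &= MASK32
    count = s & 0x1F                  # S[4:0]
    wide = (d << 32) | d              # {D[31:0], D[31:0]}: D duplicated into 64 bits
    result = (wide >> count) & MASK32
    if count > 0:
        c = (d >> (count - 1)) & 1    # last bit shifted out of the low end
    else:
        c = d & 1                     # no shift: C = D[0]
    return result, c

rotated = ror(0x12345678, 8)
```

Duplicating D in the upper half means the bits that fall off the bottom reappear at the top, which is what makes the right shift act as a rotate.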

cgracey · 2017-02-18 02:21

Seairth wrote: »
evanh wrote: »

PS: I don't know Verilog so I imagine that's probably a factor.
PPS: I know that 32'b0 means a 32bit constant seeded with binary value 0.
PPPS: I know that >> S[4:0] means shift right by a 5bit value taken from either field S or register S.
PPPPS: I take it that D[31:0] and D are the same register but a cycle, or two, apart?

You're just about there. In Verilog, curly braces mean concatenation, so:
{32'b0, D[31:0]}
basically means "a 64-bit value with 32 zeros, followed by the 32 bits in D." I'm guessing the "lower()" just means take the lowest 32 bits (after the shift).

That's right.

evanh · 2017-02-18 03:25

Okay, got this sorted I think. Rotate Carry Right:

D = [31:0] of ({{32{C}}, D[31:0]} >> S[4:0])

{32{C}} will mean 32 copies of the Carry bit, something like sign-extended, right?

That makes the instruction seem very close to the arithmetic ones, i.e. the naming should follow as a shift rather than a rotate.
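A sketch of the expression above next to an arithmetic shift right, to show how close the two are. The `rcr` function follows the quoted expression. The `sar` function is an assumption added for comparison: it fills with copies of D[31] instead of C, which is the usual definition of an arithmetic shift but is not quoted from the spreadsheet.

```python
MASK32 = 0xFFFFFFFF

def rcr(d: int, s: int, c: int) -> int:
    """Model of D = [31:0] of ({{32{C}}, D[31:0]} >> S[4:0])."""
    fill = MASK32 if c else 0         # {32{C}}: 32 copies of the carry bit
    wide = (fill << 32) | (d & MASK32)
    return (wide >> (s & 0x1F)) & MASK32

def sar(d: int, s: int) -> int:
    """Assumed arithmetic shift right: same shape, but the fill bit is D[31]."""
    return rcr(d, s, (d >> 31) & 1)

with_carry = rcr(0x00000010, 4, 1)
```

The only difference between the two is where the fill bit comes from, which is the similarity being pointed out.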

cgracey · 2017-02-18 05:23

evanh wrote: »

Okay, got this sorted I think. Rotate Carry Right:

D = [31:0] of ({{32{C}}, D[31:0]} >> S[4:0])

{32{C}} will mean 32 copies of the Carry bit, something like sign-extended, right?

That makes the instruction seem very close to the arithmetic ones, i.e. the naming should follow as a shift rather than a rotate.

SCR/SCL would create a naming conflict with SCL (scale). The carry IS being rotated into D, in a sense.

evanh · 2017-02-18 05:47

In the same sense that a shift and rotate are kind of interchangeable as naming choices anyway. Virtual and logical are often like that too. Can be confusing if one is trying to use a specific definition - ends up needing extra clarification.

cgracey wrote: »

SCR/SCL would create a naming conflict with SCL (scale).

Suck. I guess that's the end of that idea.

evanh · 2017-02-18 05:56

Oh, SCL and SCLU are prefix instructions! Intriguing, what brought that idea about? I'm guessing there is some experience.

EDIT: They are S operand modifiers, like the ALTS instruction, right? Ah, no, it'll be they feed the ALU via the S port. Replacing whatever data the fully decoded S operand would have supplied.

evanh · 2017-02-18 06:29

I think I might understand why they are prefixes - It allows for a three operand operation when not wanting to modify either of the multiplicand sources.

I'm not sure of the effectiveness though ... since copying over top of the modified source is only a single instruction away ... so a prefix seems to defeat any possible speed or even size advantage.

evanh · 2017-02-18 06:47

Oh, prefixing an ADD with a SCLU is an effective MAC I think. I think the light has turned on.

cgracey · 2017-02-18 06:59

evanh wrote: »

Oh, prefixing an ADD with a SCLU is an effective MAC I think. I think the light has turned on.

In computing a FIR filter, you are multiplying and accumulating a FIFO sample buffer with an array of filter coefficients, neither of which can be overwritten. So, SCL is important because it doesn't clobber the inputs.
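A minimal sketch of the multiply-accumulate pattern described above, in plain Python rather than P2 instructions. It only illustrates the data flow, with each product going to a temporary and then into the accumulator while the sample buffer and coefficient array stay untouched. It does not model SCL's actual scaling or word sizes, and `fir_output` is a name made up for the example.

```python
def fir_output(samples, coeffs):
    """One FIR output: sum of sample*coefficient, leaving both inputs unmodified."""
    acc = 0
    for sample, coeff in zip(samples, coeffs):
        product = sample * coeff  # product lands in a temporary, not in either input
        acc += product            # the following add accumulates it
    return acc

samples = [1, 2, 3]  # stand-in for the FIFO sample buffer
coeffs = [4, 5, 6]   # stand-in for the filter coefficients
output = fir_output(samples, coeffs)
```

Since neither list is written to, the same coefficients and the shifted sample buffer can be reused for the next output.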

evanh · 2017-02-18 07:10

I like FIRs.

The existence of multistage delta-sigma ADCs has intrigued me.

Rayman · 2017-02-18 21:00

Just had to unroll a big loop.
PNut now gives me a "Relative Address Out of Range" error at my djnz at the end.
I don't really understand this since the place I'm jumping to hasn't changed...
Can't you jump to anywhere in a cog?

evanh · 2017-02-18 23:21

The spreadsheet says it's a signed relative branch: "** If #S and cogex, PC += signed (S)." That means a maximum immediate range of -256 for a backward loop branch. So that'll be a no.

evanh · 2017-02-18 23:28

Oh, reading the Prop1 manual for DJNZ it says an immediate S value is treated as an absolute address rather than relative address.

The change will be a result of HubExec ... even LutExec impacts this design choice.

ozpropdev · 2017-02-18 23:33

I can confirm the range limit with this code.

dat	org

loop	getct	pa
	testb	pa,#25 wc
	drvc	#32

	orgf	$ff     '$100 causes out of range error
	djnz	pa,#loop
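The limit seen in that test can be reproduced with a small range check. This is a sketch under two assumptions that are not stated in the thread: that the immediate is a 9-bit signed value (-256 to +255), and that the offset is measured from the instruction following the branch. Both are consistent with $ff working and $100 failing. The function name `rel9_offset` is made up for the example.

```python
def rel9_offset(branch_addr: int, target_addr: int) -> int:
    """Return the relative offset for a branch, or raise if it does not fit 9 signed bits."""
    # Assumption: the offset is relative to the instruction after the branch.
    offset = target_addr - (branch_addr + 1)
    if not -256 <= offset <= 255:
        raise ValueError(f"relative address out of range: {offset}")
    return offset

# djnz at $ff branching back to a loop label at $000
offset_at_ff = rel9_offset(0xFF, 0x000)
```

With the branch at $100 the offset becomes -257, one step past the limit, which matches the error reported by PNut.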

evanh · 2017-02-18 23:38

It adds extra clutter to the code, but storing the branch address in another register and branching through that is available at no speed penalty.
