Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

cgracey · 2017-12-21 06:32

Ultimately, one layer change could make a different ROM and we could have a version of the P2 which could load from USB.

@garryj, I'm glad the CRCxxx instructions have helped things along. I'm really glad you implemented them into your work.

@msrobots, yes, late changes can be dangerous. We had the kick-off meeting with OnSemi today. The lady I'm working with there who knows the whole flow says that the initial months are spent establishing the scripts which drive the various tools. During that time, the Verilog can be changed often, and there is an expectation that it will be, due to pressures from timing closure and other things. We just need to make sure that what we've got is working correctly now. The more playing around people can do with the current design, the safer the outcome.

msrobots · 2017-12-21 09:57

jmg wrote: »

msrobots wrote: »

Gosh, if we could get soft USB running out of the boot rom, we could save the whole FTDI problem.

I was not quite meaning that - the example of small-MCU boot, was really to show how widespread USB is, in the MCU space.
The P2 ROM is too small to hold USB loader, but a 2nd stage loader could offer USB.
I've not seen large-data-block download numbers yet, for P2-USB.

A cheaper, more flexible P2-USB solution than 'the whole FTDI problem', is the new EFM8UB3 - that has 40K flash, and is sub-90c
- it could do a useful P2 Debug interface, and Power Monitoring / Metering support.

this looks interesting, sadly 5v.

But I agree with you that using a small cheap MCU like that instead of flash or SD might be a quite good solution sometimes.

Slowly the P2 is taking its final shape, and thanks to all the FPGA-enabled testers, bugs are being found and eliminated. I wish I could play around with PASM2 too, but I'm just a code monkey; FPGAs are way out of my comfort zone. So I can just enjoy reading snippets from ozpropdev, garryj, Chip and rjo__, running their code in my head and hoping to understand what it is doing.

fascinating,

Mike

ozpropdev · 2017-12-21 11:08

Chip
All V30a images flashed and running Ok.

cgracey · 2017-12-21 11:58

ozpropdev wrote: »

Chip
All V30a images flashed and running Ok.

Thanks for checking all those.

jmg · 2017-12-21 19:33

msrobots wrote: »

this looks interesting, sadly 5v.

Not sure what you mean. USB has 5V on the cable, but the MCU core does not have to run on it. (FTDI parts are also 5V powered.)

msrobots wrote: »

But I agree with you that using a small cheap MCU like that instead of flash or SD might be a quite good solution sometimes.

I think single pin-boot is still in the boot-rom, and that means small MCUs are the lowest pin impact way to boot a P2, if every last pin matters.

The EFM8UB3 is a step up from the smallest MCUs, in that it includes the USB hardware, so it can support MHz-speed links.
It can swallow the reset components, add power monitoring, and include some P2 firmware for things like serious P2 debug.
There is even space to include support for a Si5351A clock generator, to give real P2 development flexibility.
Plenty of small and simple things can be swept into such a part to make a P2-Eval board more compact, lower price, and perform better.

Most MCU vendors now use an MCU as their Eval Board USB Bridge, instead of a simple FTDI part.
Often, that USB bridge includes a user UART and a Debug channel, with power Monitoring & reset control.

garryj · 2017-12-21 19:48

jmg wrote: »

USB is going to be very important for P2 - have you tried larger packet payloads yet ?
IIRC Mouse/Kbd is quite small packets ?

Yeah, boot protocol mouse/keyboard only uses 8-byte data packets.

On my todo list is a demo driver that will read USB thumb drive data using bulk transfer, which has a max packet size of 64 bytes.
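
To put the two packet sizes in perspective, here is a rough sketch of how one storage read would break into bulk packets. The 64-byte limit comes from the post above; the 512-byte sector size and the helper name `split_into_packets` are assumptions for illustration only, not part of any P2 driver.

```python
def split_into_packets(total_bytes, max_packet=64):
    """Return the payload size of each packet needed to move total_bytes."""
    if total_bytes < 0 or max_packet <= 0:
        raise ValueError("total_bytes must be >= 0 and max_packet > 0")
    full, rest = divmod(total_bytes, max_packet)
    # All packets are full-size except possibly a short final one.
    return [max_packet] * full + ([rest] if rest else [])

# Assumed 512-byte sector moved with 64-byte bulk packets.
sector_packets = split_into_packets(512)
print(len(sector_packets), "packets per assumed 512-byte sector")
```

Under those assumptions a single sector needs several back-to-back data packets, compared with one 8-byte packet per boot-protocol HID report.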

cgracey · 2017-12-22 03:33

garryj wrote: »

jmg wrote: »

USB is going to be very important for P2 - have you tried larger packet payloads yet ?
IIRC Mouse/Kbd is quite small packets ?

Yeah, boot protocol mouse/keyboard only uses 8-byte data packets.

On my todo list is a demo driver that will read USB thumb drive data using bulk transfer, which has a max packet size of 64 bytes.

This would be a big deal, going from HID to MSC. That will be quite a test on the USB smart pin.

rjo__ · 2017-12-22 05:34

I am going to want to see this.

I have a ton of thumb drives around, but do I have the one you are using?
If I need to order one, I should have ordered it last month:)

Cluso99 · 2017-12-22 23:30

v30a running SD card fine

TonyB_ · 2017-12-23 01:23

There are a couple of inconsistencies in the instruction set and a few errors in the spreadsheet (v30a). In no particular order:

1. SCLU/SCL is the only unsigned/signed pair with extra U in the mnemonic for unsigned. All the others have extra S for signed.

2. ANDN and TESTN are the only negated instructions that come before the non-negated version, breaking the rule that applies to SUMx, TESTBx, BITx, MUXx, etc., etc., including MOV/NOT (latter is MOVN) where the only opcode bit that is different is 1 for the negated version. To be consistent ANDN/AND and TESTN/TEST should be swapped.

3. MODCZ looks out of place and would be better immediately after RCZR & RCZL, as all three are {WC/WZ/WCZ} instructions and WRx are not. (Single bit opcode change.)

4. CLKSET has C bit in the opcode in the spreadsheet, 0 in text file.

5. C and Z not changed to bits 31 and 30 in various JMP/CALL/RET/POP in spreadsheet.

6. TJV description in spreadsheet should include C, as mentioned earlier:
http://forums.parallax.com/discussion/comment/1427661/#Comment_1427661

7. WMLONG is described in documentation as not writing $00 bytes but in spreadsheet as not writing $FF.

8. Is WMLONG adequate or will we regret not also having MUXBYTS?

MUXBYTS D,{#}S	For each non-zero byte in S, copy that byte into the corresponding D byte, else leave that D byte the same.

This was last in v19 and replaced in v20 by MUXQ. There are currently two unused D/#,S/# slots, which would change to one D,S/# and one D/#,S/# free if MUXBYTS were restored. 640 x 480 x 8bpp uses 300KB and would fit in the hub RAM nicely. Would it be best to have both WMLONG and MUXBYTS for sprites and cursors with transparent pixels?
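
The MUXBYTS description above can be modelled in a few lines. This is only a behavioural sketch of the one-line definition quoted in this post, written in Python rather than PASM2; the function name `muxbyts` and the sample values are made up for illustration.

```python
def muxbyts(d, s):
    """Model MUXBYTS D,S on 32-bit values.

    For each non-zero byte in S, that byte replaces the corresponding
    byte of D; zero bytes in S leave the D byte unchanged.
    """
    result = 0
    for shift in (0, 8, 16, 24):
        s_byte = (s >> shift) & 0xFF
        d_byte = (d >> shift) & 0xFF
        result |= (s_byte if s_byte else d_byte) << shift
    return result

# A sprite long with two transparent ($00) pixels in the middle.
background = 0x11223344
sprite = 0xAA0000BB
print(hex(muxbyts(background, sprite)))
```

With 8bpp pixels this treats colour 0 as transparent, which is the sprite and cursor case raised in the question.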

I don't know how video outputting works on the P2, for everything from pins, DACs, LUTs and streamers. Will video data be in a line buffer in cog RAM or a frame buffer in hub RAM or both? A short explanation would be helpful.

cgracey wrote: »

MUXNITS and MUXNIBS are useful for 2-bit and 4-bit graphics, but MUXBYTS was rather superfluous.

Rather but not completely superfluous?

cgracey · 2017-12-23 05:11

TonyB_, thank you for all the input. When I am home, I will go through the spreadsheet and straighten out everything. I really appreciate you looking so carefully at everything.

cgracey · 2017-12-23 05:20

Cluso99 wrote: »

v30a running SD card fine

Great!

Can anyone give a brief explanation of how a USB memory stick will differ in communication from an SD card?

I know the USB protocol will be on top of whatever memory protocol exists, but I'm wondering what it might look like. USB memory sticks, in some ways, are a lot more practical than SD cards.

Rayman · 2017-12-23 13:34

I think the main difference is that SD cards have an SPI interface mode that makes them about 100X easier to use...

Rayman · 2017-12-23 15:11

Microchip has several notes on USB Host:
http://ww1.microchip.com/downloads/en/AppNotes/01145b.pdf
http://ww1.microchip.com/downloads/en/AppNotes/USB_Host_Stack_01140a.pdf

They have a software library to give USB Host support.

Even with that, it still looks complicated...

ikemschn · 2017-12-23 19:35

cgracey wrote: »

Can anyone give a brief explanation of how a USB memory stick will differ in communication from an SD card?

I know the USB protocol will be on top of whatever memory protocol exists, but I'm wondering what it might look like. USB memory sticks, in some ways, are a lot more practical than SD cards.

Perhaps there is already some important work done and available:

Micah Dowty created a USB-interface object for the Prop1, including a simple Mass Storage Interface to access the content of USB memory sticks:
http://obex.parallax.com/object/640

see also:
http://forums.parallax.com/discussion/121321

Luckily, it looks like @jmg and Heater. were among the experts in that thread.

Yanomani · 2017-12-23 19:58

Whenever in doubt, two great resources, for reference, by Jan Axelson:

janaxelson.com/usbc.htm

janaxelson.com/usbms.htm

Hope they help.

Henrique

cgracey · 2017-12-23 22:20

TonyB_ wrote: »

There are a couple of inconsistencies in the instruction set and a few errors in the spreadsheet (v30a). In no particular order:

1. SCLU/SCL is the only unsigned/signed pair with extra U in the mnemonic for unsigned. All the others have extra S for signed.

SCLS seemed weird, no vowels. I see what you are saying, but I'm not sure what would look better. Note that SCL returns a different bit-range of the product than SCLU does.

2. ANDN and TESTN are the only negated instructions that come before the non-negated version, breaking the rule that applies to SUMx, TESTBx, BITx, MUXx, etc., etc., including MOV/NOT (latter is MOVN) where the only opcode bit that is different is 1 for the negated version. To be consistent ANDN/AND and TESTN/TEST should be swapped.

This is tough because what I saw as most important was that AND, OR, and XOR be in sequence. So, I put ANDN in front of them all, since it was closely related to AND.

3. MODCZ looks out of place and would be better immediately after RCZR & RCZL, as all three are {WC/WZ/WCZ} instructions and WRx are not. (Single bit opcode change.)

I don't like it so much, either, but I'm not sure where else to put it. It's the only instruction in that SPLITB..WRNZ block where L=1. It could go anywhere in there. I figured it was good being the caboose.

4. CLKSET has C bit in the opcode in the spreadsheet, 0 in text file.

I fixed that.

5. C and Z not changed to bits 31 and 30 in various JMP/CALL/RET/POP in spreadsheet.

Fixed those.

6. TJV description in spreadsheet should include C, as mentioned earlier:
http://forums.parallax.com/discussion/comment/1427661/#Comment_1427661

Fixed it.

7. WMLONG is described in documentation as not writing $00 bytes but in spreadsheet as not writing $FF.

Fixed.

8. Is WMLONG adequate or will we regret not also having MUXBYTS?
MUXBYTS D,{#}S	For each non-zero byte in S, copy that byte into the corresponding D byte, else leave that D byte the same.
This was last in v19 and replaced in v20 by MUXQ. There are currently two unused D/#,S/# slots, which would change to one D,S/# and one D/#,S/# free if MUXBYTS were restored. 640 x 480 x 8bpp uses 300KB and would fit in the hub RAM nicely. Would it be best to have both WMLONG and MUXBYTS for sprites and cursors with transparent pixels?

I don't know how video outputting works on the P2, for everything from pins, DACs, LUTs and streamers. Will video data be in a line buffer in cog RAM or a frame buffer in hub RAM or both? A short explanation would be helpful.

With SETQ+WMLONG, you can do this operation from cog to hub at one long per clock WITH random byte alignment. I figured that would be the preferred method for something like 'MUXBYTS'.
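
As a behavioural illustration of the masked block write described here, the sketch below models hub RAM as a byte array and skips $00 bytes, following the documentation wording in item 7. It is a Python model, not PASM2, and it does not model timing. The name `wmlong_block` is hypothetical, and the little-endian byte order within each long is stated here as an assumption of the model.

```python
def wmlong_block(hub, addr, longs):
    """Write 32-bit longs to hub starting at any byte address.

    Bytes equal to $00 are not written, so the existing hub contents
    show through (assumed little-endian byte order within each long).
    """
    for i, value in enumerate(longs):
        for b in range(4):
            byte = (value >> (8 * b)) & 0xFF
            if byte:
                hub[addr + 4 * i + b] = byte

# Background filled with $11; write two longs at an odd byte address.
hub = bytearray([0x11] * 12)
wmlong_block(hub, 3, [0x00AA00BB, 0xCC000000])
print(hub.hex())
```

The unaligned start address (3) stands in for the "random byte alignment" point: the source longs need no shifting or merging before the write.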

In summary:

SCLU/SCL could get different names.

ANDN/TESTN are okay, I think.

MODCZ: no better ideas here, yet.

WMLONG is probably sufficient. If we had lower granularity than bytes in hub memory, we could read/write nibbles, twits, and bits. That would really help with graphics. We could have WMLONG not just for bytes, but all the rest, too, and at any nibble/twit/bit offset.

TonyB_, thanks for studying all this stuff and pointing out all these things.

Rayman · 2017-12-23 23:15

I guess I forgot that Micah made a working USB memory stick interface.

That should certainly make it easier...

TonyB_ · 2017-12-24 02:19

cgracey wrote: »

In summary:

SCLU/SCL could get different names.

ANDN/TESTN are okay, I think.

MODCZ: no better ideas here, yet.

WMLONG is probably sufficient. If we had lower granularity than bytes in hub memory, we could read/write nibbles, twits, and bits. That would really help with graphics. We could have WMLONG not just for bytes, but all the rest, too, and at any nibble/twit/bit offset.

TonyB_, thanks for studying all this stuff and pointing out all these things.

Chip, thanks for doing the fixes.

SCLU/SCL:
New names are not an urgent issue at the moment.

MODCZ:
Has CZ1 in opcode and the other three CZ1 instructions are all paired with CZ0's, therefore for consistency RCZR or RCZL could partner MODCZ but RCZL is perhaps the best choice (assuming another CZ1 not required in the future).

Now:

EEEE 1101011 CZ0 DDDDDDDDD 001101010        RCZR    D           {WC/WZ/WCZ}
EEEE 1101011 CZ0 DDDDDDDDD 001101011        RCZL    D           {WC/WZ/WCZ}
EEEE 1101011 000 DDDDDDDDD 001101100        WRC     D
EEEE 1101011 000 DDDDDDDDD 001101101        WRNC    D
EEEE 1101011 000 DDDDDDDDD 001101110        WRZ     D
EEEE 1101011 000 DDDDDDDDD 001101111        WRNZ    D
EEEE 1101011 CZ1 0cccczzzz 001101111        MODCZ   c,z         {WC/WZ/WCZ}

Proposed:

EEEE 1101011 CZ0 DDDDDDDDD 001101010        RCZR    D           {WC/WZ/WCZ}
EEEE 1101011 CZ0 DDDDDDDDD 001101011        RCZL    D           {WC/WZ/WCZ}
EEEE 1101011 CZ1 0cccczzzz 001101011        MODCZ   c,z         {WC/WZ/WCZ}
EEEE 1101011 000 DDDDDDDDD 001101100        WRC     D
EEEE 1101011 000 DDDDDDDDD 001101101        WRNC    D
EEEE 1101011 000 DDDDDDDDD 001101110        WRZ     D
EEEE 1101011 000 DDDDDDDDD 001101111        WRNZ    D
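
The two listings can be checked mechanically. The sketch below packs the fields in the order the listings show them (EEEE, 7-bit opcode, C, Z, L, 9-bit D, 9-bit S) and counts how many bits separate MODCZ from RCZL in each layout. The helper names and the sample EEEE and D values are arbitrary choices for illustration, not taken from any assembler.

```python
def encode(cond, opcode, c, z, l, d, s):
    """Pack fields as EEEE OOOOOOO C Z L DDDDDDDDD SSSSSSSSS (32 bits)."""
    return ((cond << 28) | (opcode << 21) | (c << 20) | (z << 19)
            | (l << 18) | (d << 9) | s)

def bit_diff(a, b):
    """Number of bit positions in which two encodings differ."""
    return bin(a ^ b).count("1")

OPC = 0b1101011          # shared 7-bit opcode from the listings
RCZL_S = 0b001101011     # S field of RCZL
WRNZ_S = 0b001101111     # S field of WRNZ, shared by MODCZ in the "Now" layout

cond = 0b1111            # arbitrary EEEE value for the comparison
d = 0b0_1010_0101        # arbitrary D field; for MODCZ this is 0cccczzzz

rczl = encode(cond, OPC, 1, 1, 0, d, RCZL_S)
modcz_now = encode(cond, OPC, 1, 1, 1, d, WRNZ_S)
modcz_proposed = encode(cond, OPC, 1, 1, 1, d, RCZL_S)

print(bit_diff(rczl, modcz_now), bit_diff(rczl, modcz_proposed))
```

In the proposed layout the only difference from RCZL is the L bit (bit 18 in this packing), which is the pairing argument being made above.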

ANDN/TESTN:
I plan to have a go at writing my own assembler and these stick out like sore thumbs to me. More importantly, the instruction decoding is not as simple as it could be if AND/ANDN and TEST/TESTN agreed with MOV/NOT (and the other ABC/ABNC type instructions).


EEEE 0101000 CZI DDDDDDDDD SSSSSSSSS ****** ANDN    D,S/#       {WC/WZ/WCZ}	D = D & !S
EEEE 0101001 CZI DDDDDDDDD SSSSSSSSS ****** AND     D,S/#       {WC/WZ/WCZ}	D = D & S
EEEE 0101010 CZI DDDDDDDDD SSSSSSSSS        OR      D,S/#       {WC/WZ/WCZ}
EEEE 0101011 CZI DDDDDDDDD SSSSSSSSS        XOR     D,S/#       {WC/WZ/WCZ}

EEEE 0110000 CZI DDDDDDDDD SSSSSSSSS        MOV     D,S/#       {WC/WZ/WCZ}	D = S
EEEE 0110001 CZI DDDDDDDDD SSSSSSSSS        NOT     D,S/#       {WC/WZ/WCZ}	D = !S

EEEE 0111110 CZI DDDDDDDDD SSSSSSSSS ****** TESTN   D,S/#       {WC/WZ/WCZ}	Test D with !S
EEEE 0111111 CZI DDDDDDDDD SSSSSSSSS ****** TEST    D,S/#       {WC/WZ/WCZ}	Test D with S
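
A small sketch of the decoding point: if the low bit of the 7-bit opcode always meant "use !S", one test would cover every plain/negated pair. The opcode values are copied from the listing above; the dictionary layout and function names are hypothetical.

```python
# 7-bit opcodes as listed above (current v30a ordering).
CURRENT = {
    "ANDN": 0b0101000, "AND": 0b0101001,
    "MOV": 0b0110000, "NOT": 0b0110001,
    "TESTN": 0b0111110, "TEST": 0b0111111,
}
NEGATED = {"ANDN", "NOT", "TESTN"}   # instructions that operate on !S

def swapped(table):
    """Return a copy with AND/ANDN and TEST/TESTN exchanged, as proposed."""
    out = dict(table)
    for plain, neg in (("AND", "ANDN"), ("TEST", "TESTN")):
        out[plain], out[neg] = table[neg], table[plain]
    return out

def low_bit_means_negated(table):
    """True if opcode bit 0 is set exactly for the negated instructions."""
    return all(bool(op & 1) == (name in NEGATED) for name, op in table.items())

print(low_bit_means_negated(CURRENT), low_bit_means_negated(swapped(CURRENT)))
```

With the current ordering the rule fails for the AND and TEST pairs; after the swap a decoder could treat bit 0 as a uniform "invert S" flag for all three pairs.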

Here are a couple of excerpts from taqoz.spin2 by Peter Jakacki
http://forums.parallax.com/discussion/comment/1427829/#Comment_1427829

_AND 			and	tos1,tos
			jmp	#DROP
_ANDN 			andn	tos1,tos
			jmp	#DROP
_OR			or	tos1,tos
			jmp	#DROP
_XOR			xor	tos1,tos
			jmp	#DROP

	byte	c,"AND",t
	word	_AND
	byte	c,"ANDN",t
	word	_ANDN
	byte	c,"OR",t
	word	_OR
	byte	c,"XOR",t
	word	_XOR

At least two of us think AND should come before ANDN!

I'm treating the P2 as my only opportunity to help design a microcontroller. I can't do any physical testing but I can study the instruction set. It's nice to have instructions grouped by type but users won't care about the actual opcodes and there are some other instructions where the encoding/decoding could be simplified, I think.

potatohead · 2017-12-24 02:31

Re: Video.

Basically, the whole signal is streamed. One can feed the streamer from buffers in the HUB. Or one can skip that and bit-bang the DACs, or do both! (Not recommended, but I'm sure someone will.)

There is a color transform engine that maps values to the necessary signals. It works with output from the streamer.

Rendering is basically from line RAM. One can make bitmaps, tiles, sprites by processing that line RAM ahead of the raster.

A video COG will set its NCO up, configure pins and the streamer for video / signal use, and then run a loop that outputs the frame. Doing that is very low impact.

Lots of us call this the kernel. An interrupt driven kernel leaves the COG mostly free. That time can be used to prep data for the streamer.

A couple of P2 COGs working together can do a lot! Tiles, characters, sprites all get software-assembled into line RAM, buffered for a couple, maybe a few, scan lines.

It's pretty generic. The streamer can take line RAM and output composite, S-Video, component, RGB, and more. The color engine means we have color consistency across formats too. Very cool.

The pixel mixer can blend alpha channel defined pixels into line RAM too.

You just reminded me to check S-Video. It's not been tested, but the others have been, though we need more on this, particularly component.

TonyB_ · 2017-12-24 02:52

Many thanks for the info, potatohead. I have some more questions, but maybe they should go in a new P2 Video thread.

cgracey · 2017-12-24 03:17

TonyB_, why can't you do any physical testing? Is it lack of an FPGA board, or some other reason?

cgracey · 2017-12-24 03:21

Okay. I agree about AND/ANDN and TEST/TESTN. I also just realized that SETPIV/SETPIX should be ordered as such. I think your MODCZ proposal is half-way, only. Are there some D-only instructions elsewhere that we could exploit? There's got to be a better way, yet. And the SCL/SCLU situation is difficult. These fixes are purely cosmetic, but I agree.

cgracey · 2017-12-24 05:42

SCA/SCAS instead of SCLU/SCL?

cgracey · 2017-12-24 10:11

I just realized, while working on the Google doc, that CLKSET is going to need to change to HUBSET, because it now handles five different things:

1) It sets the clock mode, like it always has.
2) It write-protects/enables the last 16KB of hub RAM.
3) It sets the four digital filtering modes that the smart pins use.
4) It seeds the Xoroshiro128+ PRNG in the hub.
5) Hardware reset.

cgracey · 2017-12-24 10:56

Right now, I'm working on getting the documentation caught up to the current version of the chip. There are only a few key things left to cover - the digital filtering setup, the hub PRNG seeding, and the new smart pin measurement modes.

I went through the whole document today and made sure that what is there is now accurate. There were a few things that needed updating.

One thing that's a pain about Google Docs is that as I change my web page magnification, tabs which were marginally positioned shift in and out of original placement. I might have to resort to spaces and a monospace font in those areas.

Heater. · 2017-12-24 11:41

I think we had a debate here a while back about what a terrible idea TABs are.

I'm not familiar with Google Docs. Does it not have a "Don't mess with my formatting" option, like code tags on this forum or similar everywhere else?

cgracey · 2017-12-24 12:34

Heater. wrote: »

I think we had a debate here a while back about what a terrible idea TABs are.

I'm not familiar with Google Docs. Does it not have a "Don't mess with my formatting" option, like code tags on this forum or similar everywhere else?

The trouble is, as screen font sizes change with web-page magnification, they bump just below and above fixed tab positions, causing things to move wildly.

Heater. · 2017-12-24 12:45

Such are the wonders of HTML and CSS.

Using zoom often causes chaos in the layout. Even the likes of Google and MS haven't managed to get this working properly after all these years. And they don't just make the web pages, they make the frikken browsers as well!

jmg · 2017-12-24 18:55

cgracey wrote: »

I just realized, while working on the Google doc, that CLKSET is going to need to change to HUBSET, because it now handles five different things:

1) It sets the clock mode, like it always has.
2) It write-protects/enables the last 16KB of hub RAM.
3) It sets the four digital filtering modes that the smart pins use.
4) It seeds the Xoroshiro128+ PRNG in the hub.
5) Hardware reset.

Because this is a whole device config, why not P2CFG, or even CONFIG, as there is just one of them, and clearly it applies to P2....

If the digital filtering is global on all pins, when would anyone set a value slower than the fastest filtering choice?
