Ultimately, one layer change could make a different ROM and we could have a version of the P2 which could load from USB.
@garryj, I'm glad the CRCxxx instructions have helped things along. I'm really glad you implemented them into your work.
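For readers wondering what CRC work looks like in a USB context: USB data packets carry a CRC-16 that is defined bit by bit, LSB first. Below is a minimal bit-serial sketch in Python of CRC-16/USB (poly 0x8005, reflected form 0xA001, init 0xFFFF, final XOR 0xFFFF). It illustrates the kind of per-bit work that CRC instructions can speed up. It is an assumption that garryj uses them this way, and this is not a model of the P2 CRCxxx instructions themselves, whose operands are not described here.

```python
def crc16_usb(data: bytes) -> int:
    """Bit-serial CRC-16/USB over a byte string."""
    crc = 0xFFFF  # initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Reflected (LSB-first) shift, matching USB's LSB-first bit order
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc ^ 0xFFFF  # final inversion


print(hex(crc16_usb(b"123456789")))  # standard check string
```

The check string "123456789" is the usual way to confirm a CRC variant; for CRC-16/USB it yields 0xB4C8.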
@msrobots, yes, late changes can be dangerous. We had the kick-off meeting with OnSemi today. The lady I'm working with there who knows the whole flow says that the initial months are spent establishing the scripts which drive the various tools. During that time, the Verilog can be changed often, and there is an expectation that it will be, due to pressures from timing closure and other things. We just need to make sure that what we've got is working correctly now. The more playing around people can do with the current design, the safer the outcome.
Gosh, if we could get soft USB running out of the boot rom, we could save the whole FTDI problem.
I didn't quite mean that; the small-MCU boot example was really meant to show how widespread USB is in the MCU space.
The P2 ROM is too small to hold USB loader, but a 2nd stage loader could offer USB.
I haven't seen large-data-block download numbers for P2-USB yet.
A cheaper, more flexible P2-USB solution than 'the whole FTDI problem' is the new EFM8UB3, which has 40K flash and costs under 90 cents.
It could provide a useful P2 Debug interface, plus Power Monitoring / Metering support.
This looks interesting, but sadly it's 5V.
But I agree with you that using a small cheap MCU like that instead of flash or SD might be a quite good solution sometimes.
Slowly the P2 is taking its final shape, and thanks to all the FPGA-enabled testers, bugs are found and eliminated. I wish I could play around with PASM2 too, but I'm just a code monkey; FPGAs are way out of my comfort zone. So I can only enjoy reading snippets from ozpropdev, garryj, Chip and rjo__, running their code in my head and hoping to understand what it is doing.
I think single pin-boot is still in the boot-rom, and that means small MCUs are the lowest pin impact way to boot a P2, if every last pin matters.
The EFM8UB3 is a step up from the smallest MCUs, in that it includes USB hardware, so it can support MHz-speed links.
It can swallow the reset components, and add Power Monitoring, and include some P2 firmware for things like serious P2-Debug.
There is even space to include support for a Si5351A clock generator, to give real P2 development flexibility. Plenty of small and simple things can be swept into such a part to make a P2-Eval board more compact, lower in price, and better performing.
Most MCU vendors now use an MCU as their Eval Board USB Bridge, instead of a simple FTDI part.
Often, that USB bridge includes a user UART and a Debug channel, with Power Monitoring & reset control.
There are a couple of inconsistencies in the instruction set and a few errors in the spreadsheet (v30a). In no particular order:
1. SCLU/SCL is the only unsigned/signed pair with extra U in the mnemonic for unsigned. All the others have extra S for signed.
2. ANDN and TESTN are the only negated instructions that come before the non-negated version, breaking the rule that applies to SUMx, TESTBx, BITx, MUXx, etc., etc., including MOV/NOT (latter is MOVN) where the only opcode bit that is different is 1 for the negated version. To be consistent ANDN/AND and TESTN/TEST should be swapped.
3. MODCZ looks out of place and would be better immediately after RCZR & RCZL, as all three are {WC/WZ/WCZ} instructions and WRx are not. (Single bit opcode change.)
4. CLKSET has C bit in the opcode in the spreadsheet, 0 in text file.
5. C and Z not changed to bits 31 and 30 in various JMP/CALL/RET/POP in spreadsheet.
7. WMLONG is described in documentation as not writing $00 bytes but in spreadsheet as not writing $FF.
8. Is WMLONG adequate or will we regret not also having MUXBYTS?
MUXBYTS D,{#}S For each non-zero byte in S, copy that byte into the corresponding D byte, else leave that D byte the same.
This was last in v19 and replaced in v20 by MUXQ. There are currently two unused D/#,S/# slots, which would change to one D,S/# and one D/#,S/# free if MUXBYTS were restored. 640 x 480 x 8bpp uses 300KB and would fit in the hub RAM nicely. Would it be best to have both WMLONG and MUXBYTS for sprites and cursors with transparent pixels?
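To make the proposed MUXBYTS semantics concrete, here is a minimal Python sketch on 32-bit values. It treats a $00 byte in S as a transparent 8bpp pixel, which is the sprite/cursor use case raised above. The function name `muxbyts` simply mirrors the proposed mnemonic and is not an existing API. The frame-size arithmetic from the post is included as a check.

```python
def muxbyts(d: int, s: int) -> int:
    """Proposed MUXBYTS D,S: each non-zero byte of S replaces the
    corresponding byte of D; zero bytes of S leave that D byte unchanged."""
    result = d & 0xFFFFFFFF
    for i in range(4):
        shift = 8 * i
        byte = (s >> shift) & 0xFF
        if byte != 0:  # $00 = transparent pixel, keep D's byte
            result = (result & ~(0xFF << shift)) | (byte << shift)
    return result & 0xFFFFFFFF


# 640 x 480 pixels at 8 bits per pixel
FRAME_BYTES = 640 * 480 * 8 // 8  # 307,200 bytes
FRAME_KB = FRAME_BYTES / 1024     # 300.0 KB
```

For example, `muxbyts(0x11223344, 0x00AA00BB)` gives `0x11AA33BB`: bytes 0 and 2 of S are opaque, bytes 1 and 3 are transparent.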
I don't know how video outputting works on the P2, for everything from pins, DACs, LUTs and streamers. Will video data be in a line buffer in cog RAM or a frame buffer in hub RAM or both? A short explanation would be helpful.
TonyB_, thank you for all the input. When I am home, I will go through the spreadsheet and straighten out everything. I really appreciate you looking so carefully at everything.
Can anyone give a brief explanation of how a USB memory stick will differ in communication from an SD card?
I know the USB protocol will be on top of whatever memory protocol exists, but I'm wondering what it might look like. USB memory sticks, in some ways, are a lot more practical than SD cards.
Perhaps there is already some important work done and available:
Micah Dowty created a USB-interface object for the Prop1, including a simple Mass Storage Interface to access the content of USB memory sticks: http://obex.parallax.com/object/640
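On the question of how a USB stick differs from an SD card: most USB sticks are Mass Storage Class devices using Bulk-Only Transport. The host wraps a SCSI command (for example READ(10)) in a 31-byte Command Block Wrapper (CBW) sent on a bulk OUT endpoint. The sector data then arrives on a bulk IN endpoint, followed by a 13-byte status wrapper. An SD card in SPI mode instead takes its own 6-byte commands (such as CMD17 to read a block) directly, with no SCSI layer. The sketch below only builds the CBW; the function name `read10_cbw` is hypothetical and is not taken from Micah Dowty's object.

```python
import struct

CBW_SIGNATURE = 0x43425355  # bytes "USBC" when packed little-endian


def read10_cbw(tag: int, lba: int, blocks: int,
               block_size: int = 512, lun: int = 0) -> bytes:
    """Build a 31-byte Bulk-Only Transport CBW carrying SCSI READ(10)."""
    # READ(10) CDB: opcode 0x28, flags, big-endian 32-bit LBA,
    # group number, big-endian 16-bit block count, control byte.
    cdb = struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)
    cdb = cdb.ljust(16, b"\x00")  # CBWCB field is always 16 bytes
    return struct.pack(
        "<IIIBBB16s",  # CBW header fields are little-endian
        CBW_SIGNATURE,
        tag,                  # echoed back by the device in its status
        blocks * block_size,  # dCBWDataTransferLength
        0x80,                 # bmCBWFlags: data-in (device to host)
        lun,
        10,                   # bCBWCBLength: READ(10) is 10 bytes
        cdb,
    )
```

Note the mixed byte order: the CBW header is little-endian like the rest of USB, while the SCSI command inside it is big-endian.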
There are a couple of inconsistencies in the instruction set and a few errors in the spreadsheet (v30a). In no particular order:
1. SCLU/SCL is the only unsigned/signed pair with extra U in the mnemonic for unsigned. All the others have extra S for signed.
SCLS seemed weird, no vowels. I see what you are saying, but I'm not sure what would look better. Note that SCL returns a different bit-range of the product than SCLU does.
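Why different bit-ranges matter: a signed scaled multiply usually keeps a different slice of the product so that a convenient constant means 1.0. The exact ranges for SCLU/SCL are not given in this thread. The sketch below assumes 16x16-bit operands, with the unsigned form keeping the product shifted right by 16 and the signed form shifted right by 14 (so $4000 would represent 1.0). Treat both shift amounts, and the helper names, as illustrative assumptions.

```python
def to_s16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def scale_unsigned(d: int, s: int) -> int:
    # Assumed unsigned form: (D[15:0] * S[15:0]) >> 16, so $8000 ~ 0.5
    return ((d & 0xFFFF) * (s & 0xFFFF)) >> 16


def scale_signed(d: int, s: int) -> int:
    # Assumed signed form: (D[15:0] * S[15:0]) >> 14, so $4000 = 1.0.
    # Python's >> on negative ints is an arithmetic (floor) shift.
    return (to_s16(d) * to_s16(s)) >> 14
```

Under these assumptions `scale_signed(0xC000, 1000)` is -1000, which is why a single "U vs. S" suffix does not fully describe the pair.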
2. ANDN and TESTN are the only negated instructions that come before the non-negated version, breaking the rule that applies to SUMx, TESTBx, BITx, MUXx, etc., etc., including MOV/NOT (latter is MOVN) where the only opcode bit that is different is 1 for the negated version. To be consistent ANDN/AND and TESTN/TEST should be swapped.
This is tough because what I saw as most important was that AND, OR, and XOR be in sequence. So, I put ANDN in front of them all, since it was closely related to AND.
3. MODCZ looks out of place and would be better immediately after RCZR & RCZL, as all three are {WC/WZ/WCZ} instructions and WRx are not. (Single bit opcode change.)
I don't like it so much, either, but I'm not sure where else to put it. It's the only instruction in that SPLITB..WRNZ block where L=1. It could go anywhere in there. I figured it was good being the caboose.
4. CLKSET has C bit in the opcode in the spreadsheet, 0 in text file.
I fixed that.
5. C and Z not changed to bits 31 and 30 in various JMP/CALL/RET/POP in spreadsheet.
7. WMLONG is described in documentation as not writing $00 bytes but in spreadsheet as not writing $FF.
Fixed.
8. Is WMLONG adequate or will we regret not also having MUXBYTS?
MUXBYTS D,{#}S For each non-zero byte in S, copy that byte into the corresponding D byte, else leave that D byte the same.
This was last in v19 and replaced in v20 by MUXQ. There are currently two unused D/#,S/# slots, which would change to one D,S/# and one D/#,S/# free if MUXBYTS were restored. 640 x 480 x 8bpp uses 300KB and would fit in the hub RAM nicely. Would it be best to have both WMLONG and MUXBYTS for sprites and cursors with transparent pixels?
I don't know how video outputting works on the P2, for everything from pins, DACs, LUTs and streamers. Will video data be in a line buffer in cog RAM or a frame buffer in hub RAM or both? A short explanation would be helpful.
With SETQ+WMLONG, you can do this operation from cog to hub at one long per clock WITH random byte alignment. I figured that would be the preferred method for something like 'MUXBYTS'.
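A rough software model of the SETQ+WMLONG idea: write a run of longs to byte-addressed hub memory starting at any byte address, with $00 bytes left unwritten so they act as transparent pixels. The documentation and spreadsheet disagreed on $00 vs. $FF (item 7); this sketch assumes the documented $00 behavior and little-endian byte order. `wmlong_block` is a hypothetical name, not P2 tooling.

```python
def wmlong_block(hub: bytearray, addr: int, longs: list[int]) -> None:
    """Write 32-bit longs to hub memory at any byte alignment,
    skipping $00 bytes (assumed transparent)."""
    for i, value in enumerate(longs):
        for b in range(4):
            byte = (value >> (8 * b)) & 0xFF  # little-endian byte order
            if byte != 0:
                hub[addr + 4 * i + b] = byte
```

Writing `[0x00AA00BB, 0x11000000]` at address 3 into a buffer of $EE bytes changes only bytes 3, 5 and 10. That is the masked, unaligned blit MUXBYTS would otherwise be used for.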
In summary:
SCLU/SCL could get different names.
ANDN/TESTN are okay, I think.
MODCZ: no better ideas here, yet.
WMLONG is probably sufficient. If we had lower granularity than bytes in hub memory, we could read/write nibbles, twits, and bits. That would really help with graphics. We could have WMLONG not just for bytes, but all the rest, too, and at any nibble/twit/bit offset.
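To illustrate what sub-byte granularity would add, here is a hypothetical masked write of 1-, 2-, 4- or 8-bit fields at any field offset, with zero fields skipped as transparent. Fields are assumed to be packed LSB-first within each byte. Nothing here describes actual P2 hardware, and `write_fields` is an illustrative name.

```python
def write_fields(mem: bytearray, field_index: int, width: int,
                 values: list[int]) -> None:
    """Store values as consecutive width-bit fields starting at field
    field_index, leaving memory unchanged where a value is zero."""
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8")
    mask = (1 << width) - 1
    per_byte = 8 // width
    for n, v in enumerate(values):
        v &= mask
        if v == 0:
            continue  # transparent field
        byte_addr, slot = divmod(field_index + n, per_byte)
        shift = slot * width  # LSB-first packing within the byte
        mem[byte_addr] = (mem[byte_addr] & ~(mask << shift) & 0xFF) | (v << shift)
```

With `width=4` this plots 16-colour pixels into a nibble-packed frame buffer without a separate read-modify-write loop in the caller.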
TonyB_, thanks for studying all this stuff and pointing out all these things.
Chip, thanks for doing the fixes.
SCLU/SCL:
New names are not an urgent issue at the moment.
MODCZ:
Has CZ1 in opcode and the other three CZ1 instructions are all paired with CZ0's, therefore for consistency RCZR or RCZL could partner MODCZ but RCZL is perhaps the best choice (assuming another CZ1 not required in the future).
Now:
EEEE 1101011 CZ0 DDDDDDDDD 001101010 RCZR D {WC/WZ/WCZ}
EEEE 1101011 CZ0 DDDDDDDDD 001101011 RCZL D {WC/WZ/WCZ}
EEEE 1101011 000 DDDDDDDDD 001101100 WRC D
EEEE 1101011 000 DDDDDDDDD 001101101 WRNC D
EEEE 1101011 000 DDDDDDDDD 001101110 WRZ D
EEEE 1101011 000 DDDDDDDDD 001101111 WRNZ D
EEEE 1101011 CZ1 0cccczzzz 001101111 MODCZ c,z {WC/WZ/WCZ}
Proposed:
EEEE 1101011 CZ0 DDDDDDDDD 001101010 RCZR D {WC/WZ/WCZ}
EEEE 1101011 CZ0 DDDDDDDDD 001101011 RCZL D {WC/WZ/WCZ}
EEEE 1101011 CZ1 0cccczzzz 001101011 MODCZ c,z {WC/WZ/WCZ}
EEEE 1101011 000 DDDDDDDDD 001101100 WRC D
EEEE 1101011 000 DDDDDDDDD 001101101 WRNC D
EEEE 1101011 000 DDDDDDDDD 001101110 WRZ D
EEEE 1101011 000 DDDDDDDDD 001101111 WRNZ D
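The opcode lines above read MSB first as EEEE condition (4 bits), a 7-bit opcode, three flag bits (C, Z and the third bit, shown as I elsewhere and called L by Chip for this block), then 9-bit D and 9-bit S. The sketch below packs those fields to show that, in the proposal, RCZL and MODCZ share the same opcode and S field and differ only in that third bit. The helper names are hypothetical, and the condition value 0xF is arbitrary.

```python
def encode(cond: int, opcode7: int, c: int, z: int, i: int,
           d: int, s: int) -> int:
    """Pack EEEE OOOOOOO CZI DDDDDDDDD SSSSSSSSS into a 32-bit word."""
    assert 0 <= cond < 16 and 0 <= opcode7 < 128
    assert c in (0, 1) and z in (0, 1) and i in (0, 1)
    assert 0 <= d < 512 and 0 <= s < 512
    return ((cond << 28) | (opcode7 << 21) | (c << 20) | (z << 19)
            | (i << 18) | (d << 9) | s)


def modcz_d(cccc: int, zzzz: int) -> int:
    """MODCZ's D field is 0cccczzzz: two 4-bit flag-modify codes."""
    return ((cccc & 0xF) << 4) | (zzzz & 0xF)


# Proposed layout: RCZL and MODCZ both use opcode 1101011, S = 001101011
rczl = encode(0xF, 0b1101011, 1, 1, 0, 0, 0b001101011)
modcz = encode(0xF, 0b1101011, 1, 1, 1, modcz_d(0, 0), 0b001101011)
```

With D zeroed in both, `rczl ^ modcz` is exactly bit 18, the single-bit difference the pairing relies on.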
ANDN/TESTN:
I plan to have a go at writing my own assembler and these stick out like sore thumbs to me. More importantly, the instruction decoding is not as simple as it could be if AND/ANDN and TEST/TESTN agreed with MOV/NOT (and the other ABC/ABNC type instructions).
EEEE 0101000 CZI DDDDDDDDD SSSSSSSSS ****** ANDN D,S/# {WC/WZ/WCZ} D = D & !S
EEEE 0101001 CZI DDDDDDDDD SSSSSSSSS ****** AND D,S/# {WC/WZ/WCZ} D = D & S
EEEE 0101010 CZI DDDDDDDDD SSSSSSSSS OR D,S/# {WC/WZ/WCZ}
EEEE 0101011 CZI DDDDDDDDD SSSSSSSSS XOR D,S/# {WC/WZ/WCZ}
EEEE 0110000 CZI DDDDDDDDD SSSSSSSSS MOV D,S/# {WC/WZ/WCZ} D = S
EEEE 0110001 CZI DDDDDDDDD SSSSSSSSS NOT D,S/# {WC/WZ/WCZ} D = !S
EEEE 0111110 CZI DDDDDDDDD SSSSSSSSS ****** TESTN D,S/# {WC/WZ/WCZ} Test D with !S
EEEE 0111111 CZI DDDDDDDDD SSSSSSSSS ****** TEST D,S/# {WC/WZ/WCZ} Test D with S
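A small decoder sketch shows why the swap simplifies decoding. If AND/ANDN and TEST/TESTN followed MOV/NOT, the upper six opcode bits would pick the operation and the lowest bit would always mean "invert S first". The opcode values below assume the proposed swapped order; the table and `execute` function are hypothetical models, not the P2's decoder.

```python
MASK32 = 0xFFFFFFFF

# Keyed by the upper six bits of the 7-bit opcode (proposed ordering)
BASE_OPS = {
    0b010100: lambda d, s: d & s,  # AND (0101000) / ANDN (0101001)
    0b011000: lambda d, s: s,      # MOV (0110000) / NOT (0110001)
    0b011111: lambda d, s: d & s,  # TEST (0111110) / TESTN (0111111), flags only
}


def execute(opcode7: int, d: int, s: int) -> int:
    """Lowest opcode bit set means the negated form: S is inverted first."""
    op = BASE_OPS[opcode7 >> 1]
    if opcode7 & 1:
        s = ~s & MASK32
    return op(d, s) & MASK32
```

With the current ordering, the AND and TEST pairs would need the opposite polarity on that bit, so the decoder would need a per-instruction exception.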
_AND and tos1,tos
jmp #DROP
_ANDN andn tos1,tos
jmp #DROP
_OR or tos1,tos
jmp #DROP
_XOR xor tos1,tos
jmp #DROP
byte c,"AND",t
word _AND
byte c,"ANDN",t
word _ANDN
byte c,"OR",t
word _OR
byte c,"XOR",t
word _XOR
At least two of us think AND should come before ANDN!
I'm treating the P2 as my only opportunity to help design a microcontroller. I can't do any physical testing but I can study the instruction set. It's nice to have instructions grouped by type but users won't care about the actual opcodes and there are some other instructions where the encoding/decoding could be simplified, I think.
Basically, the whole signal is streamed. One can feed the streamer from buffers in the HUB. Or, one can skip that, and bit bang the DACs or both! (Not recommended, but I'm sure someone will.)
There is a color transform engine that maps values to the necessary signals. It works with output from the streamer.
Rendering is basically from line RAM. One can make bitmaps, tiles, sprites by processing that line RAM ahead of the raster.
A video COG will set up its NCO, configure the pins and the streamer for video/signal use, and then run a loop that outputs the frame. Doing that has very low impact.
Lots of us call this the kernel. An interrupt driven kernel leaves the COG mostly free. That time can be used to prep data for the streamer.
A couple of P2 COGs working together can do a lot! Tiles, characters and sprites all get assembled in software into line RAM, buffered for a couple, maybe a few, scan lines.
It's pretty generic. The streamer can take line RAM and output composite, S-Video, component, RGB, and more. The color engine means we have color consistency across formats too. Very cool.
The pixel mixer can blend alpha channel defined pixels into line RAM too.
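As a software model of "rendering from line RAM ahead of the raster", the sketch below builds one scan line from a background colour plus sprites, alpha-blending each sprite pixel into the line buffer. It uses 8-bit grey levels and a per-sprite opacity for simplicity. The P2's pixel mixer and colour engine do this in hardware with their own formats, none of which is modeled here, and `Sprite`/`render_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    x: int
    y: int
    pixels: list[list[int]]  # rows of 8-bit grey levels
    alpha: int               # 0..255 opacity for the whole sprite


def blend(dst: int, src: int, alpha: int) -> int:
    """Rounded integer blend: (src*alpha + dst*(255-alpha)) / 255."""
    return (src * alpha + dst * (255 - alpha) + 127) // 255


def render_line(y: int, width: int, background: int,
                sprites: list[Sprite]) -> list[int]:
    """Build scan line y in a line buffer before the streamer needs it."""
    line = [background] * width
    for spr in sprites:
        row = y - spr.y
        if 0 <= row < len(spr.pixels):
            for i, px in enumerate(spr.pixels[row]):
                x = spr.x + i
                if 0 <= x < width:  # clip to the visible line
                    line[x] = blend(line[x], px, spr.alpha)
    return line
```

A cog running this a few lines ahead of the raster keeps the streamer fed, which is the division of labour described above.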
You just reminded me to check S-Video. It hasn't been tested, but the others have, though we need more testing on them, particularly component.
Okay. I agree about AND/ANDN and TEST/TESTN. I also just realized that SETPIV/SETPIX should be ordered as such. I think your MODCZ proposal is half-way, only. Are there some D-only instructions elsewhere that we could exploit? There's got to be a better way, yet. And the SCL/SCLU situation is difficult. These fixes are purely cosmetic, but I agree.
I just realized, while working on the Google doc, that CLKSET is going to need to change to HUBSET, because it now handles five different things:
1) It sets the clock mode, like it always has.
2) It write-protects/enables the last 16KB of hub RAM.
3) It sets the four digital filtering modes that the smart pins use.
4) It seeds the Xoroshiro128+ PRNG in the hub.
5) Hardware reset.
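For reference, here is the published xoroshiro128+ step (Blackman and Vigna) in Python, using the 2018 shift/rotate constants a=24, b=16, c=37. It shows what "seeding" means: loading the two 64-bit state words, which must not both be zero. The P2 hub's generator may differ in width, constants or output function; this thread only names Xoroshiro128+, so treat this as a sketch of the algorithm family, not the chip.

```python
MASK64 = (1 << 64) - 1


def rotl(x: int, k: int) -> int:
    """64-bit rotate left."""
    return ((x << k) | (x >> (64 - k))) & MASK64


class Xoroshiro128Plus:
    def __init__(self, s0: int, s1: int):
        # Seeding = loading the 128-bit state; all-zero is a fixed point
        if (s0 | s1) & MASK64 == 0:
            raise ValueError("state must not be all zero")
        self.s0, self.s1 = s0 & MASK64, s1 & MASK64

    def next(self) -> int:
        s0, s1 = self.s0, self.s1
        result = (s0 + s1) & MASK64  # the "+" output function
        s1 ^= s0
        self.s0 = rotl(s0, 24) ^ s1 ^ ((s1 << 16) & MASK64)
        self.s1 = rotl(s1, 37)
        return result
```

Seeded with (1, 2), the first output is 3 and the second is 0x6001030003.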
Right now, I'm working on getting the documentation caught up to the current version of the chip. There are only a few key things left to cover - the digital filtering setup, the hub PRNG seeding, and the new smart pin measurement modes.
I went through the whole document today and made sure that what is there is now accurate. There were a few things that needed updating.
One thing that's a pain about Google Docs is that as I change my web page magnification, tabs which were marginally positioned shift in and out of original placement. I might have to resort to spaces and a monospace font in those areas.
I think we had a debate here about what a terrible idea TABs are a while back
I'm not familiar with Google Docs. Does it really not have a "Don't mess with my formatting" option, like code tags on this forum or similar features everywhere else?
The trouble is, as screen font sizes change with web-page magnification, they bump just below and above fixed tab positions, causing things to move wildly.
Using zoom often causes chaos in the layout. Even the likes of Google and MS haven't managed to get this working properly after all these years. And they don't just make the web pages, they make the frikken browsers as well!
Because CLKSET (soon HUBSET) is a whole-device config, why not P2CFG, or even CONFIG, as there is just one of them, and it clearly applies to the P2?
If the digital filtering is global to all pins, when would anyone set a value slower than the fastest filtering choice?
fascinating,
Mike
All V30a images flashed and running Ok.
Thanks for checking all those.
On my todo list is a demo driver that will read USB thumb drive data using bulk transfer, which has a max packet size of 64 bytes.
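Some quick arithmetic on what a 64-byte bulk packet size means: one 512-byte sector needs eight full packets, and a transfer ends on a short packet. If the length is an exact multiple of the packet size and the receiver isn't expecting a known length, a zero-length packet (ZLP) marks the end. The `split_bulk` helper is a hypothetical illustration, not part of garryj's driver.

```python
def split_bulk(data: bytes, max_packet: int = 64,
               add_zlp: bool = False) -> list[bytes]:
    """Split one bulk transfer into max_packet-sized packets."""
    packets = [data[i:i + max_packet] for i in range(0, len(data), max_packet)]
    # Optional zero-length packet when the final packet is full-sized
    if add_zlp and len(data) % max_packet == 0:
        packets.append(b"")
    return packets
```

For mass storage the host already knows the expected length from the command wrapper, so a ZLP is typically not needed for whole-sector reads.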
This would be a big deal, going from HID to MSC. That will be quite a test on the USB smart pin.
I have a ton of thumb drives around, but do I have the one you are using?
If I need to order one, I should have ordered it last month:)
6. TJV description in spreadsheet should include C, as mentioned earlier:
http://forums.parallax.com/discussion/comment/1427661/#Comment_1427661
Rather but not completely superfluous?
Great!
http://ww1.microchip.com/downloads/en/AppNotes/01145b.pdf
http://ww1.microchip.com/downloads/en/AppNotes/USB_Host_Stack_01140a.pdf
They have a software library to give USB Host support.
Even with that, it still looks complicated...
see also:
http://forums.parallax.com/discussion/121321
Luckily, it looks like @jmg and Heater. were among the experts in that thread.
janaxelson.com/usbc.htm
janaxelson.com/usbms.htm
Hope they help.
Henrique
I fixed that.
Fixed those.
Fixed it.
Fixed.
That should certainly make it easier...
Here are a couple of excerpts from taqoz.spin2 by Peter Jakacki
http://forums.parallax.com/discussion/comment/1427829/#Comment_1427829