Seairth pointed out that Chip's example of SETR is documented as SETI, and he followed that with a preference for the SETR mnemonic.
I was pondering what the best name really is, since this SETI instruction is not really operating on the whole instruction the way ALTI does. And as you highlighted, SETR is not a suitable mnemonic either, since R implies the result register post-execution.
It depends on the final use of the register.
If you intend to modify an instruction's opcode and/or CZ effect bits, then "I" is relevant.
On the other hand, if you intend to use the register to configure the ALTI instruction's R field, then "R" is relevant.
In the end they are the same 9 bits.
Oi! You said R = Result register. And that's good right there; don't go adding more meanings. Besides, the single-bit "R" field no longer exists in the instructions.
This picks up 16 longs in cog RAM starting at 'w', XORs them with 'opad' ($5C5C5C5C), then writes them starting at 'opad_key'. It uses the ALTI instruction to indirect and increment pointers for both the D register ('w') and the result register ('opad_key').
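The loop described above is the opad step of HMAC-SHA-256: 16 longs are exactly one 64-byte SHA-256 block. As an off-chip cross-check, here is a minimal Python model (not the P2 code) that builds both padded keys from 16 longs and compares the result with Python's own `hmac` module. The inner pad $36363636 is the standard HMAC ipad and is an assumption here, since only 'opad' is mentioned; so is the little-endian packing of the longs. The helper names are hypothetical.

```python
import hashlib
import hmac
import struct

OPAD_LONG = 0x5C5C5C5C  # 'opad' value from the post
IPAD_LONG = 0x36363636  # standard HMAC inner pad (assumed, not shown in the post)

def xor_block(key_longs, pad_long):
    """Model of the ALTI loop: XOR 16 longs (one 64-byte SHA-256 block) with a pad."""
    if len(key_longs) != 16:
        raise ValueError("expected 16 longs")
    return [k ^ pad_long for k in key_longs]

def longs_to_bytes(longs):
    """Pack longs the way they would sit in memory: little-endian, LSB first."""
    return struct.pack("<16I", *longs)

def hmac_sha256_from_longs(key_longs, message):
    """HMAC-SHA-256 = SHA256(opad_key || SHA256(ipad_key || message))."""
    ipad_key = longs_to_bytes(xor_block(key_longs, IPAD_LONG))
    opad_key = longs_to_bytes(xor_block(key_longs, OPAD_LONG))
    inner = hashlib.sha256(ipad_key + message).digest()
    return hashlib.sha256(opad_key + inner).digest()

# Cross-check against Python's HMAC with a 32-byte key zero-padded to 64 bytes.
key = bytes(range(32))
key_longs = list(struct.unpack("<16I", key.ljust(64, b"\x00")))
assert hmac_sha256_from_longs(key_longs, b"loader") == hmac.new(key, b"loader", hashlib.sha256).digest()
```

A known-good model like this can supply test vectors when chasing a bug in an on-chip SHA-256/HMAC routine.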
Your documentation states SETI, not SETR. Personally, I think SETR is more appropriate.
It depends on the final use of the register.
If you intend to modify an instruction's opcode and/or CZ effect bits, then "I" is relevant.
On the other hand, if you intend to use the register to configure the ALTI instruction's R field, then "R" is relevant.
In the end they are the same 9 bits.
This is true. We almost need different names for different uses, even though bits 27..19 are always the ones being affected. SETR is going to be the most common use for setting those bits.
SETS sets sssssssss (b8:0) of the instruction/register pointed to by D to S/#
SETD sets ddddddddd (b17:9) of the instruction/register pointed to by D to S/#
SETI sets iiiiiii_cz (b27:19) of the instruction/register pointed to by D to S/#
Note: If the destination is an instruction to be executed, then at least 2 instructions (4 clocks) must pass before the modified instruction can be executed due to the instruction pipeline.
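To make the bit ranges concrete, here is a small Python model of the three SET instructions. It treats the target purely as a 32-bit value and ignores the pipeline timing in the note; the function names mirror the mnemonics but are otherwise hypothetical.

```python
FIELD_MASK = 0x1FF  # all three fields are 9 bits wide
S_SHIFT = 0         # sssssssss, bits 8:0
D_SHIFT = 9         # ddddddddd, bits 17:9
R_SHIFT = 19        # iiiiiii_cz, bits 27:19 (SETI / SETR)

def set_field(long_value, shift, value):
    """Replace one 9-bit field in a 32-bit long, leaving every other bit untouched."""
    cleared = long_value & ~(FIELD_MASK << shift) & 0xFFFFFFFF
    return cleared | ((value & FIELD_MASK) << shift)

def sets(d, s):
    return set_field(d, S_SHIFT, s)

def setd(d, s):
    return set_field(d, D_SHIFT, s)

def setr(d, s):
    return set_field(d, R_SHIFT, s)
```

All three are the same operation aimed at a different 9-bit window, which is why the naming debate is only about intent, not mechanics.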
ALTS alters the sssssssss (S8:0) field of the following instruction to the value of D plus S/#
ALTD alters the ddddddddd (D17:9) field of the following instruction to the value of D plus S/#
ALTR alters the rrrrrrrrr (result) field (the register address where the following instruction's result will be written) to the value of D plus S/#
Warning: The values of D and S/# are limited to register addresses (i.e. 9 bits); anything that overflows 9 bits fails!
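A rough Python model of how ALTR redirects a result, assuming a 512-long cog RAM held in a list and reading "the value of D" as the contents of register D. Because the warning only says that overflow fails, the model raises an error rather than guessing at the hardware's behavior. The helper names are made up for illustration.

```python
COG_REGS = 512  # cog register space is 9 bits wide

def alt_address(d_value, s_value):
    """Address that ALTS/ALTD/ALTR substitute into the next instruction: D + S/#.

    The sum must stay within 9 bits, so this model refuses to go past $1FF
    instead of guessing what the hardware does.
    """
    addr = d_value + s_value
    if addr >= COG_REGS:
        raise ValueError(f"D + S = ${addr:X} does not fit in 9 bits")
    return addr

def altr_then_add(cog, altr_d, altr_s, add_d, add_s):
    """Model of ALTR altr_d,#altr_s followed by ADD add_d,add_s.

    ADD still reads add_d and add_s, but its sum lands in the register chosen
    by ALTR, so add_d is left unchanged: reg1 = reg2 + reg3.
    """
    result_reg = alt_address(cog[altr_d], altr_s)
    cog[result_reg] = (cog[add_d] + cog[add_s]) & 0xFFFFFFFF
    return result_reg
```

For example, with a pointer $20 in register $10, `altr_then_add(cog, 0x10, 0, 0x30, 0x31)` writes `cog[0x30] + cog[0x31]` into register $20.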
ALTISD alters the I/S/D fields of the following instruction according to the value of S/#
ALTISD is currently ALTI but it is more than that.
What happens with the next instruction if the S and/or D fields are not immediate??? I found that I needed to program MOV x,#0 for the ALTS to work properly.
Part of the issue here is that we are using "instruction" to sometimes talk about the entire 32-bit value (e.g. ALTI) and sometimes about iiiiiiicz (e.g. SETI). Renaming SETI to SETR simplifies that somewhat. To reinforce this, it might make more sense to refer to the "iiiiiii" field as an opcode ("ooooooo"):
cccc_ooooooo_czi_ddddddddd_sssssssss
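With that layout, the 9 bits that SETI/SETR write (bits 27..19) are simply the opcode plus the C and Z bits. A short Python decoder (a hypothetical helper; field names taken from the layout line above) makes that explicit, using the SETCZ encoding quoted later in the thread as a test value.

```python
def decode(instr):
    """Split a 32-bit long into the cccc_ooooooo_czi_ddddddddd_sssssssss fields."""
    return {
        "cond":   (instr >> 28) & 0xF,
        "opcode": (instr >> 21) & 0x7F,
        "c":      (instr >> 20) & 1,
        "z":      (instr >> 19) & 1,
        "i":      (instr >> 18) & 1,
        "d":      (instr >> 9) & 0x1FF,
        "s":      instr & 0x1FF,
    }

def r_field(instr):
    """The 9 bits that SETI/SETR replace: ooooooo_cz (bits 27..19)."""
    return (instr >> 19) & 0x1FF

# SETCZ with cond=%1111, C=1, Z=1, I(L)=1, D=5
setcz_instr = (0xF << 28) | (0b1101011 << 21) | (1 << 20) | (1 << 19) | (1 << 18) | (5 << 9) | 0b000101001
fields = decode(setcz_instr)
```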
I'm also fine with having dual-opcodes aka 64b opcode generation, if that makes the ASM easier to read.
It is also ok to have multiple names for the same binary opcode, if that also makes intent clearer.
What happens with the next instruction if the S and/or D fields are not immediate??? I found that I needed to program MOV x,#0 for the ALTS to work properly.
That is correct.
If you want the following S field to be an immediate, you must already have defined it as ",#s".
I have used the SETCZ instruction a number of times now and I find the operation confusing with the Z flag being set if the bit=1.
This is opposite to the TESTB instruction which will set the Z flag if the bit=0, or other instructions which set the Z flag if the result is 0.
Should it be changed?
Also, it sure would be nice to have the reverse instruction SAVECZ D
which replaces D[1:0] with the C and !Z flag bits, i.e. D[31:2] remain untouched.
This could occupy the SETCZ D instruction slot with WC & WZ =00, or one of the spare opcode3 slots.
CCCC 1101011 CZ L DDDDDDDDD 000101001 SETCZ D/# {WC,WZ}
Sets the C & Z flags according to D[1:0] and WC, WZ
If WC is specified then C=D[1] (0=C cleared, 1=C set)
If WZ is specified then Z=D[0] (0=Z cleared, 1=Z set) (yes, 1 sets Zero flag)
If neither WC nor WZ is specified, then C & Z flags are not set/changed
If both WC and WZ are specified, then both C & Z flags are set/changed
I have verified the above with v10a.
Note to Chip
Might it be possible/easy, when neither WC nor WZ is specified, to have SETCZ write the contents of the C & Z flags into D[1:0] (i.e. the reverse of SETCZ D WC,WZ)? It could be called SAVECZ.
Unfortunately, I'm just not finding the time to do it right now... I had hoped to get the VSCode extension finished enough to help others write more P2 code, but that's been a much bigger effort than I was expecting. I also need to find time to update/re-run these unit tests.
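The SETCZ rules listed above, together with the proposed reverse operation, can be modeled in a few lines of Python. SAVECZ does not exist; `savecz` below is a hypothetical model of the suggestion (C into D[1], Z into D[0], D[31:2] preserved), and the point is that it round-trips with SETCZ D WC,WZ.

```python
def setcz(d, c, z, wc, wz):
    """SETCZ D {WC,WZ} as described: C=D[1] if WC, Z=D[0] if WZ (1 sets Z)."""
    if wc:
        c = bool(d & 0b10)
    if wz:
        z = bool(d & 0b01)
    return c, z

def savecz(d, c, z):
    """Hypothetical SAVECZ D: write C into D[1] and Z into D[0], keep D[31:2]."""
    return (d & ~0b11 & 0xFFFFFFFF) | (int(c) << 1) | int(z)

# Round trip: save the flags, scramble them, restore with SETCZ D WC,WZ.
for c in (False, True):
    for z in (False, True):
        saved = savecz(0xABCD0000, c, z)
        assert setcz(saved, not c, not z, True, True) == (c, z)
```

The earlier variant that stores !Z would only round-trip this way if SETCZ were changed so that a 0 bit sets Z.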
I'm still plugging along on my USB host, low-speed only at this time. I have six low-speed devices -- a mix of keyboards, mice and joysticks, and I'm able to read configuration info from all of them. I'm just starting on the HID driver to actually do something with the data, but it's been slow going :depressed:
I did some experimenting with full-speed, and it looks like it's going to be a real challenge to get something working reliably at 80MHz, but it should be doable when the P2's on 160MHz silicon.
Where are the issues with FS USB?
Hmm, that exposes something of a conundrum, as Full Speed should really be tested before passing to FAB, and it seems 80MHz is the upper ceiling?
Next choices could be 84MHz & 96MHz?
Note that 160MHz is purely an aspirational target, and the final device may come in well under that, so best to have some real margin in USB operation.
I've been testing P2 at every free moment since V1.
I'm currently finishing off a P2 demonstration that includes working examples of ALL P2 instructions.
It uses multiple cogs (8) and uses hubexec, lutexec, lut sharing, interrupts, streamers, cog attention, cordic etc. etc.
All of the instruction examples are working (necessary) parts of the complete demo.
I hope to have it and its documentation close to finalized next week.
Hmm, that exposes something of a conundrum, as Full Speed should really be tested before passing to FAB, and it seems 80MHz is the upper ceiling?
Next choices could be 84MHz & 96MHz?
Note that 160MHz is purely an aspirational target, and the final device may come in well under that, so best to have some real margin in USB operation.
It's very, very close, but it's a matter of too much to do and not enough time at 80MHz to git-er-done. I can reliably transmit 64-byte OUT data packets at full-speed, as there is less overhead than when receiving. But the tx routine is about a half-dozen clock cycles away from tipping over, i.e. add one or two instructions and you fall behind in feeding the output buffer, and the packet ends up being a bust.
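Some back-of-the-envelope arithmetic shows why the margin is so thin. Full-speed USB signals at 12 Mbit/s, and most P2 instructions take 2 clocks, so the instruction budget per bit and per byte scales directly with the system clock. This Python sketch ignores bit stuffing (an extra bit after six consecutive ones) and any hub or streamer overhead, so real budgets are somewhat tighter.

```python
FS_BIT_RATE = 12_000_000  # USB full-speed signalling rate, bits per second
CLOCKS_PER_INSTR = 2      # typical P2 instruction time in clocks

def budget(sysclk_hz):
    """Return (clocks per USB bit, 2-clock instructions per byte) at a given sysclk."""
    clocks_per_bit = sysclk_hz / FS_BIT_RATE
    instrs_per_byte = clocks_per_bit * 8 / CLOCKS_PER_INSTR
    return clocks_per_bit, instrs_per_byte

for mhz in (80, 84, 96, 160):
    cpb, ipb = budget(mhz * 1_000_000)
    print(f"{mhz:>3} MHz: {cpb:5.2f} clocks/bit, {ipb:5.1f} instructions/byte")
```

At 80MHz that is under 27 instructions per byte for shifting, NRZI/bit-stuff handling, buffering and CRC combined; 96MHz raises it to 32.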
On receive I've had success reading handshake response packets to SETUP transactions, but with data packets I can get only a few bytes read before falling behind the incoming data. I've pretty much run short of ideas on how to tighten up the routine and still do the right thing by running a CRC check on the data. I'm not the brightest bulb in the room when it comes to writing clever code, so there's likely someone that can get some demonstration code up and running at full-speed, as was done on the P1, but on that project there had to be a lot of corners cut to get it to work.
But I'm optimistic that it can be done in a robust manner with just a little more speed. 96MHz might get something workable, and if the silicon comes close to the 160MHz target that should provide more than enough headroom.
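For the CRC check mentioned above, USB data packets use CRC-16 with polynomial x^16 + x^15 + x^2 + 1, processed LSB first, seeded with all ones and inverted at the end. A bitwise Python reference like the one below is handy for generating test vectors to compare against a P2 routine; it is a model, not a proposal for the on-chip loop, and the helper names are hypothetical.

```python
def crc16_usb(data: bytes) -> int:
    """CRC-16 for USB data packets: poly 0x8005 in reflected form (0xA001),
    seed 0xFFFF, bits processed LSB first, result inverted."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc ^ 0xFFFF

def append_crc(payload: bytes) -> bytes:
    """Append the CRC low byte first, matching USB's LSB-first bit order."""
    crc = crc16_usb(payload)
    return payload + bytes((crc & 0xFF, crc >> 8))

def crc_ok(packet: bytes) -> bool:
    """Receiver-side check: recompute over the payload and compare."""
    payload, received = packet[:-2], int.from_bytes(packet[-2:], "little")
    return crc16_usb(payload) == received
```

A byte-at-a-time lookup table (256 words, for example parked in LUT RAM) is the usual way to cut the per-byte instruction count at the cost of memory; whether that fits this receive loop is untested.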
I've run into some unexpected behavior while trying to implement a simple timespan interrupt service routine that uses the CTx-equals-CT event. It happens when a WAITX is used in non-ISR code and the timespan is equal to, or greater than, the timespan of the ISR. When using a POLLCTx method in the non-ISR code, things work as expected. I didn't see WAITX mentioned in the doc section that lists the interrupt branch conditions. Here's my test code for the P2 1-2-3 A9 board, p2v10a image:
```
'
' Test of unexpected behavior of an interrupt service routine using
' the CTx-equals-CT event trigger.
' Test run on the Propeller 1-2-3 FPGA board with the P2v10a FPGA image.
'
con
        SYSCLOCK = 80_000_000
        _1ms = SYSCLOCK / 1_000
'------------------------------------------------------------------------------
dat
                org
'------------------------------------------------------------------------------
init
                setword dirb, ##$ffff, #0       ' Use P2 1-2-3 FPGA USER_LEDs for feedback
                mov     ijmp1, #isr1
                mov     pass, #1
pass2
                getct   ct1
                getct   ct2
                addct1  ct1, ##_1ms * 500       ' ISR routine holds steady one Hz blink cycle
                mov     wait1, ##_1ms * 200     ' Start non-ISR code with a faster blink rate
                addct2  ct2, wait1
                setint1 #1                      ' Set ISR event trigger to CTx-equals-CT
                setword outb, ##$f00f, #0       ' Observe what happens on the USER_LEDs
                mov     count, #0
                cmp     pass, #2 wz
        if_z    jmp     #pollx_blink
'------------------------------------------------------------------------------
' When WAITX is used in non-ISR code everything looks fine until the WAITX
' timespan matches or exceeds the timespan of the ISR routine. The LED
' blink rate becomes synchronized for a few cycles, then the ISR routine's
' blinking goes out to lunch.
'
' If the ISR routine uses GETCT when resetting the blink timespan, when the
' non-ISR code blink rate matches the ISR blink rate the ISR blink rate
' becomes synchronized with the non-ISR blink rate.
'------------------------------------------------------------------------------
waitx_blink
                waitx   wait1
                getnib  wtmp, outb, #0
                xor     wtmp, #$f
                setnib  outb, wtmp, #0
                add     count, #1
                cmp     count, #10 wz
        if_z    add     wait1, ##_1ms * 100     ' Start slowing the non-ISR routine's blink rate
        if_z    mov     count, #0
                cmp     wait1, ##_1ms * 800 wz
        if_z    add     pass, #1
        if_z    setword outb, ##$ffff, #0
        if_z    waitx   ##_1ms * 2_000
        if_z    jmp     #pass2
                jmp     #waitx_blink
'------------------------------------------------------------------------------
' When the non-ISR code uses a polling method, the behavior is as expected.
' The ISR blink rate remains static while the non-ISR code's blink rate
' continues to slow.
'------------------------------------------------------------------------------
pollx_blink
                pollct2 wc
        if_nc   jmp     #pollx_blink
                add     count, #1
                cmp     count, #10 wz
        if_z    add     wait1, ##_1ms * 100     ' Start slowing the non-ISR routine's blink rate
        if_z    mov     count, #0
                addct2  ct2, wait1
                getnib  wtmp, outb, #0
                xor     wtmp, #$f
                setnib  outb, wtmp, #0
                jmp     #pollx_blink
blink
                jmp     #waitx_blink
'------------------------------------------------------------------------------
' Simple ISR routine to blink USER_LEDs 15..12 at a steady one Hz rate.
'------------------------------------------------------------------------------
isr1
'               getct   ct1
                addct1  ct1, ##_1ms * 500
                getnib  ctmp, outb, #3
                xor     ctmp, #$f
                setnib  outb, ctmp, #3
                reti1
'------------------------------------------------------------------------------
ct1             res     1
ct2             res     1
wait1           res     1
pass            res     1
count           res     1
ctmp            res     1
wtmp            res     1
```
When an interrupt event occurs, certain conditions must be met before the interrupt branch can happen:
* ALTI / ALTR / ALTD / ALTS must not be executing
* SCLU / SCL must not be executing
* AUGS must not be executing or waiting for a S/# instruction
* AUGD must not be executing or waiting for a D/# instruction
* SETQ / SETQ2 must not be executing
* REP must not be executing or active
* STALLI must not be executing or active
* WAITX must not be executing ' *** Needs to be added ***
Once these conditions are all met, any pending interrupt is allowed to branch, with priority given to INT1, then INT2, and then INT3.
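The gating rules above can be summarized as a small Python model, with WAITX added per the annotation. The state names such as `REP_ACTIVE` are invented for this sketch; the hardware tracks these conditions internally.

```python
# Instructions that hold off the interrupt branch while they are executing.
BLOCKING_INSTRS = {
    "ALTI", "ALTR", "ALTD", "ALTS", "SCLU", "SCL", "AUGS", "AUGD",
    "SETQ", "SETQ2", "REP", "STALLI", "WAITX",
}
# Conditions that keep blocking after the instruction itself has executed
# (hypothetical names for this sketch).
BLOCKING_STATES = {"AUGS_WAITING", "AUGD_WAITING", "REP_ACTIVE", "STALLI_ACTIVE"}

def interrupt_to_take(executing, states, pending):
    """Return the interrupt (1, 2 or 3) that branches now, or None.

    executing: mnemonic of the instruction currently executing
    states:    set of still-active blocking conditions
    pending:   set of pending interrupt numbers
    """
    if executing in BLOCKING_INSTRS or states & BLOCKING_STATES:
        return None
    # Priority: INT1, then INT2, then INT3 (lowest number wins).
    return min(pending) if pending else None
```

In the test program, a long WAITX therefore holds INT1 off for its whole duration, whereas the POLLCT2 loop gives the branch a chance on every pass.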
In this example the ISR will fire once after 10 seconds and won't fire again for another 43 seconds, until the CT1 value wraps around again. From then on, the ISR will work OK.
Thanks much for the explanation. When I saw the REP instruction in the conditions list, that made me think WAITX could be involved, but at the time I had sections of code where WAITX was executing and it didn't show any ill effect. It was new code, and as I was perusing it looking for my usual stupid-programmer-trick of having a "#"-less branch address, I spotted a WAITX whose timespan happened to be greater than the ISR's. It was at that point that I tried the POLLCTx approach, which works just fine -- it's just a little more verbose than a good ol' WAITX.
I've got the ROM booter written now, but it still needs some debugging. It does signed loader verification using the fuses. The bug is in the SHA-256/HMAC code; I hope to have that fixed today. Once that's done, we may be finished with the chip, as far as what's needed to make the silicon goes.
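The long gap follows from the counter arithmetic. Assuming the CTx-equals-CT event is an equality match against the free-running 32-bit CT counter (as its name suggests), a target that CT has already passed only matches again after a full wrap, which at 80MHz is 2^32 / 80,000,000, about 53.7 s, roughly the 10 s + 43 s described above. A Python sketch of that arithmetic; the function name is hypothetical:

```python
SYSCLOCK = 80_000_000  # same value as the CON block in the test program
CT_MODULUS = 1 << 32   # CT is a free-running 32-bit counter

def seconds_until_match(ct_now, ct_target, sysclk=SYSCLOCK):
    """Time until CT next equals ct_target, assuming an equality match.

    If the target is already behind CT, the modulo makes the wait nearly
    a full wrap of the counter.
    """
    return ((ct_target - ct_now) % CT_MODULUS) / sysclk

wrap_seconds = CT_MODULUS / SYSCLOCK  # one full wrap, about 53.7 s

# ISR ran late: ADDCT1 produced a target 0.3 s (24,000,000 clocks) behind CT.
late = seconds_until_match(100_000_000, 100_000_000 - 24_000_000)
print(f"full wrap: {wrap_seconds:.2f} s, next match after a late reload: {late:.2f} s")
```

This is also why re-reading CT with GETCT in the ISR changes the symptom: the new target is then always in the future.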
I decided NOT to have a monitor program in ROM, for several reasons which I'll write about later.
Comments
ALTR could be thought of this way: reg1 = reg2 + reg3.
I was unintentionally switching between SETx and ALTx in my head, too. That's another reason to make the names more distinct.
I was pondering what the best name really is, since this SETI instruction is not really operating on the whole instruction the way ALTI does. And as you highlighted, SETR is not a suitable mnemonic either, since R implies the result register post-execution.
If you intend to modify an instruction's opcode and/or CZ effect bits, then "I" is relevant.
On the other hand, if you intend to use the register to configure the ALTI instruction's R field, then "R" is relevant.
In the end they are the same 9 bits.
Yeah, I changed SETI to SETR.
Correct.
This is true. We almost need different names for different uses, even though bits 27..19 are always the ones being affected. SETR is going to be the most common use for setting those bits.
SETS sets sssssssss (b8:0) of the instruction/register pointed to by D to S/#
SETD sets ddddddddd (b17:9) of the instruction/register pointed to by D to S/#
SETI sets iiiiiii_cz (b27:19) of the instruction/register pointed to by D to S/#
Note: If the destination is an instruction to be executed, then at least 2 instructions (4 clocks) must pass before the modified instruction can be executed due to the instruction pipeline.
ALTS alters the sssssssss (S8:0) field of the following instruction to the value of D plus S/#
ALTD alters the ddddddddd (D17:9) field of the following instruction to the value of D plus S/#
ALTR alters the rrrrrrrrr (result) field (the register address where the following instruction's result will be written) to the value of D plus S/#
Warning: The values of D and S/# are limited to register addresses (i.e. 9 bits); anything that overflows 9 bits fails!
ALTISD alters the I/S/D fields of the following instruction according to the value of S/#
ALTISD is currently ALTI but it is more than that.
What happens with the next instruction if the S and/or D fields are not immediate??? I found that I needed to program MOV x,#0 for the ALTS to work properly.
cccc_ooooooo_czi_ddddddddd_sssssssss
Ohhhh, you've got my vote!
I'm also fine with having dual-opcodes aka 64b opcode generation, if that makes the ASM easier to read.
It is also ok to have multiple names for the same binary opcode, if that also makes intent clearer.
The idea is to have readable ASM files.
If you want the following S field to be an immediate, you must already have defined it as ",#s".
This is opposite to the TESTB instruction which will set the Z flag if the bit=0, or other instructions which set the Z flag if the result is 0.
Should it be changed?
Also, it sure would be nice to have the reverse instruction
SAVECZ D
which replaces D[1:0] with the C and !Z flag bits, i.e. D[31:2] remain untouched.
This could occupy the SETCZ D instruction slot with WC & WZ =00, or one of the spare opcode3 slots.
The forum seems dead as a nail.
Unfortunately, I'm just not finding the time to do it right now... I had hoped to get the VSCode extension finished enough to help others write more P2 code, but that's been a much bigger effort than I was expecting. I also need to find time to update/re-run these unit tests.
I did some experimenting with full-speed, and it looks like it's going to be a real challenge to get something working reliably at 80MHz, but it should be doable when the P2's on 160MHz silicon.
But been travelling, so has been slow...
Now that it looks like last version may last a while, I'm more motivated...
Where are the issues with FS USB?
Hmm, that exposes something of a conundrum, as Full Speed should really be tested before passing to FAB, and it seems 80MHz is the upper ceiling?
Next choices could be 84MHz & 96MHz?
Note that 160MHz is purely an aspirational target, and the final device may come in well under that, so best to have some real margin in USB operation.
I've been testing P2 at every free moment since V1.
I'm currently finishing off a P2 demonstration that includes working examples of ALL P2 instructions.
It uses multiple cogs (8) and uses hubexec, lutexec, lut sharing, interrupts, streamers, cog attention, cordic etc. etc.
All of the instruction examples are working (necessary) parts of the complete demo.
I hope to have it and its documentation close to finalized next week.
Teaser: The video part of the demo uses 5 cogs.
On receive I've had success reading handshake response packets to SETUP transactions, but with data packets I can get only a few bytes read before falling behind the incoming data. I've pretty much run short of ideas on how to tighten up the routine and still do the right thing by running a CRC check on the data. I'm not the brightest bulb in the room when it comes to writing clever code, so there's likely someone that can get some demonstration code up and running at full-speed, as was done on the P1, but on that project there had to be a lot of corners cut to get it to work.
But I'm optimistic that it can be done in a robust manner with just a little more speed. 96MHz might get something workable, and if the silicon comes close to the 160MHz target that should provide more than enough headroom.
Ooooh! Sounds interesting, so cannot wait to see
No interrupts can occur while the WAITX instruction is executing.