Chip:
I see that in the latest instruction list
SUBR
INCMOD
DECMOD
I presume these instructions require that the R (WR) bit = 1 to distinguish them from the other instructions that share their opcodes with R=0.
If you look carefully, you will see that I was asking Chip!
Sorry. I guess you're looking for documentation to show that these are really the same opcode? I guess I should have known that since you obviously would know that those are all the same instruction. :-)
Sapieha: There is no reason to have a separate NOP instruction (as in fact we didn't in P1). I presume the compiler (you can see what pnut.exe outputs) will continue to output 32x 0's, but NOPX #1 (or NOPX #0 ??? see below) would be equally correct, along with any opcode other than REPS together with cccc=0000. REPS is opcode 3 with some other bits set to distinguish it from others.
This brings up the question (because of REPD/REPS), does NOPX #1 repeat NOP 1 or 2 executions???
Did anyone see an explanation of JP, JNP and SUBR anywhere? I seem to have missed where it was posted.
BTW, here is my latest P2 Instruction Summary using Chip's latest opcode list. It is a zipped Excel spreadsheet. P2_Instructions(10).zip
SUBR is subtract reverse: D = S - D, while normal SUB is D = D - S
JP and JNP I guess they work like that:
JP D,S : if Pin D <> 0 jump to S / #S
JNP D,S : if Pin D == 0 jump to S / #S
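For reference, a minimal Python sketch of the two subtract forms described above, assuming 32-bit wraparound. The `jp_taken`/`jnp_taken` helpers only model the guess in this post (D selects a single pin); that behaviour is not confirmed by any documentation quoted in this thread, and all function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def sub(d, s):
    """Normal SUB: D = D - S, wrapped to 32 bits."""
    return (d - s) & MASK32

def subr(d, s):
    """SUBR (subtract reverse): D = S - D, wrapped to 32 bits."""
    return (s - d) & MASK32

def jp_taken(pins, pin):
    """Guessed JP condition: jump when the pin numbered `pin` is high."""
    return ((pins >> pin) & 1) != 0

def jnp_taken(pins, pin):
    """Guessed JNP condition: jump when the pin numbered `pin` is low."""
    return ((pins >> pin) & 1) == 0

print(sub(10, 3))    # 7
print(subr(10, 3))   # 4294967289, i.e. -7 as an unsigned 32-bit value
```

If D turns out to be a mask rather than a pin number, only the two `jp`/`jnp` helpers would change.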
re: BTW, here is my latest P2 Instruction Summary using Chip's latest opcode list. It is a zipped Excel spreadsheet. P2_Instructions(10).zip
That's looking real good and very helpful. Thanks!
Bob
Thanks Bob. My earlier posts were a zipped Excel spreadsheet but I just came to realise probably most did not know that.
When Chip posted the later opcodes a few posts back, I just imported them into the spreadsheet, sorted them to align with my prior instruction set, and used formulae to copy them into my part of the spreadsheet, then deleted the imported data. Most of the hard written docs I had done earlier. Then a re-sort.
Andy: Thanks for the SUBR instruction. Yes, I have guessed the JP/JPD/JNP/JNPD are something like that, although probably the D is a mask where bits=1 are the bits to be tested. The jump has to be to the [#]S location to be consistent with other jumps.
There are only a few instruction details I am not happy with atm.
Chip: You previously asked if anyone had a specific shift style instruction. I see you have a spare slot opcode 000011 S 01000 marked <empty>.
I am unsure of how difficult this is, whether you have time, if there is any risk, and if you consider this useful to others besides me. So I will just put it out there for you.
In my faster interpreter, and in ZiCog, for each base 8 bit instruction I used this value as an offset into a table of 256 longs (I refer to them as a vector table). Each long had 3 9 bit cog addresses (vectors) for jump/call tables. This allowed me to divide the ops into up to 3 sets of routines. Of course, some of these routines were common to many ops. So each time I grabbed a new vector, I either jumped or called the address sitting in the lowest 9 bits of the long vector I had fetched. If the op needed a 2nd routine then I would shift the vector 9 bits right and again perform a jump/call. Likewise, if a 3rd routine was required, I would again shift right 9 bits and jump/call. I also used the top 5 bits as flags to do certain things within my routines. Now as these vectors shifted right, so would these 5 bits of flags.
What would be nice, would be an instruction that shifted only the right 27 bits (3x9) right 9 places leaving the top 5 bits in place. It would also be nice to clear the 9 bits 26..18 after the shift although this is not absolutely necessary.
As a bonus, it could be nice to shift left the top 5 bits a number of places left, inserting 0's in their place.
Here is what the P1/P2 equivalent instructions would be like to achieve this...
DAT
' Simplest basic instruction...
' Shift bits 26..0 (or 26..9) >> 9 leaving bits 31..27 untouched...
' 000011_ZCN_1_CCCC_DDDDDDDDD_01000xxxx   S27R9 D         'xxxx=0000 (leaves another 15 possible instructions)
' Shifts D[26..0] >> 9 then clears D[26..18]=0 leaving D[31..27] untouched
'               S27R9   vector                  ' new instruction simulated by P1/P2
S27R9           mov     t1,vector               ' take a copy
                and     t1,top5                 ' extract top 5 bits for later
                shr     vector,#9               ' shift all bits right 9, bringing in 9x 0's at the top
                and     vector,bottom18         ' extract lower 18 bits (2x 9) <This could be optional>
                or      vector,t1               ' put back the top 5 bits
vector          long    (%10101 << 27) | (%101101101 << 18) | (%010010010 << 9) | (%111000111) ' 5 bits plus 3x 9bit cog addresses
'the following are just to be able to show the instructions but would not be required with the new instruction
t1              long    0                       ' temporary register
top5            long    %11111_000000000_000000000_000000000 ' mask to extract top 5 bits[31..27]
bottom18        long    %00000_000000000_111111111_111111111 ' mask to extract bottom 18 bits[17..0]
' Perhaps a second instruction or variant using S[0]=1
' 000011_ZCN_1_CCCC_DDDDDDDDD_01000xxx1   S27R9L2 D       'xxx=000 (leaves another 14 possible instructions)
' Shifts D[26..0] >> 9 plus shifts D[31..27] << 2 then clears D[26..18]=0
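The proposed behaviour can also be checked off-chip. The following is a minimal Python model of the requested instruction and its left-shifting variant, assuming the semantics stated in the comments above (bits 31..27 are flags, bits 26..0 hold three 9-bit vectors). The function names `s27r9` and `s27r9l2` simply reuse the proposed mnemonics; neither exists as a real P2 instruction.

```python
TOP5 = 0b11111 << 27        # mask for flag bits 31..27
BOTTOM18 = (1 << 18) - 1    # mask for bits 17..0 (two 9-bit vectors)

def s27r9(vector):
    """Shift bits 26..0 right by 9, clear bits 26..18, keep bits 31..27."""
    flags = vector & TOP5
    return flags | ((vector >> 9) & BOTTOM18)

def s27r9l2(vector):
    """Same as s27r9, but also shift the 5 flag bits left by 2 (zero fill)."""
    flags = ((vector & TOP5) << 2) & TOP5   # bits shifted past 31 are dropped
    return flags | ((vector >> 9) & BOTTOM18)

# Same test value as the 'vector' long in the assembly listing.
vector = (0b10101 << 27) | (0b101101101 << 18) | (0b010010010 << 9) | 0b111000111

print(f"{vector:032b}")
print(f"{s27r9(vector):032b}")
print(f"{s27r9l2(vector):032b}")
```

The model mirrors the five-instruction P1/P2 sequence step for step, so it can serve as a reference when comparing against real hardware output.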
Cluso99,
Perhaps a more general set of instructions that could still work for you are this: SWAPDS & SWAPIS (maybe more variants would be needed). It would require your data to be in 9 5 9 9 format instead of 5 9 9 9 format. So instead of shifting, you just swap things into S and then jmp/call to S.
I'm just offering this up as an alternative solution. If Chip does end up doing another change in the Verilog he might slip something like this in. I'm not suggesting he do one just for this, since it costs them a decent chunk of change each time.
Roy: My alternative is to just copy the vector twice, or keep track of where the upper 5 bits are. But with the instruction I described, provided it replaces the 9 bits shifted with 0's, what I end up with is a set of vectors that will store "0" once all the vectors have been executed. So I can just place an instruction at cog $0 and can continue to do my jump/call. So if the instruction only has 2 vectors used, the third will be previously set to $0. So it saves me checking the number of routines executed. I couldn't do this before.
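A minimal Python sketch of the zero-terminated dispatch described above, assuming the shift clears the vacated bits so that an exhausted vector list reads as address 0. The routine names, the `routines` table and the `"fetch_next"` marker standing in for the code at cog $0 are all hypothetical.

```python
TOP5 = 0b11111 << 27
BOTTOM18 = (1 << 18) - 1

def s27r9(vector):
    """Model of the proposed shift: bits 26..0 >> 9, bits 26..18 cleared."""
    return (vector & TOP5) | ((vector >> 9) & BOTTOM18)

def run_vectors(vector, routines):
    """Follow the 9-bit vectors in turn until one reads as address 0."""
    trace = []
    while True:
        addr = vector & 0x1FF
        if addr == 0:
            # Cog address $0 would hold the jump back to the opcode fetch loop.
            trace.append("fetch_next")
            return trace
        trace.append(routines[addr])
        vector = s27r9(vector)

# An opcode that needs only two routines: the third vector is preset to 0.
routines = {0x1C7: "load_operand", 0x0A2: "do_add"}
vector = (0b10101 << 27) | (0x0A2 << 9) | 0x1C7

print(run_vectors(vector, routines))
# ['load_operand', 'do_add', 'fetch_next']
```

No routine counter is needed: the loop ends on its own after two or three vectors because the shifted-in zeros eventually reach the low 9 bits.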
I am sure there are additional ways I can do this with a special instruction. I did not want to try and make this special instruction too complicated, although I can think of a few ways to increase its possible use by utilising those 4 unused S bits.
I will continue to think about it, and to check my faster spin interpreter, to see if there is a better way with a special instruction. Meanwhile, I will await Chip's reply to see if this is worthwhile pursuing some more over the next couple of days.
Cluso99,
Ok, yeah, I see the need for the clearing now. The most general case instructions I could think of require 15 bits of input data (which I don't think is available anymore). It would be a SHFR, SHFL, & SAFR. Same as SHR, SHL, and SAR except you specify the upper and lower bounds of the field (5 bits each) along with the shift amount (5 bits). Normal shifting clears the bits that are left behind, SAR clears or sets based on the upper bit, so it would work for you I think.
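A minimal Python sketch of how such field shifts could behave, assuming the field runs from bit `hi` down to bit `lo` inclusive and that bits outside the field are left untouched. SHFR, SHFL and SAFR are only suggestions in this thread, so the exact semantics here are an assumption.

```python
MASK32 = 0xFFFFFFFF

def field_mask(hi, lo):
    """Mask with ones in bits hi..lo inclusive."""
    return ((1 << (hi - lo + 1)) - 1) << lo

def shfr(value, hi, lo, n):
    """Shift only the field right by n, filling with zeros."""
    mask = field_mask(hi, lo)
    field = ((value & mask) >> lo) >> n
    return (value & ~mask & MASK32) | (field << lo)

def shfl(value, hi, lo, n):
    """Shift only the field left by n, dropping bits that leave the field."""
    mask = field_mask(hi, lo)
    width = hi - lo + 1
    field = (((value & mask) >> lo) << n) & ((1 << width) - 1)
    return (value & ~mask & MASK32) | (field << lo)

def safr(value, hi, lo, n):
    """Shift only the field right by n, replicating the field's top bit."""
    mask = field_mask(hi, lo)
    width = hi - lo + 1
    field = (value & mask) >> lo
    n = min(n, width)
    fill = ((1 << n) - 1) << (width - n) if field >> (width - 1) else 0
    return (value & ~mask & MASK32) | (((field >> n) | fill) << lo)

# The 5/9/9/9 case from earlier in the thread: field 26..0 shifted right by 9.
vector = (0b10101 << 27) | (0b101101101 << 18) | (0b010010010 << 9) | 0b111000111
print(f"{shfr(vector, 26, 0, 9):032b}")
```

With `hi=26`, `lo=0`, `n=9`, `shfr` produces the same result as the special-purpose shift requested above, which is why the general form would cover that case.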
If you still have some instruction space, then instead of adding an extra shift, why not add one more SPx (SPC), so that the CLUT can be used as both a stack and a ring buffer?
I have been looking at the instruction summary (I have attached my latest spreadsheet here in zipped format)...
Since the work has begun on combining the P2 logic with the frame, I expect there will be no further changes to the P2 unless further bug(s) are found.
For the documents, here are a few questions, and also suggestions for possible future use by imposing some restrictions (aided by compiler restrictions):
- SUBR, CMPSUB, INCMOD, DECMOD require WR (R=1) for correct decoding of their respective related instructions with NR (R=0). (see above posts)
- Isn't CMPS a SUBS with NR ? and Isn't CMPSX a SUBSX with NR ? If so, this would save 2 opcodes for later. Make the compiler utilise SUBS/SUBSX NR coding for CMPS and CMPSX.
- ROR, ROL, SHR, SHL, RCR, RCL, SAR, REV require that S[8..5] be set to 0000. This would permit possible future enhancements to these instructions by utilising these additional 4 bits. (my request above for a new shift instruction could fit in here at some later time)
- ENC could have WR (R=1) permanently, so providing another instruction(s) with NR (R=0) later.
- IJZ, IJZD, IJNZ, IJNZD and DJZ, DJZD, DJNZ, DJNZD and TJZ, TJZD, TJNZ, TJNZD and JP, JPD, JNP, JNPD use 3 opcodes. Unless I missed a use for IJxx and DJxx having the NR (R=0) option, then if you need another code change to the P2, these instructions could share 2 opcodes where the TJxx and JPxx have NR (R=0) and IJxx and DJxx have the WR (R=1). This would leave another opcode free. WAITVID (and DECMOD) could then move to opcode 111110 (puts it next to the other WAITCNT, WAITPEQ, WAITPNE opcode) - more logical in the summary???
- I note many of the opcode 000011 instructions with a D operand have the option for #n. Is this the case for all these opcodes where there is a D operand?
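The opcode sharing discussed in the list above relies on the R bit selecting between two mnemonics that use the same 6-bit opcode. A minimal Python sketch of that idea follows, assuming the 6/3/1/4/9/9 field layout shown in the opcode listings in this thread. The lookup table uses the P1 pairing (opcode 100001 is SUB with R=1 and CMP with R=0) purely as an illustration; it says nothing about the actual P2 opcode assignments, and the helper names are hypothetical.

```python
def encode(opcode, z, c, r, i, cond, d, s):
    """Pack the fields as: opcode(6) Z C R I cond(4) D(9) S(9)."""
    return ((opcode << 26) | (z << 25) | (c << 24) | (r << 23) |
            (i << 22) | (cond << 18) | (d << 9) | s)

def decode(word):
    """Split a 32-bit instruction word back into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "z": (word >> 25) & 1,
        "c": (word >> 24) & 1,
        "r": (word >> 23) & 1,
        "i": (word >> 22) & 1,
        "cond": (word >> 18) & 0xF,
        "d": (word >> 9) & 0x1FF,
        "s": word & 0x1FF,
    }

# Illustration only: the P1 SUB/CMP pair shares one opcode, split by R.
SHARED = {(0b100001, 1): "SUB", (0b100001, 0): "CMP"}

def mnemonic(word):
    """Pick the mnemonic from the opcode and the R (write result) bit."""
    fields = decode(word)
    return SHARED.get((fields["opcode"], fields["r"]), "?")

word_sub = encode(0b100001, 0, 0, 1, 0, 0b1111, 0x010, 0x020)
word_cmp = encode(0b100001, 1, 1, 0, 0, 0b1111, 0x010, 0x020)
print(mnemonic(word_sub), mnemonic(word_cmp))   # SUB CMP
```

A compiler restriction of the kind suggested above amounts to forcing the R bit for one mnemonic, which frees the other R value for a different instruction.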
I just finished documenting the hub counter instructions (at the end):
000011 000 1 CCCC DDDDDDDDD 000001101    PASSCNT D          'loops until CNTL passes D                    1*
111111 0CR I CCCC DDDDDDDDD SSSSSSSSS    WAITCNT D,S        'wait for CNTL or CNT (WC), D += S            ?
111111 110 I CCCC DDDDDDDDD SSSSSSSSS    WAITPEQ D,S WC     'wait for (pins & S) = D with timeout         ?
111111 111 I CCCC DDDDDDDDD SSSSSSSSS    WAITPNE D,S WC     'wait for (pins & S) <> D with timeout        ?
----------------------------------------------------------------------------------------------------
* 1 clock if task uses no more than every 4th time slot (4 clocks in single-task)
Can these have a granularity formula? And minimum and increment sizes for the entries marked "?" ?
Are all of them multi-task compatible?
I see this caveat:
PASSCNT This is intended as a non-pipeline-stalling alternative to WAITCNT, for use in multi-task programs.
I do not quite follow the "* 1 clock if task uses no more than every 4th time slot (4 clocks in single-task)".
Something like: this takes X cycles + V*S*Tcy more, where Tcy is the sysClk period, S is the slice multiplier and V is the variable time 0..RealTime?
I've just added these to the main document but it looks like nothing much has been done with the assembler reference section in another document. Maybe Seairth has run out of steam, any volunteers?
I find it too time consuming to update the document from the other contributions as I would have to find all the bits that have been changed. Is there any possibility guys that you can update the live document? I know sometimes it's easier just to work offline with your favorite tools but it's even easier not to bother in the first place. Since we are putting the effort in how about that little extra needed to wrangle with Google docs, it gets easier, really, and everyone gets to benefit by having it all up-to-date in one place. Otherwise we just end up with fragmented bits and pieces.
The instruction summary spreadsheet is here and edit permissions are open to anyone.
Sorry Peter. The spreadsheet is just too complicated to edit in Google Docs. There are too many changes and they are still coming. I don't mind if errors are posted on this thread and I will fix them. Once they settle down they can be posted that way, but atm it is way way easier to keep them in the Excel and post here as a zip file. For the last major change I had to import all the opcodes from Chip's later list because so many had changed. I will be posting an update shortly, as soon as I resolve a few questions with Chip.
How can you decode the COGINIT instruction? Am I missing something?
000011 ZCR 0 CCCC DDDDDDDDD SSSSSSSSS    COGINIT D,S    'launch cog at D, cog PTRA = S             1..9
000011 000 0 CCCC DDDDDDDDD 010110000    WRQUAD  D      'write QUADs at D                          1..8
000011 000 0 CCCC DDDDDDDDD 010110001    RDQUAD  D      'read quad at D into QUADs                 1..8
000011 010 0 CCCC DDDDDDDDD 010110001    RDQUADC D      'read cached quad at D into QUADs          1, 1..8
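The decoding problem raised here can be shown mechanically. The following minimal Python sketch treats each listed line as a bit pattern in which `0` and `1` are fixed bits and any letter is a don't-care, then asks which patterns a given instruction word matches. The patterns are copied from the four lines above; the helper names are hypothetical.

```python
PATTERNS = {
    "COGINIT": "000011 ZCR 0 CCCC DDDDDDDDD SSSSSSSSS",
    "WRQUAD":  "000011 000 0 CCCC DDDDDDDDD 010110000",
    "RDQUAD":  "000011 000 0 CCCC DDDDDDDDD 010110001",
    "RDQUADC": "000011 010 0 CCCC DDDDDDDDD 010110001",
}

def matches(pattern, word):
    """True if every fixed 0/1 bit in the pattern agrees with the word."""
    bits = pattern.replace(" ", "")
    for position, char in enumerate(bits):
        bit = (word >> (31 - position)) & 1
        if char in "01" and int(char) != bit:
            return False
    return True

def candidates(word):
    """All mnemonics whose pattern accepts this word."""
    return [name for name, pattern in PATTERNS.items() if matches(pattern, word)]

def example_word(name):
    """Build a sample word for a mnemonic with all don't-care bits set to 0."""
    bits = PATTERNS[name].replace(" ", "")
    return int("".join(c if c in "01" else "0" for c in bits), 2)

print(candidates(example_word("WRQUAD")))   # ['COGINIT', 'WRQUAD']
```

As listed, every WRQUAD, RDQUAD or RDQUADC word is also a valid COGINIT word, so some additional fixed bit is needed to tell them apart.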
Comments
When describing the REPS instruction it says it can repeat 1..16,384 times. But then in the example code it is set to repeat 20,000 times ???
Bean
000011 n11 1 nnnn nnnnnnnnn 001iiiiii
Scrub that, 14 bits are only 16,384 anyway....I'll have to have a closer look but it's getting a bit late.
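The bit count can be read straight off the encoding line quoted above. A minimal Python check, assuming each `n` in the pattern is one bit of the repeat count and that the stated range of 1..16,384 means the field can express 2^14 distinct counts:

```python
REPS_PATTERN = "000011 n11 1 nnnn nnnnnnnnn 001iiiiii"

count_bits = REPS_PATTERN.count("n")   # bits available for the repeat count
max_repeats = 1 << count_bits          # number of distinct counts

print(count_bits)             # 14
print(max_repeats)            # 16384
print(20000 <= max_repeats)   # False: 20,000 does not fit in the field
```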
txt to pdf
I have updated my summary and posted below.
I note that the following instructions are no longer valid.
P2_Instructions(9).zip
Scrub that, I didn't read the attachment properly, it's SETSQRT
None of the documents describe those yet.
You are telling me obvious things.
If you look carefully, you will see that I was asking Chip!
To be correct in my version of the instruction manual, I still need that info from Chip.
Not assumptions from you!
The COG picture needs one addition:
4 x Regs.CFGPINS
P2_Instructions(11).zip
Prop2_Docs.txt
How can you decode the COGINIT instruction? Am I missing something?
And these... Should SUBR, CMPSUB, INCMOD and DECMOD have R=1 ?
Sorry, Cluso!
The "I" bit should be set to 1 in WRQUAD..RDQUADC.
Here is an updated file:
Prop2_Docs.txt