Operation of C flag in SHx/RCx/ROx different in P2
Cluso99
The shift/rotate instructions on the P1 are described as setting the C bit from the original D[0] or D[31], for a right or left shift/rotate respectively. I have not yet confirmed this.
The P2 differs: it sets the C bit from the last bit shifted out (left or right).
Therefore,
SHR D,#23 wc[,nr]
will set C according to the original bit D[22] because this would be the last bit shifted out.
SHL D,#10 wc[,nr]
will set C according to the original bit D[22] because this would be the last bit shifted out.
Note the P2 also has a new instruction ISOB D.b which can test/isolate a bit.
The P1 manual specifies that C will be set by either the original D[31] for left, or the original D[0] for right. (I have not confirmed this.)
However, when I use this on the P2 (DE0) I seem to get these results... (for the disassembly, I am looking at the instruction's immediate bit, D[22])
Forgetting that C was set by the original D[0], I performed the following...
I am using LMM, so 5 LMM-loop instructions execute between each of these instructions.
Remember, I am testing the "i" bit.
    shr X,#22 wc,nr    'incorrect

However, to examine bit 22 I needed to do this...

    shr X,#23 wc,nr    'correct

I repeated this with

    rcr X,#22 wc,nr    'incorrect
    rcr X,#23 wc,nr    'correct

So I tried the left versions

    shl X,#9 wc,nr     'incorrect
    shl X,#10 wc,nr    'correct
    rol X,#9 wc,nr     'incorrect
    rol X,#10 wc,nr    'correct

Simple testing with a single shr shows that nr is not the problem. So what I am seeing is that the C flag is being set by the last bit shifted out, left or right, so I need to shift one bit more than required. However, if I read the P1 document correctly, the C flag should in fact be set by the ORIGINAL value in D[0] or D[31].
Can you tell me if this is supposed to happen or do I have a bug somewhere?
So am I reading the P1 specs correctly?
Is the P2 behaviour different from P1?
Is the P2 behaviour correct?
I have attached my P2 debugger code that tests this. See around line 2475 (a little below the label :part7).
The command to view a reasonable set of instructions disassembled is...
1000,10L2<cr>
Use P2Load to load the PNut-compiled object...
p2load -b 115200 -v -s LSD_???_c.obj -h -T
LSD_081.spin
Here are the instructions being executed...
    :part7
    ' display 'source'
                mov     lmm_x, #"$"             ' "$"
    '           'FCALL  SP++, @_HubTx           ' \ < call: transmit char(s) >
                wrlong  lmm_pc, lmm_sp          ' | PUSH PC
                add     lmm_sp, #4              ' | SP++
                rdlong  lmm_pc, lmm_pc          ' | CALL...
                long    @_HubTx                 ' / ...PC = ADDR
                mov     lmm_x, lmm_c            ' restore 'long'
                shr     lmm_x, #9               ' to 'S'
                and     lmm_x, #$1FF            ' extract 'S'
                mov     lmm_f, #_HEX+3          ' set hex mode with 3 digits
    '           'FCALL  SP++, @_HubHex          ' \ < call: display hex >
                wrlong  lmm_pc, lmm_sp          ' | PUSH PC
                add     lmm_sp, #4              ' | SP++
                rdlong  lmm_pc, lmm_pc          ' | CALL...
                long    @_HubHex                ' / ...PC = ADDR
    ' display 'destination'
                rdlong  lmm_x, lmm_pc           '\ ", $" or ",#$"
                long    " "|"$"<<8              '/ (nop for 1-2 bytes)
    '??wrong    shr     lmm_c, #22 wc,nr        ' extract 'immediate' bit
    'works??    shr     lmm_c, #23 wc,nr        ' extract 'immediate' bit
    '??wrong    rcr     lmm_c, #22 wc,nr        '
    'works??    rcr     lmm_c, #23 wc,nr        ' extract 'immediate' bit
    '??wrong    shl     lmm_c, #9 wc,nr
    'works??    shl     lmm_c, #10 wc,nr        '
    '??wrong    rol     lmm_c, #9 wc,nr         ' extract 'immediate' bit
    'works??    rol     lmm_c, #10 wc,nr        ' extract 'immediate' bit
    '           rol     lmm_c, #10 wc,nr        ' extract 'immediate' bit ***** P2 BUG ????
                shr     lmm_c, #22 wc,nr        ' extract 'immediate' bit (gives wrong result)
          if_c  or      lmm_x, #"#"             ' set "#"
                shl     lmm_x, #8               '
                or      lmm_x, #","             ' add ","
    '           'FCALL  SP++, @_HubTx           ' \ < call: transmit char(s) >
                wrlong  lmm_pc, lmm_sp          ' | PUSH PC
                add     lmm_sp, #4              ' | SP++
                rdlong  lmm_pc, lmm_pc          ' | CALL...
                long    @_HubTx                 ' / ...PC = ADDR
                mov     lmm_x, lmm_c            ' restore 'long'
                and     lmm_x, #$1FF            ' extract 'D'
    '           'FCALL  SP++, @_HubHex          ' \ < call: display hex >
                wrlong  lmm_pc, lmm_sp          ' | PUSH PC
                add     lmm_sp, #4              ' | SP++
                rdlong  lmm_pc, lmm_pc          ' | CALL...
                long    @_HubHex                ' / ...PC = ADDR
    ' ------------------------
And the results...
    === Cluso's P2 Debugger v0.81 ===
    *1000,10l2
    addr-  instr  zcr i cccc dst src - 3 2 1 0 - 0 1 2 3
    --------------------------------------------------------
    01000- 000011 001 1 1111 004 00D - *******  $004,#$00D
    01004- 100000 001 0 1111 004 005 - ADD  $004,#$005
    01008- 111111 001 0 1111 004 000 - *WAITxxx  $004,#$000
    0100C- 000111 000 1 1111 000 009 - #JMPRET  $000,#$009
    01010- 000000 000 0 0000 000 000 - *rwBYTE  $000, $000
    01014- 000000 011 1 0010 0E1 180 - if_Z_AND_NC *rwBYTE  $0E1, $180
    01018- 000100 011 1 1000 0D1 100 - if_Z_AND_C *SETACCx  $0D1,#$100
    0101C- 000000 010 0 1000 1A2 167 - if_Z_AND_C *rwBYTE  $1A2,#$167
    01020- 000000 000 0 1100 199 031 - if_C *rwBYTE  $199,#$031
    01024- 101000 001 1 1111 1E8 080 - MOV  $1E8,#$080
    *
Comments
Every other processor I have used has shifted the end bits into carry.
Of course, knowing Chip, he may have had a very good reason for not doing it the normal way.
RCL and RCR are not the usual 1-bit rotates found in other micros. We can shift up to 32 bits.
I need to get out my P1 and check what it did, not that it has to be the same. In fact, if the P1 docs are correct, I would be happy for the change to be result-based. It just needs documenting if it is different.
BTW, the NR option has some great uses, particularly non-destructive tests.
For the P2, I guess it boils down to which interpretation is the most useful. It would not disturb me to learn that Chip changed his mind and decided to set C from the last bit shifted out.
-Phil
This was changed because.
For whatever reason, "Yay!"
At least in this case none of the 33 bits disappear when doing a rotate through carry.
-Phil
Much better to set the flags on the result rather than the input. And it works with 'NR', which means you don't even need to actually save the result of the shift/rotate.
-Phil
Especially as I agree with the outcome.
-Phil
Possibly true for handling individual bits, but not for updating the D register. However, to complement SHR/SHL, maybe you could do "ISOB d,n wz nr" when you want to use the Z flag instead of the C flag?
-Phil
That said, programmers with C or Arduino experience may appreciate a syntax such as ISOB statusReg.10
Again, however, it doesn't permit an easy MOVS bitDetect, bitField, which is necessary for programmatically defined values. The bit-manipulation instructions use the lower 5 bits of S to enumerate the bit and the upper 4 bits to define the sub-instruction.
In a programming language, TMTOWTDI ("There's More Than One Way To Do It") might seem like a good idea. After all, it only adds bloat and confusion to your language definition and hence to your compilers. Ultimately it confuses the readers of your code, because they have to know every possible quirk in the syntax/semantics before they can understand what you have done. But, hey, it enabled you to write concise code.
However, TMTOWTDI in a CPU instruction set seems like not such a good idea to me. Every possible way to do the same thing adds transistors to your design. That takes silicon area that could possibly have been used for something else useful. It takes power. It needlessly takes up space in the brains of those working with your instruction set.
If you look at the Z80 instruction set you find quite a few OWTDI instructions that were never used in practice because what existed in the original 8080 instruction set was already good enough and often quicker.
In any architecture that provides a rich set of instructions, there is bound to be some overlap in functionality between families of instructions for certain narrow -- read pathological -- uses. In some cases, there may be something to gain from the pathological approach and, IMO, there's nothing wrong with taking advantage of it.
I'm not sure why non-immediate values could not also be used for instruction sub-coding, unless there's a pipeline issue.
-Phil
Yep, I guess that is what the old RISC vs CISC arguments were about.
At the time I did not understand that debate very much.
All I could see is that instruction set designers were throwing all kinds of bells and whistles into their processors to make it easy for assembly language programmers.
So, for example, we had the 6809 with a pile of stuff on top of its 6800 forebear. We had the Z80 with a pile of stuff on top of the 8080 instruction set. In both cases most of the extra stuff was never used. We had the 8086 and its successors dragging around instructions like DAA that are pretty much never used.
All that I might see as CISC.
On the other hand, there were the guys who said "never mind the assembly language programmer, let's make a CPU that efficiently handles what compilers can produce."
That I see as RISC.
The Propeller is confusing, as it has many properties of a RISC machine but so many oddball instructions that it is clearly CISC.
Or perhaps that whole debate does not apply here.
It's messier to use the bitwise instructions programmatically than to use the overlapping instructions. It's cleaner and more concise to use ISOB D.bit in immediate mode, but faster to do SHR D, S nr wc in register mode.
However, for me, I am trying to use only the old P1-compatible instructions in my debugger, and this is where I came unstuck. While I did not quite remember that the SHx/RCx/ROx instructions in P1 use the initial value of D, I was aware they used the D[0]/D[31] bits. So it did not work as I expected. That is when I found that it required one extra bit of shift/rotate to work correctly, so I consulted the P1 instruction set. I still have not verified how the P1 behaves.
Anyway, it is true there is more than one way to do this. That is fine, because each of these instructions has particular uses.
The 486 introduced some RISC features in a CISC micro: these instructions took fewer cycles (I cannot recall whether they were single-cycle). I wrote a minicomputer emulator in assembler and targeted the new 486 instructions for execution speed. This made a significant difference. However, I bet a lot of compilers still don't use some of these, because the old instructions worked and those older compilers were most likely never reworked.
But the bottom line is... none of this matters! We just need to know what we have been provided with, so we can use it to our best ability.
BTW I am changing the subject slightly to reflect what I now know.
I would say it is more a bit encoding issue than a pipeline issue ;-)
If the immediate bit is not set, the instruction is decoded as COGINIT D,S, which has no immediate form. Instead, with the immediate bit set, 100 or more new instructions are added. Most use the S field as an extended opcode; some use a few bits of S to select the bit, the cog number or similar things.
Andy
-Phil
But considering the immediate question: the instruction is decoded early in the pipeline, so any non-immediate value used for part of the instruction would cause problems in the pipeline. So the immediate bit probably couldn't be used that way in any case, and therefore became available for other instructions.
There are over 300 instructions in P2, and they had to be coded from bits somewhere. So various mixes of the ZCRI, CCCC and the S and D bits have been used. The majority of these are in the opcode 000011, but not all variations are restricted to this opcode. For anyone interested, my spreadsheet in the docs thread shows the instructions in bit definition order.