Operation of C flag in SHx/RCx/ROx different in P2

Cluso99 · 2013-05-11 22:39

Postedit:

The shift/rotate instructions on the P1 are described as setting the C bit according to the original D[0] for a right shift/rotate, or the original D[31] for a left shift/rotate. I have not yet confirmed this.

The P2 differs, in that it checks the last bit shifted out (left or right) and sets the C bit accordingly.
Therefore,
SHR D,#23 wc[,nr]
will set C according to the original bit D[22] because this would be the last bit shifted out.
SHL D,#10 wc[,nr]
will set C according to the original bit D[22] because this would be the last bit shifted out.
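
For illustration, here is a minimal Python sketch of the two C-flag rules discussed above. It is only a model of the behaviour as described in this thread, not a statement about the silicon. The function names are made up for this example, and counts are restricted to 1..31 because the behaviour for other counts is not covered here.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def shr_p2(d, n):
    """SHR model, P2 rule as described: C = last bit shifted out = original D[n-1]."""
    assert 1 <= n <= 31
    return (d >> n) & MASK32, (d >> (n - 1)) & 1

def shl_p2(d, n):
    """SHL model, P2 rule as described: C = last bit shifted out = original D[32-n]."""
    assert 1 <= n <= 31
    return (d << n) & MASK32, (d >> (32 - n)) & 1

def shr_p1(d, n):
    """SHR model, P1 rule as documented: C = original D[0]."""
    assert 1 <= n <= 31
    return (d >> n) & MASK32, d & 1

def shl_p1(d, n):
    """SHL model, P1 rule as documented: C = original D[31]."""
    assert 1 <= n <= 31
    return (d << n) & MASK32, (d >> 31) & 1

# Example: a value with only bit 22 set.
d = 1 << 22
print(shr_p2(d, 23)[1], shl_p2(d, 10)[1])  # both report original D[22]
print(shr_p1(d, 23)[1], shl_p1(d, 10)[1])  # report D[0] and D[31] instead
```

Under this model, SHR #23 and SHL #10 both pick up the original D[22], which matches the two examples above.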

Note the P2 also has a new instruction ISOB D.b which can test/isolate a bit.

The P1 specifies that the C will be set by either the original D[31] for left, or original D[0] for right. (I have not confirmed this)

However, when I use this on the P2 (DE0) I seem to get these results... (I am looking at the instructions immediate bit D[22] for disassembly)

Forgetting that C was set by the original D[0], I performed the following...
I am using LMM, so there are 5 instructions of the LMM loop between each of these instructions.
Remember, I am examining the "i" bit.

  shr X, #22 wc,nr 'incorrect
However, to examine bit 22 I needed to do this...
  shr X,#23 wc,nr 'correct
I repeated this with
  rcr x,#22 wc,nr 'incorrect
  rcr x,#23 wc,nr 'correct
So I tried the left versions
  shl x,#9 wc,nr  'incorrect
  shl x,#10  wc,nr  'correct
  rol x,#9 wc,nr  'incorrect
  rol x,#10 wc,nr  'correct
Simple testing with one shr shows the nr is not the problem.

So what I am seeing is that the C flag is being set by the last bit shifted out, left or right, so I need to shift 1 more than I expected. However,
if I read the P1 document correctly, then the C flag should in fact be set by the ORIGINAL value in D[0] or D[31].
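
The "one more than expected" counts can be worked out mechanically. The Python sketch below assumes the last-bit-shifted-out rule observed above and a shift count limited to 1..31 (an assumption for this example); the helper names are hypothetical.

```python
def shr_count_for_bit(bit):
    """SHR count that leaves original D[bit] in C under the last-bit-out rule."""
    if not 0 <= bit <= 30:
        raise ValueError("bit 31 would need a count of 32")
    return bit + 1

def shl_count_for_bit(bit):
    """SHL count that leaves original D[bit] in C under the last-bit-out rule."""
    if not 1 <= bit <= 31:
        raise ValueError("bit 0 would need a count of 32")
    return 32 - bit

def carry_after_shr(d, n):
    """C after SHR by n under the last-bit-out rule (model only)."""
    return (d >> (n - 1)) & 1

def carry_after_shl(d, n):
    """C after SHL by n under the last-bit-out rule (model only)."""
    return (d >> (32 - n)) & 1

# The immediate bit discussed here is bit 22.
print(shr_count_for_bit(22), shl_count_for_bit(22))
```

For bit 22 this gives the counts 23 (right) and 10 (left), the same values marked 'correct' in the tests above.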

Can you tell me if this is supposed to happen or do I have a bug somewhere?

So am I reading the P1 specs correctly?
Is the P2 behaviour different from P1?
Is the P2 behaviour correct?

I have attached my P2 debugger code that tests this. See around lines 2475 (a little below the label :part7)
The command to view a reasonable set of instructions disassembled is...
1000,10L2<cr>
Use P2Load to load the pnut compiled object...
p2load -b 115200 -v -s LSD_???_c.obj -h -T

LSD_081.spin

Here are the instructions being executed...

:part7
' display 'source'
  mov lmm_x, #"$"   ' "$"
'        'FCALL SP++, @_HubTx   ' \   < call: transmit char(s) >
   wrlong  lmm_pc, lmm_sp   ' | PUSH PC
   add  lmm_sp, #4   ' | SP++
   rdlong  lmm_pc, lmm_pc   ' | CALL...
   long  @_HubTx   ' / ...PC = ADDR
  mov lmm_x, lmm_c   ' restore 'long'
  shr lmm_x, #9   ' to 'S'
  and lmm_x, #$1FF   ' extract 'S'
  mov lmm_f, #_HEX+3   ' set hex mode with 3 digits
'        'FCALL SP++, @_HubHex   ' \   < call: display hex >
   wrlong  lmm_pc, lmm_sp   ' | PUSH PC
   add  lmm_sp, #4   ' | SP++
   rdlong  lmm_pc, lmm_pc   ' | CALL...
   long  @_HubHex   ' / ...PC = ADDR
' display 'destination'
  rdlong lmm_x, lmm_pc   '\ ", $" or ",#$"
  long " "|"$"<<8   '/ (nop for 1-2 bytes)
'??wrong shr lmm_c, #22  wc,nr ' extract 'immediate' bit
'works?? shr lmm_c, #23  wc,nr ' extract 'immediate' bit
'??wrong rcr lmm_c, #22  wc,nr ' 
'works?? rcr lmm_c, #23  wc,nr ' extract 'immediate' bit
'??wrong shl lmm_c, #9  wc,nr
'works?? shl lmm_c, #10  wc,nr
''??wrong rol lmm_c, #9  wc,nr ' extract 'immediate' bit
'works??  rol lmm_c, #10  wc,nr ' extract 'immediate' bit
'  rol lmm_c, #10  wc,nr ' extract 'immediate' bit ***** P2 BUG ????
  shr lmm_c, #22  wc,nr ' extract 'immediate' bit (gives wrong result)
 if_c or lmm_x, #"#"   ' set "#"
  shl lmm_x, #8   '
  or lmm_x, #","   ' add ","
'        'FCALL SP++, @_HubTx   ' \   < call: transmit char(s) >
   wrlong  lmm_pc, lmm_sp   ' | PUSH PC
   add  lmm_sp, #4   ' | SP++
   rdlong  lmm_pc, lmm_pc   ' | CALL...
   long  @_HubTx   ' / ...PC = ADDR
  mov lmm_x, lmm_c   ' restore 'long'
  and lmm_x, #$1FF   ' extract 'D'
'        'FCALL SP++, @_HubHex   ' \   < call: display hex >
   wrlong  lmm_pc, lmm_sp   ' | PUSH PC
   add  lmm_sp, #4   ' | SP++
   rdlong  lmm_pc, lmm_pc   ' | CALL...
   long  @_HubHex   ' / ...PC = ADDR
   
'  ------------------------

And the results...

=== Cluso's P2 Debugger v0.81 ===
*1000,10l2
 addr- instr  zcr i cccc dst src -  3 2 1 0 -  0  1  2  3
 --------------------------------------------------------
01000- 000011 001 1 1111 004 00D -              *******   $004,#$00D
01004- 100000 001 0 1111 004 005 -               ADD      $004,#$005
01008- 111111 001 0 1111 004 000 -              *WAITxxx  $004,#$000
0100C- 000111 000 1 1111 000 009 -              #JMPRET   $000,#$009
01010- 000000 000 0 0000 000 000 -              *rwBYTE   $000, $000
01014- 000000 011 1 0010 0E1 180 - if_Z_AND_NC  *rwBYTE   $0E1, $180
01018- 000100 011 1 1000 0D1 100 - if_Z_AND_C   *SETACCx  $0D1,#$100
0101C- 000000 010 0 1000 1A2 167 - if_Z_AND_C   *rwBYTE   $1A2,#$167
01020- 000000 000 0 1100 199 031 - if_C         *rwBYTE   $199,#$031
01024- 101000 001 1 1111 1E8 080 -               MOV      $1E8,#$080
*
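
As a cross-check on the columns in the listing above, here is a small Python sketch that splits a 32-bit instruction long into the same fields. It assumes the field order and widths shown in the header (6-bit instr, z/c/r bits, i bit, 4-bit cccc, 9-bit dst, 9-bit src), which puts the immediate bit at bit 22. The `encode` and `decode` names are invented for this example and are not part of the debugger.

```python
def encode(instr, zcr, i, cccc, dst, src):
    """Pack fields into a 32-bit long using the assumed layout."""
    return ((instr & 0x3F) << 26 | (zcr & 0x7) << 23 | (i & 0x1) << 22 |
            (cccc & 0xF) << 18 | (dst & 0x1FF) << 9 | (src & 0x1FF))

def decode(word):
    """Split a 32-bit instruction long into the fields shown in the listing."""
    return {
        "instr": (word >> 26) & 0x3F,   # bits 31..26
        "zcr":   (word >> 23) & 0x7,    # bits 25..23
        "i":     (word >> 22) & 0x1,    # bit 22, the immediate bit
        "cccc":  (word >> 18) & 0xF,    # bits 21..18
        "dst":   (word >> 9) & 0x1FF,   # bits 17..9
        "src":   word & 0x1FF,          # bits 8..0
    }

# First row of the listing: 000011 001 1 1111 004 00D
word = encode(0b000011, 0b001, 1, 0b1111, 0x004, 0x00D)
print(decode(word))
```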

pedward · 2013-05-11 23:55

FWIW, the observed behavior is what the x86 does. I always thought the P1 way was less useful because you couldn't shift out 1 bit and make a software shift register with that method. Perhaps Chip fixed this behavior?

Heater. · 2013-05-12 00:38

I'd go as far as to say that if the P1 does not shift the top or bottom bit into carry then that is a bug. Or a design error (feature).
Every other processor I have used has shifted the end bits into carry.

Of course knowing Chip he may have had a very good reason for not doing it the normal way.

Cluso99 · 2013-05-12 04:56

I agree it would make more sense to set the flags based on the result. Z is done on the result. But a number of instructions do set Z and C based on the result of D[0] and D[31].
RCL and RCR are not the usual rotate as used in other micros that are only 1 bit shifts. We can shift up to 32 bits.

I need to get out my P1 and check what it did - not that it has to be the same. In fact, if the P1 docs are correct, I would be happy for the change to result based. It just needs documenting if different.

BTW the NR option has some great uses, particularly non destructive tests.

Phil Pilgrim (PhiPi) · 2013-05-12 09:30

With a barrel shifter, it's not so easy to determine what the "logical" behavior of C should be. I suppose that, if you think of a shift or rotate by six bits as equivalent to six one-bit operations, it makes sense for C to hold the "last" bit shifted out. But that's not how a barrel shifter works: rather, it's just a single operation. In that case, as in the P1, it might make the most sense to set C to the original value of D[0] or D[31], i.e. the proximal value or "first" bit shifted out.

For the P2, I guess it boils down to which interpretation is the most useful. It would not disturb me to learn that Chip changed his mind and decided to set C from the last bit shifted out.

-Phil

cgracey · 2013-05-12 21:04

On the Prop2 shift/rotate instructions, C is set to the last bit shifted out (not the MSB or LSB like on the Prop1).

This was changed because.

Phil Pilgrim (PhiPi) · 2013-05-12 21:11

cgracey wrote:

This was changed because.

For whatever reason, "Yay!"

At least in this case none of the 33 bits disappear when doing a rotate through carry.

-Phil

Cluso99 · 2013-05-12 21:48

I am pleased too!

Much better to set the flags on the result rather than the input. And it works with 'NR' which means you don't even need to actually save the shift/rotate.

Phil Pilgrim (PhiPi) · 2013-05-12 22:09

Cluso99 wrote:

And it works with 'NR' which means you don't even need to actually save the shift/rotate.

'Great for testing any individual bit without having to define a separate long as a mask!

-Phil

Heater. · 2013-05-13 00:58

Chip,

This was changed because.

That's a good enough reason for me.
Especially as I agree with the outcome.

pedward · 2013-05-13 05:29

I believe the new instruction ISOB allows you to test individual bits. The caveat is that S is used for part of the instruction coding, so only immediate values are of use.

Phil Pilgrim (PhiPi) · 2013-05-13 08:20

pedward wrote:

I believe the new instruction ISOB allows you to test individual bits. The caveat is that S is used for part of the instruction coding, so only immediate values are of use.

So it would appear, then, that a new instruction was unnecessary.

-Phil

Seairth · 2013-05-13 10:20

Phil Pilgrim (PhiPi) wrote:

So it would appear, then, that a new instruction was unnecessary.

Possibly true for handling individual bits, but not for updating the D register. However, to complement SHR/SHL, maybe you could do "ISOB d,n wz nr" when you want to use the Z flag instead of the C flag?

Phil Pilgrim (PhiPi) · 2013-05-13 10:40

In Perl, there's a famous acronym, TMTOWTDI. It stands for, "There's more than one way to do it," and that capacity is considered to be a Good Thing. I'm sure the same applies to the P2!

-Phil

pedward · 2013-05-13 11:31

ISOB appears to be part of a suite of bit handling instructions. While it's possible to use a catapult to get from one side of the gorge to another, a bridge works just as well and is easier for the layman to use.

That said, programmers with C or Arduino experience may appreciate a syntax such as ISOB statusReg.10

Again, however, it doesn't permit an easy MOVS bitDetect, bitField, which is necessary for programmatically defined values. The bit manipulation instructions use the lower 5 bits to enumerate the bit and the upper 4 bits to define the sub instruction.
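
To make that split concrete, here is a minimal Python sketch that separates a 9-bit S field into the two parts described above: the lower 5 bits as the bit number and the upper 4 bits as the sub-instruction selector. The function name and the example value are hypothetical, and no real sub-instruction codes are implied.

```python
def split_s_field(s):
    """Split a 9-bit S field into (sub_instruction, bit_number).

    Assumes the layout described above: S[8:5] selects the
    sub-instruction and S[4:0] is the bit number (0..31).
    """
    s &= 0x1FF                 # keep only the 9 bits of S
    return (s >> 5) & 0xF, s & 0x1F

# Hypothetical example: made-up selector %1010 combined with bit number 10.
example_s = (0b1010 << 5) | 10
print(split_s_field(example_s))
```

This also shows why a plain MOVS is awkward here: overwriting all 9 bits of S would change the selector as well as the bit number.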

Heater. · 2013-05-13 12:58

Phil,

TMTOWTDI

I'm really not sure on what planet TMTOWTDI is considered a "good thing".

In a programming language TMTOWTDI might seem a good idea. After all it only adds bloat and confusion to your language definition and hence to your compilers. Ultimately it confuses the readers of your code because they have to know every possible quirk in the syntax/semantics before they can understand what you have done. But, hey, it enabled you to write concise code.

However, TMTOWTDI in a CPU instruction set seems like not such a good idea to me. Every possible way to do the same thing adds transistors to your design. That takes silicon area that could possibly have been used for something else useful. It takes power. It needlessly takes up space in the brains of those working with your instruction set.

If you look at the Z80 instruction set you find quite a few OWTDI instructions that were never used in practice because what existed in the original 8080 instruction set was already good enough and often quicker.

Phil Pilgrim (PhiPi) · 2013-05-13 15:03

heater wrote:

In a programming language TMTOWTDI might seem a good idea. .... But, hey, it enabled you to write concise code.

Ex-ACTly!

Perl is rich, messy, and idiomatic. Those benefits appeal to a certain class of programmers, myself included.

However, TMTOWTDI in a CPU instruction set seems like not such a good idea to me.

In any architecture that provides a rich set of instructions, there is bound to be some overlap in functionality between families of instructions for certain narrow -- read pathological -- uses. In some cases, there may be something to gain from the pathological approach and, IMO, there's nothing wrong with taking advantage of it.

pedward wrote:

The caveat is that S is used for part of the instruction coding, so only immediate values are of use.

I'm not sure why non-immediate values could not also be used for instruction sub-coding, unless there's a pipeline issue.

-Phil

Heater. · 2013-05-13 15:22

Phil,

In any architecture that provides a rich set of instructions, there is bound to be some overlap in functionality between families of instructions...

Yep, I guess that is what the old RISC vs CISC arguments were about.

At the time I did not understand that debate very much.

All I could see is that instruction set designers were throwing all kinds of bells and whistles into their processors to make it easy for assembly language programmers.
So for example we had the 6809 with a pile of stuff on top of its 6800 forebear. We had the Z80 with a pile of stuff on top of the 8080 instruction set. In both cases most of the extra stuff was never used. We had the 8086's and up dragging around instructions like DAA that are pretty much never used.

All that I might see as CISC.

On the other hand there were the guys who said "never mind the assembly language programmer, let's make a CPU that efficiently handles what compilers can produce."

That I see as RISC.

The Propeller is confusing as it has many properties of a RISC machine but so many oddball instructions it is clearly CISC.

Or perhaps that whole debate does not apply here.

pedward · 2013-05-13 16:11

Phil Pilgrim (PhiPi) wrote:

I'm not sure why non-immediate values could not also be used for instruction sub-coding, unless there's a pipeline issue.

-Phil

It's messier to use the bitwise instructions programmatically than to use the overlapping instructions. It's cleaner and more concise to use ISOB D.bit in immediate mode, but faster to do SHR D, S nr wc in register mode.

Cluso99 · 2013-05-13 19:19

I expect that the non-immediate mode would cause a pipeline issue.

However, for me, I am trying to only use the old P1 compatible instructions in my debugger. And this is where I came unstuck. While I did not quite remember the fact that the SHx/RCx/ROx instructions in P1 used the initial value of D, I was aware it used the D0/D31 bits. So it did not work as I expected. That is when I found that it required a shift/rotate of 1 bit extra to perform correctly, so I consulted the P1 instruction set. I still have not verified the P1 workings.

Anyway, it is true there is more than one way to do this. That is fine because each of the instructions has particular uses.

The 486 introduced some RISC features in a CISC micro - these instructions were reduced cycles (cannot recall if they were single cycle). I wrote a minicomputer emulator in assembler and targeted the new 486 instructions for execution speed. This made a significant difference. However, I bet a lot of compilers still don't use some of these because the old instructions worked and those older compilers were most likely not reworked.

But the bottom line is... none of this matters! We just need to know what we have been provided with, so we can use it to our best ability.

BTW I am changing the subject slightly to reflect what I now know.

Ariba · 2013-05-13 21:56

Phil Pilgrim (PhiPi) wrote:

.....
I'm not sure why non-immediate values could not also be used for instruction sub-coding, unless there's a pipeline issue.

Cluso99 wrote:

I expect that the non-immediate mode would cause a pipeline issue.

I would say it is more a bit encoding issue than a pipeline issue ;-)
If the immediate bit is not set then the instruction is decoded as COGINIT D,S. This instruction has no immediate form. Instead with the immediate bit set, there are 100 or more new instructions added. Most use the S field for an extended opcode, some use a few bits in S to select the bit or the cog number or similar things.

Andy

Phil Pilgrim (PhiPi) · 2013-05-13 22:32

Ah, so. It pays to consult the docs.

-Phil

Cluso99 · 2013-05-13 22:44

Of course you are correct Andy.
But considering the immediate question - the instruction is decoded early in the pipeline, so that any non-immediate value used for part of the instruction would cause problems in the pipeline. So, the immediate bit probably couldn't be used that way in any case, therefore becoming available for other instructions.

There are over 300 instructions in P2, and they had to be coded from bits somewhere. So various mixes of the ZCRI, CCCC and the S and D bits have been used. The majority of these are in the opcode 000011, but not all variations are restricted to this opcode. For anyone interested, my spreadsheet in the docs thread shows the instructions in bit definition order.
