Using Inline PASM2 to Learn PASM2

JonnyMac · 2020-09-15 21:19

During my presentation on Spin programming I stated that one of my favorite features of the P2 is inline PASM for small sections of code. I love this feature. Just yesterday I was working on a simplistic flash storage object and wanted to speed up the bit-banged SPI comms to the flash. Some will wonder why I didn't choose smart pins for SPI; the truth is that it takes more code to set up and use smart pins for SPI than it does to write bit-banged code. These are the working methods for shifting data in and out of the flash.

pri shiftout(value, bits) | t, x

  t := ticks                                                    ' copy timing

  org
                cmp       bits, #32                     wz      ' 32 bits?
    if_e        jmp       #.bbso                                ' yes, go

                mov       x, #32                                ' no, move msb to value.[31]
                sub       x, bits
                shl       value, x

.bbso           rep       #8, bits
                 shl      value, #1                     wc      ' get next bit
                 drvc     #SF_SDO                               ' put on SDO
                 waitx    t                                     ' let SDO settle
                 drvh     #SF_SCK                               ' clock the bit
                 waitx    t
                 waitx    t
                 drvl     #SF_SCK
                 waitx    t
  end


pri shiftin(bits) : value | t

  t := ticks                                                    ' copy timing

  org
                rep       #8, bits
                 waitx    t                                     ' let SDI settle
                 testp    #SF_SDI                       wc      ' sample SDI
                 rcl      value, #1                             ' move new bit into value.[0]
                 drvh     #SF_SCK                               ' clock the next bit
                 waitx    t
                 waitx    t
                 drvl     #SF_SCK
                 waitx    t
  end

For ad hoc use like this, bit-banged SPI is sensible. For massive data moves, using the streamer (which I haven't learned yet) and smart pin SPI makes sense -- the loader does that.
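
The bit order these two methods implement can be checked on a PC before touching hardware. What follows is a rough Python model of the shift logic only, not P2 code: the WAITX delays and pin drives are ignored, the function names are made up for this sketch, and it assumes bits is between 1 and 32.

```python
MASK32 = 0xFFFFFFFF

def shiftout_bits(value, bits):
    """Return the SDO levels the shiftout loop would drive, MSB first."""
    value &= MASK32
    if bits != 32:
        value = (value << (32 - bits)) & MASK32   # move msb to value.[31]
    levels = []
    for _ in range(bits):
        carry = (value >> 31) & 1                 # shl value, #1 wc
        value = (value << 1) & MASK32
        levels.append(carry)                      # drvc #SF_SDO
    return levels

def shiftin_bits(levels):
    """Rebuild a value from sampled SDI levels, as the shiftin loop does."""
    value = 0
    for bit in levels:
        value = ((value << 1) | bit) & MASK32     # testp wc, then rcl value, #1
    return value

print(shiftout_bits(0x9F, 8))                     # [1, 0, 0, 1, 1, 1, 1, 1]
print(hex(shiftin_bits(shiftout_bits(0xA5, 8))))  # 0xa5
```

Feeding the output of one function into the other is only a sanity check that both sides agree on MSB-first order; a real flash chip answers with its own data.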

What if you're brand new to PASM? The great thing about having inline PASM is that we can use it to learn and test very small bits of code. Case in point: another forum member asked me about the differences between SHL (shift left) and SAL (shift arithmetic left). To be candid, I didn't even know the latter instruction existed! That said, I made a guess based on the cryptic notation of the PASM spreadsheet -- and verified it with some code.

pub main() | x

  setup()

  wait_for_terminal(true)

  x := $12345678
  term.fhex(shl_test(x, 4), 8)
  term.tx(13)
  term.fhex(sal_test(x, 4), 8)
  term.tx(13)
  term.tx(13)

  x := $12345679
  term.fhex(shl_test(x, 4), 8)
  term.tx(13)
  term.fhex(sal_test(x, 4), 8)
  term.tx(13)

  repeat
    waitct(0)


pub shl_test(value, bits) : result

  org
                shl       value, bits
                mov       result, value
  end


pub sal_test(value, bits) : result

  org
                sal       value, bits
                mov       result, value
  end

The results from PST:

My suspicion was confirmed: SAL shifts the value left and fills in the LSB side with the original bit 0. This makes sense as SAR shifts everything to the right and pads the MSB end with the original bit 31.

The notes in the spreadsheet can be a little cryptic. Here's the description for SHL.

SHL	Shift left.             D = [63:32] of ({D[31:0], 32'b0}       << S[4:0]). C = last bit shifted out if S[4:0] > 0, else D[31]

In English this is saying that a 64-bit value is built from the original value (in the high long) and 32 zeros (in the low long). The entire thing is shifted left and the final result is the upper 32 bits. This explains why SHL pads the LSB side with 0s.

Now for SAL.

SAL	Shift arithmetic left.  D = [63:32] of ({D[31:0], {32{D[0]}}}  << S[4:0]). C = last bit shifted out if S[4:0] > 0, else D[31]

This time, the lower long of the internal 64-bit register is made up of 32 copies of whatever was in our original bit 0. You can see the result of this behavior in the example above, and I've attached my little PASM test program if you want to experiment (and if you're not already a PASM expert, you should experiment).
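
For anyone who wants to poke at the two spreadsheet formulas without a P2 on the desk, here is a small Python model that follows them literally: build the 64-bit value, shift it, keep bits [63:32]. It is a sketch of the formulas as quoted above, not of the silicon, and the function names are just for illustration.

```python
MASK32 = 0xFFFFFFFF

def shl(d, s):
    """SHL: [63:32] of ({D[31:0], 32'b0} << S[4:0])."""
    wide = (d & MASK32) << 32                   # D in the high long, zeros below
    return ((wide << (s & 31)) >> 32) & MASK32

def sal(d, s):
    """SAL: [63:32] of ({D[31:0], {32{D[0]}}} << S[4:0])."""
    fill = MASK32 if d & 1 else 0               # 32 copies of D[0] in the low long
    wide = ((d & MASK32) << 32) | fill
    return ((wide << (s & 31)) >> 32) & MASK32

for x in (0x12345678, 0x12345679):
    print(f"{shl(x, 4):08X} {sal(x, 4):08X}")
# 23456780 23456780
# 23456790 2345679F
```

The two test values are the same ones used in main() above: with bit 0 clear the instructions agree, with bit 0 set only SAL fills the low nibble with ones.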

JRoark · 2020-09-15 22:52

I confess I was the responsible party blowing-up Jon’s mailbox.:)

I’m trying to sort-out why Chip’s SAL is different from every other Shift Arithmetic Left instruction that I have seen. Example: on Intel x86 platforms, ASL/SAL and SHL produce the same opcode. They are simply aliases for each other. The low order bits get filled-in with zeros. But on the P2, SHL and SAL produce two different opcodes, and the SAL fills the low-order bits with ones while SHL fills with zeros.

I’m totally lost as to why this is so different. Can anyone straighten me out on this? Chip? Jon? Buehler?

JonnyMac · 2020-09-15 22:57

...SAL fills the low-order bits with ones while SHL fills with zeros.

Sorry to be persnickety, but SAL fills the low-order bits of the shifted value with what was in bit 0. Have a look at the demo results I posted above (and shared with you earlier).

You could do it like this in Spin.

pub spin_sal(value, bits) : result

  result := value

  repeat (bits & %11111)
    result <<= 1                ' shift left
    result.[0] := result.[1]    ' fill with original bit 0 (now in bit 1)

JonnyMac · 2020-09-15 23:02

[deleted]

JRoark · 2020-09-15 23:21

Sorry. Brain fade on my part, Jon. You are correct! That’ll teach me to multitask over dinner.

Edited to add: this is the behaviour I am used to (from Wikipedia: https://en.m.wikipedia.org/wiki/Arithmetic_shift )

“Equivalence of arithmetic and logical left shifts and multiplication:

Arithmetic left shifts are equivalent to multiplication by a (positive, integral) power of the radix (e.g., a multiplication by a power of 2 for binary numbers). Logical left shifts are also equivalent, except multiplication and arithmetic shifts may trigger arithmetic overflow whereas logical shifts do not.”

Peter Jakacki · 2020-09-16 02:26

@JRoark - just be aware that the wiki article is general, not P2 specific. With the P2 we can specify WC WZ WCZ so that the carry can reflect the last bit shifted out or b31 (if shifting left but not shifting). When you have worked with logic ICs then it is very easy to visualize exactly what is happening. One way to visualize is to interact, and the way I prefer to do that is with TAQOZ running on the P2 itself and a serial terminal. I can define a SHIFT word in assembly and then interact with it, as this console grab shows:

TAQOZ# code SHIFT 
086F8 00E4_4008         _ret_ sal a,#8
 ---  ok         end
TAQOZ# %1011 SHIFT .BIN --- %00000000000000000000101111111111 ok
TAQOZ# %1010 SHIFT .BIN --- %00000000000000000000101000000000 ok

In the first line I created a new word in the dictionary called SHIFT, which automatically switches to assembler (then indents), and then with one single line of code "_ret_ sal a,#8" I shift the parameter 'a' (the top of the data stack) left arithmetically by 8 bits, leaving the result on the stack, and return in the same instruction (because we can). As I type each line of code it will list the machine code since I have the listing turned on.
Now the word SHIFT is available at the TAQOZ prompt, so type in a number which goes on top of the stack (a) and implicitly call our new word SHIFT, then examine what's on top of the stack in binary format with .BIN (Print Binary).
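
The two console results can be cross-checked off-chip with a few lines of Python. This is only a model of what "sal a,#8" does to the top of stack (it is not TAQOZ, and the function name is made up): shift left, then fill the vacated low bits with copies of the original bit 0.

```python
def sal(value, bits):
    """Model of SAL: shift left, fill the low bits with the original bit 0."""
    n = bits & 31                                # only the low 5 bits of the count are used
    fill = (1 << n) - 1 if value & 1 else 0      # n copies of the original bit 0
    return ((value << n) & 0xFFFFFFFF) | fill

for a in (0b1011, 0b1010):
    print(f"%{sal(a, 8):032b}")
# %00000000000000000000101111111111
# %00000000000000000000101000000000
```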

One area we shift left is when we are bit-bashing SPI and I2C data which goes msb first, so we keep shifting one bit at a time with the result in the carry and then drive the data pin according to the state of the carry.
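
The carry rule from the spreadsheet ("C = last bit shifted out if S[4:0] > 0, else D[31]") is easy to get wrong by one bit, so here is a hedged Python model of SHL with WC. It follows the quoted description only; the helper name shl_wc is an invention for this sketch.

```python
MASK32 = 0xFFFFFFFF

def shl_wc(d, s):
    """Return (result, C) for SHL D,S WC as the spreadsheet describes it."""
    n = s & 31                       # only S[4:0] counts
    d &= MASK32
    result = (d << n) & MASK32
    if n:
        c = (d >> (32 - n)) & 1      # last bit shifted out of bit 31
    else:
        c = d >> 31                  # no shift: C simply reflects D[31]
    return result, c

for d, s in ((0x80000001, 0), (0x80000001, 1), (0x40000000, 1), (0x00000002, 31)):
    result, c = shl_wc(d, s)
    print(f"{d:08X} shl {s:2d} -> {result:08X} C={c}")
```

Shifting by one with WC and then driving the pin from C, as the SPI loop does, is just the s = 1 case of this rule.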

P.S. While there is such a thing as "shift arithmetically right" since the msb is the sign bit, it is not the same thing when shifting left since the lsb is not the sign bit. Nonetheless, the P2 can do this and it is the reverse of the SAR operation and had to be called something.
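
The "reverse of SAR" idea can be stated precisely: mirror the 32 bits, do a SAR, mirror them back, and you get what SAL produces. The Python below is a sketch that checks this on a handful of values; it models the instructions as described in this thread, and the helper names are assumptions of the sketch.

```python
MASK32 = 0xFFFFFFFF

def rev32(x):
    """Mirror a 32-bit value end to end (bit 0 <-> bit 31)."""
    return int(f"{x & MASK32:032b}"[::-1], 2)

def sar(d, s):
    """Shift right, padding the MSB end with the original bit 31."""
    n = s & 31
    d &= MASK32
    fill = (MASK32 << (32 - n)) & MASK32 if d >> 31 else 0
    return (d >> n) | fill

def sal(d, s):
    """Shift left, padding the LSB end with the original bit 0."""
    n = s & 31
    fill = (1 << n) - 1 if d & 1 else 0
    return ((d << n) & MASK32) | fill

samples = (0x12345678, 0x12345679, 0xF000000F, 0x0FFFFFF0, 0x00000001)
print(all(sal(x, n) == rev32(sar(rev32(x), n))
          for x in samples for n in range(32)))   # True
```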

Cluso99 · 2020-09-16 02:48

There is a fundamental difference with shifts and rotates because of the fact that the Propeller can shift up to 32 positions (32=0).
In most other micros you can only shift/rotate one bit per instruction so the result is straightforward.
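
A short Python sketch of the difference, going by the S[4:0] in the spreadsheet entries quoted earlier: the multi-bit shift only looks at the low five bits of the count, so a count of 32 behaves like 0, whereas 32 separate one-bit shifts empty the register. Both helpers are illustrative models, not P2 code.

```python
MASK32 = 0xFFFFFFFF

def shl(d, s):
    """One multi-bit SHL: only S[4:0] of the count is used."""
    return (d << (s & 31)) & MASK32

def shl_one_at_a_time(d, count):
    """The one-bit-per-instruction approach, repeated count times."""
    for _ in range(count):
        d = (d << 1) & MASK32
    return d

x = 0x12345678
print(f"{shl(x, 32):08X}")                 # 12345678 (count 32 acts like 0)
print(f"{shl(x, 36):08X}")                 # 23456780 (count 36 acts like 4)
print(f"{shl_one_at_a_time(x, 32):08X}")   # 00000000
```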

doggiedoc · 2020-12-24 23:24

Thanks for this example of a) using inline p2asm and b) visualizing shl, shr, sal, and sar. I took it a bit further to help "see" the bits.

  x := $f000000f
  term.fbin(x, 64)
  term.tx(13)
  term.fbin(shl_test(x, 4), 64)
  term.tx(13)
  term.fbin(sal_test(x, 4), 64)
  term.tx(13)
  term.fbin(shr_test(x, 4), 64)
  term.tx(13)
  term.fbin(sar_test(x, 4), 64)
  term.tx(13)
  term.tx(13)

  x := $0ffffff0
  term.fbin(x, 64)
  term.tx(13)
  term.fbin(shl_test(x, 4), 64)
  term.tx(13)
  term.fbin(sal_test(x, 4), 64)
  term.tx(13)
  term.fbin(shr_test(x, 4), 64)
  term.tx(13)
  term.fbin(sar_test(x, 4), 64)
  term.tx(13)

pub shl_test(value, bits) : result
  org
                shl       value, bits
                mov       result, value
  end

pub shr_test(value, bits) : result
  org
                shr       value, bits
                mov       result, value
  end

pub sal_test(value, bits) : result
  org
                sal       value, bits
                mov       result, value
  end

pub sar_test(value, bits) : result
  org
                sar       value, bits
                mov       result, value
  end

pub setup()

  term.start(RX1, TX1, %0000, BR_TERM)                                  ' start terminal io

For this output:

11110000000000000000000000001111
00000000000000000000000011110000
00000000000000000000000011111111
00001111000000000000000000000000
11111111000000000000000000000000

00001111111111111111111111110000
11111111111111111111111100000000
11111111111111111111111100000000
00000000111111111111111111111111
00000000111111111111111111111111
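
The same table can be regenerated on a PC as a cross-check. The Python below is a sketch that models the four instructions as they are described in this thread (zero fill for SHL/SHR, original bit 0 for SAL, original bit 31 for SAR); it is not P2 code and the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def shl(d, n):
    return (d << (n & 31)) & MASK32             # pad LSB side with zeros

def shr(d, n):
    return (d & MASK32) >> (n & 31)             # pad MSB side with zeros

def sal(d, n):
    n &= 31
    fill = (1 << n) - 1 if d & 1 else 0         # copies of the original bit 0
    return shl(d, n) | fill

def sar(d, n):
    n &= 31
    fill = (MASK32 << (32 - n)) & MASK32 if d & 0x80000000 else 0
    return shr(d, n) | fill                     # copies of the original bit 31

for x in (0xF000000F, 0x0FFFFFF0):
    print(f"{x:032b}")
    for op in (shl, sal, shr, sar):
        print(f"{op(x, 4):032b}")
    print()
```

Run as is, it prints the same ten lines as the P2 output above, in the same order.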

JonnyMac · 2020-12-24 23:39

This is my generic instruction test method -- it lets me preset the flags before the instruction and return their states after.

pub test_inst(value, param1, param2, flagsin) : result, flagsout

  org
                testb     flagsin, #1                   wc      ' setup flags inputs
                testb     flagsin, #0                   wz
                mov       result, value

'               test instruction here

                bitc      flagsout, #1                          ' return updated flags
                bitz      flagsout, #0
  end
