signed saturation

ManAtWork · 2020-05-06 12:34

What is the best method to limit the range of a 32 bit signed value to 16 bit signed? I'm looking for the reverse of the SIGNX instruction. Of course, I could do

  FLES x,##$00007FFF
  FGES x,##$FFFF8000

but that would require 4 longwords and 8 clocks. No big deal if only needed once. But I'll probably need this a lot (digital filtering).

  TESTB x,#31 wc
  BITC x,#16<<5+15

is shorter and faster. ...EDIT: and doesn't work, unfortunately.

Is there a more elegant method?

evanh · 2020-05-06 13:10

The true reverse of SIGNX is just basic truncation. What you want is clipping or scaling. Clipping gets faster if the limits are held in preloaded registers and addressed directly, e.g.:

  ...
  FLES x,limit16sp
  FGES x,limit16sn
  ...

limit16sp  long $00007FFF
limit16sn  long $FFFF8000

Scaling instead takes a little more guesswork to pick suitable scale factors, but it does improve things. Only do the final clipping at the output stage, so that nothing wraps around due to forced truncation earlier. I'd be inclined to try using the CORDIC and stick with 32-bit working data as much as possible. For the fastest option, though, SCAS might be what you want, although that requires starting from 16-bit source data in the first place.
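As a rough C model of "keep wide working data, clip only at the output" (a sketch, not P2 code; the Q15 coefficient format, the 64-bit accumulator and the names clip16 and fir_q15 are assumptions made for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the signed 16-bit range, the same job the FLES/FGES pair does. */
static int32_t clip16(int64_t x)
{
    if (x > 32767) return 32767;
    if (x < -32768) return -32768;
    return (int32_t)x;
}

/* Hypothetical FIR step: 16-bit samples, Q15 coefficients (32768 = 1.0).
   All products are summed at full width; clipping happens once, at the end. */
int16_t fir_q15(const int16_t *coeff, const int16_t *sample, size_t taps)
{
    int64_t acc = 0;
    for (size_t i = 0; i < taps; i++)
        acc += (int32_t)coeff[i] * (int32_t)sample[i];
    acc /= 32768;                 /* drop the 15 fractional bits (truncates toward zero) */
    return (int16_t)clip16(acc);  /* saturate only the final output */
}
```

With two taps of nearly 1.0 and two full-scale samples the sum is about twice full scale, and the result saturates to 32767 instead of wrapping around.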

evanh · 2020-05-06 13:23

Oh! A really basic scaling is a crafted low-bit-truncation (instead of high-bit-truncation which causes the dreaded wraparound) - for unsigned integers at least. Basically just a shift-right of a predetermined amount that will always bring the most significant "1" bit within the desired 16 bits. Obviously only provides powers of two scaling though.
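For signed data the same idea would need an arithmetic shift rather than a logical one. A small C model, assuming the worst-case magnitude is known in advance (e.g. 18-bit signed values need a shift of 2); scale_pow2 is a made-up name:

```c
#include <stdint.h>

/* Arithmetic shift right by `shift` bits (0..31), rounding toward minus
   infinity. Only non-negative values are ever shifted, which keeps the
   result portable in C. */
int32_t scale_pow2(int32_t x, unsigned shift)
{
    return x >= 0 ? (x >> shift) : ~(~x >> shift);
}
```

With shift = 2 the 18-bit extremes 131071 and -131072 land exactly on 32767 and -32768, so no clipping is needed afterwards, at the cost of losing the two low bits.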

ersmith · 2020-05-06 13:40

I think you could test the high bit and then mux that into the upper 17 bits, but that's going to take at least 2 instructions as well. FLES/FGES with predefined register limits is probably clearer.

AJL · 2020-05-07 00:09

ersmith wrote: »

I think you could test the high bit and then mux that into the upper 17 bits, but that's going to take at least 2 instructions as well. FLES/FGES with predefined register limits is probably clearer.

That's what the TESTB, BITC sequence does, but it doesn't deliver the desired result:
If the 32 bit result is 32780 or $0000800C then the output is 12 or $0000000C rather than the desired 32767 or $00007FFF.
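The difference can be checked with a small C model of both sequences. This is only a sketch of the behaviour described above; clamp16 and sign_fill are illustrative names, and the final cast assumes the usual two's-complement conversion:

```c
#include <stdint.h>

/* Model of FLES x,limit16sp / FGES x,limit16sn: clamp to -32768..32767. */
int32_t clamp16(int32_t x)
{
    if (x > 32767) x = 32767;
    if (x < -32768) x = -32768;
    return x;
}

/* Model of TESTB x,#31 wc / BITC x,#16<<5+15:
   copy bit 31 into bits 31..15 and leave bits 14..0 untouched. */
int32_t sign_fill(int32_t x)
{
    uint32_t u = (uint32_t)x;
    const uint32_t upper17 = 0xFFFF8000u;
    if (u & 0x80000000u)
        u |= upper17;
    else
        u &= ~upper17;
    return (int32_t)u;
}
```

For 32780 clamp16 gives 32767 while sign_fill gives 12; for -40000 clamp16 gives -32768 while sign_fill gives -7232. The two only agree when the input already fits in 16 bits.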

My follow-up questions are: What is the maximum predicted excursion from 16 bit range? What can be done earlier to prevent excursions, and are the consequences worth the trade-off?

SaucySoliton · 2020-05-07 01:08

Here is a trick for signed to unsigned scaled conversion. May not be useful if you want signed output. If the subtraction result is less than 0, c is set. The scaling will be skipped and the output will be specified by a literal or other register.

		fles    a3,posclamp             '5 prevent overflow of pixel
		subs    a3,blacklevel  wc       '6 correct black level offset 
	if_nc	scas    a3,az                   '7 if non-negative, scale pixel  
		setbyte dy,#0,#3                '8 store scas result or 0

ManAtWork · 2020-05-07 07:40

Ok thanks. So the

  FLES x,limit16sp
  FGES x,limit16sn

seems to be the best solution to my original question. But it's always good to see different solutions to different problems. As the documentation of the P2 instructions is still very sparse, I don't understand all of them and I'm always unsure whether I missed something. So any hints to learn from are welcome.

I'm still not sure which approach to use for the filtering. MUL/ADD is fastest, but I don't know if 16 bits of resolution is sufficient. 32 bits of resolution using the CORDIC might have some advantages, but the CORDIC unit doesn't (directly) support signed multiplication. If one factor is constant, I could use the trick of doing a rotation by arcsin(factor) instead of a multiplication, which supports negative numbers.

The P2 might even have enough power to use (software) floating point math. In this case I don't have to worry about scaling, overflows and clipping at all and only have to limit the final output.

BTW, a nice trick is that FLE also works on floating point numbers but only if they are positive.
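Why this works, and why only for positive values, can be seen in a C model that takes the unsigned minimum of the raw bit patterns, which is what an unsigned limit does when handed two floats. It assumes 32-bit IEEE-754 singles; min_via_bits is a made-up name:

```c
#include <stdint.h>
#include <string.h>

/* Unsigned minimum of the raw bit patterns of two floats. */
float min_via_bits(float a, float b)
{
    uint32_t ua, ub, um;
    float r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    um = ua < ub ? ua : ub;
    memcpy(&r, &um, sizeof r);
    return r;
}
```

For positive floats the bit patterns sort in the same order as the values, so min_via_bits(1.5f, 2.25f) is 1.5f. A negative float has its sign bit set and therefore looks like a huge unsigned number, so min_via_bits(-1.0f, 2.0f) returns 2.0f, which is the wrong answer.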
