Fastest way in PASM.... ?challenge?
This started as a quick thing, but became an afternoon obsession. And I'm hoping smarter people can come up with creative solutions.
I want to create a function that "divides" a number by shifting. But when it does, I want it to round the LSB up or down as needed. In normal math: 5 / 2 = 2.5. In integer math, that results in 2, but I would want it to round up to 3.
I want the function be to divide any positive integer by one or more shifts (divide by any power of 2).
Can anyone come up with anything faster than div3?
They were tested like this:
I want to create a function that "divides" a number by shifting. But when it does, I want it to round the LSB up or down as needed. In normal math: 5 / 2 = 2.5. In integer math, that results in 2, but I would want it to round up to 3.
I want the function be to divide any positive integer by one or more shifts (divide by any power of 2).
PUB div1 (val, sh) | v 'executes in 3056 cycles RETURN ((v := (val >> (sh - 1))) & 1) + (v >> 1) PUB div2 (val, sh) 'executes in 2864 cycles RETURN (val >> sh) + ((val >> (sh - 1)) & 1) PUB div3 (val, sh) 'executes in 2704 cycles RETURN ((val & (|< (sh - 1))) + val) >> shI have them in methods to speed up testing, though in the final product I will likely pull them out to speed them up further.
Can anyone come up with anything faster than div3?
They were tested like this:
start := cnt div3(248, 4) time := cnt - start - 368

Comments
I put question marks around the "challenge" because now I realize, it may not be a challenge for some :-P . It's no fun if the first reply is the best method.
Any faster ways (seems unlikely)?
TJZ sh, #:skip ' skip if 0 shifts SUB sh, #1 ' shift one less bit than needed SHR val, sh ' "divide" SHR val, #1 WC ' put "half" bit in c ADDX val, #0 ' if "half" bit is set, round up :skipSame as Dave's solution, but it is slower....TJZ sh, #:skip ' skip if 0 shifts MOV p1, #1 ' decode bit SHL p1, sh ' decode to sh SHR p1, #1 ' decode to sh - 1 ADD val, p1 ' add that to value SHR val, sh ' "divide" :skipJonathan
For SPIN, the way I'm using it, there shouldn't ever be a situation where 0 shifts would occur. Just assume one shift or more.
'Didn't time it; Dave's might be faster.
-Phil
entry ror value, shift ' half bit -> bit 31 shl value, shift wc ' bit 31 -> carry shr value, shift ' cleanup complete addx value, #0 ' apply roundingLooks like yours is faster by two hub windows (I only checked div3(248, 4)).shr value,shift wc addx value,#0So here's my question: given this observation, are there any cases where the present first-to-shift semantics are better than last-to-shift? (Disclaimer: I realize that with a one-clock barrel shifter, "first" and "last" are nonsense terms, but it's still useful to think of the carry as bit32 or bit-1 of an extended, 33-bit destination register.) I'm just trying to fish for why Chip designed these instructions the way he did.
-Phil
Your guess is as good as mine but it would seem rather odd if this was a deliberate decision.
Then again, why would it work with normal ALU based instructions? Different paths? Or not enough space?
In any event, if any of this is true, it would seem that last-to-shift would have been just as easy to implement as first-to-shift.
-Phil
DAT {N+0} test $, ina wc ' SDeR {N+1} if_c jmp #label ' IdSDeR {N+2} nop ' | IdSDeR ' | | {N+2}label nop ' | IdSDeR ' | | ' | +-- here we need to know whether to fetch from PC+1 or src ' +----- flags are known@Bobb: Sorry for getting slightly OT.shl value, #1 shr value, shift add value, #1 shr value, #1JonathanAnd this natively supports 0 shifts. So we can drop the TJZ (for the shift count check) and that is faster and smaller.
@kuroneko
No, that was a fairly interesting discussion. I personally thought about Phil's idea about why must the carry be filled by bit 0 or 31, why can't it be the last bit shifted.