Please help with 64 bit math (pasm) again?
Hi guys, I've been using my 64-bit math object for a while and it's working just fine. But now I needed a new function (a 64-bit average over 2^n samples) and decided to clean the object up a bit. I've finished both tasks, but haven't tested either. Looking at the code, I think it could still be improved. First, the hub access at entry (once the pointers and function are loaded in Spin) could probably be optimized; I'm just not seeing how right now. I don't think the write-back can be improved, even though it has the equivalent of 6 no-ops. Those aren't big improvements anyway.
The big change in semantics: instead of passing two pointers to 64-bit numbers, I'm now passing a pointer to a buffer of 64-bit numbers (read-only in PASM) and the size of the buffer. I have a couple more functions to implement at some point (such as calculating a checksum over a byte buffer) that will use the same entry method (pointer to buffer plus size).
DAT '' mailboxes
results long 0 [2]
function long 0 ' local copy of function
a1ptr long 0 ' local copy of argument1 pointer
a2ptr long 0 ' local copy of argument2 pointer
PUB Average(SamplePtr, Number)
{{
//////////////////////////////////////////////////////////////////////////////////
// SamplePtr is pointer to first sample of buffer //
// Number is n, where 2^n is the number of samples to average //
// result of addition is stored in arg1ptr longs //
//////////////////////////////////////////////////////////////////////////////////
}}
'' arg1_l1 - address of 2 long object local result buffer, pre-loaded with the address of sample buffer
'' arg1_l2 - could carry ?? on call
'' arg2_l1 - Sample buffer size expressed as 2^N
'' ?result? is arg2_l2, write back address of results in ?pasm?
results := SamplePtr
DoMath(@results, @Number, "v")
return @results
PUB Divide(arg1ptr, arg2ptr) '' Perform 64 bit divide
{{
//////////////////////////////////////////////////////////////////////////////////
// arg1ptr is pointer to least significant long of dividend //
// arg2ptr is pointer to divisor //
// result of division is stored in arg1ptr longs //
// remainder is stored in arg2ptr long //
//////////////////////////////////////////////////////////////////////////////////
}}
  DoMath(arg1ptr, arg2ptr, "d")
PRI DoMath(arg1ptr, arg2ptr, func) '' do the math operation
a1ptr := arg1ptr ' move pointers to mailboxes
a2ptr := arg2ptr ' move pointers to mailboxes
function := func ' move pointers to mailboxes
repeat while function ' wait for asm cog to finish
What I'm really trying to wrap my head around is doing the power-of-2 divide the RIGHT way, using the flags. What I have so far:
ave ' calculate number of samples
mov t3, arg2_l1 ' copy arg2 long 1 into average counts_power of two
mov t1, #1 '
shl t1, t3 ' shift t1 left by N, giving 2^N
sub t1, #1 ' sub 1 from total, first value preloaded
' now deal with addresses
mov t4, arg1_l1 ' copy address of buffer to var
mov t5, t4 ' and save a copy in work_add
' load the first sample
rdlong arg1_l1, t5 ' get lsl of argument 1
add t5, #4 ' add long offset to work pointer
rdlong arg1_l2, t5 ' get msl of argument 1
add t5, #4 ' add long offset to work pointer
ave_add ' add remaining samples
call #arg2_fill ' put next sample into arg2 longs
call #addi ' 64 bit add,
sub t1, #1 ' sub 1 from count
tjnz t1, #ave_add ' until we have added all samples
'' power of 2 divide
'prepare mask in t6 (N low bits set)
mov t6, #0 ' move 0 into t6
mov t1, t3 ' move power of two into t1
pmask shl t6, #1 ' shift t6 left by 1
or t6, #1 ' set the new low bit
djnz t1, #pmask ' loop N times
' do division on arg1 longs, shift right by N
shr arg1_l1, t3 ' truncate lsb
mov t2, arg1_l2 ' copy msl
and t2, t6 ' keep the N bits that carry down into lsl
mov t1, #32 ' prepare inverse of power of 2
sub t1, t3 ' sub power of 2 from 32
shl t2, t1 ' shift carried bits left by inverse power of 2
or arg1_l1, t2 ' merge carried bits from long 2 into long 1
shr arg1_l2, t3 ' truncate lsb
ave_ret ret
Seems a bit hefty for pasm?
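One possible way to trim this, offered as an untested sketch rather than a drop-in fix: the mask loop may not be needed at all, because shifting a copy of the high long left by 32 - N already discards everything except the N bits that cross into the low long. The label shr64 is made up for illustration. The registers t1, t2, t3, arg1_l1 and arg1_l2 are the ones declared in the posted DAT section, and the sketch assumes an unsigned sum with N held in t3.

```spin
' Sketch only: unsigned 64-bit shift right by N (0..31).
' N is in t3, the value is in arg1_l2:arg1_l1. shr64 is a hypothetical label.
shr64         tjz     t3, #shr64_ret      ' N = 0: nothing to do (a shift by 32 would act as a shift by 0)
              mov     t2, arg1_l2         ' copy msl before it changes
              mov     t1, #32
              sub     t1, t3              ' t1 := 32 - N
              shl     t2, t1              ' low N bits of msl land in the top N bits of t2
              shr     arg1_l1, t3         ' drop the low N bits of lsl
              or      arg1_l1, t2         ' merge the carried bits into lsl
              shr     arg1_l2, t3         ' shift msl (sar would keep the sign for a signed sum)
shr64_ret     ret
```

PASM shifts only use the low 5 bits of the count, which is why the N = 0 case is skipped up front instead of being left to the shl.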
The FULL code for easy preview
{{
//////////////////////////////////////////////////////////////////////////////////
// 64 bit signed addition and subtraction methods //
// Author : Joe Heinz //
// Copyright : Joe Heinz - 2013 //
// Grateful acknowledgement to kuroneko (parallax forum) //
//////////////////////////////////////////////////////////////////////////////////
}}
DAT '' mailboxes
results long 0 [2]
function long 0 ' local copy of function
a1ptr long 0 ' local copy of argument1 pointer
a2ptr long 0 ' local copy of argument2 pointer
cog byte 0 ' cog PASM object is running in
PUB NULL ' not a top level object
PUB Start'(controlPin) '' Start math object in new cog
{{
//////////////////////////////////////////////////////////////////////////////////
// controlPin is pin used to indicate there is a math operation to do //
// returns cog ASM engine was run in or false if no cog available //
//////////////////////////////////////////////////////////////////////////////////
}}
cog := cognew(@entry, @function) +1 ' start asm cog
return cog ' return cog
PUB Stop '' Stop Math Cog
  if cog
    cogstop(cog~ - 1) ' post-clear cog so a second Stop does nothing
PUB Average(SamplePtr, Number)
{{
//////////////////////////////////////////////////////////////////////////////////
// SamplePtr is pointer to first sample of buffer //
// Number is n, where 2^n is the number of samples to average //
// result of addition is stored in arg1ptr longs //
//////////////////////////////////////////////////////////////////////////////////
}}
'' arg1_l1 - address of 2 long object local result buffer, pre-loaded with the address of sample buffer
'' arg1_l2 - could carry ?? on call
'' arg2_l1 - Sample buffer size expressed as 2^N
'' ?result? is arg2_l2, write back address of results in ?pasm?
results := SamplePtr
DoMath(@results, @Number, "v")
return @results
PUB Divide(arg1ptr, arg2ptr) '' Perform 64 bit divide
{{
//////////////////////////////////////////////////////////////////////////////////
// arg1ptr is pointer to least significant long of dividend //
// arg2ptr is pointer to divisor //
// result of division is stored in arg1ptr longs //
// remainder is stored in arg2ptr long //
//////////////////////////////////////////////////////////////////////////////////
}}
  DoMath(arg1ptr, arg2ptr, "d")
PUB Multiply(arg1ptr, arg2ptr) '' Perform 64 bit multiply
{{
//////////////////////////////////////////////////////////////////////////////////
// arg1ptr is pointer to least significant long of multiplicand //
// arg2ptr is pointer to multiplier //
// result of multiplication is stored in arg1ptr longs //
//////////////////////////////////////////////////////////////////////////////////
}}
  DoMath(arg1ptr, arg2ptr, "m")
PUB Addition(arg1ptr, arg2ptr) '' Perform 64 bit add
{{
//////////////////////////////////////////////////////////////////////////////////
// arg1ptr is pointer to least significant long of argument1 //
// arg2ptr is pointer to least significant long of argument2 //
// result of addition is stored in arg1ptr longs //
//////////////////////////////////////////////////////////////////////////////////
}}
  DoMath(arg1ptr, arg2ptr, "a")
PUB Subtract(arg1ptr, arg2ptr) '' Perform 64 bit subtract
{{
//////////////////////////////////////////////////////////////////////////////////
// arg1ptr is pointer to least significant long of argument1, minuend //
// arg2ptr is pointer to least significant long of argument2, subtrahend //
// result of subtraction is stored in arg1ptr longs //
//////////////////////////////////////////////////////////////////////////////////
}}
  DoMath(arg1ptr, arg2ptr, "s")
PRI DoMath(arg1ptr, arg2ptr, func) '' do the math operation
a1ptr := arg1ptr ' move pointers to mailboxes
a2ptr := arg2ptr ' move pointers to mailboxes
function := func ' move pointers to mailboxes
repeat while function ' wait for asm cog to finish
DAT org
entry mov fn_ptr, par ' get first fn_ptr of passing method,
loop rdlong funct, fn_ptr ' read the function type *add or subtract*
cmp funct, #0 wz ' if zero
if_z jmp #loop ' try again
' now get pointers to arg1 and arg2
add fn_ptr, #4 ' add long offset to hub pointer
rdlong arg1_ptr, fn_ptr ' get pointer to argument 1
add fn_ptr, #4 ' add long offset to hub pointer
rdlong arg2_ptr, fn_ptr ' get pointer to argument 2
sub fn_ptr, #8 ' subtract 2 long offsets, point back at function
' get 2 longs from arg1 pointer
rdlong arg1_l1, arg1_ptr ' get lsl of argument 1
add arg1_ptr, #4 ' add long offset to arg1 pointer
rdlong arg1_l2, arg1_ptr ' get msl of argument 1
' get 2 longs from arg2 pointer
mov t5, arg2_ptr
call #arg2_fill
{
rdlong arg2_l1, arg2_ptr ' get lsl of argument 2
add arg2_ptr, #4 ' add long offset to arg1 pointer
rdlong arg2_l2, arg2_ptr ' get msl of argument 2
}
' do function
cmp funct, #"a" wz
if_z call #addi
cmp funct, #"s" wz
if_z call #subt
cmp funct, #"m" wz
if_z call #mult
cmp funct, #"d" wz
if_z call #div
cmp funct, #"v" wz
if_z call #ave
' write back 2 longs from arg1 pointer
write wrlong arg1_l2, arg1_ptr ' write most significant long of result
sub arg1_ptr, #4 ' subtract long offset from argument1 pointer
wrlong arg1_l1, arg1_ptr ' write least significant long of result
' write back 2 longs from arg2 pointer
wrlong arg2_l1, arg2_ptr ' write least significant long of argument 2
add arg2_ptr, #4 ' add long offset to argument2 pointer (it was not advanced by the read)
wrlong arg2_l2, arg2_ptr ' write most significant long of argument 2
' zero out function
wrlong zero, fn_ptr ' clear function to return to caller
jmp #loop ' back to loop start
arg2_fill rdlong arg2_l1, t5 ' get lsl of argument 2
add t5, #4 ' add long offset to work pointer
rdlong arg2_l2, t5 ' get msl of argument 2
add t5, #4 ' add long offset to work pointer
arg2_fill_ret ret
ave ' calculate number of samples
mov t3, arg2_l1 ' copy arg2 long 1 into average counts_power of two
mov t1, #1 '
shl t1, t3 ' shift t1 left by N, giving 2^N
sub t1, #1 ' sub 1 from total, first value preloaded
' now deal with addresses
mov t4, arg1_l1 ' copy address of buffer to var
mov t5, t4 ' and save a copy in work_add
' load the first sample
rdlong arg1_l1, t5 ' get lsl of argument 1
add t5, #4 ' add long offset to work pointer
rdlong arg1_l2, t5 ' get msl of argument 1
add t5, #4 ' add long offset to work pointer
ave_add ' add remaining samples
call #arg2_fill ' put next sample into arg2 longs
call #addi ' 64 bit add,
sub t1, #1 ' sub 1 from count
tjnz t1, #ave_add ' until we have added all samples
'' power of 2 divide
'prepare mask in t6 (N low bits set)
mov t6, #0 ' move 0 into t6
mov t1, t3 ' move power of two into t1
pmask shl t6, #1 ' shift t6 left by 1
or t6, #1 ' set the new low bit
djnz t1, #pmask ' loop N times
' do division on arg1 longs, shift right by N
shr arg1_l1, t3 ' truncate lsb
mov t2, arg1_l2 ' copy msl
and t2, t6 ' keep the N bits that carry down into lsl
mov t1, #32 ' prepare inverse of power of 2
sub t1, t3 ' sub power of 2 from 32
shl t2, t1 ' shift carried bits left by inverse power of 2
or arg1_l1, t2 ' merge carried bits from long 2 into long 1
shr arg1_l2, t3 ' truncate lsb
ave_ret ret
div mov t5, #0 ' prepare MSB place
findm rol arg2_l1, #1 wc ' rotate divisor left 1 place and check for overflow
if_nc add t5, #1 ' if no overflow, add 1 to MSB place
if_nc jmp #findm ' repeat loop until overflow
ror arg2_l1, #1 ' rotate back one
mov t6, #1 ' prepare t6
mov t1, t5 ' move msb shift factor to t1 for iteration count
shl t6, t5 ' move t6 by msb shift factor
div2 cmpsub arg1_l2, arg2_l1 wc ' subtract divisor from dividend and watch for subtraction
muxc t4, t6 ' mux c flag to t4 at t6 place
cmpsub t1, #1 wz ' subtract 1 from shift factor watching for last division
if_nz shr t6, #1 ' shift t6 right by 1
if_nz shr arg2_l1, #1 ' shift divisor right by 1
if_nz jmp #div2 ' and do loop again
shr t6, #1 ' shift t6 right by 1
shr arg2_l1, #1 ' shift divisor right by 1
cmpsub arg1_l2, arg2_l1 wc ' subtract divisor from dividend and watch for subtraction
muxc t4, t6 ' mux c flag to t4 at t6 place
sub t1, #1 wz ' subtract 1 from shift factor watching for last division
mov t1, #1 ' move 1 to t1 for muxc
shl t6, #31 ' move t6 to MSB
mov t2, #32 ' move 32 to t2 for iteration count
div1 shl arg1_l2, #1 ' make room for next bit from LSL
shl arg1_l1, #1 wc ' shift next bit from LSL watching for carry
muxc arg1_l2, t1 ' mux carry into MSL
cmpsub arg1_l2, arg2_l1 wc ' subtract divisor from MSL, watch for subtraction
muxc t3, t6 ' mux c flag to t3
sub t2, #1 wz ' subtract 1 from iteration count
if_nz shr t6, #1 ' move t6 to next place
if_nz jmp #div1 ' do loop again
mov arg2_l1, arg1_l2 ' move remainder in arg2_l1
mov arg2_l2, #0 ' move zero to arg2_l2
mov arg1_l2, t4 ' move t4 to arg1_l2
mov arg1_l1, t3 ' move t3 to arg1_l1
div_ret ret
mult mov t1, arg2_l1 ' get multiplier from arg2_l1 and place in multiplier
mov arg2_l1, arg1_l1 ' copy longs for repeated addition
mov arg2_l2, arg1_l2 ' copy the next long
sub t1, #1 ' subtract one from multiplier
:loop call #addi ' jmp to add
djnz t1, #:loop ' and loop
mult_ret ret
addi add arg1_l1, arg2_l1 wc ' do first addition and write c flag
addsx arg1_l2, arg2_l2 ' do second addition
addi_ret ret
subt sub arg1_l1, arg2_l1 wc ' do first subtraction and write c flag
subsx arg1_l2, arg2_l2 ' do second subtraction
subt_ret ret
zero long 0
'3 longs, pointers to hub memory
fn_ptr res
arg1_ptr res
arg2_ptr res
' 5 longs, function and 4 long operands
funct res
arg1_l1 res
arg1_l2 res
arg2_l1 res
arg2_l2 res
' temp variables
t1 res
t2 res
t3 res
t4 res
t5 res
t6 res
{{
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ TERMS OF USE: MIT License │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation │
│files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, │
│modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software│
│is furnished to do so, subject to the following conditions: │
│ │
│The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.│
│ │
│THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE │
│WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR │
│COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, │
│ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
}}
Thanks as always guys!

Comments
PRI BuildChecksum(ptr, size) : csum | p
  csum := 0 ' clear checksum variable
  repeat p from 0 to size - 1 ' repeat for entire packet from header
    csum += byte[ptr + p] ' add byte to checksum
  csum &= $FF ' keep low 8 bits
  csum := $FF - csum ' and subtract from $FF

PRI TestChecksum(ptr, size) : csum | p
  csum := 0 ' clear checksum - 0
  repeat p from 0 to size - 1 ' loop packet size
    csum += byte[ptr + p] ' add byte to checksum
  csum &= $FF ' keep lowest 8 bits

' BUILD CHECKSUM
bcheck call #calcheck
sub t4, t3 ' subtract from $FF
mov arg1_l1, t4 ' move to arg1 long 1
bcheck_ret ret
' TEST CHECKSUM
tcheck call #calcheck
mov arg1_l1, t3 ' move to arg1 long 1
tcheck_ret ret
' CALCULATE A CHECKSUM
calcheck ' calculate checksum
mov t1, arg2_l1 ' move count to t1
mov t2, arg1_l1 ' move byte pointer to t2
mov t3, #0 ' zero out t3
mov t4, #$FF ' move $FF to t4
ccloop rdbyte t5, t2 ' read byte at pointer t2 into t5
add t3, t5 ' add byte to t3
add t2, #1 ' add 1 to pointer, next byte
djnz t1, #ccloop ' for count
and t3, t4 ' mask off low 8 bits
calcheck_ret ret

ave ' calculate number of samples
mov t3, arg2_l1 ' copy arg2 long 1 into average counts_power of two
mov t1, #1 '
shl t1, t3 ' shift t1 left by N, giving 2^N
sub t1, #1 ' sub 1 from total, first value preloaded
' now deal with addresses
mov t4, arg1_l1 ' copy address of buffer to var
mov t5, t4 ' and save a copy in work_add
' load the first sample
rdlong arg1_l1, t5 ' get lsl of argument 1
add t5, #4 ' add long offset to work pointer
rdlong arg1_l2, t5 ' get msl of argument 1
add t5, #4 ' add long offset to work pointer
ave_add ' add remaining samples
call #arg2_fill ' put next sample into arg2 longs
call #addi ' 64 bit add,
sub t1, #1 ' sub 1 from count
tjnz t1, #ave_add ' until we have added all samples
:loop shr arg1_l2, #1 wc ' shift right l2
rcr arg1_l1, #1 ' carry right l1
djnz t3, #:loop
ave_ret ret

But the thing I'm wondering is... If I do something like this:
PUB Average(SamplePtr, Number)
{{
//////////////////////////////////////////////////////////////////////////////////
// SamplePtr is pointer to first sample of buffer //
// Number is power of 2 number of samples to average (2^n) = samples //
// result of addition is stored in arg1ptr longs //
//////////////////////////////////////////////////////////////////////////////////
}}
'' arg1_l1 - address of 2 long object local result buffer, pre-loaded with the address of sample buffer
'' arg1_l2 - could carry ?? on call
'' arg2_l1 - Sample buffer size expressed as 2^N
'' ?result? is arg2_l2, write back address of results in ?pasm?
  results := SamplePtr
  DoMath(@results, @Number, "v")
  return @results

Since I'm passing the address of results and the address of Number, couldn't I do the result assignment in PASM? Just write to @Number + 4. That's assuming the Spin result is the next variable; can someone confirm? If a method has locally declared longs, that would change, right? I guess I could start probing addresses in ViewPort, but I'm sure someone knows this off the top of their head.

My mistake, result is first, i.e. result followed by method parameters followed by local variables.
I'm pretty sure it's result, then parameters, then locals.
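A quick way to confirm that layout on real hardware, sketched under the assumption that @ applied to result, a parameter, or a local returns its hub address on the Spin stack. LayoutCheck is a hypothetical method name, not part of the posted object.

```spin
PUB LayoutCheck(a, b) | x
  '' Hypothetical helper: returns TRUE (-1) if result, a, b and x occupy
  '' consecutive longs in that order, FALSE (0) otherwise.
  return (@a == @result + 4) and (@b == @result + 8) and (@x == @result + 12)
```

If that returns TRUE, then inside Average(SamplePtr, Number) the result long would sit at @Number - 8, and @Number + 4 would be past the parameters rather than being result.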
example function: BST compiler listing: