Wanted: Fast Multibyte (32 bit) decrement, No FSR (Parallax Forums)

dkemppai Posts: 315
edited 2006-01-14 17:02 in General Discussion
OK, I seem to be having trouble finding the fastest 32-bit or 24-bit multibyte decrement...
...without using the FSR. Anyone have any good examples?

Thanks,
Dan

Comments

  • Paul Baker Posts: 6,351
    edited 2006-01-13 14:52
    I've looked into this, and the fastest I've figured out is dec/snz combos. It's really fast for 16 bit, but requires some extra cycles for > 16 bit. A 32 bit version is:

    dec cnt0
    jnz :enddec
    dec cnt1
    jnz :enddec
    dec cnt2
    snz
    dec cnt3
    :enddec

    If you need constant-time execution, you'll have to add a NOP chain lead-out.

    Someone let me know if they've found something faster.

    <edit> Wait, the code example I gave is actually for incrementing; let me think about the appropriate code for decrementing while I'm in transit to work. </edit>
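For reference, here is a hypothetical C model (not SX code) of the chain above with `inc` in place of `dec`, i.e. the increment it actually implements. It shows why a zero test is enough going up: a byte that wraps from $FF to $00 is exactly the case where the carry must ripple into the next byte. The byte layout (`cnt[0]` = cnt0, least significant) and the names `inc32_chain`/`pack32` are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the INC / zero-test carry chain for a 32-bit
   counter held in four bytes, cnt[0] = cnt0 (least significant).
   Assumes INC, like DEC, only updates Z: a result of $00 means the
   byte wrapped from $FF and the carry must ripple upward. */
void inc32_chain(uint8_t cnt[4])
{
    for (int i = 0; i < 4; i++) {
        cnt[i]++;              /* inc cntN                           */
        if (cnt[i] != 0)       /* jnz :enddec -- no wrap, no carry   */
            return;
    }                          /* all four wrapped: $FFFFFFFF -> $0  */
}

/* Pack the four bytes into one 32-bit value for checking results. */
uint32_t pack32(const uint8_t cnt[4])
{
    return (uint32_t)cnt[0] | (uint32_t)cnt[1] << 8 |
           (uint32_t)cnt[2] << 16 | (uint32_t)cnt[3] << 24;
}
```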



    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    1+1=10

    Post Edited (Paul Baker) : 1/13/2006 2:56:44 PM GMT
  • dkemppai Posts: 315
    edited 2006-01-13 15:43
    Paul Baker wrote: "...let me think about the appropriate code for decrementing while I'm in transit to work"

    Yeah, your code won't work.

    Decrement is a little more complicated. You can't use the same scheme as an increment:
    you have to look for a roll from zero to 255 to decrement the next higher byte, and in
    that case the zero flag is cleared (255 isn't zero), so a zero test can't catch it.
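A hypothetical C model of that failure, assuming the same byte layout (`cnt[0]` = cnt0) and made-up names: it reuses the increment pattern (decrement, stop on a non-zero result), which stops exactly when a borrow was needed.

```c
#include <stdint.h>

/* Hypothetical model of the increment pattern misapplied to a
   decrement: DEC each byte, stop as soon as the result is non-zero.
   This is the version that does NOT work. */
void dec32_zero_test(uint8_t cnt[4])
{
    for (int i = 0; i < 4; i++) {
        cnt[i]--;              /* dec cntN                               */
        if (cnt[i] != 0)       /* $00 -> $FF leaves Z clear, so we stop  */
            return;            /* precisely where a borrow was required  */
    }
}

/* Pack the four bytes into one 32-bit value for checking results. */
uint32_t pack32(const uint8_t cnt[4])
{
    return (uint32_t)cnt[0] | (uint32_t)cnt[1] << 8 |
           (uint32_t)cnt[2] << 16 | (uint32_t)cnt[3] << 24;
}
```

Starting from $00000100 this leaves $000001FF instead of $000000FF, and starting from $00000101 it runs on and produces $00FF0000.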

    I think I have one that works...

    Check out this code snippet. It seems to work, although it's not fully tested. It also
    may not be the fastest decrement code possible...


    
    
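    DecCounter      test     Cnt0           ;find the lowest non-zero byte
                    jnz      :DecCnt0
                    test     Cnt1
                    jnz      :DecCnt1
                    test     Cnt2
                    jnz      :DecCnt2
                    test     Cnt3
                    jnz      :DecCnt3

                    ;Place code to run here on zero!
                    ;(falls through: the counter then wraps to $FFFFFFFF)

    :DecCnt3        dec      Cnt3           ;decrement that byte and every
    :DecCnt2        dec      Cnt2           ;byte below it ($00 -> $FF)
    :DecCnt1        dec      Cnt1
    :DecCnt0        dec      Cnt0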
    

    
    

    Post Edited (dkemppai) : 1/13/2006 3:46:24 PM GMT
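A hypothetical C model of the TEST/JNZ routine above, to check the logic off-chip. The scan finds the lowest non-zero byte, then falls through the DEC chain from that byte down to Cnt0, so every lower byte (all $00) rolls over to $FF. As in the assembly, a counter that is already zero falls through the whole chain and wraps to $FFFFFFFF. The names, the byte layout (`cnt[0]` = Cnt0) and the `bool` return for the zero case are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the test-then-decrement routine.
   Returns true when the counter was already zero ("code to run on zero"). */
bool dec32_test_first(uint8_t cnt[4])
{
    int top = 3;               /* all zero: fall through from :DecCnt3 */
    bool was_zero = true;
    for (int i = 0; i < 4; i++) {
        if (cnt[i] != 0) {     /* test CntN / jnz :DecCntN */
            top = i;
            was_zero = false;
            break;
        }
    }
    for (int i = top; i >= 0; i--)   /* :DecCnt<top> .. :DecCnt0 */
        cnt[i]--;                    /* lower bytes roll $00 -> $FF */
    return was_zero;
}

/* Pack the four bytes into one 32-bit value for checking results. */
uint32_t pack32(const uint8_t cnt[4])
{
    return (uint32_t)cnt[0] | (uint32_t)cnt[1] << 8 |
           (uint32_t)cnt[2] << 16 | (uint32_t)cnt[3] << 24;
}
```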
  • Paul Baker Posts: 6,351
    edited 2006-01-13 15:45
    OK, for decrementing, dec doesn't work because we need to detect the borrow, and dec doesn't affect the carry flag. So I think this is the best sequence:

    mov w, #1
    sub cnt0, w
    jc :enddec       ; C set = no borrow, so we're done
    sub cnt1, w
    jc :enddec
    sub cnt2, w
    sc
    sub cnt3, w
    :enddec

    In other words, the straightforward answer. Also, the increment code given above can use incsz to save a few extra cycles:
    ijnz cnt0, :enddec
    ijnz cnt1, :enddec
    ijnz cnt2, :enddec
    inc cnt3
    :enddec

    I can't remember if ijnz is defined; if not, it's the standard pair: incsz fr / jmp target.
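A hypothetical C model of the SUB-based decrement at the top of this post. The key assumption (the usual SX convention) is that SUB fr,W leaves C set when no borrow occurred and clears it on a borrow, while DEC only touches Z; so a decrement chain can stop as soon as C comes back set. `sx_sub`, `dec32_sub`, `pack32` and the byte layout (`cnt[0]` = cnt0) are illustrative names only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of SUB fr,W (CARRYX off): fr = fr - W, and the
   returned C is set when no borrow occurred (fr >= W), clear on a borrow. */
bool sx_sub(uint8_t *fr, uint8_t w)
{
    bool c = (*fr >= w);
    *fr = (uint8_t)(*fr - w);
    return c;
}

/* 32-bit decrement with early exit. A byte only borrows from the next
   one when it was $00 before the SUB. */
void dec32_sub(uint8_t cnt[4])
{
    for (int i = 0; i < 4; i++)
        if (sx_sub(&cnt[i], 1))   /* C set: no borrow, done */
            return;
}

/* Pack the four bytes into one 32-bit value for checking results. */
uint32_t pack32(const uint8_t cnt[4])
{
    return (uint32_t)cnt[0] | (uint32_t)cnt[1] << 8 |
           (uint32_t)cnt[2] << 16 | (uint32_t)cnt[3] << 24;
}
```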

  • Bean Posts: 8,129
    edited 2006-01-13 15:48
    Here is what I got for the fastest 3-byte decrement (6 cycles):
      MOV W,#1       ; 1 cycle
      SUB counter1,W ; 1 cycle
      SC             ; 1 cycle, 2 if it skips
      SUB counter2,W ; 1 cycle
      SC             ; 1 cycle, 2 if it skips
      DEC counter3   ; 1 cycle
    A quick check seems to work fine.
    It also takes the same number of cycles each time through.
    Bean.
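To check the constant-time claim, here is a hypothetical C model that walks the same MOV/SUB/SC chain and adds up cycles using the costs from the comments above (1 cycle per instruction; SC costs 2 when it skips and 1 when it doesn't). It assumes SUB leaves C set when there is no borrow; `dec_sc_chain` and the byte layout (`cnt[0]` = counter1) are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the MOV/SUB/SC chain for n bytes (n >= 2).
   Decrements cnt in place and returns the cycle count. */
int dec_sc_chain(uint8_t *cnt, int n)
{
    int cycles = 1;                 /* MOV W,#1                          */
    bool c = (cnt[0] != 0);         /* SUB counter1,W: C set = no borrow */
    cnt[0]--;
    cycles += 1;
    for (int i = 1; i < n; i++) {
        if (c) {                    /* SC with C set: skip, 2 cycles.    */
            cycles += 2;            /* The skipped op leaves C set, so   */
            continue;               /* every later SC skips as well.     */
        }
        cycles += 1;                /* SC with C clear: no skip          */
        if (i < n - 1)              /* SUB counterN,W updates C...       */
            c = (cnt[i] != 0);
        cnt[i]--;                   /* ...the final DEC does not         */
        cycles += 1;
    }
    return cycles;
}
```

Under those costs every path through the 3-byte chain comes out at 6 cycles, and the same chain extended to 4 bytes (one more SUB/SC pair) comes out at 8.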

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    "SX-Video Module" Now available from Parallax for only $28.95

    http://www.parallax.com/detail.asp?product_id=30012

    "SX-Video OSD module" Now available from Parallax for only $49.95
    http://www.parallax.com/detail.asp?product_id=30015

    Product web site: www.sxvm.com

    "Ability may get you to the top, but it takes character to keep you there."


    Post Edited (Bean (Hitt Consulting)) : 1/13/2006 3:52:23 PM GMT
  • Paul Baker Posts: 6,351
    edited 2006-01-13 15:50
    Sorry, I'm extremely tired right now; I didn't get nearly enough sleep last night. I'll pick this back up with a head-to-head comparison and analysis when I'm more alert (I'm typically more alert in the afternoon).

  • Guenther Daubach Posts: 1,321
    edited 2006-01-13 16:16
    Arghhh,

    I just copied "my" version of the 32-bit decrement from the IDE to paste it into a reply when I saw that Bean had posted exactly "my" code :) except that Bean decrements only 24 bits.
    I have tested the code with four variables in SXSim, and I can confirm that it works as expected; it takes 11 cycles, though.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Greetings from Germany,

    G
  • Bean Posts: 8,129
    edited 2006-01-13 16:26
    G
  • Guenther Daubach Posts: 1,321
    edited 2006-01-13 16:31
    Double Arghhhh...

    It _TAKES_ 8 cycles for 32 bits. Guess what: I found another bug in SXSim and already fixed it (bad clock count for skip-bit instructions). So it was really worthwhile testing this code with SXSim :)

  • pjv Posts: 1,903
    edited 2006-01-13 17:27
    Hi All;

    Another view into this... isn't this what CARRYX is all about?

    Assuming CARRYX mode is enabled, the following works properly:

    mov     w,#1        ;load the decrementing value
    stc                 ;always set carry prior to a subtract (clear it prior to an add)
    sub     var1,w      ;subtract to decrement
    clr     w           ;prep to subtract only the remaining borrow
    sub     var2,w      ;subtract borrow, if any
    sub     var3,w      ;subtract borrow, if any
    sub     var4,w      ;subtract borrow, if any

    Uses 7 cycles, so barely faster than other suggestions.

    Cheers,

    Peter (pjv)
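A hypothetical C model of this sequence, assuming the CARRYX behaviour it relies on: with the fuse set, SUB fr,W computes fr - W - (1 - C), so C acts as an inverted borrow-in, and C comes out set when there was no borrow. `sub_carryx`, `dec32_carryx`, `pack32` and the byte layout (`cnt[0]` = var1) are illustrative names only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of SUB fr,W with the CARRYX fuse set:
   fr = fr - W - (C ? 0 : 1); returns the new C (set = no borrow). */
bool sub_carryx(uint8_t *fr, uint8_t w, bool c)
{
    int r = (int)*fr - (int)w - (c ? 0 : 1);
    *fr = (uint8_t)r;                    /* wraps modulo 256 */
    return r >= 0;
}

/* mov w,#1 / stc / sub var1,w / clr w / sub var2..var4,w.
   Straight-line code: the same 7 instructions run for every value. */
void dec32_carryx(uint8_t cnt[4])
{
    bool c = true;                       /* stc                         */
    c = sub_carryx(&cnt[0], 1, c);       /* sub var1,w with W = 1       */
    for (int i = 1; i < 4; i++)          /* clr w: W = 0 from here on   */
        c = sub_carryx(&cnt[i], 0, c);   /* sub varN,w: borrow only     */
}

/* Pack the four bytes into one 32-bit value for checking results. */
uint32_t pack32(const uint8_t cnt[4])
{
    return (uint32_t)cnt[0] | (uint32_t)cnt[1] << 8 |
           (uint32_t)cnt[2] << 16 | (uint32_t)cnt[3] << 24;
}
```

Note that with C clear, `sub_carryx(&x, 1, false)` subtracts 2 rather than 1, which is why the `stc` is needed.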

    Post Edited (pjv) : 1/13/2006 5:30:20 PM GMT
  • Paul Baker Posts: 6,351
    edited 2006-01-13 17:28
    Aha, very clever use of the skip-if-carry instruction. I concede that has to be the quickest method; after all, no jumps are taken (assuming CARRYX isn't enabled). I think I'll refrain from answering brain-taxing questions for the day; I clearly don't have the mental acuity to do it.


    Post Edited (Paul Baker) : 1/13/2006 5:32:16 PM GMT
  • LoopyByteloose Posts: 12,537
    edited 2006-01-14 16:26
    7 cycles instead of 8 is 12.5% fewer cycles -- that is not 'barely'. Kudos to all of you for teaching me something.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    "When all think alike, no one is thinking very much." - Walter Lippmann (1889-1974)

    Warm regards, G. Herzog [黃鶴] in Taiwan
  • Guenther Daubach Posts: 1,321
    edited 2006-01-14 17:02
    On the other hand, you may lose this 12.5% speed advantage in other parts of the code because CARRYX is enabled, e.g. through the need to explicitly set or clear the C flag before each arithmetic operation. Therefore, I don't like CARRYX that much. Keep in mind: this is a fuse flag, i.e. it applies to the whole program and can't be set or reset by a program instruction (which would be cool indeed).
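To illustrate that cost, a hypothetical C model of ADD fr,W with CARRYX enabled, assuming the fuse makes ADD compute fr + W + C: a plain 8-bit add picks up whatever C was left over, so it needs a `clc` first. `add_carryx` and `add8` are made-up names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of ADD fr,W with CARRYX set: fr = fr + W + C.
   Returns the new C (set on a carry out of bit 7). */
bool add_carryx(uint8_t *fr, uint8_t w, bool c)
{
    unsigned r = (unsigned)*fr + w + (c ? 1u : 0u);
    *fr = (uint8_t)r;
    return r > 0xFFu;
}

/* A plain 8-bit add under CARRYX: clc, then add fr,w. */
void add8(uint8_t *fr, uint8_t w)
{
    add_carryx(fr, w, false);
}
```

With a stale carry, 10 + 5 comes out as 16; with C cleared first it is 15.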
