PASM Cycle Timing — Parallax Forums

PASM Cycle Timing

hippy Posts: 1,981
edited 2008-08-22 19:11 in Propeller 1
Do these two sequences have the same cycle timing between Start and IsZero/NotZero, or does the IF_Z make Start to NotZero take fewer cycles in the second example?

:Start    mov     acc,#1
          tjz     acc,#:IsZero
:NotZero

:Start    mov     acc,#1 WZ
IF_Z      jmp     #:IsZero

:NotZero


Comments

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-21 17:04
    According to the published instruction timings, they should be the same, i.e. 8 clocks total if the jump is taken. Do you have reason to believe that they are different?

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • lonesock Posts: 917
    edited 2008-08-21 18:09
    The TJZ version takes 4 clocks more than the IF_Z version. The test-&-jump commands take 8 clocks if the jump is missed, 4 if taken. Here was my test code:

    ' debug
            mov t1, par
            add t1, #8
            mov t_in, cnt
            mov cr_temp,#1 wz
            'tjz cr_temp,#:IsZero
    IF_Z    jmp #:IsZero
    :NotZero
            mov t_last, cnt
    :IsZero
            subs t_last, t_in
            abs t_last, t_last
            wrlong t_last, t1
            add t1, #4
    :hard_stop
            jmp #:hard_stop
    
    My reported values were 16 and 12 for the TJZ and IF_Z, respectively. (note: I would comment out one line for each version...this code is for the IF_Z version)
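
    The two readings can be accounted for instruction by instruction. The annotated fragment below is only a sketch of that arithmetic, reusing the labels and registers from the test code above (it is not a standalone program). It assumes that every "mov ...,cnt" samples CNT at the same point within the instruction, so the difference between the two samples equals the time taken by everything from "mov t_in, cnt" up to, but not including, "mov t_last, cnt".

    ```pasm
            mov t_in, cnt           ' first CNT sample; this instruction itself costs 4 clocks
            mov cr_temp,#1 wz       ' 4 clocks; result is non-zero, so Z is clear
            tjz cr_temp,#:IsZero    ' TJZ version:  8 clocks, jump not taken
    'IF_Z   jmp #:IsZero            ' IF_Z version: 4 clocks, condition false
    :NotZero
            mov t_last, cnt         ' second CNT sample

    ' TJZ version:  4 + 4 + 8 = 16 clocks between the samples
    ' IF_Z version: 4 + 4 + 4 = 12 clocks between the samples
    ```

    Both totals match the reported 16 and 12, which is consistent with the not-taken TJZ costing 4 clocks more than the skipped JMP.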

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    lonesock
    Piranha are people too.
  • hippy Posts: 1,981
    edited 2008-08-21 18:14
    It depends on how and when IF_Z and the other conditionals apply. Do they turn a JMP into a NOP, which then takes 4 cycles like any other NOP, or does the instruction still take 4/8 cycles with only the jump to the destination disabled? Another, more extreme case is "IF_Z RDLONG".

    I don't have hardware set up to test it and the manual seemed unclear; however, I have now found this: "When an instruction’s condition evaluates to FALSE, the instruction dynamically becomes a NOP, elapsing 4 clock cycles but affecting no flags or registers" (page 368).

    It seems then that "IF_Z JMP" only ever takes 4 cycles, so it is never slower than "TJZ" and is 4 cycles faster whenever the jump is not taken.

    @ lonesock : Thanks for taking the time to test it.
  • lonesock Posts: 917
    edited 2008-08-21 18:26
    hippy said...
    @ lonesock : Thanks for taking the time to test it.
    You're welcome! [8^)
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-21 20:36
    Hippy,

    Sorry, I misread the intent of your question. But lonesock's answer prodded me to take another look at the pipeline. Paul Baker explained it best:
    Paul Baker said...
    The pipeline is SDIR, Source-Destination-Instruction-Result. The Instruction stage is the fetch for the next instruction (therefore this instruction was fetched 2 clock cycles before the Source fetch). The Full pipeline looks like this:

    IdSDER               'first instruction
        IdSDER           'second instruction
            IdSDER       'third instruction
    
    These stages are I = fetch instruction, d = decode instruction, S = fetch source operand, D = fetch destination operand, E = execute instruction, R = write result. Stage d is an internal operation, i.e. no access to memory is performed. I, S and D are all read operations, so that only leaves 1 stage to make an "effect". So all alterations of a cog's state are performed in the Result stage, whether it's writing to outa or whatever.

    Here are some observations relative to your question:

    1. Since the status bits get written during the R stage, they are not yet ready before the next instruction is decoded. So the decision whether to execute the next instruction has to be deferred at least to the S stage, where the source (i.e. the destination of a jump) is fetched.

    2. For JMP-type instructions, the destination operand has to be loaded into the PC before the I stage. So the decision whether to execute has to occur before that. I guess this could occur at either the S or D stage. Maybe S isn't even loaded if a jump isn't taken, but I suspect that it is and that the PC is loaded during D.

    3. Once D passes, and the PC is loaded for the next instruction, the die is cast. If the E stage determines that the jump shouldn't be taken after all (e.g. TJZ, DJNZ), the pipeline has to be flushed and PC reloaded with the address of the next instruction, which entails an extra four clocks.

    It would appear that the earlier decision implied by a conditional prefix such as IF_Z is what makes the shorter execution time possible in a no-jump situation.

    At the risk of changing the subject (in order not to copy Paul's quote to yet another thread), how many times have we done something like this:

            movs    nxt,reg
            nop
    nxt     oper    dest,#0-0
    
    when we could've done this:

            mov     src,reg
    nxt     oper    dest,src
            ...
    src     long    0-0
    
    According to the pipeline, the NOP is unnecessary in the second example. Of course, similar savings are not possible if it wasn't an immediate operand to begin with.

    -Phil
  • Ale Posts: 2,363
    edited 2008-08-22 05:09
    hippy: The tjz and tjnz instructions were optimized for speed, so they take 4 cycles when the jump is taken and 8 when it is not. The next instruction is always fetched from the jump destination, so when the jump is not taken it has to be fetched again.
  • hippy Posts: 1,981
    edited 2008-08-22 13:46
    @ Ale : Yes, I understand that, but my question was about what happens when the jump is conditional and the condition isn't met. In that case it appears it's no longer 8 cycles when the jump is not taken but 4; a condition not met stops it being executed as a jump instruction at all.

    This arose while writing yet another Virtual Machine which handles indirection ...

            mov     acc,src
            test    opc,#%0101 WZ
    IF_NZ   rdlong  acc,acc
            test    opc,#%1000 WZ
    IF_NZ   rdlong  acc,acc
    
    The question which came to mind was: how fast is this when 'opc' is zero? Do the 'rdlong' instructions all become 4 cycles, so it's as fast as can be, or will each wait for hub access and slow down execution even though they effectively do nothing?

    It then struck me that my cycle counting may have 'gone to pot' anyway because of conditionals with 'jmp'.
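
    One way to settle the question on hardware would be to time the sequence the same way lonesock timed the jump. The following is a minimal, untested sketch. The Spin wrapper, the clock settings, and the names elapsed, main, entry, halt, t_in and t_out are assumptions made for illustration and are not part of the original VM code; reading elapsed back (for example over a serial terminal) is left out.

    ```pasm
    CON
      _clkmode = xtal1 + pll16x             ' assumed: 5 MHz crystal, 80 MHz system clock
      _xinfreq = 5_000_000

    VAR
      long elapsed                          ' hypothetical result slot, written by the cog

    PUB main
      cognew(@entry, @elapsed)              ' PAR in the new cog holds the address of elapsed

    DAT
                  org     0
    entry         mov     t_in,cnt          ' first CNT sample
                  mov     acc,src
                  test    opc,#%0101 wz     ' opc is 0, so the result is 0 and Z is set
    if_nz         rdlong  acc,acc           ' condition false
                  test    opc,#%1000 wz
    if_nz         rdlong  acc,acc           ' condition false
                  mov     t_out,cnt         ' second CNT sample
                  sub     t_out,t_in        ' clocks between the two samples
                  wrlong  t_out,par         ' report the difference to the hub
    halt          jmp     #halt

    opc           long    0                 ' worst case of interest: no indirection bits set
    src           long    0
    acc           res     1
    t_in          res     1
    t_out         res     1
    ```

    If the manual's statement quoted earlier also holds for RDLONG, the six instructions between the two CNT samples should account for 6 x 4 = 24 clocks. A larger reading would suggest that the skipped RDLONGs still wait for their hub window.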
  • Ale Posts: 2,363
    edited 2008-08-22 19:11
    I see. Maybe best-case and worst-case scenarios are the way to describe it. I evaluated the condition in the execution phase... it made sense... later I forgot ;-)