PASM Cycle Timing

hippy · 2008-08-21 16:53

Do these two have the same cycle timing from :Start to :IsZero/:NotZero, or does the IF_Z make :Start to :NotZero take fewer cycles in the second example?

:Start    mov     acc,#1
          tjz     acc,#:IsZero
:NotZero

:Start    mov     acc,#1 WZ
IF_Z      jmp     #:IsZero

:NotZero

Phil Pilgrim (PhiPi) · 2008-08-21 17:04

According to the published instruction timings, they should be the same, i.e. 8 clocks total if the jump is taken. Do you have reason to believe that they are different?

-Phil

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Still some PropSTICK Kit bare PCBs left!

lonesock · 2008-08-21 18:09

The TJZ version takes 4 clocks more than the IF_Z version when the jump isn't taken. The test-and-jump instructions take 8 clocks if the jump is missed, 4 if it is taken. Here is my test code:

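' debug: time the branch under test with two CNT samples
        mov t1, par              ' t1 = hub address passed in PAR...
        add t1, #8               ' ...plus 8: where the result is written
        mov t_in, cnt            ' first CNT sample
        mov cr_temp,#1 wz        ' result is 1, so Z is cleared
        'tjz cr_temp,#:IsZero    ' TJZ version: cr_temp <> 0, so no jump
IF_Z    jmp #:IsZero             ' IF_Z version: condition false, so no jump
:NotZero
        mov t_last, cnt          ' second CNT sample
:IsZero
        subs t_last, t_in        ' elapsed clocks between the two samples
        abs t_last, t_last
        wrlong t_last, t1        ' report the result to hub RAM
        add t1, #4
:hard_stop
        jmp #:hard_stop          ' park the cog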

My measured values were 16 and 12 clocks for the TJZ and IF_Z versions, respectively. (Note: I commented out one of the two branch lines for each run; the code above is the IF_Z version.)
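As a cross-check of lonesock's figures, here is a small Python sketch. It assumes CNT is sampled at the same point within each `mov ..., cnt`, so the difference between the two samples is just the sum of the clock counts of the instructions from `mov t_in, cnt` up to (but not including) `mov t_last, cnt`. The clock counts are the published ones discussed in this thread; the constant and function names are made up for illustration.

```python
# Clock counts as published/discussed in this thread (assumed, not measured here).
ORDINARY = 4        # MOV, or any instruction whose condition is false (becomes a NOP)
TJZ_NOT_TAKEN = 8   # TJZ that falls through to the next instruction


def cnt_delta(instruction_clocks):
    """Clocks between a CNT sample taken by the first listed instruction and
    one taken by the instruction that follows the last listed instruction."""
    return sum(instruction_clocks)


# mov t_in,cnt / mov cr_temp,#1 wz / <branch under test>; cr_temp = 1, so no jump
tjz_version = cnt_delta([ORDINARY, ORDINARY, TJZ_NOT_TAKEN])
if_z_version = cnt_delta([ORDINARY, ORDINARY, ORDINARY])  # IF_Z false -> NOP

print(tjz_version, if_z_version)  # 16 12
```

The model reproduces the 16 and 12 clocks reported above, which supports the 8-clock fall-through figure for TJZ and the 4-clock figure for a skipped IF_Z JMP.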

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.

hippy · 2008-08-21 18:14

It depends on how and when IF_Z and the other conditionals apply: do they turn a JMP into a NOP, taking 4 cycles regardless like any other NOP, or does it still take 4/8 cycles with the jump to the destination disabled? Another, more extreme case is "IF_Z RDLONG".

I don't have a hardware setup to test it, and the manual seemed unclear; however, I have now found this: "When an instruction’s condition evaluates to FALSE, the instruction dynamically becomes a NOP, elapsing 4 clock cycles but affecting no flags or registers" (page 368).

It seems then that "IF_Z JMP" is never slower than "TJZ": it only ever takes 4 cycles, whereas TJZ takes 8 when it doesn't jump.
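Hippy's conclusion, tabulated in a short Python sketch. It counts only the branch instruction itself and assumes the Z flag was already set by an instruction you needed anyway (as `mov acc,#1 WZ` does in the second example); the function names are hypothetical.

```python
def tjz_clocks(jump_taken):
    # TJZ: 4 clocks if it jumps, 8 if it falls through
    return 4 if jump_taken else 8


def if_z_jmp_clocks(jump_taken):
    # IF_Z JMP: a taken JMP is 4 clocks, and a false condition turns it into
    # a 4-clock NOP, so the outcome does not change the cost
    return 4


for taken in (True, False):
    print(f"jump taken={taken}: TJZ={tjz_clocks(taken)}, IF_Z JMP={if_z_jmp_clocks(taken)}")
# jump taken=True: TJZ=4, IF_Z JMP=4
# jump taken=False: TJZ=8, IF_Z JMP=4
```

The saving only applies when the flag comes for free: TJZ tests the register directly, whereas IF_Z JMP needs some earlier instruction with WZ.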

@ lonesock : Thanks for taking the time to test it.

lonesock · 2008-08-21 18:26

hippy said...
@ lonesock : Thanks for taking the time to test it.

You're welcome! [8^)

Phil Pilgrim (PhiPi) · 2008-08-21 20:36

Hippy,

Sorry, I misread the intent of your question. But lonesock's answer prodded me to take another look at the pipeline. Paul Baker explained it best:

Paul Baker said...
The pipeline is SDIR, Source-Destination-Instruction-Result. The Instruction stage is the fetch for the next instruction (therefore this instruction was fetched 2 clock cycles before the Source fetch). The full pipeline looks like this:
IdSDER               'first instruction
    IdSDER           'second instruction
        IdSDER       'third instruction
These stages are I = fetch instruction, d = decode instruction, S = fetch source operand, D = fetch destination operand, E = execute instruction, R = write result. Stage d is an internal operation, i.e. no memory access is performed. I, S and D are all read operations, so that leaves only one stage to make an "effect". So all alterations of a cog's state are performed in the Result stage, whether it's writing to outa or whatever.
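Paul's diagram can be turned into clock numbers. This Python sketch assumes, as the diagram suggests, that each stage takes one clock and that a new instruction enters the pipeline every 4 clocks; the names are illustrative only.

```python
STAGES = "IdSDER"    # fetch, decode, source, destination, execute, result
ISSUE_INTERVAL = 4   # a new instruction enters the pipeline every 4 clocks


def stage_clock(instruction, stage):
    """Clock at which `stage` of the given instruction (0 = first) occurs,
    assuming one clock per stage."""
    return instruction * ISSUE_INTERVAL + STAGES.index(stage)


def diagram_row(instruction):
    return " " * (instruction * ISSUE_INTERVAL) + STAGES


for n in range(3):
    print(diagram_row(n))
# IdSDER
#     IdSDER
#         IdSDER

# The next instruction's fetch lands on this instruction's execute stage:
print(stage_clock(1, "I"), stage_clock(0, "E"))  # 4 4
```

With this numbering, instruction n writes its result (R) at clock 4n + 5, one clock after instruction n + 1 has already been fetched.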

Here are some observations relative to your question:

1. Since the status bits get written during the R stage, they are not yet ready when the next instruction is decoded. So the decision whether to execute the next instruction has to be deferred at least to the S stage, where the source operand (i.e. the jump target) is fetched.

2. For JMP-type instructions, the jump target has to be loaded into the PC before the next instruction's I stage. So the decision whether to execute has to occur before that. I guess this could occur at either the S or D stage. Maybe the source isn't even fetched if the jump isn't taken, but I suspect that it is and that the PC is loaded during D.

3. Once D passes, and the PC is loaded for the next instruction, the die is cast. If the E stage determines that the jump shouldn't be taken after all (e.g. TJZ, DJNZ), the pipeline has to be flushed and PC reloaded with the address of the next instruction, which entails an extra four clocks.

It would appear that the earlier decision implied by the IF_Z is what makes the shorter execution time possible in a no-jump situation.
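Phil's reasoning can be expressed as a toy Python model. It is not a description of the real silicon: the assumptions are his (the PC is loaded with the jump target during D, a condition code is resolved by the S stage, and TJZ/DJNZ only decide in E), and the function name is hypothetical.

```python
STAGES = "IdSDER"     # fetch, decode, source, destination, execute, result
ISSUE_INTERVAL = 4    # clocks between successive instructions
PC_LOAD_STAGE = "D"   # Phil's guess: the PC takes the jump target during D


def branch_clocks(decision_stage, jump_taken):
    """Clocks for a jump-type instruction in this toy model.

    If the go/no-go decision is made no later than PC_LOAD_STAGE (a condition
    code such as IF_Z), a skipped jump never loads the PC and costs one issue
    interval, like any NOP. If the decision only comes later (TJZ and DJNZ
    decide in E), the target is already loaded, so falling through costs a
    flush plus a refetch of the next sequential instruction: one extra interval.
    """
    decided_in_time = STAGES.index(decision_stage) <= STAGES.index(PC_LOAD_STAGE)
    if jump_taken or decided_in_time:
        return ISSUE_INTERVAL
    return 2 * ISSUE_INTERVAL


print(branch_clocks("S", jump_taken=False))  # IF_Z JMP, condition false: 4
print(branch_clocks("E", jump_taken=False))  # TJZ falls through: 8
print(branch_clocks("E", jump_taken=True))   # TJZ jumps: 4
```

Under these assumptions the model lands on the same 4/8 and 4/4 figures the thread measured, which is at least consistent with Phil's explanation.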

At the risk of changing the subject (in order not to copy Paul's quote to yet another thread), how many times have we done something like this:

        movs    nxt,reg
        nop
nxt     oper    dest,#0-0

when we could've done this:

        mov     src,reg
nxt     oper    dest,src
        ...
src     long    0-0

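According to the pipeline, the NOP is unnecessary in the second example: the instruction after MOVS has already been fetched (I) before MOVS writes its result (R), so a patched instruction needs one instruction in between, whereas the instruction after MOV reads its source operand (S) after MOV's R stage, so a patched data register is seen in time. Of course, similar savings are not possible if the patched field was a register address rather than an immediate value to begin with.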
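The same clock numbering shows why one version needs the NOP and the other doesn't. In this Python sketch (assuming one clock per stage and a 4-clock issue interval, per Paul's diagram; the names are illustrative), a later instruction sees a result only if it reads it after the writer's R stage.

```python
STAGES = "IdSDER"    # fetch, decode, source, destination, execute, result
ISSUE_INTERVAL = 4   # clocks between successive instructions


def stage_clock(instruction, stage):
    # one clock per stage, a new instruction every ISSUE_INTERVAL clocks
    return instruction * ISSUE_INTERVAL + STAGES.index(stage)


def sees_write(writer, reader, read_stage):
    """True if `reader` performs `read_stage` after `writer` has written its result in R."""
    return stage_clock(reader, read_stage) > stage_clock(writer, "R")


# movs nxt,reg directly followed by nxt: the instruction word is read in I
print(sees_write(0, 1, "I"))  # False: stale instruction, so a NOP is needed
print(sees_write(0, 2, "I"))  # True: with one NOP in between it works
# mov src,reg followed by oper dest,src: src is read in S
print(sees_write(0, 1, "S"))  # True: no NOP needed
```

MOVS patches the instruction word, which the following instruction has already fetched in its I stage; a MOV into a data register is picked up by the following instruction's S-stage read.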

-Phil

Ale · 2008-08-22 05:09

hippy: TJZ and TJNZ were optimized for speed, so they take 4 cycles when the jump is taken and 8 when it isn't (the next instruction is always fetched from the jump target, so when the jump is not taken the cog has to fetch again).

hippy · 2008-08-22 13:46

@ Ale : Yes, I understand that, but my question was about what happens when the instruction is conditional and the condition isn't met. In that case it appears it's no longer 8 cycles when the jump is not taken but 4; a condition that isn't met stops it being executed as a jump instruction.

This arose while writing yet another Virtual Machine which handles indirection ...

        mov     acc,src
        test    opc,#%0101 WZ
IF_NZ   rdlong  acc,acc
        test    opc,#%1000 WZ
IF_NZ   rdlong  acc,acc

The question that came to mind was: how fast is this when 'opc' is zero? Do the 'rdlong's all become 4 cycles, so it's as fast as can be, or will each wait for hub access and slow down execution even though they effectively do nothing?

It then struck me that my cycle counting may have 'gone to pot' anyway because of conditionals with 'jmp'.

Ale · 2008-08-22 19:11

I see. Maybe describing it with best-case and worst-case scenarios is the way to go. I evaluated the condition in the execution phase... it made sense... later I forgot.
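For hippy's indirection sequence, a best-case/worst-case count might look like this Python sketch. Assumptions, all worth checking against your own manual: ordinary and condition-false instructions take 4 clocks (the page 368 rule, assumed to apply to RDLONG too), and an executed hub instruction takes 8 to 23 clocks, as later editions of the Propeller manual list it (earlier editions say 7 to 22). Each hub access is bounded independently, so the range for executed RDLONGs is loose. The function name is hypothetical.

```python
# Assumed timings; adjust HUB_MIN/HUB_MAX to match your manual edition.
HUB_MIN, HUB_MAX = 8, 23   # executed RDLONG, depending on the hub window
ORDINARY = 4               # MOV, TEST, or any instruction whose condition is false


def indirection_clocks(opc):
    """(best, worst) clocks for hippy's five-instruction sequence.

    Each RDLONG's hub wait is bounded on its own, ignoring how the first
    access's hub sync constrains the second, so the range is loose.
    """
    best = worst = ORDINARY                 # mov acc,src
    for mask in (0b0101, 0b1000):
        best += ORDINARY                    # test opc,#mask WZ
        worst += ORDINARY
        if opc & mask:                      # Z clear, so IF_NZ rdlong executes
            best += HUB_MIN
            worst += HUB_MAX
        else:                               # condition false: 4-clock NOP
            best += ORDINARY
            worst += ORDINARY
    return best, worst


print(indirection_clocks(0))        # (20, 20)
print(indirection_clocks(0b1101))   # (28, 58)
```

If the page 368 rule does hold for hub instructions, the opc = 0 path is fixed at 20 clocks, and only the paths that actually perform an RDLONG pick up hub-dependent variation.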
