Conditional Execution Encoding

devin122 · 2010-11-06 15:11

I'm writing a Propeller emulator, and I'm looking for a way to quickly decode the conditional field of the opcode. This is what I have so far, in standard C syntax. The conditional bits in the opcode are called x_0 through x_3, where x_0 is the least significant bit. C and Z (the carry and zero flags) should be obvious.

A = (x_3 & (C & Z)) | (x_2 & (C & !Z)) | (x_1 & (!C & Z)) | (x_0 & (!C & !Z))

If A is true then the instruction is executed, otherwise it is not. I've checked this over and it should work. Any input? Anyone have any simpler solutions?
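
A minimal C sketch of the expression above, assuming the four condition bits arrive packed in one value (`cond`, with bit 0 = x_0) and that `c` and `z` are already exactly 0 or 1. The function name and the packing are illustrative choices, not taken from any emulator source.

```c
#include <stdint.h>

/* Sum-of-products form of the condition test.
 * cond: 4-bit condition field, bit 0 = x_0 ... bit 3 = x_3 (assumed packing).
 * c, z: carry and zero flags, each assumed to be exactly 0 or 1.
 * Returns 1 if the instruction should execute, 0 otherwise. */
static inline unsigned cond_sop(uint32_t cond, unsigned c, unsigned z)
{
    unsigned x0 = (cond >> 0) & 1u;
    unsigned x1 = (cond >> 1) & 1u;
    unsigned x2 = (cond >> 2) & 1u;
    unsigned x3 = (cond >> 3) & 1u;

    /* !c and !z evaluate to 0 or 1 in C, so bitwise & and | are safe here. */
    return (x3 & (c & z)) | (x2 & (c & !z)) | (x1 & (!c & z)) | (x0 & (!c & !z));
}
```

With this packing, a field of all ones executes for every flag combination and a field of all zeros never executes.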

Nick Mueller · 2010-11-06 15:26

Did you really want to use the bitwise operators & and |?

Nick

devin122 · 2010-11-06 15:32

Yup, it's all about the speed. I'm trying to make the emulator as fast as possible.

Humanoido · 2010-11-09 01:19

Devin122, very useful project! What platforms will it run on?

Nick Mueller · 2010-11-09 02:10

Yup, it's all about the speed. I'm trying to make the emulator as fast as possible.

Bitwise operators don't necessarily make it faster. With bitwise operators, the complete expression has to be evaluated. With logical operators, short-circuit evaluation is allowed and might be faster.

Nick

Ale · 2010-11-09 06:17

Maybe he wants an emulator for another microprocessor, like an (X)AVR(32) (it exists already).

Ariba · 2010-11-09 06:47

If it must be as fast as possible, I would consider a lookup table.
4 condition bits + Carry + Zero flag = 6 address inputs. The returned value only needs to be one bit (True/False), so only 64 bits are necessary (64 bytes may be faster).

Andy
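
A hedged C sketch of both table sizes mentioned above. The index layout `(cond << 2) | (c << 1) | z` is an assumption of this sketch, as are all names. Entries are filled from the relation in the first post: x_3 applies when C and Z are both set, x_2 when only C is set, x_1 when only Z is set, x_0 when neither is set, so each entry is bit `2*c + z` of the condition field. With that layout, nibble k of the 64-bit version simply holds the value k.

```c
#include <stdint.h>

/* 64-byte variant: one byte per (cond, c, z) combination. */
static uint8_t cond_tab[64];

/* Fill the byte table once at start-up. */
static void cond_tab_init(void)
{
    for (unsigned cond = 0; cond < 16; cond++)
        for (unsigned cz = 0; cz < 4; cz++)   /* cz = (c << 1) | z */
            cond_tab[(cond << 2) | cz] = (uint8_t)((cond >> cz) & 1u);
}

/* Look up the result; inputs are masked so the index stays within 0..63. */
static inline unsigned cond_lookup(unsigned cond, unsigned c, unsigned z)
{
    return cond_tab[((cond & 0xFu) << 2) | ((c & 1u) << 1) | (z & 1u)];
}

/* 64-bit variant: the same table packed one bit per entry. */
#define COND_BITS 0xFEDCBA9876543210ULL

static inline unsigned cond_lookup_bits(unsigned cond, unsigned c, unsigned z)
{
    unsigned idx = ((cond & 0xFu) << 2) | ((c & 1u) << 1) | (z & 1u);
    return (unsigned)((COND_BITS >> idx) & 1u);
}
```

Whether the byte table or the packed constant is faster depends on the host CPU and compiler, so it is worth measuring both.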

Nick Mueller · 2010-11-09 07:22

If it must be as fast as possible, I would consider a lookup table.

If it must be fast, I'd still start with a clear, readable and most of all working program.
Then I'd consult the profiler, see where inlining could work and look for bottlenecks.

Nick

Ariba · 2010-11-09 08:08

Nick Mueller wrote: »

If it must be fast, I'd still start with a clear, readable and most of all working program.
Then I'd consult the profiler, see where inlining could work and look for bottlenecks.

Nick

What is more clear and readable? This

A = (x_3 & (C & Z)) | (x_2 & (C & !Z)) | (x_1 & (!C & Z)) | (x_0 & (!C & !Z))

or this

A := condTab[x<<2 + C<<1 + Z]

DAT
condTab byte 0,0,0,0  'if_never
        byte 1,0,1,0  'if_c
        byte 0,1,0,1  'if_z
        byte 1,0,0,0  'if_c_and_z
        ...

Andy

Dave Hein · 2010-11-09 08:17

You could use A = (X >> ((C << 1) | Z)) & 1, where X contains the 4 conditional bits. Or you could use A = (I >> (((C << 1) | Z) + 18)) & 1, where I is the instruction. I believe this will work, but you should test it to make sure.
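
Since the post asks for testing, here is a small C sketch that checks the shift form against the sum-of-products expression from the first post over all 64 input combinations. The shift of 18 comes from the expression above; the function and macro names are made up for illustration, and C and Z are assumed to be exactly 0 or 1.

```c
#include <stdint.h>

/* Position of the 4-bit condition field in the instruction word,
 * taken from the "+ 18" in the expression above. */
#define COND_SHIFT 18u

/* Bit-select on the 4-bit field: picks bit (2*c + z) of cond. */
static inline unsigned cond_select(uint32_t cond, unsigned c, unsigned z)
{
    return (cond >> ((c << 1) | z)) & 1u;
}

/* Same test applied directly to the 32-bit instruction word. */
static inline unsigned insn_executes(uint32_t insn, unsigned c, unsigned z)
{
    return (insn >> (((c << 1) | z) + COND_SHIFT)) & 1u;
}

/* Exhaustive cross-check against the sum-of-products form.
 * Returns the number of mismatches over all 64 input combinations. */
static unsigned cond_select_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t cond = 0; cond < 16; cond++)
        for (unsigned c = 0; c < 2; c++)
            for (unsigned z = 0; z < 2; z++) {
                unsigned x0 = cond & 1u,        x1 = (cond >> 1) & 1u;
                unsigned x2 = (cond >> 2) & 1u, x3 = (cond >> 3) & 1u;
                unsigned sop = (x3 & (c & z)) | (x2 & (c & !z)) |
                               (x1 & (!c & z)) | (x0 & (!c & !z));
                if (cond_select(cond, c, z) != sop ||
                    insn_executes(cond << COND_SHIFT, c, z) != sop)
                    bad++;
            }
    return bad;
}
```

A return value of 0 from `cond_select_mismatches()` means the two forms agree for every condition field and flag combination.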

Nick Mueller · 2010-11-09 08:21

or this
A := condTab[x<<2 + C<<1 + Z]

DAT
condTab byte 0,0,0,0  'if_never
        byte 1,0,1,0  'if_c
        byte 0,1,0,1  'if_z
        byte 1,0,0,0  'if_c_and_z
        ...

This:
A = x || C || Z;

Nick

Dave Hein · 2010-11-09 08:29

Nick Mueller wrote: »

This:
A = x || C || Z;

Nick

That doesn't give the correct results.

Nick Mueller · 2010-11-09 08:54

That doesn't give the correct results.

Ah! I was confused by the 64 (2^6) entries the "short and clear" table should have had but didn't.

Nick

Nick Mueller · 2010-11-09 09:07

But that:
(x_3 && (C && Z)) || (x_2 && (C && !Z)) || (x_1 && (!C && Z)) || (x_0 && (!C && !Z))

should be:
(x_3 && C && Z) || (x_2 && C && !Z) || (x_1 && !C && Z) || (x_0 && !C && !Z)

further simplified:
(((x_3 && Z) || (x_2 && !Z)) && C) || (((x_1 && Z) || (x_0 && !Z)) && !C)

This can be sped up:
if (C)
    A = (x_3 && Z) || (x_2 && !Z);
else
    A = (x_1 && Z) || (x_0 && !Z);

Nick
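
A compilable C sketch of the branch-on-C form, written with balanced parentheses and `&&` throughout, plus a check that it agrees with the flat logical expression. The function names and the choice of `int` parameters are assumptions of this sketch.

```c
#include <stdbool.h>

/* Branch-on-C form. x3..x0 are the condition bits, c and z the flags.
 * Only logical operators are used, so any non-zero value counts as true. */
static bool cond_branch(int x3, int x2, int x1, int x0, int c, int z)
{
    if (c)
        return (x3 && z) || (x2 && !z);
    else
        return (x1 && z) || (x0 && !z);
}

/* Flat logical form, kept for comparison. */
static bool cond_flat(int x3, int x2, int x1, int x0, int c, int z)
{
    return (x3 && c && z) || (x2 && c && !z) ||
           (x1 && !c && z) || (x0 && !c && !z);
}

/* Returns true if both forms agree on all 64 input combinations. */
static bool cond_forms_agree(void)
{
    for (unsigned i = 0; i < 64; i++) {
        int x3 = (i >> 5) & 1, x2 = (i >> 4) & 1;
        int x1 = (i >> 3) & 1, x0 = (i >> 2) & 1;
        int c  = (i >> 1) & 1, z  = i & 1;
        if (cond_branch(x3, x2, x1, x0, c, z) != cond_flat(x3, x2, x1, x0, c, z))
            return false;
    }
    return true;
}
```

Whether the branch beats the flat expression depends on the compiler and on how predictable C is in the emulated code, so a profiler run is the real test.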

Dave Hein · 2010-11-09 09:21

Yes, that will work (with some slight syntax corrections). However, I believe A = (X >> ((C << 1) | Z)) & 1 would be faster on most processors. Your solution uses logical ORs, ANDs and NOTs, which compile to tests and jumps.

Nick Mueller · 2010-11-10 00:15

However, I believe A = (X >> ((C << 1) | Z)) & 1 would be faster on most processors.

That's a valid assumption. Hopefully, he doesn't use TRUE and FALSE to set the operands. That's why I prefer the logical operators (not knowing what other shortcuts the OP took to "make it fast").

First, a program has to be readable and maintainable. After that, you can comment out the initial algorithm and replace it with hard-to-read code if you prefer.

Nick
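
The concern about TRUE and FALSE can be made concrete: the shift form only works if C and Z are exactly 0 or 1. If the emulator stores flags as "any non-zero value" (for example a TRUE defined as -1, as in Spin), the shift count would be wrong. A minimal C sketch that normalizes the flags first, assuming they arrive as plain `int` values; the function name is hypothetical.

```c
#include <stdint.h>

/* Shift form that tolerates flags stored as any non-zero value.
 * (c != 0) and (z != 0) evaluate to exactly 0 or 1 in C, so the
 * shift count always stays within 0..3. */
static inline unsigned cond_select_any(uint32_t cond, int c, int z)
{
    unsigned idx = ((unsigned)(c != 0) << 1) | (unsigned)(z != 0);
    return (cond >> idx) & 1u;
}
```

If the emulator already keeps its flags as 0 or 1, the two comparisons are unnecessary and the plain shift form is enough.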

kuroneko · 2010-11-10 00:54

devin122 wrote: »

Im writing a Propeller Emulator, and I was looking for a way to quickly decode the conditional field of the opcode.

To add some fun, any instruction becomes a nop when the next instruction is being fetched from address 0 (and no, it's not limited to location $1FF).

I feel so embarrassed. What I meant to say is that the PC must be at $1FF while the instruction is executed. While this means that the next fetch could come from #0 (but could theoretically come from anywhere else) the reverse isn't true as any jmp #0 will fetch from #0.

mpark · 2010-11-10 03:30

kuroneko wrote: »

To add some fun, any instruction becomes a nop when the next instruction is being fetched from address 0 (and no, it's not limited to location $1FF).

SRSLY? That's kooky.
How do you arrange it so an arbitrary instruction is followed by one fetched from 0?

kuroneko · 2010-11-10 05:47

mpark wrote: »

SRSLY? That's kooky.
How do you arrange it so an arbitrary instruction is followed by one fetched from 0?

DAT
                ...
                
                movi    ctra, #%0_11111_000
                neg     frqa, increment
                mov     phsa, preset
                jmp     phsa                    ' jump to preset + 2*frqa = target
                                                ' jump target is written one cycle later
                ...                             ' i.e. PC = preset + 3*frqa = -1

target          rdlong  cnt, #0                 ' nop, cog thinks it's at $1FF (-1),
                                                ' execution continues at 0
                ...
                
increment       long    target+1
preset          long    target*3+2

Dave Hein · 2010-11-10 07:48

kuroneko,

Exactly how many undocumented features does that code use?

Is there a document that documents undocumented features?

Dave

kuroneko · 2010-11-10 16:12

Dave Hein wrote: »

Exactly how many undocumented features does that code use? Is there a document that documents undocumented features?

If you ask me then I'd say one (undocumented feature). Which is the abort behaviour. I figure it's to do with the hand-over sequence for a coginit just before it executes user code. Anyway, I keep collecting stuff like this in the Propeller Tricks & Traps thread.

mpark · 2010-11-10 23:52

kuroneko wrote: »

DAT
                ...
                
                movi    ctra, #%0_11111_000
                neg     frqa, increment
                mov     phsa, preset
                jmp     phsa                    ' jump to preset + 2*frqa = target
                                                ' jump target is written one cycle later
                ...                             ' i.e. PC = preset + 3*frqa = -1

target          rdlong  cnt, #0                 ' nop, cog thinks it's at $1FF (-1),
                                                ' execution continues at 0
                ...
                
increment       long    target+1
preset          long    target*3+2

Kuroneko, you are kurazy!

What about jmp #0? Where's the next instruction fetched from in that case?

kuroneko · 2010-11-10 23:59

mpark wrote: »

What about jmp #0? Where's the next instruction fetched from in that case?

I don't quite follow. If it's just a jmp #0 then the next fetch is from #1. Can you elaborate?

In case you refer to a phase jump $000:$000^A then the first target is indeed #0 (2+2*(-1)) but as the PC is $1FF (2+3*(-1)) at this point it becomes a nop and is then executed again for real.

^A preset 2, increment = 1 (frqx = -1)

mpark · 2010-11-11 11:17

I guess I'm not understanding what "next instruction" means. Say jmp #0 lives at address 10; what is the next instruction? I was thinking it would be whatever is at address 0. I'm not seeing why you say the next fetch is from 1.

Dave Hein · 2010-11-11 13:56

Include me in the confused group. What's a phased jump? Does it have to do with pipeline, such as with a JNZ where the target address is prefetched rather than the next address? So a jmp #0 is a NOP, or is the instruction at $1ff a NOP? Please explain using small words.

kuroneko · 2010-11-11 16:31

I have the strong feeling that this is going off-topic, so I'll stop here and move the explanation to a different thread ("singularity", post 953138).

Just for clarification, the fetch-from-0 isn't enough and in fact irrelevant. The cog must think it's at $1FF (which would normally lead to a fetch from #0). Apologies for the confusion.
