Conditional Execution Encoding



devin122
11-06-2010, 10:11 PM
I'm writing a Propeller emulator, and I was looking for a way to quickly decode the conditional field of the opcode. This is what I have so far, using standard C syntax. The conditional bits in the opcode are called x_0 through x_3, where x_0 is the least significant bit. C and Z should be obvious (the carry and zero flags).

A = (x_3 & (C & Z)) | (x_2 & (C & !Z)) | (x_1 & (!C & Z)) | (x_0 & (!C & !Z))

If A is true then the instruction is executed, otherwise it is not. I've checked this over and it should work. Any input? Anyone have any simpler solutions?
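
A minimal C sketch of the formula above, assuming the flags are already stored as 0 or 1. The function names and the way the condition field is passed in (as one 4-bit value) are hypothetical, chosen only for illustration; `c ^ 1u` stands in for `!C`, which is equivalent for 0/1 values.

```c
#include <assert.h>

/* Hypothetical helper: cond holds x_3..x_0 in its low 4 bits,
   c and z are the flags, each assumed to be exactly 0 or 1. */
int cond_passes(unsigned cond, unsigned c, unsigned z)
{
    unsigned x3 = (cond >> 3) & 1u;
    unsigned x2 = (cond >> 2) & 1u;
    unsigned x1 = (cond >> 1) & 1u;
    unsigned x0 = cond & 1u;
    unsigned nc = c ^ 1u;   /* !C for a 0/1 flag */
    unsigned nz = z ^ 1u;   /* !Z for a 0/1 flag */

    return (int)((x3 & c & z) | (x2 & c & nz) | (x1 & nc & z) | (x0 & nc & nz));
}

/* A few spot checks derived directly from the formula. */
void cond_passes_check(void)
{
    assert(cond_passes(0xF, 0, 0) == 1);  /* all four bits set: always runs */
    assert(cond_passes(0x0, 1, 1) == 0);  /* no bits set: never runs */
    assert(cond_passes(0xC, 1, 0) == 1);  /* x_3 and x_2 set: runs when C is set */
    assert(cond_passes(0xC, 0, 1) == 0);
    assert(cond_passes(0xA, 0, 1) == 1);  /* x_3 and x_1 set: runs when Z is set */
}
```

Calling `cond_passes_check()` once at start-up is a cheap way to catch a transposed bit before any faster variant replaces this version.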

Nick Mueller
11-06-2010, 10:26 PM
Did you really want to use the bitwise operators & and |?


Nick

devin122
11-06-2010, 10:32 PM
Yup, it's all about the speed. I'm trying to make the emulator as fast as possible.

Humanoido
11-09-2010, 08:19 AM
Devin122, very useful project! What platforms will it run on?

Nick Mueller
11-09-2010, 09:10 AM
Yup, it's all about the speed. I'm trying to make the emulator as fast as possible.

Bitwise operators don't necessarily make it faster. With bitwise operators, the complete expression has to be evaluated. With logical operators, short-circuit evaluation is allowed and might be faster.


Nick

Ale
11-09-2010, 01:17 PM
Maybe he wants an emulator for another microprocessor, like an (X)AVR(32) (it exists already).

Ariba
11-09-2010, 01:47 PM
If it must be as fast as possible, I would consider a lookup table.
4 condition bits + Carry + Zero flag = 6 address inputs. The returned value need only be one bit (True/False), so only 64 bits are necessary (64 bytes may be faster).
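
A rough C sketch of the 64-byte variant, under the assumption that the six inputs are packed as cond<<2 | C<<1 | Z (the post does not fix a layout). The names `cond_table`, `cond_table_init` and `cond_lookup` are hypothetical. The table is generated from the boolean formula in the first post rather than typed by hand, which avoids transcription slips.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed index layout (not specified in the thread): cond<<2 | C<<1 | Z. */
static uint8_t cond_table[64];

/* Fill the table once at start-up from the boolean formula. */
void cond_table_init(void)
{
    for (unsigned idx = 0; idx < 64; idx++) {
        unsigned cond = idx >> 2;
        unsigned c = (idx >> 1) & 1u;
        unsigned z = idx & 1u;
        unsigned x3 = (cond >> 3) & 1u, x2 = (cond >> 2) & 1u;
        unsigned x1 = (cond >> 1) & 1u, x0 = cond & 1u;

        cond_table[idx] = (uint8_t)((x3 & c & z) | (x2 & c & (z ^ 1u)) |
                                    (x1 & (c ^ 1u) & z) |
                                    (x0 & (c ^ 1u) & (z ^ 1u)));
    }
}

/* c and z are expected to be 0 or 1; the masks only keep the index in range. */
int cond_lookup(unsigned cond, unsigned c, unsigned z)
{
    return cond_table[((cond & 0xFu) << 2) | ((c & 1u) << 1) | (z & 1u)];
}

void cond_table_check(void)
{
    cond_table_init();
    assert(cond_lookup(0xF, 0, 0) == 1);  /* all bits set: always */
    assert(cond_lookup(0x0, 1, 1) == 0);  /* no bits set: never */
    assert(cond_lookup(0x8, 1, 1) == 1);  /* x_3 only: C and Z */
    assert(cond_lookup(0x8, 1, 0) == 0);
}
```

Whether the byte table actually beats the plain expression depends on the host CPU and cache behaviour, so it is worth measuring rather than assuming.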

Andy

Nick Mueller
11-09-2010, 02:22 PM
If it must be as fast as possible, I would consider a lookup table.


If it must be fast, I'd still start with a clear, readable and most of all working program.
Then I'd consult the profiler, see where inlining could work and look for bottlenecks.


Nick

Ariba
11-09-2010, 03:08 PM
If it must be fast, I'd still start with a clear, readable and most of all working program.
Then I'd consult the profiler, see where inlining could work and look for bottlenecks.

Nick

What is more clear and readable? This

A = (x_3 & (C & Z)) | (x_2 & (C & !Z)) | (x_1 & (!C & Z)) | (x_0 & (!C & !Z))

or this

A := condTab[x<<2 + C<<1 + Z]

DAT
condTab byte 0,0,0,0 'if_never
byte 0,0,1,1 'if_c
byte 0,1,0,1 'if_z
byte 0,0,0,1 'if_c_and_z
...


Andy

Dave Hein
11-09-2010, 03:17 PM
You could use A = (X >> ((C << 1) | Z)) & 1, where X contains the 4 conditional bits. Or you could use A = (I >> (((C << 1) | Z) + 18)) & 1, where I is the instruction. I believe this will work, but you should test it to make sure.
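
The "test it to make sure" step can be sketched in C as an exhaustive comparison against the boolean formula from the first post. All function names here are hypothetical, and C and Z are assumed to be 0 or 1. The filler value 0xFFC3FFFF only sets every bit outside positions 21..18, to show that the rest of the instruction word does not affect the result.

```c
#include <assert.h>
#include <stdint.h>

/* Shift form: x holds the 4 condition bits, c and z are 0 or 1. */
int cond_passes_shift(unsigned x, unsigned c, unsigned z)
{
    return (int)((x >> ((c << 1) | z)) & 1u);
}

/* Same test taken straight from the instruction word (field at bit 18). */
int cond_passes_instr(uint32_t instr, unsigned c, unsigned z)
{
    return (int)((instr >> (((c << 1) | z) + 18)) & 1u);
}

/* Reference: the boolean formula from the first post. */
int cond_passes_ref(unsigned x, unsigned c, unsigned z)
{
    unsigned x3 = (x >> 3) & 1u, x2 = (x >> 2) & 1u;
    unsigned x1 = (x >> 1) & 1u, x0 = x & 1u;

    return (int)((x3 & c & z) | (x2 & c & (z ^ 1u)) |
                 (x1 & (c ^ 1u) & z) | (x0 & (c ^ 1u) & (z ^ 1u)));
}

/* Compare all 16 * 2 * 2 input combinations. */
void cond_shift_check(void)
{
    for (unsigned x = 0; x < 16; x++) {
        for (unsigned c = 0; c < 2; c++) {
            for (unsigned z = 0; z < 2; z++) {
                uint32_t instr = 0xFFC3FFFFu | ((uint32_t)x << 18);

                assert(cond_passes_shift(x, c, z) == cond_passes_ref(x, c, z));
                assert(cond_passes_instr(instr, c, z) == cond_passes_ref(x, c, z));
            }
        }
    }
}
```

With only 64 input combinations, the exhaustive loop is cheap enough to keep as a permanent self-test.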

Nick Mueller
11-09-2010, 03:21 PM
or this
Code:
A := condTab[x<<2 + C<<1 + Z]

DAT
condTab byte 0,0,0,0 'if_never
byte 0,0,1,1 'if_c
byte 0,1,0,1 'if_z
byte 0,0,0,1 'if_c_and_z
...


This:
A = x || C || Z;


Nick

Dave Hein
11-09-2010, 03:29 PM
This:
A = x || C || Z;


Nick

That doesn't give the correct results.

Nick Mueller
11-09-2010, 03:54 PM
That doesn't give the correct results.


Ah! I was confused by the 64 (2^6) entries the "short and clear" table should have had but didn't.


Nick

Nick Mueller
11-09-2010, 04:07 PM
But that:
(x_3 && (C && Z)) || (x_2 && (C && !Z)) || (x_1 && (!C && Z)) || (x_0 && (!C && !Z))

should be:
(x_3 && C && Z) || (x_2 && C && !Z) || (x_1 && !C && Z) || (x_0 && !C && !Z)

further simplified:
(((x_3 && Z) || (x_2 && !Z)) && C) || (((x_1 && Z) || (x_0 && !Z)) && !C)

This can be sped up:
if (C)
    A = (x_3 && Z) || (x_2 && !Z);
else
    A = (x_1 && Z) || (x_0 && !Z);


Nick

Dave Hein
11-09-2010, 04:21 PM
Yes, that will work (with some slight syntax corrections). However, I believe A = (X >> ((C << 1) | Z)) & 1 would be faster on most processors. Your solution uses logical ORs, ANDs and NOTs, which compile to tests and jumps.

Nick Mueller
11-10-2010, 07:15 AM
However, I believe A = (X >> ((C << 1) | Z)) & 1 would be faster on most processors.

That's a valid assumption. Hopefully, he doesn't use TRUE and FALSE to set the operands. That's why I prefer the logical operators (not knowing what other shortcuts the OP took to "make it fast").
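
The concern can be illustrated with a small C sketch. It assumes, purely for illustration, an emulator that keeps each flag as a masked bit (C as 0x2, Z as 0x1) instead of as 0 or 1; `normalize_flag` and the raw values are hypothetical. In that case the bitwise forms silently give the wrong answer, while the logical forms, or normalized operands, do not.

```c
#include <assert.h>

/* Collapse any non-zero value to 1 so bitwise and shift tricks are safe. */
unsigned normalize_flag(unsigned raw)
{
    return raw != 0;
}

void flag_normalization_check(void)
{
    unsigned c_raw = 0x2;  /* hypothetical: C kept as a masked status bit */
    unsigned z_raw = 0x1;  /* hypothetical: Z kept as a masked status bit */

    /* Both flags are "set", yet bitwise AND of the raw values yields 0. */
    assert((c_raw & z_raw) == 0);

    /* Logical AND treats any non-zero operand as true. */
    assert((c_raw && z_raw) == 1);

    /* After normalization the bitwise form agrees with the logical one. */
    assert((normalize_flag(c_raw) & normalize_flag(z_raw)) == 1);
}
```

Normalizing the flags once, when they are written, keeps the fast bitwise or shift-based decode correct without paying for it on every instruction.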

First, a program has to be readable and maintainable. After that, you can comment out the initial algorithm and replace it with hard to read code if one prefers.


Nick

kuroneko
11-10-2010, 07:54 AM
I'm writing a Propeller emulator, and I was looking for a way to quickly decode the conditional field of the opcode.

To add some fun, any instruction becomes a nop when the next instruction is being fetched from address 0 (and no, it's not limited to location $1FF).

I feel so embarrassed. What I meant to say is that the PC must be at $1FF while the instruction is executed. While this means that the next fetch could come from #0 (but could theoretically come from anywhere else) the reverse isn't true as any jmp #0 will fetch from #0.

mpark
11-10-2010, 10:30 AM
To add some fun, any instruction becomes a nop when the next instruction is being fetched from address 0 (and no, it's not limited to location $1FF).

SRSLY? That's kooky.
How do you arrange it so an arbitrary instruction is followed by one fetched from 0?

kuroneko
11-10-2010, 12:47 PM
SRSLY? That's kooky.
How do you arrange it so an arbitrary instruction is followed by one fetched from 0?


DAT
                ...

                movi    ctra, #%0_11111_000
                neg     frqa, increment
                mov     phsa, preset
                jmp     phsa            ' jump to preset + 2*frqa = target
                                        ' jump target is written one cycle later
                ...                     ' i.e. PC = preset + 3*frqa = -1

target          rdlong  cnt, #0         ' nop, cog thinks it's at $1FF (-1),
                                        ' execution continues at 0
                ...

increment       long    target+1
preset          long    target*3+2

Dave Hein
11-10-2010, 02:48 PM
kuroneko,

Exactly how many undocumented features does that code use? :) Is there a document that documents undocumented features?

Dave

kuroneko
11-10-2010, 11:12 PM
Exactly how many undocumented features does that code use? :) Is there a document that documents undocumented features?

If you ask me, I'd say one (undocumented feature), which is the abort behaviour. I figure it has to do with the hand-over sequence for a coginit just before it executes user code. Anyway, I keep collecting stuff like this in the Propeller Tricks & Traps (http://forums.parallax.com/showpost.php?p=864343) thread.

mpark
11-11-2010, 06:52 AM
DAT
                ...

                movi    ctra, #%0_11111_000
                neg     frqa, increment
                mov     phsa, preset
                jmp     phsa            ' jump to preset + 2*frqa = target
                                        ' jump target is written one cycle later
                ...                     ' i.e. PC = preset + 3*frqa = -1

target          rdlong  cnt, #0         ' nop, cog thinks it's at $1FF (-1),
                                        ' execution continues at 0
                ...

increment       long    target+1
preset          long    target*3+2

Kuroneko, you are kurazy!

What about jmp #0? Where's the next instruction fetched from in that case?

kuroneko
11-11-2010, 06:59 AM
What about jmp #0? Where's the next instruction fetched from in that case?

I don't quite follow. If it's just a jmp #0 then the next fetch is from #1. Can you elaborate?

In case you refer to a phase jump $000:$000A then the first target is indeed #0 (2+2*(-1)) but as the PC is $1FF (2+3*(-1)) at this point it becomes a nop and is then executed again for real.

A preset 2, increment = 1 (frqx = -1)

mpark
11-11-2010, 06:17 PM
I guess I'm not understanding what "next instruction" means. Say jmp #0 lives at address 10; what is the next instruction? I was thinking it would be whatever is at address 0. I'm not seeing why you say the next fetch is from 1.

Dave Hein
11-11-2010, 08:56 PM
Include me in the confused group. What's a phased jump? Does it have to do with the pipeline, such as with a JNZ where the target address is prefetched rather than the next address? So is a jmp #0 a NOP, or is the instruction at $1FF a NOP? Please explain using small words. :)

kuroneko
11-11-2010, 11:31 PM
I have the strong feeling that this is going off-topic, so I'll stop here and move the explanation to a different thread (singularity).

Just for clarification, the fetch-from-0 isn't enough and in fact irrelevant. The cog must think it's at $1FF (which would normally lead to a fetch from #0). Apologies for the confusion.