Waitpeq/waitpna resume timing



Cliff L. Biffle
05-09-2008, 04:11 PM
I'm extending my instruction scheduler to handle external events, and I was wondering if anyone could help me with the precise timing behavior of waitpne and waitpeq on the P8X32A.

I'm dealing with some particularly tricky bus interface code, so the specifics have suddenly become important. :) I don't have the bench equipment I need to answer this question on my own.

In what follows, I make some assumptions: INAx is the single pin we're monitoring; we're waiting for it to go high; we're using waitpeq. If the timing can vary in other cases (such as high-low transitions or waitpne), please let me know.

If INAx is held high, waitpeq should take 5 cycles.

If INAx is held low, waitpeq should spin indefinitely.

But if INAx transitions during execution of waitpeq, how soon is it noticed? Or, put another way, what hold time is required before the next instruction is allowed to execute?

If I've been unclear, tell me and I'll rephrase or elaborate. I'll add the results to my Propeller timing info page after testing them. For my application I need precision of no more than about 0.5clk, but if anyone has better it'd be great.

Thanks!

Edit: Just to clarify, I've seen a post from Paul Baker about a year ago that specifies that the next instruction executes on the next cycle after the condition is true, which almost answers my question -- but I'm hoping someone (possibly Paul) can be more specific about hold times for the comparison.

Post Edited (Cliff L. Biffle) : 5/9/2008 8:19:09 AM GMT

kuroneko
05-09-2008, 05:09 PM
Cliff L. Biffle said...
If INAx is held high, waitpeq should take 5 cycles.


FWIW, it's 6 cycles. http://forums.parallax.com/showthread.php?p=722987

Ken Peterson
05-09-2008, 08:33 PM
Perhaps someone from Parallax can chime in on why the data sheet says 5+, and on exactly how many clock cycles after the transition the next instruction executes?


Cliff L. Biffle
05-09-2008, 11:54 PM
kuroneko said...

Cliff L. Biffle said...
If INAx is held high, waitpeq should take 5 cycles.


FWIW, it's 6 cycles. http://forums.parallax.com/showthread.php?p=722987


I notice you haven't gotten confirmation from Parallax on this.


Ken Peterson said...
Perhaps someone from Parallax can chime in on why the data sheet says 5+, and on exactly how many clock cycles after the transition the next instruction executes?


Parallax has chimed in on this before; Paul Baker said
To clarify a point, waitpeq is deterministic with respect to the pin state. The reason it is listed as 5+ is it takes 4 clocks to process the instruction, plus however many cycles of compare necessary to achieve the wait state. If the value is true at the beginning it will take 5 cycles since only one compare cycle occurs. For situations where more than one compare cycle occurs, the next instruction begins execution on the next clock cycle after a comparison evaluates true.

(http://forums.parallax.com/showthread.php?p=656005)

Kuroneko's statement would seem to conflict with that, however.

kuroneko
05-10-2008, 02:05 PM
Cliff L. Biffle said...
Kuroneko's statement would seem to conflict with that, however.


Well, I wish it was 5+ and I simply overlooked something. But as mentioned in the other thread, even with the monitored pin being static I get a 6 cycle delay (as opposed to having the equation always resolve to true and thereby ignoring the pin state altogether).

Harley
05-15-2008, 03:12 AM
Funny, just the other day I measured an additional 2 cycles on top of the 4 after the leading edge of a true state for WAITPEQ.

The instruction is set up way before the input pulse appears. And using ViewPort and some 'test pulses' on another I/O to mark events, I see 100 nsec delay between the awaited pulse and a 'test pulse' (running at 80 MHz). Never do I see a total of 5 cycles, but a constant 6. Costing another 50 nsec response. LIFE!!!

What might the condition have to be to see a 5 clock response?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko

Cliff L. Biffle
05-15-2008, 07:40 AM
Well, with no word from Parallax (though I haven't contacted them directly), I've got a Proto Board on the bench with my scope. I'll post traces when I've got 'em.

Cliff L. Biffle
05-15-2008, 01:20 PM
Here's my preliminary report. Kuroneko appears to be correct -- I've not been able to get any wait* instruction to take less than 6 cycles.

Bench configuration:
- P8X32A (LQFP) on Parallax Proto Board. Date code 0641.
- Regulator producing 3.6V (slightly out of spec)
- Crystal stable at 80MHz.
- Tektronix TDS1012B.

Automated test suites run using propasm, make, and Remy Blank's Loader.py. I've omitted the clock configuration directives from the source here.

Baseline: toggle.pa




toggle
        mov     DIRA, MASK
:loop
        or      OUTA, MASK
        andn    OUTA, MASK
        jmp     #:loop

MASK    long    $FFFFFFFF




http://www.cliff.biffle.org/images/pt-toggle.png

As expected this shows a 33% duty cycle, 50ns (4 cyc) high, 100ns (8c) low.
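
As a sanity check on that arithmetic, here is a minimal Python sketch (not part of the original bench scripts). It assumes an 80 MHz system clock, 4 clocks per ordinary instruction, and that the pin changes as each OUTA write completes; the function names are illustrative only.

```python
CLOCK_HZ = 80_000_000  # system clock used on the bench (80 MHz)

def cycles_to_ns(cycles):
    """Convert a clock-cycle count to nanoseconds at CLOCK_HZ."""
    return cycles * 1e9 / CLOCK_HZ

def toggle_loop(instr_cycles=4):
    """High/low phase lengths (in cycles) for the or/andn/jmp loop.

    Assumption: the pin goes high when `or` completes and low when `andn`
    completes, so it is high for one instruction and low for two.
    """
    high = instr_cycles        # duration of andn
    low = 2 * instr_cycles     # duration of jmp plus or
    return high, low

high, low = toggle_loop()
duty = high / (high + low)
print(cycles_to_ns(high), cycles_to_ns(low), round(duty * 100))
```

Under those assumptions the sketch reproduces the 50ns high, 100ns low, 33% duty cycle reading of the trace.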

Testing waitpeq
For this run P0 was pulled directly to ground.




waitgnd
        or      DIRA, OUTPUT
:loop
        or      OUTA, OUTPUT
        waitpeq ZERO, INPUT
        andn    OUTA, OUTPUT
        jmp     #:loop

OUTPUT  long    $0000FF00
INPUT   long    $00000001
ZERO    long    0




http://www.cliff.biffle.org/images/pt-waitpeq.png

The high phase of the signal has grown by 75ns/6c. This supports Kuroneko's hypothesis that the fastest possible invocation of waitpeq will take 6c.

Testing waitpne and resume latency
For this test, I added a switch to pull P0 to rail. Channel 2 was tied to P0 with a rising-edge trigger at 1.63V. The chip was allowed to idle at waitpne for several seconds before P0 went high.




waithi
        or      DIRA, OUTPUT
:loop
        or      OUTA, OUTPUT
        waitpne ZERO, INPUT
        andn    OUTA, OUTPUT
        jmp     #:loop

OUTPUT  long    $0000FF00
INPUT   long    $00000001
ZERO    long    0




http://www.cliff.biffle.org/images/pt-waitpeq2.png

This one answers my original question: there appears to be a minimum latency of two cycles from the time that a wait condition is satisfied to the time that the next instruction begins execution. If anyone else's data fail to support this, I can do a more thorough test using a random external source and the next instruction's input latching characteristics.

I repeated this test several times and saw 75ns (a 2c latency) to 80ns most of the time. If my model is correct the delay could approach 87.5ns (a 3c latency), but the data have failed to support this thus far.
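
One way to read those numbers is sketched below in Python. This is an interpretation rather than something stated outright above: it assumes the measured delay runs from the P0 edge to the falling output edge, and that the output falls as the 4-cycle andn following the wait completes, so that instruction's duration is subtracted. The function name is hypothetical.

```python
NS_PER_CYCLE = 12.5  # one clock at 80 MHz

def resume_latency_cycles(edge_to_output_ns, next_instr_cycles=4):
    """Cycles between the pin edge and the start of the next instruction.

    Assumption: the output changes when the following 4-cycle `andn`
    completes, so its duration is subtracted from the measured delay.
    """
    return edge_to_output_ns / NS_PER_CYCLE - next_instr_cycles

for measured_ns in (75.0, 80.0, 87.5):
    print(measured_ns, resume_latency_cycles(measured_ns))
```

With that reading, 75ns corresponds to a 2c resume latency and 87.5ns to 3c, which is how the two figures pair up in the text.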

Testing waitcnt

While I was at it I constructed a test fixture for waitcnt. A script repeatedly recompiled the code below with different values for {CYCLES} and recorded the traces that resulted at P15.




timecnt
        mov     DIRA, OUTPUT
:loop
        or      OUTA, OUTPUT
        mov     time, CNT
        add     time, #{CYCLES}
        waitcnt time, #0
        andn    OUTA, OUTPUT
        jmp     #:loop

time    long    0
OUTPUT  long    $0000FF00




Values of CYCLES in 0..8 (inclusive) failed to transition in the time allotted (500ms).

Values of 9 and greater had high-phase periods of 212.5ns + (CYCLES - 8) * 12.5ns, and the expected 100ns low period.

Here's the trace for CYCLES=11:
http://www.cliff.biffle.org/images/pt-waitcnt.png

Commenting out the waitcnt instruction gave a high period of 150ns, giving a best-case waitcnt time of 225ns - 150ns = 75ns, or 6c. This supports Kuroneko's hypothesis.
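
The fit and the subtraction can be written out as a short Python sketch. The constants are the measured values quoted above; the function name is made up for illustration, and the cutoff at 9 simply encodes the observation that smaller values never transitioned within 500ms.

```python
NS_PER_CYCLE = 12.5   # one clock at 80 MHz
BASELINE_NS = 150.0   # high phase with the waitcnt commented out

def high_phase_ns(cycles):
    """High-phase length for a given {CYCLES}, per the measured fit.

    Returns None for values that failed to transition within 500 ms.
    """
    if cycles < 9:
        return None
    return 212.5 + (cycles - 8) * NS_PER_CYCLE

best_case_ns = high_phase_ns(9) - BASELINE_NS
print(best_case_ns, best_case_ns / NS_PER_CYCLE)  # shortest waitcnt, ns and cycles
print(high_phase_ns(11))                          # the CYCLES=11 trace
```

The shortest working case (CYCLES=9) gives 225ns, so the waitcnt itself accounts for 75ns, or 6 cycles.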

I'm not sure I can tell, from this data, what the resume latency of waitcnt is. It's made more difficult by the ill-defined latching behavior of operands. I can say with confidence that it's between 2c and 3c, and judging from the behavior of waitpeq/waitpne I suspect 2c -- but again, my data is inconclusive.

If anyone has suggestions for improving my methods, lemme know.

Edit: Actually, I think the data may suggest an event-to-resume latency of three cycles, at least for waitcnt.

Post Edited (Cliff L. Biffle) : 5/15/2008 5:49:11 AM GMT

Phil Pilgrim (PhiPi)
05-15-2008, 01:47 PM
I find it interesting that the minimum delays observed for holding a pin in the desired state are consistent with those for transitioning a pin to the desired state. Since the measured delay is from the edge detected at the pin, I would expect the latter to be a further cycle or two delayed due to input synchronization. I haven't found any references in the docs to indicate how many stages of synchronization the Prop employs, though.

-Phil

Cliff L. Biffle
05-15-2008, 02:26 PM
Personal correspondence from a couple years back suggested that external inputs were latched within the corresponding fetch cycle (cycle 0 or 1 of the instruction), but I haven't tested it. I believe that this is supported by my PFcam driver for the OV6620, which assumes no input buffering or propagation delay (and works fine).

I'll work out a way to test this.

Ale
05-15-2008, 04:26 PM
I did similar experiments at an 80 MHz clock, but using a slightly modified loop, i.e. not writing the variable right before the waitcnt in case the pipeline gets *somehow* stalled, something that should not happen :), and it does not happen.



          org

c5_start  mov     DIRA,k5_pin
          mov     OUTA,#0

c5_lbl0
          mov     k5_temp,CNT
          add     k5_temp,#18
          or      OUTA,k5_pin
          nop
          waitcnt k5_temp,#0
          andn    OUTA,k5_pin
          jmp     #c5_lbl0

k5_temp   long    0
k5_pin    long    $8000





Well, seems that 17 is the minimum (or it waits 2^32-16 or so cycles). If I remove the nop... 13 as expected is the minimum.

kuroneko
05-15-2008, 05:20 PM
Ale said...
Well, seems that 17 is the minimum (or it waits 2^32-16 or so cycles). If I remove the nop... 13 as expected is the minimum.


Well, that just proves to me that a NOP takes 4 cycles ... (reading 17 and 13 as replacement for #18 in your code, correct me if I'm wrong).

Also, assuming the picture shows the behaviour of the code you posted (#18), then I can see the following:

- the 190ns are roughly 15 cycles (at 80MHz)
- 8 are consumed by the NOP and the toggle
- that leaves 7 cycles for the waitcnt

You also stated that #17 is the minimum for waitcnt not to stall for a very long time, which makes me believe that the pulse width could go down to 14 cycles which still leaves 6 for waitcnt. Would it be possible to take a measurement for case #17?

Post Edited (kuroneko) : 5/15/2008 9:32:09 AM GMT

Ale
05-15-2008, 07:10 PM
Sure,

here it is for 17 cycles. 16 sends it to the no trigger version ;)

stevenmess2004
05-15-2008, 07:16 PM
This is weird because I'm sure that Paul needed the 1 cycle offset in his high speed serial object.

kuroneko
05-15-2008, 07:17 PM
Thanks. Seems that pipeline stalls don't come into it. Well, I can live with 6+ but there must be a reason that the docs mention 5+ ...

stevenmess2004
05-15-2008, 08:09 PM
With the 17 we are getting 5+ for the waitcnt. (17-4-4-4=5)

Don't know what's going on with the other waits though. What happens with a waitvid?

kuroneko
05-15-2008, 08:20 PM
stevenmess2004 said...
With the 17 we are getting 5+ for the waitcnt. (17-4-4-4=5)


What do you mean by 17? CNT adjustment? From the last posted scope screen I count 14 cycles which is 4 + 4 + 6.

stevenmess2004
05-15-2008, 08:59 PM
True, sorry. But I wonder if that is where the 5+ came from? Would also be interesting to change the wait instruction to something like this


waitcnt OUTA,%111



Don't know if it will do anything useful but could be interesting.

Cliff L. Biffle
05-16-2008, 04:24 AM
Steven, Kuroneko's right -- your test fixture is identical to mine with an additional 8 cycle latency. So 17 cycles is the result I would have predicted (9 + 8).
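
That prediction can be sketched in Python as follows. The model is an assumption drawn from the numbers in this thread: a minimum offset of 9 when the add sits directly before the waitcnt, plus 4 cycles for every instruction placed between them. The names are illustrative only.

```python
BASE_MIN_OFFSET = 9      # smallest working {CYCLES} with add directly before waitcnt
INSTR_CYCLES = 4         # clocks per ordinary instruction
CLOCK_HZ = 80_000_000

def min_cnt_offset(instrs_between):
    """Smallest CNT offset that avoids missing the target (assumed model)."""
    return BASE_MIN_OFFSET + INSTR_CYCLES * instrs_between

# Cost of missing the target: CNT must wrap all the way around.
WRAP_SECONDS = 2**32 / CLOCK_HZ

print(min_cnt_offset(2))       # `or` and `nop` between add and waitcnt
print(min_cnt_offset(1))       # nop removed
print(round(WRAP_SECONDS, 1))  # seconds for a full 32-bit CNT wrap at 80 MHz
```

Both of Ale's minimums, 17 with the nop and 13 without it, fall out of the same model.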

Having waitcnt write directly to the output latch would be interesting to verify the latching behavior of the Writeback stage. I haven't tested it specifically, but I believe my measurements show the Writeback affecting outputs within the final cycle of the instruction.

If I've miscalculated the waitpne resume latency it might be +1 cycle however. I can see a test fixture for this.

I'll ping some of the Parallax guys directly on this -- there's only so much reverse engineering I want to do when I'm not getting paid to do it. :)


The good news is, using the timings calculated in my last post, I've now got perfect phase lock in my OV6620 driver -- despite the mutually prime clock frequencies (17.73MHz vs. 96MHz).

kuroneko
05-16-2008, 08:18 AM
Cliff L. Biffle said...
Having waitcnt write directly to the output latch would be interesting to verify the latching behavior of the Writeback stage. I haven't tested it specifically, but I believe my measurements show the Writeback affecting outputs within the final cycle of the instruction.


http://forums.parallax.com/showthread.php?p=720785 may be of interest to you. It helped me :)

Cliff L. Biffle
05-17-2008, 06:39 AM
Yeah, I've read that before. After the timing error, though, I'm trying to measure rather than assume.

Paul Baker
05-20-2008, 06:46 AM
I have verified that WAITPxx does indeed take 6+ cycles to execute; we will be updating our documents to reflect this. Nice catch.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker (mailto:pbaker@parallax.com)
Propeller Applications Engineer
Parallax, Inc. (http://www.parallax.com)

Cliff L. Biffle
05-20-2008, 09:05 AM
Thanks for the confirmation, Paul. Do you have any insight on my original question -- namely, after the waitxxx condition is met, how long until the next instruction kicks in? Two cycles? Three?

kuroneko
05-20-2008, 12:46 PM
Cliff L. Biffle said...
Thanks for the confirmation, Paul. Do you have any insight on my original question -- namely, after the waitxxx condition is met, how long until the next instruction kicks in? Two cycles? Three?


I was curious myself. So I put together a quick and dirty test program. This does a waitpeq followed by an OUTA modification. Two observers wait on the waitpeq pin and the modified OUTA pin respectively and then record CNT. The difference is 7. Given that OUTA happens in the result phase, the next instruction therefore starts at 7-3 cycles after the wait condition has been met, i.e. if the wait condition happens at CNT, the next instruction starts at CNT+4.

Paul Baker
05-21-2008, 03:18 AM
Kuroneko, I too confirm a 7 cycle delay for an already established wait period + mov instruction, but I don't concur with your conclusion. I have discussed this behavior with another engineer and this is what we think is happening. But first a review of the Propeller pipeline:

IdSDeR          1st instruction
    IdSDeR      2nd instruction
        IdSDeR  3rd instruction

Where
I = instruction fetch
d = decode
S = fetch source
D = fetch destination
e = execute
R = write result


When a waitxxx instruction is executed, the pipeline stalls. Now we know that the output of the mov is done in the result stage. If we count back 7 cycles that places us in the D stage of the waitcnt, but this doesn't make sense because this is not a logical place for the waitcnt to stall (whereas stalling at stage e makes sense).

So a further examination of the waitxxx behavior is needed:

The following is a representation of a pin state that a waitpxx instruction is waiting on. The | represents a clock boundary, a _ represents no change of the pin and a / represents a change of the pin.




|_|_|/|_|_|
1 2 3 4 5 6


As you can see, the event happens between cycle 3 and 4. At clock cycle 4 the waitpxx registers a true condition, but this does not immediately result in restarting the pipeline because the true condition must be given time to propagate throughout the cog. Therefore the cog does not restart the pipeline until cycle 5. When the pipeline restarts it will be at the R stage of the waitxxx and the d stage of the following instruction. Taking the 2 cycles needed to register and propagate the true condition plus the 5 remaining stages of the next instruction, we are left with 7 cycles.

While I am not absolutely certain this is precisely what's going on, it is a good enough explanation of observed behavior that it should suffice to think of waitxxx's behavior in this way.

One further note: the tests that we both did were synchronous in nature. For an asynchronous test it will be found that the response time will be (6,7], i.e. anything between 6 and 7 cycles, but not including 6 itself (any marginal value above 6, such as 6.02, is possible).
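
A small Python sketch of this model may help. It is only one reading of the description above, under these assumptions: a pin change is registered at the first clock edge strictly after it occurs, one further cycle lets the condition propagate, and the following instruction then runs its 5 remaining stages. The function name is hypothetical.

```python
import math

def response_cycles(event_time):
    """Cycles from a pin event to the following instruction's result stage.

    event_time is measured in clock cycles; integer values fall exactly
    on a clock edge (the synchronous case).
    """
    register_edge = math.floor(event_time) + 1   # first edge strictly after the event
    wait_for_edge = register_edge - event_time   # always in (0, 1]
    return wait_for_edge + 1 + 5                 # + propagate + remaining stages

print(response_cycles(3))     # synchronous case, event exactly on an edge
print(response_cycles(3.5))   # event halfway through a cycle
```

An event that lands exactly on a clock edge gives 7 cycles, and one that lands just before the next edge gives a value only marginally above 6, matching the (6,7] interval.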


Post Edited (Paul Baker (Parallax)) : 5/20/2008 7:55:04 PM GMT

Cluso99
05-21-2008, 07:17 AM
I released a DataLogger that can read all pins for 1880 clock cycles (yes, ALL clock cycles @ 12.5nS with 80MHz). It uses 4 cogs to do this and all 8 are used in my demo.

But one cog is doing the presetting of the triggering point and this is easily modified to place whatever instructions you like after the initial set of pin2. If I recall, there is about 15 clock latency before the sampling takes effect but you can see that from the dataset. Place your code once Pin2 starts toggling. Enjoy :)

I am just about to drive from Sydney to Brisbane (actually Gosford to Gold Coast for those who know Australia) so will be out of action for a few days. I can then have a look at the timings.

kuroneko
05-21-2008, 07:23 AM
Paul Baker (Parallax) said...
Kuroneko, I too confirm a 7 cycle delay for an already established wait period + mov instruction, but I don't concur with your conclusion.


Thanks for this useful piece of information. I agree that I could always be wrong as I can only observe so much (not knowing the internals). While investigating a different problem I noticed that the stall for a waitcnt would end up in the D phase which - as you put it - doesn't make much sense (given that there are valid S and D registers to be fetched).

Post Edited (kuroneko) : 5/21/2008 3:08:17 AM GMT

Cliff L. Biffle
05-23-2008, 11:36 AM
Paul, your description of the pipeline implies that the next instruction's source value is latched before the waitpeq instruction waits -- which could obviously be a long time.

This should be easy enough to verify, and could actually be useful (to e.g. detect pin changes).

Stalling between the completion of the S and D stages matches my measurements, I think. Nice to see that your description matches the pipeline model I inferred a couple years back (http://www.cliff.biffle.org/software/propeller/notes.html).

Cluso99
05-23-2008, 07:26 PM
Attached is a timing diagram (sample_43A.spin) of waitpeq/waitpne instructions and mov outa,#x. I have added the IdSDER info as provided by Paul and my observations confirm his statements. The source code of the instructions is also attached.

The reason the waitpeq takes 6 cycles is because the instruction pipeline is flushed. Therefore, the pin sampling takes place on the E cycle for the waitpeq instruction. The mov outa,#x writes to the output pin during the R cycle, as stated by Paul.

The clocking diagram is output by my program and saved by Hyperterminal into a compatible *.spin file.

Phil Pilgrim (PhiPi)
05-24-2008, 12:43 AM
This is all conjecture on my part, but Paul mentioned the pipeline stalling, not flushing. There's really no reason to flush the pipe, since the next instruction never changes. It would make more sense, I think, to assume that the execution phase of the WAIT instructions requires at least three clocks to complete. Moreover, I'm guessing that the execution phase itself is pipelined, overlapping latching of the inputs, ANDing with the source, and comparing with the destination. OTOH, the ANDing and comparison could be done as a Boolean in one step, with an additional step required for deciding the next state. That the execution phase be pipelined (assuming it's multi-step) is necessary; otherwise, the timing granularity would be greater than one clock.

Here's a diagram that illustrates my conjecture:




IdSDeR OR x, y
IdSDlac WAITPEQ one, pin10
lac 'Latch inputs, AND source, compare with destination.
lac
lacR
I????IdSDeR ADD a, b

...or...

IdSDeR OR x, y
IdSDlcd WAITPEQ one, pin0
lcd 'Latch inputs, AND and compare, decide next state.
lcd
lcdR
I????IdSDeR ADD a, b




What's unknowable is where, in the execute pipeline, the next instruction is read: at the beginning, at the end, or multiple times at each stage? This uncertainty is represented by the question marks above. But, in any event, it's read before the result stage of the WAIT, so the actual mechanism is unimportant.

This hypothesis, compared with Cluso's, is only testable if the WAITPEQ/WAITPNE instruction can be forced (via "wr") to write something to the destination different from what was there before. The destination address would have to be the address of the next instruction. Then we could tell whether that instruction was fetched before (my hypothesis) or after (Cluso's hypothesis) the result of the WAIT was written. I haven't tried a WAITPEQ/WAITPNE with a forced "wr", so I don't know whether it can be forced to write something. If it can, what it writes could well be the result of the AND, in which case a WAITPNE would be capable of changing the destination to a different value.

For the WAITCNT instruction, this test is much easier, since there is a known result to write. So I tried the following:




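CON

  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB start

  cognew(@waittest, 0)

DAT

waittest      mov       dira, #3                ' A0 and A1 as outputs
              waitcnt   nexti, #1               ' on completion, writes nexti + 1 back to nexti
nexti         mov       outa, #2                ' becomes mov outa, #3 once the write lands
              jmp       #nexti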




As a result of the WAITCNT, nexti should be incremented by one, which would cause both A0 and A1 to rise simultaneously, assuming the result gets written before the MOV instruction is fetched. In fact, A1 rises 100ns ahead of A0, which indicates that the pipe was never flushed during the WAITCNT, but merely stalled.
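
The two outcomes can be compared with a toy Python model. This is a sketch under stated assumptions, not a cycle-accurate simulation: it assumes adding 1 to the instruction word only changes the immediate from 2 to 3, that each instruction takes 4 clocks, and that after the first pass the loop always fetches the updated word. The function and parameter names are invented for illustration.

```python
INSTR_CYCLES = 4      # clocks per ordinary instruction
NS_PER_CYCLE = 12.5   # one clock at 80 MHz

def first_rise(fetched_after_writeback):
    """Cycle at which A0 and A1 first go high, relative to the first MOV.

    fetched_after_writeback=True models a flushed pipe (the MOV is read
    after WAITCNT wrote nexti + 1); False models a stalled pipe that
    still holds the stale copy of the MOV.
    """
    imm_stale = 2                  # immediate in `mov outa, #2`
    imm_updated = imm_stale + 1    # WAITCNT adds #1 to nexti
    imm = imm_updated if fetched_after_writeback else imm_stale
    rise = {}
    t = 0
    for _ in range(2):             # two passes through the mov/jmp loop
        for pin in (0, 1):
            if (imm >> pin) & 1 and pin not in rise:
                rise[pin] = t
        t += 2 * INSTR_CYCLES      # mov plus jmp
        imm = imm_updated          # later passes fetch the updated word
    return rise

stalled = first_rise(False)
gap_ns = (stalled[0] - stalled[1]) * NS_PER_CYCLE
print(stalled, gap_ns)
print(first_rise(True))
```

In the stalled case A1 leads A0 by one mov plus one jmp, i.e. 8 cycles or 100ns, which is the gap observed; in the flushed case both pins would rise together.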

-Phil

Post Edited (Phil Pilgrim (PhiPi)) : 5/23/2008 4:50:29 PM GMT

kuroneko
06-19-2009, 08:52 AM
Some after breakfast findings:
- waitpeq with wr set performs an add dst, src
- the next instruction is not affected
- mask value (not location) is fetched during Execution stage (same as mov dst, src)
- pins are first sampled in the stage after that (L)
- result is written - if applicable - in the last cycle

Which boils down to something like this (for running straight through):



sdEL?R


Not sure about cycle 5, maybe it is used for decision making as Phil suggests. Seems odd though because sample resolution is 1 cycle.

Phil Pilgrim (PhiPi)
06-19-2009, 10:02 AM
kuroneko said...
waitpeq with wr set performs an add dst, src

Gad! They're not even related! I wonder what's up with that? Does it still wait for a match?

-Phil

Addendum: Are you sure it's not a sub dst,src? That would make a little more sense, due to the compare.

kuroneko
06-19-2009, 10:10 AM
Phil Pilgrim (PhiPi) said...
Gad! They're not even related! I wonder what's up with that? Does it still wait for a match?

Addendum: Are you sure it's not a sub dst,src? That would make a little more sense, due to the compare.

Yes, the wait behaviour remains the same. And yes, it is an add, waitpeq :007C0002, :00FF00FF wr results in $017B0101 being stored in dst.
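
The arithmetic is easy to check in Python. The sketch assumes the two operands quoted above are the 32-bit hex values 007C0002 and 00FF00FF, and compares the stored result against a few candidate operations (add, sub, and).

```python
MASK32 = 0xFFFFFFFF

dst, src = 0x007C0002, 0x00FF00FF
observed = 0x017B0101   # value reported as stored in dst

candidates = {
    "add": (dst + src) & MASK32,
    "sub": (dst - src) & MASK32,
    "and": dst & src,
}

matches = [name for name, value in candidates.items() if value == observed]
print(matches)
```

Of the three candidates only the 32-bit add reproduces the stored value.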

Phil Pilgrim (PhiPi)
06-19-2009, 10:39 AM
Oh, I think I know where that comes from now. Chip must've used some of the logic from waitcnt, which does do an add with the default wr.

-Phil

Cluso99
06-19-2009, 12:20 PM
With partial decodes and all that (to save silicon), it may not even be a waitxxx instruction.

BTW, somewhere Chip has said the waitxxx instructions are in fact 6 cycles minimum. This may mean that the waitxxx instructions go to different states while waiting. We just do not know. The only thing that we could determine is where the input pin is sampled from using a scope and varying where the pin changed state. This is just too hard to bother.


Phil Pilgrim (PhiPi)
06-19-2009, 12:36 PM
Yeah, any code that relies on such a fine level of processor state analysis probably needs to be "rethunk" anyway. :)

-Phil

stevenmess2004
06-19-2009, 08:42 PM
Something else to think about. What happens if the instruction is set to not execute due to the conditions/flags? Do the instructions take 4 cycles or 6 cycles?

kuroneko
06-19-2009, 09:28 PM
stevenmess2004 said...
What happens if the instruction is set to not execute due to the conditions/flags? Do the instructions take 4 cycles or 6 cycles?

If it doesn't execute due to conditions it's a nop, therefore it takes 4 cycles.