INDA/INDB alternative?
Seairth
Posts: 2,474
What if INDA/INDB were changed as follows:
- Make them 11-bit registers, where the two MSBs have the same encodings as found in the COND bits, and the nine LSBs are the same as before.
- Add an instruction (or two) to set the two MSBs.
The thought here is that this would allow COND bits to be used with INDA/INDB instructions. This is all predicated on the idea that code using INDA/INDB will typically be mutating those registers in a consistent manner (e.g. iterating over an array), so mode changes (e.g. switching from pre-increment to post-increment) would be relatively infrequent. This would also simplify the pipeline processing a bit, which can only be a good thing. Note that this would also require the SETINDx opcode to be moved to the extended instruction set and broken into two separate instructions. It might be possible to use the Z/C flags to also set the two-bit mode within the same instruction, since that's when you're likely to do a mode change anyhow.
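To make the proposed register layout concrete, here is a minimal Python sketch of an 11-bit value with a two-bit mode in the MSBs and the nine-bit address in the LSBs. The function names are hypothetical, and the mode is treated as an opaque two-bit number because the thread does not spell out the individual encodings.

```python
# Sketch only: models the proposed 11-bit INDA/INDB layout.
# Names below are illustrative, not part of any Prop2 specification.
MODE_BITS = 2
ADDR_BITS = 9
ADDR_MASK = (1 << ADDR_BITS) - 1  # 0x1FF, the nine address LSBs


def pack_ind(mode, addr):
    """Combine a 2-bit mode and a 9-bit address into one 11-bit value."""
    if not 0 <= mode < (1 << MODE_BITS):
        raise ValueError("mode must fit in two bits")
    return (mode << ADDR_BITS) | (addr & ADDR_MASK)


def unpack_ind(value):
    """Split an 11-bit value back into (mode, addr)."""
    return (value >> ADDR_BITS) & 0b11, value & ADDR_MASK


def set_mode(value, mode):
    """Model of the proposed 'set the two MSBs' instruction:
    the address bits are left untouched."""
    return pack_ind(mode, value & ADDR_MASK)
```

With this layout a mode change rewrites only the top two bits, so code that iterates over an array would set the mode once and then leave it alone.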
Comments
The big problem is that all these INDA/INDB pointer usages and modifications occur very early in stage 2 of the pipeline, whereas conditional execution takes place in stage 4. You couldn't know, two instructions in advance, which INDA/INDB modifications shouldn't occur in stage 2, in order to maintain the correct INDA/INDB values. Also, register addresses must be resolved in stage 2.
Having said that, I initially got excited about your proposal. I had to try it out in my head before I recognized the old, familiar problem that I've already faced a dozen times. Working on Prop2 is becoming, in some ways, like being stuck in a "fun house" with warped mirrors covering every wall. I can say I'm almost out, though.
NOTE: your comments made me realize that, with the current design, one must be very careful to avoid using INDA/INDB in the next two (or maybe just one?) instructions that immediately follow a non-delayed branch instruction. Otherwise, even though the instruction itself is *never* meant to be executed prior to the branch (like would happen with a delayed branch instruction), the INDA/INDB registers would still be updated due to the pipeline behavior.
The operative word is "fun." The entire Prop2 experience is fun and is about to become "funnest." I only wish you had been born 20 years earlier… or that I had been born twenty years later.
Rich
You're right, that WAS a problem, but I fixed it early on. I added an extra bit to each pipeline stage that signals whether or not that stage is still valid, regardless of the 4-bit condition field. So, those instructions trailing branches that get cancelled don't wreak the havoc that you might suppose.
I hope a lot of people wind up feeling that way. In this day, there seems to be an attitude that 'implementation' is a bad word. Everything should be written in a high-level language and nobody should ever have to get to know anything about any hardware, unless maybe VHDL or Verilog defines it.
There will be specifics to learn and exploit to use the Prop2 well, and I get a big charge out of that kind of thing. You and I think it's fun. I hope the world isn't too spoiled, already, to enjoy it, also. The implication of the other mode of thought is that a processor might as well be designed only to run compiled code efficiently, and an assembly-language programmer's perspective is utterly irrelevant.
This chip is made to be enjoyed in assembly language, first.
I think your quote, Chip, will become a HISTORIC one!
It's the PRIMARY reason I use propeller chips!
Fun mixed with power, a rare combination.
Can't wait. I've been holding off until this update is done. Seems like it's a final kind of thing, so it will be time to dig in, unlearn some stuff, then go forward on what is now to be the P2!
I love the regular P1 instruction set. P2 is of course more complex, but with the new mods, the P2 set is still quite regular. And when I don't need the speed - there is almost always a section of main code that doesn't require speed - I use spin.
Just today, a project that is almost shipping required an extra mod. I have just added another cog (I have a few spare in my 3-prop solution) to do that part in assembler with almost no changes to the main code. If it had been on a single micro, it would have been a big mess to do!
Okay, so if the indirect registers are updated only when an instruction is shifting from stage 2 to 3 (and it's still valid), then you would have to make sure that there was at least one instruction between a non-delayed branch and the indirect-addressing instruction. Is that correct?
No. For stage 1, there is no flop yet. It's a logic signal. Thereafter, for stages 2 and 3, there are flops combined with logic after the Q output. So, the 'valid' flops propagate forward, but can be cancelled at any stage. Once an instruction is cancelled as it travels through the pipeline, it stays cancelled. Cancellation can happen in stages 1 through 3. If an instruction is still valid at stage 4, it was never cancelled, or never a trailing instruction behind a cancelling branch.
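A rough way to picture the 'valid' bit described above is the Python sketch below. It is only a behavioural model under simplifying assumptions (one task, a four-entry list with index 0 as stage 1), and the names `Slot`, `step` and `executes` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    valid: bool = True  # the extra per-stage 'valid' bit


def step(pipeline, fetched, branch_taken):
    """Advance one clock. pipeline[0] is stage 1, pipeline[3] is stage 4.

    If the stage-4 instruction is a taken branch, every instruction in
    stages 1..3 is marked invalid before the shift. A cleared flag is
    never set again, so a cancelled instruction stays cancelled.
    """
    if branch_taken:
        for slot in pipeline[:3]:
            if slot is not None:
                slot.valid = False
    return [fetched] + pipeline[:3]


def executes(slot, condition_ok):
    """At stage 4 an instruction takes effect only if it is still valid
    AND its 4-bit condition field passes."""
    return slot.valid and condition_ok
```

Usage: load the list with a branch in stage 4 and three trailing instructions, call `step(..., branch_taken=True)`, and the three trailing slots arrive at stage 4 with `valid` already cleared.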
I think I am asking the question wrong. Suppose the following code:
In the pipeline, you end up with the MOV in stage 3 when the DJNZ is executed in stage 4. Supposing that the branch occurs, the MOV would be cancelled, except the update of INDA itself had already occurred one clock cycle earlier. I don't see how the "valid" flops would help here. What am I missing?
The MOV gets cancelled at stage 3 and INDA gets put back to the appropriate prior value. There is circuitry which handles late cancellations for INDA/INDB, and it took me a couple of days to figure out the rules for it. The insurmountable problem is that a read address has to be produced at stage 2, and if another instruction higher in the pipeline later gets cancelled, that stage 2 address that was already issued is now known to be bad. There's no way to back up.
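The "put back to the appropriate prior value" behaviour can be sketched in Python as an undo list. This is an assumption about the effect, not a description of the actual circuit: the class name, the undo list and the plain 9-bit wrap are all hypothetical.

```python
class IndaModel:
    """Behavioural sketch: INDA is updated early (stage 2) and rolled
    back if the instruction that updated it is cancelled later."""

    def __init__(self, start):
        self.inda = start & 0x1FF
        self.undo = []  # pre-update values of in-flight instructions, oldest first

    def stage2_use(self, delta):
        """Issue the register address at stage 2, then apply a
        post-increment/decrement. Returns the address that was issued."""
        addr = self.inda
        self.undo.append(addr)
        self.inda = (self.inda + delta) & 0x1FF  # assumed simple 9-bit wrap
        return addr

    def retire_oldest(self):
        """The oldest in-flight instruction reached stage 4 still valid,
        so its update is final and its undo record is dropped."""
        self.undo.pop(0)

    def cancel_in_flight(self):
        """A taken branch cancels the trailing instructions: restore the
        value INDA had before the oldest cancelled update."""
        if self.undo:
            self.inda = self.undo[0]
            self.undo.clear()
```

Note what the sketch cannot do: the address returned by `stage2_use` has already gone out, so rolling back INDA does not recall a read address that was issued by an instruction that later turns out to be bad.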
Let's start with the current design (indirect-addressing instructions are encoded in the COND bits and execute unconditionally). Now, let's suppose that a local copy of INDA is propagated through Stages 2, 3, and 4. Now, suppose the following pseudo-code for Stage 2 of the pipeline:
Note that this code is ALWAYS reading a value into stage2.INDA (even if Stage 2 doesn't contain an indirect-addressing instruction). The pseudo-code that I left out was the fetching of the value for the d-field/s-field and the increment/decrement update of stage2.INDA. I am assuming that this will work roughly the same as it does now, except for which set of INDA registers is being read/written.
Further, suppose that:
If there are no cancelled instructions in the pipeline, then the above code should result in Stage 2 getting the same INDA value as in the current approach (where the value is stored in Stage 2 of the prior cycle). If the instruction in Stage 4 (e.g. DJNZ) causes the instructions in Stage 3 and Stage 2 to be cancelled, then the next instruction would read "registers.INDA", which was updated in Stage 4 of the prior clock cycle. The modified value in this clock cycle's Stage 3 and Stage 4 would be ignored because those instructions were cancelled (in the prior clock cycle). Maybe a better example of this would be "JMPRET INDA, ++INDA". If that instruction reaches Stage 4 without being cancelled, it will ensure that any values in stage2.INDA and stage3.INDA will be ignored, while the value in stage4.INDA will be written to registers.INDA.
Unfortunately, multitasking would have to change slightly. When an instruction cancels the other instructions in the pipeline, it must cancel ALL instructions, even those in other tasks. This is due to the fact that the other tasks would be propagating potentially invalid INDA values, even if they weren't indirect-addressing instructions. This would also mean that multiple PCs would potentially have to be updated. Yes, I know that this would mean that one task would cause another task to stall, but it's not like we're writing deterministic code in that mode anyhow. Note, by the way, that a self-jumping instruction (e.g. WAITxxx), assuming it cancels the other instructions just like a branching instruction, should not encounter the issue discussed in the thread "Using WAITVID with INDA++ and multi-tasking + other observations" (thread 151114).
I hope that was understandable. But I'm not done yet. Supposing that I'm not completely off base with the above approach, let's consider my original proposal: allow indirect-addressing instructions to be conditionally executed (see earlier posts for details). With the above approach, INDA would not be useful in conditional instructions, because the conditionally-skipped instruction would still cause registers.INDA to be updated (after all, the instruction was skipped, not cancelled). To get around this, we could make use of another technique that's already in the processor: self-jumping. In this case, however, the instruction would jump to PC+1. This would cause the remaining instructions in the pipeline to be cancelled, then reloaded. Of course, you would get a three-cycle penalty for this, but INDA would stay consistent. Not a bad trade-off, I think, for getting back full use of the COND bits.
Okay, that's it for now. I realize it's a lot to chew over. If something doesn't make sense, make sure to ask for clarification. I'm fully aware that I may not have explained some of this very well. And I'm also fully aware that some of this just might not be possible (no matter how much I clarify my thoughts). Like I said, in the end, it might do nothing more than spur a better idea in the mind of someone who understands this stuff better than I do. And if that's all that I manage to do with this, that would be fantastic!
I know some people knock SPIN also, but that too is such a great language, and it really complements PASM well. I think Chip did a top-notch job having the complete package of both languages being able to run on the Prop, giving you the awesomely quick prototyping speed of SPIN with the punch of PASM. Writing any Prop app was a pure pleasure! I'm really looking forward to when P2 comes and all the fun to be had!
Anyway, sorry I was off the main topic a little, but I thought it just had to be said! Hats off to you, Chip! You, sir, are a genius and a gent. Let the good times roll!
I am coming in late to this discussion... but I woke up early today :-)
If I correctly understand, you were proposing to make INDA/B instructions be conditionally executable. Thanks for catching that, I did not realize they were not already conditionally executable!
Again, if I get it, your latest (above) way of allowing that would
- not make them execute conditionally
- make multi-tasking non-deterministic if INDA/INDB are used
I agree that it would be useful to have INDA/B using instructions be able to be conditional, however only if it does not add delays to P2 production, and only if it does not cause the pipeline issues you outlined above.
I also have strong reservations about robbing two bits in the INDA/B registers, as that would introduce compatibility headaches in future Props (P3? P4?) that will likely need more than 9 bits of indirect addressing.
If only 1 operand has INDA/INDB specified...
2 bits for the PRE/POST & INC/DEC (as currently)
2 bits for Z & C conditional subset only
- 00 = always
- 01 = if_C
- 10 = if_Z
- 11 = if_Z_and_C
If 2 operands have INDA/INDB specified...
2 bits for the PRE/POST & INC/DEC for Dest (as currently)
2 bits for the PRE/POST & INC/DEC for Srce (as currently)
Just a thought. I know it does not allow for NC and NZ testing, but a subset is better than nothing and hopefully simple to implement.
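For the single-operand case, the two-bit condition subset listed above could be modelled in Python like this. The split of the 4-bit field into an upper mode pair and a lower condition pair is an assumption made for the sketch (the post does not say which pair goes where), and both function names are hypothetical.

```python
def decode_single_ind(cccc):
    """Split the 4-bit field into (mode, cond).

    ASSUMPTION: the upper two bits hold PRE/POST & INC/DEC and the
    lower two bits hold the Z/C condition subset.
    """
    mode = (cccc >> 2) & 0b11
    cond = cccc & 0b11
    return mode, cond


def subset_condition_met(cond, c_flag, z_flag):
    """Evaluate the proposed 2-bit subset:
    00 = always, 01 = if_C, 10 = if_Z, 11 = if_Z_and_C."""
    if cond == 0b00:
        return True
    if cond == 0b01:
        return c_flag
    if cond == 0b10:
        return z_flag
    if cond == 0b11:
        return c_flag and z_flag
    raise ValueError("cond must fit in two bits")
```

As noted in the post, this subset has no way to express NC or NZ tests; every non-zero code requires at least one flag to be set.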
BTW Don't we have the same issue with PTRA/PTRB ?
As for PTRA/PTRB, I believe those are only used in dedicated instructions and are therefore encoded differently. I think the same goes for SPA/SPB.
I just re-read the older specs and found this (could be a killer to the idea of using conditionals anyway)...
That's a good thing
I'm sure that everyone, everyday, is learning a little bit more, about the wonderful job done by Chip on the Propeller 2.
Cases as such described in these posts drove my mind back to early studies about speculative instruction execution, its caveats, consequences and advantages.
To get around the problems of INDA++ and similar ones, one possibility is to have two sets of registers for each pointer involved, with associated enable bits, so that only one of them is really called INDA, and the other INDAshdw.
Every time an INDA++ reference enters the first stage of the pipeline, INDA gets copied to its shadow register, i.e. the one that has its enable bit negated.
When INDA++ is executed in the second stage of the pipeline, the shadow retains the earlier value.
When any instruction at the fourth stage forces the other ones still progressing through the pipeline to be abandoned, the enable bits should be reversed, so the effects of pre-incrementing at the second stage get logically cancelled.
No one will know for sure whether INDA or its shadow will emerge from the pipeline's operation as the final INDA, but its name doesn't matter at all, only its contents.
Something like the OGT Z80 EXX or EX AF, AF', but exclusively under pipeline's behavior control.
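The shadow-register idea can be sketched in Python as two copies plus an index that plays the role of the enable bits. This is a toy model of the description above, with hypothetical names, a fixed +1 increment and an assumed 9-bit wrap.

```python
class ShadowedPointer:
    """Two copies of a pointer; 'active' selects which one is INDA."""

    def __init__(self, value):
        self.regs = [value & 0x1FF, value & 0x1FF]
        self.active = 0  # index of the copy currently acting as INDA

    @property
    def value(self):
        return self.regs[self.active]

    def enter_pipeline(self):
        """An INDA++ reference enters stage 1: copy the live value
        into the disabled (shadow) copy."""
        self.regs[1 - self.active] = self.regs[self.active]

    def stage2_increment(self):
        """Stage 2 updates only the live copy; the shadow keeps the
        earlier value."""
        self.regs[self.active] = (self.regs[self.active] + 1) & 0x1FF

    def cancel(self):
        """A stage-4 instruction abandons the trailing ones: swap the
        enable bits so the shadow becomes INDA."""
        self.active = 1 - self.active
```

One limitation is visible right away: a single shadow copy can undo only one in-flight update, so two back-to-back INDA++ instructions in the pipeline would need more state than this model holds.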
Perhaps it's all a nonsense of mine, perhaps not.
Yanomani
Well, this got me really excited, but then I remembered the problem of having to back up the INDA/INDB states. Adding another layer of backup (stage 4) would certainly blow the timing. That circuit is quite complicated already, and I flattened it as much as I could (it's rather big in area for what it does). I don't want to say never, but it's hard to think of overcoming the INDA/INDB limitations. That thing is already doing way more than I initially figured was reasonable. I love the idea of sneaking some use out of an unused bit pair in CCCC, but having to back up states across 3 pipeline stages would be murder.
By the way, there are no such problems with PTRA/PTRB, as they are confined to stages 3 and 4, and 4 is where the conditionality happens. Same story with SPA and SPB.
That's pretty much how it works, already!
Would the approach I outlined in post #16 not avoid this?
If the INDa/b is non-incrementing (ie respective CC bits =00), then the other pair of CC bits represent IF_Z, IF_C, IF_Z_AND_C and 00=ALWAYS.
This should solve the backing up problem and we could do this conditional sequence...
IF_Z MOV/etc INDn,someval
IF_Z INCINDn 'new instructions to ADD/SUB INDn,#1 using the wrap FIXINDx settings.
If this is doable, what would be the best 3 Z & C options?
00 = always
01 = if_C
10 = if_Z
11 = if_Z_OR_C / if_Z_AND_C or some other condition such as IF_NC ???
or maybe this is more in keeping
00 = IF_NC_AND_NZ
01 = IF_C
10 = IF_Z
11 = ALWAYS
I like the idea of this instruction.
This instruction would also help in multi-tasking operations that use INDx in self-jumping instructions.
Nice.
Then, when an instruction in stage 4 cancels instructions, it does so only for the instructions of the current TASK (which I believe is already the behavior). This would mean that every task could safely use INDA/INDB at the same time (okay, technically, they are using their own copy). And the hope is that this approach would simplify INDA/INDB handling enough overall to make up for the addition of the extra sets of registers.
At the moment in multi-tasking you have to decide which task gains the most benefit from the INDx registers and use MOVS/MOVD (soon to be SETS/SETD) in the other tasks to achieve the same indexing.
Sadly I think I hear the sound of bullets bouncing off the walls again!