Bill,
I'm confused. You seem to be asking for it to work exactly the way it already works (except that the range is -16 to +15, not -32 to +31).
RDLONG D, PTRA[+4] <-- does not change PTRA, just uses PTRA + 4 (longs) as the address
RDLONG D, PTRA[++4] <-- does modify PTRA, pre-incrementing it by 4 longs, then reading from that address
Is it that you want it to not do the scaling when it's not modifying the pointer? Or what?
Hi Roy,
Is there any chance of uploading a PDF file with all of the instruction descriptions in the same place? It's great to read about them a few at a time but it would be nice to have a consolidated reference.
For these instructions the top bit of the source field is always 1, so you only have 8 bits to work with, and that's why the range is one bit less than you expected. E is for effects (Z, C, R), as in the WZ, WC, WR, etc. stuff. I am not certain what N is; perhaps Chip will chime in.
David,
I don't have a complete file of all this yet. I am composing the information from notes I took while talking with Chip about it. Once I have all the stuff composed, I'll make a single file with all of it.
Also, I will be starting a different thread for these instruction descriptions.
Did the Prop II gain zero-overhead loops (as seen in DSPs)?
The seriously limited primary code memory makes the usual loop unrolling less of an option than on most other controllers.
What happens if we want to read or write all pins at once?
I suppose there is no such MOV or other in-cog instruction.
Can something like "rdquad pins, hubptr" read all pins at once?
That's not really an opcode-level problem?
If it were important enough to include, it could be done the same way 8-bit uCs manage 16-bit writes, with a simple flag queue: writes go into buffers until the last one, which transfers everything.
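Here is a minimal Python model of the flag-queue idea. It assumes four 32-bit ports named A through D, after the PINA to PIND naming used in this thread. The class and its `last=True` flag are hypothetical; they are not an existing Propeller feature.

```python
class QueuedPortWriter:
    """Hypothetical flag queue: stage port writes, then commit them all at once."""

    PORTS = ("A", "B", "C", "D")

    def __init__(self):
        self.outputs = {p: 0 for p in self.PORTS}  # levels the pins actually drive
        self._staged = {}                           # port -> pending 32-bit value

    def write(self, port, value, last=False):
        """Stage a write; the write flagged last=True commits every staged port."""
        if port not in self.outputs:
            raise ValueError(f"unknown port {port!r}")
        self._staged[port] = value & 0xFFFF_FFFF
        if last:
            self.outputs.update(self._staged)  # all staged ports change in one step
            self._staged.clear()


w = QueuedPortWriter()
w.write("A", 0xFF)
w.write("B", 0x0F)            # nothing visible on the pins yet
w.write("C", 0x1, last=True)  # A, B and C update together
```

This is also where the trade-off raised below shows up. Only writes that go through the queue are simultaneous; ordinary port writes would still take effect one at a time.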
Where would someone need to update all pins at once? { buying a worst-case ground bounce }
I can see that pin swap freedom on ports would allow tighter PCB designs, but users might not want ALL port writes queued, and now that gets more complex...
I still question the use of pinx as a reserved word, because of the confusion it can cause. It would be better to force programmers to write inx and outx and to flag errors when they are used inappropriately. Programs would not only be more reliable, but also more readable, since the syntax would correlate with the intent. If this were done, I think something like test ina,#1 would be a little less objectionable, since the intent is much clearer from the syntax than test pina,#1. In any event, I see no reason to go back to having inx and outx occupy separate addresses.
-Phil
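Below is a sketch of the kind of assembler check being proposed. It assumes INA..IND and OUTA..OUTD are aliases that both resolve to the shared PINx address. The alias table, the `WRITES_D` subset and the two misuse rules are illustrative assumptions, not an existing assembler's behavior.

```python
PORTS = "abcd"
# INx and OUTx both resolve to the shared PINx address (hypothetical table).
ALIASES = {f"{kind}{p}": f"pin{p}" for kind in ("in", "out") for p in PORTS}

# Illustrative subset of mnemonics that write their result back to D.
WRITES_D = {"mov", "add", "sub", "and", "or", "xor", "neg"}


def check(mnemonic, dest, src):
    """Return (errors, resolved_dest, resolved_src) for one two-operand instruction."""
    mnemonic, dest, src = mnemonic.lower(), dest.lower(), src.lower()
    errors = []
    # Rule 1: an input alias must not be the target of a write.
    if dest in ALIASES and dest.startswith("in") and mnemonic in WRITES_D:
        errors.append(f"{mnemonic} writes to input alias {dest}")
    # Rule 2 (assumed semantics): reading the shared address returns pin states,
    # so an output alias used as a source is flagged.
    if src in ALIASES and src.startswith("out"):
        errors.append(f"{src} used as a source reads the pins, not the output latch")
    # Both aliases assemble to the same PINx address.
    return errors, ALIASES.get(dest, dest), ALIASES.get(src, src)
```

With this check, `test ina,#1` assembles cleanly to the PINA address. `mov ina,#5` and `mov x,outb` are reported as misuse.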
I am in total agreement with Phil's comments here. No need to have separate INx and OUTx locations, just make the compiler do it properly to avoid coding errors.
Now, what makes more sense is to use the compiler to reposition the source and destination registers according to what is really being done, not how the Prop II actually formats the instructions.
WRxxxx hubptr, [#]data   <-- reversed from the current compiler, but makes more sense to a programmer
RDxxxx cogdata, [#]hubptr
mov data, INx
mov OUTx,[#]data
We could also have the following (if the instruction set permits, which seems to be the case); the compiler just handles it.
test INx,[#]data
test OUTx,[#]data
The only issue I see with having the compiler place the source/destination in the appropriate bits is that the MOVS and MOVD instructions become a little more obscure, but I think this is outweighed by the normal programming structure.
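The following is a hypothetical sketch of the operand reordering such a front end could do for hub writes. The programmer writes `WRxxxx hubptr, data` and the tool emits the hardware order `WRxxxx data, hubptr`. It only swaps operand text. It does not check whether the result is encodable, for example an immediate landing in the D position.

```python
# Hub writes whose source-level operand order is reversed relative to hardware.
REVERSED = {"wrbyte", "wrword", "wrlong", "wrquad"}


def to_hardware_order(line):
    """Rewrite 'MNEMONIC a, b' into the operand order the hardware encodes."""
    mnemonic, operands = line.split(None, 1)
    first, second = (op.strip() for op in operands.split(",", 1))
    if mnemonic.lower() in REVERSED:
        first, second = second, first   # WRxxxx hubptr, data -> WRxxxx data, hubptr
    return f"{mnemonic} {first}, {second}"
```

Reads pass through unchanged, so only the write family is affected.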
I definitely do not want our nice, powerful, and fancy new instructions to make the Prop II instruction set so complex that it looks like a dog's breakfast (as is the case in some other micros). In fact, I think it would be nice for the Prop II to have two lists of instructions: one being the "regular," normally used instructions and the other being the "super" instructions for complex operations. In other words, hide the complexities from those who don't care.
On all hub memory read and write instructions, the S (hub address) operand can be replaced with PTRA or PTRB. There are several ways to use the PTRx registers with hub access instructions.
Note: The range for the constants inside the []'s below is -16 to +15 (5-bit signed). However, this value is scaled by the size of the RD or WR you are doing: 1 for RD/WRBYTE, 2 for RD/WRWORD, 4 for RD/WRLONG, and 16 for RD/WRQUAD.
RDLONG D, PTRA - reads the hub address that PTRA points to
RDBYTE D, PTRB[+3] - reads the hub address that PTRB+3 points to
RDLONG D, PTRA[++2] - pre-increment PTRA by 2 longs (so actually adding 8 to PTRA), then read the hub address that PTRA now points to
RDLONG D, PTRB[6--] - reads the hub address that PTRB points to, then post-decrements PTRB by 6 longs (so actually subtracting 24 from PTRB)
WRLONG D, PTRB - writes to the hub address that PTRB points to
WRWORD D, PTRA[++4] - pre-increments PTRA by 4 words (advancing PTRA by 8), then writes to the hub address that PTRA now points to
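The examples above can be checked with a small Python model of the address arithmetic. It is a sketch under stated assumptions. The function name and the "none"/"pre"/"post" mode labels are made up for illustration, and hub address wrap-around is ignored.

```python
# Bytes per access for each RD/WR size, per the scaling note.
SCALE = {"byte": 1, "word": 2, "long": 4, "quad": 16}


def ptr_access(ptr, size, n=0, mode="none"):
    """Model one PTRx-addressed hub access; return (address_used, new_ptr).

    n is the signed constant from the brackets (-16..+15); pass a negative n
    for the decrementing forms, e.g. PTRB[6--] is n=-6, mode="post".
      mode="none": PTRx[+n] / PTRx[-n], pointer left unchanged
      mode="pre" : PTRx[++n] / PTRx[--n], update pointer, then access
      mode="post": PTRx[n++] / PTRx[n--], access, then update pointer
    """
    if not -16 <= n <= 15:
        raise ValueError(f"offset {n} outside -16..+15")
    delta = n * SCALE[size]          # the constant is scaled by the access size
    if mode == "none":
        return ptr + delta, ptr
    if mode == "pre":
        return ptr + delta, ptr + delta
    if mode == "post":
        return ptr, ptr + delta
    raise ValueError(f"unknown mode {mode!r}")
```

For example, with PTRB = 100, `ptr_access(100, "long", -6, "post")` reads address 100 and leaves the pointer at 76. That is the "subtract 24" case from the RDLONG D, PTRB[6--] line.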
These work with all forms of RDxxxx and WRxxxx, including the new RDQUAD/WRQUAD and the cached-read versions, RDxxxxC. Note also that when doing RD/WR instructions with a constant, like RDLONG D, #xxx, the range of xxx is now limited to 0 to 255 instead of 0 to 511. That upper bit is used to indicate PTRx addressing.
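This sketch shows how the 9-bit S field splits when the constant (#) form is used. It relies only on the description here and on the note that the top bit of the source field is always 1 for PTRx forms. The function name and return labels are hypothetical. The layout of the low 8 bits in the PTRx case is not modeled.

```python
def decode_constant_source(s):
    """Split the 9-bit S field of RDxxxx/WRxxxx when the constant (#) form is used.

    Bit 8 set:   the low 8 bits select a PTRx mode (layout not modeled here).
    Bit 8 clear: the low 8 bits are a constant hub address, 0..255.
    """
    if not 0 <= s <= 0x1FF:
        raise ValueError("S is a 9-bit field")
    if s & 0x100:
        return "ptrx", s & 0xFF
    return "address", s
```

Under this model, a constant like #300 no longer fits. It has bit 8 set, so it would be read as a PTRx selector rather than an address.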
Also, the compiler/assembler will accept ++PTRx as shorthand for PTRx[++1], along with all the other variants of pre/post inc/dec by 1.
I will post more later about RDQUAD/WRQUAD, and the cached RDxxxxC instructions.
I (too) overlooked the fact that you can access an offset to the pointer.
RDBYTE D, PTRB[+3] - reads the hub address that PTRB+3 points to
So you can access
* an offset to the pointer
* pre-increment the pointer
* post-decrement the pointer
So the instruction set is aiding stack operations and high-level-language instructions. A fast block load can still be achieved, but remember that either the first access should not pre-increment, or the pointer must first be decremented.
I would have preferred the options of none/pre/post increment/decrement but realise there is a lack of bits. However, I am quite happy with what has been discussed.
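Here is a toy Python model of the block-load caveat. Pre-incrementing from the base address skips the first long. Backing the pointer up by one long first reads from the base, and so does a post-increment form such as PTRx[1++]. The function and mode names are hypothetical, and the dict is only a stand-in for hub RAM.

```python
def block_read(hub, start, count, mode):
    """Read `count` consecutive longs beginning at byte address `start`.

    hub maps byte address -> long value (a toy stand-in for hub RAM).
      "pre"     : every read is PTRx[++1], starting with PTRx = start
      "pre_adj" : same, but PTRx is first backed up by one long
      "post"    : every read is PTRx[1++], starting with PTRx = start
    """
    if mode not in ("pre", "pre_adj", "post"):
        raise ValueError(f"unknown mode {mode!r}")
    ptr = start - 4 if mode == "pre_adj" else start
    values = []
    for _ in range(count):
        if mode == "post":
            values.append(hub[ptr])   # access, then update
            ptr += 4
        else:
            ptr += 4                  # update, then access
            values.append(hub[ptr])
    return values


hub = {addr: addr // 4 for addr in range(0, 64, 4)}   # long n holds the value n
```

Here `block_read(hub, 0, 3, "pre")` returns the longs at 4, 8 and 12. The other two modes return the longs at 0, 4 and 8.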
I don't like the idea of hiding what is actually happening from the programmer. Either stick with the PINx notation, or split the registers like in Prop 1. To me it is most important that things be clear.
Also, using the mapped registers as storage when not doing I/O is not really something that needs to be retained, especially given the tons of extra storage available now.
I'm thinking of a case like a 96-channel logic analyzer ... PPLA2, for example, or a stand-alone LA product. A logic analyzer needs simultaneous sampling. With that, the Propeller 2 would make a very good multi-channel logic analyzer processor.
A logic tester might want many pins asserted at once (not a popular thing these days).
If ground bounce is really a problem, then that's fine.
Cluso99,
The PTRx stuff already supports all the combinations: Pre/Post/None and Inc/Dec. There is a bit to select the pointer (A or B), a bit to indicate whether it should modify the pointer, and a bit to select between pre and post when modifying. The range is signed -16 to +15.
You can do PTRx[++n], PTRx[n++], PTRx[--n], or PTRx[n--] and it will modify the pointer. You can also do PTRx[+n] or PTRx[-n] and it will not modify the pointer.
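The next block is a hypothetical Python parser for these operand spellings. It turns each one into the choices described above: which pointer, whether it is modified, pre or post, plus the signed offset. The three 1-bit choices plus a 5-bit offset add up to 8 bits, which matches the 8 bits left when the top bit of S is fixed at 1. The tuple layout is only for illustration; the actual bit positions are not given in the thread. The ++PTRx shorthand is not handled.

```python
import re

# Forms from the thread: PTRA, PTRB[+3], PTRA[++2], PTRB[6--], PTRx[--n], PTRx[-n]
_FORM = re.compile(
    r"^PTR(?P<ptr>[AB])"
    r"(?:\[(?:(?P<pre>\+\+|--)(?P<n1>\d+)"   # [++n] / [--n]
    r"|(?P<n2>\d+)(?P<post>\+\+|--)"         # [n++] / [n--]
    r"|(?P<sign>[+-])(?P<n3>\d+))\])?$"      # [+n]  / [-n]
)


def parse_ptr(text):
    """Return (pointer, modifies_pointer, timing, signed_offset) for a PTRx operand."""
    m = _FORM.match(text.strip().upper())
    if not m:
        raise ValueError(f"not a PTRx operand: {text!r}")
    if m["pre"]:
        result = (m["ptr"], True, "pre", int(m["n1"]) * (1 if m["pre"] == "++" else -1))
    elif m["post"]:
        result = (m["ptr"], True, "post", int(m["n2"]) * (1 if m["post"] == "++" else -1))
    elif m["sign"]:
        result = (m["ptr"], False, None, int(m["n3"]) * (1 if m["sign"] == "+" else -1))
    else:
        result = (m["ptr"], False, None, 0)   # bare PTRx: no offset, no update
    if not -16 <= result[3] <= 15:
        raise ValueError("offset outside -16..+15")
    return result
```

As an example, "PTRB[6--]" parses to ("B", True, "post", -6). The offset sign carries the increment/decrement choice.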
The examples in my original post were not exhaustive.
I guess I need to be a bit more careful posting these descriptions, to be sure that you all properly understand how they work.
I am in total agreement with Phil's comments here. No need to have separate INx and OUTx locations, just make the compiler do it properly to avoid coding errors.
I don't like the idea of hiding what is actually happening from the programmer. Either stick with the PINx notation, or split the registers like in Prop 1. To me it is most important that things be clear.
Since they'd both wind up as the same opcode, why not have both? Let the programmer use PINx if he understands how it works and use INx/OUTx if he wants some hand-holding from the compiler.
I don't like the idea of hiding what is actually happening from the programmer. Either stick with the PINx notation, or split the registers like in Prop 1. To me it is most important that things be clear.
So I suppose you'd also like to get rid of the test instruction and force people to use and with nr in order not to hide the architectural details? Come on, Roy. There are a ton of details that are hidden in the Prop now that make programs more readable and reliable. By deprecating inx and outx and requiring the use of pinx, users are forced to remember rules which, if they're the tiniest bit obscure, will lead to programming errors and tech support headaches for Parallax. The idea is to make PASM programming productive, not ivory-tower pure, and to make programs say what they mean. Remember: pinx is not a register, but inx and outx are registers -- different registers that just happen to occupy the same address.
Conversely, the consequences of separating inx and outx to occupy different addresses are dire, since even more SFR addresses get gobbled up, forcing four more SFRs into the special-access shadows where it's less convenient to use them.
Phil,
I think it goes a bit farther than test actually being and with nr. In the end, I will trust Chip to do what he feels is best. If it were up to me and I had to decide right now, I would split them like the Prop1, only because it keeps things the same as before and is a bit clearer. I believe that right now the Prop2 only has 8 mapped registers, so splitting would bring it to 12 mapped registers, and that's still 4 fewer than the Prop1.
However, in my Prop coding so far, I have yet to use the I/O pins in a way that would not work with just PINx, as Chip has it on the Prop2. I've never had a situation where I needed to read back from OUTA or write to INA. I tend to read INA into another variable and then do tests on that variable, and likewise I tend to build my output in a variable and then write that to OUTA. So it would work just fine with the PINx stuff.
I believe that right now the Prop2 only has 8 mapped registers...
It must have more than that. The essentials for being mapped are dira - dird, pina - pind, frqa, frqb, probably par, and (I would hope) cnt. That's twelve already. (If there aren't a full sixteen, I would have to question -- again -- why phsa and phsb couldn't be mapped here.)
I agree with you that ina and outa are used so differently that it's unnecessary for them to occupy different addresses. I just think it's a cruel trick to make programmers (and those who read their programs) remember the context rules for some illusory "pinx", when the assembler could easily handle references to, and flag misuse of, inx and outx.
Phil,
The registers cnt, frqa, and frqb are not mapped, and par doesn't exist anymore; its functionality is replaced by ptra and ptrb (which are not mapped).
I am glad that we are getting a clearer picture of what the instruction set and functionality of the Propeller II will be. I have large amounts of Propeller II code that I am rewriting due to the updated information. I am glad that many things are a lot simpler with the Prop II.
So cnt is not mapped? How are we supposed to do this?
neg duration,cnt
...
add duration,cnt
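The two instructions compute the elapsed clocks modulo 2^32, so the result stays correct even when cnt wraps between the two reads. Below is a minimal Python model of that arithmetic. Each argument stands for the value cnt had at that instruction, and the function name is hypothetical.

```python
MASK32 = 0xFFFF_FFFF   # cnt is a free-running 32-bit counter


def elapsed(cnt_at_start, cnt_at_end):
    """Model 'neg duration,cnt' ... 'add duration,cnt' on 32-bit registers."""
    duration = -cnt_at_start & MASK32              # neg duration,cnt
    duration = (duration + cnt_at_end) & MASK32    # add duration,cnt
    return duration
```

For example, `elapsed(0xFFFF_FF00, 0x100)` gives 0x200 even though cnt wrapped past zero.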
And why on earth exclude frqx from the memory mapping when it's so handy to do math operations on them? I don't get it. I hope Chip can explain his rationale.
Phil,
Is it really that bad to do a GETxxxx instruction to retrieve the register value, then do the math you want and finally do a SETxxxx instruction to update the register if needed?
Yes, it really is that bad when you're trying to eke out as many nanoseconds of accuracy or real-time response as possible -- especially if, like you say, there are unoccupied slots in the former 16-long SFR address space going to waste. Besides, with only 496 instruction words available, needing three instructions to do what used to take one is a rather large hit. It seems that the elegant microcontroller whose counters I've loved to use (and abuse) is turning into a rather-more-difficult-to-hack microprocessor, to the delight of the big-iron folks, but not to me.
Roy,
It slows things down in situations likely to need very high speed. Sure the Prop II will be faster, but requiring multiple instructions to do what was doable in a single instruction before will lose some of the effective speed of the Prop II.
With the instructions being effectively 1 clock, and the increased clock rate, you'll have significantly better accuracy or real-time response than you have on the Prop 1, even with the extra instructions needed to access the unmapped registers.
I, honestly, don't have a strong opinion either way on this. I'd be fine with more mapped registers, or even less.
I am still waiting for the instruction set descriptions. But if I understand correctly, the counters will now have some extra functions, such as auto-reload, and other possibilities that help a lot, even if they sometimes need to be reloaded.
With the instructions being effectively 1 clock, and the increased clock rate, you'll have significantly better accuracy or real-time response than you have on the Prop 1, even with the extra instructions needed to access the unmapped registers.
True. But if it could be faster yet -- by a factor of three -- and not put additional burden on the limited space for instructions, returning the SFRs to the cog's memory space is a no-brainer. I can see no justification for hiding these registers if the memory space is available to expose them.
-Phil
Rather than trade off here, why not do what many uCs already do, and allow a run-time selection of exactly what is mapped?
Speed and Size are very important given the code memory map size.
{ which is also why I'm amazed there seems to be no zero overhead loop support }
If we are going to have quad-word read/write access to hub memory, this means that at effectively 40 MIPS we will have 192K of code memory. So why are we so worried about this? If we have the registers back, we still have a huge code space for most things, since very few things need that much speed, and it has been proven that we can do it.
Here are the bit fields for the RDxxxx and WRxxxx instructions. For these instructions the top bit of the source field is always 1, so you only have 8 bits to work with, and that's why the range is one bit less than you expected. E is for effects (Z, C, R), as in the WZ, WC, WR, etc. stuff. I am not certain what N is; perhaps Chip will chime in.
I suspect 'S' selects PTRA or PTRB
I don't know about U and P... Hmm.
Maybe U = Update the pointer,
and if P selects between pre- and post-update, it all makes sense, including the offset range!
It's making sense now, if the above is correct.
What do others think?
Since we have pre-increment, post-decrement, and none, could we also have post-increment?
A question for those who want mapped registers:
Why have them mapped, if the instructions that handle the hidden registers give the same possibilities as mapped ones AND free up space for usable programs?
In short: why so much talk about mapped registers? My standpoint: as few of them as possible.
Thank you, Roy.
Phil and Mike have a good point. Three instructions vs. one, often in more than one place, uses far more cog memory than putting those registers back.
My single-cog hi-res text driver has a patch from kuroneko that uses that style of manipulation.