There is an extra fuse (#65) which will disable any code reading the fuse bits. This will be necessary in any decryption method AFAIK.
The hub ROM is a separate block but is very small (IIRC I saw 128 longs mentioned). I am unsure how it is being mapped, so it may just replace a hole in hub RAM.
Cog RAM space is actually larger by 16 longs (was 496, now 512) because the registers are now accessible by special instructions. This is how we can get more instructions too. While 16 does not sound like much, it is another 3%. And we have some restricted access to the 128x32 CLUT if it is not used for video.
Are you sure about that? I knew that some SFRs had to be remapped, since so many were added. If they've all been remapped, that's very bad news, indeed, since it precludes things like:
add frqa,#8
test mask,ina wz
for example.
Several months ago, I pleaded with Chip to keep the phsx registers exposed -- to no avail -- but I thought that, surely, the frqx and port registers would be spared. This really sucks if it's true.
Phil is right: some registers are still mapped at the end of the cog space (mostly the in, out, and dir registers for the 4 ports). However, there are new instructions for manipulating some of the non-mapped registers. For example, the phase registers have an instruction that adds to them.
It'll all be more clear when I get some posts up later tonight and tomorrow.
Thanks, Roy. That is a relief. However, I was under the impression from Chip that the in and out registers would be combined, depending on src/dst use, to make room in the address space for others. Was this plan abandoned?
-Phil
Phil, it's true that the PHSx registers are written and read via instructions now, but this allows for some nice things. For example, aside from SETPHSx and GETPHSx, there are two others:
GETPHZx - gets PHSx into D and clears PHSx. It doesn't exactly clear PHSx, though - it only writes to PHSx what was about to be added into it, excluding the present accumulation. This gets rid of a multi-instruction sequence that is currently needed on the Propeller to compute the delta since the last reading. This all happens in one clock now.
ADDPHSx - adds D into PHSx, along with whatever else was getting added into PHSx in the same clock. This currently takes a multi-instruction sequence to accomplish on the Propeller, since ongoing FRQ accumulations must be accounted for.
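To make the GETPHZx and ADDPHSx descriptions concrete, here is a rough Python model of the accumulator. It is only a sketch: the class and method names are made up for illustration, and it assumes FRQx is added into PHSx on every clock, which ignores the counter modes that gate accumulation on real hardware.

```python
MASK32 = 0xFFFFFFFF  # PHSx is modeled as a 32-bit accumulator


class PhaseCounter:
    """Toy model of one PHSx accumulator (assumption: FRQx is added every clock)."""

    def __init__(self, frq):
        self.frq = frq & MASK32
        self.phs = 0

    def clock(self):
        # An ordinary clock: the pending increment is accumulated.
        self.phs = (self.phs + self.frq) & MASK32

    def getphz(self):
        # GETPHZx as described above: return the present accumulation and,
        # in the same clock, load PHSx with only what was about to be added.
        value = self.phs
        self.phs = self.frq
        return value

    def addphs(self, d):
        # ADDPHSx as described above: D is added together with the
        # increment that was already due in this clock.
        self.phs = (self.phs + d + self.frq) & MASK32


counter = PhaseCounter(frq=3)
for _ in range(10):
    counter.clock()
first = counter.getphz()    # reading taken on the 11th clock
for _ in range(4):
    counter.clock()
second = counter.getphz()   # reading taken on the 16th clock
```

In this model the two readings add up to exactly 15 clocks' worth of FRQx, so no counts are lost across the read, which is the property the single-clock GETPHZx is meant to provide.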
About the INx and OUTx register pairs being merged into a single "PINx" register:
For an instruction that doesn't write to PINx, but uses PINx as D and/or S, the IN port is always selected for D and/or S. For an instruction that writes to PINx, the OUT port is read for D and/or S. Here are some examples:
OR PINA,#1 'ORs #$00000001 into what is currently OUTA.
TEST PINA,#1 'TESTs what is currently INA with #$00000001.
TEST mask,PINA 'TESTs mask with what is currently INA.
ADD PINA,PINA 'ADDs OUTA to OUTA
So you can see that there is no danger of latching INx bits into OUTx bits on a read-modify-write instruction to PINx. I've been writing code for PINx for a while now and have not run into a situation where this arrangement was stifling in any way. What it does is make 4 more cog registers available for code and data.
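The selection rule can be sketched as a small Python model. This is only an illustration of the rule as stated above, not of the actual silicon: the class and helper names are hypothetical, and only the three instruction forms from the examples are modeled.

```python
MASK32 = 0xFFFFFFFF


class PinPort:
    """Toy model of one merged PINx register backed by separate IN and OUT values."""

    def __init__(self, in_value, out_value):
        self.in_value = in_value & MASK32
        self.out_value = out_value & MASK32

    def read(self, writes_result):
        # Rule from the post: an instruction that writes PINx reads OUT,
        # any other instruction reads IN.
        return self.out_value if writes_result else self.in_value


def op_or(port, imm):
    # OR PINx,#imm writes its result, so it operates on OUT.
    port.out_value = (port.read(writes_result=True) | imm) & MASK32


def op_test(port, imm):
    # TEST PINx,#imm does not write, so it sees IN. Returns the Z flag.
    return (port.read(writes_result=False) & imm) == 0


def op_add_self(port):
    # ADD PINx,PINx writes its result, so both operands come from OUT.
    operand = port.read(writes_result=True)
    port.out_value = (operand + operand) & MASK32


port = PinPort(in_value=0b1010, out_value=0b0001)
op_or(port, 0b0100)             # OUT becomes 0b0101, IN bits are not latched in
z_flag = op_test(port, 0b0010)  # tests IN, which has bit 1 set
op_add_self(port)               # OUT becomes 0b0101 + 0b0101
```

Note that the input value never leaks into the output value in any of the three operations, which is the point of choosing the port by whether the instruction writes.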
First there are two new registers, PTRA & PTRB. They get initialized (both to the same value) via COGINIT and replace the old PAR register, but add a whole new set of functionality.
Here are the instructions for directly manipulating the PTRx registers:
They are all pretty self-explanatory. However, where they become much more interesting is when you use them with hub access (RDxxxx/WRxxxx).
On all hub memory read instructions, S can be replaced with a PTRA or PTRB. Also for hub writes, D can be replaced with PTRA or PTRB. There are several ways to use the PTRx registers with hub access instructions.
Note: The range for the constants inside the []'s below is -16 to +15 (5 bits signed). However, this value is scaled by the size of the RD or WR you are doing: 1 for RD/WRBYTE, 2 for RD/WRWORD, 4 for RD/WRLONG, and 16 for RD/WRQUAD.
RDLONG D, PTRA - reads the hub address that PTRA points to
RDBYTE D, PTRB[+3] - reads the hub address that PTRB+3 points to
RDLONG D, PTRA[++2] - pre-increment PTRA by 2 longs (so actually adding 8 to PTRA), then read the hub address that PTRA now points to
RDLONG D, PTRB[6--] - reads the hub address that PTRB points to, then post decrement PTRB by 6 longs (so actually subtracts 24 from PTRB)
WRLONG D, PTRB - writes to the hub address that PTRB points to
WRWORD D, PTRA[++4] - pre-increments PTRA by 4 words (advancing PTRA by 8), then writes to the hub address that PTRA now points to
These work with all forms of RDxxxx & WRxxxx, including the new RDQUAD/WRQUAD and the cached read versions, RDxxxxC. Something else to note is that when doing RD/WR instructions with a constant, like RDLONG D, #xxx, the range of xxx is now limited to 0 to 255 instead of 0 to 511. The upper bit is used to indicate the PTRx addressing modes.
Also, the compiler/assembler will be able to shorthand PTRx[++1] to ++PTRx along with all the other variants of pre/post inc/dec by 1.
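The address arithmetic for the PTRx forms listed above can be summarized in a short Python sketch. The function name, the mode strings, and the use of a signed index are my own conventions for illustration. Hub address wraparound is not modeled.

```python
# Bytes per element for each access size, as given in the note above.
SCALE = {"BYTE": 1, "WORD": 2, "LONG": 4, "QUAD": 16}


def ptr_access(ptr, size, index=0, mode="offset"):
    """Return (effective_hub_address, new_ptr_value) for one RD/WR access.

    index is the signed constant from inside the brackets (-16..15);
    a decrement such as PTRB[6--] is passed as index=-6.
    """
    if not -16 <= index <= 15:
        raise ValueError("index must fit in 5 signed bits (-16..15)")
    delta = index * SCALE[size]
    if mode == "offset":   # PTRx[+n]: use PTRx + n, leave PTRx unchanged
        return ptr + delta, ptr
    if mode == "pre":      # PTRx[++n]: update PTRx first, then access
        return ptr + delta, ptr + delta
    if mode == "post":     # PTRx[n++] / PTRx[n--]: access first, then update
        return ptr, ptr + delta
    raise ValueError("unknown mode: " + mode)


base = 0x1000
plain = ptr_access(base, "LONG")                      # RDLONG D, PTRA
byte_offset = ptr_access(base, "BYTE", 3, "offset")   # RDBYTE D, PTRB[+3]
pre_inc = ptr_access(base, "LONG", 2, "pre")          # RDLONG D, PTRA[++2]
post_dec = ptr_access(base, "LONG", -6, "post")       # RDLONG D, PTRB[6--]
word_pre = ptr_access(base, "WORD", 4, "pre")         # WRWORD D, PTRA[++4]
```

Each result is a pair: the hub address used for the access, then the value left in the pointer afterwards. For example, the post-decrement case reads from the original address and leaves the pointer 24 bytes lower.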
I will post more later about RDQUAD/WRQUAD, and the cached RDxxxxC instructions.
So how do I achieve what is now a test outa, [#]src? And it seems that e.g. movd outa, ina becomes tricky as well.
In the first case, you can't do it, as OUTx is never readable in a non-read-modify-write operation. Do you think you could demonstrate a case in which you would need to do an operation like this? Also, is it compelling enough that every cog should sacrifice four RAM registers to support it?
In the second case, you'd have to copy PINx into a register (to force an INx read), then use that register as the source for the MOVD PINx,register. Is there a compelling reason to do something like this?
I've never done any operations like you've expressed here, but if you could demonstrate that being able to do these operations is worth 4 cog registers, I'll separate INx and OUTx like they are on the current Propeller.
I, probably like you, really dislike programming controllers which get pin inputs and outputs mixed up into a single register. The bane of this arrangement is that inputs are always threatening to latch back into outputs on read-modify-write operations which are intended to only affect outputs. I feel like this new PINx arrangement gets around this problem while conserving register space. It's true that some things cannot be done, but what is most practical can be, without any minding. At least, this is what I've found in my own programming, so far.
Thanks for the clarification. I think I would have been more inclined to have
test pina,#1
target outa, since the test of ina is already covered by
test one,pina
I believe that implying dst == outa and src == ina would address kuroneko's question, as well. As you will recall, my examination of all the Prop I PASM code I could get my hands on revealed almost no cases where outa was used in the src of an instruction. The few exceptions I came across, IIRC, were movs, which could be handled by special access instructions.
I was just being nosy. AFAIAC there is no reason to change the new design, as long as it's made clear what the implications are (e.g. what was possible before and what stops working in the next revision of the chip). I definitely used I/O registers as generic storage when the cog itself wasn't doing real I/O. Also, some combined storage and I/O was being done when cycles count.
In addition to what Phil said, assigning outa to the dst slot and ina to the src slot would look cleaner.
I would think the use of
test pina,#1
where OUTA was used instead of INA would be a huge source of coding errors. I would expect it to compare INA with #1 even though pina is in the destination.
Unfortunately I have hardly ever used PHSx so I cannot comment on the implementation of that section.
We can't do test ina,#1 now. I don't see where having that restriction propagate to the Prop II makes things any worse. Plus, it's a much easier rule to remember than making it depend on the state of the wr bit. Especially confusing would be
or pina,#1 nr,wz
if pina were dependent on the wr bit. I remain steadfast in my belief that the register addressed should be solely dependent on whether it's the source or it's the destination.
Cluso99, I forgot that detail in my post, I have since gone back and edited it. It's like you describe. I just missed that in my notes as I was typing up the message. Sorry about that. (I also had the WRxxxx parameter order messed up).
If at all possible could you clear out the incorrect quoted stuff in your reply (since my post is now edited), I'd like to avoid people getting the wrong info if they read that quoted bit instead of my edited message.
RDQUAD and WRQUAD work a little differently than the normal RD/WR instructions, so they are not really as handy for reading large amounts of data into the cog. The cached reads will achieve the best performance for that. The RD/WRQUAD stuff is really more geared towards LMM style stuff or other VMs.
It is much more relevant what a user would EXPECT than what the detailed instruction description says. You do not want to have to look up the instruction set every time you program a Prop, or you would give it a miss as being a nightmare to program. Simply put, if you are comparing a pin, then you expect it to be the input to that pin, not what you output. While it is nice to have other ways to check OUTA, it should not be at the expense of expectations (cannot think of a better way to put it).
Now, having said that, there is nothing wrong with the compiler taking care of it as follows...
test INA,#1 '\\ same as next instruction
test PINA,#1 '// produces the same op as above
test OUTA,#19
I would be in favor of not having pina be a reserved word at all and just using outa and ina. In such a case, both of the following would be flagged as errors, even though they both refer to the unspeakable pina.
test ina,#1
mov x,outa
That way, the user can never mistake the intent of the instruction.
Looks good, but I do have a couple of questions (no doubt due to just waking up and not having my coffee yet this morning).
If I read this correctly,
- would "add pinx,reg" still work the same as the current "add outa,reg"? (I build fast external parallel counters with this, assuming the correct bits were set for output)
- would "mov reg,pinx" still give me the current state of the pins, regardless of their being inputs or outputs? (I would assume so, just checking)
- will "wrlong ptr, pinx" now write the pins to the hub? (current state, regardless of input and output)
- will "rdlong pinx,ptr" now write the hub data to the pins? (assuming configured for output)
I am verifying the above as they affect my external memory interfaces for Prop2
Everything would work as you anticipate. You've got the RDLONG's operands backwards, though.
Maybe these are silly questions. I'll check that later
I suppose pinx means one of pina, pinb, pinc, pind.
Do these pinx hub "incantations" work?
wrlong pinx, hubadr ' write value from input pins to some hub address
rdlong pinx, hubadr ' read value from some hub address to output pins
Since "add pinx, pinx" sums the output, can I assume pinx is always, practically speaking, a reflection of what is on the physical pins?
There must be some delayed storage in there for add pinx, pinx to work.
These operations seem obvious:
mov pinx, var ' put value of var onto pinx output
mov var, pinx ' read value of pinx input to var
What happens if we want to read or write all pins at once?
I suppose there is no such mov or other on cog instruction.
Can something like "rdquad pins, hubptr" read all pins at once?
I still question the use of pinx as a reserved word, because of the confusion it can cause. It would be better to force programmers to write inx and outx and to flag errors when they are used inappropriately. Programs would not only be more reliable, but also more readable, since the syntax would correlate with the intent. If this were done, I think something like test ina,#1 would be a little less objectionable, since the intent is much clearer from the syntax than test pina,#1. In any event, I see no reason to go back to having inx and outx occupy separate addresses.
Bill,
I'm confused. You seem to be asking for it to work exactly how it works (except the range is -16 to 15 not -32 to 31).
RDLONG D, PTRA[+4] <-- does not change PTRA, just uses PTRA + 4 (longs) as the address
RDLONG D, PTRA[++4] <-- does modify PTRA, pre incrementing by 4 longs, then reading from that address
Is it that you want it to not do the scaling when it's not modifying the pointer? Or what?
Absolutely, the Principle of Least Astonishment.
So far the Prop instruction set has been very good at this. The instructions are very regular in their behavior. Once you have learned how a few of them go you can guess how the rest will go with a good degree of accuracy. There aren't any odd cases that surprise you. This makes learning PASM a breeze and coding in PASM a pleasure.
I haven't thought too hard about what I'm seeing here yet but I hope the Prop II continues in this tradition of regularity and least astonishment.
http://en.wikipedia.org/wiki/Principle_of_least_astonishment
Thank you for the very detailed pre-education on PASM! I am truly loving it.
Q1: Please clarify about the cached reads.
Q2: Is syncing between hub and cog still the programmer's chore, or are there now new and better methods of syncing? More importantly, in PASM, has the methodology changed due to the cached reads?
Q3: About the cog to cog port / pipeline. Is this a set of registers (longs?) that are common between the cogs?
I'm sorry if the questions are repeated or out of "sync" with the current sub-subject.
KK
Thank you for the docs!
1) What do E and N stand for below? (may be a dumb question, but I have not had coffee yet)
2) Why is the range -16..15 for offsets? I would have expected -32..31 unless you are using two bits to select pointers (i.e. four index registers).
If I remember correctly:
- if bit8 is 0, it is absolute hub addressing using b7-b0,
- if bit8 is 1, it uses new pointer modes
Could you please post bit-level encoding of the SRC field?
The entire instruction set has still not been presented, but can I find those instructions in this list?
See the attached PDF.
Could you please post the bit-level encoding for the 'src' field for these instructions?
Thanks,
Bill