Propeller II update - BLOG

Cluso99 · 2011-05-26 17:18

Just to correct a few minor things...

There is an extra fuse (#65) which will disable any code reading the fuse bits. This will be necessary in any decryption method AFAIK.

The hub ROM is a separate block but is very small (IIRC I saw 128 longs mentioned). I am unsure how it is being mapped, so it may just replace a hole in hub ram.

Cog RAM space is actually larger by 16 longs (was 496, now 512) because the registers are now accessible by special instructions. This is how we can get more instructions too. While 16 does not sound like much, it is another 3%. And we have some restricted access to the 128x32 CLUT if it is not used for video.

Phil Pilgrim (PhiPi) · 2011-05-26 17:45

Cluso99 wrote:

Cog RAM space is actually larger by 16 longs (was 496, now 512) because the registers are now accessible by special instructions.

Are you sure about that? I knew that some SFRs had to be remapped, since so many were added. If they've all been remapped, that's very bad news, indeed, since it precludes things like:

add frqa,#8
test mask,ina wz

for example.

Several months ago, I pleaded with Chip to keep the phsx registers exposed -- to no avail -- but I thought that, surely, the frqx and port registers would be spared. This really sucks if it's true.

-Phil

Roy Eltham · 2011-05-26 18:02

Phil is right: some registers are still mapped at the end of the cog space (mostly the in, out, and dir registers for the 4 ports). However, there are new instructions for manipulating some of the non-mapped registers. For example, the phase registers have an instruction that adds to them.

It'll all be more clear when I get some posts up later tonight and tomorrow.

Phil Pilgrim (PhiPi) · 2011-05-26 18:39

Thanks, Roy. That is a relief. However, I was under the impression from Chip that the in and out registers would be combined, depending on src/dst use, to make room in the address space for others. Was this plan abandoned?

-Phil

Cluso99 · 2011-05-26 21:16

Thanks for the clarification Roy. Obviously I am not up to date with the latest changes.

cgracey · 2011-05-26 22:57

Phil Pilgrim (PhiPi) wrote: »

Thanks, Roy. That is a relief. However, I was under the impression from Chip that the in and out registers would be combined, depending on src/dst use, to make room in the address space for others. Was this plan abandoned?

-Phil

Phil, it's true that the PHSx registers are written and read via instructions now, but this allows for some nice things. For example, aside from SETPHSx and GETPHSx, there are two others:

GETPHZx - gets PHSx into D and clears PHSx. It doesn't exactly clear PHSx, though - it only writes to PHSx what was about to be added into it, excluding the present accumulation. This gets rid of a multi-instruction sequence that is currently needed on the Propeller to compute the delta since the last reading. This all happens in one clock now.

ADDPHSx - adds D into PHSx, along with whatever else was getting added into PHSx in the same clock. This currently takes a multi-instruction sequence to accomplish on the Propeller, since ongoing FRQ accumulations must be accounted for.
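To make the GETPHZx and ADDPHSx descriptions above concrete, here is a minimal Python model of one phase accumulator. It is only a sketch under stated assumptions: the class name `PhaseAccumulator` and its methods are hypothetical, the counter is assumed to add FRQ on every clock (the real counter modes are not described in this thread), and each method call stands for exactly one clock.

```python
MASK32 = 0xFFFFFFFF  # PHSx is modeled as a 32-bit register


class PhaseAccumulator:
    """Hypothetical model of one PHSx/FRQx pair; not the actual hardware."""

    def __init__(self, frq):
        self.frq = frq & MASK32
        self.phs = 0

    def clock(self):
        # An ordinary clock: the pending accumulation (assumed to be FRQ) is added.
        self.phs = (self.phs + self.frq) & MASK32

    def getphz(self):
        # GETPHZx as described: return PHSx, then write to PHSx only what was
        # about to be added in this clock, so no accumulation is lost.
        value = self.phs
        self.phs = self.frq
        return value

    def addphs(self, d):
        # ADDPHSx as described: add D together with whatever else was being
        # added into PHSx in the same clock.
        self.phs = (self.phs + self.frq + d) & MASK32


acc = PhaseAccumulator(frq=3)
for _ in range(5):
    acc.clock()
delta = acc.getphz()  # delta since the last reading; PHSx restarts at FRQ
acc.addphs(10)        # one more clock: FRQ and D are both added
```

In this model the values returned by successive `getphz()` calls always sum to the total accumulated so far, which is the property the multi-instruction sequence on the current Propeller has to work to preserve.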

About the INx and OUTx register pairs being merged into a single "PINx" register:

For an instruction that doesn't write to PINx, but uses PINx as D and/or S, the IN port is always selected for D and/or S. For an instruction that writes to PINx, the OUT port is read for D and/or S. Here are some examples:

OR PINA,#1 'ORs #$00000001 into what is currently OUTA.
TEST PINA,#1 'TESTs what is currently INA with #$00000001.
TEST mask,PINA 'TESTs mask with what is currently INA.
ADD PINA,PINA 'ADDs OUTA to OUTA

So you can see that there is no danger of latching INx bits into OUTx bits on a read-modify-write instruction to PINx. I've been writing code for PINx for a while now and have not run into a situation where this arrangement was stifling in any way. What it does is make 4 more cog registers available for code and data.
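The PINx read rule above can be modeled in a few lines of Python. This is a sketch of the rule as stated in this post, not of the silicon: the class `PinPort` and the helper names `op_or`, `op_test`, and `op_add_self` are hypothetical, and the ports are assumed to be 32 bits wide.

```python
MASK32 = 0xFFFFFFFF


class PinPort:
    """Hypothetical model of a merged PINx register (INx and OUTx share one address)."""

    def __init__(self, in_bits=0, out_bits=0):
        self.in_bits = in_bits & MASK32    # what the pins currently read
        self.out_bits = out_bits & MASK32  # what the cog is driving

    def read(self, instruction_writes_pinx):
        # Rule from the post: an instruction that writes PINx reads OUTx;
        # an instruction that does not write PINx reads INx.
        return self.out_bits if instruction_writes_pinx else self.in_bits


def op_or(port, imm):
    """OR PINx,#imm : read-modify-write, so OUTx is read and OUTx is written."""
    port.out_bits = (port.read(True) | imm) & MASK32


def op_test(port, imm):
    """TEST PINx,#imm : no write to PINx, so INx is read. Returns the Z flag."""
    return (port.read(False) & imm) == 0


def op_add_self(port):
    """ADD PINx,PINx : writes PINx, so both D and S come from OUTx."""
    value = port.read(True)
    port.out_bits = (value + value) & MASK32


port = PinPort(in_bits=0b0001, out_bits=0b0100)
op_or(port, 0b0010)       # OUT becomes 0b0110; the IN bit 0 is not latched in
z = op_test(port, 0b0001) # tests IN, which has bit 0 set, so Z is False
op_add_self(port)         # OUT becomes 0b1100
```

Note that `op_or` never sees `in_bits`, which is the "no danger of latching INx bits into OUTx bits" property claimed above.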

Roy Eltham · 2011-05-26 23:04

New pointer registers and instructions on Prop 2:

First there are two new registers, PTRA & PTRB. They get initialized (both to the same value) via COGINIT and replace the old PAR register, but add a whole new set of functionality.

Here are the instructions for directly manipulating the PTRx registers:

000011 EEE1CCCC DDDDDDDDD 000001100     GETPTRA D
000011 EEE1CCCC DDDDDDDDD 000001101     GETPTRB D
000011 EN01CCCC nnnnnnnnn 010000100     SETPTRA D/#n
000011 EN01CCCC nnnnnnnnn 010000101     SETPTRB D/#n
000011 EN01CCCC nnnnnnnnn 010111100     ADDPTRA D/#n
000011 EN01CCCC nnnnnnnnn 010111101     SUBPTRA D/#n
000011 EN01CCCC nnnnnnnnn 010111110     ADDPTRB D/#n
000011 EN01CCCC nnnnnnnnn 010111111     SUBPTRB D/#n

They are all pretty self-explanatory. However, where they become much more interesting is when you use them with hub access (RDxxxx/WRxxxx).

On all hub memory read instructions, S can be replaced with a PTRA or PTRB. Also for hub writes, D can be replaced with PTRA or PTRB. There are several ways to use the PTRx registers with hub access instructions.

Note: The range for the constants inside the []'s below is -16 to +15 (5 bits signed). However, this value is scaled by the size of the RD or WR you are doing: 1 for RD/WRBYTE, 2 for RD/WRWORD, 4 for RD/WRLONG, and 16 for RD/WRQUAD.

RDLONG D, PTRA      - reads the hub address that PTRA points to
RDBYTE D, PTRB[+3]  - reads the hub address that PTRB+3 points to
RDLONG D, PTRA[++2] - pre-increment PTRA by 2 longs (so actually adding 8 to PTRA), then read the hub address that PTRA now points to
RDLONG D, PTRB[6--] - reads the hub address that PTRB points to, then post decrement PTRB by 6 longs (so actually subtracts 24 from PTRB)
WRLONG D, PTRB      - writes to the hub address that PTRB points to
WRWORD D, PTRA[++4] - pre increments PTRA by 4 words (advancing PTRA by 8), then writes to the hub address that PTRA now points to

These work with all forms of RDxxxx & WRxxxx, including the new RDQUAD/WRQUAD, and the cached read versions of RDxxxxC. Something else to note is that when doing RD/WR instructions with a constant, like RDLONG D, #xxx, the range of xxx is now limited to 0 to 255 instead of 0 to 511. That upper bit is used to indicate using PTRx stuff.
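The scaling arithmetic in the PTRx examples above can be checked with a small Python sketch. The function `ptr_access` and its `mode` names are hypothetical, invented here for illustration. It also assumes the -16 to +15 range applies to the signed step in every mode (so `[6--]` is modeled as an index of -6), and it does not model hub address wrap-around.

```python
# Bytes per element for each RD/WR size, as given in the note above.
SCALE = {"BYTE": 1, "WORD": 2, "LONG": 4, "QUAD": 16}


def ptr_access(ptr, size, index=0, mode="offset"):
    """Return (hub_address, new_ptr) for one PTRx-based hub access.

    mode "offset": PTRx[+n]  uses PTRx + n*scale, PTRx is unchanged
    mode "pre":    PTRx[++n] updates PTRx first, then uses the new value
    mode "post":   PTRx[n++] uses PTRx, then updates it
    Decrements are expressed as a negative index.
    """
    if not -16 <= index <= 15:
        raise ValueError("index must fit in 5 bits signed (-16..15)")
    delta = index * SCALE[size]
    if mode == "offset":
        return ptr + delta, ptr
    if mode == "pre":
        return ptr + delta, ptr + delta
    if mode == "post":
        return ptr, ptr + delta
    raise ValueError("unknown mode: " + mode)


# RDBYTE D, PTRB[+3]  : address is PTRB+3, PTRB unchanged
addr1, ptrb1 = ptr_access(0x1000, "BYTE", 3, "offset")
# RDLONG D, PTRA[++2] : PTRA advances by 8, then is used as the address
addr2, ptra2 = ptr_access(0x1000, "LONG", 2, "pre")
# RDLONG D, PTRB[6--] : address is PTRB, then PTRB drops by 24
addr3, ptrb3 = ptr_access(0x1000, "LONG", -6, "post")
```

With these assumptions the largest reach of a single access is 15 quads, or 240 bytes, ahead of the pointer and 16 quads, or 256 bytes, behind it.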

Also, the compiler/assembler will be able to shorthand PTRx[++1] to ++PTRx along with all the other variants of pre/post inc/dec by 1.

I will post more later about RDQUAD/WRQUAD, and the cached RDxxxxC instructions.

kuroneko · 2011-05-26 23:16

cgracey wrote: »

For an instruction that doesn't write to PINx, but uses PINx as D and/or S, the IN port is always selected for D and/or S. For an instruction that writes to PINx, the OUT port is read for D and/or S. Here are some examples:

OR PINA,#1 'ORs #$00000001 into what is currently OUTA.
TEST PINA,#1 'TESTs what is currently INA with #$00000001.
TEST mask,PINA 'TESTs mask with what is currently INA.
ADD PINA,PINA 'ADDs OUTA to OUTA

So how do I achieve what is now a test outa, [#]src? And it seems that e.g. movd outa, ina becomes tricky as well.

cgracey · 2011-05-26 23:36

kuroneko wrote: »

So how do I achieve what is now a test outa, [#]src? And it seems that e.g. movd outa, ina becomes tricky as well.

In the first case, you can't do it, as OUTx is never readable in a non-read-modify-write operation. Do you think you could demonstrate a case in which you would need to do an operation like this? Also, is it compelling enough that every cog should sacrifice four RAM registers to support it?

In the second case, you'd have to copy PINx into a register (to force an INx read), then use that register as the source for the MOVD PINx,register. Is there a compelling reason to do something like this?

I've never done any operations like you've expressed here, but if you could demonstrate that being able to do these operations is worth 4 cog registers, I'll separate INx and OUTx like they are on the current Propeller.

I, probably like you, really dislike programming controllers which get pin inputs and outputs mixed up into a single register. The bane of this arrangement is that inputs are always threatening to latch back into outputs on read-modify-write operations which are intended to only affect outputs. I feel like this new PINx arrangement gets around this problem while conserving register space. It's true that some things cannot be done, but what is most practical can be, without any minding. At least, this is what I've found in my own programming, so far.

Phil Pilgrim (PhiPi) · 2011-05-26 23:43

Chip,

Thanks for the clarification. I think I would have been more inclined to have

test pina,#1

target outa, since the test of ina is already covered by

test one,pina

I believe that implying dst == outa and src == ina would address kuroneko's question, as well. As you will recall, my examination of all the Prop I PASM code I could get my hands on revealed almost no cases where outa was used in the src of an instruction. The few exceptions I came across, IIRC, were movs, which could be handled by special access instructions.

-Phil

kuroneko · 2011-05-26 23:51

cgracey wrote: »

I've never done any operations like you've expressed here, but if you could demonstrate that being able to do these operations is worth 4 cog registers, I'll separate INx and OUTx like they are on the current Propeller.

I was just being nosy. AFAIAC there is no reason to change the new design. As long as it's made clear what the implications are (e.g. what was possible before and what stops working in the next revision of the chip). I definitely used I/O registers as generic storage when the cog itself wasn't doing real I/O. Also, some combined storage and I/O was being done when cycles count.

In addition to what Phil said, assigning outa to the dst slot and ina to the src slot would look cleaner.

Cluso99 · 2011-05-27 00:06

Phil Pilgrim (PhiPi) wrote: »

Chip,

Thanks for the clarification. I think I would have been more inclined to have
test pina,#1

target outa, since the test of ina is already covered by
test one,pina

I believe that implying dst == outa and src == ina would address kuroneko's question, as well. As you will recall, my examination of all the Prop I PASM code I could get my hands on revealed almost no cases where outa was used in the src of an instruction. The few exceptions I came across, IIRC, were movs, which could be handled by special access instructions.

-Phil

I would think the use of

test pina,#1

where OUTA was used instead of INA would be a huge source of coding errors. I would expect it to compare INA with #1 even though pina is in the destination.

Unfortunately I have hardly ever used PHSx so I cannot comment on the implementation of that section.

kuroneko · 2011-05-27 00:12

@Cluso99: To be fair, test ina, #1 doesn't work now so no harm done

It's not that we'd have to re-educate people.

Phil Pilgrim (PhiPi) · 2011-05-27 00:16

Cluso99,

We can't do test ina,#1 now. I don't see where having that restriction propagate to the Prop II makes things any worse. Plus, it's a much easier rule to remember than making it depend on the state of the wr bit. Especially confusing would be

or pina,#1 nr,wz

if pina were dependent on the wr bit. I remain steadfast in my belief that the register addressed should be solely dependent on whether it's the source or it's the destination.

-Phil
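The two selection rules being debated here can be put side by side in a short Python sketch. Both function names are hypothetical, and the sketch encodes only what the posts above state: cgracey's rule keys on whether the instruction writes PINx, while Phil's proposal keys only on the operand slot.

```python
def chip_rule(slot, pinx_is_dst, writes_result):
    """cgracey's rule: OUTx is read only when the instruction writes to PINx.

    slot is unused here; it is kept so both rules share one signature.
    """
    return "OUT" if (pinx_is_dst and writes_result) else "IN"


def phil_rule(slot, pinx_is_dst, writes_result):
    """Phil's proposal: destination always means OUTx, source always means INx."""
    return "OUT" if slot == "dst" else "IN"


# (description, slot holding PINx, PINx is the destination, result is written)
cases = [
    ("or pina,#1",       "dst", True,  True),
    ("or pina,#1 nr,wz", "dst", True,  False),
    ("test pina,#1",     "dst", True,  False),
    ("test mask,pina",   "src", False, False),
    ("add pina,pina (S)", "src", True,  True),
]

comparison = {
    name: (chip_rule(slot, is_dst, wr), phil_rule(slot, is_dst, wr))
    for name, slot, is_dst, wr in cases
}
```

The rules agree on the plain `or` and on `test mask,pina`. They differ on the `nr` and `test` forms with PINx as destination, and on the source operand of `add pina,pina`, which is where the disagreement in this thread lies.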

Cluso99 · 2011-05-27 00:18

Very nice indeed.

Roy Eltham · 2011-05-27 00:27

Cluso99, I forgot that detail in my post, I have since gone back and edited it. It's like you describe. I just missed that in my notes as I was typing up the message. Sorry about that. (I also had the WRxxxx parameter order messed up).

If at all possible could you clear out the incorrect quoted stuff in your reply (since my post is now edited), I'd like to avoid people getting the wrong info if they read that quoted bit instead of my edited message.

RDQUAD and WRQUAD work a little differently than the normal RD/WR instructions, so they are not really as handy for reading large amounts of data into the cog. The cached reads will achieve the best performance for that. The RD/WRQUAD stuff is really more geared towards LMM style stuff or other VMs.

Roy

Cluso99 · 2011-05-27 00:35

It is not relevant to what we have now.

It is much more relevant to what a user would EXPECT, and not what the detailed instruction description says. You do not want to have to look up the instruction set every time you program a prop, or you would give it a miss as being a nightmare to program. Simply put, if you are comparing a pin, then you expect it to be the input to that pin, not what you output. While it is nice to have other ways to check the OUTA, it should not be at the expense of expectations (cannot think of a better way to put it).

Now, having said that, there is nothing wrong with the compiler taking care of it as follows...

test   INA,#1       '\\ same as next instruction
test   PINA,#1     '// produces the same op as above
test   OUTA,#19

Phil Pilgrim (PhiPi) · 2011-05-27 00:41

I would be in favor of not having pina be a reserved word at all and just using outa and ina. In such a case, both of the following would be flagged as errors, even though they both refer to the unspeakable pina.

    test    ina,#1
    mov     x,outa

That way, the user can never mistake the intent of the instruction.

-Phil

Cluso99 · 2011-05-27 00:46

Thanks Roy. I have deleted the contents to save confusion.

Cluso99 · 2011-05-27 00:48

Yes, agreed Phil

Removes any doubts or confusion.

Heater. · 2011-05-27 01:02

Cluso,

While it is nice to have other ways to check the OUTA, it should not be at the expense of expectations (cannot think of a better way to put it).

Absolutely, the Principle of Least Astonishment.

So far the Prop instruction set has been very good at this. The instructions are very regular in their behavior. Once you have learned how a few of them go you can guess how the rest will go with a good degree of accuracy. There aren't any odd cases that surprise you. This makes learning PASM a breeze and coding in PASM a pleasure.

I haven't thought too hard about what I'm seeing here yet but I hope the Prop II continues in this tradition of regularity and least astonishment.

http://en.wikipedia.org/wiki/Principle_of_least_astonishment

Bill Henning · 2011-05-27 05:54

Looks good, but I do have a couple of questions (no doubt due to just waking up and not having my coffee yet this morning):

if I read this correctly,

- would "add pinx,reg" still work the same as the current "add outa,reg"? (I build fast external parallel counters with this, assuming the correct bits were set for output)

- would "mov reg,pinx" still give me the current state of the pins, regardless of their being inputs or outputs? (I would assume so, just checking)

- will "wrlong ptr, pinx" now write the pins to the hub? (current state, regardless of input and output)

- will "rdlong pinx,ptr" now write the hub data to the pins? (assuming configured for output)

I am verifying the above as they affect my external memory interfaces for Prop2

cgracey wrote: »

Phil, it's true that the PHSx registers are written and read via instructions now, but this allows for some nice things. For example, aside from SETPHSx and GETPHSx, there are two others:

GETPHZx - gets PHSx into D and clears PHSx. It doesn't exactly clear PHSx, though - it only writes to PHSx what was about to be added into it, excluding the present accumulation. This gets rid of a multi-instruction sequence that is currently needed on the Propeller to compute the delta since the last reading. This all happens in one clock now.

ADDPHSx - adds D into PHSx, along with whatever else was getting added into PHSx in the same clock. This currently takes a multi-instruction sequence to accomplish on the Propeller, since ongoing FRQ accumulations must be accounted for.

About the INx and OUTx register pairs being merged into a single "PINx" register:

For an instruction that doesn't write to PINx, but uses PINx as D and/or S, the IN port is always selected for D and/or S. For an instruction that writes to PINx, the OUT port is read for D and/or S. Here are some examples:

OR PINA,#1 'ORs #$00000001 into what is currently OUTA.
TEST PINA,#1 'TESTs what is currently INA with #$00000001.
TEST mask,PINA 'TESTs mask with what is currently INA.
ADD PINA,PINA 'ADDs OUTA to OUTA

So you can see that there is no danger of latching INx bits into OUTx bits on a read-modify-write instruction to PINx. I've been writing code for PINx for a while now and have not run into a situation where this arrangement was stifling in any way. What it does is make 4 more cog registers available for code and data.

KaosKidd · 2011-05-27 06:05

For everyone...
Thank you for the very detailed pre-education on PASM! I am truly loving it.
Q1:

Roy Eltham wrote: »

..{removed for clarity}..

RDQUAD and WRQUAD work a little differently than the normal RD/WR instructions, so they are not really as handy for reading large amounts of data into the cog. The cached reads will achieve the best performance for that. The RD/WRQUAD stuff is really more geared towards LMM style stuff or other VMs.

Roy

Please clarify about the cached reads.

Q2: Is syncing between hub and cog still the programmer's chore, or are there now new and better methods of syncing? More importantly, in PASM, has the methodology changed due to the cached reads?

Q3: About the cog to cog port / pipeline. Is this a set of registers (longs?) that are common between the cogs?

I'm sorry if the questions are repeated or out of "sync" with the current sub-subject.

KK

Bill Henning · 2011-05-27 06:09

Hi Roy & Chip,

Thank you for the docs!

1) What do E and N stand for below? (may be a dumb question, but I have not had coffee yet)

2) Why is the range -16..15 for offsets? I would have expected -32..31 unless you are using two bits to select pointers (ie four index registers)

If I remember correctly:

- if bit8 is 0, it is absolute hub addressing using b7-b0,
- if bit8 is 1, it uses new pointer modes

Could you please post bit-level encoding of the SRC field?

cgracey · 2011-05-27 06:35

Bill Henning wrote: »

Looks good, but I do have a couple of questions (no doubt due to just waking up and not having my coffee yet this morning):

if I read this correctly,

- would "add pinx,reg" still work the same as the current "add outa,reg"? (I build fast external parallel counters with this, assuming the correct bits were set for output)

- would "mov reg,pinx" still give me the current state of the pins, regardless of their being inputs or outputs? (i would assume so, just checking)

- will "wrlong ptr, pinx" now write the pins to the hub? (current state, regardless of input and output)

- will "rdlong pinx,ptr" now write the hub data to the pins? (assuming configured for output)

I am verifying the above as they affect my external memory interfaces for Prop2

Everything would work as you anticipate. You've got the RDLONG's operands backwards, though.

Sapieha · 2011-05-27 07:26

Hi Chip.

The entire instruction set still has not been presented, but can I find these instructions in this list?

See the attached PDF.

cgracey wrote: »

Everything would work as you anticipate. You've got the RDLONG's operands backwards, though.

jazzed · 2011-05-27 07:27

Maybe these are silly questions. I'll check that later

I suppose pinx means one of pina, pinb, pinc, pind.

Do these pinx hub "incantations" work?

wrlong pinx, hubadr   ' write value from input pins to some hub address
rdlong pinx, hubadr   ' read value from some hub address to output pins

Since "add pinx, pinx" sums the output can I assume pinx is always
a reflection practically speaking of what is on the physical pins?

There must be some delayed storage in there for add pinx, pinx to work.

These operations seem obvious:

mov pinx, var ' put value of var onto pinx output
mov var, pinx ' read value of pinx input to var

What happens if we want to read or write all pins at once?
I suppose there is no such mov or other on cog instruction.
Can something like "rdquad pins, hubptr" read all pins at once?

Phil Pilgrim (PhiPi) · 2011-05-27 08:30

Chip,

I still question the use of pinx as a reserved word, because of the confusion it can cause. It would be better to force programmers to write inx and outx and to flag errors when they are used inappropriately. Programs would not only be more reliable, but also more readable, since the syntax would correlate with the intent. If this were done, I think something like test ina,#1 would be a little less objectionable, since the intent is much clearer from the syntax than test pina,#1. In any event, I see no reason to go back to having inx and outx occupy separate addresses.

-Phil

Roy Eltham · 2011-05-27 10:33

Bill,
I'm confused. You seem to be asking for it to work exactly how it works (except the range is -16 to 15 not -32 to 31).

RDLONG D, PTRA[+4] <-- does not change PTRA, just uses PTRA + 4 (longs) as the address
RDLONG D, PTRA[++4] <-- does modify PTRA, pre incrementing by 4 longs, then reading from that address

Is it that you want it to not do the scaling when it's not modifying the pointer? Or what?

Bill Henning · 2011-05-27 10:53

Sorry Roy, I mis-read the descriptions, I read it before my morning coffee. I revised the message to indicate the information I am seeking... namely

Could you please post the bit-level encoding for the 'src' field for these instructions?

Thanks,

Bill
