Propeller II update - BLOG

potatohead · 2011-05-28 18:09

Well, the worry is all about COG performance with the onboard counters and video hardware. Multi-instruction access limits the upper range of performance and could muck with clever self-modifying code tricks, which are common in the COG.

Maybe it's not a worry, but I hesitate to say that, given how much we got from the Prop I. We got more than originally anticipated. It would be really nice for that to happen again.

For the LMM program, yeah, it's not such a big deal. There will be latency anyway, just because of the VM / kernel needed to manage execution of that. But, for COG space, where we often will build out that "software defined silicon", it's a genuine question, IMHO.

My personal opinion here is to map more registers and count on the fact that we've got more potent instructions, along with REP for loops, both of which will save quite a bit in the COG. Just two loops written with REP would probably save the few longs it costs to put the key registers in the address space. That's my $.02.

Cluso99 · 2011-05-28 18:30

Roy: Thanks for the PTRx explanation of pre/post/none inc/dec. This is fantastic news.

As for CNT, it is used often, and not being able to use it as a source operand seems like a problem to me, as Phil and Bill have said. As for FRQx and perhaps PHSx, I haven't used them much, so I cannot comment.

I do know that recovering the memory from register usage gives us a little more extremely valuable program space. While I am loath to suggest more silicon requirements (and delays), jmg has a valid point. Is it possible to optionally map the registers back into the register space? Perhaps an instruction to flip them in and out? Otherwise, I think I would prefer mapped registers over memory, considering the additions for LMM and overlay PASM.

potatohead · 2011-05-28 18:33

I concur, even if we cannot toggle mapping them in and out. The COG is the COG. We will always fill it, and when we do, there is LMM, or dividing tasks. A few words here and there won't significantly impact that reality, whereas multi-instruction access to key registers will impact a lot of things.

cgracey · 2011-05-29 01:45

Phil,

We've got 54 special-purpose registers.

I know it's pointless to map them all, since in any given program the special instructions needed to access them would take up less space than all of them would occupy in cog RAM.

You would like to see CNT mapped, and also FRQx and PHSx. Any others, from looking at the instruction set?

Chip

cgracey · 2011-05-29 01:49

Phil Pilgrim (PhiPi) wrote: »
So cnt is not mapped? How are we supposed to do this?
    neg   duration,cnt
    ...
    add   duration,cnt
And why on earth exclude frqx from the memory mapping when it's so handy to do math operations on them? I don't get it. I hope Chip can explain his rationale.

-Phil

You'd have to do this:

    getcnt  first           ' sample the system counter
    ...
    getcnt  last            ' sample it again
    sub     last, first     ' last = elapsed clocks

This takes one more instruction, plus another register.

This is a good argument for mapping CNT.
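A quick way to convince yourself that both sequences measure the same thing, even when CNT wraps past $FFFF_FFFF between the two samples, is to model the 32-bit arithmetic. The Python below is only a model: `cnt_start` and `cnt_end` stand in for the two CNT samples, and the function names are made up for illustration.

```python
MASK32 = 0xFFFF_FFFF  # cog registers and CNT are 32 bits wide


def neg_add_duration(cnt_start: int, cnt_end: int) -> int:
    """Model of NEG duration,CNT ... ADD duration,CNT (needs CNT as a source)."""
    duration = (-cnt_start) & MASK32           # neg duration, cnt
    duration = (duration + cnt_end) & MASK32   # add duration, cnt
    return duration


def getcnt_duration(cnt_start: int, cnt_end: int) -> int:
    """Model of GETCNT first ... GETCNT last / SUB last, first."""
    first = cnt_start & MASK32                 # getcnt first
    last = cnt_end & MASK32                    # getcnt last
    last = (last - first) & MASK32             # sub last, first
    return last


# CNT wraps between the samples: 0xFFFF_FFF0 -> 0x0000_0010 is 0x20 clocks.
print(neg_add_duration(0xFFFF_FFF0, 0x0000_0010))  # 32
print(getcnt_duration(0xFFFF_FFF0, 0x0000_0010))   # 32
```

Both give the same answer; the difference is cost. The NEG/ADD idiom is two instructions and one register, while the GETCNT version is three instructions and needs the extra `first` register.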

Phil Pilgrim (PhiPi) · 2011-05-29 02:35

Chip,

These are the registers which, at a minimum, I believe should be exposed:

1. dira - dirc (3)
2. ina/outa - inc/outc (3)
3. cnt (1)
4. frqa, frqb (2)

It would be very nice to map phsa and phsb, but without being able to perform read-modify-write instructions on them while they are counting, it's probably less important than frqa and frqb. Also, a while back we discussed splitting the dir registers to enable the PMOS and NMOS output drivers separately. Did this make it into the latest rev? If so, and depending on how it's implemented, item 1 could increase to six SFRs.

Any others, from looking at the instruction set?

I'm mostly concerned with the I/O and counters, since those represent possible real-time bottlenecks if they're hidden. If there are counter registers supporting additional modes (e.g. PWM) that have real-time consequences, they should be exposed, too. Configuration registers, like ctra and ctrb, are almost always simple load/save and don't need the same exposure.

Is there an exhaustive list somewhere of the 54 SFRs and what they're used for? It would be nice to understand all of them to know which ones would benefit from RMW access. Also, are there access instructions for the hidden registers that do more than load and save?

Thanks,
-Phil

cgracey · 2011-05-29 03:02

Phil,

DIRA..DIRD are all implemented as registers.

PINA..PIND (could be renamed to INx/OUTx, as discussed) are all implemented as registers. The PIND/DIRD is for the inter-cog long exchange.

CNT, FRQA, and FRQB are currently hidden, but could be brought out.

By the way, if you currently do two sequential GETCNT D instructions, the first returns the lower 32 bits of the system counter, and the second returns the upper 32 bits, time-aligned with the lower. WAITCNT is still 32 bits, though. You could read all 64 bits and use the divider (64/32) to compute, say, seconds since start-up, and the count would run almost 6,000 years at 100 MHz before rolling over.
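The arithmetic behind the 64-bit counter is easy to check. The sketch below assumes, as described above, that the first GETCNT returns the low long and the second the high long; the 100 MHz clock is the example figure from the post, and the helper names are hypothetical.

```python
CLOCK_HZ = 100_000_000                 # 100 MHz system clock (the example figure)
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def combine_getcnt(low: int, high: int) -> int:
    """Join two sequential GETCNT results (low long first, then high long)."""
    return ((high & 0xFFFF_FFFF) << 32) | (low & 0xFFFF_FFFF)


def seconds_since_startup(count64: int, clock_hz: int = CLOCK_HZ) -> int:
    """Whole seconds elapsed. Python ints are unbounded, so this ignores
    any width limit on a hardware divider's quotient."""
    return count64 // clock_hz


rollover_years = 2**64 / CLOCK_HZ / SECONDS_PER_YEAR
print(round(rollover_years))                        # 5845
print(seconds_since_startup(combine_getcnt(0, 1)))  # 42
```

So 2^64 clocks at 100 MHz comes to about 5,845 years, consistent with "almost 6,000 years".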

I will think about getting the CNT, FRQA, and FRQB registers brought out.

About the separate PMOS and NMOS controls: Each pin is configurable to drive 8 different types of highs and lows. They are: fast (normal), slow (emi), 1500 ohms, 10k ohms, 100k ohms, 100uA, 10uA, and hi-z. High and low are separately programmable and you can also invert both the input and output signals, plus reclock the output at the pad to get better delta modulation.
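Since each side has eight options, a high-side and a low-side choice fit in three bits apiece. The sketch below packs them, together with the invert and reclock options, into one configuration word. The bit layout, the enum names, and `pin_config` are all hypothetical, chosen only to illustrate the idea; they are not the Prop II's actual encoding.

```python
from enum import IntEnum


class Drive(IntEnum):
    """The eight drive types listed above; the numeric codes are assumed."""
    FAST = 0      # fast (normal)
    SLOW = 1      # slow (emi)
    R1500 = 2     # 1500 ohms
    R10K = 3      # 10k ohms
    R100K = 4     # 100k ohms
    I100UA = 5    # 100 uA
    I10UA = 6     # 10 uA
    HIZ = 7       # hi-z


def pin_config(high: Drive, low: Drive, invert_in: bool = False,
               invert_out: bool = False, reclock: bool = False) -> int:
    """Hypothetical layout: [8]=reclock [7]=invert out [6]=invert in
    [5:3]=low drive [2:0]=high drive."""
    word = int(high) | (int(low) << 3)
    word |= int(invert_in) << 6
    word |= int(invert_out) << 7
    word |= int(reclock) << 8
    return word


# Open-drain style pin: float when high, drive hard when low.
print(pin_config(Drive.HIZ, Drive.FAST))  # 7
```

Because high and low are chosen independently, a scheme like this also covers the separate PMOS/NMOS control Phil asked about, e.g. the open-drain case above.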

I need to make a list of all the instructions, but time is kind of tight now, as I'm racing to get these changes made over the weekend so that I don't hold up the synthesis work.

Chip


jmg · 2011-05-29 04:38

cgracey wrote: »

By the way, if you currently do two sequential GETCNT D instructions, the first returns the lower 32 bits of the system counter, and the second returns the upper 32 bits, time-aligned with the lower.

Is that true of all the counters, or only of the system counter?

cgracey wrote: »

About the separate PMOS and NMOS controls: Each pin is configurable to drive 8 different types of highs and lows. They are: fast (normal), slow (emi), 1500 ohms, 10k ohms, 100k ohms, 100uA, 10uA, and hi-z.

Nice. Is there an edge-accelerate mode (higher drive, briefly, on a level change)?
Is it correct that the pins are not 5V tolerant?

Phil Pilgrim (PhiPi) · 2011-05-29 07:55

Hey Chip,

I waffled on phsa and phsb but shouldn't have. (It was 2:30 a.m., way past my bedtime.) If possible, please add them to the list of SFRs to expose, for the same reasons cnt should be exposed (the neg-plus-add construct, among others). Also, these two registers are constantly used in ways for which they were not intended, and it would be nice to keep that option available.

Thanks,
-Phil

Sapieha · 2011-05-29 09:21

Hi Chip.

cgracey wrote: »

I need to make a list of all the instructions, but time is kind of tight now, as I'm racing to get these changes made over the weekend so that I don't hold up the synthesis work.

If it is to have a test chip as fast as possible, that is a good decision.
BUT if it totally closes off possible changes in the final chip, that is not good.

davidsaunders · 2011-05-29 10:44

WOW. Thank you, Chip. Now I really want a Propeller II, or 10,000 Prop IIs. What would be a good estimate of how long before pre-orders are available? (I may not have the cash for as many as I want [I have to start saving up], so I may be starting with 20 or 30.)

BigFoot · 2011-05-29 16:25

Jmg: They are using a 3.3V process so I don't believe anything on the chip will directly interface to 5V.

jmg · 2011-05-29 17:32

BigFoot wrote: »

Jmg: They are using a 3.3V process so I don't believe anything on the chip will directly interface to 5V.

That depends on the fab. For examples of microcontrollers with low-voltage cores and wide-Vcc operation up to 5V, see Silabs (8-bit), Nuvoton and Fujitsu (32-bit), and Cypress's new PSoC3/5 series; even Freescale's teensy RS08 series can do this.

BigFoot · 2011-05-29 17:38

Jmg: I asked Beau about this at last year's UPEW, and he explained that ~3.6 volts is the maximum the chip will take. In order to squeeze all those transistors onto the smallest die possible, they had to go this way. This is also why the core logic works at 1.8 volts.

jmg · 2011-05-29 18:30

BigFoot wrote: »

Jmg: I asked Beau about this at last year's UPEW, and he explained that ~3.6 volts is the maximum the chip will take. In order to squeeze all those transistors onto the smallest die possible, they had to go this way. This is also why the core logic works at 1.8 volts.

In the examples I gave, most also have 'core logic' at 1.8V.

It is already being done, and even on small, low cost controllers. It is not the brick wall some thought it was in the past.

Phil Pilgrim (PhiPi) · 2011-05-29 18:57

Chip,

One more argument for exposing frqa and frqb:

Think about what has to be done now to tweak a counter's phase:

    add   frqa,#adj     ' raise the frequency briefly...
    sub   frqa,#adj     ' ...then restore it, leaving phsa advanced

Now, imagine what a PITA that would be if frqa and frqb were hidden.
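Why the add/sub pair works: in NCO mode the counter adds FRQA into PHSA on every clock (the output is PHSA bit 31), so raising FRQA by `adj` for k clocks leaves PHSA exactly `adj * k` ahead while the frequency afterward is unchanged. The Python model below is a sketch; `run_nco` and its parameters are hypothetical, and k (the time between the ADD and the SUB taking effect) is treated as a parameter rather than a claim about exact instruction timing.

```python
MASK32 = 0xFFFF_FFFF


def run_nco(frq: int, clocks: int, bump_at: int = 0, adj: int = 0,
            bump_len: int = 0) -> int:
    """Return PHS after `clocks` ticks of PHS += FRQ (mod 2**32).

    FRQ is raised by `adj` during ticks bump_at .. bump_at + bump_len - 1,
    modelling ADD frqa,#adj followed later by SUB frqa,#adj.
    """
    phs = 0
    for tick in range(clocks):
        bumped = bump_at <= tick < bump_at + bump_len
        phs = (phs + frq + (adj if bumped else 0)) & MASK32
    return phs


base = run_nco(frq=1000, clocks=100)
tweaked = run_nco(frq=1000, clocks=100, bump_at=10, adj=5, bump_len=4)
print((tweaked - base) & MASK32)  # 20, i.e. adj * bump_len
```

With FRQA hidden, the same nudge would need a read into a cog register, the arithmetic, and a write back, twice over.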

-Phil

Roy Eltham · 2011-05-31 00:34

Just wanted to post a quick note to let you know why I haven't posted more Prop 2 instruction details. I've been sick for the last few days and not doing much of anything. Yay for the long weekend to be sick on. :P
I will post more info soon.

Rayman · 2011-05-31 06:24

The Prop 2 would be great for these 320x240 touchscreens I'm working with. I was just thinking that the Prop2 could interface directly to the touchscreen, saving me money and trouble. Also, the internal RAM is enough for a full-screen 16-bit display, with plenty of pins left over. I just wish I knew exactly when it was coming... I'd drop my plans for a Solomon graphics chip for a Prop1 solution if I knew for sure it would be available in the near future. Chip has been saying "1 year away" for a few years now. But at UPEW he said it could be 3 months (but only if everything goes perfectly).

Dave Hein · 2011-05-31 08:44

I have some questions about the instructions that use a D/#n operand. Does this mean that either a cog memory location can be specified for the destination field, or an immediate value can be specified if the # character is used? That is, for these instructions, is the destination field treated just like the source field of the older Prop 1 instructions?

How do instructions work that specify something like DDDDDnnnn for the destination field? Does this mean that the immediate value must be four bits in size, or at least masked to four bits? In that case, are the remaining D bits set to zero?
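For reference, a Prop 1 instruction long is split into fixed fields: a 6-bit opcode, the Z/C/R effect flags, the I (immediate) flag, a 4-bit condition, then the 9-bit D and S fields, with I applying only to S. The decoder below models that layout. `split_dddddnnnn` is purely hypothetical: it shows one possible reading of a DDDDDnnnn destination (upper five bits plus a 4-bit n), which is exactly the open question here.

```python
def decode_p1(word: int) -> dict[str, int]:
    """Split a Propeller 1 instruction long into its fields."""
    return {
        "instr": (word >> 26) & 0x3F,  # 6-bit opcode
        "z": (word >> 25) & 1,         # write Z flag
        "c": (word >> 24) & 1,         # write C flag
        "r": (word >> 23) & 1,         # write result to D
        "i": (word >> 22) & 1,         # S field is an immediate (#) value
        "con": (word >> 18) & 0xF,     # execution condition
        "d": (word >> 9) & 0x1FF,      # 9-bit destination register address
        "s": word & 0x1FF,             # 9-bit source address or immediate
    }


def split_dddddnnnn(d: int) -> tuple[int, int]:
    """Hypothetical reading of a DDDDDnnnn field: (upper 5 bits, low 4-bit n)."""
    return (d >> 4) & 0x1F, d & 0xF


word = (0b101000 << 26) | (1 << 23) | (1 << 22) | (0xF << 18) | (0x1A5 << 9) | 7
fields = decode_p1(word)
print(fields["d"], fields["s"], fields["i"])  # 421 7 1
print(split_dddddnnnn(fields["d"]))           # (26, 5)
```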

Dave Hein · 2011-06-02 15:11

It would be useful to be able to access registers directly, especially ones like cnt, dirx, inx, outx, frqx, phsx, etc. Why not just provide a shadow-enable bit that would make the 50 or so registers available when set, and cog memory available when not set? The registers could be mapped at the end of cog memory like the 16 registers in the Prop 1 are. The instruction fetcher would ignore the shadow-enable bit, and would always fetch its instructions from cog memory. If this area of cog memory is only used for instructions, and is never modified, there is no reason that it needs to be visible to instructions. Self-modifying code could be done in the rest of the cog memory without any limitations, and could be done in the shadow area by clearing the shadow-enable bit.

The other possibility would be to put the shadow area at the beginning of cog memory. This would allow variable storage to be at the end of a PASM program like we normally do with the Prop 1.
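The proposal is easy to state as a small model: operand reads and writes in the shadow region go to the special-function registers when the bit is set, while instruction fetch always reads cog RAM. This is a hypothetical sketch, assuming 50 SFRs (the "50 or so" above) and a 512-long cog; `ShadowCog` and its methods are invented names, and `at_top=False` models the start-of-memory variant.

```python
COG_LONGS = 512   # cog RAM size in longs
MASK32 = 0xFFFF_FFFF


class ShadowCog:
    """Model of a shadow-enable bit that overlays SFRs on a region of cog RAM."""

    def __init__(self, num_sfrs: int = 50, at_top: bool = True):
        self.ram = [0] * COG_LONGS
        self.sfr = [0] * num_sfrs
        self.base = COG_LONGS - num_sfrs if at_top else 0
        self.shadow_enable = False

    def _sfr_index(self, addr: int):
        """SFR index if `addr` is currently shadowed, else None."""
        if self.shadow_enable and self.base <= addr < self.base + len(self.sfr):
            return addr - self.base
        return None

    def read(self, addr: int) -> int:
        """Operand (D/S) read: sees SFRs when shadowing is on."""
        idx = self._sfr_index(addr)
        return self.ram[addr] if idx is None else self.sfr[idx]

    def write(self, addr: int, value: int) -> None:
        """Operand write: ordinary MOV/ADD/SUB results land here."""
        idx = self._sfr_index(addr)
        if idx is None:
            self.ram[addr] = value & MASK32
        else:
            self.sfr[idx] = value & MASK32

    def fetch(self, pc: int) -> int:
        """Instruction fetch ignores shadow_enable and always reads cog RAM."""
        return self.ram[pc]


cog = ShadowCog()
top = cog.base
cog.ram[top] = 0x1234_5678          # code stored in the shadow area
cog.shadow_enable = True
cog.write(top, cog.read(top) + 5)   # a plain ADD applied to the first SFR
print(cog.sfr[0], hex(cog.fetch(top)))  # 5 0x12345678
```

The code in the shadow area keeps executing unchanged while ordinary ALU instructions operate on the registers, with no special GET/SET opcodes involved.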

Cluso99 · 2011-06-02 16:07

Dave:
Agreed it would be nice for ALL registers to be enabled/disabled by a shadow enable bit (or instruction). The instruction fetcher would always use the shadow memory (just as the prop 1 does now). All the standard instructions would have direct access to the registers (with shadow disabled) so all the speedups and quirks we do now would work - and I am sure there would be a lot more with the new registers too! Yet, we would still have the memory usable for code. Best of both worlds!

I was just going to disagree with you about placing the registers at the cog $0... due more for consistency with prop 1 than anything. But thinking more, I realise at the beginning would be far better because...
* Usually initialisation code is first, so once executed it can be swapped out for the register space subsequently (if required).
* Placing at the end requires the code using shadow to be placed there. This will most likely result in bigger binaries because we have to fill (space out) the whole cog space.
* Default is memory (shadow enabled) which permits self-modified code
* Direct register access would mainly be used for specialised code where programmers understand the implications of shadow space and self-modifying code.

However, I suspect we have discovered this too late for Prop II version 1. Maybe version 2?

Phil Pilgrim (PhiPi) · 2011-06-02 16:19

I like the shadow idea, but 50 registers is a lot. I'd be more inclined to limit it to 16 and use a permanently shadowed register as a bank switch. The registers that Dave mentioned could be included in one bank, since they're the ones that would get the most accesses, making bank switching infrequent.

-Phil
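Phil's variant trades a bigger overlay for an occasional bank switch. A hypothetical model, assuming the bank-select register is one of the 16 window slots (the last one here) and using the figure of 54 SFRs given earlier in the thread, shows that four banks of 15 would cover them. `BankedWindow` and `SELECT_SLOT` are invented names.

```python
import math

WINDOW = 16               # shadowed window size suggested above
SELECT_SLOT = WINDOW - 1  # assumption: last slot is the always-visible bank select
MASK32 = 0xFFFF_FFFF


class BankedWindow:
    """Sketch of a 16-long SFR window with a permanently mapped bank-select register."""

    def __init__(self, num_sfrs: int):
        per_bank = WINDOW - 1                      # slots left for SFRs in each bank
        self.num_banks = math.ceil(num_sfrs / per_bank)
        self.banks = [[0] * per_bank for _ in range(self.num_banks)]
        self.select = 0

    def read(self, slot: int) -> int:
        if slot == SELECT_SLOT:
            return self.select
        return self.banks[self.select][slot]

    def write(self, slot: int, value: int) -> None:
        if slot == SELECT_SLOT:
            if not 0 <= value < self.num_banks:
                raise ValueError("no such bank")
            self.select = value
        else:
            self.banks[self.select][slot] = value & MASK32


w = BankedWindow(num_sfrs=54)
print(w.num_banks)          # 4
w.write(0, 11)              # bank 0, slot 0
w.write(SELECT_SLOT, 1)     # switch to bank 1
w.write(0, 22)              # same slot, different register
w.write(SELECT_SLOT, 0)
print(w.read(0))            # 11
```

With the most-used registers grouped in one bank, a real-time loop would rarely need to touch the select register.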

Baggers · 2011-06-03 03:36

I've got a quick question about the determinism of the Prop II when it uses dual threads, where each thread gets an instruction in turn: I take it that the hub ops throw each thread's hub slot out of order?

Rayman · 2011-06-03 03:43

I wonder how much trouble it would be to add in an onboard temperature sensor...

Dave Hein · 2011-06-03 06:24

It just seems like it would be easier to implement registers that are directly addressable than to have special instructions to access them. The address lines have to be decoded and individual registers are selected in either case. The shadow method would have fewer special cases than having special instructions, and it would allow for any operation to be performed on the registers. There would be no need for special SET, GET, ADD and SUB instructions. It would just use the existing mov, add and sub instructions.

I realize it is very late in the development phase to propose something like this, but it seems like it is just a matter of where the register address lines are extracted from the instruction. As long as the shadow RAM locations just execute code, there is really no reason to have to access them directly, and they could always be accessed by flipping the shadow-enable bit.

The only complication I see is if a RET is located in the shadow RAM. A special case may be needed for the JMPRET/CALL instruction so it writes the return address in the cog memory and not a register. Or, the user could just make sure that there are no RET instructions in the area of the shadow RAM.
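On the Prop 1, JMPRET (which CALL assembles to) stores PC+1 into the 9-bit S field of the RET long. The sketch below shows the suggested special case: the patch always goes to cog RAM, so a RET inside the shadow area still works. `ram` is a plain list standing in for cog memory, and the function names are hypothetical.

```python
S_MASK = 0x1FF        # 9-bit S field, bits 8..0
MASK32 = 0xFFFF_FFFF


def patch_return(ret_long: int, return_addr: int) -> int:
    """Replace the S field of `ret_long` with `return_addr`, as JMPRET does."""
    return (ret_long & ~S_MASK & MASK32) | (return_addr & S_MASK)


def jmpret_to_ram(ram: list[int], ret_addr: int, pc: int) -> None:
    """Proposed special case: always patch cog RAM, never a shadowed SFR."""
    ram[ret_addr] = patch_return(ram[ret_addr], pc + 1)


ram = [0] * 512
ram[500] = 0xABCD_E000   # a RET long living in the (hypothetical) shadow area
jmpret_to_ram(ram, ret_addr=500, pc=0x122)
print(hex(ram[500]))     # 0xabcde123
```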

Cluso99 · 2011-06-03 06:27

Ray: I would rather have an accurate, trimmable oscillator; then again, they seem to have an onboard temp sensor too.

BigFoot · 2011-06-03 06:33

Rayman, here is an inexpensive touch screen solution that runs off the I2C bus and is very responsive, even with our 7" display.

davidsaunders · 2011-06-03 06:40

Why not just have a set of instructions that use the alternate address space, that of the functional registers, thus allowing up to 512 functional registers? This would mean having fewer instructions for cog memory, though one could also add a series of ops that use the bits normally reserved for effects, at the cost of not allowing effects for that particular op group(s). If this were done, what would we lose? I see nothing lost, and if there are fewer than 512 functional registers, we could gain effective cog memory.

davidsaunders · 2011-06-03 06:41

Regarding my last post: we may want to reserve this thought for the Prop 3.

davidsaunders · 2011-06-03 06:45

I forgot:

The idea I mentioned above would not increase the logic of a cog, and may indeed reduce it. All the decoding is already there, and there would be fewer special cases to handle if done this way.
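The appeal of a separate register space is that the same 9-bit D/S fields could address up to 512 functional registers without displacing any cog RAM. A sketch of how a decoder might route operands, assuming a hypothetical group of opcodes reserved for the register space (the opcode values are placeholders, not tied to any real encoding):

```python
ALT_SPACE_OPCODES = {0b111110, 0b111111}   # placeholder opcode group for SFR access
FIELD_BITS = 9                             # width of the D and S fields


def route_operands(word: int) -> tuple[str, int, int]:
    """Return (address space, D address, S address) for an instruction long."""
    opcode = (word >> 26) & 0x3F
    space = "sfr" if opcode in ALT_SPACE_OPCODES else "cog_ram"
    d = (word >> 9) & 0x1FF
    s = word & 0x1FF
    return space, d, s


print(2 ** FIELD_BITS)                                  # 512
print(route_operands((0b111110 << 26) | (3 << 9) | 4))  # ('sfr', 3, 4)
print(route_operands((0b101000 << 26) | (3 << 9) | 4))  # ('cog_ram', 3, 4)
```

The field extraction is identical in both cases; only the opcode decides which space the addresses refer to, which is why the existing decoding could largely be reused.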
