The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

Rayman · 2014-04-11 15:50

Actually, that article gives me the thought...

Wouldn't it be nice if hubexec mode would also let you run routines that are in cog ram? Does it do that already?
Would that give you more predictive timing on critical timing routines, say like SDRAM access or something?

mindrobots · 2014-04-11 16:01

jazzed wrote: »

Right it's just running code "on the metal" as I suggested before. You can call it "Pink Floyd" if you like. I don't care. It doesn't matter, but I called it native. It's not important.

Pink isn't well, he stayed back at the hotel.
They sent us along as a surrogate band
We're gonna find out where you all really stand!

Heater. · 2014-04-11 16:04

I did not get it either.

A Spin program may well have some assembler in it. In a DAT section.

On a P1 this gets loaded into a COG and runs.

On a PII it may well never get loaded in a lump like that. It just gets run from where it is.

Makes no difference to OBEX style objects.

Heater. · 2014-04-11 16:12

Rayman,

I think executing from cog registers is the same thing as executing from L2 cache on a "normal" core...

Not according to anything I ever read.

Registers are registers. Memory is memory. There may well be one or more layers of cache.

For sure on a normal processor, cache or not, you cannot do:

mov    r0, #22
jmp    r0

On a Prop you can.
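To make the "registers are memory" point concrete, here is a toy Python model of a cog in which one 512-long array holds both code and data. The opcode names (`mov_imm`, `jmp_reg`, `halt`), the tuple encoding and the choice of address 10 for `r0` are all invented for illustration; real PASM encodes each instruction as a 32-bit long. The program stores an instruction into location 22 as plain data, loads 22 into `r0`, then jumps through `r0`.

```python
# Toy model only: opcode names and the tuple encoding are invented for
# illustration. Real PASM encodes each instruction as a 32-bit long.
COG_RAM_LONGS = 512


def run(ram, pc=0, max_steps=100):
    """Run instructions of the form (op, dst, src); return the PCs visited."""
    trace = []
    for _ in range(max_steps):
        trace.append(pc)
        op, dst, src = ram[pc]
        if op == "mov_imm":        # ram[dst] := src (src may even be an instruction)
            ram[dst] = src
            pc += 1
        elif op == "jmp_reg":      # jump to the address held in register ram[dst]
            pc = ram[dst]
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown op {op!r}")
    return trace


R0 = 10  # hypothetical register address; any cog RAM location would do
ram = [0] * COG_RAM_LONGS
ram[0] = ("mov_imm", 22, ("halt", 0, 0))  # write an instruction into location 22 as data
ram[1] = ("mov_imm", R0, 22)              # mov r0, #22
ram[2] = ("jmp_reg", R0, 0)               # jmp r0
trace = run(ram)
print(trace)  # [0, 1, 2, 22]
```

The same array index serves as a register operand in one instruction and as a jump target in the next, which is the property being argued about.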

Heater. · 2014-04-11 16:13

mindrobots,

Is there anybody in there?

Cluso99 · 2014-04-11 16:40

Rayman wrote: »

Actually, that article gives me the thought...

Wouldn't it be nice if hubexec mode would also let you run routines that are in cog ram? Does it do that already?
Would that give you more predictive timing on critical timing routines, say like SDRAM access or something?

Yes, you can JMP/CALL/RET between hub and cog at will. There are 4 sets of JMP/CALL/RET/PUSH/POP, where the return address is saved to $1EF, onto the PTRA stack, onto the PTRB stack, or onto the 4-level internal stack. There is also JMPSW, which is the same as the old P1 JMPRET (not sure if this works for hub though).
See the Instructions thread.
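A toy Python model of those four return-address destinations may help show how they differ. The class and method names are hypothetical, and so are several behaviours: the hub is modelled as a list of longs, PTRA/PTRB move one long per push, and the 4-level internal stack is assumed to silently drop its oldest entry when a fifth address is pushed. None of this is taken from Chip's spec.

```python
from collections import deque

LINK_REG = 0x1EF  # cog address where a LINK-style call saves its return address


class CallReturnModel:
    """Toy model of the four return-address destinations described above.

    Assumptions: hub RAM is a list of longs, PTRA/PTRB move by one long per
    push, and the 4-level internal stack drops its oldest entry on overflow.
    """

    def __init__(self, hub_longs=1024):
        self.cog = [0] * 512
        self.hub = [0] * hub_longs
        self.ptr = {"ptra": 0, "ptrb": hub_longs // 2}
        self.internal = deque(maxlen=4)

    def call(self, kind, return_addr):
        if kind == "link":
            self.cog[LINK_REG] = return_addr        # overwrites the previous link
        elif kind in self.ptr:
            self.hub[self.ptr[kind]] = return_addr  # push onto the hub stack
            self.ptr[kind] += 1
        elif kind == "internal":
            self.internal.append(return_addr)       # oldest lost if already 4 deep
        else:
            raise ValueError(f"unknown call kind {kind!r}")

    def ret(self, kind):
        if kind == "link":
            return self.cog[LINK_REG]
        if kind in self.ptr:
            self.ptr[kind] -= 1
            return self.hub[self.ptr[kind]]
        if kind == "internal":
            return self.internal.pop()
        raise ValueError(f"unknown return kind {kind!r}")


m = CallReturnModel()
for addr in (10, 20, 30, 40, 50):   # five nested calls, only four fit
    m.call("internal", addr)
print([m.ret("internal") for _ in range(4)])  # [50, 40, 30, 20]
```

A second LINK-style call overwrites $1EF, so nested calls of that kind need the address saved elsewhere first, while the PTRA/PTRB stacks nest as deep as hub RAM allows.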

evanh · 2014-04-11 16:47

Rayman wrote: »

Actually, that article gives me the thought...

Wouldn't it be nice if hubexec mode would also let you run routines that are in cog ram? Does it do that already?
Would that give you more predictive timing on critical timing routines, say like SDRAM access or something?

Unless the P1+ differs from the P2 implementation, hubexec is integral to the redesigned normal Cog operating mode, rather than being one of several processor modes that you switch between. So, by and large, "cogexec" and "hubexec" just refer to where the executing code happens to reside.

There are still timing/fetching issues with HubRAM that cause stalls when the executing code resides in HubRAM. So timing-critical code is still best kept in the Cog.
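Why hub-resident code is less predictable can be sketched with a deliberately crude timing model. Everything here is an assumption for illustration, not the real P1+ hub design: each cog instruction takes 2 clocks, the hub is a strict round-robin rotation over 16 cogs, each hub fetch costs 1 extra clock once the cog's window arrives, and there is no instruction cache.

```python
def run_time(n_instr, start_clock, from_hub, cog_id=0, n_cogs=16, exec_clocks=2):
    """Clocks to run n_instr straight-line instructions under a toy timing model.

    Assumptions (not from any spec): cog-resident instructions take exec_clocks
    each; a hub-resident instruction must first wait for this cog's turn in a
    strict round-robin hub rotation of n_cogs clocks, then spend 1 clock on the
    read. No instruction cache is modelled.
    """
    t = start_clock
    for _ in range(n_instr):
        if from_hub:
            t += (cog_id - t) % n_cogs  # stall until this cog's hub window
            t += 1                      # hypothetical 1-clock hub read
        t += exec_clocks
    return t - start_clock


cog_times = {run_time(10, s, from_hub=False) for s in range(16)}
hub_times = {run_time(10, s, from_hub=True) for s in range(16)}
print(sorted(cog_times))              # [20]
print(min(hub_times), max(hub_times))  # 147 162
```

In this model the cog-resident loop always takes 20 clocks, while the same loop fetched from the hub is both much slower and dependent on where in the hub rotation it happens to start, which is the kind of jitter that matters for something like SDRAM timing.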

jazzed · 2014-04-11 16:49

Hmm.

We're just two lost souls swimming in a fish bowl year after year.

Electrodude · 2014-04-11 16:53

Can there be a way to automatically wrquad ptra+=16 the tiny stack on overflow and rdquad ptra; ptra-=16 it on underflow, i.e. to automatically swap it into or out of the hub? It would be faster than using only a hub stack but would allow for a bigger stack than just the tiny stack.

If you can't do this, can you please at least make the tiny stack have 8 levels?
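Electrodude's spill/fill idea can be sketched in Python to show the behaviour being asked for. The class name is hypothetical, and the pointer ordering (post-increment by 16 bytes on spill, pre-decrement by 16 bytes before the refill) is an assumption chosen so that spill and fill are exact inverses; hub RAM is modelled as a byte-addressed dict of longs.

```python
class SpillingStack:
    """Sketch of a 4-entry in-cog stack that spills/fills one quad (4 longs,
    16 bytes) to hub RAM on overflow/underflow. Names are hypothetical."""

    QUAD_LONGS = 4
    QUAD_BYTES = 16

    def __init__(self, ptra=0x1000):
        self.base = ptra
        self.ptra = ptra
        self.hub = {}        # byte address -> 32-bit long
        self.regs = []       # the tiny in-cog stack, at most 4 entries
        self.hub_writes = 0  # quad writes performed (spills)
        self.hub_reads = 0   # quad reads performed (fills)

    def push(self, value):
        if len(self.regs) == self.QUAD_LONGS:   # overflow: "wrquad ptra+=16"
            for i, v in enumerate(self.regs):
                self.hub[self.ptra + 4 * i] = v
            self.ptra += self.QUAD_BYTES
            self.hub_writes += 1
            self.regs = []
        self.regs.append(value & 0xFFFFFFFF)

    def pop(self):
        if not self.regs:                       # underflow: refill from hub
            if self.ptra == self.base:
                raise IndexError("stack empty")
            self.ptra -= self.QUAD_BYTES
            self.regs = [self.hub[self.ptra + 4 * i] for i in range(self.QUAD_LONGS)]
            self.hub_reads += 1
        return self.regs.pop()


s = SpillingStack()
for v in range(10):
    s.push(v)
print([s.pop() for _ in range(10)])  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(s.hub_writes, s.hub_reads)     # 2 2
```

Ten pushes and pops cost only two quad writes and two quad reads, which is the appeal of the proposal compared with putting every return address straight into a hub stack.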

electrodude

evanh · 2014-04-11 17:02

Is it even a good idea to have the tiny stack at all? From what I've read compilers just won't use it. Doesn't that then mean that assembly coding can do fine through other means?

Cluso99 · 2014-04-11 17:11

evanh wrote: »

Is it even a good idea to have the tiny stack at all? From what I've read compilers just won't use it. Doesn't that then mean that assembly coding can do fine through other means?

Yes, I have wondered that too. But without it, we have to use hub (or the fixed $1EF, which means we then need to save it somewhere, i.e. make our own stack). But 4 deep is small.

Maybe we could have 1 hub stack (using PTRB) and one cog stack (using INDB) or a deeper LIFO ???

David Betz · 2014-04-11 17:28

evanh wrote: »

Is it even a good idea to have the tiny stack at all? From what I've read compilers just won't use it. Doesn't that then mean that assembly coding can do fine through other means?

Is there a tiny stack in the P1+? I hadn't noticed. You're right that the compiler is unlikely to use this for anything other than COG helper functions. I think PASM programmers liked the idea of it though, and I like the idea that self-modifying code isn't required.

David Betz · 2014-04-11 17:32

Rayman wrote: »

Actually, that article gives me the thought...

Wouldn't it be nice if hubexec mode would also let you run routines that are in cog ram? Does it do that already?
Would that give you more predictive timing on critical timing routines, say like SDRAM access or something?

I had assumed that was already possible.

Cluso99 · 2014-04-11 17:42

David Betz wrote: »

Is there a tiny stack in the P1+? I hadn't noticed. You're right that the compiler is unlikely to use this for anything other than COG helper functions. I think PASM programmers liked the idea of it though, and I like the idea that self-modifying code isn't required.

It's in the instruction spec (unless Chip has forgotten to remove it) as being 4 levels deep. It seems like the P2 version we had for tasks.
Unfortunately, 4 deep is a bit short to be of much use.

rjo__ · 2014-04-11 18:23

I am constantly amazed by how little information is necessary to give you guys a complete understanding... It is just f***ing astounding.
The idea that ozpropdev does what he does on a regular basis is a perfect example.

I think most of the people lurking have no idea what you guys are talking about. I have no clue how the next chip is going to operate... the instructions are fine, I get them very easily... after that it is less than a blur.

How about throwing some diagrams into your arguments?

rjo__ · 2014-04-11 18:25

By the way, I like the idea of calling the default (first user experience) mode... native. It gives the natural impression that other things are possible.

Phil Pilgrim (PhiPi) · 2014-04-11 18:48

Well, this thread has certainly devolved into mediocrity! I stay away for a couple days, and arguments that used to be centered on substance now revolve around semantics. Where's the passion, guys?

Seriously, it's probably a good sign that actual decisions are being made. Decisions presage progress; progress presages silicon. (I'm not holding my breath, though.)

-Phil

mindrobots · 2014-04-11 18:48

rjo__ wrote: »

By the way, I like the idea of calling the default (first user experience) mode... native. It gives the natural impression that other things are possible.

When I first used "Native" last night, it was in parentheses and mostly as a placeholder between CMM and XMM. The comment afterwards said "a big flat address space like everyone else".

I'm REALLY glad now I didn't put "commando"...I was considering it last night as a joke.

Cluso99 · 2014-04-11 18:49

Chip,

Are these registers correct?

Should the first INDA/INDB be PTRA/PTRB? (Edit: they are INA and INB, the port inputs. Thanks, Seairth.)

Are you implementing INDA/INDB ?

addr        read        write        name        background
--------------------------------------------------------------------------
000..1EE    RAM        RAM           -           -
1EF         RAM        RAM           (used by LINK to save return address)
                                               
1F0         CNT        -             CNT         DCACHE0
1F1         RND        -             RND         DCACHE1
1F2         INA        -             INA         DCACHE2
1F3         INB        -             INB         DCACHE3
1F4         RAM        RAM+OUTA      OUTA        -
1F5         RAM        RAM+OUTB      OUTB        -
1F6         RAM        RAM+DIRA      DIRA        -
1F7         RAM        RAM+DIRB      DIRB        -
1F8         RAM        RAM+CTRA      CTRA        -
1F9         RAM        RAM+CTRB      CTRB        -
1FA         RAM        RAM+FRQA      FRQA        -
1FB         RAM        RAM+FRQB      FRQB        -
1FC         PHSA       PHSA          PHSA        ICACHE0
1FD         PHSB       PHSB          PHSB        ICACHE1
1FE         indirect   indirect      INDA        ICACHE2
1FF         indirect   indirect      INDB        ICACHE3
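One way to read the "read" and "write" columns of this draft map is sketched below for $1F0-$1FB. It is only an interpretation, with hypothetical names: read-only registers (CNT, RND, INA, INB) are assumed to ignore writes, and a "RAM+OUTA"-style register is assumed to store each write in cog RAM and also forward it to the hardware register, with reads returning the RAM copy. PHSx, INDx and the background cache roles are left out.

```python
class CogRegisterFile:
    """Toy reading of the draft $1F0-$1FB map above (PHSx/INDx omitted).

    Assumptions: read-only registers ignore writes; "RAM+OUTA"-style
    registers store every write in cog RAM and also forward it to the
    hardware, while reads return the RAM copy.
    """

    READ_ONLY = {0x1F0: "CNT", 0x1F1: "RND", 0x1F2: "INA", 0x1F3: "INB"}
    SHADOWED = {0x1F4: "OUTA", 0x1F5: "OUTB", 0x1F6: "DIRA", 0x1F7: "DIRB",
                0x1F8: "CTRA", 0x1F9: "CTRB", 0x1FA: "FRQA", 0x1FB: "FRQB"}

    def __init__(self):
        self.ram = [0] * 512
        names = list(self.READ_ONLY.values()) + list(self.SHADOWED.values())
        self.hw = {name: 0 for name in names}   # stand-in for the real hardware

    def read(self, addr):
        if addr in self.READ_ONLY:
            return self.hw[self.READ_ONLY[addr]]  # live value, e.g. pin states
        return self.ram[addr]                     # plain or shadowed RAM

    def write(self, addr, value):
        value &= 0xFFFFFFFF
        if addr in self.READ_ONLY:
            return                                # "-" in the write column
        self.ram[addr] = value                    # the "RAM" half of RAM+OUTA
        if addr in self.SHADOWED:
            self.hw[self.SHADOWED[addr]] = value  # the "OUTA" half


regs = CogRegisterFile()
regs.write(0x1F4, 0x000000FF)  # like writing OUTA
regs.hw["INA"] = 0b101         # pretend pins 0 and 2 are high
regs.write(0x1F2, 123)         # ignored: INA is read-only here
print(hex(regs.read(0x1F4)), regs.hw["OUTA"], regs.read(0x1F2))  # 0xff 255 5
```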

Cluso99 · 2014-04-11 18:54

Phil and kuroneko,

Do we need 2 sets of counters in each cog?
Would one set suffice?
I presume some modes could be removed because of the new Smart I/O features? If so, what?

Phil Pilgrim (PhiPi) · 2014-04-11 19:11

Cluso wrote:

Phil and kuroneko,

Do we need 2 sets of counters in each cog?
Would one set suffice?

I was rather hoping for three. My signal front-ends-plus-I/Q-mixers use three counters (actually five, including the local oscillators), and I've had to start an extra cog in the P1, just to accommodate the third one.

A single counter per cog is useless for the kind of stuff I do. But, again, if more can't be accommodated, I'm content to keep using the P1. It'll be lower-power and less expensive anyway. At some point, it would be nice to see a multi-core chip from somebody that's optimized for RF apps, without having to resort to FPGAs.

-Phil

Ramon · 2014-04-11 19:11

Phil Pilgrim (PhiPi) wrote: »

Well, this thread has certainly devolved into mediocrity!

It looks like we cannot moderate ourselves in this forum. Would it be possible to close all threads in the P2 forum for a while, or to limit the number of posts per day (say, one post per person per day)? I think that right now we are a huge waste of Chip's time.

Do we want to do the same again? The best microcontroller never made.

RossH · 2014-04-11 19:12

jazzed wrote: »

Ya Heater that's it.

Ross, I don't understand what you mean. OBEX is comprised of Spin code. How is it relevant?

I'm assuming there will be an OBEX equivalent for high level languages other than Spin. Then it will become quite important to know which objects use COG mode and which use HUB mode.

Ross.

Seairth · 2014-04-11 19:17

Cluso99 wrote: »

Chip,

Are these registers correct?

Should the first INDA/INDB be PTRA/PTRB ?
Are you implementing INDA/INDB ?

addr        read        write        name        background
--------------------------------------------------------------------------
000..1EE    RAM        RAM           -           -
1EF         RAM        RAM           (used by LINK to save return address)
                                               
1F0         CNT        -             CNT         DCACHE0
1F1         RND        -             RND         DCACHE1
1F2         INA        -             INA PTRA?   DCACHE2
1F3         INB        -             INB PTRB?   DCACHE3
1F4         RAM        RAM+OUTA      OUTA        -
1F5         RAM        RAM+OUTB      OUTB        -
1F6         RAM        RAM+DIRA      DIRA        -
1F7         RAM        RAM+DIRB      DIRB        -
1F8         RAM        RAM+CTRA      CTRA        -
1F9         RAM        RAM+CTRB      CTRB        -
1FA         RAM        RAM+FRQA      FRQA        -
1FB         RAM        RAM+FRQB      FRQB        -
1FC         PHSA       PHSA          PHSA        ICACHE0
1FD         PHSB       PHSB          PHSB        ICACHE1
1FE         indirect   indirect      INDA        ICACHE2
1FF         indirect   indirect      INDB        ICACHE3

That's "IN", not "IND".

rjo__ · 2014-04-11 19:22

I agree that from time to time things get messy around here. BUT there is plenty of information flying around. I wouldn't be coming here if I thought it was just wasted conversation.

I don't think Chip is paying much attention at all. I think he sees a few words, knows what the rest of the conversation is going to be and then goes back to work.

RossH · 2014-04-11 19:26

Ramon wrote: »

It looks like we cannot moderate ourselves in this forum. Would it be possible to close all threads in the P2 forum for a while, or to limit the number of posts per day (say, one post per person per day)? I think that right now we are a huge waste of Chip's time.

Do we want to do the same again? The best microcontroller never made.

The only time I really get worried is when I see Chip actually posting a lot. That tells us he's not busy concentrating on the P16X32, or perhaps he's got himself a knotty problem he just can't solve.

Normally, I think he just drops by occasionally to poke a stick in the anthill, to see if anything interesting crawls out.

Ross.

rjo__ · 2014-04-11 19:27

Phil,

I respect your work. And if you say that there is something in the architecture that will keep you from being able to use it for your RF work... that is important to me.
I know of all kinds of university-level research that depends on RF modulation. This means that there are academic markets that this chip cannot penetrate.
It also means that there is research that won't be done because budgets won't allow it.

So, it isn't just you.

I don't understand the problem, but if you say it is there, that is good enough for me. I wish you would be a little more verbose. I would like to understand it more, but I also understand that you are not exactly happy right now:)

Rich

jmg · 2014-04-11 20:10

Phil Pilgrim (PhiPi) wrote: »

I was rather hoping for three. My signal front-ends-plus-I/Q-mixers use three counters (actually five, including the local oscillators), and I've had to start an extra cog in the P1, just to accommodate the third one.

A single counter per cog is useless for the kind of stuff I do. But, again, if more can't be accommodated, I'm content to keep using the P1. It'll be lower-power and less expensive anyway. At some point, it would be nice to see a multi-core chip from somebody that's optimized for RF apps, without having to resort to FPGAs.

My understanding was that Chip was looking at removing the counters from the COGs entirely and using the pin-cell counters instead.
( certainly not adding more COG counters )

Doing NCO at the pins needs a 32-bit adder, a mux and a carry-out, and that does increase the size of the pin cell and slow it down.
I tried this and got varying impacts depending on which Lattice device I targeted.
The 65nm ECP3 FPGA (~same as Cyclone IV?) seems to have clock enables and ripple logic, so adding the NCO+carry mode has less impact there than on older FPGA/CPLD choices. With a pipelined mux, the ECP3 P&R reported over 220MHz.

So this may come down to Speed and Die area, with perhaps some backward compatibility questions, and may even depend on the On-Semi tools and how they think.

I would think the 200MHz target is important to try to meet, especially as the cores now only need to hit 100MHz.

If the existing COG counters are small enough, it would seem sensible to keep them for backward-compatibility reasons, if nothing else.
If that is done, the pin cells can be a little smaller and faster, without duplicating the NCO feature.
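For readers wondering what the "32-bit adder and carry-out" buys, here is a minimal Python sketch of an NCO phase accumulator as the P1 counters use it: each clock PHS += FRQ, so the output frequency is f_clk * FRQ / 2^32. The function names are hypothetical, and taking the output from PHS[31] matches P1 NCO mode; nothing here describes how a P1+ pin cell would actually implement it.

```python
def nco(frq, clocks, phs=0):
    """Toy 32-bit NCO: PHS += FRQ every clock.

    Returns (final PHS, number of carry-outs from bit 31, list of PHS[31]
    samples). Each carry-out marks one complete output cycle.
    """
    carries = 0
    msb = []
    for _ in range(clocks):
        total = phs + frq
        carries += total >> 32        # 1 if the 32-bit add overflowed
        phs = total & 0xFFFFFFFF      # keep the 32-bit accumulator
        msb.append(phs >> 31)         # square-wave output taken from bit 31
    return phs, carries, msb


def frq_for(f_out_hz, f_clk_hz):
    """FRQ value giving f_out = f_clk * FRQ / 2**32 (rounded)."""
    return round(f_out_hz * 2**32 / f_clk_hz)


phs, carries, msb = nco(frq=2**30, clocks=16)  # FRQ = 2^32 / 4 -> f_clk / 4
print(phs, carries, msb[:4])                    # 0 4 [0, 1, 1, 0]
print(frq_for(1_000_000, 200_000_000))          # 21474836
```

The carry-out count equals the number of output cycles, which is why the adder's carry is the natural signal to route to a pin.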

Phil Pilgrim (PhiPi) · 2014-04-11 20:13

rjo__ wrote:

I don't think Chip is paying much attention at all.

That would be my fondest hope and a good sign that real progress is being made.

Phil ... but I also understand that you are not exactly happy right now

I can't honestly say that I'm unhappy. I'll use -- or not -- whatever actual silicon results from this very flawed and much too-open dev process. And if I don't, I can make the P1 jump through whatever hoops I need it to.

But I can't help grimacing occasionally at the tragedy unfolding before our eyes that seems, at times, to be a subconscious effort not to have a real end in sight -- that somehow the process of getting there trumps whatever "there" is. While I can throw my total respect behind such a posture for things like hiking, kayaking, sailing, etc., in this context, it can only have a tragic end. And I shudder thinking about it, because Parallax has held such a pivotal importance in my life, in the friends I've made, and in my business.

-Phil

rjo__ · 2014-04-11 20:17

My understanding is that in order to get the numbers, Chip had to move some of the logic outward to the pins for a variety of reasons including heat dissipation. I think we are firm on the number of Cogs... and there might be no way to get Phil to where he wants to be without moving even more toward the pins. I think the functionality is the issue... and I see Phil as the canary in the coal mine right now. I am sure Chip is looking very, very carefully at what Phil is saying.
