The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

kwinn · 2015-06-25 06:51

ctwardell wrote: »

I'm not sure what is more of a surprise, the lack of news or a three word post from Doug...

;-)

C.W.

LOL, that was quite a surprise ;-)

Cluso99 · 2015-07-01 07:27

There is a brief update from Chip on the other major/recent P2 thread.

Dave Hein · 2015-07-07 19:24

P2 Watch: 16 days since the end of Spring.

        June                  July
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
   -- -- -- -- -- --            1  2  3  4
-- -- -- -- -- -- --   5  6  7 -- -- -- --
-- -- -- -- -- -- --  -- -- -- -- -- -- --
21 22 23 24 25 26 27  -- -- -- -- -- -- --
28 29 30              -- -- -- -- -- --

lardom · 2015-07-07 21:59

The ancient Babylonians' number system was base 60. (Imagine their BSCII code!) Sixty is a convenient number, since it has so many prime factors (2,2,3,5) and can be subdivided neatly in so many ways. 60, divided by its largest prime factor, 5, is 12. Hence, dozens, twelve-hour demi-days, 12 months in a year, 12 points to a pica (a printer's ruler is divided into 1/12ths of an inch, not 1/16ths), etc. Moreover, 60 seconds to a minute and 60 minutes to an hour (or degree) are all derived from this handy multiplicity of primes. Had we evolved with 12 fingers (or maybe 16 for the digital age?) instead of 10, math would be so much easier!

-Phil

Interesting. So we use base 60 and base 10. Never asked myself why.

Heater. · 2015-07-07 22:53

Well it's all very confusing. Over here in the west we were very happy with the Roman system. Which is base 10 of course.
When the Indians put a zero into the mix, along with their decimal number system, nobody liked it. (Turns out the zero was not an Arabic invention)
Also turns out that base 10 is not "natural" for humans because of the number of their fingers. Different cultures around the world have used different number bases. And still do.
Sorry I don't have any links to hand to back up my assertions.

MJB · 2015-07-08 00:12

Also turns out that base 10 is not "natural" for humans because of the number of their fingers. Different cultures around the world have used different number bases. And still do.
Sorry I don't have any links to hand to back up my assertions.

When I first visited an Arabic market in Cairo, I realized why 12 is a natural base.
Just look at your hand.
Hold out all the fingers.
Now take the thumb as a pointer.
Point it at the base segment of the small finger = 1
then the middle segment of the small finger = 2
then the front segment of the small finger = 3
...
you can imagine by now that we go on with the other 3 fingers the same
and count up to 12.

the woman at the market, where I was buying several things, was so fast using her
single hand as a calculator to sum up the different prices I could not do this as fast on paper ...

so --- 12 is a natural number ;-)
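The counting scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name `thumb_position` is made up, and the order of the remaining fingers (ring, middle, index) is an assumption, since the post only spells out the small finger.

```python
# Finger order after the small finger is an assumption; the post only
# describes the small finger explicitly.
FINGERS = ["small", "ring", "middle", "index"]
SEGMENTS = ["base", "middle", "front"]

def thumb_position(count):
    """Map a count from 1..12 to the (finger, segment) the thumb points at."""
    if not 1 <= count <= 12:
        raise ValueError("one hand counts 1 through 12")
    finger, segment = divmod(count - 1, 3)  # 3 segments per finger
    return FINGERS[finger], SEGMENTS[segment]

for n in range(1, 13):
    finger, segment = thumb_position(n)
    print(f"{n:2d}: {segment} segment of the {finger} finger")
```

Four fingers with three segments each give 4 x 3 = 12 positions, which is where the base comes from.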

Leon · 2015-07-08 00:27

Also, 12 has more divisors than 10, which can make it more useful.
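As a quick check of the divisor claim, here is a minimal Python sketch that lists the divisors of the bases mentioned in this thread (10, 12, 16 and 60). The helper name `divisors` is just for illustration.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

for base in (10, 12, 16, 60):
    ds = divisors(base)
    print(f"base {base:2d}: {len(ds):2d} divisors -> {ds}")
```

Base 10 has 4 divisors (1, 2, 5, 10), base 12 has 6 (1, 2, 3, 4, 6, 12), and base 60 has 12, which is why it splits evenly into halves, thirds, quarters, fifths and sixths.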

Dave Hein · 2015-07-13 18:19

P2 Watch: 22 days since the end of Spring.

        June                  July
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
   -- -- -- -- -- --            1  2  3  4
-- -- -- -- -- -- --   5  6  7  8  9 10 11
-- -- -- -- -- -- --  12 13 -- -- -- -- --
21 22 23 24 25 26 27  -- -- -- -- -- -- --
28 29 30              -- -- -- -- -- --

cgracey · 2015-07-14 09:49

Update:
I've got all the hub exec stuff working, including launching cogs straight into hub memory without loading them.
I had a flight of fancy and thought maybe I could add a timer interrupt - just something simple where you specify a number of clocks and a CALL address. Then, I started looking at all the stuff that wouldn't work off the bat, like the AUGS/AUGD instructions and the ALTDS instruction, which modifies the next instruction's D and S fields. I started remembering Prop2-Hot and went and looked at the docs for it. It all started coming back to me. Better to let the user make a simple task switcher than to embark on a hardware solution this time.
I've been optimizing the design and I've got it sped up 10%, so far, to 110MHz on the Cyclone IV. The K-factor correction in the hub CORDIC is disabled for now, since that had become the critical path at 102MHz. I might redo the K correction into additional stages, instead of having it integrated into the CORDIC steps. For now, I want to make sure the cogs are not suffering from critical paths that stick out like sore thumbs. There are a few I see now that I'll be working on.
Anyway, the cog and hub functionality is complete. I just need to do a little more optimization on the cog and then make a smart pin that covers the common bases, so that we can put it onto every pin.
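For readers unfamiliar with the K factor mentioned above: each CORDIC rotation step stretches the vector slightly, and the accumulated gain K (about 1.6468) has to be divided out somewhere. The following is a floating-point Python sketch of the general technique with the correction done as a separate final step rather than folded into the iterations. It is not the P2's hub CORDIC; the stage count and all names here are assumptions chosen for illustration.

```python
import math

ITERATIONS = 24  # assumed stage count for illustration, not the P2's

# Rotation angle contributed by each stage: atan(2^-i)
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Accumulated gain of the uncorrected stages: product of sqrt(1 + 2^-2i)
K = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))

def cordic_rotate_raw(x, y, angle):
    """Rotate (x, y) by angle in radians (|angle| below about 1.74),
    leaving the result scaled up by K."""
    z = angle
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0
        # Shift-and-add step; the right-hand side uses the old x and y.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x, y

def cordic_rotate(x, y, angle):
    """Same rotation, with the 1/K scaling applied as a separate final step."""
    xr, yr = cordic_rotate_raw(x, y, angle)
    return xr / K, yr / K

print("K =", K)
print("raw      :", cordic_rotate_raw(1.0, 0.0, math.pi / 6))
print("corrected:", cordic_rotate(1.0, 0.0, math.pi / 6))
```

Keeping the scaling outside the loop means the iteration stages stay pure shift-and-add, which is the kind of separation the post is considering for timing reasons.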

cgracey · 2015-07-14 09:58

Here is a slack histogram of the current design.
I set the target frequency to 160MHz, which is impossibly high, so the sub-160MHz paths are shown in red. It's the shape that really matters, though. Ideally, you'd like to have all those left-most paths pushed to the right so that you get a nice vertical wall. That's what I'm working on now.
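To make the slack numbers concrete: slack is the target clock period minus a path's delay, so a 160MHz target gives every path 6.25 ns to complete, and anything slower shows up with negative slack. Below is a small Python sketch of that arithmetic. The path delays are made up for illustration (the slowest is chosen to match roughly 110MHz); real values come from the FPGA tool's timing report.

```python
TARGET_MHZ = 160.0  # target from the post; deliberately too high

def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz

def slack_ns(path_delay_ns, target_mhz=TARGET_MHZ):
    """Positive slack: the path meets the target. Negative: it fails."""
    return period_ns(target_mhz) - path_delay_ns

# Hypothetical register-to-register path delays in ns (made up for illustration).
path_delays_ns = [9.09, 8.40, 7.75, 6.90, 6.10, 5.20]

for delay in path_delays_ns:
    s = slack_ns(delay)
    status = "fails" if s < 0 else "meets"
    print(f"delay {delay:5.2f} ns  slack {s:+6.2f} ns  {status} {TARGET_MHZ:.0f} MHz")

# The slowest path sets the achievable clock.
fmax_mhz = 1000.0 / max(path_delays_ns)
print(f"fmax ~ {fmax_mhz:.1f} MHz")
```

Shortening only the few slowest paths raises fmax, which is why the shape of the histogram's left edge matters more than the absolute target.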

jmg · 2015-07-14 10:05

Update:
I've got all the hub exec stuff working, including launching cogs straight into hub memory without loading them.

Sounds good.

Anyway, the cog and hub functionality is complete. I just need to do a little more optimization on the cog and then make a smart pin that covers the common bases, so that we can put it onto every pin.

Does that mean the Opcodes are now defined/stable enough, to develop other tools ?

cgracey · 2015-07-14 10:24

jmg, yes. I will post the opcodes tomorrow. I still need to document the streamer, as it's not apparent from the instruction set how it works. Most instructions should make immediate sense, though.

jmg · 2015-07-14 10:26

I had a flight of fancy and thought maybe I could add a timer interrupt - just something simple where you specify a number of clocks and a CALL address. Then, I started looking at all the stuff that wouldn't work off the bat, like the AUGS/AUGD instructions and the ALTDS instruction, which modifies the next instruction's D and S fields.
I'm not sure I follow. If the ALTDS derives the next instruction's Address from its own, and does not use any cycle borrowing tricks, it should be self-contained and complete before the PC branches, and then the RET carries on at following opcode, with the previously changed D & S fields.
Flag saving is needed, but the pgm flow seems able to tolerate a timer branch ?

David Betz · 2015-07-14 10:28

Hi Chip,

This is great news! Any chance you could post a preliminary opcode list?

Thanks,
David

potatohead · 2015-07-14 10:30

Way to go Chip!

Oh, so many questions...

I'll wait.

ozpropdev · 2015-07-14 10:37

Exciting news Chip!
Looking forward to diving into testing again.

David Betz · 2015-07-14 10:38

Way to go Chip!

Oh, so many questions...

I'll wait.

I only asked about the opcode list because I think it is automatically generated so I figured it would be essentially free for him to post. I will also promise not to make any suggestions for changes! :-)

potatohead · 2015-07-14 10:40

No worries! It wasn't directed at you David. He's gonna post ops. Don't get, unless you ask, right?

I think we all are gonna have to think long and hard about that. Good news is it's mostly there and we will recognize most of it too.

Baggers · 2015-07-14 10:41

Excellent news Chip

No need for interrupts. P1 did well without interrupts, and, well, the P2 has 16 cogs, and up to 4 threads on each, so with a chip with essentially 64 threads, we don't need interrupts!

jmg · 2015-07-14 10:45

... the P2 has 16 cogs, and up to 4 threads on each, so with a chip with essentially 64 threads, we don't need interrupts!

I'm not sure the 4 threads made the cut this time ? (pity).
A HW branch system may be simpler than 4 threads.

Baggers · 2015-07-14 10:47

... the P2 has 16 cogs, and up to 4 threads on each, so with a chip with essentially 64 threads, we don't need interrupts!

I'm not sure the 4 threads made the cut this time ? (pity).
A HW branch system may be simpler than 4 threads.

Oh well, still 16 cogs should be more than enough

David Betz · 2015-07-14 10:58

Excellent news Chip

No need for interrupts. P1 did well without interrupts, and, well, the P2 has 16 cogs, and up to 4 threads on each, so with a chip with essentially 64 threads, we don't need interrupts!

Notice I'm now asking for an opcode list even though I previously suggested it would be a good idea to wait for silicon to do much tools work. I suppose an FPGA image will be too tempting to resist! Now, should I order one of those 1-2-3 boards? :-)

potatohead · 2015-07-14 11:07

...depends on when he's ready to post up an image. Hopefully, that answer is yes David.

evanh · 2015-07-14 11:27

cgracey said:
Update:
I've got all the hub exec stuff working, including launching cogs straight into hub memory without loading them.

YES!!! Good going there Chip. I didn't think HubExec was going to be that quick a solution. /me a happy dude.

Cluso99 · 2015-07-14 12:53

Fantastic news Chip.

Interesting being able to launch a cog directly into hubexec without loading cog ram. Makes a cog quicker to start.

Looking forward to seeing the instruction set.

Dave Hein · 2015-07-14 12:55

Oh boy, P2 day is almost here! I'm happy to wait for the op code list until the FPGA image and assembler are ready. If Chip releases bits and pieces early it will just slow down progress on the FPGA image because he will be bombarded with questions and change requests.

David Betz · 2015-07-14 12:58

Oh boy, P2 day is almost here! I'm happy to wait for the op code list until the FPGA image and assembler are ready. If Chip releases bits and pieces early it will just slow down progress on the FPGA image because he will be bombarded with questions and change requests.

I think we need to all agree not to ask for changes. I would like to have the instruction set early to start thinking about modifying PropGCC.

cgracey · 2015-07-14 21:18

Here is the instruction set.
This architecture does NOT have 4 tasks per cog, only one. The hardware to facilitate that would be too much for this 180nm design.
Having at least some simple mechanism to cause periodic, automatic CALLs would be good, though. I'm still ruminating over this timer interrupt idea. It's too expensive to add a bunch of hardware to allow breaks at any time, however, if I qualify the interrupt request with no-AUGS/D-in-progress and no-ALTDS-in-progress, it might have very minimal cost. I'm still trying to make sure this isn't just delusional. This would make things like background serial ports possible without much coding.
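One way to picture the qualification idea is a toy model in Python: a pending interrupt is only taken at an instruction boundary where the instruction that just finished is not AUGS, AUGD or ALTDS, so the instruction they modify always runs first. This is purely an illustrative sketch, not the actual hardware logic; the function name, the list-of-mnemonics program, and the boundary numbering are all assumptions.

```python
# Instructions that affect the instruction immediately following them.
PREFIX_OPS = {"AUGS", "AUGD", "ALTDS"}

def interrupt_point(program, request_at):
    """Return the first boundary at or after request_at where the CALL may
    be taken. Boundary i means 'just before program[i] executes'. The
    request is held off while the previous instruction is a prefix op."""
    i = request_at
    while 0 < i < len(program) and program[i - 1] in PREFIX_OPS:
        i += 1  # let the modified instruction run before interrupting
    return i

program = ["MOV", "AUGS", "ADD", "AUGD", "AUGS", "MOV", "JMP"]

for request in range(len(program)):
    taken = interrupt_point(program, request)
    print(f"request at {request}: taken at {taken} (delay {taken - request})")
```

In this model the delay between request and CALL depends on the instruction sequence, which is the source of the small latency variation such a scheme would have.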

Baggers · 2015-07-14 21:30

Here is the instruction set.
This architecture does NOT have 4 tasks per cog, only one. The hardware to facilitate that would be too much for this 180nm design.
Having at least some simple mechanism to cause periodic, automatic CALLs would be good, though. I'm still ruminating over this timer interrupt idea. It's too expensive to add a bunch of hardware to allow breaks at any time, however, if I qualify the interrupt request with no-AUGS/D-in-progress and no-ALTDS-in-progress, it might have very minimal cost. I'm still trying to make sure this isn't just delusional. This would make things like background serial ports possible without much coding.

Cheers for the instruction set Chip

As for one task per cog, that's perfectly fine, best keeping the real estate small, but having to not use AUGS/D or ALTDS when using interrupts is totally fine with me!

jmg · 2015-07-14 21:31

Having at least some simple mechanism to cause periodic, automatic CALLs would be good, though. I'm still ruminating over this timer interrupt idea. It's too expensive to add a bunch of hardware to allow breaks at any time, however, if I qualify the interrupt request with no-AUGS/D-in-progress and no-ALTDS-in-progress, it might have very minimal cost. I'm still trying to make sure this isn't just delusional. This would make things like background serial ports possible without much coding.

Sounds a good idea.
If you do need to pair-opcodes to manage this, I'd suggest making the travel time the same in all cases, to remove any opcode-dependent jitter - that may mean a dummy delay in some opcodes.
Such a fixed delay may even make the coding simpler ?
