The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

potatohead · 2015-07-22 14:48

We got interrupts because Chip was working toward an image and found he was going to require more COGS than he did on "hot" to do the ROM programs, encryption, monitor, etc...

The idea of a timer to replace the tasker resolved that issue.

Once that was done, several of us, myself included, realized there are some use cases to discuss. If we are to add interrupts, they need to be great interrupts, and here we are.

That use case discussion has almost played out. Very soon, if not now, Chip is back at building the FPGA image.

It's not really a delay as much as it is a rational assessment of the design and a need identified and resolved at this point.

The last comments on the design were about sorting out the timing-critical paths, for which he had disabled the math function for a bit to clean up the COG. And at this moment, "getting the COG scrubbed clean" is the goal.

Really, we can look at it as a delay to AN image. Maybe a week? Or then again, given the ROM is harder, maybe not, right? This feature makes the port from "hot" much easier, and it enables the COGS to actually make great use of the egg beater style HUB, FIFO, etc... he's got in there right now too.

The image we get now is a much more potent thing, and it took what? A week or so? No brainer really.

Given the coding needed for the ROM, this may have been free. Seriously.

David Betz · 2015-07-22 15:06

Good to hear. Let's hope no more suggested enhancements sneak in before an FPGA image.

David Betz · 2015-07-22 15:08

Has Chip confirmed that an image will be made available for the BeMicro CV A9?

Bill Henning · 2015-07-22 15:16

I think that could be collapsed to:

LOCKSET D/#n {WC} {WZ}
LOCKCLR D/#n {WC} {WZ}

where:

n = 0..15 (or 0..31)

WC = write current lock state to C

WZ = cause a lock interrupt when the lock is acquired / released

As usual, LOCKSET would block until the specified lock is available.

The fact that hub reads/writes are atomic is not sufficient for my uses, as I often deal with large buffers, and hardware locks are much more efficient for my purposes.
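Why single-word atomicity does not help with large buffers can be sketched in a few lines. This is a hypothetical Python model, not P1 or P2 code: the generator `writer` stands in for a cog that updates a multi-word buffer one word at a time, and the `lock` dictionary is an assumed stand-in for a hardware lock bit.

```python
# Hypothetical model: each single-word write is atomic, the whole-buffer update is not.
def writer(buf, value, lock):
    """Fill buf with value one word at a time, yielding between words."""
    lock["held"] = True            # stands in for a successful LOCKSET
    for i in range(len(buf)):
        buf[i] = value             # one atomic word write
        yield                      # another cog may run here
    lock["held"] = False           # stands in for LOCKCLR

def read_snapshot(buf, lock, respect_lock):
    """Return a copy of buf, or None if the lock is respected and currently held."""
    if respect_lock and lock["held"]:
        return None                # reader backs off and would retry later
    return list(buf)

buf = [0, 0, 0, 0]
lock = {"held": False}
w = writer(buf, 7, lock)
next(w)
next(w)                            # writer has written two of four words

torn = read_snapshot(buf, lock, respect_lock=False)    # half old, half new
guarded = read_snapshot(buf, lock, respect_lock=True)  # refused while locked

for _ in w:                        # let the writer finish
    pass
final = read_snapshot(buf, lock, respect_lock=True)
print(torn, guarded, final)
```

The reader that ignores the lock sees a buffer that is half updated, even though no individual word was ever corrupted. The reader that respects it either gets nothing or gets a consistent buffer.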

I only added WZ to cause interrupts, as it may be useful to have lock set/clear cause interrupts.

I need the ability to specify the lock to be used; however, others may want a LOCKNEW / LOCKRET for allocating/freeing locks.

3) KEEP LOCKS

I could not have synchronized the five cogs and access to external memory on the P1 for my bitmapped high resolution drivers without locks.

With P2, I expect that we will have SDRAM to play with, and managing it properly will require locks.

16 locks is pretty good, however I'd prefer 32.

Out of curiosity, do you use LOCKNEW/LOCKRET, or just LOCKSET/LOCKCLR? Put another way, would you still be able to do what you want with:
LOCKSET (existing instruction, no change)
LOCKCLR (existing instruction, no change)
LOCKGET (new, writes current lock state of a single lock to C)
LOCKTOG (new, toggles a lock and optionally writes old value to C)
With the interrupt mechanism, you would also have:
SETEDGE (to enable an interrupt on a lock transition)
GETEDGE (to wait/block on a lock transition)

potatohead · 2015-07-22 15:21

I've not seen or heard anything about FPGA support.

IMHO, "what FPGA?" is gonna come up real soon now. I did ask about getting an image before Smart Pins get actualized (I think that exercise will take a bit), and the answer was, "of course."

Given the time it took last time, an image might be a month? He's got the ROM, various cleanups, pnut.exe (which may largely be done for PASM as he's usually coding on that in tandem with the VERILOG), to do for sure. There are probably a few other things we don't know about too.

Anyway, that's my guess and general expectation, barring something else gets revealed.

And I don't know enough about the FPGA process to understand just what it takes to do a lot of builds. Maybe that will change with the 123 board and the potential for some projects on it. I never did much with the DE2 besides run the hot image. When P1 was released as open code, I changed jobs and roles and just kind of set things aside for lack of head space to think about it much.

Maybe it's possible to compile for a class or type of FPGA and leave a few user editable things out there for pin mappings and such? If so, great! If not, that means Chip will end up doing a few builds each iteration. Maybe that is no big deal. Ideally, we don't have a ton of iterations, right?

Anyone care to comment?

Seairth · 2015-07-22 16:47

Please keep LOCKNEW and LOCKRET.

*ouch* I just realized my understanding of LOCKs in the P1 was wrong. I had always thought that LOCKNEW/LOCKRET affected the same bits as LOCKSET/LOCKCLR, but for different use cases. Because of this, I never used LOCKNEW/LOCKRET. Looking at the P1V code, I now realize they are two separate registers! LOCKNEW/LOCKRET does not influence bits manipulated with LOCKSET/LOCKCLR, and vice versa. I just never noticed, because I only ever used LOCKSET/LOCKCLR (where all cogs knew the lock IDs "a priori").
Ugh. I hate when that kind of thing happens.
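The distinction described above can be sketched as a toy model. This is an illustrative Python sketch, not the P1V Verilog: the class name `P1Locks`, the two bitmask fields, and the lowest-free-first allocation order are assumptions made for the example.

```python
class P1Locks:
    """Toy model of the eight P1 lock bits as two independent registers."""
    COUNT = 8

    def __init__(self):
        self.allocated = 0   # bits managed only by LOCKNEW / LOCKRET
        self.state = 0       # bits managed only by LOCKSET / LOCKCLR

    def locknew(self):
        """Check out a free lock ID, or return -1 if none is free."""
        for n in range(self.COUNT):
            if not (self.allocated >> n) & 1:
                self.allocated |= 1 << n
                return n
        return -1

    def lockret(self, n):
        """Return lock n to the pool; its set/clear state is untouched."""
        self.allocated &= ~(1 << n)

    def lockset(self, n):
        """Set lock n and return its previous state (True = was already set)."""
        prev = bool((self.state >> n) & 1)
        self.state |= 1 << n
        return prev

    def lockclr(self, n):
        """Clear lock n and return its previous state."""
        prev = bool((self.state >> n) & 1)
        self.state &= ~(1 << n)
        return prev

locks = P1Locks()
was_set = locks.lockset(3)     # works without any prior LOCKNEW
new_id = locks.locknew()       # allocation ignores the set/clear bits
print(was_set, new_id, locks.lockset(3))
```

In this model, setting lock 3 directly never marks it as allocated, which is why code that agrees on lock IDs up front can skip LOCKNEW/LOCKRET entirely, and also why mixing the two styles in one program needs care.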

So, as Dave says, keep LOCKNEW and LOCKRET. I get it now. (But I still think you need at least LOCKGET, unless *that* is also possible and I have never noticed, which is conceivable.)

Seairth · 2015-07-22 17:02

Are you suggesting that there should not be a LOCKGET? Without a LOCKGET, how can you know the current value of a lock without mutating it?
I admit that the LOCKTOG is simply a "nice to have", but I think it nicely fills out the possible atomic operations you could do: SET, CLEAR, TOGGLE. I think LOCKTOG would only have meaning when used with the new interrupt capability (and GETEDGE instruction). It allows lock bits to be used as an "event", which can be very useful.
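To make the "lock bit as event" idea concrete, here is a small hypothetical model in Python. LOCKGET and LOCKTOG are only proposals in this thread, so everything here is an assumption: the class `LockBit`, the callback `on_edge` (standing in for an enabled lock-transition interrupt), and the choice to return the old value from a toggle.

```python
class LockBit:
    """Hypothetical single lock bit with a non-mutating read and a toggle."""

    def __init__(self):
        self.value = False
        self.on_edge = None            # assumed stand-in for an enabled edge interrupt

    def lockget(self):
        """Read the current state without changing it."""
        return self.value

    def locktog(self):
        """Flip the bit, notify any edge listener, and return the old value."""
        old = self.value
        self.value = not old
        if self.on_edge is not None:
            self.on_edge(old, self.value)
        return old

events = []
bit = LockBit()
bit.on_edge = lambda old, new: events.append((old, new))

before = bit.lockget()     # observing does not mutate
old1 = bit.locktog()       # False -> True, one event
old2 = bit.locktog()       # True -> False, a second event
print(before, old1, old2, events)
```

Each toggle produces exactly one transition for a listener to react to, regardless of the bit's starting state, which is what separates this use from the mutual-exclusion use of LOCKSET/LOCKCLR.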

Roy Eltham · 2015-07-22 17:35

You people do realize that "holding back" and being negative about new ideas/features is pretty much the opposite of what Chip wants, right?
He doesn't want to "ship" something that feels incomplete or that isn't fun to work with... Something you are holding back posting about could be the thing that tips the scale for Chip in the direction he wants.
Ultimately it's up to Chip/Ken/Parallax when we get an FPGA image and so on, and everyone here being silent about features isn't going to make it happen faster, in fact, it's likely to make it take longer....

potatohead · 2015-07-22 17:37

The point of the locks was to ensure their state isn't changed in an arbitrary way. Hiding it, and blocking the SET command is how P1 does that. If you can't SET it, then you know the state of it.

Otherwise, one might as well just use some ordinary bits to lock on and then trust that code just isn't changing them. On P1, code can't actually change them, which is the point.

IMHO, this is why many didn't use LOCKS much on P1. They really wanted flags, not semaphores.

potatohead · 2015-07-22 18:06

[deleted, forum mangled it all]

potatohead · 2015-07-22 18:07

Well great, and I get this: He doesn't want to "ship" something that feels incomplete or that isn't fun to work with.

Frankly, that design goal and the expectations of some here are clashing big. I want the fun stuff for sure! That is one of the primary reasons I'm here. And I believe in the longer term worth of that too. Many others do not. Their loss. Seriously.

Let's have that memory trap for external / virtual memory then!

(honestly, that's part of a fun little computer system we could make with this one day)

More seriously, it seems like there is some confusion on scope. On hot, we had a killer video subsystem. That came as feature requests and advice over time. Andre' and others helped to spec it. In the end, doing software with more COGS makes the best sense given how the process limits played out. All kinds of fun packed into that one too.

In a rough sense, that means more is going to be done in software and the larger number of COGS reflects that.

Doesn't that kind of rule out feature requests along those lines?

Now, use cases are different. Doing video in software, as an example, might present some difficulty with the current feature set. For this example, it's hard to know that, until we build some stuff, but for some others, it's not so hard to envision and discuss now. That is how interrupts happened. The use case of the monitor and the ROM in general highlighted the need for it. Perfect!

Where it can be foreseen, making sure we've got potential use cases covered seems fair game, which is why I said what I did. We all want that, or not having it will cost us an iteration, or those use cases being crappy, which is just a waste. And for sure not fun.

So I'm not going to ask for some video related hardware features, like color management, because that choice was made to do that stuff in software, and frankly that is a choice I really agree with.

It may well be an ask makes sense given a use case sucks though. It may be that we could really use an instruction or other OP to help with color management or some other tasks.

When we get the FPGA, I expect a lot of those because we will have the docs and will be attempting to actually do stuff instead of thinking about doing stuff, but shouldn't we take at least one pass and attempt to do stuff to flesh those out some and do some qualifying? Where there is a real limit on something fun, or compelling, why not, right?

Some people, worried about time, are going to say, "look at all those new features!!! It's never gonna get done!", and they would be wrong as most of the requests will be to optimize a use case. We should not, and arguably, cannot ignore those.

Where is the line on all of that Roy?

Maybe you can express it better than I did.

Heater. · 2015-07-22 18:30

potatohead,

In the general case, multiple readers and/or multiple writers to a common data structure, using "ordinary bits to lock on and then trust that code just isn't changing them" does not work.

That is why we need some kind of atomic operation. Like Intel's LOCK XCHG instruction. A means of reading some flag and setting it if we want to without anyone being able to get in between the read and write of that flag.

Turns out most Prop programs don't need this. It's not required for a single writer and a single reader. But to not have such an atomic operation would be negligent in a multi-processor system.
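A minimal sketch of that read-and-set-in-one-step idea, written in Python rather than Propeller assembly. The class `LockFlag` is hypothetical, and its internal `threading.Lock` is only a stand-in for the hardware arbitration that makes the real operation indivisible.

```python
import threading
import time

class LockFlag:
    """One lock bit whose test_and_set cannot be split by another thread."""

    def __init__(self):
        self._bit = False
        self._arbiter = threading.Lock()   # stands in for hub arbitration

    def test_and_set(self):
        """Set the bit and return what it was before, as a single step."""
        with self._arbiter:
            prev = self._bit
            self._bit = True
            return prev

    def clear(self):
        with self._arbiter:
            self._bit = False

def worker(flag, shared, rounds):
    for _ in range(rounds):
        while flag.test_and_set():         # previous state set: someone else owns it
            time.sleep(0)                  # let other threads run, then retry
        shared["count"] += 1               # critical section: read-modify-write
        flag.clear()

flag = LockFlag()
shared = {"count": 0}
threads = [threading.Thread(target=worker, args=(flag, shared, 1000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["count"])
```

With several writers, the increment is safe only because nobody can slip in between testing the flag and setting it. An ordinary shared bit that is read first and written afterwards gives no such guarantee.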

Now I'm wondering how OpenMP in prop-gcc does this. Does OpenMP use locks?

Heater. · 2015-07-22 18:36

potatohead,

I don't know how to put this nicely but sometimes I have a really hard time decoding what you write. For example:

"It may well be an ask makes sense given a use case sucks though."

What is that about?

Dave Hein · 2015-07-22 18:39

A PropGCC program allocates a single Prop lock when it first boots up, and it uses that to do all of its atomic operations. It's used for OpenMP, pthreads, mutex locks and multi-cog I/O operations.
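The one-lock-for-everything pattern can be illustrated as follows. This is not PropGCC source code: it is a Python sketch in which `_global_lock` plays the role of the single Prop lock, and the helper names `atomic_add` and `atomic_compare_and_swap` are made up for the example.

```python
import threading

_global_lock = threading.Lock()    # the single lock, allocated once at startup

def atomic_add(cell, delta):
    """Add delta to cell[0] under the global lock and return the new value."""
    with _global_lock:
        cell[0] += delta
        return cell[0]

def atomic_compare_and_swap(cell, expected, new):
    """Store new in cell[0] only if it currently equals expected."""
    with _global_lock:
        if cell[0] == expected:
            cell[0] = new
            return True
        return False

counter = [0]
owner = [None]

total = atomic_add(counter, 5)
first = atomic_compare_and_swap(owner, None, "cog1")   # claims ownership
second = atomic_compare_and_swap(owner, None, "cog2")  # loses: already claimed
print(total, first, second, owner[0])
```

The trade-off in this sketch is that unrelated operations serialize on the same lock, in exchange for using only one of a small, fixed number of hardware locks.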

potatohead · 2015-07-22 18:43

No worries Heater. I won't be offended.

Making an ask backed by a compelling use case is different from making an ask without.

If a compelling use case is being hobbled by, or isn't as compelling due to, a feature limitation or missing feature, asking to remedy that makes a lot of sense.

There it is expressed two ways, and I am for sure influenced, maybe in a negative way, by my last few interactions with product managers...

David Betz · 2015-07-22 18:44

So I guess this effort is to create the ideal successor to P1 no matter how long it takes. I guess that shows that my idea of waiting until there is silicon to do any tools work is a good one. No telling what might change and how much in the process of perfecting P2.

Heater. · 2015-07-22 18:55

potatohead,

"Making an ask backed by..."

Perhaps I'm with you.

It's some weird American use of the word "ask". Using it as a noun instead of a verb. Like "I have an ask in my pocket" or "That is a big ask you have there".

Normally I would expect "Making a request backed by..." or "Requesting a feature backed by..."

Never mind. I'm just not up to speed with American.

Dave Hein · 2015-07-22 18:56

I hope you are just kidding, but I didn't see any emoticons. My understanding is that as soon as we get the first FPGA image we can start testing it out with lots of code. With enough people hammering on it we should uncover any bugs that are in the design, and people can suggest fixes and enhancements. That's why I think it's important to get the FPGA image ASAP. All this tweaking on interrupts and locks can be done in the future FPGA images.

So I think it's important that the PropGCC work begin as soon as the FPGA image is available. In theory, it could be done based on the instruction doc that Chip posted even though that is lacking in a lot of details.

potatohead · 2015-07-22 18:56

Yes. I am among people who do that Heater. I'll reconsider that use going forward.

potatohead · 2015-07-22 19:00

I second Dave's comments

Dave Hein · 2015-07-22 19:01

Heater, I was confused by the phrase "Making an ask backed by...". It's not an American thing as far as I know. Maybe it's a regional thing.

Heater. · 2015-07-22 19:05

Dave and David,
Given past experience I would not be investing any time into creating compilers and tools for the P II until it is cast into silicon and I get the chips in my sweaty paws.
That's not to say that compiler writers should not look at whatever comes in the FPGA and make suggestions that might help compilers produce efficient code. If they are so inclined.

potatohead · 2015-07-22 19:05

Deffo regional. Until it was brought up, I didn't think about it. Now that I have, it is odd. May be culture too. The group I work with does it a lot, and are from AK, AZ, midwest...

Maybe I'll ask them about it.

potatohead · 2015-07-22 19:07

Man, I don't see it changing that much and am kind of hoping gcc is there early like it was before.

David Betz · 2015-07-22 19:12

Can you guarantee that the instruction set won't be revamped yet again to make room for new features that get added after the FPGA image is released? I originally asked for the instruction set to be posted to begin propgcc work but lost interest when it became clear that this was going to be another long drawn out design evolution discussion that could very well redefine the instruction set again. It's not worth targeting an instruction set that is itself a moving target.

Heater. · 2015-07-22 19:12

Well, it seems using verbs as nouns is not generally a feature of the American language. Even writers from the New York Times have an issue with it:
http://opinionator.blogs.nytimes.com/2013/03/30/those-irritating-verbs-as-nouns/?_r=0
I love the cartoon on that page.

potatohead · 2015-07-22 19:17

Awesome piece. I've been influenced by heathens who appear to specialize in the making of the understanding...

Ah well. Good to know now.

Heater. · 2015-07-22 19:31

We all have our own specialize, lets maintain the curious and always make the ask. That way we can often get the solve.
Hmm. I think I have the get of the hang of this

You are right, it is weird:
A skydiver jumps and then one can say "that was a great jump"
A questioner asks but then one does not say "that was a great ask"

Ale · 2015-07-22 19:33

Sometimes I think about why I like the Propeller and why nowadays I use something else. I wanted something with more I/Os and faster I/O, not a faster CPU. I have 3 XMOS boards, all of them with the quad-core version (the real quad core, I mean, with 8 threads per chip). I don't use them either. They are really power hungry in comparison with the Propeller, and more powerful and bla bla, but what I like are the ports: clocked ports with FIFOs.
If I look at what we liked and what not, I see that a couple of things were common:
more memory, more ports, a bit more flexible ports. And low power? For 1 W I can get a Cortex-A9. It is not a PII, but it has loads of memory, and floating point and SIMD. But deterministic, it is not. And for 1 W we can get the XMOS chips; the new ones finally have more memory.
What worked (and still does) with the P1 was simple: programmer + Prop + crystal (or not) plus Propellertool, 5 minutes (more like 15) of soldering, and your program has VGA output and keyboard. I also did that for the XMOS chip, but with another power envelope.
A Cortex-M4 with a ton of peripherals and 180 MHz needs like 50 mA?... Maybe we only needed 100 MIPS per COG and a bit more flexible ports, a serializer or two... and a package like Propeller+Propplug+Proptool and 5 minutes of soldering to get us going.
Sometimes I think a P1.5 that can do 25 MIPS (I know P1 can) and comes in PLCC68 or 84 should already have been there.

Seairth · 2015-07-22 19:35

Maybe I'll ask them about it.

This sentence requires a verb.
