Propeller II update - BLOG

potatohead · 2011-05-22 15:33

Seriously. I'm going to listen to the I/O discussion again. There is a lot packed into a few minutes, just like the pins!

@Cluso, YCbCr is component video, where the "Y" signal is luma, and the two others are a red and blue difference signal. The net result is still three channels to the display, like RGB, just a different color space. The one nice thing about that format, is it is on nearly every HDTV display I have seen, where a PC or RGB signal isn't. Where available, it clocks all the way down to ordinary TV sweep frequencies, for those devices that have it. Should be able to grab most any HD display device, and drive it very nicely with a Prop, using familiar sync signals and such.
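
For anyone who wants to see the "luma plus two difference signals" idea in numbers, here is a rough sketch. It assumes the common BT.601 weights and a full-range 8-bit digital encoding; the function name is made up for illustration, and the analog component signal a TV actually receives is a scaled voltage version of the same idea, so treat this as an illustration rather than a description of any Prop driver.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to 8-bit full-range YCbCr (assumed BT.601 weights)."""
    # Luma: weighted sum of the three primaries.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Blue-difference and red-difference signals, offset to sit around 128.
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v):
        # Round and keep the result inside the 8-bit range.
        return max(0, min(255, round(v)))

    return clamp(y), clamp(cb), clamp(cr)


if __name__ == "__main__":
    for name, rgb in [("black", (0, 0, 0)), ("white", (255, 255, 255)), ("red", (255, 0, 0))]:
        print(name, rgb_to_ycbcr(*rgb))
```

Any grey input leaves both difference channels at the midpoint, which is why a luma-only connection still gives a usable monochrome picture.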

(neither of my HD displays have VGA, just HDMI and the component, and no, I didn't pick them out, or they would have)

On my ordinary CRT, component easily does 640x480i pixel perfect. Would have set up for this on Prop I, but it costs 9 pins, and component capable devices that appeal to this crowd pretty much don't exist.

Roy Eltham · 2011-05-22 17:10

I have been working with Chip a little, and will be working with him some more over the next couple of days. I will ask him what can be shared here, and see if I can post some details about some of the new instructions.
Cluso, the other 8 bits of the 32-bit colors are an alpha channel, allowing for better alpha blending (per-texel alpha is factored into the interpolated alpha).
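
For readers new to alpha blending, a minimal sketch of the basic idea follows. It assumes a 0xAARRGGBB word layout and a plain "source over destination" blend; both are assumptions for illustration only, since the Prop II's actual pixel format and blending hardware are not described in this thread, and the function name is hypothetical.

```python
def blend_argb(src, dst):
    """Blend a 32-bit ARGB source pixel over a destination pixel.

    Assumed layout: 0xAARRGGBB, alpha 0 = fully transparent, 255 = opaque.
    """
    alpha = (src >> 24) & 0xFF
    out = 0xFF << 24  # result is treated as fully opaque
    for shift in (16, 8, 0):  # red, green, blue
        s = (src >> shift) & 0xFF
        d = (dst >> shift) & 0xFF
        # Weighted average of source and destination, rounded to nearest.
        c = (s * alpha + d * (255 - alpha) + 127) // 255
        out |= c << shift
    return out


if __name__ == "__main__":
    # Half-transparent red over opaque blue.
    print(hex(blend_argb(0x80FF0000, 0xFF0000FF)))
```

With alpha at 255 the source replaces the destination, and with alpha at 0 the destination shows through unchanged.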

RossH · 2011-05-22 18:26

Roy Eltham wrote: »

I have been working with Chip a little, and will be working with him some more over the next couple of days. I will ask him what can be shared here, and see if I can post some details about some of the new instructions.

Hi Roy,

It seems a little odd to me that we could be just three months from seeing an actual chip, yet still not have the instruction set documented - even in draft form. I realize Parallax is a little "unorthodox" (and this is one of its strengths) but for me this really highlights the fact that Parallax places a great deal of emphasis on building fantastic hardware, but not much emphasis on building (or enabling to be built) the software that will be required for anyone^** to make effective use of it.

Or, to put it more succinctly, how do you think the mainstream microcontroller market will react to this wonderful new chip if there are no tools available to program it? I can't imagine anything more likely to ensure that the Propeller II sinks without trace.

Please pass this message onto Chip!

Ross.

^** Excluding the members of these forums, who I'm sure would be happy to program in hand assembled binary if it meant they could get their hands on one!

Cluso99 · 2011-05-22 18:44

heater: Yes, almost a Z80 on every pin.. ooh! The mind boggles. IIRC Intel is up to 4 billion transistors on a chip. Now I want a Prop III 1GHz 64bit with 256 cogs, each with 256KB and 1MB hub, on a PGA (pin grid array 0.1"). I am happy with 92 I/O. Unfortunately, I don't see enough people willing to pay for its development so we can get Chip & co to do it.

potatohead: Yes, many more products are now coming with YCbCr connections. However, there are still a lot of RGB too.

Roy: Thanks for the clarification. I will correct my post. This alpha blending is way over my head so I will just believe it will make a better pic.
Congratulations on working with Chip on this. Not only is he brilliant, but he has time to listen and take ideas to improve the design.

New phrase "Nowhere but Parallax".

With the new fab being used, is it worth re-asking if the onboard oscillator can be calibrated with a register (like on the ATtiny84 etc)? We could use some of the fuse bits to store the calibration for this, provided we don't require them all for code protection.

potatohead · 2011-05-22 19:40

Yeah, hoping we see both more often on more sets. RGB is needed for the average PC, and of course the prop will output it easy.

jazzed · 2011-05-22 19:47

RossH wrote: »

It seems a little odd to me that we could be just three months from seeing an actual chip, yet still not have the instruction set documented - even in draft form.

It's coming. Just be patient.

potatohead · 2011-05-22 19:52

Given Chip is going to add at least one instruction, I think the close timing is some artifact of the design process change. That's my take anyway.

(hand raised to program in binary on the new device, LOL!!)

Tubular · 2011-05-22 19:56

Ross,

I gather the synthesis process employed allows instruction flexibility right up to the last minute.

In Chip's talk he seemed to still be adding some potential instructions (a CLUT auto-incrementing Push/Pop to save one instruction??)

(Don't anyone dare start a 'what extra instructions would you like to see in Prop2' thread... save them for Prop2B/C/D which, if I understand the process correctly, could be theoretically synthesized very quickly indeed)

Phil Pilgrim (PhiPi) · 2011-05-22 20:05

I sincerely hope that Parallax does not release a half-baked Prop II without all the planned features, including code protection, and without a fully-fleshed-out dev system. We've waited this long; there's no need to rush into production with something half-baked. To do so would send a very bad signal to the market and cripple Parallax Semiconductor's nascent credibility. To Parallax: dot all your i's, cross all your t's, and get your ducks in order before announcing any product release. I don't care how long it takes.

To all of you who are so anxious to have a Prop II in your hot little hands that you encourage action to the contrary: you do so at Parallax's -- and ultimately our -- peril. Just stop it. Please.

-Phil

potatohead · 2011-05-22 20:26

I'm on no particular timeline, just happy to be excited over what I consider to be an informative tech review, done for that exact purpose. Expressing that is ok to do. I would assume engineering samples, if that route is chosen, or is necessary, say to finish up said dev tools, would be handled with the right concerns in play.

Mike Green · 2011-05-22 20:34

FWIW, I would echo Phil's comments. The Propeller I was accompanied by a very impressive statement of philosophy from Chip, one rarely found anywhere in this industry. I would be very surprised (and disturbed) to find the Propeller II not treated the same way. I have to assume that Chip did due diligence with the company that's now doing the synthesis work on part of the central logic of the Prop II. Presumably he decided that the advantages of using their design tools and parts library to build this central logic outweigh the loss of detailed control over the design, and that the detailed feedback on timing and power control gives him advantages that he wouldn't have with the design tools and methodology he had been using.

Cluso99 · 2011-05-22 20:57

I am not sure exactly what is in the synthesised logic, apart from the hub access by the cogs. Perhaps it contains the counter logic also because that is where I gather the NRZI and bit-stuffing resides. However, I did get the distinct impression that the cogs are almost complete and that is where the instruction set resides. In the almost completed sections of the cogs are the ROM tables for cordic etc, and the fifo/clut.

Phil and Mike: In no way do I expect Chip or Parallax to release a half baked prop. But, if the first release comes earlier without code protection and is followed up fairly quickly with the code protection version, this would get it into the developers hands quicker. Then all the compilers, etc, can be worked on; prototypes built, and even production. If you actually look at all the other chips out there, they all have revisions and extra features being added continually. The biggest problem is to ensure backward compatibility between chip revisions. This now seems to be the way of the future, and the tools to allow these things to be done are now much cheaper. As I understand from the video, this is why the direction of the prop II has changed. Now there are expected to be a family or a series of Prop IIs.

Tubular · 2011-05-22 21:13

Phil: "there's no need to rush into production with something half-baked"

I don't see the Prop2 as it stands as half baked, but starting to rise and exuding a very pleasant odour.

I need Prop2 for its pin hardware, and I see Parallax's strength as their hardware and ability to innovate and educate. While I like the Prop Tool, it's enough for Parallax to innovate on the hardware side, and partner up on the software side, write a small number of application notes that highlight unique features, and get it out there. It's not necessary to perfect the triad of hardware, software and the supporting sauce (app notes, customer support etc).

I have no doubt Parallax will have a very attractive package at release. The Parallax Semiconductor site is a great start. But I worry sometimes that idealism/perfectionism spreads too far. The small bootloader and synthesis of guts seem to me to be good practical decisions, perhaps at expense of some clock speed. The hardware must be the critical path.

Mike: Chip seems to make a very impressive statement of philosophy each year at UPEW

Roy Eltham · 2011-05-22 22:19

Re: synthesized parts vs. parts built by hand by Chip/Beau. The parts that are built directly by Beau and Chip are the I/O pads around the edge, the memories, the cog memories (rams and roms), the PLLs on the cogs, and the fuse bits stuff. The synthesized part is the combination of the 8 cog "guts" and the hub interface logic. That was what Beau was showing on the screen in the layout tool. The cog "guts" are designed by Chip in Verilog (was in AHDL) and tested on a big FPGA; that design then gets sent out to the other company to synthesize (to ASIC form).

So the design is still very much in Chip's (and Beau's) control. It's just the nitty gritty details of actual transistor/gate layout for the cog guts that is given over to the synthesizing software. Except in the case of the memory parts (hub and cog, ram and rom), I/O pads, and the busses connecting those to the synthesized portion.

This is why Chip can still be tuning the instructions and such. In fact, part of the process of working with the synthesizing tools at the other company is tweaking and tuning the Verilog to fix critical paths and tune timings and layout size.

Like Chip said in part of the talk, they are sort of just catching up to how the rest of the industry has been building chips for the last 10-15 years or so. Hand laying out every gate of a chip is very difficult and time consuming as your chip gets larger and more complex. However, there is always going to be some hand done parts.

Also, I may be wrong, but the impression I got from the talk was that the version that got made without the protection stuff enabled was the engineering samples... and that the actual final product that Parallax ships would include the final version with the fuse bits and bootloader working to allow for protection.

RossH · 2011-05-22 23:41

Roy Eltham wrote: »

This is why Chip can still be tuning the instructions and such. In fact, part of the process of working with the synthesizing tools at the other company is tweaking and tuning the Verilog to fix critical paths and tune timings and layout size.

Hi Roy,

This is the bit that worries me. I'm certainly not a hardware design engineer, but even a cursory examination of the normal sequence of micro design steps (e.g. http://en.wikibooks.org/wiki/Microprocessor_Design) seems to indicate that you don't leave designing the instruction set till last. For one thing, the verification step that it does what you wanted it to do can take longer than all the other steps combined.

Parallax probably has a perfect right to be confident in their own designs and chip layouts (which was the case for the Prop I) - but to have that degree of faith in a design that they have outsourced? Using a new (to them) design language, new design philosophy and new fabrication technique?

This just seems a little optimistic to me.

Ross.

potatohead · 2011-05-23 00:18

Well, I'm a bit intrigued now.

Chip is maintaining a schematic, with Beau laying it out in silicon. They've got a simulation mapped to that with the instruction state, mapped too. So, in the case of the fifo instruction being added, can't they roll that up through the simulation and then verify?

What he said was they have those guys generate, synthesize the logic on silicon, according to some out of this world expensive, rule based software, which then outputs the block, for them to examine, along with a lot of timing and electrical data, identified as being non-compliant, or in need of verification.

On Prop I, that process was slow, because one person isn't able to see the whole picture, and so it happened that building out one thing, impacted another, requiring a rebuild, etc... That's the years it took for Prop I.

Compressing that, simply means being able to iterate at a higher level, faster, while still performing all the same vetting that was done before.

I would be curious to know whether or not Prop I instructions were set in stone early on as well. I suspect some of them were not, as the design progressed, they were added, just as this one is being fine tuned now. Chip has programs running on his COG simulator now, and is proving out paths, silicon capability, etc...

Would it not make the most sense to balance the instructions with the overall realities possible in the design, and actually not know those are fully defined until it all gets rather close?

Makes me wonder actually, when we see other chips with buggy instruction forms, etc... maybe there is a cost of assumption early on, where instructions are defined, then layout proceeds to not quite meet expectations?

Cluso99 · 2011-05-23 00:23

Roy: Thanks for the clarification.

Ross: From the video, it seems that the outsourcing part is the most reliable part of the equation, and cuts man-years off the design compared with laying it out by hand. I understand Chip has the cog instruction code running in the ASIC using VHDL and that translates directly to the Synthesis guys' software and returns back to Chip the timing delays. This was the exciting part of Chip's video because it allows Parallax to be able to produce variants (families) relatively cheaply. I do know someone who is interested in this once we get Prop IIs (care to identify yourself?).

Sapieha · 2011-05-23 01:36

Hi All.

In all the talk on Prop II one thing disturbs me -- why not make better use of the CLUT memory? In most cases this memory will not be used, as not all COGs will run VGA or other types of display drivers.

What I am missing in the descriptions is one indirect register, plus POP/PUSH instructions that automatically increment/decrement that indirect register AND use CLUT memory as the STACK memory.
That would serve all kernels that support C and similar languages that NEED stack-type memory, at COG speed rather than HUB speed, which is 8x+ slower.

Heater. · 2011-05-23 02:02

"Half-baked" No way, I don't believe Parallax and especially Chip would release anything that is not as perfect as can be.

When I worked for the old Northern Telecom we started to find bugs in the, then new, Intel 286. Multiply by an immediate value that happened to be negative did not work, for example.

After much negotiating we got hold of, under NDA, a two inch thick document from Intel detailing all the known bugs in the 286. Mostly to do with the new protected mode features and how to bust security there.

I just can't imagine Parallax would do that.

RossH is right to be worried about finalization and documentation of the instruction set. But I see no reason why the chip can't be out the door prior to having a full suite of dev tools. Perhaps as engineering samples so that people like Ross can get their tools working for Prop II. Those tools might take a while to perfect.

RossH · 2011-05-23 02:07

potatohead wrote: »

Would it not make the most sense to balance the instructions with the overall realities possible in the design, and actually not know those are fully defined until it all gets rather close?

If we were talking about just a few unusual instructions that only a few propeller-heads would ever even want to use, then I would agree with you. But during UPEW, Chip was talking about adding instructions that could lie at the very heart of any stack-based language implementation. For example, whether or not he might be able to add instructions to allow the use of the CLUT as a program stack.

Perhaps this is just the way Chip always does things. I'm just a bit concerned that by leaving these decisions until the very last minute, and then possibly having to change them anyway when it turns out that the required few micrometres of silicon simply can't be found (or because including them disturbs the timing of another critical part of the cog circuitry in a completely unforeseen manner), Parallax may be just asking for undeserved and avoidable grief.

Anecdotally, a GCC port is said to take 3 - 6 months when done by experienced GCC developers who have working silicon and a fairly "orthodox" architecture to target. A GCC port done by non-GCC developers, to a chip that is not fully defined even in some of its fundamental features - and for a very unorthodox architecture - might be expected to take somewhat longer. A GCC port that has to be re-written to take advantage of new instructions on said unorthodox architecture - or to remove the use of instructions that didn't quite pan out as expected - could be expected to take even longer!

If Parallax releases a Propeller II that has nothing more supporting it than a slightly enhanced Propeller Tool, plus the promise of a GCC port "sometime down the track ... once we figure out exactly how best to do it", then we here in the forums might be ecstatic - but how well do you think it would be received by the customers that Parallax was hoping would buy them by the millions?

The Propeller II would be relegated to that dusty corner on many professional engineers' work benches reserved for "interesting novelties that sound really cool but I can't currently figure out how to actually use". The same dusty corner (incidentally) that is currently occupied by the Propeller I. Many months down the track, once it has gathered sufficient dust, or (worse) been buried under a layer of new ARM, PIC or XYZ chips, how many hardware engineers are going to be excited by the release of a GCC toolchain?

Ross.

EDIT: I see I have cross-posted with Heater and Sapieha - both of whom raise some of the same concerns.

Cluso99 · 2011-05-23 02:47

Sapieha: I understood from the video that PUSH is already there and Chip will add a POP instruction. Therefore, the clut can be used as a stack (fifo). He did say that the clut cannot be used as either the source or the destination as they are special instructions. So, it seems that not only will we be able to use the clut as a stack, but also as storage although we have to access it indirectly. This is good news, because otherwise it is a big waste of space, as you said.

Heater. · 2011-05-23 04:21

Could someone please elaborate on this idea of CLUT as stack or FIFO? I seem to have missed a point here.
How big might it be? How is it to be accessed?

I would have thought that if only PUSH and POP are available, that makes it somewhat useless as a stack, as fast access to any stack offset is also required.

Is there any support for creating faster stacks in HUB?

RossH · 2011-05-23 04:42

Heater. wrote: »

Could someone please elaborate on this idea of CLUT as stack or FIFO? I seem to have missed a point here.
How big might it be? How is it to be accessed?

I would have thought that if only PUSH and POP are available, that makes it somewhat useless as a stack, as fast access to any stack offset is also required.

Is there any support for creating faster stacks in HUB?

Hi Heater,

It's hard to elaborate when there is no actual documentation (*sigh*) - there's just a few comments by Chip and others which could easily (as I said earlier) end up not being implemented at all. But you're essentially correct. The CLUT cannot be used as a general purpose program stack since you have no direct access to it. It could, however, be used as a simple Cog-based "return address" stack for some small language implementations (not just LMM ones).

However, slightly more definite is that there is going to be a kind of limited base+offset addressing mode - this will dramatically speed up manipulation of Hub-based stacks, and also dramatically reduce the code size, since in many cases we will be able to do in one instruction what currently takes two or three.

Ross.

Heater. · 2011-05-23 05:06

definite is that there is going to be a kind of limited base+offset addressing mode

Hmm...Zog is going to like that. Surely Catalina as well.

Sapieha · 2011-05-23 05:58

Hi Cluso.

That is OK -- BUT the indirect register for those instructions NEEDS to be accessible, so that its value can be changed if needed, AND, as I said before, it should be automatically incremented/decremented by those POP/PUSH instructions.

Cluso99 wrote: »

Sapieha: I understood from the video that PUSH is already there and Chip will add a POP instruction. Therefore, the clut can be used as a stack (fifo). He did say that the clut cannot be used as either the source or the destination as they are special instructions. So, it seems that not only will we be able to use the clut as a stack, but also as storage although we have to access it indirectly. This is good news, because otherwise it is a big waste of space, as you said.

davidsaunders · 2011-05-23 06:08

So long as Chip has a couple of weeks to fully test out the changes what is the concern? The logic to decode opcodes in any CPU is pretty small and simple so if the other core support is already in place there is no reason that Chip can not modify the instruction set a little up to about a week before the chip is sent off for fab.

If you wish to verify the simplicity of instruction decoders, look at some of the CPUs on OpenCores.

Roy Eltham · 2011-05-23 07:28

What is currently implemented for the CLUT access on the Prop 2:

SETCLUT -- this sets the address pointer for the CLUT
RDCLUT -- this reads the CLUT, and then increments the address pointer
WRCLUT -- this writes to the CLUT, and then increments the address pointer

Those alone allow you to use the CLUT as general storage. You can even use it as a stack, it would just require multiple instructions to work (extra SETCLUTs).
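
To make the "extra SETCLUTs" point concrete, here is a toy software model of the three instructions as described above. It is only a sketch: the CLUT size, the pointer wrap-around, and the idea that the program keeps its own copy of the stack pointer are all assumptions, and the class and method names are made up for illustration.

```python
class Clut:
    """Toy model of SETCLUT / RDCLUT / WRCLUT with a post-incrementing pointer."""

    def __init__(self, size=128):  # size is a placeholder, not a documented figure
        self.mem = [0] * size
        self.ptr = 0

    def setclut(self, addr):
        self.ptr = addr % len(self.mem)  # wrap-around is an assumption

    def rdclut(self):
        value = self.mem[self.ptr]
        self.ptr = (self.ptr + 1) % len(self.mem)  # read, then increment
        return value

    def wrclut(self, value):
        self.mem[self.ptr] = value & 0xFFFFFFFF
        self.ptr = (self.ptr + 1) % len(self.mem)  # write, then increment


class ClutStack:
    """Stack built on the model above; no overflow or underflow checks."""

    def __init__(self, clut):
        self.clut = clut
        self.sp = 0  # software copy of the pointer (assumed necessary)
        clut.setclut(0)

    def push(self, value):
        self.clut.wrclut(value)  # one instruction: pointer advances by itself
        self.sp += 1

    def pop(self):
        self.sp -= 1
        self.clut.setclut(self.sp)  # extra SETCLUT: step back to the top item
        value = self.clut.rdclut()  # the read moves the pointer forward again
        self.clut.setclut(self.sp)  # extra SETCLUT: undo that increment
        return value


if __name__ == "__main__":
    stack = ClutStack(Clut())
    for v in (1, 2, 3):
        stack.push(v)
    print([stack.pop() for _ in range(3)])
```

In this model a push costs one CLUT instruction but a pop costs three, which is the overhead that pre/post increment and decrement variants would remove.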

The proposed changes would add alternate RDCLUT and WRCLUT variants that do different forms of pre- and post-increment or decrement, so that you can implement a stack with fewer instructions and also do other forms of access. There was also a discussion between Chip and Bill Henning about indirect call/return functionality through the CLUT. Once this stuff is implemented in the Verilog and tested in the FPGA simulation, then there will be more specifics to talk about.

I believe it's in the near term plans to better document the instructions that have already been listed out. So Ross, please just wait a little longer and you will be happy with what is available.

Roy

potatohead · 2011-05-23 07:35

Nice.

I suspect we are going to be very impressed with what will fit into just the COG on Prop II.

Sapieha · 2011-05-23 07:48

Hi Roy.

First Thanks for explanation.

Now I see it only needs ONE extra instruction for STACK usage. -- WRCLUTD - write CLUT with decrementing CLUT Address register/pointer
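
A quick sketch of how that could work, as a toy model only. WRCLUTD is a hypothetical instruction here, and the model assumes it decrements the pointer before writing (with a post-decrement the following RDCLUT would read an empty slot). The CLUT size and wrap-around are also assumptions.

```python
class ClutWithWrclutd:
    """Toy model: existing post-increment RDCLUT plus a hypothetical WRCLUTD."""

    def __init__(self, size=128):  # size is a placeholder, not a documented figure
        self.mem = [0] * size
        self.ptr = 0

    def setclut(self, addr):
        self.ptr = addr % len(self.mem)

    def rdclut(self):
        # Existing behaviour: read, then increment. Acts as POP.
        value = self.mem[self.ptr]
        self.ptr = (self.ptr + 1) % len(self.mem)
        return value

    def wrclutd(self, value):
        # Hypothetical: decrement first, then write. Acts as PUSH.
        self.ptr = (self.ptr - 1) % len(self.mem)
        self.mem[self.ptr] = value & 0xFFFFFFFF


if __name__ == "__main__":
    clut = ClutWithWrclutd()
    clut.setclut(0)  # stack grows downward from the top of the CLUT
    clut.wrclutd(10)
    clut.wrclutd(20)
    print(clut.rdclut(), clut.rdclut())
```

With that pairing both push and pop are a single instruction each, and the pointer ends up back where it started once the stack is empty.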

Roy Eltham wrote: »

What is currently implemented for the CLUT access on the Prop 2:

SETCLUT -- this sets the address pointer for the CLUT
RDCLUT -- this reads the CLUT, and then increments the address pointer
WRCLUT -- this writes to the CLUT, and then increments the address pointer

Those alone allow you to use the CLUT as general storage. You can even use it as a stack, it would just require multiple instructions to work (extra SETCLUTs).

The proposed changes would add alternate RDCLUT and WRCLUT variants that do different forms of pre- and post-increment or decrement, so that you can implement a stack with fewer instructions and also do other forms of access. There was also a discussion between Chip and Bill Henning about indirect call/return functionality through the CLUT. Once this stuff is implemented in the Verilog and tested in the FPGA simulation, then there will be more specifics to talk about.

I believe it's in the near term plans to better document the instructions that have already been listed out. So Ross, please just wait a little longer and you will be happy with what is available.

Roy
