I have to agree with Kbash, this redesign of the P2 worries me. It looks like the case of the creeping features creature. And how many more shuttle runs will it take to make the changes Chip has implemented workable? Let alone for testers to find out what bugs exist in his newest version.
And what happens if another shuttle run delivers dead chips? Will the Prop user community demand even more "enhancements" from Chip?
Chip opened the door on most of this, or he came to a design decision based on what was learned from the earlier synthesis, FPGA testing, etc...
Sure, there are some people pushing hard for this and that. From where I stand, that's always happened. What has also always happened is that Chip took the discussion, factored it down to what made good sense, and did those things. That's not a demand so much as an aggressive and frank discussion, one he invited and appears to be in complete control of.
Given what was learned so far, he believes this refactor makes best sense. It's more than enhancements. Some of it came down to better / more optimal ways to do things to reduce the timing constraints the synthesis must resolve. IMHO, doing that kind of thing is just as important, if not more important, than sorting out the optimal instruction / feature layout. It's more important, because the core timing constraints impact the chance of success in synthesis and that impacts cost and time, or so I understand.
When I see, "It's just a few muxes" to do the SETTRACE instruction, and when I see why, namely Chip finding that functionality needed for his own engineering, I tend to see all of that as a very good thing.
When I see the instruction set factored down to something more sane, better case addressing modes, etc... I also think that's a good thing, and it's something very testable too.
Here's what I don't know: I don't know whether or not changing nothing would improve or degrade or keep the chance of the synthesis being a good one. Chip did say he discovered some constructs he was using tended to make the synthesis task more difficult due to maintaining chip wide timing paths and constraints. Given he's factored those out, I would say that has to be a net gain, leaving us a more synthesis friendly design. On that basis, we are better off for having changed it.
Given a more synthesis friendly design, changing higher level things like the instruction decoding is much lower risk and more easily testable than, say, altering lower level things would be.
I'm also operating under the assumption that the new vendor / synthesis relationship comes with some ability to talk these things through too.
So then, at the end of the day, we still need to do at least one synthesis and shuttle. That's true regardless of whether or not some instructions changed. And the major factor putting the synthesis at risk isn't those kinds of changes either, which means we are basically improving functionality / throughput without adding to the risk in like kind, which is why this makes sense to do right now.
I suspect if that were not true, Chip would be more conservative about what is being done right now.
I'm wondering this too. PropGCC for P2 is now broken and will require an unknown amount of work to fix. I'm sure all of these changes will make P2 better but I wonder how much later it will be because of them?
True, but that's an opcode mapping change, which is not hard to propagate.
( I don't think any important opcodes were removed ?)
Not yet mentioned: one gain from a more regular opcode decode is smaller logic trees and thus faster execution on a given process.
i.e. the new P2 may run faster (or have more margin)?
PropGCC for P2 is now broken and will require an unknown amount of work to fix.
What?!!
Yes, of course, I think I did see some opcode remapping fly by here. Anything else getting in the way of propgcc?
This is a blow to me as I was desperately trying to find time to get back to my nano board, the latest Prop configuration and propgcc. What can I do now?
Can we stop all talk of uses of STRACE? This is a potential huge can of worms.
STRACE looks like a quick hack Chip put into the PII so he could get some chip state out for debugging his design.
All of a sudden we have proposals for its use: for bit banging, for debuggers, for chip to chip communications, for fault tolerant systems, etc.
I would be very happy to see it as an undocumented "feature" that is only possible to enable under some weird circumstances (Which goes against all the openness of the PII) or removed entirely from the PII when delivered.
We should not be so quick to build glorious plans on a quick debug hack.
True, but that's an opcode mapping change, which is not hard to propagate.
( I don't think any important opcodes were removed ?)
Not yet mentioned: one gain from a more regular opcode decode is smaller logic trees and thus faster execution on a given process.
i.e. the new P2 may run faster (or have more margin)?
That's why I said that the amount of work required to update PropGCC was unknown. It will probably mostly be opcode changes which I've already written a program to handle if Chip's opcode table looks anything like the one he had before. In any case, changing opcodes won't be a big deal. If there are many new instruction forms then new parsers have to be written. Also not that big a deal I guess but lots of little changes mount up. I'm just anxious to see a final instruction set so I can get the PropGCC changes made. I don't like the P2 compiler being broken.
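As a rough illustration of why an opcode remap is mostly mechanical, here is a minimal Python sketch that diffs two opcode tables. The table format (one `MNEMONIC bits` pair per line) and the names `parse_opcode_table` and `diff_tables` are assumptions made up for this example; they are not Chip's actual table layout, nor the program mentioned above.

```python
def parse_opcode_table(text):
    """Parse lines of the (hypothetical) form 'MNEMONIC bits' into a dict."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mnemonic, bits = line.split()
        table[mnemonic.upper()] = int(bits, 2)  # bits are written in binary
    return table


def diff_tables(old, new):
    """Return (changed, removed, added) between two opcode tables."""
    changed = {m: (old[m], new[m]) for m in old if m in new and old[m] != new[m]}
    removed = sorted(m for m in old if m not in new)
    added = sorted(m for m in new if m not in old)
    return changed, removed, added
```

Anything in `changed` is a pure table update. The entries in `removed` and `added` are where new parsers or instruction forms would need hand work.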
Also, I guess I'm a bit jealous that none of these recent changes seem to address the difficulty of generating native code for the COG. :-)
I guess nothing would help there as long as we're limited to 2K bytes of COG memory other than hardware caching and that doesn't seem to be in the cards.
Some of this makes sense, but I worry that each new tweak may add unforeseen problems downstream. That is interesting news about redesigning to make synthesis work better.
This is a blow to me as I was desperately trying to find time to get back to my nano board, the latest Prop configuration and propgcc. What can I do now?
After Chip released the most recent FPGA configuration files I made a few changes to the opcode table to get fibo working but never finished updating PropGCC because it was clear that Chip would continue to make changes and I wanted to wait until things settled down. That is the state we are in now. You can use PropGCC with the previous FPGA configuration but not really with the current one. However, updating PropGCC will be high on my priority list once a new instruction set is released! I want it too! :-)
One of the more 'bemoaned' points about the Propeller was that there were few or no debugging tools available, particularly single-stepping and breakpoints, and that this would hinder widespread industry acceptance.
Now people are moaning about these features being added?
One of the more 'bemoaned' points about the Propeller was that there were few or no debugging tools available, particularly single-stepping and breakpoints, and that this would hinder widespread industry acceptance.
Now people are moaning about these features being added?
I don't think anyone objects to any of these cool new features. They're just worried that we might have to wait until 2016 to see them in a production chip! :-)
I think the changes are good and necessary, I agree it needs to be done soon, but it also needs to be done right.
The risk in releasing the P2 'as it was' and then following up with a P3 soon after, with a revamped instruction set, added debug capability, serdes, etc., is that the P2 would be obsoleted by the P3.
Given Parallax's goal of not having constant churn in the chip lineup, they would be left with the choice of keeping around a very low volume chip or biting the bullet and obsoleting it.
I am worried too. The P1 was introduced in 2006; a ten year development cycle is simply too long.
I am wondering what is happening behind the scenes, why Chip is taking the time to make such radical design changes, which also invalidate several (stable) software developments made until now. Has he decided to switch to another process, and leave the TSMC .18 um LP process? Then the analog parts of the pinblocks would need redesign too.
The latest wafer run was not a Multi Project Wafer run, where multiple designs are combined in one mask set; those typically yield 30-40 dice for one design per wafer. The pictures I have seen suggest that Parallax have produced several wafers with their own private maskset, so they must have been pretty confident the chips would work.
One suggestion: everyone asking for new features or functions will only be taken seriously in this forum if he/she delivers synthesizable Verilog code along with the suggested feature, so it will take minimal time out of Chip's busy calendar.
One suggestion: everyone asking for new features or functions will only be taken seriously in this forum if he/she delivers synthesizable Verilog code along with the suggested feature, so it will take minimal time out of Chip's busy calendar.
I don't see how this is possible since the Verilog source for the P2 is not available to forum members and probably rightly so.
I'm sure that, based on the fab schedules, the reworks needed and the funding availability, Ken has discussed with Chip an agreeable time frame to work within. There was a punch list of items that HAD TO BE DONE to make the next synthesis/shuttle a more probable success. I trust Chip and Ken aren't jeopardizing that. As Chip works through issues to streamline the synthesis/fab process, he's coming across no cost/low risk improvements. Since he has a crew of eager and capable testers at the ready with FPGAs, this should be one of the better pre-fab tested micros available.
Some of the changes are nice, some are changing the P2 to be what more people have been asking for and removing some of the "can't compete" roadblocks that some folks have been throwing in the path.
Dave, this was a semi-serious remark. I was referring to issues such as the serial in / serial out discussion. This is something that could be discussed in terms of Verilog code. Cluso is trying to capture it in schematic form. This helps to separate simple to implement ideas from impossible day-dreams.
We spoke later in the day and he had made some optimizations to the serial routines that got him really excited. Then I very briefly bored him with some business details. (I left this part in just because I enjoy the complementary contrast of genius between the two brothers)!
Infrequently on the forums we discuss the financial side of another foundry run. Expense is also significant as we enter our 8th year of R&D on this product. Parallax doesn't enter big financial commitments on credit, at least we don't want to use it very much. We have to generate the profit from sales. A conservative estimate of the next round of R&D and Propeller 2 fabrication according to the schedule I suggested above is around $250,000 (salaries, consultants, foundry, packaging, test hardware, etc.). I'm hoping it will be sooner than I guessed, obviously.
When it is ready (from all aspects), it will happen. The Graceys have done well controlling risk and reward for 25 years, no reason to stop believing now.
Dave, this was a semi-serious remark. I was referring to issues such as the serial in / serial out discussion. This is something that could be discussed in terms of Verilog code. Cluso is trying to capture it in schematic form. This helps to separate simple to implement ideas from impossible day-dreams.
Ah, that makes sense. I was thinking more in terms of instruction set changes that would probably be hard to prototype without access to the current P2 source code. Something like the serial circuitry could probably be done independently. Thanks for clarifying.
I am wondering what is happening behind the scenes, why Chip is taking the time to make such radical design changes, which also invalidate several (stable) software developments made until now.
I certainly wouldn't say that these changes are likely to invalidate work done on PropGCC. I was just saying that some effort will have to be expended to bring it back in line with the new instruction set once it stabilizes. I don't know the extent of that effort yet but I don't expect it to be huge.
For what it is worth, I very much like the idea of bringing out internal COG states to pins for debugging purposes. As an ATE builder I have spent many hours trying to convince product designers to incorporate testability features into their designs (PCB, software, ICs) to lower test time and cost, both during design verification and production test. It is an elementary form of making things observable. And it is one of the handy features of designing with FPGAs: it is very easy to bring out internal states to pins and observe them with a logic analyzer.
@Heater, I am not too worried about SETTRACE. I think Chip will keep it simple and obvious. The other features mentioned may not even be considered. The core functionality, dumping of COG states, seems straightforward enough and useful. Other spiffy uses will likely remain in the domain of software.
I really do like it as an answer to the debugging exception. Having that instruction does mean a P2 can deliver runtime details. Package that up with some cool software, and I'm thinking a clever SPIN wrapper with assembly snippets used to capture and present that info to users, and suddenly we have a visual COG that will compete very nicely.
Debugging a core piece of PASM will seriously benefit. Arguably, P1 is simple enough to think through. P2 is more complex with the pipeline and task capability. You posted a fun little snippet of PASM that took considerable discussion to sort out. And it never really did get sorted for advanced cases. SETTRACE would improve that significantly, and it would do so with the most basic dump state functionality described so far too.
The can of worms has two core directions IMHO. One is extending what the instruction can do. I don't think that one will be a worry, for the reasons already given. The other is abuse of the instruction. Not sure about that case, other than that it can be discouraged.
If it is kept simple, I see it as an easy net gain.
Bill & Ahle2 - would you like to start a new thread to discuss possible simplification of these while Chip proceeds???
I leave it to Chip. If he really thinks it's worth it, I am sure he will come up with the best solution. I really want the P2 to be finished as soon as possible, so I will not cause any more distractions.
I think many of you are misinterpreting the instruction changes that Chip is making. Each instruction is a set of Verilog statements (IIRC he is using Verilog rather than VHDL). What Chip has done (apart from adding a few new instructions) is "regularise" the instruction set by taking the relatively unused "R" bit and using it as a 7th opcode bit. This actually makes the compilers (and my disassembler/debugger) simpler and more regular. It makes documenting instructions easier. The change, while certainly not as small as we would have liked, is a huge benefit for the P2. It also made available lots of new variants of existing instructions, as well as new ones. As Chip says, most of these are simple Verilog statements.
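To make the "R bit becomes a 7th opcode bit" idea concrete, here is a minimal Python sketch. The field positions follow the published P1 encoding (6-bit opcode, then the Z, C, R, I flags, a 4-bit condition, 9-bit destination and 9-bit source). How the seventh bit is actually placed in Chip's new encoding is not stated in this thread, so `widened_opcode` simply appends R below the six opcode bits as an assumption, and both function names are made up for this example.

```python
def decode_p1_fields(instr):
    """Split a 32-bit P1-style instruction word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # 6-bit opcode
        "z": (instr >> 25) & 1,          # write Z flag
        "c": (instr >> 24) & 1,          # write C flag
        "r": (instr >> 23) & 1,          # write result
        "i": (instr >> 22) & 1,          # source is an immediate
        "cond": (instr >> 18) & 0xF,     # condition code
        "dest": (instr >> 9) & 0x1FF,    # destination address
        "src": instr & 0x1FF,            # source address or immediate
    }


def widened_opcode(instr):
    """Assumed layout: treat R as the low bit of a 7-bit opcode."""
    f = decode_p1_fields(instr)
    return (f["opcode"] << 1) | f["r"]
```

With a flat 7-bit field like this, a decoder can index one 128-entry table instead of special-casing the R bit per instruction, which is the kind of regularity described above.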
Chip also implemented some instruction changes that were on his list if the synthesis had to be redone. He has done this AFAIK. Also I think he separated the INx and OUTx registers from PINx. I think we are all in agreement this was required after detailed discussions.
The far more complex parts of the P2 are the quad-long access to the hub, the clut/fifo, the video section, the counter section, and of course the pin configurations. None of these have changed. As I understand it, the change in die supplier has not changed the outer parts of the layout, which have already been done by Chip & Beau. The inner parts are the layouts (synthesis) done by the external team from Chip's Verilog code.
For verification and testing, Chip decided he needed a simple external set of pins he could look at on his logic analyser. This resulted in the SETTRACE addition. He required this for testing, but then thought (as he does) this would be a great debugging tool, so made it an instruction to turn on/off. Having done some work on P1 & P2 debuggers, this is magical!
Rightly so, some of us realised this could impact the P2 security and just reminded Chip in case this had been overlooked.
I have asked for a simple addition (when in this mode) to allow an external pin to "force a stall" in the instruction pipe. This would give us single stepping. There are a lot of caveats for its use, but what a debug facility! And remember, on the P2, we will have other cogs that can do the debugging, making this a really powerful debugger addition. IMHO it would be far superior to JTAG.
Just before Chip started on the instruction changes, he implemented a "novel" UART with start/stop bits and 8 or 32 bits and an optional 4 bit addressing scheme. This was greatly appreciated, except that the start/stop bits prevented its use as a simple SERDES. Chip left us to discuss what we were after while he proceeded with the onerous task of implementing the instruction changes (complex because he also had to modify the pnut assembler/compiler, make some simple changes to the ROM monitor, and write new test programs - all for 1 person).
I won't discuss the SERDES here because that is on another thread, except to say 2 basic instructions would help - one to do NRZI bit banging, and one to do a CRC16 bit calculation. These would help immensely, particularly with USB FS.
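For readers unfamiliar with the two operations, here is a minimal software sketch in Python of the work such instructions would speed up. It assumes USB-style NRZI (a 0 bit toggles the line, a 1 bit leaves it unchanged) and the USB data-packet CRC16 (polynomial x^16 + x^15 + x^2 + 1, initial value 0xFFFF, result inverted). Bit stuffing is left out, the starting line level of 1 is an assumption, and the function names are made up for this example; nothing here reflects how Chip would implement it in hardware.

```python
def nrzi_encode(bits, level=1):
    """Encode data bits to line levels: 0 toggles the line, 1 holds it."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, level=1):
    """Recover data bits: no change in level means 1, a change means 0."""
    bits = []
    for cur in levels:
        bits.append(1 if cur == level else 0)
        level = cur
    return bits


def crc16_usb(data):
    """Bitwise CRC16 as used on USB data packets (LSB first)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0xA001 is the bit-reversed form of the polynomial 0x8005
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF
```

Both loops do a handful of shifts and XORs per bit, which is why doing them in software at USB FS bit rates is costly and why single instructions for them would help.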
To summarise... Yes it's a big change, but IMHO the risks and delays are well worth it. The P2 is already overdue so we really need the BEST chip possible. So let's give Chip the air he needs to get it done.
See post #2747 in this thread. (No guarantee that nothing has changed in the meantime, Chip is suspiciously quiet)
Andy
I saw that post but I wasn't sure it was really the final instruction set. There has been a lot of talk since then that could have resulted in further changes. I'd rather not have to update PropGCC more than once if I can help it.
Comments
Ditto
http://www.youtube.com/watch?v=bpj0t2ozPWY
C.W.
I trust in Chip (and Ken) to do the right thing.
/Johannes
Hear, hear