The magic of INDx wrapping is that it works in the background with single-cycle MAC instructions. Even one more clock cycle would cut that DSP speed in half.
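For anyone who hasn't written one of these loops, here is a minimal Python sketch of the kind of circular-buffer multiply-accumulate that pointer wrapping serves. It is a software model only: the names `CircularFir`, `push`, and `output` are made up for illustration, and nothing here reflects the actual INDx hardware or P2 encodings. The explicit `% self.size` is the step that wrapping hardware would handle in the background; in software it is extra work on every tap.

```python
class CircularFir:
    """Software model of a FIR filter over a circular sample buffer.

    Hypothetical illustration only; not the P2's INDx mechanism.
    """

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.size = len(self.coeffs)
        self.samples = [0] * self.size
        self.head = 0  # index of the next slot to overwrite

    def push(self, sample):
        # Store the newest sample, then wrap the write index.
        self.samples[self.head] = sample
        self.head = (self.head + 1) % self.size

    def output(self):
        # One multiply-accumulate per tap, walking back from the newest sample.
        acc = 0
        idx = self.head
        for c in self.coeffs:
            idx = (idx - 1) % self.size  # explicit wrap on every tap
            acc += c * self.samples[idx]
        return acc


fir = CircularFir([1, 2, 3])
for x in (1, 2, 3, 4):
    fir.push(x)
result = fir.output()  # 1*4 + 2*3 + 3*2
```

If the wrap had to be its own instruction next to each MAC, the inner loop would be two operations per tap instead of one, which is where the "half speed" figure comes from.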
Let me think on that some. While I do, perhaps it's worth opening up that question. What is the P2? I mean really, what are the things we want to nail cold, so that it's just awesome, untouchable?
What I especially don't like is that Chip is resorting to taking out existing features to make room for new ones. Luckily, Ariba caught it in time for this one, but what have we lost in favor of some new feature that might not be used by anyone (or by very few)? It's scary.
I think this concern has been overstated quite a bit. Losing INDx wrapping would have been a real example of this, but that was thankfully stopped by Andy. I don't think anything else even moderately significant has been compromised to accommodate new features. In other cases, things like hub exec are a huge improvement over executing from the WIDEs. And IJZ/IJNZ were never used. They might have been more practical in an 8-bit system, but with 32 bits the occasions to use numbers backwards have proven to be about nil.
I think my sausage-making process is what's making people upset, more than the sausage itself.
I agree with that right now. I don't think we've lost much, but I must confess, I don't know. And I don't know how much it matters.
One reason I don't know, is Chip's question about where we should be. The feature set is large now, and we've added a lot of niche optimization cases. I felt good about a lot of those, particularly those for GCC, which needs to sing on this chip, because it's big enough to really take advantage of it.
Another reason I don't know is we've not yet written much software against what we do have. So we think we know where the optimal paths are, and there is considerable skill here, and I trust that to be more right than wrong. But, we've not really given the great ideas plugged in so far any kind of a test drive, and that makes the whole "what is worth what?" discussion from here hard, because some of the things we might identify could really use some silicon, and we have a couple features like the SERDES pending. Most of us feel that one matters a lot too.
But, we might find a little bit of software gets things done just fine, and once we've got that software we can reuse the Smile out of it too, but we can't go back and redo silicon where we traded something away that made software a PITA, or performance non optimal. And we can improve on software too. But too much hardware will limit that some, as things get more rigid, less flexible, or we start to ignore things we thought were a big deal...
Like Roy, I'm not a fan of preemptive multitasking, but then again, whatever, just as he said.
Some of the stuff can be ignored too, and that's another case where we could balance silicon features with documentation or tools to help present the best capabilities and encourage the ease of use.
If I were to say anything, I think we've run a bit astray on "do it in software".
Chip, I think the sausage making process is what it is. You are sharing it, and mostly, I think everybody is doing a fine job participating, though I do think it's awful tempting to engineer something as opposed to evaluate whether or not it should be engineered, but then again, we've got a thread full of engineers too!
I'm not upset, though I can't speak for others. As I just mentioned, I'm mindful of the balance of doing things in software as opposed to hardware, and unclear on what is worth what. Without that, it's hard to evaluate whether or not something should happen. We all know it can in most cases, but should it?
jmg,
I think the "another neat thing" is in regards to preemptive multitasking, not debug stuff. It turns out there is overlap between what's needed for enhancing debugging and what's needed to do preemptive multitasking. I'm all for having better debug stuff built in; in fact, I would have preferred a full JTAG interface way back when if it was feasible (which it's not, so don't even think about it now, Chip!). I don't like the preemptive multitasking, but whatever.
I also rank Debug above Task Swapping, to the point where taking a little longer to swap tasks would be chosen, if it saved significant silicon.
Debug needs to allow Access and Edit of all Flags and PC, and display of as much chip information as is practical.
It also should support BreakPoints, Single Step and Step-into and Step over.
A P2 will find it easy to do live-watch of COG memory, that bit it could do already.
Access into some task-local registers, and flags is not so simple, so needs the Swap silicon support.
Chip,
How do you know IJZ/IJNZ were never used? The chip isn't even done and out yet with it as a feature. How could we possibly know what would and wouldn't be used of any of the features? There's only a handful of people trying to use the P2 in FPGA form, and even they haven't been able to do very much lately because of all the changes. I like everything that's been added up until preemptive multitasking (which could have already been done completely in software before, to meet anyone's practical needs). You still need to get the SERDES and USB/CRC stuff in; those will be very useful, and could be argued as required.
I understand that most of what's been going on has greatly improved things, but we still need to be careful, and we still need to get it done soon. Part of why I am resorting to harsher tones and "overstating" things is because milder stuff seems to have been ignored.
I think now is the time to say that we should save all this mad frenzy to implement cool new thing X for the next round (P2.5 or P3, or whatever), and get P2 shipped.
Roy
I can assure you that NONE of the changes to P2 and the FPGA releases have stopped me and I'm sure the OTHERS from doing anything.
No momentum shift here!
Brian
To really orient debugging properly, it needs to be done from another cog. That cog has to have the ability to view into the target cog, step it, etc. Shy of that, we have a rather impure circumstance where the target cog must do debugging on itself, not allowing itself to be wholly what it would have been without the internal accommodations for the debug stuff. If I had time, I would certainly pursue this, but I do feel that is a bit much to jump on right now. So, what we have is adequate for grease monkeys like ourselves, but it's not shrink-wrapped like customers may expect it to be.
I had just noted that it didn't ever seem useful to me and others didn't see much value in it, either. It was an instruction that went into the Prop2 just because there was opcode space and it only cost a few gates to add. It was a near freebie.
ozpropdev,
You are not using the version that Chip has, which has significant changes from what I have been reading. We can't really call anything tested until significant changes stop.
Sapieha,
I was going to reply to you, but I never really know if I've understood you correctly and it often seems quite negative, so I am just going to ignore your comment.
Chip,
Are you not concerned about how little time is left, and how much you already have on your plate to finish? You keep adding more to your plate. Is it really not that urgent? The impression I get from Ken is that it is very urgent.
Chip,
You're probably right, IJZ and IJNZ would probably go unused by most, and it's probably fine that they were dumped.
Regarding debugging, unless you can truly stop the state of the whole chip (hub, cnt, etc.) when breakpointing and single stepping, then you are always going to have caveats while debugging. Like what happens when you break on a waitcnt? Does it stop immediately, or only after the waitcnt is satisfied? When you stop a single cog and single step it, is the hub still cycling around at full speed? What happens when you step over a hub instruction? What about the counters? Will they keep going full on while the cog execution is on a breakpoint or single stepping? Debugging is more than just being able to see the memory and flags, and really on the P1 or P2, you can't feasibly do proper debugging.
I'm working day and night to get this wrapped up. I'm actually somewhat tempted to skip the task save/restore and get onto the USB/CRC/SERDES stuff, but I need to think about it some more.
EDIT: NEVER MIND THIS QUESTION. THOSE MUXES NEED TO BE THERE FOR RDWIDEA/B AND WRWIDEA/B.
I've got another question for everybody:
Mapping the WIDEs into register space has always felt like kind of a kludge to me because they don't affect the background registers, but float on top. Their mapping-in is bumping the critical path, too. It takes tons of gates to map those 256 bits into register space - for both D and S. And what else... it takes 3..11 clocks to get them read (RDWIDE) and mapped, so if you scan them and go to read another wide, you've already missed the hub window. Mapping the WIDEs sounds, at first, like a good idea, but it's kind of a half-cocked stuttering mess.
Now that we have RDWIDEA/B and WRWIDEA/B, we can read in or write out as many longs as you can handle, at one per clock. They go into real registers, too, not into some floaty thing. This is like beef compared to those puffed rice cakes that have no flavor, but jack your blood sugar up.
What if I just got rid of WIDE mapping and we use RD/WRWIDEA/B when we want some solid hub data?
I can live with that.
The only thing I would miss is the SETWIDZ instruction which was a handy way of "zapping" eight longs!
My concerns are twofold...
1. I don't really see the requirement for the latest additions, and no one has really said that they will actually use them commercially.
2. We almost lost some valuable instructions. What is most worrying is how quickly they were nearly lost. I am only just reading this. I was last online 12 hours ago. Who else missed commenting? Thanks to Ariba. Perhaps Phil may have noticed, but I don't see him here a lot. BTW I would not have seen the requirement for these instructions.
Roy
I can assure you that NONE of the changes to P2 and the FPGA releases have stopped me and I'm sure the OTHERS from doing anything.
No momentum shift here!
Brian
They have stopped me because I can't really do much of anything with propgcc until I get a stable instruction set and encoding. Not to say all of these changes aren't good. I'm just waiting for them to settle out before diving in again.
I've been staying away from the PropII because of the changing instructions and encoding.
Once the smoke clears I'm going to jump on for testing and early designing. : ]
Yes, in your case David that is a problem.
My point was that the updates in the past 6 months or so have not stopped testing.
I also believe the testing thus far has not been a waste of time.
I do sympathize with your situation though.
Brian
I realized after thinking about this that it isn't in the spirit of continuous incremental development which seems to be the model that Chip is following. However, when I see statements like this:
It turns out we've got some opcode space now, after I rearranged things a little.
It makes me wonder how much of the instruction parser/encoder I'll have to change to accommodate the rearrangement. Maybe if Chip would post a new instruction list after each of these rearrangements it might be possible to track his changes more in real time.
As Chip made changes to instruction encodings he updated Pnut to match. Simply update the FPGA and run the latest Pnut and away you go!
That's the keyword here... "Incremental"
That would be fine if these "increments" didn't also change what is already there. They don't at a source level but they do at the binary encoding level. All I was asking is that a new instruction list be provided with each of these rearrangements. I guess it probably isn't worth the time it would take Chip to do it though since I may be the only consumer of these incremental lists. Most people will only need the instruction lists that correspond to actual FPGA releases.
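For what it's worth, if instruction lists were posted after each rearrangement, tracking them could be as simple as diffing two mnemonic-to-opcode tables. Here is a rough Python sketch of that idea. The function name `diff_encodings` is hypothetical and the opcode values below are invented purely for illustration; they are not real P2 encodings, only the mnemonics come from this thread.

```python
def diff_encodings(old, new):
    """Compare two mnemonic -> opcode tables and report what moved.

    Hypothetical helper; the tables are assumed to be plain dicts.
    """
    common = old.keys() & new.keys()
    changed = {m: (old[m], new[m]) for m in common if old[m] != new[m]}
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    return {"changed": changed, "removed": removed, "added": added}


# Invented opcode values, NOT real P2 encodings.
old_list = {"IJZ": 0b1110000, "SETWIDZ": 0b1110010, "RDWIDE": 0b0001000}
new_list = {"RDWIDE": 0b0001001, "RDWIDEA": 0b0001010}

report = diff_encodings(old_list, new_list)
```

With a report like this, an assembler or compiler back end that keeps its encodings in a table only needs the "changed", "removed", and "added" entries applied, rather than a full re-read of the instruction list.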
Was one of the reasons in removing FIXINDx to gain some opcode space?
No. It turns out we've got some opcode space now, after I rearranged things a little.
There will always be some who complain, no matter what you give them.
Okay. I made it up.
I had just noted that it didn't ever seem useful to me and others didn't see much value in it, either. It was an instruction that went into the Prop2 just because there was opcode space and it only cost a few gates to add. It was a near freebie.
There you have it, a clue that the P2 will in fact be called the "Chipolata"
The only thing I would miss is the SETWIDZ instruction which was a handy way of "zapping" eight longs!
I would leave an operandless instruction to clear the WIDEs. It can be called CLRWIDE.