Itanium still exists. SGI moved to it from MIPS, which just didn't make it to GHz clocks. SGI optimized the Smile out of MIPS, getting some serious performance out of the lower clocks and larger caches.
Itanium was a great fit for all the compiler smarts SGI had, and Intel added the signals needed for their NUMA, kilo-CPU-and-up multiprocessing designs.
If you want a few thousand CPUs, one OS, big memory and I/O, with nuts FP tossed in, Itanium is a solid choice. Were it not for SGI, Intel might have taken a big loss.
But anyway, we shouldn't try to solve problems we may only have in some (maybe remote) future... it has so much memory that we can use it fully today, and that is good.
I find it hilarious how short posts of mine are always noted. You know what they say about great minds... I liked your last few posts, BTW. Some meat in there. Never a bad thing.
(or, if they are, we have scroll bars for that use case!)
Frankly, I'm on shorter post mode because I'm about to change jobs and it's been a little bit of an adventure getting the new gig sorted out. Small club in my niche, and couple that with a role change and I've not had much in the way of productive P2 time, so I've not had too much to say. Of course, all of that makes me cranky too, so less is more.
That will all change here in a coupla weeks, and I think my rate of increase on grey hair will see a nice decline that I've needed for a bit.
... I liked your last few posts, BTW. Some meat in there. Never a bad thing.
Thanks, much appreciated. I know it would have been spouted somewhere else on the Web, but it's something I've never actually seen written, so I've been wanting to get it off my chest.
Frankly, I'm on shorter post mode because I'm about to change jobs and it's been a little bit of an adventure getting the new gig sorted out. Small club in my niche, and couple that with a role change and I've not had much in the way of productive P2 time, so I've not had too much to say. Of course, all of that makes me cranky too, so less is more.
That will all change here in a coupla weeks, and I think my rate of increase on grey hair will see a nice decline that I've needed for a bit.
Good to hear. Best of luck. Give us the good news when you're settled.
Programs that relied on MSDOS were very important to the industry for a long time. In effect, the industry generically demanded backwards compatibility.
Very true. The industry demanded binary compatibility, as I noted above. However, that is not actually what they wanted. All they wanted was the ability to continue to run their programs and get jobs done as new, faster, bigger machines came along.
Binary compatibility is one way to do it. And sadly the only way at the time. Had we lived in an open source and Free Software world, moving forward and changing architectures would have been only a recompilation away. As it is today in the Linux world.
Sadly, following Bill Gates' infamous open letter to hobbyists, binary distribution became the norm. At the time it was not even clear that you could have copyright protection on a compiled binary; it's just a number, right? The whole world saw an opportunity to make big bucks from that, and a bit of lobbying got the laws in place to make it possible.
We are still suffering from that.
It's true that the switch to 64 bits happened much quicker than the switch to 32 bits. MS had built a "proper" OS with APIs and drivers and protected processes etc. It was built with C/C++ instead of assembler. They already had experience building Windows for other machines: DEC Alpha, Itanium. So when AMD introduced the amd64 architecture, support was (almost) only a compiler switch away.
I agree with all you say about the importance of the GPL.
The GPL formed as a natural reaction to the current laws.
Depends what you mean by current laws. Richard Stallman conceived the GPL in 1989 as a reaction to the growing tide of closed-source proprietary software he saw at the time. He seems to have been the first to see how the use of such operating systems and programs ultimately takes control of your computer away from you. That's bad enough for individuals, but if you are a government or other institution, not having control of your computers and data is a rather seriously bad situation to be in.
... Binary compatibility is one way to do it. And sadly the only way at the time. Had we lived in an open source and Free Software world, moving forward and changing architectures would have been only a recompilation away. As it is today in the Linux world.
The Unix world had open source, not that it was called that yet, and decent APIs at the start; which is what RMS was initially trying to protect.
... Depends what you mean by current laws. Richard Stallman conceived the GPL in 1989 as a reaction to the growing tide of closed-source proprietary software he saw at the time. He seems to have been the first to see how the use of such operating systems and programs ultimately takes control of your computer away from you. That's bad enough for individuals, but if you are a government or other institution, not having control of your computers and data is a rather seriously bad situation to be in.
Yeah, you'd think governments would be more constructive towards their own needs at the very least.
By "current laws" I was painting a broad brush across the changes in laws covering the whole software patents/licensing trend for the last few decades.
I want to make a microkernel with the ability to have an arbitrary number of tasks, not just limited to HW tasks. I think it would open the door to some pseudo interrupt handling and great debugging as well. Not that a microkernel or interrupt handling is really needed on the P2... but it's just "fun and games" for nerds like me.
/Johannes
Some of us are already doing that quite nicely on the P1. I'm really looking forward to replicating it on the P2. With the new instructions and addressing modes, it should run like stink!
Please add this. It would be very useful for a debugger running in another task, so it could show the user the PC and flags of the task being debugged. It would also be good for running more than four tasks: one task, the kernel, would rarely get a turn (with SETTASK %%0120120120120123, the kernel is task 3) and would be in charge of swapping the others out. It would sit in a djnz i, $ loop for a while and then, when the time comes to swap tasks, it would SETTASK to claim all the turns of the task being replaced (e.g. SETTASK %%3123123123123123 to swap out task 0), use Ahle2's proposed instruction (GETPCZC dest, taskid?) to get and save the PC and flags of the old task, and then JMPTASK to the new task.
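To make the slot arithmetic above concrete, here is a small Python sketch of how a SETTASK-style schedule could be decoded: sixteen 2-bit task IDs packed into one 32-bit value, written in base 4 (the %% prefix). The helper names are illustrative, not actual P2 instructions, and the assumption that slot 0 is the lowest 2 bits may not match the hardware ordering.

```python
# Illustrative decode of a SETTASK-style time-slot schedule: sixteen
# 2-bit task IDs packed into one 32-bit value, written in base 4 (%%).
# These helpers are hypothetical; the real P2 walks the fields in
# hardware, and the slot ordering here is an assumption.

def decode_schedule(settask_value):
    """Return the 16 task IDs as a list, slot 0 (lowest 2 bits) first."""
    return [(settask_value >> (2 * slot)) & 0b11 for slot in range(16)]

def slots_for_task(settask_value, task_id):
    """Count how many of the 16 time slots a given task receives."""
    return decode_schedule(settask_value).count(task_id)

# %%0120120120120123: the kernel (task 3) gets one slot in sixteen.
kernel_sched = int("0120120120120123", 4)

# %%3123123123123123: task 0 gets no slots at all, so it can be swapped.
swap_task0 = int("3123123123123123", 4)
```

With these two values, tasks 0..2 each get five of the sixteen slots in the kernel schedule, and the swap schedule starves task 0 entirely so the kernel can safely capture and replace it.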
Additionally, could tasking be changed such that single-task mode is any time the task fields are all the same value, not just %00?
Can't we get the PC, Z, C, task# using the SETRACE instruction?
Only when it executes an instruction, but that's not practical for capturing and redirecting Z/C/PC.
I'll add an instruction which lets you poll any task for its Z/C/PC. I'm thinking maybe it will be possible to probe tasks even in OTHER cogs. Then you could have a trace debugger with no target cog's involvement.
I just realized that stopping a task and redirecting its Z/C/PC has a caveat, if you intend to return that task to its prior Z/C/PC later: Any original REPS/REPD in progress will have been interrupted and its state lost. When you reenter that original Z/C/PC state, any REPS/REPD will fall through. It might be a lot better to allow the task to switch itself, or signify when it's ready, since you'd be able to avoid this problem. Or, you could just not use REPS/REPD in this case. Storing the REPS/REPD state would take another long. Also, there's the task's PTRA/PTRB values that would need saving and restoring. I think the only way to cleanly handle complex task code would be to let it decide when the change can occur. Just some things to consider.
Debug usually has Step and Step-Over - would step-over work on REPx ?
Or, maybe the Debug SW could include REPx simulation, so it reloops on a Debug counter then exits.
It's not going to be real time when stepping, but you do want correct flags.
Some of us are already doing that quite nicely on the P1. I'm really looking forward to replicating it on the P2. With the new instructions and addressing modes, it should run like stink !
Cheers,
Peter (pjv)
I'm talking about preemptive multitasking, not cooperative.
I just found a bug in hub execution. The data-forwarding circuit for the opcode fetch was not checking to make sure bits 15..9 of the PC were 0's. This means that if you modified, say, register $040 and then fetched a hub instruction from $xx40 ($xx >= $02), the instruction fetched would be the newly-modified value for cog register $040, instead of the instruction in hub memory at $xx40. I just noticed this by looking at the Verilog. This would have been a real bear to diagnose if this problem were to be experienced. If any of you had strange, intermittent troubles with hub exec, it could have been this bug. This will be fixed in the next release. I checked the Verilog for other such problems, but didn't see any.
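The aliasing Chip describes can be illustrated with a toy model (Python, not the actual Verilog): forwarding a just-written cog register to the opcode fetch is only valid when bits 15..9 of the PC are all zero, i.e. when the fetch really targets cog register space. All names and addresses here are for illustration only.

```python
# Toy model (not the actual Verilog) of the hub-exec opcode fetch with
# the fix applied: forwarding a just-written cog register is valid only
# when bits 15..9 of the PC are zero, i.e. the fetch really targets cog
# register space ($000..$1FF).

def fetch(pc, cog_regs, hub_mem, just_written_addr=None, just_written_val=None):
    """Fetch the instruction at pc, modeling the corrected forwarding."""
    in_cog_space = (pc >> 9) == 0          # bits 15..9 must all be 0
    if in_cog_space:
        if pc == just_written_addr:        # forward the fresh write
            return just_written_val
        return cog_regs[pc]
    return hub_mem[pc]                     # hub exec: forwarding ignored

# The buggy version skipped the in_cog_space check, so a hub fetch from
# $0240 (low 9 bits = $040) wrongly returned the just-modified value of
# cog register $040 instead of the instruction in hub memory at $0240.
```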
I have a few questions about the prefetch mode. I assume it starts after an instruction cache miss is encountered, correct? Otherwise, the cache logic wouldn't know where to start prefetching from. So after the first instruction cache is loaded the next cache read is initiated for the next hub cycle, correct? What if the code performs a rdlong? Does this take precedence over the next cache read, or is it deferred until after the second cache read completes? Does the rdlong terminate the second cache read entirely?
Also, in the prefetch mode, does the processor only fetch one additional cache line, or does it fetch three more cache lines to fill all of the instruction caches? To me it seems like the non-prefetch mode is the more useful mode, and prefetching should be disabled by default when starting the cog. Of course the default doesn't matter too much since the code can do an explicit ICACHEP or ICACHEN before jumping to hub code.
I also have a question about the non-prefetch mode. How does the logic determine the "least-recently-used" instruction cache? In spinsim I timestamp a cache line with the system counter after every read, and then use the line with the oldest timestamp. However, I assume the actual hardware uses a different method.
Prefetch happens when executing from the hub and a hub cycle is available and the current instruction is not using it. What is prefetched is the next 8-long block from where the executing PC is - that's it. So, prefetch only happens on unused hub cycles. This is the default cache mode and can be disabled/(re)enabled by ICACHEN/ICACHEP.
The least-recently used algorithm works by keeping track, to 8 bits with saturation, of how many cache misses each cache line experienced, in sequence. If a cache line is hit or reloaded, its counter is reset to 0. This algorithm is used by both prefetch and non-prefetch modes.
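In other words, each line's counter ages on every miss and clears on use, and the victim is the line that has sat idle through the most misses. A software sketch of that policy for four lines follows; the class and field names are illustrative, not the actual logic.

```python
# Sketch of the saturating-counter replacement policy described above,
# for four cache lines. On every miss, each line's 8-bit counter
# increments, saturating at 255; a hit or a reload resets that line's
# counter to 0. The victim is the line with the highest count, i.e.
# the one least recently used.

NUM_LINES, MAX_COUNT = 4, 255

class MissCounterLRU:
    def __init__(self):
        self.tags = [None] * NUM_LINES   # which 8-long block each line holds
        self.counts = [0] * NUM_LINES    # misses survived since last use

    def access(self, tag):
        """Return True on a hit; on a miss, replace the stalest line."""
        if tag in self.tags:
            self.counts[self.tags.index(tag)] = 0
            return True
        for i in range(NUM_LINES):       # a miss ages every line
            self.counts[i] = min(self.counts[i] + 1, MAX_COUNT)
        victim = self.counts.index(max(self.counts))
        self.tags[victim] = tag          # reload resets the new line
        self.counts[victim] = 0
        return False
```

For example, after filling the four lines with blocks A, B, C, D and then touching A again, a miss on a fifth block E evicts B, the line that has gone unused through the most misses.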
Chip: Any idea when you might be able to provide an update to the P2 instruction list and an FPGA image that includes the LINK instruction (or whatever the instruction is called that stores its return address in location $000)? Also, how stable is the instruction encoding at this point? Did it have to change to add any of the recent features like the LINK instruction and the tasking enable/disable instructions? Do you anticipate any instruction encoding changes before P2 is frozen?
I created some bug earlier today and I'm trying to nail it down right now. It's taking almost 1 hour per compile.
The only thing that I see changing in the instruction set is possibly some SERDES-related instructions and then the USB-friendly pin operation. Maybe some CRC instruction(s), too.
Once I get this latest bug resolved, I should be able to put out an update - maybe this weekend.
Well, that certainly is a different animal. It will be interesting to see what you come up with; it certainly should be very useful!
Cheers,
Peter (pjv)
It's foremost for my own pleasure and not so much for the community. But if it turns out to be usable and people see a need for a mini OS/microkernel, then I might share!
As far as I can tell, the only thing needed apart from TLOCK, SETTASK and TFREE is a way of retrieving PC, Z and C from another task; then my "preemptive dream" might come true!
Something to consider: When a task is redirected by changing its PC via JMPTASK, any REPS/REPD or pending TLOCK is cancelled. So, if you redirect a task and then later restore the old Z/C/PC, it will not resume with REPS/REPD or the pending TLOCK, which could be fatal to the task's program. If you never used REPS/REPD/TLOCK in your tasks' code, there would be no problem. Is this going to work for you? Also, would you mind giving me a hint about what you will do with a task's Z/C/PC so that I'll be most inclined to add this feature? It sounded great at first, but I need to know that it's still great, considering these caveats.
Also, should I make JMPTASK set Z/C from S[31:30] so that they can be restored?
What happened?
When I see that a potatohead post has arrived, I sometimes put the kettle on, make a nice cup of tea and settle down for a good read.
Here I am, cup in hand, all set, and what do I get? One measly line!
That's just not a good start to my day :)
Additionally, could tasking be changed such that single-task mode is any time the task fields are all the same value, not just %00?
Actually, I already made that change a few weeks ago! I just forgot to mention it. I think the latest docs reflect this, though.
I'll add an instruction which lets you poll any task for its Z/C/PC. I'm thinking maybe it will be possible to probe tasks even in OTHER cogs. Then you could have a trace debugger with no target cog's involvement.
That would be neat!
Also, should I make JMPTASK set Z/C from S[31:30] so that they can be restored?
What is the behavior right now? Does JMPTASK clear them, or just leave them unchanged?
JMPTASK currently only affects tasks' PCs. The tasks' Z/C flags are unchanged by JMPTASK.