8086 Story - heater correct!
Cluso99
Posts: 18,069
Today's story fills in the gaps and vindicates heater's assertion that the 8086 was developed in a rush because the i432 (iAPX 432) project was running late and ultimately failed.
eejournal.com/article/fifty-or-sixty-years-of-processor-developmentfor-this/
Thought your memory was correct, heater.
Comments
I do recall having the i432 data book back in 1983 or so and not understanding it at all. Far too complex for this 8085, 6502, 6800, 6809 boy.
No. The story I told about the x86 panic design comes from recently watching a YouTube video by an old Intel hand. He said it took 12 weeks to design the instruction set. Patterson here tells it as only four weeks! Both stories agree that Intel insiders knew the x86 was a "dog".
The amazing thing for me about all this is that John Hennessy and David Patterson, the lords of processor design, the RISC crusaders who inspired the likes of the MIPS, SPARC and ARM processors and now RISC-V, are second in the list of references in this Wikipedia article:
https://en.wikipedia.org/wiki/ZPU_(microprocessor)
Who is 4th in that article's reference list? Why, that would be me. A link to this very forum.
I am amazed and honoured to be in such company.
Small world, isn't it?
Sadly I missed Patterson speaking at the Maker Faire in San Mateo last year.
-Phil
I'd always thought of it that way. The moment pipelining was added, things changed.
It's amazing what can be achieved when there is unlimited money to throw at the problem.
You can still boot MS-DOS and FreeDOS etc.
When you switch to 16-bit protected mode (286), 32-bit mode (386 and up) or 64-bit mode (AMD64 onwards) the instructions look much the same, in assembler source code and in the binary opcodes. Except now the registers get bigger and there are more of them, and so on.
If I understand correctly the AAA instruction (ASCII Adjust for Addition) did actually, finally, get dropped in 64 bit mode (amd64). That instruction dates back to the DAA of the 8080!
Thing is, the processor hoovers up all that CISC x86 instruction set junk and converts it, on the fly, into simple RISC-like micro-ops, which it can then pipeline easily, reorder, dispatch in parallel and run with all that speculative execution etc.
But, yeah, "life support" sums it up.
We only ever needed that life support so that we could carry all that closed source software from Microsoft and others to the new machines as they came along.
Today, in the Open Source world, moving forward is only a recompile away. So x86 looks ridiculous. RISC-V is the way to go.
Edit: Hmmm....what do we mean by x86 instruction set? Intel has added one instruction per month since the beginning. Have a look at the graph Fig 2.43 in the Patterson book:
https://books.google.fi/books?id=DMxe9AI4-9gC&pg=PA168&lpg=PA168&dq=x86+instructions+added+per+month&source=bl&ots=w-JeV4gJfI&sig=cmDpPjidcNNacjbYUz9l2V0hbTM&hl=en&sa=X&ved=0ahUKEwjJnebK0pLaAhVhM5oKHb7rC9YQ6AEIeDAH#v=onepage&q=month&f=false
There are over a thousand x86 instructions now. Insane!
Not just money though. Transistors. This insanity was made possible because of the exponentially rising number of transistors one could economically throw at the problem. Moore's Law and all that.
Now that Moore's law is at an end it's time to rethink the problem of moving processor performance forward.
To keep going down this path, though, there is a need to merge the 2nd and 3rd level caches, so that immediate access and shared access are combined. This should allow data duplication within the general cache where suited, while also providing the full cache space for unique data.
So, shift from on-chip multilevel caching to an on-chip multilevel switch fabric with general cache blocks instead.
But what if it had turned out to be physically impossible to shrink the transistor further after 1990 or so? Then throwing money at the problem would not have helped. It would be like investing in time travel or faster-than-light communications. Soon we would have migrated to ARM or a similar architecture. The first ARM used about the same number of transistors as the x86 and ran 10 to 20 times faster at a lot less power consumption.
Nope. One of Patterson's main points in the presentation linked to in the OP is that simply throwing cores at the problem does not work.
Have a look at the processor performance progress graph in the article. It became physically/economically impossible to keep increasing clock speeds so rapidly once Dennard scaling ended in the mid-2000s, so we started throwing cores at the problem instead, which we could because Moore's Law kept providing more transistors. But that hits a limit when Amdahl's Law kicks in around 2010-15 and progress starts to flatten out.
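To put rough numbers on that Amdahl's Law limit (my own back-of-envelope sketch, not from the article): if a fraction p of a program can run in parallel, the best speedup on N cores is 1 / ((1 - p) + p/N). Even with 95% of the work parallelisable you only get about 15x from 64 cores, and it tops out near 20x no matter how many more cores you add:

#include <stdio.h>

/* Toy illustration of Amdahl's Law: speedup = 1 / ((1 - p) + p/N),
   where p is the parallelisable fraction and N the number of cores. */
static double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    const double p = 0.95;                       /* assume 95% parallelisable */
    const int cores[] = { 1, 4, 16, 64, 256, 1024 };
    for (int i = 0; i < 6; i++)
        printf("%4d cores -> %5.1fx speedup\n", cores[i], amdahl(p, cores[i]));
    return 0;     /* tops out near 1/(1 - p) = 20x, however many cores you add */
}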
Now Moore's law is breaking down and we are not able to physically/economically double the transistor count every two years. Progress is almost flat.
I think we are seeing more focused processors, instead of the general PC desktop one.
Here is one example, where NVIDIA is claiming a 10x gain. That's hardly 'almost flat'.
https://www10.edacafe.com/nbc/articles/1/1575472/NVIDIA-Boosts-Worlds-Leading-Deep-Learning-Computing-Platform-Bringing-10x-Performance-Gain-Six-Months
Update: Some searching indicates that an early language for the iAPX 432 was OPL, Object Programming Language, which was based on Smalltalk. On the other hand a 1992 book claims that the CPU was designed for Ada. I'm still having some issues with that, given that Ada didn't yet exist when the iAPX 432 project started.
However the ideas in Moore's original paper are a bit more subtle than the way "Moore's Law" is normally stated; they were a real observation. Basically he said that, given the state of the technology available, there is an economically optimal number of transistors that can be integrated onto a chip. Build chips with fewer transistors than that optimum and you are not using the technology to full advantage, so the price per transistor goes up. Build chips with more transistors than that optimum and you hit yield and process problems, and the price per transistor also goes up.
That factual observation is more like a law, a law of economics. It then poses the question: What can we make with that optimal number of transistors to get the best bang for the buck?
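As a toy illustration of that U-shaped cost curve (made-up numbers of mine, not Moore's): if a processed wafer costs a fixed amount and the yield of good chips falls off as you pack more transistors onto each die, the cost per good transistor has a clear minimum:

#include <stdio.h>
#include <math.h>

/* Toy model of Moore's cost argument (made-up numbers, illustration only):
   a processed wafer costs a fixed amount and the yield of good die falls
   off exponentially with transistor count, so the cost per good transistor
   is minimised at some intermediate level of integration.                 */
int main(void)
{
    const double wafer_cost = 1000.0;   /* arbitrary units per wafer          */
    const double n0 = 50000.0;          /* transistor count where yield = 1/e */

    for (double n = 10000.0; n <= 150000.0; n += 10000.0) {
        double yield = exp(-n / n0);                     /* fraction of good die */
        double cost_per_transistor = wafer_cost / (n * yield);
        printf("%7.0f transistors per chip -> %.5f per transistor\n",
               n, cost_per_transistor);
    }
    return 0;   /* the minimum lands at n = n0 in this model */
}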
He did mention in passing that, oh by the way, that optimum transistor density seems to be doubling every two years or so.
This is kind of deep. For a couple of reasons:
If you have that insight then clearly you can invest money into starting development of things that you cannot build today but you know can be built when the design is completed. That might give you an advantage over your competitors who are stuck designing only what can be built today.
Also it pushes one towards building processors: the best way to get the most bang per buck out of the available transistors.
I agree, "Moore's Law" has been misstated and misused by the media and marketing for ages.
As for the "limit on multiprocessor options". I think the feeling is now that throwing more x86, or ARM or whatever, cores at the problem is not going to get us the performance gains at the rate we have been used to for decades. Certainly that is what Patterson is arguing.
Interestingly Patterson, one of the fathers of the RISC idea, is not totally stuck on RISC. What he talks of now is "domain specific" processors. Customize your core to tackle the particular job rather than have lots of the same general purpose core. That is almost the opposite of RISC as we know it. That is why Patterson is now heading up Google's TPU neural net processor design.
Any DMA engine is domain specific, so there has been much of that all along. The ratios vary with target audience though. Unified GPU is a good example, most web surfers are happy (oblivious even) with one, while both IT and PC gamers hate them. It's a big chunk of the die that could be useful for more CPU.
I still see fat, general CPUs being a thing.
But when we want performance for specific things specialist architectures or specialist extensions to the general purpose core will get used more and more. This is already the case in mobile phones. They have a bunch of different processors and chips crunching on things in there.
Who wants performance today? Well, still the graphics and gamer guys. But now the bitcoin miners, see how they developed their own custom hardware, throwing Intel or other general purpose cores at it is not a good solution. What about the deep learning guys? Same story.
Like you say, the ratios vary with target audience.
What has changed is the ease of contract manufacturing and of course the number of transistors to work with.
To maintain the performance outcome of Moore's Law, designers must now think about using those available transistors in a much better way. IOW they are going to need to think rather than just throw transistors at the problem, something not really done over the past 30 years or so. It's just like how assembler programmers could squeeze much more out of the early processors in less code space.
I still believe we will see the increase in transistors on the chip, but the extras will be used differently. I expect there to be a new layer set supporting a layer of RAM. That could add a GB or more directly accessible to the CPU(s) at chip speed, and the power increase for the chip would be negligible. Then we wouldn't need all those levels of cache, so those complexities, and the transistors (and space) implementing both the cache and its support logic, could be retargeted.
I also expect we will see further breakthroughs in the way transistors are built, along the lines of the FinFET. This will reduce the footprint size and leakage, and tweak the speed, though it might require more layers to achieve. Design engineers are going to have to think for a change!
Also, I don't expect RISC-V to be the saviour. We'll see other changes too. It's great to provide single-clock execution for simple instructions, and to have a number of them execute in parallel too. But we've spent way too much time on floating point arithmetic; it's time to improve fixed point maths. Perhaps we need 64-bit add/sub/mul/div instructions, with MUL taking two 64-bit operands and giving a 128-bit result plus a register containing info about the result (like how many leading zeros it has), and DIV taking a 128-bit dividend and a 64-bit divisor and giving a 64-bit result and a 64-bit remainder. Then we don't have overflows to deal with either. I worked on an assembler mini from the seventies that did this using 10-digit decimals.
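Just to make that concrete (my own sketch, using the GCC/Clang unsigned __int128 extension as a stand-in for the proposed instructions), the multiply-with-leading-zero-count and the 128/64 divide would behave something like this:

#include <stdint.h>
#include <stdio.h>

/* Sketch of the proposed fixed-point primitives, emulated in C with the
   GCC/Clang unsigned __int128 extension (stand-in for real instructions). */

/* 64 x 64 -> 128-bit multiply; also reports leading zeros of the result. */
static unsigned __int128 mul64x64(uint64_t a, uint64_t b, int *leading_zeros)
{
    unsigned __int128 product = (unsigned __int128)a * b;
    uint64_t hi = (uint64_t)(product >> 64);
    uint64_t lo = (uint64_t)product;
    if (hi)
        *leading_zeros = __builtin_clzll(hi);
    else
        *leading_zeros = 64 + (lo ? __builtin_clzll(lo) : 64);
    return product;
}

/* 128 / 64 divide giving a 64-bit quotient and 64-bit remainder.
   Caller must ensure hi(dividend) < divisor, or the quotient overflows. */
static uint64_t div128by64(unsigned __int128 dividend, uint64_t divisor,
                           uint64_t *remainder)
{
    *remainder = (uint64_t)(dividend % divisor);
    return (uint64_t)(dividend / divisor);
}

int main(void)
{
    int lz;
    unsigned __int128 p = mul64x64(0x123456789ABCDEF0ULL, 1000000000ULL, &lz);
    uint64_t rem;
    uint64_t q = div128by64(p, 1000000000ULL, &rem);   /* gets the first factor back */
    printf("leading zeros: %d, quotient: 0x%llx, remainder: %llu\n",
           lz, (unsigned long long)q, (unsigned long long)rem);
    return 0;
}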
They also are using the latest memory technology to get really wide/fast throughput (orders of magnitude above current CPU levels: approaching terabytes/s versus the tens of gigabytes/s a CPU gets).
You can program them in a C-like language too.
Wow, I did not know that at all. Thousands of cores running in parallel on a GPU. Interesting. A C-like language, hmm. Can you maybe give me some keywords for googling?
I remember from other posts of yours that you have experience in game programming. As a COBOL programmer I am usually very far away from even simple graphical environments, but I would be interested for personal fun.
I will start with 'Volta full set', no clue what that is yet...
Wrong, 'CUDA' is what one has to search for. Interesting.
Mike
To be fair, many of those thousands of cores are fairly specialized/limited on their own. However, they have been expanding their abilities a lot to cater to the machine learning, scientific computing, and similar crowds. The Volta architecture (another decent search term) is the latest and greatest (very expensive), with expanded general computing ability.
I am a programmer for a game dev, yes. I have done a fair bit of graphics stuff, including writing shader programs in HLSL (which has expanded a lot in recent revisions). HLSL is C-like also, and it will work on pretty much any GPU. Graphics shaders generally work on textures, models and whatnot for doing graphics rendering; they have a semi-rigid flow to them that fits the graphics rendering pipeline. We also have what we call "compute shaders", which are more general computing in nature. They are written in HLSL also, but with them we have more generally useful memory access ability and no graphics pipeline flow restrictions. Most modern games will mix the two types of shaders, depending on the needs.
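Since CUDA is what you'll be searching for, here's a minimal sketch of the same idea there: one tiny function run by thousands of GPU threads, one array element per thread (just a toy, details vary by GPU and toolkit):

#include <cstdio>
#include <cuda_runtime.h>

// Minimal CUDA sketch: each GPU thread handles one array element,
// much like a compute shader thread handling one item.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)                                       // guard the tail
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;                      // about a million elements
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));   // unified memory, visible to CPU and GPU
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    saxpy<<<blocks, threads>>>(n, 3.0f, x, y);  // launch thousands of threads
    cudaDeviceSynchronize();                    // wait before reading results on the CPU

    printf("y[0] = %f (expect 5.0)\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}

Compile that with nvcc and every element gets processed in parallel on the GPU; an HLSL compute shader is structured much the same way, just dispatched through the graphics API instead.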
https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_RISCV_Story_V2.pdf
The shader thing looks like fun. I'll have to remember the "Book of Shaders" for a rainy afternoon.
http://thebookofshaders.com
Roy, do you have any recommendations for "Shaders for Dummies" introduction?
I don't have any good recommendations, really. I started out in game graphics before we had GPUs, when everything was done in software. I learned each advancement into GPUs and shaders as they came out, using just the API docs, hardware vendor white papers and whatnot, and I can't recommend that path anymore.
I would suggest going to shadertoy.com, picking some random ones you find interesting looking, and reading the code. Try changing things and seeing the results. You ought to be able to sort out the syntax pretty easily just from looking at it. The Book of Shaders is an alright choice as well.
There's some pretty neat stuff on Shadertoy. Here's a working, playable (and slow, unless you have a discrete GPU) implementation of Minecraft, running entirely on the GPU, including player movement and world interaction:
https://www.shadertoy.com/view/wsByWV
Also, here's a program I wrote that graphs implicit equations and color functions of the screen (e.g. domain coloring, Mandelbrot sets, etc.) and other such things on the GPU, with animations. It's rather buggy; you'll probably find it easier to edit its savefiles in a text editor than to actually edit things inside the program. I've only tested it on my machine; it should work on any Linux machine. The README unhelpfully doesn't say this, but it needs GLFW 3, along with all the libraries it includes as submodules:
https://github.com/electrodude/gpugraph