8086 Story - heater correct!
Cluso99
Posts: 18,069
Today's story fills in the gaps and vindicates heater's assertion that the 8086 was developed in a rush because the i432 (iAPX 432) project was running late and ultimately failed.
eejournal.com/article/fifty-or-sixty-years-of-processor-developmentfor-this/
Thought your memory was correct, heater.
Comments
I do recall having the i432 data book back in 1983 or so and not understanding it at all. Far too complex for this 8085, 6502, 6800, 6809 boy.
No. The story I told about the x86 panic design comes from recently watching a YouTube video by an old Intel hand. He said it took 12 weeks to design the instruction set. Patterson here tells it as only four weeks! Both stories agree that Intel insiders knew the x86 was a "dog".
The amazing thing for me about all this is that John Hennessy and David Patterson, the lords of processor design, the RISC crusaders who inspired the likes of the MIPS, SPARC and ARM processors and now RISC-V, are second in the list of references in this Wikipedia article:
https://en.wikipedia.org/wiki/ZPU_(microprocessor)
Who is 4th in that article's reference list? Why, that would be me. A link to this very forum.
I am amazed and honoured to be in such company.
Small world, isn't it?
Sadly I missed Patterson speaking at the Maker Faire in San Mateo last year.
-Phil
I'd always thought of it that way. The moment pipelining was added, things changed.
It's amazing what can be achieved when there is unlimited money to throw at the problem.
You can still boot MS-DOS and FreeDOS etc.
When you switch to 16-bit protected mode (286), 32-bit mode (386 and up) or 64-bit mode (AMD64 onwards) the instructions look much the same, in assembler source code and in the binary opcodes. Except now the registers get bigger and there are more of them, and so on.
If I understand correctly the AAA instruction (ASCII Adjust for Addition) did actually, finally, get dropped in 64 bit mode (amd64). That instruction dates back to the DAA of the 8080!
Thing is, the processor hoovers up all that CISC x86 instruction set junk and converts it, on the fly, into simple RISC-like micro-ops, which it can then pipeline easily, reorder, dispatch in parallel and run with all that speculative execution etc.
But, yeah, "life support" sums it up.
We only ever needed that life support so that we could carry all that closed source software from Microsoft and others to the new machines as they came along.
Today, in the Open Source world, moving forward is only a recompile away. So x86 looks ridiculous. RISC-V is the way to go.
Edit: Hmmm....what do we mean by x86 instruction set? Intel has added one instruction per month since the beginning. Have a look at the graph Fig 2.43 in the Patterson book:
https://books.google.fi/books?id=DMxe9AI4-9gC&pg=PA168&lpg=PA168&dq=x86+instructions+added+per+month&source=bl&ots=w-JeV4gJfI&sig=cmDpPjidcNNacjbYUz9l2V0hbTM&hl=en&sa=X&ved=0ahUKEwjJnebK0pLaAhVhM5oKHb7rC9YQ6AEIeDAH#v=onepage&q=month&f=false
There are over a thousand x86 instructions now. Insane!
Not just money though. Transistors. This insanity was made possible because of the exponentially rising number of transistors one could economically throw at the problem. Moore's Law and all that.
Now that Moore's law is at an end it's time to rethink the problem of moving processor performance forward.
To keep going down this path, though, there is a need to merge the 2nd and 3rd level caches, so that immediate access and shared access are combined. This should allow data duplication within the general cache where suited, while also providing the full cache space for unique data.
So, shift from on-chip multilevel caching to an on-chip multilevel switch fabric with general cache blocks instead.
But what if it had turned out to be physically impossible to shrink the transistor further after 1990 or so? Then throwing money at the problem would not have helped. It would be like investing in time travel or faster-than-light communications. Soon we would have migrated to ARM or a similar architecture. The first ARM used about the same number of transistors as the x86 and ran 10 to 20 times faster at a lot less power consumption.
Nope. One of Patterson's main points in the presentation linked to in the OP is that simply throwing cores at the problem does not work.
Have a look at the processor performance progress graph in the article. It became physically/economically impossible to keep increasing clock speeds so rapidly once Dennard scaling ended in the mid-2000s, so we started throwing cores at the problem instead, which we could because Moore's Law kept providing more transistors. But that hits a limit when Amdahl's Law kicks in around 2010-15 and progress starts to flatten out.
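To put rough numbers on that Amdahl's Law limit (my own back-of-envelope sketch, not from the article): if a fraction p of a program can run in parallel, the best speedup on N cores is 1 / ((1 - p) + p/N). Even with 95% of the work parallelisable you only get about 15x from 64 cores, and it tops out near 20x no matter how many more cores you add:

#include <stdio.h>

/* Toy illustration of Amdahl's Law: speedup = 1 / ((1 - p) + p/N),
   where p is the parallelisable fraction and N the number of cores. */
static double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    const double p = 0.95;                       /* assume 95% parallelisable */
    const int cores[] = { 1, 4, 16, 64, 256, 1024 };
    for (int i = 0; i < 6; i++)
        printf("%4d cores -> %5.1fx speedup\n", cores[i], amdahl(p, cores[i]));
    return 0;     /* tops out near 1/(1 - p) = 20x, however many cores you add */
}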
Now Moore's law is breaking down and we are not able to physically/economically double the transistor count every two years. Progress is almost flat.
I think we are seeing more focused processors, instead of the general PC desktop one.
Here is one example, where NVIDIA is claiming a 10x gain. That's hardly 'almost flat'.
https://www10.edacafe.com/nbc/articles/1/1575472/NVIDIA-Boosts-Worlds-Leading-Deep-Learning-Computing-Platform-Bringing-10x-Performance-Gain-Six-Months
Update: Some searching indicates that an early language for the iAPX 432 was OPL, Object Programming Language, which was based on Smalltalk. On the other hand a 1992 book claims that the CPU was designed for Ada. I'm still having some issues with that, given that Ada didn't yet exist when the iAPX 432 project started.
However the ideas in Moore's original paper are a bit more subtle than the way "Moore's Law" is normally stated; they were a real observation. Basically he said that, given the state of the technology available, there is an economically optimal number of transistors that can be integrated onto a chip. Build chips with fewer transistors than that optimum and you are not using the technology to full advantage, so the price per transistor goes up. Build chips with more transistors than that optimum and you hit yield and process problems, and the price per transistor also goes up.
That factual observation is more like a law, a law of economics. It then poses the question: What can we make with that optimal number of transistors to get the best bang for the buck?
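As a toy illustration of that U-shaped cost curve (made-up numbers of mine, not Moore's): if a processed wafer costs a fixed amount and the yield of good chips falls off as you pack more transistors onto each die, the cost per good transistor has a clear minimum:

#include <stdio.h>
#include <math.h>

/* Toy model of Moore's cost argument (made-up numbers, illustration only):
   a processed wafer costs a fixed amount and the yield of good die falls
   off exponentially with transistor count, so the cost per good transistor
   is minimised at some intermediate level of integration.                 */
int main(void)
{
    const double wafer_cost = 1000.0;   /* arbitrary units per wafer          */
    const double n0 = 50000.0;          /* transistor count where yield = 1/e */

    for (double n = 10000.0; n <= 150000.0; n += 10000.0) {
        double yield = exp(-n / n0);                     /* fraction of good die */
        double cost_per_transistor = wafer_cost / (n * yield);
        printf("%7.0f transistors per chip -> %.5f per transistor\n",
               n, cost_per_transistor);
    }
    return 0;   /* the minimum lands at n = n0 in this model */
}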
He did mention in passing that, oh by the way, that optimum transistor density seems to be doubling every two years or so.
This is kind of deep. For a couple of reasons:
If you have that insight then clearly you can invest money into starting development of things that you cannot build today but you know can be built when the design is completed. That might give you an advantage over your competitors who are stuck designing only what can be built today.
Also it pushes one towards building processors: the best way to get the most bang per buck out of the available transistors.
I agree, "Moore's Law" has been misstated and misused by the media and marketing for ages.
As for the "limit on multiprocessor options". I think the feeling is now that throwing more x86, or ARM or whatever, cores at the problem is not going to get us the performance gains at the rate we have been used to for decades. Certainly that is what Patterson is arguing.
Interestingly Patterson, one of the fathers of the RISC idea, is not totally stuck on RISC. What he talks of now is "domain specific" processors. Customize your core to tackle the particular job rather than have lots of the same general purpose core. That is almost the opposite of RISC as we know it. That is why Patterson is now heading up Google's TPU neural net processor design.
Any DMA engine is domain specific, so there has been much of that all along. The ratios vary with target audience though. Unified GPU is a good example, most web surfers are happy (oblivious even) with one, while both IT and PC gamers hate them. It's a big chunk of the die that could be useful for more CPU.
I still see fat, general CPUs being a thing.
But when we want performance for specific things specialist architectures or specialist extensions to the general purpose core will get used more and more. This is already the case in mobile phones. They have a bunch of different processors and chips crunching on things in there.
Who wants performance today? Well, still the graphics and gamer guys. But now the bitcoin miners, see how they developed their own custom hardware, throwing Intel or other general purpose cores at it is not a good solution. What about the deep learning guys? Same story.
Like you say, the ratios vary with target audience.
What has changed is the ease of contract manufacturing and of course the number of transistors to work with.
To maintain the performance outcome of Moore's Law, designers must now think about using those available transistors in a much better way. IOW they are going to need to think rather than just throw transistors at the problem, something not really done over the past 30 years or so. It's just like how assembler programmers could squeeze much more out of the early processors in less code space.
I still believe we will see the increase in transistors on the chip, but the extras will be used differently. I expect there to be a new layer set supporting a layer of RAM. That could add a GB or more directly accessible to the CPU(s) at chip speed, and the power increase for the chip would be negligible. Then we wouldn't need all those levels of cache, so those complexities, and the transistors (and space) implementing both the cache and its support logic, could be retargeted.
I also expect we will see further breakthroughs in the way transistors are built, along the lines of the FinFET. This will reduce the footprint size and leakage, and tweak the speed, though it might require more layers to achieve. Design engineers are going to have to think for a change!
Also, I don't expect RISC-V to be the saviour. We'll see other changes too. It's great to provide single-clock execution for simple instructions, and to have a number of them execute in parallel too. But we've spent way too much time on floating point arithmetic; it's time to improve fixed point maths. Perhaps we need 64-bit add/sub/mul/div instructions, with MUL taking two 64-bit operands and giving a 128-bit result plus a register containing info about the result (like how many leading zeros it has), and DIV taking a 128-bit dividend and a 64-bit divisor and giving a 64-bit result and a 64-bit remainder. Then we don't have overflows to deal with either. I worked on an assembler mini from the seventies that did this using 10-digit decimals.
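Just to make that concrete (my own sketch, using the GCC/Clang unsigned __int128 extension as a stand-in for the proposed instructions), the multiply-with-leading-zero-count and the 128/64 divide would behave something like this:

#include <stdint.h>
#include <stdio.h>

/* Sketch of the proposed fixed-point primitives, emulated in C with the
   GCC/Clang unsigned __int128 extension (stand-in for real instructions). */

/* 64 x 64 -> 128-bit multiply; also reports leading zeros of the result. */
static unsigned __int128 mul64x64(uint64_t a, uint64_t b, int *leading_zeros)
{
    unsigned __int128 product = (unsigned __int128)a * b;
    uint64_t hi = (uint64_t)(product >> 64);
    uint64_t lo = (uint64_t)product;
    if (hi)
        *leading_zeros = __builtin_clzll(hi);
    else
        *leading_zeros = 64 + (lo ? __builtin_clzll(lo) : 64);
    return product;
}

/* 128 / 64 divide giving a 64-bit quotient and 64-bit remainder.
   Caller must ensure hi(dividend) < divisor, or the quotient overflows. */
static uint64_t div128by64(unsigned __int128 dividend, uint64_t divisor,
                           uint64_t *remainder)
{
    *remainder = (uint64_t)(dividend % divisor);
    return (uint64_t)(dividend / divisor);
}

int main(void)
{
    int lz;
    unsigned __int128 p = mul64x64(0x123456789ABCDEF0ULL, 1000000000ULL, &lz);
    uint64_t rem;
    uint64_t q = div128by64(p, 1000000000ULL, &rem);   /* gets the first factor back */
    printf("leading zeros: %d, quotient: 0x%llx, remainder: %llu\n",
           lz, (unsigned long long)q, (unsigned long long)rem);
    return 0;
}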
They also are using the latest memory technology to get really wide/fast throughput (orders of magnitude above current CPU levels: approaching terabytes/s versus the tens of gigabytes/s a CPU gets).
You can program them in a C-like language too.
Wow, I did not know that at all. Thousands of cores running in parallel on a GPU. Interesting. A C-like language, hmm. Can you maybe give me some keywords for googling?
I remember from other posts of yours that you have experience in game programming. As a COBOL programmer I am usually very far away from even simple graphical environments, but I would be interested for personal fun.
I will start with 'Volta full set', no clue what that is yet...
Wrong, 'CUDA' is what one has to search for. Interesting.
Mike
To be fair, many of those thousands of cores are fairly specialized/limited on their own. However, they have been expanding their abilities a lot to cater to the machine learning, scientific computing, and similar crowds. The Volta architecture (another decent search term) is the latest and greatest (very expensive), with expanded general computing ability.
I am a programmer for a game dev, yes. I have done a fair bit of graphics stuff, including writing shader programs in HLSL (which has expanded a lot in recent revisions). HLSL is C-like also, and it will work on pretty much any GPU. Graphics shaders generally work on textures, models and whatnot for doing graphics rendering; they have a semi-rigid flow to them that fits the graphics rendering pipeline. We also have what we call "compute shaders", which are more general computing in nature. They are written in HLSL also, but with them we have more generally useful memory access ability and no graphics pipeline flow restrictions. Most modern games will mix the two types of shaders, depending on the needs.
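Since CUDA is what you'll be searching for, here's a minimal sketch of the same idea there: one tiny function run by thousands of GPU threads, one array element per thread (just a toy, details vary by GPU and toolkit):

#include <cstdio>
#include <cuda_runtime.h>

// Minimal CUDA sketch: each GPU thread handles one array element,
// much like a compute shader thread handling one item.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)                                       // guard the tail
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;                      // about a million elements
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));   // unified memory, visible to CPU and GPU
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    saxpy<<<blocks, threads>>>(n, 3.0f, x, y);  // launch thousands of threads
    cudaDeviceSynchronize();                    // wait before reading results on the CPU

    printf("y[0] = %f (expect 5.0)\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}

Compile that with nvcc and every element gets processed in parallel on the GPU; an HLSL compute shader is structured much the same way, just dispatched through the graphics API instead.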
https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_RISCV_Story_V2.pdf
The shader thing looks like fun. I'll have to remember the "Book of Shaders" for a rainy afternoon.
http://thebookofshaders.com
Roy, do you have any recommendations for "Shaders for Dummies" introduction?
I don't have any good recommendations, really. I started out in game graphics before we had GPUs, when everything was done in software. I learned each advancement into GPUs and shaders as they came out, using just the API docs, hardware vendor white papers and whatnot, and I can't recommend that path anymore.
I would suggest going to shadertoy.com, picking some random ones you find interesting looking, and reading the code. Try changing things and seeing the results. You ought to be able to sort out the syntax pretty easily just from looking at it. The Book of Shaders is an alright choice as well.
There's some pretty neat stuff on Shadertoy. Here's a working, playable (and slow, unless you have a discrete GPU) implementation of Minecraft, running entirely on the GPU, including player movement and world interaction:
https://www.shadertoy.com/view/wsByWV
Also, here's a program I wrote that graphs implicit equations and color functions of the screen (e.g. domain coloring, Mandelbrot sets, etc.) and other such things on the GPU, with animations. It's rather buggy; you'll probably find it easier to edit its savefiles in a text editor than to actually edit things inside the program. I've only tested it on my machine; it should work on any Linux machine. The README unhelpfully doesn't say this, but it needs GLFW 3, along with all the libraries it includes as submodules:
https://github.com/electrodude/gpugraph