Library OS?
potatohead
https://www.sigarch.org/leave-your-os-at-home-the-rise-of-library-operating-systems/
Suggesting very strongly that an OS is not required on modern hardware.
Comments
For the Propeller, not much is needed ... maybe some common communication areas for I/O drivers, maybe some frequently used I/O drivers (like keyboard / display or serial terminal support), along with a loader which can mostly reside in its own cog. Several such OSes have been written for the Propeller, consisting mostly of utility programs, a few I/O drivers, and an initialization program.
"The efficiency of Operating Systems (OSes) has always been in the spotlight of systems researchers ... But the reason for this obsession is not entirely obvious."
Then we have:
"It turns out that in data center workloads about 15-20% of CPU cycles are spent in the OS kernel. Indeed, most of these workloads are I/O intensive, thus they stress multiple OS components, from device drivers through networking and file I/O stack, to the OS scheduler managing thousands of threads."
Which is to say that the reason for the obsession over OS efficiency is obvious. As spelled out in the second paragraph.
Of course only 15-20% of data center CPU cycles are spent in the OS kernel. Most data center apps are written in languages like Java, JavaScript, PHP, etc. that are horrendously inefficient. It's a tribute to the OS builders that only 20% of CPU cycles are spent in their code, despite the stress they are under.
I'll have to try reading past that point again...
However, whilst we are here, I might ask: why do my programs need an operating system at all? They should be able to run on bare metal. As they do in many embedded microcontroller systems.
Even in the cloud (data center) my programs need only a few things: A CPU to run them, some memory to work in, some means of input and output.
I don't need a time sharing system to share that CPU with others. No scheduler, context switches and all that junk.
I don't need a network stack. Just give me some simple pipes for data in and data out. It can be done in simple hardware, like the "channels" of the Transputer.
I don't need a file system. Just let me output data with some identifier and let me get it back later with that identifier (Using those pipes mentioned above.)
Meanwhile, other processors in that "cloud" setup can take care of networking, file systems, etc. Guess what? They don't need an OS either. In this picture they also run on bare metal.
What am I saying here?
What we want is many simple processors with simple software hooked together with communication pipes. Rather than huge complex processors, with their huge complex operating systems, protected memory, virtualization and all that.
See: "Communicating Sequential Processes", Tony Hoare, 1985.
http://www.usingcsp.com/cspbook.pdf
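To make the channel idea concrete, here is a minimal sketch in Go, whose channels are modelled directly on Hoare's CSP. Everything in it (the names, the message format, the idea of a storage process keyed by identifier) is my own illustration of the picture above, not anything from the article:

package main

import "fmt"

// A request to the storage "processor": store a value under an
// identifier, or fetch it back by that identifier. The channels stand
// in for the simple hardware channels / Transputer links discussed above.
type storeReq struct {
	id    string
	value string
	get   bool
	reply chan string
}

// storage runs forever on its own "processor", owning its memory
// privately. No file system, no OS: just an endless loop on a channel.
func storage(reqs <-chan storeReq) {
	mem := map[string]string{}
	for r := range reqs {
		if r.get {
			r.reply <- mem[r.id]
		} else {
			mem[r.id] = r.value
			r.reply <- "ok"
		}
	}
}

// app is the application "processor": it only knows about the channels
// it was given, nothing about networks, file systems or schedulers.
func app(in <-chan string, out chan<- string, store chan<- storeReq) {
	for msg := range in {
		reply := make(chan string)
		store <- storeReq{id: "last", value: msg, reply: reply}
		<-reply
		out <- "processed: " + msg
	}
	close(out)
}

func main() {
	in := make(chan string)
	out := make(chan string)
	store := make(chan storeReq)

	go storage(store)
	go app(in, out, store)

	// Some other "processor" feeds the input channel.
	go func() {
		in <- "hello"
		in <- "world"
		close(in)
	}()

	for result := range out {
		fmt.Println(result)
	}
}

Each goroutine here would be a separate physical processor in the picture above; the point is only that a process plus a few channels is a complete programming model, no kernel required.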
It's all historical baggage. Got a 10 million dollar mainframe and 100 potential users? Better devise a way to share that mainframe among the users.
The microprocessor has been faithfully following, re-inventing, all those old mainframe ideas ever since. The complexity has been escalating, together with high energy requirements, bugs, and security issues.
It need not be like that. If my application needs network access that is just a simple hardware channel. Storage? It's on the end of a simple hardware channel. Console port? Another channel. Just give me a processor, memory and channels to talk to.
Given that we can now build small, cheap, low power processors why not dedicate one to every application in the cloud? Everything simple and regular.
Things are heading that way. Google was using a ton of regular servers to implement its neural nets for language translation and such. Now they have the Tensor Processing Unit. https://cloud.google.com/blog/big-data/2017/05/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu
Of course if we build millions of small, simple, single application processors to provide cloud server instances the "operating system" moves out of those processors to somewhere else. Something has to manage that network of channel connected processors, allocate them as needed, connect them up correctly, detect failures, etc. A whole new ball game.
Certainly it seems that old, forgotten ideas can become new again as technology and economics change.
I'm not sure that engineers get much training in the history of their art. A few examples:
0) The original microprocessor instruction sets were designed by electronics engineers, not people steeped in computer history. That is why the Intel and Motorola (6800) instruction sets are so awful.
1) Back in the day Intel decided it needed to jump from 8-bit to 16- or 32-bit processors. They hired a whole team of CS grads to design the new architecture. That was the i432. After a year or more it was still going to take "another year", had become too complex to build, and performance sucked. The 432 was canned, and in 10 weeks an emergency project team produced the x86 design we know and "love" today. Those 432 guys had not learned from the famous collapse of Multics years before.
2) Later Intel decided the x86 architecture was a dead end. They needed to jump to 64 bits, so that was a good time for an architecture change. After billions spent the result was the Itanium. Which, as you know, was a disaster, and Intel had to quickly turn around and adopt AMD's x64 architecture. The Itanium designers had never learned that VLIW machines had never been made to perform well. Shunting the optimization required for VLIW onto the compiler is a problem that has never been solved.
It's interesting to hear Dave Patterson (of RISC fame) talk of the last 30 or 40 years of computer architecture history.
As for our massive network of single-application, super-simple, OS-free processors, that is something we have never done before. We did not have the technology for it, or the economic incentive. Who knows where it will lead. But I can't help thinking the current Intel model has not long to live.
The Datapoint 2200 was not conceived as a computer. It was a terminal. Given the requirements and the constraints I'm sure it was great. It is not clear to me that the instruction set of the 8080 was the same as Datapoint's original TTL implementations.
From Wikipedia:
"Poor and fellow amateur radio colleague Harry Pyle produced the underlying architecture of the modern microprocessor on a living room floor. They then asked fellow radio amateur Jonathan Schmidt to write the accompanying communications software. Pitching the idea to both Texas Instruments and Intel, the partnership developed the Intel 8008, the forerunner of the microprocessor chips found in today's personal and computing devices"
The 8008 instruction set seems to have been an independent development.
It is amazing that the 8008 instruction set lived on to the 8085. And more amazing that the x86 instruction set is basically the same with some additions. Back in the day we ran 8085 assembler source through Intel's "conv86" translator and it would spit out x86 assembler syntax that would work out of the box. The translation was almost all a direct mapping of a single 8080 instruction to a single x86 instruction. They were so similar.
And that is why we still have a valuable single-byte instruction used for ASCII Adjust After Addition (AAA) in the latest x86 incarnations. Even though it is never used.
Gotta give credit to Intel for trying to wipe the slate clean and kill off x86 with something else. Sadly it was the Itanium.
Datapoint was originally Computer Terminal Corporation and most of their business was terminals. The 2200 was marketed as a business computer from the beginning.
Vic, Jonathan, and Harry did develop the instruction set in Vic's living room. Harry had been going to Case Inst. of Technology in Cleveland, studying computer science.
I got involved writing an ASCII decimal arithmetic package for one of the 2200 prototypes. They had developed a business programming language called Databus with a compiler and interpreter and needed the arithmetic package to do variable length, arbitrary decimal place signed arithmetic. Eventually I went to work for them.
I guess that pesky DAA instruction came in very handy for what you were doing.
Interestingly, by the time CP/M came around it was almost never used. When I was creating the Z80 emulator for the Prop I got CP/M up and running before I implemented DAA. Never did find a CP/M program that used it. It's been there wasting valuable single-byte opcode space in our computers ever since!
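For anyone who never met it: DAA patches up the accumulator after a binary ADD of two packed-BCD bytes (two decimal digits per byte) so the result is valid BCD again. Here is a rough model of the 8080 behaviour in Go; it's a simplification of mine that folds the add and the adjust together and ignores most of the flag register:

package main

import "fmt"

// daaAdd adds two packed-BCD bytes the way an 8080 program would:
// a binary ADD followed by the DAA fix-up. Simplified: the real
// instruction also reads and writes the carry and auxiliary-carry flags.
func daaAdd(a, b uint8) (result uint8, carry bool) {
	sum := uint16(a) + uint16(b)
	auxCarry := (a&0x0F)+(b&0x0F) > 0x0F

	// If the low nibble overflowed past 9, add 6 to push it back into BCD.
	if sum&0x0F > 9 || auxCarry {
		sum += 0x06
	}
	// If the high nibble overflowed past 9, add 0x60 and carry out.
	if sum&0x1F0 > 0x90 {
		sum += 0x60
		carry = true
	}
	return uint8(sum), carry
}

func main() {
	// 47 + 38 = 85 in decimal: 0x47 + 0x38 gives 0x7F, DAA turns it into 0x85.
	r, c := daaAdd(0x47, 0x38)
	fmt.Printf("%02X carry=%v\n", r, c)
}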
Thanks for that info. Right at the forefront of mini-computing.
I find it very interesting that decimal arithmetic was required. The mini I worked on had decimal arithmetic built into the hardware. The memory addressing was also decimal, with an instruction using 10 6-bit ASCII characters.
Datapoint was also actively involved in wireless networking and developed an operating system for distributed networking where all parts of an active program could be farmed out to other instances of the system as long as permissions were granted. You could have a program's console in one place, the program executing elsewhere, a printer and com channel in other places, and disk drives elsewhere.
They had optical links for network paths (a mile or so) where cabling or microwave links were not available.
The mini supported decimal arithmetic up to 10 digits (20 for multiply and divide to avoid overflow). And then an edit instruction to format the output. It was a RISC machine with 15 instructions. Instructions were memory to memory, with indirect and indexing options.
What could be done in a few KB of memory was amazing. But it was B&W (or green on black) video terminals, uppercase ASCII.
In 1981 a major upgrade brought 8-bit ASCII (from 6-bit ASCII) and a few additional instructions including some logic ones.
It was certainly an exciting part of my life.
And before some smart operators in Berkeley(?), bored by just sitting around running batch jobs, invented the idea of 'time-sharing', most computers just ran one program after the other. No OS needed; for that we had operators.
It is like we are cycling backwards: from a single batch job per mainframe, to time-sharing on the mainframe, to the personal computer, to the PC being just a web browser acting like a mainframe terminal, to hardware doing just one job without the 'time-sharing' an OS provides.
To me it looks like the cycle is closed.
Mike
What we have been discussing here is the idea of a single process running on a single processor with no operating system, or at least a very minimal one. That might sound like the batch-job mainframes of old, but it's very different.
Those old batch jobs were run, one at a time, from a queue. Perhaps the queue was a stack of punched cards. Each job started, did its thing, then ended. Then the next job was loaded from the queue.
What we have been talking about is a single process running on a single processor, but it is expected to run forever, servicing requests for whatever it does as they come.
Significantly, systems today comprise many of those processes running at the same time. Tens, hundreds, thousands of them. Each on its own processor. Some might be dedicated to database work, some might be web servers, some might be processing this or that. See Facebook, Google, etc., etc.
When you submit a job to such a system, like actually hitting submit on a web page, your job is making work for many of those processors. Not just one.
Of course the issues of resource management and such, as handled by those later multi-tasking, multi-user operating systems are still there. But now they are in load balancers and such that direct the work to the available processors.
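As a toy sketch of that shape, again in Go and again purely illustrative: a handful of single-purpose "processors", each an endless loop on its own channel doing one kind of work forever, with a trivial dispatcher in front of them standing in for the load balancer.

package main

import (
	"fmt"
	"sync"
)

// worker stands in for one dedicated, OS-free processor: it runs until
// its channel closes, servicing requests as they arrive.
func worker(id int, jobs <-chan string, results chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	for job := range jobs {
		results <- fmt.Sprintf("worker %d handled %q", id, job)
	}
}

func main() {
	jobs := make(chan string)
	results := make(chan string)
	var wg sync.WaitGroup

	// Three "processors". In the real thing each would be its own chip;
	// the shared jobs channel plays the role of the load balancer,
	// handing each request to whichever worker is free.
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go worker(i, jobs, results, &wg)
	}

	go func() {
		for _, req := range []string{"page", "query", "upload", "search"} {
			jobs <- req
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	for r := range results {
		fmt.Println(r)
	}
}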
The 8086/8088 and its successors are not object code compatible with the 8080, although all its successors to the modern day are object code compatible with the 8086. The goal with the 8086 was source code compatibility, so you could take an assembly language program written for the 8080 and compile it directly to new machine code that would run on the '86. The actual instruction set was quite different though because of the implementation of the segment registers we loved *cough* so much back in the day. The idea was to provide for quick migration of existing 8080 software while providing a smooth upgrade path (without mode bits) for future enhancements involving more memory and more powerful CPU instructions.
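For readers who never had the pleasure: an 8086 "far" address is a 16-bit segment plus a 16-bit offset, combined as segment * 16 + offset, which is where the 20-bit, 1 megabyte address space comes from (and why many different segment:offset pairs name the same byte). A quick illustration of mine, not tied to any particular program discussed here:

package main

import "fmt"

// physAddr combines an 8086 segment:offset pair into the 20-bit
// physical address the chip puts on the bus.
func physAddr(segment, offset uint16) uint32 {
	return (uint32(segment) << 4) + uint32(offset)
}

func main() {
	// Two different segment:offset pairs naming the same byte.
	fmt.Printf("%05X\n", physAddr(0xB800, 0x0000)) // B8000, the CGA text buffer
	fmt.Printf("%05X\n", physAddr(0xB000, 0x8000)) // B8000 again
	// Top of the 1 MB space: FFFF:000F gives FFFFF.
	fmt.Printf("%05X\n", physAddr(0xFFFF, 0x000F))
}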
Of course by the time of the 8086, the Z80 had gained a lot of market share and the x86 series weren't compatible with its extended instruction set. x86 was also very memory hoggy compared to the 8-bitters, with the 8-bit-bus 8088 also being slower at a similar clock speed. The whole project was really ahead of its time but not competitive with any of the popular 8-bit competitors. But when IBM came along, Tandy was using the Z80, and Apple and Commodore the 6502. IBM didn't want to be an also-ran using the same chip as an existing popular computer, and that pretty much left them with Intel, whose chip was so expensive that nobody was using it for much of anything.
On the other hand, the reason x86 was so hoggy was that it was very forward-looking, and the reason you can run a program compiled for the original IBM PC on a modern machine without redeveloping it is that forward thinking. A few generations forward and that compatibility was far more important than any other performance metric.
Not only that, the 8088 was, hardware-wise, almost a drop-in replacement for the 8085. I remember building a little carrier board for the 8088 that provided whatever tweaks were needed to the pinout, and a new clock arrangement, and plugged into our embedded systems boards. I was amazed when it worked! The upshot was that respinning all our boards to use the 8088 was trivial.
The catch was that after that conv86 assembly language translation the resulting binary ran almost exactly half as fast as the 8080 version! It then took a lot of tweaking of the code to use 16 bit arithmetic and such before you got the speed back.
In what way was the x86 forward looking?
As I said above, the Intel 432 was forward looking. So forward they could not build it. The x86 on the other hand was an emergency quick hack created in 10 weeks. Apart from the 1 megabyte addressing capability lashed onto the side I see nothing forward looking in it.
The reason you can run old x86 code on modern machines is that they just kept piling more and more stuff on to it and leaving the old stuff in there. Each layer of cruft being buried under a "mode": long mode, protected mode, real mode, system management mode, unreal mode, virtual 8086 mode. Hmm... must be more than that, they had to add AMD mode to move to 64 bits.
The 8080 and 8086 are not source code compatible. But Intel provided a tool to help with source code translation [edit: The conv86 tool Heater mentions above], around the time when the 8086 was introduced. It was working reasonably well, although not perfectly IIRC.
Problem was that for a lot of instructions that affected the flag register bits (ADD, SUB, etc.) the flags were not set exactly as an 8080 would set them. So conv86 would fix that by adding a whole bunch of LAHF (Load Status Flags into AH Register) and SAHF (Store AH into Flags) instructions, together with whatever it was that corrected the flag bits.
The result of all that was the code came out two or three times bigger than it started and ran incredibly slowly.
Turns out that the reason for all this extra flag twiddling was that the 8080 DAA (Decimal Adjust Accumulator) instruction worked a bit differently from the x86 AAA (ASCII Adjust After Addition) instruction. The flags (specifically the auxiliary carry bit) were tweaked all the time just so that the AAA instruction would work correctly.
Of course, nobody ever used that stupid DAA instruction, so Intel provided an option to conv86 that told it not to do all that flag correcting. (That should have been a clue to them that x86 did not need an AAA instruction.)
Boom, you got a one-to-one instruction mapping. Code size and execution time were much reduced. I was amazed when I pushed our first embedded app through conv86, ran it on my Frankenstein hack of an 8088 board plugged into an 8085 board, and it ran first time! It had taken all day to do the conversion and rebuild the app on three Intel MDS development systems running in parallel.
Oh my. I can't remember what I had for lunch yesterday and here I am reminiscing about work I did in the early 1980s. Nurse, more meds...
Oh yes, the Z80 instruction set contains three times the number of instructions of the 8080 (or is it four?). The Z80 added all kinds of bit-twiddling instructions and such. It's much harder to write an emulator for the Z80, as I found out when my 8080 emulator for the Propeller grew into the Z80 emulator (ZiCog). Luckily, in the CP/M world nobody ever used those extra instructions. (Except for the block move instructions and the second register bank sometimes.)