...adding the hardware for switching the execution path to every cog would take up much less space than adding a single CPU.
No argument about that.
The only problem is that saving the vanishingly small expense of an extra CPU nowadays, and moving the complexity to every program and programmer of that device, is not a good trade-off in time, energy or money when creating the finished products.
It was a good trade-off back in the days when a CPU was made of tubes, the size of a room, and consumed huge amounts of power. It was still a good trade-off in the days of transistor CPUs and even early integrated circuits.
It is perhaps still a good trade-off if you want to create millions of tiny devices of minuscule power consumption. That is not the world the Propeller lives in or ever will.
From XMOS's point of view, pre-XMOS code is legacy. From a developer's point of view, it might be ported code.
A word like legacy needs context.
I really liked the tasking in the hot chip. Time slicing is a great option, and at the least, maybe we will get some basic ability to implement it on this upcoming P2.
The instruction by instruction time slicing of threads in the "hot chip" was my suggestion after I had seen Chip had got hardware threading instructions implemented. Yes, I know, others had suggested it before, but I happened to chirp up with the idea at a moment when Chip was receptive to it.
Looking back at that debacle perhaps it would have been better if I had kept shtum about it.
I don't think so. Truth is, a lot of great ideas got prototyped in that chip. The iterations have taken a while, and there are mixed opinions about it all.
But, when I step back and just look at what our likely result is, I see the efforts to date as high value. Chip will home in on a power-appropriate design and in the longer term, that's going to be high value and have a long life, just as P1 is demonstrating to us.
Personally, I've got time, but I want it. All of us do. In an ideal world, we would have seen the release of the hot chip with clock limitations as we move toward this one. Had we got that and the first one which had the short, there would be three Props out there. IMHO, that's optimal, but not financially possible.
The hot chip at 80 MHz would be killer, as would have been the original one that shuttled.
But, here we are. Given the dynamics, I want it robust and right, both of which are very possible given all that has been learned.
As for going dark, I support Parallax doing that. Chip took a long time establishing the baseline for each iteration. That needs to happen this time too. Once we've got the baseline image out there, a dialog about it to refine it makes a lot of sense.
Really, there is nothing much to discuss, until that baseline image is done.
What I hope to see this time is some boundaries published along with that baseline, to prevent features running all over the place.
The time slice was a great feature addition! It's easy to use and it made a lot more possible with one COG. If we get a variant of that on this design, great! If not, maybe we can get a simple instruction, or some COG option, that lets us stop, start, pause, or start up at an address, or in HUBEXEC mode. Those would be enough to build a time-slice engine in software and get most of the benefit of the all-hardware tasking design in the hot chip.
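For a picture of what that software time-slice engine could look like, here is a minimal sketch using the P1's existing JMPRET coroutine trick inside one cog (the task bodies and register names are hypothetical, just to show the shape):

        org     0
entry   mov     b_pc, #task_b       ' prime task B's entry point
task_a  ' ... one slice of task A's work goes here ...
        jmpret  a_pc, b_pc          ' save A's resume point, switch to task B
        jmp     #task_a
task_b  ' ... one slice of task B's work goes here ...
        jmpret  b_pc, a_pc          ' save B's resume point, switch back to task A
        jmp     #task_b
a_pc    res     1
b_pc    res     1

Each switch is a single four-clock JMPRET, so the slicing is cheap, but it is still cooperative; the hot chip's hardware tasking sliced every instruction with no code at all.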
Just earlier this month, I was researching the Williams arcade games. DEFENDER, ROBOTRON, etc...
Those boards had a 300 Hz clock that got used to dispatch processes on the 6809. Interrupts are needed on that chip, obviously, but the end result was the ability to break the game into little processes, and all of that made very efficient use of a 1 MHz CPU. Go and take a look at those games. No blitter, no sprites. Just a bitmap. Impressive action, and even more impressive ability to manage things on a process level.
With 16 COGs and a planned ~200 MHz clock, this "we need interrupts" discussion is going to be a whole lot less compelling. Right now, P1 gets pushed to do some large / complex tasks relative to its size. That tends to improve how compelling the "need interrupts" discussion is, IMHO.
Seems to me we have been spoiled. There was so much openness about PII development.
We should not expect it. Most companies are pretty "dark" about whatever they are doing until it's ready to ship. In general companies don't want the world to know all about their failed experiments.
Famous exceptions are Osborne, whose announcement of a new model killed the sales of their old one, thus killing the company (or so popular myth has it), and Transmeta, which ran obscure and vague advertising of their product until it finally arrived and was seen to be totally useless.
Because of all the interrupt bashing that so often happens on this forum, I feel compelled to respond.
I agree that in many cases, especially with single-processor microcontroller chips that have multiple or complex peripherals, proper interrupt handling can be a daunting task. More so if the application software is complex or needs to have low latency. But in those cases the "unwelcome" interrupt is the only practical solution, and hence I agree with Bill that the Propeller can be a more comfortable solution.
The messages from most responders portray the concept of interrupts as old fashioned, or just plain "bad". And in many cases, perhaps most cases for those on this forum, that sentiment might ring true when their Propeller applications are not complex and have excess cogs available, or do not require low latency or quick response. The typical approach is to "throw a cog at it" and deal with the interrupt that way. Let a WAITXX deal with it.
But this is not always possible, practical or optimum.
Consider the case where one has more threads running than there are cogs available. One is then required to run multiple threads in one cog. And that is very doable, but one can then not have a WAITXX stall execution of the other threads in that cog. And then we need some sort of polling to test for the said event to occur. This immediately adds significant overhead to the operation of the cog and takes a huge whack at the sacred determinism that is also mentioned.
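To make that overhead concrete, compare a cog to itself against a shared cog (just a sketch; the pin mask and the coroutine registers are hypothetical):

' Dedicated cog: stall for free until the event arrives
        waitpeq pin_mask, pin_mask
' Shared cog: every thread must poll, yield and re-test instead
poll    test    pin_mask, ina   wz  ' event pin high yet?
if_nz   jmp     #handle             ' yes - respond now
        jmpret  my_pc, other_pc     ' no - let another thread run a slice
        jmp     #poll
handle  ' ... respond, some slices later than WAITPEQ would have ...

The extra instructions run on every pass, and the worst-case latency is now set by the other threads' slice lengths rather than by the event itself.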
In another case, when a dedicated cog is WAITing for a supposedly low-latency event to occur, just how much time is spent in signalling the cog that needs to process that event? Certainly more than a SIMPLE interrupt could give. One can't just "hand over" an event detection from one cog to another; a signal has to be communicated via hub memory, and that again takes code and time, as well as polling on the part of the recipient.
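Concretely, that hand-over ends up looking something like this (the hub flag address and register names are hypothetical; a sketch of the cost, not a finished driver):

' Detecting cog: sees the event, then has to tell the other cog via hub RAM
        waitpeq pin_mask, pin_mask
        wrlong  one, flag_addr      ' waits for this cog's hub window (can be over 20 clocks)
' Processing cog: cannot WAIT on a hub location, so it polls
chk     rdlong  t, flag_addr    wz  ' another hub window wait on every check
if_z    jmp     #chk
        wrlong  zero, flag_addr     ' acknowledge the flag
        ' ... finally process the event ...
one     long    1
zero    long    0
flag_addr long  0                   ' hub address of the flag, e.g. passed via PAR
t       res     1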
So, I think there is much more to the analysis than "throwing a cog at it", and my take on interrupts is that they are not necessarily evil, provided they are used judiciously and appropriately.
Furthermore, I for one DO wish that each Propeller cog had, as Kwinn puts it, a SIMPLE interrupt process of just a JMPRET triggered by: a pin change, or a counter overflow, and additionally a read/write to a specified Hub address. The latter could completely eliminate any requirement for polling.
Having a simple "interrupt" such as that could seriously shorten the code and reduce latency of my Multi-Threading Scheduler.
Just my personal view of things, and put me down for being all in FOR a SIMPLE interrupt per cog.
I have an endlessly recursive argument as to why you don't want interrupts. Simply stated it is this: Any time somebody says they want an interrupt I say "No what you want is for some code to run when an event happens, preferably as soon as possible and with no impact on whatever else is going on in the machine at the time.". Ergo you want a CPU ready to handle the event.
Consider the case where one has more threads running than there are cogs available.
Yep, that is when you have hit the limit of the technology available. That is only 8 event handlers on the P1, but hey it's a start. The PII will have 16.
As far as I know there is no point in having more threads of execution than events to be handled.
Any time somebody says they want an interrupt I say "No what you want is for some code to run when an event happens, preferably as soon as possible and with no impact on whatever else is going on in the machine at the time.". Ergo you want a CPU ready to handle the event.
I have a 31c MCU here, that has 15 interrupt vectors. If I really wanted/needed to have a CPU ready to handle the event, then I can simply sprinkle more of them into the design.
At 31c, having 8 x 'CPU ready to handle the event' is under $2.50
Of course, most designs can be done with interrupts, without needing a whole CPU assigned to each one.
UARTS and serial are classic examples.
I have a 31c MCU here, that has 15 interrupt vectors.
Very interesting, what is it exactly?
If I really wanted/needed to have a CPU ready to handle the event, then I can simply sprinkle more of them into the design.
At 31c, having 8 x 'CPU ready to handle the event' is under $2.50
I guess that is not really true. "simply sprinkle" means more board space, more design time, a lot of effort, in hardware and software. Soon the costs outweigh the benefits of having it all in a single device. Mind you I think the finished PII should only cost $2.50, if that.
UARTS and serial are classic examples.
Yes they are. But if a char coming in on a UART causes an interrupt that upsets my otherwise carefully timed program, I'm going to be mightily upset. On the other hand, if the char is dropped because my carefully timed program disallowed it, I'm also going to be upset. Give me multiple CPUs to sort this out.
Wow, that Silicon Labs EFM8BB10F2G device must be about 2mm on a side. It has 256 bytes of RAM and 2K of FLASH program store. How can one sensibly use 15 levels of interrupt in such a small device?!
One might be able to sprinkle a number of these cheap MPUs around to handle all the interrupts. But then how on earth do you get them to communicate with each other without considerable time delay? There is no hub to share the results and the total memory is tiny.
BTW I did this with a pair of MC68705P3S many moons ago. One MPU did the proprietary 53kb serial and one did the Centronics interface. I implemented an 8 bit bus plus 2 handshake pins. In the day it saved huge boards of logic, but at the time the MPUs were worth $150 each!
There is no way I would like to join 8-16 of them to try to do what the prop can do. In fact most things would be impossible.
The discussion surrounding the multi-tasking P2 HOT chip got me thinking...
I wonder how a 16 task single cog with a single combined hub and cog ram/rom would go. Each task could have its own 512KB 2KB page of hub to use as cog ram equivalent. This would remove the hub sharing mechanism and the busses required, although the ALU would be considerably more complex.
Just wondering how this would affect power.
Hubexec would be easier to implement. In fact Chip had it working in the hot chip IIRC.
Each task could have it's own 512KB page of hub to use as cog ram equivalent.
? I think you meant 512L ?
The problem I see there, is memory-address muxes - COG ram is 4 port as it needs to read Opcode, up to 2 operands and write to a destination address. So you cannot really make those 'pages' of HUB memory, as they have local address values fully committed.
You could add even more Muxes, so some time-slots allowed other tasks access, but that comes at the cost of speed. (both from the MUX delays and the stolen time )
? I think you meant 512L ?
The problem I see there, is memory-address muxes - COG ram is 4 port as it needs to read Opcode, up to 2 operands and write to a destination address. So you cannot really make those 'pages' of HUB memory, as they have local address values fully committed.
You could add even more Muxes, so some time-slots allowed other tasks access, but that comes at the cost of speed. (both from the MUX delays and the stolen time )
I meant 2KB
Cog is now 2 port whereas the hot chip was 4 port. The hub section would have to be single port. However, with pipelining perhaps the cog section could remain single port.
From where I am thinking, the hub would not have any muxes. The page reference would just be a register which supplies the upper address bits (A9...). So the single core would have a single data bus and a single address bus.
I meant 2KB
From where I am thinking, the hub would not have any muxes. The page reference would just be a register which supplies the upper address bits (A9...). So the single core would have a single data bus and a single address bus.
- but each COG has its own PC, Ra, Rb, Rc values, and those need to be fed, via Muxes, to/from what you have as a single memory plane.
You also need to manage the time of those slots, so the peak speed would be low.
Quite a few hits from moving to a single memory plane, and not many benefits, if the code reach is still 512L
Cog is now 2 port whereas the hot chip was 4 port. The hub section would have to be single port. However, with pipelining perhaps the cog section could remain single port.
Pipelining, extending to superscalar, demands more and more buses for parallel fetching, and writebacks, and therefore a higher number of ports into the register set, and primary cache for the bigger iron.
Hold your horses before condemning Hubexec without even giving it a try.
Call it an interrupt or a hardware initiated thread or an event capture, but adding the hardware for switching the execution path to every cog would take up much less space than adding a single CPU.
Switching the execution path is the part that isn't needed. All the other features are to allow the software to cooperatively check the events pending at its leisure.
If you desire a separate execution path then use another Cog.
I guess that is not really true. "simply sprinkle" means more board space, more design time, a lot of effort, in hardware and software. Soon the costs outweigh the benefits of having it all in a single device. Mind you I think the finished PII should only cost $2.50, if that.
Yes they are. But if a char coming in on a UART causes an interrupt that upsets my otherwise carefully timed program, I'm going to be mightily upset. On the other hand, if the char is dropped because my carefully timed program disallowed it, I'm also going to be upset. Give me multiple CPUs to sort this out.
I seriously doubt that the P2 will only cost $2.50. I would be surprised if it came in below $15.00. As for all the arguments against the "complexity" of using interrupts I have to ask "what complexity?". The proposed interrupt is at the cog level only, and is as simple as using the waitxx instruction. The cog runs thread A (the main thread), an event/interrupt occurs and the cog runs thread B and returns to thread A when done. No fuss, no saving registers, no multiple interrupt levels. The programmer is responsible for getting the code and timing right as is the case now.
I'm beginning to think that the anti-interrupt sentiment is more of a phobia than a realistic assessment of the cost versus the benefit of a simple implementation of the idea.
I'm beginning to think that the anti-interrupt sentiment...
I often wonder what all the fuss is about. I've been using interrupts longer than I care to think about and I've still got all my hair. Starting with the 6502, and the 8085 with multiple 8259s, all the way up to modern processors, I've never had a problem.
Sure, an interrupt on a single-core processor will upset determinism but that's why you have hardware peripherals.
Does anyone have a real-world example of interrupts causing a problem?
Does anyone have a real-world example of interrupts causing a problem?
Indeed I do:
1) An industrial control system I worked on in the mid-1980s was occasionally going spastic. Turned out that a sufficiently short spike on any interrupt input of the Intel 8259 interrupt controller would always be registered as interrupt 7. Eventually we figured out how to detect that spurious interrupt and ignore it. Later that bug was documented as a "feature" of the 8259 by Intel.
2) Same industrial control system was originally developed with a single processor at the "sharp" end. Turned out to be too slow and juggling all the interrupts was a nightmare. That functionality was then split over two processors with shared memory, at which point I realized interrupts are an ugly kludge to get what you want. If only we had a Propeller at that time!
3) The Husky Hunter 16. A hand held, battery powered, PC compatible, MSDOS running machine using an Intel compatible "V20" processor. https://www.google.fi/search?q=Husky+Hunter+16&espv=2&biw=1920&bih=975&tbm=isch&tbo=u&source=univ&sa=X&ei=1S2IVfK1EYKRsgHvlZXwDw&ved=0CB4QsAQ. That had a tortuous system of interrupt handling, dictated by its need to go to sleep and wake up when anything interesting happened, to save battery power. We found no end of race conditions in that for a military application. The guys getting Ada up and running on that had no end of fun hooking Ada's multi-tasking system into it.
4) The fly-by-wire system of the Indonesian IPTN N-250 turboprop. When I arrived on the project I found that they had gone away from the old deterministic Lucol language to interrupt-driven Ada. When I started testing the thing I found it was using between 70 and 90 percent of its available time budget every 100ms, very unpredictably. No one on the dev team could demonstrate that exceeding 100%, with the ensuing total failure of the control system, would never happen! Luckily that plane was cancelled. :)
5) Any time I have my code, with some time critical processing, and I want to mix in your code, which also has some time critical processing. On a single processor with interrupts it may well not work as there is simply not enough performance. Or I have to analyse all this code and figure out what interrupts to hook it to and what priorities they should have.
6) The mere fact that I have to think about any of that is a pain before we start. I just want to throw code in there and have it work!
I worked on a militarized real time signal processing system with multiple prioritized interrupts and a super fast processor (for the time). The interrupting signals were asynchronous and arrived often enough too close together such that even the minimal processing done in the interrupt routines was enough to render certain parts of later data invalid. The solution - multiple processors. SuperProp anyone?
I have to ask "what complexity?". The proposed interrupt is at the cog level only, and is as simple as using the waitxx instruction. The cog runs thread A (the main thread), an event/interrupt occurs and the cog runs thread B and returns to thread A when done. No fuss, no saving registers, no multiple interrupt levels.
OK, at this point I have to "come out" and say that in some dark, dusty and rarely visited, corner of my mind I agree with you. It would be very nice if a COG could WAITxxx on more than one thing at a time and when that event happens run whatever code is assigned to it to completion. Although what you are describing there is not an interrupt system as generally understood.
I look at it like this:
0) First of all let's forget about "threads". Threads generally mean endless loops. That means you have to "interrupt" the loop when something interesting happens, remember where you were, handle the interesting thing, then return to the right place in the loop. Blech.
1) A COG need never do anything unless it has some input. If it could be stuck on some WAITxxx until some event happens that would be great. Low power and all that.
2) If we could WAITxxx on more than one thing at a time that might make all kinds of code simpler and more efficient.
Let's take FullDuplexSerial as a canonical example, what are the events we are interested in?
I) The application gives us a new byte to transmit. That should wake us up and point us to some code that adds the new byte to the Tx FIFO. Or wake us up and say there is a new byte on the end of the Tx FIFO to transmit. Immediately we have a problem with the Propeller as there is no such signal we can WAIT on so we have to poll for new Tx bytes all the time.
II) An incoming edge arrives on the Rx pin. That should wake us up so we can deal with the incoming bits and bytes.
III) Some point in time arrives. That should wake us up so that we can transmit the next bit. Or perhaps do some more sampling on the incoming Rx data.
All of these bits of code would basically get triggered by an event, run to completion and then HALT. As you say, never mind interrupting or priorities. We have an event driven system.
Now, the question is: How much faster or more efficient would doing all this be over what we have now? Is it worth it? Clearly a halted processor waiting on events is much more power efficient than having to poll everything as we do now.
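Roughly what the current approach forces on the driver, heavily simplified (this is only the shape of it, not the actual FullDuplexSerial source, and the register names are made up):

receive test    rx_mask, ina    wz  ' start bit yet? (event II, found by polling)
if_z    jmp     #rx_byte            ' line pulled low - go sample the bits
        jmpret  rx_pc, tx_pc        ' nothing yet - give the transmit side a turn
        jmp     #receive
rx_byte ' ... sample the data bits, paced with WAITCNT (event III) ...
        jmp     #receive
transmit rdlong head, tx_head_addr  ' did the application queue a byte? (event I, polled in hub)
        cmp     head, tx_tail   wz
if_nz   jmp     #tx_byte
        jmpret  tx_pc, rx_pc        ' nothing to send - back to receive
        jmp     #transmit
tx_byte ' ... shift the byte out, one bit per WAITCNT period ...
        jmp     #transmit

Every one of those polls would simply vanish if the cog could halt on "pin edge OR hub write OR CNT match" and vector straight to the right handler.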
I wish there were a permanent web page some place that laid out all the reasons why the Propeller does not have interrupts and why it's a bad idea to add them.
Savage///Circuits is in a very sad state right now and none of my projects or articles are currently up, however I have been asked many times over the years by various people, including some engineers we used to have as guests in our weekly IRC chat, Why Doesn't The Propeller Chip Have Interrupts? So I posted the following info post on my site. It's my canned response to the question.
I have heard this question and variations of it from many people (including some pretty notable/respected engineers) who have yet to try the Propeller Chip because they just can't accept that it doesn't have interrupts. And just as they are baffled that it doesn't, I am equally baffled as to why you would want them in the first place!
Let's take a look at interrupts from the perspective of me getting started with the 6502 and the Z80 back in the 80s and 90s. These CPUs had both non-maskable interrupts (NMI) as well as a standard interrupt request pin (IRQ). The Z80 additionally had a feature called vectored interrupts which allowed multiple devices to share a single IRQ pin by supplying a vector address for individual handlers for each device. This was a very powerful feature of the Z80 and one I made use of for a short time.
Back then interrupts were required because the CPU did not have the resources to keep track of so many tasks at the same time. Instead, when a device such as a UART chip required service, it would activate the IRQ line and the CPU would stop what it was doing to process and handle the interrupt request. Once completed the CPU would RETI (return from interrupt) and continue execution of the main code. Think about this. We just interrupted the execution of the main code to run some other code to handle that piece of hardware. This usually meant pushing the registers onto the stack and then popping them off on return so that the data in the registers was not lost when the IRQ code made use of those registers.
By its very nature an interrupt is doing just that, interrupting the main code loop to run some other code and then picking up where it left off when it is finished with that other code. But what if your main code is doing some critical task? What happens if the main loop requires precise timing? Well back then the only solution was to write the IRQ routines so that they returned to the main code in a specific period of time (x clock cycles). This was often very difficult if the IRQ routine had conditional branches changing the number of CPU cycles before a return. Programmers would sometimes have to pad routines with NOP instructions to eat cycles to make sure no matter how the code branched it was deterministic. But of course it wasn't efficient and was often tedious.
The Propeller does away with interrupts by having multiple cores. Routines that used to require interrupts now run in their own core. So instead of the CPU executing different pieces of code for short periods of time, each core executes its code without interruption, allowing for very deterministic code and no interruption of timing-critical routines. Let's face it, modern computers are running so many services, tasks and applications that even they require multi-core processors such as the Intel i7 to be able to handle modern computing jobs. The Propeller has set that standard for microcontrollers, moving past the archaic system of being interrupted by tasks to perform other tasks; it frees you to simply run your code, without interruption, and not have to worry about how many CPU cycles concurrent tasks are taking.
This allows for some really nice capabilities such as being able to generate broadcast TV, NTSC Composite and VGA video all while your main code is free to do what it wants, free of being interrupted by the video driver to refresh the display. There are objects for the Propeller that allow you to run 4 serial UARTs with FIFO buffers and support flow control. Other objects handle various sensors and interface chips so that the programmer does not have to deal with the specifics of a particular IC if they don't want to. And while other platforms have that capability, they can still have an effect on the timing of your main code loop.
So really the Propeller allows you to be free to program without the constraints of interrupt driven systems. It frees you from having to count CPU cycles and do a bunch of math just to make routines work together. Those that would discount the Propeller Chip because it doesn't have a traditional architecture are limiting themselves and staying in the realm of old-school thinking instead of being forward thinking and shedding architectural constraints. Quite simply, you don't know what you're missing.
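As a minimal Spin sketch of that point (the pins, baud rate and clock settings here are only typical examples): FullDuplexSerial gets its own cog, and nothing it does can disturb the main cog's timing.

CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000
OBJ
  ser : "FullDuplexSerial"
PUB main
  ser.start(31, 30, 0, 115_200)     ' the UART driver launches in its own cog
  repeat
    ser.str(string("tick", 13))     ' hand bytes to the driver's buffer
    waitcnt(cnt + clkfreq)          ' exactly one second - no ISR ever steals cycles here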
OK, at this point I have to "come out" and say that in some dark, dusty and rarely visited, corner of my mind I agree with you. It would be very nice if a COG could WAITxxx on more than one thing at a time and when that event happens run whatever code is assigned to it to completion. Although what you are describing there is not an interrupt system as generally understood...
Ah, progress. You're right, it's not your typical interrupt system, so let's call it an event switch, which is actually a better description. You do realize that there is already a multi-state wait instruction. Waitpeq and waitpne already look at the state of multiple pins. How hard would it be to collect the status bits from the other waits into a register that you could do a similar wait on?
I'm not suggesting something as complicated as what you describe in post 56, although that would be nice. Just a simple “when event X occurs run code B, otherwise run code A”. Code B would clear the event to return control to code A. This could be done with a hardware version of jmpret or a secondary program counter. In both cases a second flag register would be needed.
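Until there is hardware for it, the closest software stand-in looks something like this (a sketch only; event X is just a pin here, and the labels are made up):

code_a  test    pin_mask, ina   wz  ' did event X happen? (a pin, in this sketch)
if_nz   jmpret  a_ret, #code_b      ' yes - run code B, then come back here
        ' ... code A's normal work ...
        jmp     #code_a
code_b  ' ... handle the event ...
        ' ... clear/acknowledge the event source ...
        jmp     a_ret               ' hand control back to code A
a_ret   res     1

A hardware jmpret or a secondary program counter would remove the TEST on every pass and the wait for code A to get back around its loop, which is the whole point of doing it in hardware.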
Ah, progress. You're right, it's not your typical interrupt system, so let's call it an event switch, which is actually a better description. ....
I'm not suggesting something as complicated as what you describe in post 56, although that would be nice. Just a simple “when event X occurs run code B, otherwise run code A”. Code B would clear the event to return control to code A. This could be done with a hardware version of jmpret or a secondary program counter. In both cases a second flag register would be needed.
The P2 will add (I believe) Edge versions of WAITxx, which become a natural place to look to add an (address + queue block), to turn it into an event switch.
It could be managed as a paired opcode like
qWAITxx ; Queued version of Wait _/= or =\_, pairs with next 32b
Address_Count ; Address to vector to, with possible count qualifier for WAIT condition
On init/execute, the above two opcodes appear to be skipped over and other code runs normally, until the (now armed) trigger condition is met.
Then the address (subject to a possible count?) is applied in a hardware JMPRET.
Because this does not consume a classic fixed interrupt vector address, there is no firm limit of just one of these.
It comes down to how many of these nano-state engines would sensibly fit: 1? 2? 4?
Sorry Heater, but the pedant in me couldn't be restrained (and I am at work, bored, watching a generator chuck out some unfriendly stuff, in the name of a vital test).
"We will have 16 COGs in the PII. That is equivalent to 16 interrupt handlers. More than enough I believe"
Isn't that 15 interrupt handlers? Something had to be running first... Right, time to get back under my rock.