Linux/Minix on P2?
m00tykins
Posts: 73
Hello everyone!
So I came across the Propeller 1 MCU after looking up open-source processors, and congratulations to Parallax on making the Propeller one of the only open-source processors in production! Also, since the P2 is already available for the FPGA, I'm assuming the production version will be fully open-source as well? If so, then perhaps the Propeller is exactly the processor I need... Basically I am trying to make a completely open-source computer with networking that will be able to run a simple graphical web browser. But I have a few questions first:
First of all, and most importantly: Will the P2 be completely open-source hardware? If not I may need to use the P1...
Secondly, I know this has been discussed before as infeasible for the P1, but since we have gcc working for the Propeller, would Linux/Minix be able to run on the P2? I bring up Minix specifically because it is a very small OS (~10kloc for the kernel) and therefore may be better suited to the P2's memory constraints, which IIRC were the main reason why porting Linux to the P1 was considered impractical.
Thank you very much for any help!
Comments
Welcome to the forum.
There are no plans to open source the P2 design.
I feel the P2 would be a totally unsuitable machine for an OS like Linux or Minix. Despite the increase in speed over the P1, it will still be a very poor performer for such an OS compared to even the slowest ARM processors nowadays. The main issue here is memory. The 512K of on-board RAM is not sufficient, and adding external RAM will make the thing impossibly slow. Too slow for running a modern browser.
The P2 is an MCU; it lacks a lot of features a Linux OS expects: interrupts, virtual memory, tasks, protected memory spaces, etc.
If you want an Open Source processor design suitable for running Linux I would suggest looking at the OpenRISC design: http://opencores.org/or1k/Main_Page
As Heater said, no one has heard of any plans to open source the P2.
Regarding Linux...
If the P1 Verilog was modified as follows, Linux would be feasible:
- external memory interface mapped into the hub space
- a moderate amount of code/data cache for the external memory interface
- an MMU for address mapping
- hubexec
- modifications to the current GCC toolchain to support the above
However it would be far simpler to use an existing open core that already had Linux ported to it, perhaps slapping a Propeller on the side for hard real time I/O (as much discussed in these forums by Heater and others)
So, in theory, Minix could be ported to P2 with the limitations I mentioned. With 16 cogs you can run 16 simultaneous processes and/or drivers. If hub RAM is divided evenly among the 16 cogs, this would be 32K per cog. I haven't looked at the Minix utilities, but I suspect many of them could be made to run within 32K. Some processes and drivers would require less than 32K, which means that other processes could use more than 32K.
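To make the arithmetic above concrete, here is a small sketch in Python. The 512K hub and 16 cogs come from the post; the function names and the example mix of 16K and 48K regions are only assumptions for illustration, not anything Minix or the P2 defines.

```python
KB = 1024
HUB_BYTES = 512 * KB   # P2 hub RAM size mentioned in the thread
COGS = 16              # number of cogs mentioned in the thread

def even_split(hub_bytes=HUB_BYTES, cogs=COGS):
    """Bytes per cog if hub RAM is divided evenly."""
    return hub_bytes // cogs

def partition(sizes, hub_bytes=HUB_BYTES):
    """Lay regions out back to back; return a list of (base, size).

    Raises ValueError if the requests do not fit in hub RAM.
    """
    layout = []
    base = 0
    for size in sizes:
        if base + size > hub_bytes:
            raise ValueError("requests exceed hub RAM")
        layout.append((base, size))
        base += size
    return layout

# Hypothetical uneven split: eight small drivers, eight larger processes.
uneven = partition([16 * KB] * 8 + [48 * KB] * 8)
```

The uneven case shows the trade-off described above: every region smaller than 32K frees room for another region to be larger than 32K, as long as the total stays within 512K.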
cseg - a code base register, added to all hubexec code addresses before referencing the hub
dseg - a data base register, added to all non-code hub data addresses reads/writes before referencing the hub
and a way to turn the cseg/dseg off for accessing common memory (this could be done by setting cseg/dseg to zero)
It would be nice to have limit registers (climit, dlimit) but it would require an interrupt-like mechanism to handle address exceptions.
A separate stack segment would also be nice, but is not absolutely necessary, and it would make it "interesting" to decide if a data reference should use dseg or sseg. It is simpler to have a dseg big enough for data + stack, and have the stack grow down.
(I know, I am using old, evil, x86 segment register names - but they are easy to remember)
Note, the above does not get you virtual memory, external memory addressing etc, but would allow using the full 512KB of a P2 with hardware relocation - thus Minix, or any small Unix.
With careful management, it would even allow for whole process swapping to external memory, uSD etc... but watch out for memory fragmentation.
Note that the additional adders and logic would reduce the fMax.
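The fragmentation warning above can be shown with a toy first-fit allocator. This is only a sketch: the allocator, its names, and the 128K process sizes are assumptions for illustration, not part of any proposed Minix port.

```python
KB = 1024

def first_fit(free_list, size):
    """Take `size` bytes from the first hole that fits.

    free_list is a list of (base, length) holes sorted by base and is
    modified in place. Returns the base address, or None if no single
    hole is large enough.
    """
    for i, (base, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                del free_list[i]
            else:
                free_list[i] = (base + size, length - size)
            return base
    return None

def release(free_list, base, size):
    """Give a region back (e.g. after swapping a process out) and merge
    holes that touch each other."""
    free_list.append((base, size))
    free_list.sort()
    merged = []
    for b, length in free_list:
        if merged and merged[-1][0] + merged[-1][1] == b:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((b, length))
    free_list[:] = merged

def free_bytes(free_list):
    """Total free memory across all holes."""
    return sum(length for _, length in free_list)
```

For example, fill a 512K hub with four 128K processes, then swap out the first and third. 256K is free in total, but it sits in two separate 128K holes, so a 192K process cannot be loaded until a neighbour is moved or swapped out as well.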
I am planning on adding CSEG/DSEG to the Propeller 1 Verilog
I am even tempted to make the segment addresses be quad long aligned (just like the x86) as it would make potential x86 emulators faster.
I will use the 'WC' flag on RDxxxx/WRxxxx to choose between DSEG and SSEG
Two new instructions:
SETSEG n,reg
GETSEG n,reg
where n is:
0 for CSEG
1 for DSEG
2 for SSEG
3 for ESEG reserved, not implemented for now
(hey, if I will use x86 segment register names, I may as well do it right)
Segment registers will be initialized to $0 on startup, so unless modified, no relocation takes place.
to read using DSEG, use RDLONG dest, src
to read using SSEG, use RDLONG dest, src wc
to write using DSEG, use WRLONG dest, src
to write using SSEG, use WRLONG dest, src wc
hubexec will *ALWAYS* use CSEG, but as it is initialized to 0, it is transparent
When launching a cog, first few instructions should set up CSEG/DSEG/SSEG
In the future, limit registers can be added; SETSEG/GETSEG with n+4 will then refer to the limit register associated with segment register n
No criticism of segmentation please - it is a nice simple cheap way of getting almost free data/code relocation, without the significant resources and changes an MMU would need.
Later, I may modify coginit to set up the segment registers; that way, multiple relocation-unaware P1-style programs would port easily.
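A rough software model of the scheme described above, written in Python so it can be played with before touching any Verilog. It is a sketch under assumptions, not the actual P1V change: the class and method names are made up, segment values are treated as plain byte addresses (the quad-long alignment idea is not modelled), effective addresses wrap at the hub size, and longs are stored little-endian with the low two address bits ignored, which is how I understand the P1 to behave.

```python
CSEG, DSEG, SSEG, ESEG = 0, 1, 2, 3   # segment numbers from the post

class SegmentedCog:
    """Toy model of one cog with CSEG/DSEG/SSEG base registers."""

    def __init__(self, hub):
        self.hub = hub            # shared hub RAM (a bytearray)
        self.seg = [0, 0, 0]      # all segments start at $0: no relocation

    def setseg(self, n, value):
        """Model of SETSEG n,reg."""
        if n not in (CSEG, DSEG, SSEG):
            # Assumption: ESEG is reserved, so the model rejects it.
            raise ValueError("segment not implemented")
        self.seg[n] = value

    def getseg(self, n):
        """Model of GETSEG n,reg."""
        if n not in (CSEG, DSEG, SSEG):
            raise ValueError("segment not implemented")
        return self.seg[n]

    def _data_addr(self, addr, wc):
        # WC clear -> DSEG, WC set -> SSEG, as in the post.
        base = self.seg[SSEG] if wc else self.seg[DSEG]
        return ((base + addr) % len(self.hub)) & ~3

    def rdlong(self, addr, wc=False):
        a = self._data_addr(addr, wc)
        return int.from_bytes(self.hub[a:a + 4], "little")

    def wrlong(self, value, addr, wc=False):
        a = self._data_addr(addr, wc)
        self.hub[a:a + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def fetch_addr(self, pc):
        """Hub address a hubexec fetch would use: always through CSEG."""
        return (self.seg[CSEG] + pc) % len(self.hub)
```

With all segments left at 0 the model behaves like an unmodified hub access. After `setseg(DSEG, 0x10000)`, a `wrlong` to address `0x100` lands at hub address `0x10100`, while the same access with `wc=True` goes through SSEG instead.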
More seriously what is the motivation for this idea? Does an MCU need relocatable code/data? Does it help C or Spin?
Don't poke your eyes out!!!!
No, not a 286 ... and I think I hate segmentation as much as you do, but it fits my motivation...
Relocatable code and data help all compiled code immensely as you can compile all code/data assuming a clean slate for memory, starting at location 0, and load it anywhere - without having to fix up the addresses in the code and data!
This means that old P1 code would work unchanged on the P1V (Propeller 1 Verilog implementation) regardless of where in a large hub it was loaded.
New, segment aware code could use a separate segment for a display frame, leaving a 64KB code / 64KB application space to Spin / C / whatever.
You could load several segment aware images, that could exchange data through the stack segment pointer (maybe better called spare segment register).
I figured out how to do limited memory protection cheaply as well - I'll post it in my ESR thread on the P1V forum (ESR=Evil Segment Registers)
http://forums.parallax.com/showthread.php/156835-I-ve-decided-on-what-I-want-to-try-first-with-the-P1-Verilog-%28Evil-Segment-Registers%29
All these benefits come at a low transistor count, with a relatively low impact on fMax
Besides, once I get this running, I may tackle a proper MMU - which is a lot harder to justify.
So yes, I think the Prop can make great use of segmentation (even an MMU) but I don't think I can justify the transistor budget, lower fMax, and the need for traps or interrupts that an MMU would need - at least not at this time.
Thanks to Parallax releasing the Verilog, this is a small enough change for me to try as a first experiment with P1V, and promises to be an interesting experiment.
The hardware segmentation would allow using a large hub with unmodified code - think running a large Spin app, an LMM app, and a ZiCog at the same time - unmodified.
A "real" MMU would be much nicer, but needs many more transistors, and would impact fMax significantly - so this is the cheapest way to achieve my aims.
I may do a real MMU after this experiment works.
Basically, I want it for my experiments - and thanks to the Verilog release, I can try it easily enough!
Looks like the DE2-115 will allow a very large hub, and many more cogs. I plan to have a lot of fun
Unfortunately though, the fact that the P2 won't be open-source is a deal breaker for me. I don't want to just have fun working on lower-level hardware than I'm used to, but also end up with something that might be truly brag-worthy: A completely open-source computer down to the transistor. If I wanted to use a closed-source processor I might as well just get an ARM.
So, I guess I'm back to using the P1. However, I just found this very interesting port of the Minix 2 OS to a homemade CPU running at 4MHz with 4MB of RAM: http://www.homebrewcpu.com/. According to the website it supports a TCP/IP stack as well, meaning if something like the MAGIC-1 computer, made out of TTL, can do this the P1 certainly could, right?
So basically now my question is: If the MAGIC-1 TTL CPU can run Minix, is there any real reason why the vastly more powerful P1 can't? Thanks again!
With an appropriate version of LMM, and external memory, and a modified compiler, it would work - but you'd only get ~4MIPS.
I actually designed my Morpheus board, with the Mem+ memory add-on for something like that... but did not have time to port a compiler, and finish my Largos OS.
Why would I need a modified compiler? The Catalina C Compiler supports LMM and is based on LCC, which is the same compiler used in the MAGIC-1 project... So what would be left to change? I could be wrong, but it sounds like all I would need to do is get started writing the C/PASM code assuming the board I use supports external memory, right?
Seems to me that relocatable code is great for operating systems and such where code is compiled into executables and libraries or modules that are then run or loaded dynamically at run time. Here we have tons of binary "blobs" that can be loaded and discarded and moved around at will by the OS.
Why do we need that in an MCU? In MCU land generally everything is located at compile time. The whole "blob" is loaded at boot time. All its parts live at fixed locations and there is no requirement to move them around.
What am I missing here?
Hi!
Agreed.
We don't NEED it.
However, I believe it would be useful, especially on a multi-core device (memory protection) and dynamically loaded drivers & libraries would allow upgrading / slipstreaming fixes.
It is also "cleaner" in some cases to break down a large complex app into multiple concurrent apps.
With the P1V, I get to test my ideas, and see if they are useful. I get to learn more about Verilog, and the Propellers guts.
I think what we have here is a philosophical difference... you are thinking (to some extent) of typical microcontroller usage, as currently in fashion, in which case ESR is definitely not needed.
I am trying to think outside of the box, and trying to see how I can squeeze in such functionality with minimal impact on gate count and fMax. Thus the ESRs.
Think of this scenario (granted, not common microcontroller usage, but fun to play with)
Cog#1: Zog
Cog#2: ZiCog, with 64KB for CP/M
Cog#3-6: Spin interpreter, running existing app
Cog#7: spare
Cog#8: monitor
With ESR, Zog, ZiCog, and Spin 4-cog app, all think they have the prop to themselves, all with their own 64KB private area in the (say 384KB) hub.
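The scenario above can be sketched as a table of segment bases. The cog numbers, the 64KB private areas and the 384KB hub come from the post; placing the areas back to back starting at address 0, and all the names below, are assumptions for illustration only.

```python
KB = 1024
HUB_BYTES = 384 * KB     # example hub size from the post
AREA = 64 * KB           # private area per app

# App name -> cogs it occupies, as listed in the scenario.
APPS = {"Zog": [1], "ZiCog": [2], "Spin app": [3, 4, 5, 6]}

def assign_areas(apps, area=AREA, hub_bytes=HUB_BYTES):
    """Give each app its own area; all cogs of one app share its base.

    Returns (base per app, base per cog, bytes left over).
    """
    app_base = {}
    cog_base = {}
    next_base = 0
    for name, cogs in apps.items():
        if next_base + area > hub_bytes:
            raise ValueError("hub too small for this layout")
        app_base[name] = next_base
        for cog in cogs:
            cog_base[cog] = next_base
        next_base += area
    return app_base, cog_base, hub_bytes - next_base

def to_hub(cog_base, cog, local_addr):
    """Hub address reached when `cog` accesses its local address."""
    return cog_base[cog] + local_addr

app_base, cog_base, left_over = assign_areas(APPS)
```

Each app addresses its area from 0 upward and never sees the base added behind its back, which is why all three can believe they own the chip. The four Spin cogs share one base so they still see the same memory as each other, and the remaining hub is left for the monitor or anything else.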
I don't know about "philosophical" difference, I was just curious about the motivation.
I recall reading the blurb from Motorola when the 6809 was new; they made a big thing of how it enabled position independent code and how great this would be as it enabled software vendors to ship closed source functionality as binary blobs that can be used from your application.
Multiple concurrent apps don't actually need to be position independent by virtue of being concurrent. Certainly it will help if they are really "apps", separately compiled programs, perhaps in different languages, that may get loaded and unloaded at will.
My philosophy here is that if you have an itch go right ahead and scratch it. No matter if the result has no obvious use. That's why we have such things as ZiCog:)
So, full steam ahead on that then. It's certainly a way to get into Verilog and the P1 design itself.
Exactly. As you pointed out, it is really generally mostly used to support operating systems - but in this case, would also allow supporting several legacy P1 apps simultaneously that assumed taking over the whole chip at once.
Yep - that would be my biggest motivation - getting into Verilog and P1V design
And I think this is a nice (small) sized chunk for my first attempt to modify the P1V for some new functionality.
Of course as long as you are using an FPGA you are still many layers of abstraction away from your design to the actual transistors.
Those levels are still very closed source. The Verilog compiler, synthesizer, fitter, router...I don't know what they have in the tool chain. Then there is the actual design of the logic blocks of the FPGA that is used to implement all this.
Having the source code to an Open Source processor has not helped you very much in your goal. Unless you can create those tools and the logic block hardware. Or translate the Verilog into a schematic manually and build the thing out of actual transistors and wires!
It's kind of tempting to make up some of these blocks in discrete logic chips. If retirement were closer...
It seems like a lean design, and is neatly split into blocks like CTRA, CTRB, VID, ALU etc. It would be possible to make up one of these at a time, while leaving the remainder in the FPGA
Yes, I am familiar with how closed-source FPGAs actually are. That's why I was hoping to use the production variant of the Propeller, and trust Parallax that they didn't add any surprises to the production silicon :P
I've not seen an announcement one way or the other, anyone seen anything official?
Maybe they're in a holding pattern until that design is ready to fab?
With shuttle-runs costing hundreds of thousands of dollars, it may make some sense to hold off on opening the P2 sources until after the design has been proven with an ASIC run.
You wouldn't want to let loose hundreds or thousands of open-source developers (who probably don't have the 100k+ capital to shuttle) to fork in all directions until we're certain that the foundational code we start with is solid in ASIC production.
Let's be honest, the real purpose of an FPGA is as a stepping-stone to real hardware. How cool will it be, in a few years, to be able to hold up a TQFP or BGA and say "a couple 'o thousand of those transistors were mine".
I'm so excited.
Hang on in there m00tykins, Parallax has a history of doing the right thing for their community.
You'll never meet a more open and supportive hardware vendor in your life.
Red
"It will probably be open-sourced, maybe even during development. We'll see how it goes."
So it seems you're right Red, Parallax not only has in the past, but is continuing to support the O/S community. TY Parallax!
P.S Chip: I hope you don't mind me blurting that out, you didn't say to keep it secret so I thought I'd share the good news.
For me it's difficult to know exactly what "completely open source" means with hardware. I guess you would need to define what your requirements are. Just quickly brainstorming - does the OpenRisc source include everything needed to produce the ASIC? Was test logic added to OpenRisc that's not in the publicly available RTL? Is the package design available? Production test vectors? The net list and sdf? The libraries? The physical design? The masks? Does it require the use of closed source tools to produce?
But anyways … and eventually http://www.lowrisc.org which Heater will like because of http://riscv.org/download.html#tab_angel ;-)
Lowrisc might be at a good stage to explore for those interested. I don't think that it hurts to mention it here, because it's solving a much different problem than the Propeller ever intends to solve as far as I can tell.
Note that there are commercial non-open derivatives of OpenRisc. I've used one of them. Wikipedia lists several.
Edited to add: Hmm - lowrisc appears to be using the Chisel HDL https://chisel.eecs.berkeley.edu
For me it means 1) free layout software, 2) free standard cell libraries, and 3) support of that SW and Cells at current foundries, so anyone can move from HDL to GDSII.
Yes, just dreaming. This was the closest thing: http://www.vlsitechnology.org/
However the code is there and it's open source. Do with it as you like.
Similarly one could imagine open source software that has no open source compiler or run time for the language it is written in, comes with no unit tests, or such niceties. In the extreme there may not even be a compiler for the language it's written in. Still open source is it not?
As it stands now we have open source operating systems and applications. Everything is open all the way down to the CPU it runs on, and of course all the pesky peripherals with their very closed source firmware and suchlike.
These opensource hardware (HDL) efforts are at least pushing the boundary between what is open and what is closed lower and lower towards the transistors.
Thanks for the lowrisc link. Never heard of it. Will certainly be watching progress there.