Ideas for Linux on the Propeller
pedward
Yeah, I agree that it's kinda a bad idea in principle, but if the tools were available I'm sure someone would figure out how to use it.
With that out of the way, I started poking around uClinux, and this looks like a really viable Linux kernel base to port to the Propeller.
To make it even possible, you would need an XMM memory interface on the order of 16-32MB in size, plus drivers to talk to peripherals like SD.
uClinux has been ported to a number of microcontrollers, including the Hitachi H8, the Motorola m68k DragonBall, and Atmel SAM9 (ARM9) chips.
The ARM9 chips have very few resources, and they are comparable to the Propeller 1 in rated MIPS.
If a Propeller 1 port works, the Propeller 2 becomes a very viable platform, especially with its extra speed and direct memory interfacing.
Just some thoughts provoked by the proliferation of ARM: it's really hard to find an x86 SoC to build a C3-type board, but ARM SoCs and big micros are prevalent and relatively inexpensive.
EDIT: With the multiple COGs of the Propeller architecture, you could implement a hardware virtual machine architecture, with Linux running as a guest OS while other COGs implement machine services (such as a memory server). You could also run hard real-time services in dedicated COGs, while Linux runs a more conventional, rich software stack.
Comments
A Prop running in COG PASM gets you 20 MIPS (provided it is not making heavy use of data in HUB).
A Prop running LMM PASM gets you 5 MIPS.
A Prop running XMM PASM may only get you 1 MIPS.
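Those figures fall out of simple clock arithmetic. A minimal sketch, assuming an 80 MHz Propeller 1: most native cog instructions take 4 clocks (20 MIPS), and an LMM loop that fetches one instruction per 16-clock hub window tops out at 5 MIPS. The 80-clocks-per-instruction XMM value is an assumption, only back-calculated from the ~1 MIPS estimate, not a measurement.

```c
/* Back-of-envelope instruction rates for an 80 MHz Propeller 1. */
#define PROP1_CLOCK_HZ 80000000.0

/* Average clock cost per executed instruction (estimates, not measurements). */
#define COG_CLOCKS_PER_INSTR  4.0  /* most native PASM instructions */
#define LMM_CLOCKS_PER_INSTR 16.0  /* one fetched instruction per hub window */
#define XMM_CLOCKS_PER_INSTR 80.0  /* assumption: back-calculated from ~1 MIPS */

/* Millions of instructions per second for a given clock and per-instruction cost. */
double mips(double clock_hz, double clocks_per_instr)
{
    return clock_hz / clocks_per_instr / 1e6;
}
```

For example, `mips(PROP1_CLOCK_HZ, LMM_CLOCKS_PER_INSTR)` gives 5.0. Hub-heavy code falls below these numbers, since each hub read can cost 8 to 23 clocks depending on where the hub window is.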
ARM Linux is going to be very cheap. The entire Raspberry Pi board is only $25.
I happen to have a Gadget Gangster board here with 32MB RAM, SD, and console port. So hardware is not an issue no matter how slow it may be in the end.
Hard real-time services in a COG are equivalent to Linux device drivers loading firmware into wifi chips or other devices. No problem.
Not sure what you mean by "guest OS". I hope there is only one OS layer. This beast is going to be slow enough as it is, unless you count the LMM kernel as a host OS.
We now have propgcc that could probably be coaxed into compiling the Linux kernel, so what's stopping you?
You make a good point about ~1 MIPS with XMM.
I'm thinking Prop 1 is a development system that can be used to develop some *real* applications for the Prop 2.
When I said "guest OS" I meant that Linux would not be the native code for the chip. It's paravirtualization: the Prop still runs a supervisor COG, and Linux runs in 1 or more COGs, with 1 or more COGs allocated to hard real-time applications.
Yes, I intended to use PropGCC. To do the port right would require some time to properly optimize the code to put really important bits in Hub memory, then regular program code in XMM.
I am thinking that a hybrid approach would be best: either Linux runs COGs with PASM, or the supervisor runs them for certain services. We can agree that certain functions would be better implemented as dedicated COGs running PASM; the question is whether to then allocate COGs like processors or take the approach of a monolithic/macro kernel.
Linux is a monolithic kernel, where one chunk of code does it all. Implementing certain parts, like I/O, as dedicated COGs could probably be used to cheat a bit on throughput: as far as Linux is concerned it would just be regular code, but in reality it would work more like a multi-core macro kernel, with certain chunks of code running in dedicated COGs. You can avoid the coginit overhead by always having idle COGs ready to dispatch certain chunks of code. That avoids parallelism problems and locking; you are basically running certain code in a different COG to avoid thrashing the calling COG's cache. XMM requires caching to work well, and trying to run 2MB of code through one COG could be very slow.
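The cache-thrashing argument can be illustrated with a toy direct-mapped cache model. This is only a sketch: the 32-line, 64-byte geometry and the two code-region addresses are assumptions, chosen so that both regions map onto the same cache lines, and none of it reflects PropGCC's actual XMM cache layout.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES   64u        /* assumed cache line size */
#define NUM_LINES    32u        /* assumed: 32 x 64 B = 2 KB of cache */
#define REGION_LINES 32u        /* each code region fills the whole cache */
#define REGION_A     0x000000u  /* hypothetical code addresses in XMM, */
#define REGION_B     0x100000u  /* 1 MB apart, so they map to the same lines */

typedef struct {
    uint32_t tag[NUM_LINES];
    bool     valid[NUM_LINES];
    unsigned hits, misses;
} dm_cache;

void cache_reset(dm_cache *c) { memset(c, 0, sizeof *c); }

/* Look up one address; a miss (notionally) refills the line from XMM. */
void cache_access(dm_cache *c, uint32_t addr)
{
    uint32_t line  = addr / LINE_BYTES;
    uint32_t index = line % NUM_LINES;
    uint32_t tag   = line / NUM_LINES;

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
    } else {
        c->valid[index] = true;
        c->tag[index]   = tag;
        c->misses++;
    }
}

/* One pass through a code region, touching each cache line once. */
static void touch_region(dm_cache *c, uint32_t base)
{
    for (uint32_t i = 0; i < REGION_LINES; i++)
        cache_access(c, base + i * LINE_BYTES);
}

/* Both regions run in the same cog, alternating, through one cache. */
void shared_cache(dm_cache *c, unsigned passes)
{
    for (unsigned p = 0; p < passes; p++) {
        touch_region(c, REGION_A);
        touch_region(c, REGION_B);
    }
}

/* Each region is dispatched to its own cog with its own cache. */
void split_caches(dm_cache *a, dm_cache *b, unsigned passes)
{
    for (unsigned p = 0; p < passes; p++) {
        touch_region(a, REGION_A);
        touch_region(b, REGION_B);
    }
}
```

With 10 passes, the shared cache misses on all 640 accesses, while each private cache misses only on its first pass (32 misses, 288 hits). Real code is rarely this pathological, but the effect is the same in kind.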
I'm just mulling things over at this point, I may get the urge to hack on the problem in a bit, but I need to get up to speed on PropGCC and uClinux first.
As to purpose/reason? A) To prove it can be done. B) The Prop 2 will be cheaper than an Atmel ARM9 chip.
1) All code runs as XMM from 32MB of external RAM. A nice flat memory space, just like any other micro.
2) Hopefully you find a Linux kernel that works without an MMU. Implementing virtual memory is going to be complex and slow things down a lot.
3) Already we have 1 cog in use as a serial console port, one cog in use as the SD card block driver, and one cog running the XMM kernel. Perhaps another cog is used as a caching memory controller. I suggest these are loaded and set running prior to the Linux kernel; they just provide hardware services via memory locations in HUB, just like real peripheral hardware does.
4) Any remaining cogs can be left to be handled by future linux device drivers.
So all you need to do is get the kernel compiled with propgcc. Fix up the block driver, console driver and timer tick driver. And write a boot loader. Job done:)
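Point 3 treats each driver cog as a peripheral whose "registers" are HUB memory locations. A minimal sketch of such a mailbox, assuming a hypothetical layout of command/argument/result longs and a hypothetical `CMD_PUTC` code. The serial cog's polling loop is simulated by calling `service_poll()` directly, and the UART is replaced by a buffer so the sketch runs on a PC.

```c
#include <stdint.h>

/* Hypothetical command codes; CMD_NONE means the mailbox is free. */
enum { CMD_NONE = 0, CMD_PUTC = 1 };

/* One mailbox of three longs in hub RAM; volatile because another cog writes it. */
typedef struct {
    volatile int32_t cmd;
    volatile int32_t arg;
    volatile int32_t result;
} hub_mailbox;

/* Stand-in for the serial port so the sketch runs on a PC. */
static char tx_log[64];
static unsigned tx_len;

const char *tx_contents(void) { return tx_log; }

/* Client side (e.g. a kernel console driver): returns 0 if still busy. */
int mbox_post(hub_mailbox *m, int32_t cmd, int32_t arg)
{
    if (m->cmd != CMD_NONE)
        return 0;        /* previous request not finished yet */
    m->arg = arg;        /* argument first ...                 */
    m->cmd = cmd;        /* ... then the command hands it over */
    return 1;
}

/* Client side: the request is complete once cmd is back to CMD_NONE. */
int mbox_done(const hub_mailbox *m) { return m->cmd == CMD_NONE; }

/* Service side: one pass of the serial cog's polling loop. */
void service_poll(hub_mailbox *m)
{
    switch (m->cmd) {
    case CMD_NONE:
        return;                              /* nothing requested */
    case CMD_PUTC:
        if (tx_len < sizeof tx_log - 1)      /* keep a terminating NUL */
            tx_log[tx_len++] = (char)m->arg;
        m->result = 0;
        break;
    default:
        m->result = -1;                      /* unknown command */
        break;
    }
    m->cmd = CMD_NONE;                       /* completion signalled last */
}
```

The client writes the argument before the command, and the service clears the command last, so the command word doubles as the handshake flag. The sketch is single-threaded; a real multi-core host would need C11 atomics for the same ordering guarantees.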
The uClinux site says that porting to a new architecture principally requires writing about 10 files. Not simple, but not impossible.
I would tackle the 2.0.38 kernel, which is old, smaller, and less sophisticated. I also have an O'Reilly book on this version that talks about the kernel.
There is a nice book on writing Linux device drivers that will come in handy for getting those 10 files straight.
If I remember correctly, there was an edition covering the 2.0 kernel drivers. Things have changed a lot since then and have gotten more complicated.
Here is the conclusion I've come to about Linux actually running on the Prop: why spend time on this when you may, amazingly, get only the kernel working with a few CLI utilities if you're lucky? I think the better approach is to read over the POSIX standards and develop a *nix clone that is suited to the Propeller's unique architecture. Realistically, you would have to build an OS and utilities from the ground up that don't rely on interrupts and other such 8086 specifics if you ever want a chance of compiling zippy programs that don't need hacks or emulation.
It just seems to me that if you really want to see a usable bash shell on your Prop, you need to develop from the ground up, not hack a port of 20-year-old tech. You would probably want to develop some kind of system to handle SRAM first, and develop with it in mind. I would love to get this accomplished someday: just a decent POSIX OS with a fast monolithic kernel and 8 to 16MB of SDRAM. The sad truth is that by the time I have the knowledge to get this accomplished, the Prop2 will be out with SDRAM support built in. Anyway, those are just my musings on the project. I know there's the hacker mentality of "I wanna port Linux just because I can", but if you have enough skill to do that, why not put it to better use making something that has the same outcome but is usable?
http://www.nxp.com/products/microcontrollers/arm9/
I don't think you'd have many pins left, but maybe it doesn't matter...
As far as I can tell, from an implementation/logical point of view it's pretty much exactly the same difficulty for Prop I or Prop II. My 32MB Prop system already has RAM + SD card + serial port, so we are ready to go.
The only difference is the resulting speed of the thing. Frankenstein's monster was never a fast mover :)
Spinix is an attempt to provide the look and feel of Linux on a Prop. It takes advantage of the relocatable nature of Spin programs and the multiple processors to run multiple processes at the same time. It runs on any Prop board that supports an SD card. It is starting to outgrow the P1, and it will be nice to run it on a P2 when it's available. The kernel is around 10 KB, and the shell is another 9 KB. This leaves just enough memory to run most of the standard apps, but some of them require killing the shell, and in some cases taking over the entire Prop. The P2 will alleviate the memory limitation and allow more apps to run at the same time. It will also allow multiple shells to run, which will allow multiple users at the same time. The main limitation then becomes the fact that there are only 8 cogs and no interrupts. Getting above 8 processes at a time will require cooperative processes.
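Cooperative multitasking past the 8-cog limit means each process does a slice of work and then voluntarily hands control back. A minimal round-robin sketch, assuming processes can be written as step functions that return to yield; it is not a description of how Spinix is implemented, and all names here are hypothetical.

```c
#include <stddef.h>

#define MAX_TASKS 16   /* hypothetical limit, deliberately more than 8 cogs */

/* A cooperative process: do one slice of work, return nonzero to keep running. */
typedef int (*task_fn)(void *state);

typedef struct {
    task_fn fn;
    void   *state;
    int     alive;
} task;

typedef struct {
    task   tasks[MAX_TASKS];
    size_t count;
} scheduler;

/* Register a task; returns its slot number or -1 if the table is full. */
int sched_add(scheduler *s, task_fn fn, void *state)
{
    if (s->count == MAX_TASKS)
        return -1;
    s->tasks[s->count] = (task){ fn, state, 1 };
    return (int)s->count++;
}

/* Give every live task one turn; returns how many are still alive. */
size_t sched_round(scheduler *s)
{
    size_t alive = 0;
    for (size_t i = 0; i < s->count; i++) {
        task *t = &s->tasks[i];
        if (!t->alive)
            continue;
        t->alive = t->fn(t->state) != 0;   /* the task yields by returning */
        alive += (size_t)t->alive;
    }
    return alive;
}

/* Run rounds until every task has finished; returns the number of rounds. */
size_t sched_run(scheduler *s)
{
    size_t rounds = 0;
    size_t alive;
    do {
        alive = sched_round(s);
        rounds++;
    } while (alive > 0);
    return rounds;
}

/* Example process: counts its own counter down by one per turn. */
int countdown_task(void *state)
{
    int *n = state;
    if (*n > 0)
        (*n)--;
    return *n > 0;
}
```

Ten countdown tasks with counters 1 to 10 all finish after 10 rounds. The catch is the usual one for cooperative systems: a task that never returns stalls everything else in that cog.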
The ARM9 series is certainly no Prop in terms of performance. Go to TI and check out their ARM9 Sitara line, which is an SoC; it beats the Prop in performance so badly it's not even funny.
True.
With the memory board you only get the serial port, SD card, TV/audio, and keyboard/mouse via the TinyTwo-Wire add-on. Some folks have the first prototype with keyboard/mouse built in. I2C expansion is the only pin option left.
If nothing else attempting to do this would help exercise Propeller-GCC more.
In that regard this is a welcome project.
Whether or not it's practical depends on what you want the end result to be.
I am the guy who brought the CP/M operating system to the Prop. Along with that comes the first and only C compiler, BDSC, that can compile C on the Prop.
Practical? No. But we just have to do it anyway.
Here's a thread about it.
Try this: http://code.google.com/p/propgcc/
The 32MB SDRAM module was a Gadget Gangster limited edition. With PropellerPlatform more or less defunct, it seems like a better option would be to support a Quickstart solution. If anyone cares, I'll do a Quickstart board with DingBatty's 2-latch design that gives more free pins. I could add some other experimental goodies, etc.
I like the feature set of the board you designed for GG, and I would encourage moving towards an I/O co-processor model. You have a main prop with SDRAM, SD, video, and a co-processor that is connected with a QSPI type interface for doing pin I/O.
You could design the daughterboard to have a QFP prop on it that provides expanded I/O capabilities. Optional crystal, chain loaded from the parent instead of an EEPROM (yeah, we are revisiting old ideas). The usual p28-31 would be used for data pins.
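With P28-P31 repurposed as a 4-bit data bus, each byte crosses the link as two nibbles. A sketch of just the framing, assuming high nibble first and that the pins map to bits 28-31 of OUTA/INA as on the Propeller 1. Clocking, handshaking, and pin direction control are left out, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_SHIFT 28u                    /* data on P28..P31 */
#define DATA_MASK  (0xFu << DATA_SHIFT)

/* Pin image for one nibble (bits 28..31 of an OUTA value). */
uint32_t nibble_to_pins(uint8_t nibble)
{
    return ((uint32_t)nibble & 0xFu) << DATA_SHIFT;
}

/* Recover the nibble from an INA snapshot. */
uint8_t pins_to_nibble(uint32_t ina)
{
    return (uint8_t)((ina & DATA_MASK) >> DATA_SHIFT);
}

/* Split bytes into pin images, high nibble first; out holds 2 * len entries. */
void encode_bytes(const uint8_t *in, size_t len, uint32_t *out)
{
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = nibble_to_pins(in[i] >> 4);
        out[2 * i + 1] = nibble_to_pins(in[i] & 0x0F);
    }
}

/* Reassemble len bytes from 2 * len captured pin images. */
void decode_pins(const uint32_t *in, size_t len, uint8_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (uint8_t)((pins_to_nibble(in[2 * i]) << 4) |
                           pins_to_nibble(in[2 * i + 1]));
}
```

Sending 0xA5 produces the pin images 0xA0000000 and 0x50000000; a real link would add a strobe or clock pin per nibble.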
So you don't mean a cross compiler, used on a computer, that will generate Propeller binary programs?
You mean a compiled GCC program that runs on Propeller to generate Propeller binary programs?
As for the cross compiler, we do that now with all the standard library functions, etc. As for the GCC program, yes, that is too big. GCC requires too much support to run on a Propeller today; that would need a Linux-like OS running on the Propeller. I don't really see the need for running GCC on a Propeller, but that's just my opinion. A cross compiler is needed to bootstrap all the other stuff, though.
Hi Perry.
@DingBatty's performance table doesn't reveal a marked slowdown. The main thing is to burst the data as fast as possible. The extra latch is just for the address setup. The SDRAM setup is pretty laborious already so just one more latch doesn't hurt much.
I've played with something like this already: http://forums.parallax.com/showthread.php?128651-Name-It-Win-It-Contest-Closed-MicroPropPC&p=966990&viewfull=1#post966990
There are 2 assembled boards. The main Propeller-SDRAM data bus is 16 bits wide. The second Propeller handles most peripherals including VGA.
I chose an I2C controlled parallel bus for updating the VGA buffer on a second Propeller. The I2C control is slow, but the data is fast once the I2C says start the transfer. The QSPI interface is an interesting idea.
I'd like to put together a dev board to get PropGCC up and running, then tackle Linux after I've gotten that working sufficiently.
Pedward, I like your general approach, and getting started with Prop1 (no matter how slow) will certainly reduce the time taken to get something decent running with Prop2. I gather Prop2 will likely use a single cog as a "caching memory server". If we use the same idea here, we can load into that cog the code to match any available memory hardware system.
A 1 MIPS XMM performance figure was mentioned, which presumably wouldn't require a super-fast memory-serving cog. Coincidentally, I'm looking at a Prop<>PC104 (ISA bus) interface to get around a serial bottleneck. Perhaps it would be possible to use an old PC as a memory source too (with a memory-serving cog to suit).
edit: alternatively, something like the new FT240X (8 bit + strobes, 1MByte/sec) could provide a slowish gateway to a PC's memory over USB. The advantage is that you could view what is being written/read on the PC.
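To put a number on "slowish", here is the arithmetic for filling cache lines over a 1 MByte/s link. A sketch, assuming a hypothetical 32-byte cache line and a fixed miss rate per instruction; it ignores USB latency, which would only make things worse.

```c
/* Microseconds to move 'bytes' over a link sustaining 'bytes_per_sec'. */
double transfer_us(double bytes, double bytes_per_sec)
{
    return bytes * 1e6 / bytes_per_sec;
}

/* Effective MIPS when each instruction also pays for cache-line fills:
   time per instruction = 1/base_mips us + misses_per_instr * fill_us. */
double effective_mips(double base_mips, double misses_per_instr, double fill_us)
{
    return 1.0 / (1.0 / base_mips + misses_per_instr * fill_us);
}
```

`transfer_us(32, 1e6)` is 32 µs per line, so even one miss per hundred instructions drags a 1 MIPS core down to about 0.76 MIPS.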
Just showing where I've gone with this stuff - nothing more, nothing less.
I didn't have time to follow through with it all.
Usually what happens is some bigger shiny idea comes along to derail me.
I'll post schematics, etc. if you're interested.
I certainly value your opinion.
You're right, P1 can be a P2 stepping stone. We have lots of things in place already if people want to try a uClinux port.
As I understand it, P2 COGs have cache read instructions that allow bursting data from HUB to COG one cycle per long up to 16 longs. This will give a good boost if enough of the necessary tools are there in COG. There are other features about COGs that will speed up other things like the SDRAM clocking mode and FIFO register file.
We may be able to live without a second COG cache, and that is very desirable to avoid a bottleneck. Arbitration by locks or some means will still be necessary for multi-COG access to external memory. We use a separate cache on P1 today mainly for portability and speed.
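Arbitrating several cogs' access to external memory with a lock could look like this. The Propeller 1 has eight hardware locks (LOCKNEW/LOCKSET/LOCKCLR); in this sketch a C11 `atomic_bool` stands in for one so the code runs on a PC, the external memory is a simulated array, and all function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define XMEM_SIZE 4096u  /* simulated external RAM (hypothetical size) */

static uint8_t xmem[XMEM_SIZE];

/* Stand-in for one Propeller hardware lock. */
typedef struct { atomic_bool busy; } xmem_lock;

void xmem_lock_init(xmem_lock *l) { atomic_init(&l->busy, false); }

/* Like LOCKSET: set the lock and report whether it was previously clear. */
bool xmem_try_acquire(xmem_lock *l)
{
    return !atomic_exchange_explicit(&l->busy, true, memory_order_acquire);
}

void xmem_acquire(xmem_lock *l)
{
    while (!xmem_try_acquire(l)) {
        /* spin: another cog owns the memory bus */
    }
}

/* Like LOCKCLR. */
void xmem_release(xmem_lock *l)
{
    atomic_store_explicit(&l->busy, false, memory_order_release);
}

/* Copy n bytes out of external memory while holding the bus lock. */
bool xmem_read(xmem_lock *l, uint32_t addr, uint8_t *dst, uint32_t n)
{
    if (addr > XMEM_SIZE || n > XMEM_SIZE - addr)
        return false;                 /* out of range */
    xmem_acquire(l);
    for (uint32_t i = 0; i < n; i++)
        dst[i] = xmem[addr + i];
    xmem_release(l);
    return true;
}

/* Copy n bytes into external memory while holding the bus lock. */
bool xmem_write(xmem_lock *l, uint32_t addr, const uint8_t *src, uint32_t n)
{
    if (addr > XMEM_SIZE || n > XMEM_SIZE - addr)
        return false;                 /* out of range */
    xmem_acquire(l);
    for (uint32_t i = 0; i < n; i++)
        xmem[addr + i] = src[i];
    xmem_release(l);
    return true;
}
```

Holding the lock for a whole block transfer, rather than per byte, keeps the arbitration overhead proportional to the number of bursts, which matters if P2's burst instructions are used.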
Another benefit is not having to andn/or bits to coexist with the user program, which is a major flaw of a single-COG memory access design. Some of the more macro-like instructions of P2 will help with this.
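The andn/or cost comes from the memory driver sharing one OUTA register with the user program: it must clear only its own bus bits and set the new values, leaving the user's pins alone. A sketch of that read-modify-write as plain C bit operations; the masks in the example are arbitrary.

```c
#include <stdint.h>

/* Update only the bus pins in an OUTA image and keep every other pin as-is.
   In PASM this is an ANDN with the mask followed by an OR of the new bits. */
uint32_t merge_pins(uint32_t outa, uint32_t bus_mask, uint32_t bus_value)
{
    uint32_t cleared = outa & ~bus_mask;       /* ANDN: drop the old bus bits */
    return cleared | (bus_value & bus_mask);   /* OR: set the new bus bits only */
}
```

For example, `merge_pins(0x80000001, 0x0000FF00, 0x0000AB00)` yields 0x8000AB01, with the user's bits 31 and 0 untouched. A dedicated memory cog that owns its pins outright can skip the ANDN and write OUTA directly.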