I don't think Propellers are a bad fit at all. The trouble with the P1 is that its clock speed and its RAM are both just shy of doing the better machines and CPUs justice. It's almost there, and frankly the Z80 / 8080 stuff is impressive!
When we get P2, this stuff will be cake, and I'm deffo in line with Heater on that. I love emulation projects and really want to learn more and emulate some great devices. P2 is gonna hit the sweet spot on all the 8 bit stuff for sure.
BTW: A 6809 is probably the hardest chip to emulate because of how complex its instruction set is. That chip is, IMHO, the most powerful and elegant of the 8-bitters, improved upon only by the Hitachi 6309 and its extras. When the time comes, that's the device to emulate. On P2, we get much improved LMM.
An Apple //e will mostly fit into a P2, and it's going to very easily run at clock speed+. Imagine that.
Thanks for the response Heater.
I think that with what the Prop1 is capable of, I can use it to build a pretty competent 63C09-based computer (at about 3.58 to 5 MHz).
The advantage to emulation or an FPGA design would be increased speed.
At this point, I'm thinking of running the SRAM I'm going to use at about four times processor speed and interleaving access among four separate CPUs.
Memory allocation will be tricky, but SMD isn't anything new to the 6809/6309.
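A minimal sketch of the interleaving idea, assuming a fixed round-robin where SRAM cycle n simply belongs to CPU n mod 4 (the slot rule and all names here are hypothetical). Clocking the SRAM at four times the CPU rate means every CPU gets exactly one memory slot per CPU cycle, so no arbitration logic is needed.

```python
NUM_CPUS = 4  # assumption: four CPUs sharing one SRAM, as in the post


def slot_owner(sram_cycle: int) -> int:
    """Return which CPU owns a given SRAM cycle under fixed round-robin."""
    return sram_cycle % NUM_CPUS


def sram_cycle_for(cpu: int, cpu_cycle: int) -> int:
    """SRAM cycle in which `cpu` gets its access during `cpu_cycle`.

    With the SRAM running NUM_CPUS times faster than the CPUs, each CPU
    cycle spans NUM_CPUS SRAM cycles, and CPU k always gets the k-th one.
    """
    return cpu_cycle * NUM_CPUS + cpu


def schedule(cpu_cycles: int):
    """List (sram_cycle, cpu) pairs for the first `cpu_cycles` CPU cycles."""
    return [(n, slot_owner(n)) for n in range(cpu_cycles * NUM_CPUS)]
```

The fixed schedule keeps the hardware trivial; the price is that a CPU can never borrow an idle neighbour's slot, and the memory-allocation question between the four CPUs remains.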
With the Z80 we pushed the propeller to do amazing things, but I don't think we fully explored all it is capable of, and I think the same logic applies to the 6809 as well.
For a 16-bit micro with 64k of program space, if you write it in LMM you can only use 32k of hub space at most. This means you end up fetching each instruction from external RAM, then processing it via LMM. On the Dracblade, for instance, it took 20 instructions to get one byte from external RAM, and for a Z80, where many instructions are 2 or 3 bytes long, that could be as bad as a 60:1 ratio (3 bytes × 20 instructions).
I believe we can do better with caching and with different RAM. For instance, with a different RAM arrangement (e.g. the touchscreen boards, which read one word rather than one byte at a time) and with that word read in a tight PASM loop, I think it could be 20x faster. And with caching you can do better still.
The challenge I think is how you divide the task up between cogs. In a simplistic way, cache driver in one cog and micro emulator in another cog. But there might be a better solution with cache plus the commonly used instructions in one cog, and lesser used instructions in another cog.
The cache part can further be divided into the external ram interface and the cache manager.
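One way to picture that split is a read-only, direct-mapped cache manager sitting in front of a separate external RAM interface. This is a sketch under assumed parameters (16-byte lines, 64 lines, hypothetical class names), not any existing Propeller driver. It assumes a line transfer is much cheaper per byte than single-byte fetches; under that assumption, a straight run of 64 opcode bytes costs 4 slow transfers instead of 64.

```python
class ExternalRam:
    """Stand-in for the external RAM interface (hypothetical); counts slow reads."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.reads = 0

    def read_line(self, base: int, size: int) -> bytes:
        self.reads += 1  # one burst transfer per cache line
        return bytes(self.data[base:base + size])


class DirectMappedCache:
    """Cache manager: direct-mapped and read-only, in front of ExternalRam."""

    def __init__(self, ram: ExternalRam, line_size: int = 16, num_lines: int = 64):
        self.ram = ram
        self.line_size = line_size
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # which RAM line each slot holds
        self.lines = [b""] * num_lines
        self.hits = 0
        self.misses = 0

    def read_byte(self, addr: int) -> int:
        line_no = addr // self.line_size
        index = line_no % self.num_lines  # direct mapping: one possible slot
        if self.tags[index] != line_no:
            self.misses += 1
            self.lines[index] = self.ram.read_line(line_no * self.line_size,
                                                   self.line_size)
            self.tags[index] = line_no
        else:
            self.hits += 1
        return self.lines[index][addr % self.line_size]
```

Keeping the two classes separate mirrors the cog split suggested in the post: the RAM interface knows only about pins and bursts, and the cache manager knows only about tags and lines.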
Sometimes I ponder a new language for the Propeller: XMM PASM, where the entire program is in external RAM, hub is for variables, and you move blocks of raw PASM into cogs as needed. A cog holds the external RAM driver and a cache driver, and about half the cog is free for code. Move a block of code from external RAM to hub and then into the cog and run it. Replace all jumps and calls with jumps to special handlers in the cog code. Replace all 32-bit DAT variables with code that builds the value out of mov and shift instructions. Have some common variables, e.g. r0 to r15. Have some stacks, either in cog or hub (e.g. program counter, call stack, temp variable stack), making it a hybrid stack/register architecture. Have a precompiler that keeps each function contiguous within a cog (i.e. if you are nearly at the end of what will fit in a cog and the function would be split, move it to the next cog).
It should end up faster than LMM and enable pasm programs megabytes in size. But it would need a smart precompiler. Such a program could simplify the creation of emulators because you wouldn't need to worry about cog size limitations.
So it is back to reading deSilva's pasm tutorials for me...
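The precompiler rule above ("if the function would be split, move it to the next cog") is a next-fit packing. Here is a sketch, assuming function sizes are already known in longs and that roughly 248 longs (about half of a cog's 496 usable registers) are free for code. Both the capacity and the function are illustrative, not part of any existing tool.

```python
def pack_functions(sizes, capacity=248):
    """Next-fit packing of functions (sizes in longs) into cog-sized blocks.

    A function that would straddle the end of the current block is moved
    whole to the next block, so no function is ever split.
    Returns a list of blocks, each a list of (function_index, offset) pairs.
    """
    blocks = [[]]
    used = 0
    for i, size in enumerate(sizes):
        if size > capacity:
            raise ValueError(f"function {i} ({size} longs) exceeds one block")
        if used + size > capacity:
            blocks.append([])  # start a fresh cog-sized block
            used = 0
        blocks[-1].append((i, used))
        used += size
    return blocks
```

Next-fit keeps functions in source order, which keeps fall-through and nearby calls in the same block. A smarter precompiler could reorder functions to waste less space at the end of each block.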
Considering the current fire sale on SX chips (thanks Parallax!), the interested tinkerer could couple the 48-pin version with a fast Cypress SRAM and execute 6809 code faster and cheaper than either the Propeller or the MC6809.
Those don't seem to be able to generate as high resolution video as the Prop.
BTW - Decided my initial idea was too ambitious.
For now 1 6309, static RAM, and a Prop for VGA, keyboard and mouse (and possibly interrupt generation).
Can't decide how to approach audio.
Wikipedia:
"The SX dies are still manufactured by Ubicom, who sends them to Parallax for packaging."
"On July 31, 2009, Parallax announced that the SX line of microcontrollers will be discontinued. They will only be available until their stock is depleted."
I'm actually surprised that the SX isn't known to practically everyone here. Parallax was selling SX chips and tools years before the P1 was born. They still have tens of thousands of them for sale right now. Leon's terse description of it was very apt.
While the SX and the Propeller are both microcontrollers, they are very different (8-bit vs 32-bit, single-core vs 8-core, etc.). I'm certainly not recommending it as a competitor or alternative to the Propeller for general purpose use. We were talking emulation...and nothing I've ever used can shove eight bits around faster. So I was suggesting that the SX could function as an old-fashioned control store, microsequencer, and ALU.
RE: Sometimes I ponder a new language for the propeller - XMM pasm where the entire program is in external ram, hub is for variables, and you move blocks of raw pasm into cogs as needed.
In a way we already have the beginning of this concept with the current XMM pasm, and it would be a natural fit for a Forth byte code interpreter as well. All of the propeller machine language instructions except jumps can be executed very quickly, and the jumps would not take all that much longer.
By using the 18 bits of the current D and S fields as a relative jump, a program could jump forwards or back 128K, which is more than enough for most loops. An absolute jump could easily be 18 bits (256K), 26 bits (64M), 32 bits (4096M) or 50 bits (ridiculously large). Considering the speed of the Prop I, I would think that something in the 18 to 26 bit address range would be more than adequate for most applications.
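In a P1 instruction long, the destination field sits in bits 17..9 and the source field in bits 8..0, so together they form one 18-bit field. Treated as two's complement, that field spans -131072 to +131071, which is the ±128K above. Below is a sketch of packing and unpacking such an offset, assuming an XMM kernel that reads the combined field this way (the helper names are hypothetical).

```python
FIELD_BITS = 18                      # D (9 bits) + S (9 bits) in a P1 long
FIELD_MASK = (1 << FIELD_BITS) - 1   # bits 17..0


def encode_rel(instr: int, offset: int) -> int:
    """Store a signed relative offset in the combined D:S field."""
    if not -(1 << (FIELD_BITS - 1)) <= offset < (1 << (FIELD_BITS - 1)):
        raise ValueError("offset out of +/-128K range")
    # keep the upper 14 bits (opcode, ZCRI, condition) untouched
    return (instr & ~FIELD_MASK & 0xFFFFFFFF) | (offset & FIELD_MASK)


def decode_rel(instr: int) -> int:
    """Extract the D:S field and sign-extend it from 18 bits."""
    field = instr & FIELD_MASK
    if field & (1 << (FIELD_BITS - 1)):
        field -= 1 << FIELD_BITS
    return field
```

Whether the offset counts bytes or longs is up to the kernel that interprets it; the sketch only handles the bit layout.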
Do you get it, Mendez, or do I have to explain it to you, Federico?
So, do you think it would make a better base for 6809 emulation?
The 6809 has one 16-bit accumulator (D); the 6309 has two (D and W), plus a 32-bit one (Q).
But that might not hinder emulation.
Despite having hung around the embedded systems industry in Europe for decades, and having followed the progress of all the big names in microcontrollers and embedded processors (Intel, Motorola, ATMEL, Microchip, ARM and so on), I have never seen anyone use SX chips. Indeed, I had not even heard of Parallax, the SX or the Propeller until a handful of years back. I guess that is why there is now Parallax Semiconductor.
Given that the SX is now obsolete and on life support I would not be inclined to invest any serious effort into it. My curiosity might get me to dabble a bit though.
An implementation for the Propeller has the great advantage that it will be easily portable to the Prop II where there is a lot more room and speed.
If your goal is to get a functional solution with a minimum of 'adventure' I certainly would not suggest an SX+SRAM solution. But, if you find joy in doing crazy things in unconventional ways (and getting very satisfying results), I think it is made to order!
(BTW, the mixed register sizes of the 6809 aren't a problem since you are just emulating the processor. I mean you could emulate the 6809 with a 4004 if you wanted to. But the results would be pretty sluggish.)
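To illustrate why the mixed sizes are harmless in an emulator, here is a sketch that stores only the four 8-bit registers and derives the wider ones on demand. On the 6309, D is A:B, W is E:F, and the 32-bit Q is D:W, in each case with the first-named register as the high part. The class is hypothetical; a real emulator might store D and W directly and derive the bytes instead.

```python
class Regs6309:
    """6309 accumulators kept as four 8-bit registers.

    D = A:B, W = E:F, Q = D:W (high part first). A 6809 has only A, B and D;
    E, F, W and Q are the 6309 extras.
    """

    def __init__(self):
        self.a = self.b = self.e = self.f = 0

    @property
    def d(self) -> int:
        return (self.a << 8) | self.b

    @d.setter
    def d(self, value: int) -> None:
        self.a, self.b = (value >> 8) & 0xFF, value & 0xFF

    @property
    def w(self) -> int:
        return (self.e << 8) | self.f

    @w.setter
    def w(self, value: int) -> None:
        self.e, self.f = (value >> 8) & 0xFF, value & 0xFF

    @property
    def q(self) -> int:
        return (self.d << 16) | self.w

    @q.setter
    def q(self, value: int) -> None:
        self.d = (value >> 16) & 0xFFFF
        self.w = value & 0xFFFF
```

Writing B changes D and Q automatically, which is exactly the aliasing the real chip has.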
@Heater: I agree...the SX was never a widely publicized or main-stream product. It was only because of your association with Parallax that I thought you and others might have heard of it. For example, the opening page of Parallax.com has an advertisement for the SX fire sale right now.
Edit: Looks like the ad disappeared in the last 24 hrs. Fortunately the chips haven't.
I found out about the SX by accident after spending a while with the Propeller. It's a nice chip, but since it isn't being made anymore I wouldn't use it for any serious project or anything I plan to make more than a few of. I did manage to buy a few and have plenty for smallish projects and other things just for fun. It's sad they're no longer being made, though. It's a fun chip and not hard to program.
A little while ago I discovered NitrOS-9 and decided that this time I want such a computer, I mean a CoCo. To make a long story short, I bought some 6309s and 6809s off that auction site (let's see if they even work) and will try to build something. A 6847 should be pretty easy to do with a Propeller.
But the point of my post is a bit more in the direction of emulating the 6809, splitting those opcodes into very simple ones (like microcode):
LDA $00
into
load address $00 into temp
fetch from memory at temp
move to register A
update flags after load
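The four steps can be sketched as tiny micro-op functions, with an opcode defined as just a list of them. This assumes `LDA $00` uses direct-page addressing (effective address = DP:$00). The flag behaviour follows the 6809 load instructions: N and Z come from the value, V is cleared, and C is untouched. All names are hypothetical.

```python
# 6809 condition-code bits
CC_C, CC_V, CC_Z, CC_N = 0x01, 0x02, 0x04, 0x08


class Cpu:
    def __init__(self, memory: bytearray):
        self.mem = memory
        self.a = 0
        self.dp = 0      # direct-page register
        self.cc = 0
        self.temp = 0    # scratch address passed between micro-ops
        self.data = 0    # value fetched from memory


def uop_load_direct_addr(cpu, operand):
    cpu.temp = (cpu.dp << 8) | operand           # load address into temp


def uop_fetch(cpu, operand):
    cpu.data = cpu.mem[cpu.temp]                 # fetch from memory at temp


def uop_to_a(cpu, operand):
    cpu.a = cpu.data                             # move to register A


def uop_flags_load8(cpu, operand):
    # update flags after an 8-bit load: N, Z from value, V cleared, C kept
    cpu.cc &= ~(CC_N | CC_Z | CC_V) & 0xFF
    if cpu.data & 0x80:
        cpu.cc |= CC_N
    if cpu.data == 0:
        cpu.cc |= CC_Z


LDA_DIRECT = [uop_load_direct_addr, uop_fetch, uop_to_a, uop_flags_load8]


def run(cpu, uops, operand):
    for uop in uops:
        uop(cpu, operand)
```

An LDB-direct would reuse three of the four micro-ops and swap only the "move to register" step, which is where the space savings come from.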
It needs loads of such microcode. Probably two COGs could do it easily... That is exactly what I am interested in: one COG runs ahead and leaves the micro-ops the other one has to run in HUB memory.
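The "one COG runs ahead" arrangement is a single-producer/single-consumer queue. Here is a sketch of such a ring buffer as it might sit in HUB memory, assuming micro-ops are passed as small integers (hypothetical). The decoding COG writes only `head` and the executing COG writes only `tail`, so no lock is needed. In a real two-COG version, the slot must be written before `head` is advanced. Python does not model that ordering; it only shows the index logic.

```python
class MicroOpQueue:
    """Single-producer/single-consumer ring buffer for micro-op codes.

    One slot is always left empty so that "full" and "empty" can be told
    apart using only the two indices.
    """

    def __init__(self, size: int = 16):
        self.buf = [0] * size
        self.size = size
        self.head = 0   # next slot the producer (decoder) writes
        self.tail = 0   # next slot the consumer (executor) reads

    def push(self, uop: int) -> bool:
        nxt = (self.head + 1) % self.size
        if nxt == self.tail:
            return False          # full: the decoder must wait
        self.buf[self.head] = uop
        self.head = nxt           # publish only after the slot is written
        return True

    def pop(self):
        if self.tail == self.head:
            return None           # empty: the executor must wait
        uop = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.size
        return uop
```

Every push and pop costs HUB accesses on both sides, so the decoder must save more work than the queue traffic costs for the split to pay off.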
Poor old MoCog has been sidelined for a long time.
The approach of splitting instructions into sub ops or "microcodes" is a good one. That's exactly what the ZiCog Z80 emulator does.
In the ZiCog case we observe that a lot of Z80 instructions can be split into three operations:
1) Load something.
2) Do some operation.
3) Store something.
Now here is a trick: the dispatch table consists of a LONG entry for each opcode. That LONG can be split into three fields, each of which represents one of the above steps. Further, each field can be the actual COG address of the operation to perform. So the decode is:
a) Read an op code from memory.
b) Use that opcode to index the dispatch table and get a LONG
c) Mask off the first field in the LONG and call that address in COG, which might be "load memory", say.
d) Shift and mask the next field and call that address for the next micro-op.
e) Same again for the last micro-op.
This means you can implement a lot of opcodes in a lot less COG code space, and it's pretty quick.
This slick idea came from Cluso, who used it in his go-faster Spin bytecode interpreter.
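The decode steps above can be sketched with a dictionary of handlers standing in for COG addresses. This assumes three 9-bit fields (a COG address needs 9 bits for 512 longs) packed low to high as load, operate, store. ZiCog's actual bit layout may differ, and the micro-op numbers and handlers are hypothetical. The two Z80 opcodes (0x78 is LD A,B and 0x3C is INC A) show how different instructions share micro-ops; flags are ignored here.

```python
FIELD_BITS = 9                      # one COG address: 0..511
FIELD_MASK = (1 << FIELD_BITS) - 1

# Hypothetical micro-op "COG addresses"
NOP, LOAD_B, LOAD_A, INC, STORE_A = 0, 1, 2, 3, 4


def make_entry(load: int, op: int, store: int) -> int:
    """Pack three micro-op addresses into one dispatch LONG."""
    return (store << 2 * FIELD_BITS) | (op << FIELD_BITS) | load


def uop_nop(s):
    pass


def uop_load_b(s):
    s["t"] = s["b"]


def uop_load_a(s):
    s["t"] = s["a"]


def uop_inc(s):
    s["t"] = (s["t"] + 1) & 0xFF


def uop_store_a(s):
    s["a"] = s["t"]


HANDLERS = {NOP: uop_nop, LOAD_B: uop_load_b, LOAD_A: uop_load_a,
            INC: uop_inc, STORE_A: uop_store_a}

DISPATCH = {
    0x78: make_entry(LOAD_B, NOP, STORE_A),   # LD A,B
    0x3C: make_entry(LOAD_A, INC, STORE_A),   # INC A (flags not modelled)
}


def dispatch(entry: int, handlers: dict, state: dict) -> None:
    """Run the load / operate / store micro-ops named by one dispatch LONG."""
    for _ in range(3):
        handlers[entry & FIELD_MASK](state)   # mask off a field and "call" it
        entry >>= FIELD_BITS                  # shift the next field down


def step(opcode: int, state: dict) -> None:
    dispatch(DISPATCH[opcode], HANDLERS, state)
```

In PASM, the handlers dictionary disappears: the masked field is itself the jump target.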
Just now I don't recall how I was getting on with MoCog. I have a feeling the 6809 instruction set is so much bigger and more complex that I did not manage to set up a scheme like that. But perhaps it is possible if you sit and think about the machine structure long enough.
I did some experiments with the multi-COG approach to the Z80. What I ended up with was slow. The overhead of exchanging data and syncing between COGs was too big. It would be quicker to just write the ops in LMM and have the code live in HUB.
Hey Heater, I remember reading about the table some time ago; it is a good one. The ratio of HUB access time to COG speed is an unfortunate one, but we have lived with it long enough and made amazing things with it.
Looking at the opcode table of the 6809 you realize how slow that thing is, not forgetting that it divides the clock by 4!... Maybe it is just a pipe dream...
Hi Mike!
I just read your post about losing everything; I would have cried too. Anyhow, if you haven't already given them all away, I'd like a copy, physical or electronic, of all the manuals you mentioned. I'd especially like to see the OS-9 source code! By the way, the Color Computer community has completely disassembled the entire OS-9 L1 and L2 packages and upgraded them to NitrOS-9 v3.3.0. See nitros9project.org for details. If for some reason the website is down, let me know. It would be very interesting to compare the original code with it! I hope and pray you can recover much of what you lost. That is one reason I never remarried. Wives just don't want to understand. The CoCo email list is at http://five.pairlist.net/mailman/listinfo/coco. I don't know if it boots, but I have some disks from a 6800 class I took in the early '80s that used SSB computers. Computerware is SSB? Interesting. I'd like to get some of that software as well.
Take care my friend.
And can it do VT100 terminal video at the same time as being a CP/M machine?
BTW - Of the two, the Prop looks like the more powerful MCU.
Somewhere I also have a 6809 chip...
Still, maybe my approach there was not optimal.
Kip Koon
http://www.cocopedia.com/wiki/index.php/Kip_Koon
Cheers
Carlos