First ideas of PASM virtual machine
pik33
Goal: execute PASM from main RAM or external RAM attached to Propeller.
Idea (in pseudocode; I haven't memorized PASM syntax yet):
loop:   nop
        inc     virtual_program_counter
        rdlong  loop, virtual_program_counter
        jmp     loop
If we want to jump, PASM syntax has to be extended to enable long jumps. We can insert
        jmp     jump_procedure
        dd      long_jump_address
Our main loop will execute a jump to jump_procedure. It will load a long jump address from main ram to virtual program counter, then return to main VM loop
Calls, JMPRETs, etc. can, I think, be treated the same way.
This is only a first idea...
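To make the idea easier to play with off-chip, here is a minimal Python model of that loop. It is only a sketch: the instruction names ("add", "halt", "jmp_jump_procedure") and the tuple encoding are invented for illustration and are not PASM. Unlike the pseudocode above, it fetches first and then advances the counter. The long jump is modelled the way the post describes it: a jump to a helper, followed by a data long holding the target address.

```python
def run_lmm(hub, max_steps=1000):
    """Toy fetch/execute loop; 'hub' stands in for main RAM (one entry per long)."""
    acc = 0
    vpc = 0                          # virtual program counter (index into hub)
    for _ in range(max_steps):
        instr = hub[vpc]             # like: rdlong instr, virtual_program_counter
        vpc += 1                     # point at the next long
        op = instr[0]
        if op == "add":              # stand-in for "any ordinary instruction"
            acc += instr[1]
        elif op == "jmp_jump_procedure":
            vpc = hub[vpc]           # the next long is data: the long jump address
        elif op == "halt":           # hypothetical, only so the model can stop
            break
    return acc


program = [
    ("add", 1),                      # 0
    ("jmp_jump_procedure",),         # 1
    4,                               # 2: dd long_jump_address
    ("add", 100),                    # 3: skipped by the long jump
    ("add", 10),                     # 4
    ("halt",),                       # 5
]
result = run_lmm(program)
```

Here the jump skips the long at index 3, so only the adds at 0 and 4 run.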
Comments
What you describe has already been done. It's called the LMM, or "Large Memory Model," first proposed by Bill Henning in 2006. You will see it mentioned in numerous places in the forum.
Here's Bill's first introduction: http://forums.parallax.com/showthread.php?89640
Here's a Wiki page about it: http://propeller.wikispaces.com/Large+Memory+Model
Being "second to invent" by no means diminishes your originality and insight, though. Congratulations!
-Phil
So true, as I have already discovered. Finding them is not always easy though.
You will find this returns a much more useful list of threads for your keyword(s) of interest.
Hope this helps!
Paul
Various compilers are using this LMM today.
Many things have been "invented" by different people at different times, or even at the same time. I wonder where you got the idea originally?
The same thing can be done with full blocks of course, and many of us use that. It is especially helpful for language interpreters or emulators.
Welcome to the forum!
--Steve
Here's another search engine that might help you. http://www.propellerpowered.com/search
It's a custom Google search that will search all of the known discussion locations.
BTW, welcome to the forums.
OBC
As others have said, you have invented the Large Memory Model technique first proposed here by Bill Henning and now used in the Catalina and propgcc C compilers. It is also used in other projects where the LMM PASM is written by hand, for example my ZiCog Z80 emulator and pullmoll's qZ80 emulator.
As far as I remember, the designers of the Propeller had not realized this possibility until Bill pointed it out, long after the Prop was in production. So even though you are late to the show, well done for coming up with the idea independently.
If you ponder it a while, you will see that one third of your execute loop is JMP instructions, so unrolling the loop a bit will gain a lot of speed, as well as aligning HUB reads with the HUB access slots better. Still, that has been discussed a lot here already.
It turns out to be such a good idea that the up-and-coming Propeller II chip has features to speed it up. A wonderful case of a processor chip design being influenced by a public forum of its users. (Has that ever happened before?)
Other LMM solutions are implemented in PropBASIC (you write BASIC and you get back assembly code, with LMM if you ask for it). Moreover, in the comments you can find your original code.
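A rough way to see the effect of unrolling is to count clocks. The sketch below is a simplified timing model, not a measurement. It assumes Propeller 1 timing (ordinary instructions take 4 clocks, a hub read takes 8 clocks when it hits its window and up to 23 when it does not, windows come every 16 clocks) and the usual LMM loop shape of fetch, advance the counter, execute, with one jump per pass. It also assumes every fetched instruction is an ordinary 4-clock one. All function names are made up.

```python
HUB_PERIOD = 16       # clocks between hub windows for one cog
INSTR_CLOCKS = 4      # ordinary instruction
HUB_MIN_CLOCKS = 8    # hub read that arrives exactly on its window


def rdlong_clocks(t):
    """Clocks taken by a hub read issued at time t (8..23 in this model)."""
    wait = (-t) % HUB_PERIOD
    return HUB_MIN_CLOCKS + wait


def one_pass(t, unroll):
    """Advance time t over one pass of a loop unrolled 'unroll' times."""
    for _ in range(unroll):
        t += rdlong_clocks(t)     # fetch the next instruction from hub
        t += INSTR_CLOCKS         # advance the virtual program counter
        t += INSTR_CLOCKS         # the fetched instruction itself
    return t + INSTR_CLOCKS       # single jmp back to the top


def clocks_per_lmm_instruction(unroll, passes=100):
    t = one_pass(0, unroll)       # warm-up pass so we measure steady state
    start = t
    for _ in range(passes):
        t = one_pass(t, unroll)
    return (t - start) / (unroll * passes)
```

In this model the plain loop misses every other hub window and costs 32 clocks per executed instruction, a 4x unrolled loop costs 20, and an 8x unrolled loop costs 18, approaching the 16-clock hub limit.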
Another intriguing solution is SpinLMM, available in the obex.
http://obex.parallax.com/objects/635/
Check the sticky on top of the forum, propeller.wikispaces.com and existing objects in the obex.
Massimo
I was thinking about 6502 emulation. Get a 6502 instruction from main RAM, then translate it to intermediate instruction via lookup table, then execute this instruction in main emulator loop. Most of these intermediate instructions can be simply one PASM instruction. Then I thought - maybe we can simply execute native PASM from main ram? Or even faster from external ram chip attached to one cog?
There are some cunning tricks - eg looking at the opcode numbers you can group them together and hence jump to the appropriate opcode with a jump table rather than a long series of "if..then" checks.
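As a rough illustration of the jump-table idea, here is a small Python sketch that dispatches on the opcode byte instead of walking a chain of comparisons. It handles only four 6502 opcodes (LDA immediate $A9, TAX $AA, INX $E8, NOP $EA) and ignores the status flags entirely. The function and table names are invented for the example.

```python
def op_lda_imm(cpu, mem):
    cpu["a"] = mem[cpu["pc"]]        # operand byte follows the opcode
    cpu["pc"] += 1


def op_tax(cpu, mem):
    cpu["x"] = cpu["a"]


def op_inx(cpu, mem):
    cpu["x"] = (cpu["x"] + 1) & 0xFF


def op_nop(cpu, mem):
    pass


def op_unimplemented(cpu, mem):
    raise NotImplementedError("opcode not handled in this sketch")


# One entry per possible opcode byte; the opcode is the index.
JUMP_TABLE = [op_unimplemented] * 256
JUMP_TABLE[0xA9] = op_lda_imm
JUMP_TABLE[0xAA] = op_tax
JUMP_TABLE[0xE8] = op_inx
JUMP_TABLE[0xEA] = op_nop


def run_6502(mem, steps):
    cpu = {"a": 0, "x": 0, "pc": 0}
    for _ in range(steps):
        opcode = mem[cpu["pc"]]
        cpu["pc"] += 1
        JUMP_TABLE[opcode](cpu, mem)  # single indexed jump, no if/then chain
    return cpu


cpu = run_6502([0xA9, 0x05, 0xAA, 0xE8, 0xEA], steps=4)
```

In PASM the same shape would be an indexed jump into a table of handler addresses held in the cog; the Python list just stands in for that table.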
You will always want more ram, and the prop has 32k and really 64k is better for the old micros. So it is a matter of fetching data from ram as fast as you can. I started with a design using latches but it took 20 pasm instructions to get one byte out of external ram. In a quest to get a touchscreen display working faster (see http://forums.parallax.com/showthread.php?137266-Propeller-GUI-touchscreen-and-full-color-display) average joe and myself have got the pasm loop a lot faster. This bit of code reads two bytes:
so that is 6 pasm instructions for 2 bytes, or 3 pasm instructions per byte.
I believe there could be even faster solutions using caching of blocks of memory from external ram into hub ram.
There is still a lot more to do with emulating older micros. The "20 pasm per byte" external ram ended up emulating a Z80 at about 2 MHz. The "3 pasm per byte" ought to be a lot faster. A 6502 should see a similar improvement in speed.
There have been some cunning solutions by heater and pullmoll and others where you put the most common instructions in a cog, along with the local registers, but you call external LMM code to emulate less frequently used instructions.
The other thing I don't believe has been done is to write a propeller compiler for these old micros. Rather than emulate a "mov" from an older micro, compile the source code into propeller code. CP/M for instance is written in simpler 8080 code and would run very fast if recompiled to pasm. Many Z80 and 6502 instructions could compile down to only a few pasm instructions.
There are some traps too though. A cog works in a flat 2k memory space but these older micros work natively in a flat 64k memory space. So you can run into problems with how to do jumps and calls. Caching probably is the answer, and then you have to devote some of the cog space to the emulation and some to the caching driver.
So... set up the counter module in NCO mode and have it put the clock source out on whatever pin you need to clock.
If you unrolled the loop so that you could get hub accesses between every two instructions (so the "read ina" timing is fixed and constant) you could then have an unrolled loop that could read a huge chunk of data at once very quickly.
You would just have to play with the phase and frequency of the clock source. Use the PHSA to change the phase and FRQA to change the frequency.
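For picking starting values, the NCO arithmetic can be sketched like this. It assumes the Propeller 1 counter behaviour where PHSA accumulates FRQA on every system clock and the pin follows bit 31 of PHSA, so the output frequency is FRQA * CLKFREQ / 2^32. The helper names and the 80 MHz default are assumptions for the example, and the values would still need tuning on real hardware.

```python
def frqa_for(target_hz, clkfreq_hz=80_000_000):
    """FRQA value that gives roughly target_hz on the NCO output pin."""
    return round(target_hz * 2**32 / clkfreq_hz)


def nco_hz(frqa, clkfreq_hz=80_000_000):
    """Output frequency produced by a given FRQA."""
    return frqa * clkfreq_hz / 2**32


def phsa_for_phase(degrees):
    """Initial PHSA value for a phase offset, as a fraction of one full cycle."""
    return round(degrees / 360 * 2**32) % 2**32


frqa = frqa_for(5_000_000)       # one output cycle every 16 system clocks
phsa = phsa_for_phase(90)        # start a quarter cycle in
```

A 5 MHz clock at an 80 MHz system clock gives one external clock per 16 system clocks, which lines up with one hub window per clock.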
... Potential for around 10 MB a second burst transfer speeds. (It's about 16 clocks for the 3 instructions... 16 bits per 3 instructions... at 80 MHz...). Do 128 3-instruction sequences per pass of the loop and you should have a pretty nice and fast setup.
As for searching the forums: if you have just independently come up with an idea like LMM, how on earth would you know what to search for?
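The burst figure is simple arithmetic, spelled out here as a sketch. It uses the numbers from the post (16 clocks and 16 bits per 3-instruction group, 80 MHz clock); the function name is made up.

```python
def burst_bytes_per_second(clkfreq_hz=80_000_000,
                           clocks_per_group=16,
                           bytes_per_group=2):
    """Bytes moved per second if every group of instructions moves one word."""
    return clkfreq_hz * bytes_per_group // clocks_per_group


rate = burst_bytes_per_second()          # bytes per second during a burst
bytes_per_pass = 128 * 2                 # 128 groups per pass of the loop
```

That works out to 10,000,000 bytes per second, with 256 bytes moved on each pass of a 128-group loop. Loop overhead between passes is not counted.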
Propeddle: Software-Defined 6502 Computer
Yeah, what to search? Great question. Frankly, I think it's impressive to realize that mode so early, just musing on what one would do with the chip. Get this man a Prop! (It appears that is being settled too.)
LMM is amazing and I can't wait to play with it. Having external ram is really nice but now I want MULTIPLE RAMS. Display RAM. System RAM. Audio RAM... *HINT HINT* Fast counters would be great to build into a system like LMM. You'll probably even get better results for system memory than with the displays, since you will be able to clock them fast... REALLY FAST if you get fast 5V parts *HINT HINT* Resistors do cost board space, but this may be the key to getting LMM to really kick it in the pants! Let me know if you're interested in such things. We could even use multiple cogs to do reads and writes using locks? The VM could still be worked on and memory could still use improvement.
*EDIT*
FYI! While Doc has a great start on this, consider this before jumping on board with the hardware. He's running his '161's and RAM @ 3.3V. I can't say for sure what timings will be since data sheets list voltage @ 5V or 2V. But I can hypothesize 5v will be faster. No guess whatsoever at HOW MUCH faster. But I WILL be finding out!
One of the musings I have regarding pasm virtual machines is the tradeoff with pins and speed. If you use all the propeller pins then it runs the fastest. But then there are no pins left for anything else and there are many things that the propeller can do very well with just a few pins. VGA/TV display. Keyboard. SD card. Some serial ports. Even turning a relay on and off.
But I think that a lot can be done with caching. Take averagejoe's comment about more ram. Why can't the same ram be used for the system, for an emulation, for (some) external displays like a touchscreen, and for audio?
If you see the external ram as holding the code, then up to almost all of the hub can be used as a cache. And if the cogs can handle waiting a bit here and there, then it ought to be possible to run things like playing audio, and running a program, and handling input and output, all at the same time.
I've got this vague idea that you can emulate not just one pasm virtual machine, but several, all at the same time. Super Cogs - each with huge amounts of memory, plus a cache handler that keeps all the commonly used blocks of memory in hub ram.
With caching, not so many propeller pins need to be devoted to ram I/O.
I'm still not sure exactly how such a thing would work, but an idea would be to have a megabyte of sram attached to a propeller, and run a 6502 emulation in one cog and an 8080 emulation in another, and each has a 64k working space, and (say) 8k of that is sitting in hub at any one time and then there is a cache handler that works out whether a block needs to be read in or out.
A sort of LMM/XMM hybrid.
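To make the cache handler a bit more concrete, here is a minimal Python sketch of one possible scheme: a write-back block cache with least-recently-used eviction. The sizes (512-byte blocks, 16 of them for 8k resident) and the class and method names are assumptions for illustration. A bytearray stands in for the external SRAM, and a real handler would live in a cog and talk to the hardware.

```python
from collections import OrderedDict


class BlockCache:
    """Keeps a few blocks of 'external RAM' resident, writing back on eviction."""

    def __init__(self, backing, block_size=512, max_blocks=16):
        self.backing = backing           # bytearray standing in for external SRAM
        self.block_size = block_size
        self.max_blocks = max_blocks     # 16 x 512 bytes = 8k held "in hub"
        self.blocks = OrderedDict()      # block number -> [data, dirty flag]
        self.misses = 0

    def _load(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)        # mark as recently used
            return self.blocks[block_no]
        self.misses += 1
        if len(self.blocks) >= self.max_blocks:
            old_no, (old_data, dirty) = self.blocks.popitem(last=False)
            if dirty:                                # block needs writing out
                start = old_no * self.block_size
                self.backing[start:start + self.block_size] = old_data
        start = block_no * self.block_size           # block needs reading in
        entry = [bytearray(self.backing[start:start + self.block_size]), False]
        self.blocks[block_no] = entry
        return entry

    def read(self, addr):
        entry = self._load(addr // self.block_size)
        return entry[0][addr % self.block_size]

    def write(self, addr, value):
        entry = self._load(addr // self.block_size)
        entry[0][addr % self.block_size] = value & 0xFF
        entry[1] = True                              # write back on eviction


sram = bytearray(64 * 1024)              # one emulated machine's 64k space
cache = BlockCache(sram)
cache.write(0x1234, 0xAB)
value_in_cache = cache.read(0x1234)
value_in_sram_before = sram[0x1234]      # still stale: not written back yet
for b in range(16):                      # touch 16 other blocks to force eviction
    cache.read(0x4000 + b * 512)
value_in_sram_after = sram[0x1234]
```

Each emulator cog would get its own 64k window into the big SRAM and go through something like `read`/`write` for every memory access; only misses would cost external RAM traffic.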
Drac's Euro-Touch++, or whatever he decides to call it, is a great starting point. Small modifications may be necessary to fit design requirements? Most projects would work quite well. But since the `161 - RAM combo is new *to the best of my knowledge* LMM will not take advantage of it. Neither will XMM. *don't quote me on this, since I'm still wrapping my head around LMM*
My comment about more RAM is quite out of the norm for most people. 512k x 16B is quite a healthy bit of RAM, especially when you add the SD card in there. Throw a backup battery on the RAM and cut boot times too. I have this nasty little habit of over-engineering things. Looking 2 or 3 years in the future, I don't want to be forced to start over AGAIN because I've hit the limit of the HW.