First ideas of PASM virtual machine

pik33 · 2012-04-06 11:38

Goal: execute PASM from main RAM or external RAM attached to Propeller.

Idea (in pseudocode; I haven't memorized PASM syntax yet):

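loop    nop                                     ' placeholder, overwritten by each fetched instruction
        add     virtual_program_counter, #4     ' step to the next long in main RAM
        rdlong  loop, virtual_program_counter   ' fetch that instruction into the slot above
        jmp     #loop                           ' go back and execute it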

If we want to jump, PASM has to be extended to allow long jumps. We can insert:

        jmp     #jump_procedure
        long    long_jump_address

Our main loop will execute the jump to jump_procedure. That procedure will load the long jump address from main RAM into the virtual program counter, then return to the main VM loop.

Calls, JMPRETs, etc. can, I think, be treated the same way.

It is only a first idea...
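
The scheme above can be modelled on a PC to check the control flow. The following is only a sketch in Python, not PASM: the tuple encoding of instructions, the "ljmp" name, and the accumulator are all invented for illustration. What it does take from the post is the fetch from main RAM, the program counter stepping by one long, and the jump target stored in the long that follows the jump.

```python
def run_lmm(hub, pc=0, max_steps=100):
    """Toy fetch-execute loop. `hub` maps byte addresses to longs.

    Instructions are hypothetical tuples; plain integers are data longs.
    """
    acc = 0
    for _ in range(max_steps):
        instr = hub[pc]          # like: rdlong instr, pc
        pc += 4                  # like: add pc, #4 (one long = 4 bytes)
        op = instr[0]
        if op == "add":
            acc += instr[1]
        elif op == "ljmp":       # like: jmp #jump_procedure
            pc = hub[pc]         # the long after the jump holds the target
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown op {op!r}")
    return acc


program = {
    0:  ("add", 1),
    4:  ("ljmp",),
    8:  20,                      # long_jump_address
    12: ("add", 100),            # skipped by the long jump
    16: ("halt",),
    20: ("add", 10),
    24: ("halt",),
}

print(run_lmm(program))
```

The instruction at address 12 is never executed, because the jump procedure replaces the virtual program counter with the long stored at address 8.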

Phil Pilgrim (PhiPi) · 2012-04-06 11:52

pik33,

What you describe has already been done. It's called the LMM, or "Large Memory Model," first proposed by Bill Henning in 2006. You will see it mentioned in numerous places in the forum.

Here's Bill's first introduction: http://forums.parallax.com/showthread.php?89640

Here's a Wiki page about it: http://propeller.wikispaces.com/Large+Memory+Model

Being "second to invent" by no means diminishes your originality and insight, though. Congratulations!

-Phil

pik33 · 2012-04-06 12:00

It seems I have a lot to read here before I write about my ideas, because many things that I think can be done with the Propeller are already done and I only need to find them.

kwinn · 2012-04-06 13:11

pik33 wrote: »

It seems I have a lot to read here before I write about my ideas, because many things that I think can be done with the Propeller are already done and I only need to find them.

So true, as I have already discovered. Finding them is not always easy though.

doggiedoc · 2012-04-06 13:20

pik33 wrote: »

...I only need to find them.

One tip that you may find helpful in searching is to limit Google to this site. The site's own search engine does leave something to be desired. The method is to enter the following into your Google search bar (assuming you use Google):

site:http://forums.parallax.com keyword(s)

You will find this returns a much more useful list of threads on your keyword(s) of interest.

Hope this helps!

Paul

jazzed · 2012-04-06 14:21

pik33 wrote: »

It seems I have a lot to read here before I write about my ideas, because many things that I think can be done with the Propeller are already done and I only need to find them.

Various compilers are using this LMM today.

Many things have been "invented" by different people at different times, or even at the same time. I wonder where you got the idea originally?

The same thing can be done with full blocks of course, and many of us use that. It is especially helpful for language interpreters or emulators.

Welcome to the forum!
--Steve

Oldbitcollector (Jeff) · 2012-04-06 15:41

LMM is certainly a valid idea and more work with it is welcome!

Here's another search engine that might help you. http://www.propellerpowered.com/search
It's a custom Google search that will search all of the known discussion locations.

BTW, welcome to the forums.

OBC

localroger · 2012-04-06 16:35

pik33, you might be interested in a project I'm working on "in progress" which is kind of like LMM but different enough to be useful in different ways. It uses byte codes, not longs, to convert a lot of PASM instructions into standardized stack machine operations (they work off a math stack, like in Forth, instead of from memory). This results in a byte code which is much faster to execute than Spin but much more memory efficient than LMM. You still have to handle jumps LMM style, and common sense says to pack as many helper functions for things like string operations into the interpreter cog as you can. I've done some backpedaling since I wrote them, but you might find my old blog posts about "Windmill" of interest.
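
To make the byte-code idea concrete, here is a generic stack-machine sketch in Python. It is not the Windmill design: the four opcodes and their numbering are invented for illustration. It only shows operations working off a math stack, with each operation taking one byte instead of a 4-byte long.

```python
# Hypothetical opcode numbering, chosen only for this sketch.
PUSH, ADD, MUL, HALT = 0, 1, 2, 3


def run_bytecode(code):
    """Interpret a byte string on a small math stack and return the stack."""
    stack = []
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:               # next byte is an immediate operand
            stack.append(code[pc])
            pc += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack
        else:
            raise ValueError(f"unknown opcode {op}")


# (2 + 3) * 4 encoded in nine bytes
code = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
print(run_bytecode(code))
```

Operands come from the stack, so ADD and MUL need no address fields, which is where the memory saving over one long per instruction comes from.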

Heater. · 2012-04-06 21:20

pik33,

As others have said, you have invented the Large Memory Model technique first proposed here by Bill Henning and now used in the Catalina and propgcc C compilers. It is also used in other projects where the LMM PASM is written by hand, for example my ZiCog Z80 emulator and PullMoll's qZ80 emulator.

As far as I remember, the designers of the Propeller had not realized this possibility until Bill pointed it out long after the Prop was in production. So even though you are late to the show, well done for coming up with the idea independently.

If you ponder it a while you will see that one third of your execute loop is JMP instructions, so unrolling the loop a bit will gain a lot of speed, as well as aligning HUB reads with the HUB access slots better. Still, that has been discussed a lot here already.
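
A rough timing model shows why unrolling helps. It is written in Python rather than PASM so it can be run on a PC, and it rests on assumed Propeller 1 figures: a hub slot every 16 clocks per cog, 8 clocks for a hub read that is already in sync with its slot, and 4 clocks for every other instruction. The function name and the "fetch, add, execute" unit are illustrative only.

```python
HUB_WINDOW = 16   # assumed: one hub slot per cog every 16 system clocks
HUB_OP = 8        # assumed: rdlong cost once it is in sync with its slot
COG_OP = 4        # assumed: cost of an ordinary instruction


def clocks_per_lmm_instruction(unroll, passes=100):
    """Average clocks per executed LMM instruction for an unrolled loop.

    Each unit is: rdlong, add pc,#4, the fetched instruction.
    One jmp closes the loop after `unroll` units.
    """
    t = 0
    for _ in range(passes):
        for _ in range(unroll):
            t += (-t) % HUB_WINDOW   # stall until the next hub slot
            t += HUB_OP              # rdlong
            t += COG_OP              # add pc, #4
            t += COG_OP              # the fetched instruction itself
        t += COG_OP                  # jmp back to the top
    t += (-t) % HUB_WINDOW           # count the stall the jmp causes
    return t / (passes * unroll)


for n in (1, 4, 8):
    print(n, clocks_per_lmm_instruction(n))
```

In this model the jmp is what pushes a pass over a hub slot boundary, so the more units share one jmp, the closer the average gets to 16 clocks per instruction.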

It turns out to be such a good idea that the up-and-coming Propeller II chip has features to speed it up. A wonderful case of a processor chip design being influenced by a public forum of its users. (Has that ever happened before?)

max72 · 2012-04-07 06:59

You can find a lot of nice stuff around.. for example:

Other LMM solutions are implemented in PropBASIC (you write BASIC and you get back assembly code, with LMM if you ask for it). Moreover, in the comments you can find your original code.

Another intriguing solution is SpinLMM, available in the obex.
http://obex.parallax.com/objects/635/

Check the sticky on top of the forum, propeller.wikispaces.com and existing objects in the obex.

Massimo

pik33 · 2012-04-07 08:20

jazzed wrote: »

(...) I wonder where you got the idea originally? (...)

I was thinking about 6502 emulation. Get a 6502 instruction from main RAM, then translate it to intermediate instruction via lookup table, then execute this instruction in main emulator loop. Most of these intermediate instructions can be simply one PASM instruction. Then I thought - maybe we can simply execute native PASM from main ram? Or even faster from external ram chip attached to one cog?

Dr_Acula · 2012-04-07 17:49

Sounds like a good idea. Heater did that a few years back - he even changed his avatar to "Zicog".

There are some cunning tricks - e.g. looking at the opcode numbers, you can group them together and hence jump to the appropriate opcode handler with a jump table rather than a long series of "if..then" checks.
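
As an illustration of the jump-table idea, here is its simplest form in Python: a 256-entry table indexed directly by the opcode byte. The five opcodes are real 6502 encodings, but this is a sketch under assumptions, not a working emulator: status flags are ignored, BRK is simply treated as "stop", and the function and table names are made up.

```python
def op_brk(cpu, mem):
    cpu["running"] = False            # assumption: BRK just stops the sketch


def op_lda_imm(cpu, mem):
    cpu["a"] = mem[cpu["pc"]]         # operand byte follows the opcode
    cpu["pc"] += 1


def op_tax(cpu, mem):
    cpu["x"] = cpu["a"]


def op_inx(cpu, mem):
    cpu["x"] = (cpu["x"] + 1) & 0xFF  # 8-bit wraparound


def op_nop(cpu, mem):
    pass


def op_unimplemented(cpu, mem):
    raise ValueError("opcode not implemented in this sketch")


# One table lookup replaces a chain of comparisons.
JUMP_TABLE = [op_unimplemented] * 256
JUMP_TABLE[0x00] = op_brk
JUMP_TABLE[0xA9] = op_lda_imm
JUMP_TABLE[0xAA] = op_tax
JUMP_TABLE[0xE8] = op_inx
JUMP_TABLE[0xEA] = op_nop


def run(mem, pc=0):
    cpu = {"a": 0, "x": 0, "pc": pc, "running": True}
    while cpu["running"]:
        opcode = mem[cpu["pc"]]
        cpu["pc"] += 1
        JUMP_TABLE[opcode](cpu, mem)
    return cpu


# LDA #5, TAX, INX, INX, NOP, BRK
state = run(bytes([0xA9, 0x05, 0xAA, 0xE8, 0xE8, 0xEA, 0x00]))
print(state["a"], state["x"])
```

Grouping opcodes by their bit patterns, as suggested above, would shrink the table further; the flat table is just the easiest version to follow.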

You will always want more ram, and the prop has 32k and really 64k is better for the old micros. So it is a matter of fetching data from ram as fast as you can. I started with a design using latches but it took 20 pasm instructions to get one byte out of external ram. In a quest to get a touchscreen display working faster (see http://forums.parallax.com/showthread.php?137266-Propeller-GUI-touchscreen-and-full-color-display) average joe and myself have got the pasm loop a lot faster. This bit of code reads two bytes:

ramtohub_loop           mov     data_16,ina             ' get the data
                        wrword  data_16,hubaddr         ' move data to hub
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        add     hubaddr,#2              ' increment the hub address 
                        djnz    len,#ramtohub_loop

So that is 6 PASM instructions for 2 bytes, or 3 PASM instructions per byte.

I believe there could be even faster solutions using caching of blocks of memory from external ram into hub ram.

There is still a lot more to do with emulating older micros. The "20 pasm per byte" external ram ended up emulating a Z80 at about 2 MHz. The "3 pasm per byte" ought to be a lot faster. A 6502 will be a similar ballpark improvement in speed.

There have been some cunning solutions by heater and pullmoll and others where you put the most common instructions in a cog, along with the local registers, but you call external LMM code to emulate less frequently used instructions.

The other thing I don't believe has been done is to write a propeller compiler for these old micros. Rather than emulate a "mov" from an older micro, compile the source code into propeller code. CP/M for instance is written in simpler 8080 code and would run very fast if recompiled to pasm. Many Z80 and 6502 instructions could compile down to only a few pasm instructions.

There are some traps too though. A cog works in a flat 2k memory space but these older micros work natively in a flat 64k memory space. So you can run into problems with how to do jumps and calls. Caching probably is the answer, and then you have to devote some of the cog space to the emulation and some to the caching driver.

Kye · 2012-04-07 19:38

@Dr_Acula - Your loop could go faster if you used a counter module to generate the clock signal.

So... set up the counter module in NCO mode and have it put the clock source out on whatever pin you need to clock.

     
ramtohub_loop   waitpne maskP19, maskP19   ' Wait for clock to go low for data to be valid.
                mov     data_16,ina        ' get the data
                wrword  data_16,hubaddr    ' move data to hub
                add     hubaddr,#2         ' increment the hub address 
                djnz    len,#ramtohub_loop

If you unrolled the loop so that you could get hub accesses between every two instructions (so the "read ina" timing is fixed and constant) you could then have an unrolled loop that could read a huge chunk of data at once very quickly.

ramtohub_loop
                mov     data_16,ina             ' get the data
                wrword  data_16,hubaddr         ' move data to hub
                add     hubaddr,#2              ' increment the hub address
                mov     data_16,ina             ' get the data
                wrword  data_16,hubaddr         ' move data to hub
                add     hubaddr,#2              ' increment the hub address
                mov     data_16,ina             ' get the data
                wrword  data_16,hubaddr         ' move data to hub
                add     hubaddr,#2              ' increment the hub address

                ...etc.

You would just have to play with the phase and frequency of the clock source. Use the PHSA to change the phase and FRQA to change the frequency.

... Potential for around 10 MB a second transfer burst speeds. (It's about 16 clocks for the 3 instructions... 16 bits per 3 instructions... at 80 MHz...). Do 128 three-instruction sequences per pass of the loop and you should have a pretty nice and fast setup.
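
The 10 MB/s estimate can be checked with a few lines of Python. The function name is made up; the inputs are the figures quoted above (80 MHz clock, about 16 clocks per three-instruction group, 16 bits moved per group, 128 groups per pass).

```python
def burst_rate_bytes_per_s(clock_hz, clocks_per_transfer, bytes_per_transfer):
    """Bytes per second when one transfer completes every `clocks_per_transfer` clocks."""
    return clock_hz / clocks_per_transfer * bytes_per_transfer


rate = burst_rate_bytes_per_s(80_000_000, 16, 2)   # 16 bits = 2 bytes per group
bytes_per_pass = 128 * 2                           # 128 groups in one unrolled pass

print(rate, bytes_per_pass)
```

That gives 10,000,000 bytes per second during the burst and 256 bytes per pass of the unrolled loop; loop overhead between passes would reduce the sustained rate somewhat.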

Heater. · 2012-04-07 21:26

Strangely enough there have also been people working on 6502 emulation on the Prop. Did any of those come to fruition?

As for searching the forums. If you have just independently come up with an idea like LMM how on earth would you know what to search for?

JLocke · 2012-04-07 22:55

Says they will first be available at the Parallax Expo...

Propeddle: Software-Defined 6502 Computer

Heater. · 2012-04-07 23:06

Nah, that's not an emulation on the Prop but a real 6502 bolted to a Prop.

potatohead · 2012-04-07 23:32

Yep, and that has a strong appeal, but isn't quite the same. (I still want one, but I think I'll settle for bolting a Prop onto a 6502 computer instead.)

Yeah, what to search? Great question. Frankly, I think it's impressive to realize that mode so early, just musing on what one would do with the chip. Get this man a Prop! (It appears that is being settled too.)

average joe · 2012-04-08 00:51

Not sure how much we can actually push the ram with a counter. I thought of using inverted CS signals for doing ram -> display, then you only have one toggle. 10 MHz clock to display... MAX.
LMM is amazing and I can't wait to play with it. Having external ram is really nice but now I want MULTIPLE RAMS. Display RAM. System RAM. Audio RAM... *HINT HINT* Using fast counters would be great to implement into a system like LMM. You'll probably even get better results for system memory than with the displays since you will be able to clock them fast... REALLY FAST if you get fast 5V parts *HINT HINT* Resistors do cost board space, but this may be the key to getting LMM to really kick it in the pants! Let me know if you're interested in such things. We could even use multiple cogs to do reads and writes using locks? VM could still be worked on and memory could still use improvement.
*EDIT*
FYI! While Doc has a great start on this, consider this before jumping on board with the hardware. He's running his '161's and RAM @ 3.3V. I can't say for sure what timings will be since data sheets list voltage @ 5V or 2V. But I can hypothesize 5v will be faster. No guess whatsoever at HOW MUCH faster. But I WILL be finding out!

kuroneko · 2012-04-08 00:53

Kye wrote: »

If you unrolled the loop ...

Not necessary. A loop will do just fine (see post 1017146).

Dr_Acula · 2012-04-08 03:38

@kye - brilliant. I'll check it out. Lots of spare space in that cog so unrolling etc is not a problem.

One of the musings I have regarding pasm virtual machines is the tradeoff with pins and speed. If you use all the propeller pins then it runs the fastest. But then there are no pins left for anything else and there are many things that the propeller can do very well with just a few pins. VGA/TV display. Keyboard. SD card. Some serial ports. Even turning a relay on and off.

But I think that a lot can be done with caching. Take averagejoe's comment about more ram. Why can't the same ram be used for the system, for an emulation, for (some) external displays like a touchscreen, and for audio?

If you see the external ram as holding the code, then up to almost all of the hub can be used as a cache. And if the cogs can handle waiting a bit here and there, then it ought to be possible to run things like playing audio, and running a program, and handling input and output, all at the same time.

I've got this vague idea that you can emulate not just one pasm virtual machine, but several, all at the same time. Super Cogs - each with huge amounts of memory, plus a cache handler that keeps all the commonly used blocks of memory in hub ram.

With caching, not so many propeller pins need to be devoted to ram I/O.

I'm still not sure exactly how such a thing would work, but an idea would be to have a megabyte of sram attached to a propeller, and run a 6502 emulation in one cog and an 8080 emulation in another, and each has a 64k working space, and (say) 8k of that is sitting in hub at any one time and then there is a cache handler that works out whether a block needs to be read in or out.

A sort of LMM/XMM hybrid.
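
One possible shape for such a cache handler, sketched in Python: a direct-mapped, write-back block cache in front of a bytearray that stands in for the external SRAM. Only the 64k working space and the 8k resident in hub come from the description above. The 512-byte block size, the direct-mapped policy and all names are assumptions made for the sketch.

```python
class BlockCache:
    """Direct-mapped write-back cache: 16 lines x 512 bytes = 8 KB resident."""

    def __init__(self, backing, block_size=512, num_lines=16):
        self.backing = backing                    # stands in for external SRAM
        self.block_size = block_size
        self.num_lines = num_lines
        self.lines = [bytearray(block_size) for _ in range(num_lines)]
        self.tags = [None] * num_lines            # which block each line holds
        self.dirty = [False] * num_lines
        self.misses = 0

    def _write_back(self, line):
        # Copy a modified line out to the backing store before reuse.
        if self.tags[line] is not None and self.dirty[line]:
            start = self.tags[line] * self.block_size
            self.backing[start:start + self.block_size] = self.lines[line]
        self.dirty[line] = False

    def _locate(self, addr):
        # Decide whether a block must be read in (and another written out).
        block = addr // self.block_size
        line = block % self.num_lines
        if self.tags[line] != block:
            self.misses += 1
            self._write_back(line)
            start = block * self.block_size
            self.lines[line][:] = self.backing[start:start + self.block_size]
            self.tags[line] = block
        return line, addr % self.block_size

    def read(self, addr):
        line, offset = self._locate(addr)
        return self.lines[line][offset]

    def write(self, addr, value):
        line, offset = self._locate(addr)
        self.lines[line][offset] = value & 0xFF
        self.dirty[line] = True

    def flush(self):
        for line in range(self.num_lines):
            self._write_back(line)


sram = bytearray(64 * 1024)            # one emulated CPU's 64k working space
cache = BlockCache(sram)
cache.write(0x1234, 0xAB)              # stays in the cache for now
cache.write(0x1234 + 8192, 0xCD)       # maps to the same line, evicts the first block
cache.flush()
print(hex(sram[0x1234]), hex(sram[0x1234 + 8192]), cache.misses)
```

Each emulation cog would call read and write instead of touching the RAM pins itself, so only the cache handler needs to know how the external memory is wired.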

average joe · 2012-04-08 04:49

WOAH! 6502 AND 8080? There are possibilities there for sure. Caching will improve performance and reduce the pins / speed trade-off a bit. Especially if we can prevent thrashing. Static cache size is easier to implement but having dynamic cache sizing could be of benefit too?
Drac's Euro-Touch++, or whatever he decides to call it, is a great starting point. Small modifications may be necessary to fit design requirements? Most projects would work quite well. But since the `161 - RAM combo is new *to the best of my knowledge* LMM will not take advantage of it. Neither will XMM. *don't quote me on this, since I'm still wrapping my head around LMM*
My comment about more RAM is quite out of the norm for most people. 512k x 16B is quite a healthy bit of RAM, especially when you add the SD card in there. Throw a backup battery on the RAM and cut boot times too. I have this nasty little habit of over-engineering things. Looking 2 or 3 years in the future, I don't want to be forced to start over AGAIN because I've hit the limit of the HW.
