Announcing P2BEE: Propeller 2 Bytecode Execution Engine
Bill Henning
Propeller 2 BEE is the ultimate emulation engine for the Propeller 2!
I know, that is a very strong claim. I'll prove it.
Benefits:
- Maximum possible execution rate for byte-encoded instructions
6.5 clocks per byte average execution rate for single-cycle P2 instructions
- Makes writing processor emulators MUCH easier with much less code
- Makes writing any virtual machine much easier and run much faster
- Propeller 2 BEE is now the premier emulation platform
- For P2BEE engines requiring fewer than 256 instructions, the rest of the STACK is available
Designed for:
- Specifically for Propeller 2
- Fastest possible Spin VM
- Fastest possible Java VM
- Fastest and smallest "compressed mode" for C and other compilers
- Fastest possible 8-bit processor emulation
- Can be used for 16/32-bit emulators (in some cases)
- Provides DRASTIC speed-up for Spin, Forth, ZOG, Z80, 6809, 6502 and
every other emulator and virtual machine
- Retro Gaming
- Retro Console Emulation
- Retro Computer Emulation
License: Creative Commons Attribution-ShareAlike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/legalcode
History:
I've been having a lot of fun with the Propeller 2 on my DE0-Nano, and I came up with a really cute trick that led to developing P2BEE.
Today I verified that my P2BEE concept works, and that the engine works.
I could not wait to publish it - and I can't wait to see all the different emulators and virtual machines that will be based on it!
**********************************************************************
FAQ:
**********************************************************************
Why is P2BEE so fast?
Propeller 2 BEE pulls out all the stops, and uses all of the tricks I could
think of to execute byte codes as fast as possible. It uses a specific order
of cached byte read and stack access instructions optimized for the pipeline
details of the Propeller 2.
How did you think of it?
I have been developing software and hardware for the Parallax Propeller since it
became available, and I have always had a great interest in processor emulation for
retro computers and gaming.
Way back, I came up with the LMM virtual machine for the Propeller, allowing
it to execute larger programs than it could in the native "cog" memory.
Once I saw the specifications and instructions for the STACK memory, I started
thinking of non-obvious uses for it... and once Chip increased it to 256 longs
from the original 128-entry "CLUT" version, it became even more interesting.
256 longs... a very useful number. There are 256 possible values for a byte.
Thus P2BEE was born.
Single-cycle Propeller instructions stored in the STACK (CLUT) memory can be
executed by the inner P2BEE engine in 6.5 cycles (on average); other
instructions will obviously take longer.
By storing various JMP instructions in the table, a byte code can run a whole
sequence of instructions, which makes writing VMs and emulators immensely easier.
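The dispatch idea can be modelled in a few lines of Python. This is a minimal sketch, not the P2BEE source: a Python list of 256 entries stands in for the STACK (CLUT) RAM, one entry per possible byte value. A plain function entry stands in for a single instruction executed directly, and an entry tagged "jmp" stands in for a JMP to a longer handler sequence. The opcode values and handler names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

State = Dict[str, int]
Op = Callable[[State], None]
# A table entry is either one operation or a ("jmp", [ops...]) handler sequence.
Entry = Union[Op, Tuple[str, List[Op]]]


def nop(s: State) -> None:
    pass


def inc_a(s: State) -> None:
    s["a"] = (s["a"] + 1) & 0xFF


def dec_a(s: State) -> None:
    s["a"] = (s["a"] - 1) & 0xFF


def swap_ab(s: State) -> None:
    s["a"], s["b"] = s["b"], s["a"]


# One entry per possible byte value, like the 256-long STACK RAM.
table: List[Entry] = [nop] * 256
table[0x01] = inc_a                                # hypothetical single-op byte code
table[0x02] = dec_a                                # hypothetical single-op byte code
table[0x10] = ("jmp", [swap_ab, inc_a, swap_ab])   # hypothetical "b += 1" sequence


def run(code: bytes, s: State) -> State:
    pc = 0
    while pc < len(code):
        entry = table[code[pc]]      # fetch a byte, translate it via the table
        pc += 1
        if isinstance(entry, tuple):
            for op in entry[1]:      # JMP-style entry: run the whole sequence
                op(s)
        else:
            entry(s)                 # single-op entry: execute it directly
    return s
```

For example, `run(bytes([0x01, 0x01, 0x10, 0x02]), {"a": 0, "b": 5})` returns `{"a": 1, "b": 6}`: two increments, a three-step sequence behind one byte code, then a decrement.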
Comments
- Added untested v0.11, which uses a pipelined approach to hopefully reduce execution time to ~4.5 cycles for single-cycle instructions
I expected the CLUT would be extremely useful for other things besides color lookup and stacks.
Get a cached byte from hub memory, translate it to a long via the stack RAM, and execute it. How simple, but effective!
Good job!
Everyone is having so much fun, and it's really interesting to see what has been done already. Chip knows there is an upside to this and a downside: the P2 had better come out soon, and it had better work too!
Thanks Bill, I will sit down sometime next week and digest this tasty dish, sounds interesting, especially for Tachyon P2.
Have you thought about overlapped read/execute:
If I understand the idea correctly:
a) An emulator/VM for a byte-wide instruction machine often uses a lookup table to dispatch op codes to instruction sequences. Think Z80, Zog, Java etc.
b) That dispatch table on a Prop is likely to be in HUB RAM using WDLONGs
c) You are proposing to put that dispatch table in the COG's CLUT/stack RAM for fast lookup.
For things like a Z80, the dispatch table in STACK RAM would have to contain jumps to code sequences, or even multiple code sequence addresses squeezed into one long. After all, a Z80 op can take a lot of Prop instructions to complete.
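One way "multiple code sequence addresses squeezed into one long" could work is sketched below. The layout is an assumption for illustration, not taken from ZiCog, Zog or P2BEE: cog addresses are taken to be 9 bits (a 512-long cog RAM), up to three of them are packed low field first into a 32-bit long, and a zero field marks "no further step".

```python
FIELD_BITS = 9                      # assumed: 512-long cog RAM -> 9-bit addresses
FIELD_MASK = (1 << FIELD_BITS) - 1
MAX_FIELDS = 32 // FIELD_BITS       # 3 fields fit in one 32-bit long


def pack_handlers(addrs):
    """Pack up to three cog addresses into one 32-bit table entry."""
    if len(addrs) > MAX_FIELDS:
        raise ValueError("at most %d addresses fit in one long" % MAX_FIELDS)
    entry = 0
    for i, addr in enumerate(addrs):
        # 0 is reserved as the end marker in this sketch
        if not 1 <= addr <= FIELD_MASK:
            raise ValueError("address %r out of range 1..%d" % (addr, FIELD_MASK))
        entry |= addr << (i * FIELD_BITS)
    return entry


def handler_steps(entry):
    """Unpack the addresses in order, stopping at the first zero field."""
    steps = []
    for i in range(MAX_FIELDS):
        addr = (entry >> (i * FIELD_BITS)) & FIELD_MASK
        if addr == 0:
            break
        steps.append(addr)
    return steps
```

Three 9-bit fields use 27 of the 32 bits, leaving 5 spare bits that a real design might use for flags; reserving 0 as the end marker means address 0 cannot be a handler in this sketch.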
Very cunning but hardly an idea that has not occurred to an emulator writer and Prop enthusiast, like myself. Although we may never have come up with the fastest way to do it:)
I had to stop reading your source when I got to the license, which is incompatible with things like GPL or MIT.
I kept looking for other uses for the stack ram, and when I squinted just the right way, this idea popped out.
Translation tables have been used on a lot of architectures (including the Prop1) in the past, but the unique CLUT/stack RAM access instructions allowed me to decouple translation from the hub and, more importantly, from the cache, so the RDBYTEC runs mostly cached, making this the fastest method possible on a Prop2.
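The benefit of keeping the translation out of the hub can be illustrated with a toy cache model. Assumptions, for illustration only: the RDBYTEC cache holds a single 16-byte line, byte codes are fetched sequentially, and the addresses `code_base` and `table_base` are hypothetical. With the dispatch table in hub, every table read lands on a different line than the byte-code stream and evicts it; with the table in stack RAM, only the byte fetches touch the cache.

```python
LINE_BYTES = 16   # assumed size of the single cached hub line


class OneLineCache:
    """Counts hits and misses for a cache that holds one hub line."""

    def __init__(self):
        self.tag = None
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        tag = addr // LINE_BYTES
        if tag == self.tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tag = tag        # the new line replaces the old one


def fetch_stats(n_bytes, code_base=0x1000, table_base=0x4000, table_in_hub=False):
    """Simulate n_bytes sequential byte-code fetches; return (hits, misses)."""
    cache = OneLineCache()
    for pc in range(code_base, code_base + n_bytes):
        cache.read(pc)                            # cached byte fetch of the op code
        if table_in_hub:
            opcode = pc & 0xFF                    # stand-in op code value
            cache.read(table_base + 4 * opcode)   # cached long read of the dispatch entry
    return cache.hits, cache.misses
```

In this model, 64 sequential byte codes give 60 hits and 4 misses when translation stays out of the hub, but 0 hits and 128 misses when each dispatch also reads a hub table through the same single-line cache.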
It should make Tachyon run MUCH faster. I am looking forward to see where you take it.
Yesterday I experimented with the pipeline delays for instructions fetched from the stack, and I needed two delay slots between popping the instruction and executing it.
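The effect of two delay slots can be modelled in software. This is a sketch of the hazard only, not of the P2 hardware pipeline: `DELAY = 2` entries are fetched ahead of the one being executed, the byte codes (`INC`, `HALT`, and `0x80|target` as a one-byte jump) are hypothetical, and `flush_on_jump` chooses whether a jump discards the entries already fetched past it.

```python
from collections import deque

HALT, INC = 0x00, 0x01   # hypothetical byte codes; 0x80|t means "jump to t"
DELAY = 2                # assumed number of entries fetched ahead of execution


def run_sequential(code, limit=1000):
    """Reference interpreter: fetch, translate and execute one byte at a time."""
    acc, pc, steps = 0, 0, 0
    while pc < len(code) and steps < limit:
        op = code[pc]
        pc += 1
        steps += 1
        if op == HALT:
            break
        if op == INC:
            acc += 1
        elif op & 0x80:
            pc = op & 0x7F
    return acc


def run_pipelined(code, flush_on_jump=True, limit=1000):
    """Keeps DELAY entries fetched ahead of the one being executed."""
    acc, fpc, steps = 0, 0, 0
    pipe = deque()
    while steps < limit:
        # Refill so DELAY entries are in flight behind the one about to execute.
        while len(pipe) < DELAY + 1 and fpc < len(code):
            pipe.append(code[fpc])
            fpc += 1
        if not pipe:
            break
        op = pipe.popleft()
        steps += 1
        if op == HALT:
            break
        if op == INC:
            acc += 1
        elif op & 0x80:
            fpc = op & 0x7F          # redirect future fetches
            if flush_on_jump:
                pipe.clear()         # discard entries fetched past the jump
    return acc
```

With `code = bytes([INC, 0x80 | 4, INC, INC, INC, HALT])`, the sequential and flushing versions both return 2. Without the flush, the INCs at offsets 2 and 3 execute as delay slots and the result is 4, which is the kind of pipeline effect a VM written for a pipelined engine has to account for.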
Using an "execute next time around" approach, like RDQUAD-based LMM, should work and be faster, but would require the VMs to keep pipeline effects in mind. Here is that test version:
I'll try this one in a couple of hours - after breakfast
In academia, attribution is required, so I don't see the issue.
Of course lookup tables have been done in the past - far before the Propeller - CLUTs, TLBs, caches, lookup tables, microcode engines etc.
But I did come up with this first for P2, in a manner that avoids spoiling the RDxxxxC cache.
Today I am posting a potentially faster variant, that reduces it to four instructions, at the expense of making it more complicated with an additional pipeline.
-Tor
I'm all for attribution, credit should be given where it is deserved.
However:
In the linked Spin file it says:
"You may not distribute derived works under a different license without written permission from William Henning."
Clearly, if I update ZiCog or Zog, for example, to the Prop II and use your code, I have created a derived work from it.
That means I have to put my existing code out under your same license, according to your terms above, or I have to get permission from you to release it under the MIT license, which it seems you don't want to give.
I have not read that specific licence, but it does not seem to be compatible with any OSI licence according to this page http://opensource.org/licenses/alphabetical, where there is no mention of it.
It seems like an unusual choice for a software project.
It flies in the face of the majority of open Propeller code that is under the MIT license.
1) If I use GPLed code I have to apply the GPL to my derivative work.
2) If I sell a binary of that now GPLed work I may be asked for the source and have to give it.
3) The user is then free to pass on that source under the same terms.
4) The sale value of my product is now zero.
LGPL may be better for this kind of thing, but has weird rules about what is linked in and not linked in and companies hate to mess with all that.
None of these work because Bill wants attribution.
Isn't that what it's there for?
C.W.
There is the idea of the Prop II CLUT/Stack RAM as an opcode lookup table. That is only protectable by patent if it has not been done before, which I'm sure it has.
Then there is the actual implementation as in Bill's source code, protectable under copyright like any published work.
It is unclear how this goes if you have, or hear of, the idea and then write your own implementation, which will quite likely end up looking very similar.
Could GCC ever make use of Bill's code given this license?
Sounds like you've validated one of the intended uses of the CLUT. Congratulations! I'm glad you didn't find any hardware problems in the process! I think we're all going to have fun finding clever ways to use the new P2 features. Now if I could just get the P2 version of GAS done I might have some time to play with the new features myself! :-)
GPL is a very different beast, it is definitely difficult to sort out if your use of it is derived work or not, and if all that even applies (e.g. if the GPL module is just another variant of many with the same API then your use of it is _not_ a derived work - think Libc or the Unix API).
Not that I'm in any way arguing about what license Bill should use - not at all - I'm only commenting on the interpretation of GPL variants, as that came up in the thread. And wanting attribution, for example, is totally understandable (that said, copyright has to be retained in GPL/LGPL work too).
-Tor
Well my intent is for emulator use, so it is of concern.
http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1148239&highlight=cosmacog#post1148239
All of my lookups will be jumps, so translating an opcode to a jump in every case.
I haven't looked at and don't plan to look at Bill's code for this.
For the 1802 it will still be running at the typical 3.579/2 MHz for 1861 support, so I'm not looking for the speed of some tight loop doing fast lookup/execute anyway.
I just hope we don't enter an arms race of everyone looking for little snippets to claim, otherwise I have a DE0-Nano for sale...
C.W.
Tor:
thanks - if there was an "attribution required" version of LGPL, I might have chosen it.
Heater: (#16,17)
You are right, but that would also mean that people would have to attribute derivations of ZOG to you.
I'd love to see you do a ZOG using this technique.
ctwardwell:
No. I just require attribution for using the CLUT/stack as an instruction store for a quite optimal byte code execution engine.
Heater: (#19)
Yes, GCC is welcome to use it, at no charge, as long as they attribute as I ask and use this license for a hypothetical byte code compressed engine based on P2BEE.
David: (#20)
The CLUT/STACK was designed as a color lookup table, and stack functionality was added later. It was not intended for storing executable code for byte code expansion.
So instead of "validating an intended use" I came up with a clever new way to use it in an unintended way that will have great benefits for all byte code execution.
It is rather similar to LMM actually, but fetching code from a memory smaller than the cog registers, instead of larger like the hub.
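The contrast with LMM can be sketched as two fetch loops. Assumptions: the 32-bit values used are arbitrary stand-ins for native instructions, not real P2 encodings, and the 256-entry table is built from the program's distinct instructions, as a hypothetical compressed-mode compiler might do. LMM keeps one full long per instruction in hub; the P2BEE-style image keeps one byte per instruction plus a table of at most 256 longs.

```python
import struct


def lmm_image(program):
    """LMM style: every native 32-bit instruction is stored in hub as a full long."""
    return struct.pack("<%dI" % len(program), *program)


def bee_image(program):
    """P2BEE style: one byte per instruction plus a table of at most 256 longs."""
    table, index = [], {}
    out = bytearray()
    for insn in program:
        if insn not in index:
            if len(table) == 256:
                raise ValueError("more than 256 distinct instructions")
            index[insn] = len(table)
            table.append(insn)
        out.append(index[insn])
    return bytes(out), table


def lmm_fetch(image):
    """Fetch loop: read the next long from hub, PC += 4."""
    for pc in range(0, len(image), 4):
        yield struct.unpack_from("<I", image, pc)[0]


def bee_fetch(image, table):
    """Fetch loop: read the next byte, translate it via the 256-entry table."""
    for b in image:
        yield table[b]
```

For a 16-instruction program using 3 distinct longs, the LMM image is 64 bytes while the byte-code image is 16 bytes plus a 3-entry table, and both fetch loops yield the same instruction stream.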
This could allow for an extremely compact drastically faster compressed mode for GCC and Spin ... with the only cost to Parallax being an attribution requirement.
I am adding P2BEE v0.12 to the first post
This version has an alternate pipelined mode that can execute a single cycle Propeller 2 instruction in 4.53 cycles on average!
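For a sense of scale, the two averages convert to throughput as below. The 160 MHz clock is an assumption chosen only to make the arithmetic concrete; the figures apply to single-cycle instructions only and scale linearly with whatever clock is actually used.

```python
CLOCK_HZ = 160_000_000   # assumed clock rate for illustration; not a measured figure


def byte_codes_per_second(clock_hz, cycles_per_byte_code):
    return clock_hz / cycles_per_byte_code


basic = byte_codes_per_second(CLOCK_HZ, 6.5)        # original inner loop average
pipelined = byte_codes_per_second(CLOCK_HZ, 4.53)   # v0.12 pipelined mode average
speedup = 6.5 / 4.53

print("basic:     %.1f M byte codes/s" % (basic / 1e6))
print("pipelined: %.1f M byte codes/s" % (pipelined / 1e6))
print("speedup:   %.2fx" % speedup)
```

At the assumed clock this works out to roughly 24.6 M versus 35.3 M single-cycle byte codes per second, a speed-up of about 1.43x regardless of the clock chosen.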
I've a DE2 for sale, if we are going to have a land grab on P2.
Sorry Bill.
Think of where LMM would be today with this license. "Hey, that nop is there to comply with non MIT licensing, if you want it to run at peak speed, talk to Bill..."
I am not addressing this question to anyone; it is a rhetorical question.
But why do so many people have problems giving credit for others' work?
P.S. I have seen many posts on this forum where some people claimed others' work as their own.
Bill's license doesn't say anyone needs to pay for it, only that they give credit for his work.
My 5 cents
Another one was Eric Ball's software video technique. A similar kind of license was placed on it. Fantastic code that has color capability not seen in just about any other driver out there. I still think that one is the best: it runs at 14 MHz, offers a great color set, and includes some very clever software code to render the color sub-carrier, allowing for a few things we've not seen exploited on the P1. That code was never widely used either. That's not a negative to Eric, who deserves credit for that driver, and where it was used, credit was given gladly. (me) It's excellent, and it demonstrates his very deep understanding of video. Over the years, lots of video drivers got done, and I'm quite sure that one just got forgotten amidst so many great ways to exploit P1 video hardware.
Frankly, a license like this will more or less ensure that the technique does not see wide use. Of course, we can encrypt now too, so who knows? Very interesting and new questions.
Jim Bagley authored Prop GFX, released in binary form and well documented. We really didn't adopt a binary blob well, despite his seriously good efforts to document how to use it. Again, not a negative statement against him. He's got serious graphics / game experience running over 30 years and probably has some stuff in there he would rather keep control over, and I thought it generous to work so hard to publish it in a way we could enjoy.
I put code here because I take code from here, and it's been a working arrangement that has served us all very well. If we start down this road with P2, it's going to get really interesting! The idea that code gets put here so that we all can benefit will go away, leaving lots of little islands of code and much less innovation in this little ecosystem.
Edit: You know, I saw the other thread about requiring code posted here be MIT something. Maybe that is a great requirement! Anyone wanting to use another license can write their package up here, advertising essentially, leaving people free to consider it and enter into the appropriate licensing agreements. IMHO, it's an unfair exploitation of this forum to do otherwise, as it limits discussion and puts people into odd circumstances, despite there being no nefarious intent. Click on the wrong thread and? I would rather not see that happen.
That's my $.02
Bill, please don't quote that code out in the open here, or warn us so we can avoid the thread. I want to not see it. And I want to not see it for the difficulty doing so will bring, not that I don't want to recognize how bad *** smart you are about this stuff. Sorry man, and thanks for helping out.
I have no problem with giving credit to the creators of things I have used. Most of my code contains such "thank you notices" and references to original sources. I'm sure most here feel the same. After all "we stand on the shoulders of giants" as they say.
No, that is not the point. Rather, we are pointing out that with that string attached, it is a very nice Christmas present that many projects, current and future, cannot use. Most importantly for Parallax and the Prop II, that includes the GCC effort.