Basis for a Bytecode Interpreter?

lonesock · 2010-02-24 23:43

Hi, All.

Well, this has probably been done before, but here goes [noparse][[/noparse]8^)

I noticed that while PASM instruction codes are 9 bit, they are always in the form xxxxxx00x. With only 7 input bits, you can recover the 9 bit instruction code with the following PASM statements:

              ror       inst, #30 wc            ' like a shl by 2, but puts the low bit in C
              andn      inst, #%110             ' zero out these 2 bits
              muxc      inst, #1                ' recover the low bit

Obviously you still have to do a movi and then some other instruction before the actual instruction is executed. However, this opens up the possibility of a bytecode interpreter where the low bit of each byte signals if it's a "semi-native" PASM instruction in the form:

semi_native_op  registerD, registerS

or another instruction dealing with the overhead of getting/setting the values of the 2 registers (from Hub RAM, Cog RAM, or constants, or maybe from a stack?), accessed via a jump table...something like:

              ' Fetch this byte, then do something!
              rdbyte    inst, code_idx          ' read the next byte
              add       code_idx, #1            ' increment the code index
              shr       inst, #1 wc             ' check if it's a direct PASM or an Interpreter inst
        if_c  jmp       #PASM_instruction       ' go to PASM if necessary              
              add       inst, #Interp_jump_table ' get the address out of a jump table              
              jmp       inst                    ' jump the the correct PMOD operator

Does this seem worth developing?

thanks,
Jonathan

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.

Phil Pilgrim (PhiPi) · 2010-02-24 23:56

lonesock said...
I noticed that while PASM instruction codes are 9 bit, they are always in the form xxxxxx00x.

Well, not really. The supposed zero bits are for wz and wc. How were you planning to accommodate them?

-Phil

Bill Henning · 2010-02-24 23:59

Honestly?

I think it is too slow. I have played with many different schemes, and your approach is 12 cycles slower than the fastest byte code interpreter I could code:

rdbyte vect,pc
add pc,#1
jmp vect

This ofcourse assumes a jump table starting at 0

In the end, I found that mixed byte code / LMM took too many cog resources (memory) to be efficient at either. Your mileage may vary.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com 5.0" VGA LCD in stock!
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory/IO kit $89.95, both kits $189.95 SerPlug $9.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler Largos - upcoming nano operating system

Mike Green · 2010-02-25 00:09

There is a lot of stuff missing. For example, what about the conditional execution bits? What about the enable result bits? What do you hope to accomplish by doing this? You're introducing quite a bit of overhead for something that doesn't look a lot different from direct execution or LMM-style execution. What are you getting for the extra cost?

lonesock · 2010-02-25 00:52

This would be targeted more at the ZPU type niche that ZOG is in the process of filling (or even Spin, for that matter). [noparse][[/noparse]8^) It definitely isn't meant to be LMM, so it would not account for the Z or C flags. I was thinking more along the lines that I could have a very small number of helper functions (eight?) and the attendant jump table. Any of the ops potentially directly executed by the prop would use the above mechanism, taking much less code than a jump for each instruction (add, sub, abs, etc.). I was planning on using the extra space for an in-Cog stack, or a series of quick-access registers that an optimizing compiler could use to reduce the overhead of addressing and transferring to/from Hub RAM all the time.

If this seemed at all feasible, then the next, and harder, step would be re-targeting a compiler to produce bytecode for this.

thanks for all the feedback,
Jonathan

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.

Cluso99 · 2010-02-25 01:21

lonesock: You have missed Phil's point...
Those 2 bits are used to determine if the instruction will set the z and c flags when it executes. So they are used!

e.g. cmp x,y wc, wz 'sets c & z flags (this instruction requires those 2 bits you are referring to on)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBlade,·RamBlade,·SixBlade, website
· Single Board Computer:·3 Propeller ICs·and a·TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc;· Terminals·VT100 etc; (Index) ZiCog (Z80) , MoCog (6809)·
· Prop OS: SphinxOS·, PropDos , PropCmd··· Search the Propeller forums·(uses advanced Google search)
My cruising website is: ·www.bluemagic.biz·· MultiBlade Props: www.cluso.bluemagic.biz

lonesock · 2010-02-25 03:31

Sorry, everybody, I don't seem to be communicating well:

* I do not want to make any kind of LMM kernel
* I do not want to be able to execute "cmp x,y wc, wz", or any other perfectly valid PASM instruction
* I want to make a code-dense virtual CPU (like the ZPU) that executes bytecode
* I want the bytecode to be generated from a standard compiler, e.g. LCC
* I would like there to be a simple mechanism for executing instructions that the prop can handle natively (like ADD, SUB, etc)
-- without writing a specific function and attendent jump-table entry
-- the mechanism would handle a subset of byte-codes as "do the PASM equivalent passthrough"
-- the mechanism would handle all other valid bytecodes using a jump table
* I would like to use the remainder of the Cog RAM as a scratch pad for keeping either a fast stack, fast registers, or both

Jonathan

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.

RossH · 2010-02-25 04:09

Lonesock,

If I understand correctly, you want to use the lower order bit to determine if the byte is a byte code (one of 127 such), or instead should be re-interpreted as the first byte of a PASM instruction. If you decide the byte is a PASM instruction, you will reconstruct the original LONG instruction from that byte plus the 3 succeeding bytes and execute it directly, otherwise you will use it as a bytecode and execute it via a jumptable.

Is that right?

Here are the problems I see:

- As others have said, the bits you are assuming are empty are not - they will be set in any PASM instructions that want to write the carry or zero bit. But this is ok if you never intend to execute such instructions.

- bytecodes will be long aligned only in 1 in 4 cases - and if they are not, but the byte code represents a PASM instruction then you need to either pad them till they are (so that you can re-read the whole thing as a long) or be prepared to reconstruct the PASM instruction by reading the next three consecutive bytes using byte reads. Either way is quite inefficient.

You may be better of having a bytecode that says something like "interpret the following long as a PASM instruction", and use the compiler to ensure that this special byte code always occurs just prior to 4 long-aligned bytes (so they can be fetched as a single LONG). Then you could execute any instruction, and wouldn't need to do any messing about shifting bits back into the correct position.

But I'm still not sure it's worth it - the overhead of executing each such PASM instruction is some bit test logic plus an additional hub fetch - and you still have the problem of wasted space since you have to worry about alignment. But it might be worth it if the gain you get in both speed and space for the byte codes outweighs what you lose because of the embedded PASM.

Ross.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina

Mike Green · 2010-02-25 05:34

If I understand you, you want to use the op-codes from the native Propeller instructions for your bytecodes for many of the basic arithmetic and logical instructions. You can do that, but it may not save you much because of all the other overhead.

lonesock · 2010-02-25 06:15

Thanks, everybody.

I was never planning on reconstructing a full PASM instruction. I was planning on keeping a stack, with the head of stack always in a local register, call it registerS. I was planning on having the working register in another local register, call it registerD. There would be a single native PASM instruction, for example "mov registerD, registerS". Now, the mov would be swapped out for certain opcodes the prop can handle natively (ADD, SUB, etc.), but all the rest remains the same.

Well, the consensus seems to be "don't waste my time", so I'll go with that. [noparse][[/noparse]8^)

thanks,
Jonathan

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.

Cluso99 · 2010-02-25 07:17

lonesock: I am not sure what you are thinking so guess I really cannot comment. We were confused with the original statement of the unused bits, so I guess we lost what you were trying to achieve.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBlade,·RamBlade,·SixBlade, website
· Single Board Computer:·3 Propeller ICs·and a·TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc;· Terminals·VT100 etc; (Index) ZiCog (Z80) , MoCog (6809)·
· Prop OS: SphinxOS·, PropDos , PropCmd··· Search the Propeller forums·(uses advanced Google search)
My cruising website is: ·www.bluemagic.biz·· MultiBlade Props: www.cluso.bluemagic.biz

ImageCraft · 2010-02-25 08:02

Hi Lonesock, I have a 16 bits VM designed with similar ideas. It would execute some instructions "reasonably" fast, as in faster than Spin, but slower than LMM C.

This would be ideal for Prop I, I think. You will get about 1.5x-2x more memory than LMM C (it won't have the conditional bits which the ICC uses for LMM C) and still faster than Spin.

// richard

Dave Hein · 2010-02-25 14:15

Some very large instruction word (VLIW) processors use compression in their instructions.· A VLIW of 128 bits (16 bytes) can typically be compressed down to a few bytes.· These processors use hardware to expand the compressed instruction back to 128 bits before executing it.

The trick in the Propeller would be to develop a compression method that would only take a few instructions to expand the compressed codes.· You also may need to enforce some rules on the assembly code so that it can be compressed.· One such rule could be that variable addresses must be between $100 and $1ff.· This would reduce the size of the source and destination fields from 9 bits to·8 bits.· That way you could use a single byte for each value.

The value of immediate constants could be restricted even further.· Typically, immediate constants are small values, such as 0, 1, 2, 4, etc.· You might be able to limit constants to 4 bits.· Larger constants would need to be stored in cog memory or hub memory.

Most instructions write a result, but do not write the carry or zero flag.· Most instructions are performed unconditionally, so this would be the normal case also.· So for the normal case you could compress the opcode and flags down to a single byte.· This would give you 1 byte for the opcode, and 1 byte for each of the operands.· That's an amazing compression of 4 bytes down to 3 bytes!

·· That's not very impressive.· However, you might be able to further restrict the instruction set and variable range to get this down to 2 bytes, which is starting to get interesting.· That would be a 2:1 compression.

It really starts getting interesting if you can combine multiple instructions into a single psuedo-code.· This will get you get you a lot more compression.· The trick is to make it·quick and easy·to decode.

Dave

Post Edited (Dave Hein) : 2/25/2010 2:20:19 PM GMT

jazzed · 2010-02-25 15:21

lonesock said...
Sorry, everybody, I don't seem to be communicating well:
...

Maybe I don't get it exactly either, but here goes:

So basically, a 3 byte instruction could be used where 6 bits of the 8 bit opcode reference a PASM instruction? Byte 2 and 3 are the destination and source registers as R0...R255 (or a subset) with the balance of COG registers for stack/interpreter.

It seems that ZC should always be affected and RI can be selected. Of course, one loses IF_Z etc... conditionals, but is that really important? One could use TJZ and TJNZ for branching BRZ/BRNZ.

Another approach would be a 2 byte instruction with R0..15 (mapped in COG) selected for destination and source registers. That might be slower, but it would be more compact.

How easy is it to build a prototype? Getting this group to "bless" even the best ideas and intentions is difficult. Do what you like with your time ... you don't need anyone here to help you explore this.

heater · 2010-02-25 15:34

Assuming you come up a workable compact byte code don't forget you then have to create an assembler and or compiler for it.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Basis for a Bytecode Interpreter?

Comments