Zog - A ZPU processor core for the Prop + GNU C, C++ and FORTRAN. Now replaces S

Bill Henning · 2010-04-17 19:33

Excellent!

I am chewing on the page flushing and loading in VMCOG...

heater said...
Yep, just spotted that myself, modified accordingly, and test.bin works fine.

I'll tidy up a bit and post this working version.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system

Bill Henning · 2010-04-17 19:39

You are probably right... I'm too deep in VMCOG to study the +4 right now :)

Yep, your initial implementation of my TOS suggestion was right on the money; I basically inlined push/pop and used nos for the local var to make it easier for others to follow. You tended to copy the old TOS to "a", then pop what I call NOS into TOS and work on it, whereas I leave TOS alone and get the next number on the stack into NOS. That is easier for me to visualize and code, as it reflects old Forth terminology. If ZOG ever needs a "DROP" operator, its implementation is simply "tos=pop();", and OVER is simply "nos=pop();push(nos);push(tos);tos=nos;" etc.
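A minimal C sketch of the TOS-in-register idea, assuming a downward-growing in-memory stack that holds everything below the cached `tos`. The names `stack`, `sp`, `push`, `pop` and the `op_*` handlers are illustrative, not taken from the ZPU in C source, and there is no overflow checking. Note that the sequence `nos=pop();push(tos);tos=nos;` exchanges the top two items (Forth SWAP); OVER must also leave NOS in place, so it pushes it back first.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* In-memory part of the stack (everything below TOS); grows downwards. */
static uint32_t stack[STACK_WORDS];
static int sp = STACK_WORDS;   /* index of the most recently pushed word */
static uint32_t tos;           /* top of stack, kept in a "register" */

static void push(uint32_t v) { stack[--sp] = v; }
static uint32_t pop(void)    { return stack[sp++]; }

/* DUP  ( a -- a a ): spill TOS; the register still holds the copy */
static void op_dup(void)  { push(tos); }

/* DROP ( a b -- a ): discard TOS; NOS becomes the new TOS */
static void op_drop(void) { tos = pop(); }

/* SWAP ( a b -- b a ) */
static void op_swap(void) { uint32_t nos = pop(); push(tos); tos = nos; }

/* OVER ( a b -- a b a ): NOS must stay on the stack, so push it back first */
static void op_over(void) { uint32_t nos = pop(); push(nos); push(tos); tos = nos; }
```

Only the ops that genuinely consume or produce stack items touch memory; DROP, for example, is a single pop with no write.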

heater said...
I've just been studying a diff of our two versions. Lots of little changes, but unless I've missed something they are essentially the same, just with inline push/pop functions and tos/nos instead of local vars spread around the ops.

So I'm even more convinced the SP+=4 is wrong.

heater · 2010-04-17 21:35

Attached to the first post is ZPU in C v0.8 with all Bill's optimizations in place.

This is very encouraging, as it has only 48 accesses to ZPU memory space (PUSH/POP/READ/WRITE) in all of its opcodes, versus 96 in the original version.

Also, because there are many fewer calls to memory access routines, there will be a lot less code in the final PASM version.

So we are looking at doubling the performance of Zog on the Prop. Maybe FIBO(23) in about 2 seconds, the same as Spin!

Next up is a rewrite of the Zog PASM based on this C code.

Edit: Moved attachment to the first post.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Post Edited (heater) : 4/17/2010 10:01:40 PM GMT

Bill Henning · 2010-04-17 21:58

Nice! Can't wait to see how the new PASM compares to the old PASM.

Bill Henning · 2010-04-17 22:04

I wonder if the Spin interpreter would benefit from the TOS in register technique...

heater · 2010-04-17 22:12

I'm wondering why I can't have a custom-made Propeller chip with a Zog VM in PROM instead of Spin.

BINGO! A Propeller fully programmable in bog-standard C, out of the box.

Bill Henning · 2010-04-17 23:01

True... and there is enough room in the ROM for both interpreters.

FYI, I don't think a RAM dispatch table is needed; looking at the C code, it should fit with a 128-entry dispatch table in-cog.
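A sketch of why 128 entries can suffice, written in C for clarity. It assumes the ZPU encoding in which every IM opcode has bit 7 set, so IM is tested separately and the table only covers opcodes 0x00 to 0x7F. The handler names, the `last_handler` tracer and the NOP encoding (0x0B) are assumptions for illustration; check them against the ZPU documentation.

```c
#include <stdint.h>

typedef void (*handler_t)(uint8_t op);

static handler_t dispatch[128];   /* one entry per opcode 0x00..0x7F */
static int last_handler;          /* records which handler ran (illustration only) */

enum { H_IM = 1, H_UNIMPL, H_NOP };

static void do_im(uint8_t op)     { (void)op; last_handler = H_IM; }
static void do_unimpl(uint8_t op) { (void)op; last_handler = H_UNIMPL; }
static void do_nop(uint8_t op)    { (void)op; last_handler = H_NOP; }

static void init_dispatch(void) {
    for (int i = 0; i < 128; i++) dispatch[i] = do_unimpl;
    dispatch[0x0B] = do_nop;      /* assumed NOP encoding */
}

/* IM opcodes (bit 7 set) never index the table, so it needs only 128 slots. */
static void step(uint8_t op) {
    if (op & 0x80) do_im(op);
    else dispatch[op](op);
}
```

In PASM the same split is one bit test followed by an indexed jump, which is what keeps the table small enough to live in the cog.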

VMCOG is getting closer... it now also pre-loads the first N pages from SPI flash during VMFLUSH.

I am still testing to make sure it is reading the page correctly; I have already squashed one bug I introduced while merging the PropCade SPI RAM code into VMCOG.

heater · 2010-04-18 16:24

Release ZOG v0.8. Attached to the first post.

All ZPU ops have been rewritten as per Bill Henning's suggestions and as modelled in ZPU v0.8.

Sadly, FIBO(26) still seems to take about 17 seconds, as measured by eye against my desktop clock.

There are 72 longs free in the COG, so a COG dispatch table is quite possible.

Bill: Do you have some nice PASM code for interfacing to VMCog?

Bill Henning · 2010-04-18 17:36

I've made, and uploaded here, a "VMACCESS.SPIN" which contains untested (but should work) versions of read/write byte from the VM.

I will also add it to the first post in the VMCOG thread.

I am somewhat surprised FIBO did not show an improvement! I think Dhrystone would show it more.

Bill Henning · 2010-04-18 21:56

I found a bug in VMACCESS - currently VMCOG still uses the old 3 long format; so instead of +4 for the second long, use +8 until I fix VMCOG.

RossH · 2010-04-19 01:28

@heater,

Sorry - unexpected interruptions (blasted social life! - who needs one?) meant I didn't get a chance to look at this yet.

Maybe later this week.

Ross.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina

heater · 2010-04-19 05:16

Social life:

There was lots of it here this weekend. Guests come, they eat all the food, they drink all the wine, then they take a long siesta.

Great: a nice quiet house and quality time to tinker with the Prop and Zog :)

As it stands, Spin does FIBO(26) in 10 seconds on my Prop at 104MHz, while Zog takes 17 seconds. With a few instructions removed from my version here and the instruction byte fetch inlined, I think we are down to about 14 seconds.

I was never as optimistic as Bill about the effect of the recent optimizations on FIBO. FIBO is all about stack ops and tight recursion. These optimizations don't help so much there.
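For reference, a typical doubly recursive FIBO of the kind used in these timings; the exact source of the ZOG test program is not shown in the thread, so this is an assumed stand-in. The second function counts calls, which shows why the benchmark is dominated by call/return and stack traffic rather than by general memory access: FIBO(26) makes 392,835 calls (2*FIBO(27) - 1).

```c
#include <stdint.h>

/* Naive doubly recursive Fibonacci: fibo(0) = 0, fibo(1) = 1. */
static uint32_t fibo(uint32_t n) {
    return (n < 2) ? n : fibo(n - 1) + fibo(n - 2);
}

/* Number of fibo() invocations needed to evaluate fibo(n). */
static uint32_t fibo_calls(uint32_t n) {
    return (n < 2) ? 1 : 1 + fibo_calls(n - 1) + fibo_calls(n - 2);
}
```

fibo(26) returns 121393; nearly every one of the ~393k calls is a tiny function body, so per-call overhead dominates the time.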

Also, as we are operating from HUB RAM, reducing the number of memory/stack fetches does not have such a dramatic effect. I'm sure it's essential for slow external memory access, though.

Unless someone has some brilliant ideas about further optimization AND is willing to implement them in ZOG, I think the actual VM is done. I don't think I have the patience to go through all that again.

Bill Henning · 2010-04-19 12:57

Social life:

I had to go over to my sister's place last night; it was her birthday.

I think the reason we are not seeing a real difference is precisely that FIBO is about tight recursion. Still, I was mistaken: I really did expect at least a 20% improvement, even in FIBO.

The good news from this is that if cutting memory accesses almost in half did not have an appreciable effect, there may not be a huge further loss from using VMCOG instead of direct hub access. Hopefully.

heater · 2010-04-19 13:21

Bill: "The good news from this is that if cutting memory accesses almost in half did not have an appreciable effect, there may not be a huge further loss from using VMCOG instead of direct hub access."

Hmm... For some reason I think that reasoning is backwards.

Let's say every ZPU op originally took 100 PASM instructions and had 5 HUB ops (instruction fetch, dispatch table lookup, POP(opr1), POP(opr2), PUSH(result)).

Then reducing that to 3 HUB ops (instruction fetch, dispatch lookup, POP(nos)) would not make much difference, as we have seen.

But if each of those 3 memory accesses took, say, 100 instruction periods, performance would suffer badly.
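The argument can be made concrete with a toy cost model (all numbers hypothetical): per-op cost = body instructions + accesses * cost per access, in instruction periods. Assuming roughly 4 instruction periods per HUB access (Propeller hub instructions take 7 to 22 clocks versus 4 for ordinary instructions), going from 5 to 3 accesses saves only about 7%, while making each of the 3 accesses cost 100 periods makes the op roughly 3.6 times slower.

```c
/* Toy cost model for one ZPU op, in instruction periods.
   All inputs are hypothetical round numbers, not measurements. */
static unsigned op_cost(unsigned body, unsigned accesses, unsigned access_cost) {
    return body + accesses * access_cost;
}

/* new_cost expressed as an integer percentage of old_cost (truncated). */
static unsigned percent_of(unsigned new_cost, unsigned old_cost) {
    return new_cost * 100u / old_cost;
}
```

With these inputs, op_cost(100, 5, 4) is 120 and op_cost(100, 3, 4) is 112 (107%), whereas op_cost(100, 3, 100) is 400, about 357% of 112. Halving the accesses barely moves the total when accesses are cheap, but slow accesses dominate it.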

Anyway, I lose patience with thinking about the various swings and roundabouts of optimizations. We just have to see what happens.

For now I'm going to try some simple comparisons, like summing a big array of bytes, words or longs. Stuff like that. I've got an urge to try a Spin version of the RC4 crypto; it's very byte-oriented. en.wikipedia.org/wiki/RC4#RC4-based_cryptosystems
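For anyone following along, a compact C version of RC4 (the standard key-scheduling and keystream-generation steps). It is a generic reference sketch, not heater's test program; `rc4_state`, `rc4_init` and `rc4_crypt` are hypothetical names. A quick sanity check on a VM is that output identical to the plaintext means the keystream XOR is not taking effect.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t S[256]; uint8_t i, j; } rc4_state;

/* Key-scheduling algorithm (KSA): permute S under control of the key. */
static void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen) {
    for (int k = 0; k < 256; k++) st->S[k] = (uint8_t)k;
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->S[k] + key[k % keylen]);
        uint8_t t = st->S[k]; st->S[k] = st->S[j]; st->S[j] = t;
    }
    st->i = st->j = 0;
}

/* Keystream generation (PRGA), XORed into buf in place.
   The same call encrypts and decrypts. */
static void rc4_crypt(rc4_state *st, uint8_t *buf, size_t len) {
    for (size_t n = 0; n < len; n++) {
        st->i = (uint8_t)(st->i + 1);
        st->j = (uint8_t)(st->j + st->S[st->i]);
        uint8_t t = st->S[st->i]; st->S[st->i] = st->S[st->j]; st->S[st->j] = t;
        buf[n] ^= st->S[(uint8_t)(st->S[st->i] + st->S[st->j])];
    }
}
```

With key "Key", the plaintext "Plaintext" should encrypt to BB F3 16 E8 D9 40 AF 0A D3 (the commonly published test vector); re-initializing with the same key and running rc4_crypt again restores the plaintext.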

Bill Henning · 2010-04-19 14:32

You are right. Let's get it all working, then we can worry about further optimizations.

Even if it ends up 1/2 the speed of Spin - even 1/3rd - having full C/C++/Fortran/more compilers running with dirt cheap (for 128K) memory expansion will be very nice.

And there would be no arguments about it being ANSI C :)

heater · 2010-04-25 15:15

Just attached ZPU in C v0.9 to the first post.

This fixes a bug in the ZPU PUSHSP op.

I have added a new test program, the RC4 encryption algorithm. On the first attempt it just crashed and burned on the optimized version of ZPU.
Going back to my unoptimized version, it worked, though.
There ensued a long session of comparing the debug output of cut-down versions of the code between the two. Eventually, after some thousands of instruction steps, I saw that PUSHSP was wrong in the optimized version.

Sadly, we still have a way to go: the full RC4 now runs to completion but gives the wrong result. In fact, it returns the plain-text input as unencrypted output!!

Anyway I'll get the changes into ZOG next.

Edit: OK, ZOG v0.9 is posted now. It includes the new RC4 test, but the included test.bin is still the old one running FIBO.
I have added some further optimizations to ZOG to get its FIBO(26) time down from 17 to about 14 seconds.
Those optimizations may not be permanent, as they take a lot of space.
I have added a Spin FIBO to the startup for speed comparisons.

Post Edited (heater) : 4/25/2010 4:49:54 PM GMT

Bill Henning · 2010-05-02 00:34

I'm back from my cruise! Boy did I need that week off.

Glad to see your progress; I will be resuming VMCOG work on Monday.

VIRAND · 2010-05-05 07:34

This project reminds me of something from "a long time ago", 2000, +/- 5 years:

"A very simple and fast RISC Forth ('-ish') processor chip"

I think it was called MuP21. The chip die was so uncomplicated it looked to me like a morph between a 256-bit core memory and a TTL PCB. I think it was designed as a school project, with plans to sell it, and then maybe it was forgotten.

Yes, it was MuP21. Maybe google it if interested. Only 20 instructions.
I wonder if it's related to or compatible with Zylin's.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I should be typing in Spin now.
Coming soon. My open Propeller Project Pages and favorite links index.

heater · 2010-05-05 08:02

Interesting. The MuP21 was implemented with only 6000 transistors, including its NTSC video generation circuitry.

Having a quick look around, I see the MuP21 fetched 20 bits at a time, containing four 5-bit instructions. As it happens, FPGA implementations of ZPU fetch four 8-bit instructions and decode them simultaneously.

The MuP21 instruction set initially looks very much like the ZPU's. It is tailored for Forth, though, and does include an A register to speed things along a bit.

The ZPU instruction set is designed for minimal logic block usage in FPGA implementations and has a few twists to support C compilers, specifically GCC.

I guess if there were a MuP21 enthusiast who discovered the Prop, he could whip up a neat emulation on the Prop making use of the video generators.

Post Edited (heater) : 5/5/2010 8:07:56 AM GMT

TonyD · 2010-05-06 10:12

VIRAND said...
This project reminds me of something from "a long time ago", 2000, +/- 5 years:

"A very simple and fast RISC Forth ('-ish') processor chip"
....
Yes. it was mup21. Maybe google it if interested. Only 20 instructions.
I wonder if its related to or compatible with Zylin's.

Chuck Moore, who designed the MuP21 (and invented Forth), has a new company called Green Array Chips, which has a couple of interesting Forth-based micros. The top-of-the-range GA144 has 144 cores, which implement the colorForth instruction set, all in a 10x10 mm QFN-88 package.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
- Tony

http://zuzebox.wordpress.com/

heater · 2010-05-06 10:57

This has come up before on this forum. All very interesting, but nowhere to click to buy one :(

Also, programming in Forth is the road to insanity. For example:

 : STAR     ( -- )            \ Print a single star
   42 EMIT ;               \ 42 is the ASCII code for *


 : STARS    ( n -- )   \ Print n stars
   0 DO STAR LOOP ;       \ Loop n times (0 up to n-1) and execute STAR


 : SQUARE    ( n -- )   \ Print an n-line square of stars
   DUP 0 DO           \ Loop n times, keeping (DUP-licating) n on the stack
   DUP STARS CR            \ Each time, print n stars then print CR
   LOOP DROP ;             \ After loop is done, drop the n from the stack


 : TRIANGLE    ( n -- )   \ Print an n-line triangle
   1 + 1 DO           \ Loop n times from 1 to n (instead of 0 to n-1)
   I STARS CR              \ This time use the inner loop index I
   LOOP ;


 : TOWER    ( n -- )   \ Print a "tower" with a base of size n
   DUP                     \ DUP-licate n (since it is used twice below)
   1 - TRIANGLE            \ Print a triangle 1 size smaller than n
   SQUARE ;                \ Print a square base of size n

Give me PASM any day.

P.S. ZOG is a bit stuck, as is the ZPU in C prototype. Even going back to the unoptimized version, it runs many things nicely but has a problem with a certain test program of mine that does a modulus operation (%) on unsigned ints. This does not use the ZPU MOD instruction, as that works only for signed integers; instead it compiles to a call to a long-winded modulus function. That function goes wrong somewhere. Despite comparing the ZPU in C code to the documentation and the Java implementation for ages, I just can't see where we are going wrong.
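When a CPU lacks a suitable unsigned divide/modulus instruction, GCC typically calls a libgcc helper (`__umodsi3` is the usual one for 32-bit unsigned `%`). The routine actually linked into the ZPU build is not shown here, so below is a generic shift-and-subtract remainder in C that can serve as a reference when single-stepping such a function. The name `umod32` is hypothetical; the divisor must be non-zero.

```c
#include <stdint.h>

/* Unsigned 32-bit remainder by binary long division (restoring). */
static uint32_t umod32(uint32_t n, uint32_t d) {
    uint32_t r = 0;                               /* partial remainder, always < d */
    for (int bit = 31; bit >= 0; bit--) {
        uint32_t carry = r >> 31;                 /* bit lost by the shift below */
        r = (r << 1) | ((n >> bit) & 1u);         /* bring down the next dividend bit */
        if (carry || r >= d) r -= d;              /* divisor fits: subtract it */
    }
    return r;
}
```

The carry check matters only for divisors above 0x80000000, where the shifted remainder would otherwise overflow 32 bits; it is exactly the kind of edge case that makes such routines a good stress test for shift, compare and branch ops.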

Just now I'm trying to test against the actual ZPU VHDL implementation....

TonyD · 2010-05-07 08:51

heater said...

Also programming in Forth is the road to insanity ...
...
Give me PASM any day.

Yes, programming in Forth can be insanely challenging.

Give me C any day.

heater · 2010-05-08 20:16

ZPU in C version 0.10 is attached to the first post.

ZPU, and therefore ZOG, has been stuck on an obvious but well-hidden bug for a while.

Turns out that the ZPU byte codes were loaded into the LONG-wide memory array in the wrong endianness, and that the readByte and writeByte functions were also "backwards". These two features almost cancelled out, such that ZPU/ZOG appeared to work and ran quite a lot of C code. But eventually things started to fail in strange ways, and I eventually cottoned on to the problem.

Also fixed the ADDSP instruction. It failed when the offset into the stack was zero because, now that we have optimized things, the top-of-stack value is not actually on the stack after most ops.
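A sketch of why ADDSP 0 needs special handling once TOS is cached. It assumes a simplified model in which `sp` is the ZPU byte address of the TOS slot, `tos` holds its value, and the memory word at `sp` is not kept in sync. The opcode layout (0001nnnn, adding the word at SP + 4*n to TOS) is assumed from the ZPU documentation; `mem`, `stack_read` and `op_addsp` are hypothetical names.

```c
#include <stdint.h>

static uint32_t mem[256];   /* word-addressed backing store */
static uint32_t sp;         /* ZPU byte address of the TOS slot (word aligned) */
static uint32_t tos;        /* cached value of the TOS slot; mem at sp may be stale */

/* Read the stack word at byte offset sp + off, honouring the TOS cache. */
static uint32_t stack_read(uint32_t off) {
    return (off == 0) ? tos : mem[(sp + off) >> 2];
}

/* ADDSP n: TOS = TOS + word at (SP + 4*n); n is the low nibble of the opcode. */
static void op_addsp(uint8_t op) {
    uint32_t off = (uint32_t)(op & 0x0F) << 2;
    tos += stack_read(off);
}
```

Reading `mem` directly for offset 0 would pick up the stale slot instead of the cached value, which is the failure described above.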

Also added some #defines in zpu.h so that all the ops that can be done with the ZPU EMULATE operation can be switched to do so. In that mode, single stepping produces a register dump that can be compared with Zylin's ZPU VHDL implementation run under the GHDL simulator on a PC. A good regression test.

I suspect ZOG in PASM has the same endianness issue which is a shame because swapping bytes around will slow down instruction fetch.

Anyway now I can get back to making ZOG work better.

Bill Henning · 2010-05-08 21:29

Nice progress!

I've finished building a shipment of boards for a client, and I am currently building a second PropCade so I can test the RS485 and IR functions (so I can order protos of the new rev of the PCB). After that, I will resume work on VMCOG.

I think it may be enough to modify the endianness of word/long loads/stores in the VM. Then pointers to bytes/words in a long will be correct again. I could be wrong; I have been breathing solder fumes all day.

heater · 2010-05-08 21:56

Actually what I have done in the C version is:

1) Reverse the order of bytes in each LONG as the byte codes are read into the ZPU memory from the test.bin file
2) Arrange that readMemoryByte() and writeMemoryByte() access the bytes within a LONG memory location starting from the other end of the LONG. If you see what I mean.

This way when an op accesses constant or initialized variables in the "code" space as LONGs it gets them in the right order.
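The two steps above can be sketched portably in C: pack each group of four image bytes big-endian into a host long, then fetch byte (and 16-bit word) N from the "other end" of the long with shifts. `load_image`, `read_byte` and `read_word` are hypothetical stand-ins for the loader and readMemoryByte()/readMemoryWord(); because they use shifts rather than a byte pointer, the result does not depend on host endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a big-endian ZPU image so that mem[i] holds the long the ZPU sees
   at byte address 4*i. A short final group leaves its low bytes zero. */
static void load_image(uint32_t *mem, const uint8_t *img, size_t len) {
    for (size_t a = 0; a < len; a++) {
        if ((a & 3) == 0) mem[a >> 2] = 0;
        mem[a >> 2] |= (uint32_t)img[a] << (8 * (3 - (a & 3)));
    }
}

/* ZPU byte address 4*i+0 is the most significant byte of mem[i]. */
static uint8_t read_byte(const uint32_t *mem, uint32_t addr) {
    return (uint8_t)(mem[addr >> 2] >> (8 * (3 - (addr & 3))));
}

/* ZPU 16-bit word at an even address: high half at +0, low half at +2. */
static uint16_t read_word(const uint32_t *mem, uint32_t addr) {
    return (uint16_t)(mem[addr >> 2] >> (8 * (2 - (addr & 2))));
}
```

Long reads then need no adjustment at all (mem[addr >> 2]), which is the point of doing the reversal once at load time.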

I will do the same for readMemoryWord() and writeMemoryWord(). Turns out that printf() still fails because it uses those for some reason.

I have not checked yet but I imagine in PASM using rd/wrbyte/word from HUB has the same endianness issue as on an Intel PC.

Now we could swap the endianness in VMCOG as you say. Is there a performance penalty for one endianness over another?

As it stands, running from HUB, I think what I have done is the way to go as it allows all rd/wrlong activity to run at full speed for the price of slightly slower instruction fetch. Of course with some #ifdefs we could have it either way.

We are not out of the woods yet. I can only get printf("Hello world"); to work if I use the EMULATE op for LOADH, STOREH and NEQ. The first two are because I have not fixed the endianness yet, but NEQ seems a bit of a mystery.

Bill Henning · 2010-05-08 23:48

I am not sure why you have to reverse the order of the bytes in a long for the byte codes read in. Will have to chew on that more.

I suspect that it would be faster to just reverse the bytes during readword, readlong, writeword and writelong, as they occur far less frequently than instruction byte fetches, and as such should theoretically incur a smaller penalty.

The only speed difference is re-packing the longs/words.

Yep, NEQ having a bug seems weird...

heater · 2010-05-09 01:13

Bill: "I am not sure why you have to reverse the order of the bytes in a long for the byte codes read in. Will have to chew on that more."

If a RDLONG of a constant embedded in the code is backwards there are only two possible fixes:
1) Reorder the bytes in the LONG after the RDLONG is performed (or read it bytewise in the right order).
2) Reorder all the bytes in all the LONGs in the code in memory first.

If 1) then you are done.
If 2) then you have to change byte accesses to match, otherwise at least instruction fetch will be in the wrong order. And word accesses also have to change.
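Fix 1) can be sketched as leaving the image byte-for-byte as loaded and swapping on every long access. This assumes a little-endian host (true of an x86 PC and of RDLONG on the Propeller hub); `bswap32`, `read_long_raw` and `write_long_raw` are hypothetical helpers, and `memcpy` stands in for RDLONG/WRLONG. Byte fetches then need no adjustment, at the cost of a swap on every long access.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Host-order load (like RDLONG), then swap to the ZPU's big-endian view. */
static uint32_t read_long_raw(const uint8_t *mem, uint32_t addr) {
    uint32_t v;
    memcpy(&v, mem + addr, sizeof v);
    return bswap32(v);
}

/* Swap first, then host-order store (like WRLONG). */
static void write_long_raw(uint8_t *mem, uint32_t addr, uint32_t value) {
    uint32_t v = bswap32(value);
    memcpy(mem + addr, &v, sizeof v);
}
```

Which option wins depends on the ratio of long accesses to opcode fetches, which is exactly what counting them would settle.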

Bill: "I suspect that it would be faster to just reverse the bytes during readword, readlong, writeword, writelong as they occur far less frequently than instruction byte fetches, and as such theoretically should incur a smaller penalty."

I'm not with you here. Do you mean reverse in the VMCOG? If so then I might agree.

As for the "far less frequently" part, I'm not convinced yet. In the unoptimized version every instruction does at least a pop and a push; some do two pops and a push, or two pops and a memory write, etc. That's a lot more LONG accesses than instruction fetches.

In the optimized version we have a lot fewer LONG accesses going on, but are there really fewer of them than opcode fetches? I'm going to start counting.

Perhaps the solution that is optimal for VMCOG is not optimal for working from HUB.

heater · 2010-05-09 09:07

ZPU in C version 0.11 is now in the first post.

This one works!!! (As far as I can tell :))

Fixed the NEQ and NEG opcodes. They were reversed due to a documentation error on the Zylin web site.
Optimized ADDSP a little.
Added a helloworld test program exercising iprintf.

Re: Endianness. I have just realized a neat trick. To reverse the endianness of BYTE accesses it is only necessary to XOR the address with $03. Similarly reversing the endianness of WORD accesses just XOR the address with $10.

So reversing the endianness of BYTE and WORD accesses takes only a single instruction, quicker than the alternative of reversing the bytes in a LONG. At least when working from HUB.
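In C terms the XOR trick might look like this, assuming a little-endian host and memory that already holds each ZPU long as a host `uint32_t` (for example after the load-time byte reversal described earlier in the thread). ZPU byte address a is found at host byte offset a ^ 3, and a 16-bit word at a ^ 2 (binary %10). `zpu_read_byte` and `zpu_read_word` are hypothetical names; `memcpy` stands in for RDWORD.

```c
#include <stdint.h>
#include <string.h>

/* Big-endian ZPU byte address -> host byte offset on a little-endian host. */
static uint8_t zpu_read_byte(const uint32_t *mem, uint32_t addr) {
    return ((const uint8_t *)mem)[addr ^ 3];
}

/* 16-bit ZPU word at an even address: flip the half-long selector with ^ 2,
   then do a host-order halfword load (the RDWORD equivalent). */
static uint16_t zpu_read_word(const uint32_t *mem, uint32_t addr) {
    uint16_t w;
    memcpy(&w, (const uint8_t *)mem + (addr ^ 2), sizeof w);
    return w;
}
```

Each access costs one extra XOR, and long accesses stay untouched.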

pullmoll · 2010-05-09 09:35

heater said...
Similarly reversing the endianness of WORD accesses just XOR the address with $10.

%10 or $2 methinks

There are 10 kinds of people: one kind understands binary and one doesn't.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pullmoll's Propeller Projects

heater · 2010-05-09 09:39

Err yeah. I must be one of the other kinds :)
