Zog - A ZPU processor core for the Prop + GNU C, C++ and FORTRAN.Now replaces S

heater · 2010-03-18 06:03

Just having a quick drag race with Jazzed and Javelin Java. Thought I'd post the results to this thread as well:

FIBO BENCHMARK:

Spin and Java on 80MHz Prop

Spin fibo(23) takes about 2.4 seconds
Java fibo(23) takes between 3 and 4 seconds

Zog on 104MHz TriBlade. FIBO in C executing from HUB RAM is something like this:

FIBO(23) 4 Seconds
FIBO(24) 7 Seconds
FIBO(25) 11 Seconds
FIBO(26) 17 Seconds

Or all of them in 41 seconds. Measured by eye against my PC's clock, so give or take a second.

fibo(00000017) = 00006ff1
fibo(00000018) = 0000b520
fibo(00000019) = 00012511
fibo(0000001a) = 0001da31

Sorry Zog only knows how to print hex for now.
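
For anyone wanting to repeat the race, here is a rough sketch of the kind of code being timed. It assumes the usual naive doubly recursive definition; the actual benchmark source is not shown in this thread, and the names fibo and print_fibo are only illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* Naive doubly recursive Fibonacci (assumed shape of the benchmark).
   Deliberately call and stack heavy, which is what stresses the VM. */
uint32_t fibo(uint32_t n)
{
    if (n < 2)
        return n;
    return fibo(n - 1) + fibo(n - 2);
}

/* Print one result line using 8-digit hex for both the argument and the
   result, the same layout as the hex results above. */
void print_fibo(uint32_t n)
{
    printf("fibo(%08x) = %08x\n", (unsigned)n, (unsigned)fibo(n));
}
```

Calling print_fibo for 23 through 26 gives the four lines shown above, since 0x17 is 23 and 0x6ff1 is 28657.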

Seems Zog is not doing as badly as I expected.

We have some optimizations in mind for Zog. Basically, the ZPU architecture is stack based with no processor registers (except PC and SP), so we can optimize away a lot of redundant PUSH/POPs. But as FIBO is very stack intensive, those optimizations may not have much effect.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Cluso99 · 2010-03-18 06:39

Gee, Spin fared the best. Very interesting. I really should recheck and release my faster Spin interpreter and maybe get <2s.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Prop OS: SphinxOS, PropDos, PropCmd · Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz · MultiBlade Props: www.cluso.bluemagic.biz

heater · 2010-03-18 07:18

Indeed you should.

Never mind the speed but many would like to see Spin executed from external RAM.

heater · 2010-04-16 08:30

Here is an implementation in C of the ZPU byte code interpreter. It builds and runs on a PC. It successfully runs the same test binary that I used for Zog so far.

Why?

1)
Well Bill Henning suggested ways to optimize away many of the huge number of stack ops (PUSH/POP) that ZPU uses. Remember that the ZPU architecture is stack based with no registers and therefore quite slow. When implementing the ZPU straight from the documentation, as in Zog currently, one ends up with a lot of redundant PUSH/POPS.

Bill's suggestions, while sound, were giving me a real headache to even think about! One cannot just optimize one ZPU op at a time in PASM and quickly test the results. It all has to be done in one go. It was giving me such a bad headache that I realized it's better to write the thing in C and test it on a PC, then use the resulting working C version as the "spec" for a PASM rewrite of Zog.

So, Bill, could you look this over and check that I have understood your suggestions correctly? I have introduced a top of stack "register" (TOS) such that a lot of ops go straight to that rather than doing PUSH/POP on the stack in ZPU memory space. I have tried to ensure that TOS is flushed/loaded to/from the real top of stack in memory in all places that require it. All in all I'm still worried there may be some instruction sequences where this falls down. I have not used any "PUSH pending" flag as I originally thought was required. In fact I have not used any new flag at all. There is one case where TOS must not be flushed to the real stack: a PUSH on an empty stack. I have handled this by checking for SP = 0 and not doing the flush if so.
The C code is written as simply as possible so as to make for an easy transition to PASM.
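
To make the TOS idea concrete, here is a cut-down sketch of the scheme described above. It is not the actual zpu.c: the names op_push, op_add and MEM_LONGS are made up for illustration, and the tiny wrapping memory is an assumption made only to keep the example short.

```c
#include <stdint.h>

#define MEM_LONGS 64u                  /* toy memory size in longs (power of two) */

uint32_t mem[MEM_LONGS];               /* stands in for ZPU memory space */
uint32_t sp  = 0;                      /* stack pointer, zero at reset */
uint32_t tos = 0;                      /* cached top-of-stack "register" */

/* Long accesses ignore the low two address bits. Wrapping into the toy
   array is an assumption of this sketch, not something from zpu.c. */
uint32_t memoryReadLong(uint32_t addr)
{
    return mem[(addr >> 2) & (MEM_LONGS - 1u)];
}

void memoryWriteLong(uint32_t addr, uint32_t value)
{
    mem[(addr >> 2) & (MEM_LONGS - 1u)] = value;
}

/* Push: flush the cached TOS to its slot in memory, then move SP down
   and keep the new value in the cache only. */
void op_push(uint32_t value)
{
    if (sp != 0)                       /* SP = 0 means empty stack: do not  */
        memoryWriteLong(sp, tos);      /* flush, or address 0 is clobbered  */
    sp -= 4;
    tos = value;
}

/* ADD: fetch next-on-stack from memory and combine it into the cached
   TOS. The result is not written back to memory at this point. */
void op_add(void)
{
    sp += 4;
    tos += memoryReadLong(sp);
}
```

Pushing 2 and 3 and then adding leaves 5 in tos with only one memory write and one memory read, where a straight implementation would do several of each.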

2)
Anyone crazy enough can run this on the Prop using Catalina or ICC. One day Zog will also be able to run it, then we will have Zog running ZPU in C running ZPU in C....

3)
I have no idea how the ZPU enthusiasts have implemented it in Verilog or VHDL for FPGAs but there is the possibility that this approach would speed up Zog hardware implementations as well.

4)
This will be used on a certain other multi-core embedded controller chip that also needs external RAM for large programs and a byte code interpreter to run the code from it. No names mentioned here for fear of being flogged.

Next up is the rewrite of Zog in line with this version, once Bill has OKed it. It might take a while to find time to fit this in.

RossH · 2010-04-16 10:00

@heater,

You are a complete lunatic

If I get time this weekend I will try it out.

Ross.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina

heater · 2010-04-16 10:20

RossH: "You are a complete lunatic"

You are in good company. Many people have said so over the years.

If you can spot any logical errors in that C version I would be grateful. It's going to be much harder to spot them in the PASM version.

Bill Henning · 2010-04-16 13:22

Hi...

I will gladly check it out!

I can't wait to run large Zog binaries on single-channel SPI RAM on PropCade, four-channel SPI on my other boards, and XMM on Morpheus.

I'll have fun benchmarking the effect of different memory interfaces on VMCOG, and later, comparing it to a native XMM interface on Morpheus.

Now where did I leave my copy of the old Byte benchmarks...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system

Bill Henning · 2010-04-16 13:37

UPDATE:

I have not had any coffee yet, and I've only taken a quick glance through the code, so I may be wrong... but:

        if (sp != 0)
            memoryWriteLong(sp, tos);        // Flush tos in case it is read back below
        else
            printf("ADDSP : Stack empty\n");

I don't think all these stack empty check / flush are needed. I will let you know later.

I can also see some slight optimizations for the C code.

Basically, you often use a local variable 'a' for what I tend to use a second cog variable 'nos' for.

I notice in zpu_memory.c that all accesses are aligned... nice!

Is there a nice test program that I can run that checks the ZOG instruction set for correctness? Then I could hack the C code, and send you the results.

heater · 2010-04-16 14:27

Bill: One of the first instructions executed from the binaries built with the ZPU C compiler is naturally an IM. My logic goes like this:

1) IM wants to push a value onto the stack.
2) The stack pointer prior to the IM is zero.
3) Generally before we let IM (or anyone) update the TOS variable we have to flush that to the real top of stack in memory.
4) We can't let that happen because it would overwrite address 0. Which happens to be important.

This could be prevented by the use of a flag or by starting with the SP not equal to zero. The former is extra housekeeping, the latter is contrary to the ZPU spec.

As it happens, when starting up code that "stack empty" condition occurs three times, all from IM. We could probably remove the SP=0 check in other ops though.

You should not worry about the local "a" vars. I will convert them into global "nos" or whatever. Just have to be careful that signed and unsigned ints are used in the right places otherwise comparisons will fail and sign extending by shifting up and down will fail. We can do that with casts at the right points anyway.
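
A small sketch of the kind of casts meant here. The helper names are made up for illustration and are not from zpu.c, and the operand order shown is plain "a < b", not a claim about how any particular ZPU op orders its operands.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned compare: 0xffffffff is the largest possible value. */
uint32_t less_than_unsigned(uint32_t a, uint32_t b)
{
    return (a < b) ? 1u : 0u;
}

/* Signed compare: cast both operands so 0xffffffff is treated as -1.
   Without the casts C would silently do the unsigned compare. */
uint32_t less_than_signed(uint32_t a, uint32_t b)
{
    return ((int32_t)a < (int32_t)b) ? 1u : 0u;
}

/* Sign-extend the low `bits` bits by shifting up and back down.
   Relies on two's complement conversion and an arithmetic right shift of
   signed values, which is implementation-defined in C but what the usual
   PC compilers do. */
uint32_t sign_extend(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    unsigned shift = 32u - bits;
    return (uint32_t)((int32_t)(value << shift) >> shift);
}
```

If the shift down is done on an unsigned type instead, zeros come in at the top and the sign extension quietly fails, which is exactly the trap to watch for.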

The ZPU architecture does not support unaligned memory access of words and longs and will ignore the lower address bits. Fits very nicely on the Propeller with RD/WRWORD etc. Strangely enough the test code I have run, compiled from C, does generate unaligned accesses. I have seen it in the startup sequence in Zog and now ZPU in C. Looks like the tool chain has a little bug somewhere.
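
In C terms, ignoring the lower address bits amounts to something like the following sketch. The function names are illustrative only and not taken from zpu_memory.c.

```c
#include <stdint.h>

/* Long (32-bit) access: drop the low two address bits. */
uint32_t align_long(uint32_t addr)
{
    return addr & ~(uint32_t)3;
}

/* Word (16-bit) access: drop the lowest address bit. */
uint32_t align_word(uint32_t addr)
{
    return addr & ~(uint32_t)1;
}

/* Nonzero if a long access at addr would be unaligned; handy for
   printing a warning before the address gets rounded down. */
int is_unaligned_long(uint32_t addr)
{
    return (addr & 3u) != 0;
}
```

So a long read from an address such as 2 actually fetches the long at address 0, and a check like is_unaligned_long is a cheap way to flag that the tool chain asked for it.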

I have not found any program that checks ZPU instruction correctness. The only test I have run is the test.bin included.

What worries me about this optimized version is that individual instructions may appear to work but given they don't actually post their results to the real stack but to the TOS register it is possible that mismanagement of TOS can cause some instruction sequences to fail.

Soon I will need a version VM for the TriBlade. I have promised myself never to put memory hardware access code directly into Zog.

heater · 2010-04-16 14:30

Here is the output of ZPU in C running my little test program:

ZOG v0.6
IM : Stack empty
CONFIG indicates CPU type is 2
IM : Stack empty
IM : Stack empty
Hi
Zog say hello
0000000e
argc = 00000001
Address of argv[0] = 380c0000
_hardware = 00000000
_cpu_config = 00000002
Read LONG not aligned: addr=00000002
ZPU_ID = 0000000e
_use_syscall = 00000001
Please type 4 characters...
4444You typed...
4444
fibo(00000017) = 00006ff1
fibo(00000018) = 0000b520
fibo(00000019) = 00012511
fibo(0000001a) = 0001da31
Bye
Breakpoint
PC=000009a8 SP=ffffffc0  OP=00 DM=00 debug=00000000

Notice that IM hits the bottom of stack 3 times during start up.
Also notice the unaligned access warning.

Bill Henning · 2010-04-16 14:35

Hi,

I've taken an axe to zpu.c

I've added: nos, and in-line push and pop functions - that makes the code much easier to read.

At the moment, I have converted it up to and including ZPU_MULT16X16. After I drop my wife off at her work, I will convert the rest and upload it.

As you guessed, I initialized the stack pointer to $FFFC - most of the code seems to grow the stack down, but in one place it looked like it was growing up...

I have had to use casts to deal with the signed variants of less than, and less than or equal.

I was VERY happy to see I don't need to support unaligned accesses in VMCOG yet :)

Do NOT worry about "posting the results to the real stack" at all!

TOS IS part of the real stack, the stack management code in each instruction makes sure of that. The only weirdness with this approach has to do with non-commutative operators such as -,/,<,<= but that is easily handled, as you will see in the changes I made.

More after I come back...

heater · 2010-04-16 15:08

Ouch!

I purposely left out inline functions. They make the code easier to read, but I wanted to see exactly what is going on under my nose at all times.

Not sure I'm keen on initializing the stack pointer to anything other than zero as defined by the ZPU spec. But I guess we can live with it, who will know :)

Stack growing up!!! What? Where?

Casts are cool for the comparisons; that was on my TODO list.

TOS is part of the real stack, yes, from the ZPU's point of view and if it's coded correctly. But, for example, if you halt execution immediately after an arithmetic operation and look at the stack in memory, the result is not going to be there.

What's up with non-commutative operators? Thought I had them sorted OK. I guess you have something quicker up your sleeve.

Bill Henning · 2010-04-16 16:09

Sorry, I like code clarity :) Besides, just have the two in-lines printed on one page :)

I think SP should be size_of_vm, which will initially be $10000 for the 64KB version, with $20000 to soon follow for the 128KB version.

We don't really need to waste the 4 bytes with initializing it to memsize-4.

I will re-check for the stack growing up in one place, that may have been my mistake as I did not have coffee yet at this time... I have coffee now.

As long as you cannot halt execution in the middle of an operation, and as long as each opcode is implemented correctly, TOS+stack will always be correct.

You should not halt execution in the middle of a ZOG op.

Most ops are commutative, a few are not. This allows an optimization that you will see in the code, in the assembly version.

Bill Henning · 2010-04-16 16:17

I think there may be a bug in your ZPU_LOADSP - I am suspicious of the XOR'ing of bit 5.

It looks like it is supposed to push one of the sixteen longs above SP or long at SP or one of 15 longs following SP - that would require sign extending the 5 bit value obtained from the instruction... this is why I am suspicious of the XOR. Can you check the spec?

            else if ((instruction & 0xE0) == ZPU_LOADSP) //BH: not 100% sure this is correct
            {
                uint32_t addr;
                addr = (instruction & 0x1F) ^ 0x10;
                addr = sp + 4 * addr;
                if (sp != 0)
                    memoryWriteLong(sp, tos);        // Flush tos in case it is read back below
                else
                    printf("LOADSP : Stack empty\n");
                tos = memoryReadLong(addr);
                sp = sp - 4;
            }

Bill Henning · 2010-04-16 16:45

Ok, here is the revised archive - I have only changed zpu.c

Note, I have not compiled or tested this - just re-wrote for clarity, with a bit of C level optimization.

I will keep an eye on this thread, but now I will go and work on VMCOG... so we can run at least 64KB ZOG programs ASAP

heater · 2010-04-17 03:25

Re: LOADSP -- Don't worry, it really is like that. The offset field in the instruction is NOT signed and for whatever reason bit 5 is upside down. Seriously.
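
Pulled out on its own, the address calculation looks roughly like this. The function name loadsp_address is made up for this sketch; the masking and XOR are the same as in the LOADSP code quoted above.

```c
#include <stdint.h>

/* Compute the address LOADSP reads from.
   The low five bits of the instruction are an UNSIGNED offset in longs
   with the top bit of the field inverted, so the result is always in
   the range sp + 0 .. sp + 124 and never below sp. */
uint32_t loadsp_address(uint32_t sp, uint8_t instruction)
{
    uint32_t offset = (instruction & 0x1Fu) ^ 0x10u;  /* flip top bit of field */
    return sp + 4u * offset;                          /* offset is in longs    */
}
```

So an offset field of 0x10 reads the long at SP itself, 0x1F reads 15 longs above it, 0x00 reads 16 longs above and 0x0F reads 31 longs above. A sign-extended reading of the field would instead reach below SP, which is not what happens.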

I'll have a play with that ASAP.

heater · 2010-04-17 15:33

It no work.

I presume push should not be:

inline void push(uint32_t data) {
    memoryWriteLong(data,tos);
    sp-=4;
}

but:

static inline void push(uint32_t data) {
    memoryWriteLong(sp, data);
    sp-=4;
}

Anyway after that it runs until POPSP. The code for which looks very wrong:

                   case ZPU_POPSP:
                    {
                        sp = tos;
                        tos = pop();
                        break;
                    }

Given that the pop messes up the sp you have just set.

Bill Henning · 2010-04-17 16:11

Thanks for finding that bug in push!

As far as ZPU_POPSP, I guess I misunderstood how it should work.

Your original version for POPSP was:

                    case ZPU_POPSP:
                    {
                        sp = tos;
                        tos = memoryReadLong(sp);    // Ensure tos is actually from current top of stack
                        break;
                    }

This looks like what you meant to do is pop a new SP off the current SP, then grab a new TOS.

If that is the case, then the "new" sp should be decremented, as its top element is now in the TOS register - unless the new SP also needs to be on the stack.

Can you check it against the "original" Java based emulator?

heater · 2010-04-17 16:31

POPSP from the documentation:

"Pops value off top of stack and sets SP to that value. Used to allocate/deallocate space on stack for variables or when changing threads."

From the Java simulator:

public void setSp(int sp) throws CPUException
{
    if ((sp%4)!=0)
    {
        throw new IllegalInstructionException();
    }

    if (sp<minStack)
    {
        minStack=sp;
    }
    this.sp = sp;
}

public void changeSp(int sp) throws CPUException
{
    setSp(sp);
    tracer.setSp(sp);
}

.
.
.
                    case POPSP:
                        changeSp(popIntStack());
                        intSp=0;        // flush internal stack
                        break;

So the pop increments the SP, but then SP is set to the value popped, so the increment becomes redundant.

Never did figure out what the internal stack was all about, somehow gets used in their syscall implementation I think.

P.S. That LOADSP thing about inverting bit 5 is quirky and weird and undocumented. According to the ZPU creator it is there because it happened to save a few gates in some FPGA implementation some time.

Bill Henning · 2010-04-17 16:45

Ok, thanks - that clears it up!

This should work:

                   case ZPU_POPSP:
                    {
                        sp = tos;
                        tos = memoryReadLong(sp);
                        sp+=4;
                        break;
                    }

heater · 2010-04-17 17:25

FAIL :(

With the SP+=4 it fails after 22 instructions when it hits a POPPC.

Without, which is how I would expect it to look, it fails when hitting POPPC after 138 instructions.

In both cases the preceding instruction is zpu_pushspadd.

PUSHSPADD in Java is:

                    case PUSHSPADD:
                    {
                        int a;
                        int b;
                        a=sp;
                        b=popIntStack()*4;
                        pushIntStack(a+b);
                        break;
                    }

Bill Henning · 2010-04-17 17:53

I think I got it.

The last POPSP I sent needs the SP+=4 to adjust for keeping TOS in a register.

The PUSHSPADD I made is also correct.

The bug is actually in ZPU_PUSHSP.

Here is the fix:

                    case ZPU_PUSHSP:
                    {
                        push(tos);
                        tos = sp;  // did not need the +4!! my bad, I did not need to adjust SP because of TOS being a register
                        break;
                    }

Sorry that I am not debugging ZOG myself, I am debugging VMCOG so you have a 64KB VM (and shortly after, a 128KB VM)

heater · 2010-04-17 18:11

Sadly not. Fails in the same place.

Sorry I can't be thinking about this a bit harder. I've got a house full of guests for the weekend.

heater · 2010-04-17 18:44

If I remove the sp+=4; from POPSP it runs on OK for 1600 instructions !!

I'm dumping the PC, SP, TOS etc as it runs and comparing against the same for my version.

At that point it goes wrong on a LSHIFTRIGHT.

heater · 2010-04-17 18:50

Quickly hacking my old LSHIFTRIGHT back in there, as below, gets it running all the way through my test.bin :)

ZOG v0.7 plus Bill Henning mods
CONFIG indicates CPU type is 2
Hi
Zog say hello
0000000e
argc = 00000001
Address of argv[0] = 780c0000
_hardware = 00000000
_cpu_config = 00000002
Read LONG not aligned: addr=00000002
ZPU_ID = 0000000e
_use_syscall = 00000001
Please type 4 characters...
ffffYou typed...
ffff
fibo(00000017) = 00006ff1
fibo(00000018) = 0000b520
fibo(00000019) = 00012511
fibo(0000001a) = 0001da31
Bye
Breakpoint
PC=000009e8 SP=ffffffbc  OP=00 DM=00 debug=00000000

                    case ZPU_LSHIFTRIGHT:
                    {
                        uint32_t a;
                        a = tos;
                        sp = sp + 4;
                        a = a & 0x3f;
                        tos = memoryReadLong(sp);
                        tos = tos >> a;

//Bill's code           nos = pop() & 0x3f;
//                      tos = nos >> tos;
                        break;
                    }

Presumably the other shifts are bust also.

Post Edited (heater) : 4/17/2010 7:05:54 PM GMT

Bill Henning · 2010-04-17 19:11

Found prob with shift...

    case ZPU_LSHIFTRIGHT:
    {
        nos = pop();
        tos = nos >> (tos&0x3f);
        break;
    }

You are correct, all the other shifts will have the same problem.
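
As a standalone sketch of the operand order (the function name and the guard for counts of 32 or more are additions for illustration, not part of zpu.c):

```c
#include <stdint.h>

/* Logical shift right with the operands the right way round:
   the value being shifted is next-on-stack (nos) and the shift count
   comes from top-of-stack (tos), masked to six bits. */
uint32_t lshiftright(uint32_t nos, uint32_t tos)
{
    uint32_t count = tos & 0x3fu;

    /* Shifting a 32-bit value by 32 or more is undefined in C, so this
       sketch returns 0 for those counts. That choice is an assumption;
       check the ZPU documentation for the intended behaviour. */
    if (count >= 32u)
        return 0;

    return nos >> count;
}
```

Masking the wrong operand, as in the earlier version, turns the value into the count and the count into the value, so the same care applies to the other shift ops.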

I am still scratching my head about that SP+=4, but hey, it's working without it.

heater · 2010-04-17 19:17

Yep, just spotted that myself, modified accordingly and test.bin works fine.

I'll tidy up a bit and post this working version.

heater · 2010-04-17 19:22

Why a conceptual problem with the SP+=4 ?

POPSP should take a value off the stack, POP, and that value should end up in SP.
Therefore the increment done by POP is redundant.
No other increment need be considered no matter how you keep your stack, with nos and tos or not. The final value of SP must be whatever was on top of the stack, no matter where we keep that.
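
Spelled out as a minimal sketch with TOS cached in a variable (op_popsp and the toy wrapping memory are illustrative assumptions, not the real zpu.c):

```c
#include <stdint.h>

#define MEM_LONGS 64u                  /* toy memory size in longs (power of two) */

uint32_t mem[MEM_LONGS];               /* stands in for ZPU memory space */
uint32_t sp;                           /* stack pointer */
uint32_t tos;                          /* cached top of stack */

/* Aligned long read; wrapping into the toy array is an assumption. */
uint32_t memoryReadLong(uint32_t addr)
{
    return mem[(addr >> 2) & (MEM_LONGS - 1u)];
}

/* POPSP: the value on top of the stack becomes the new SP.
   Since the top of the stack lives in tos, that is a plain assignment.
   The cache is then reloaded from the top of the NEW stack, and SP is
   not adjusted any further. */
void op_popsp(void)
{
    sp  = tos;
    tos = memoryReadLong(sp);
}
```

After the op, SP holds exactly the value that was on top of the stack, which is all the documentation asks for.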

heater · 2010-04-17 19:31

I've just been studying a diff of our two versions. Lots of little changes but unless I've missed a point they are essentially the same but with inline push/pop functions and using tos/nos instead of local vars spread around the ops.

So I'm even more convinced the SP+=4 is wrong.
