We have some optimizations in mind for Zog. Basically, because the ZPU architecture is stack based with no processor registers (except PC and SP), we can optimize away a lot of redundant PUSH/POPs. But as FIBO is very stack intensive, those optimizations may not have much effect.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Here is an implementation in C of the ZPU byte code interpreter. It builds and runs on a PC. It successfully runs the same test binary that I used for Zog so far.
Why?
1) Well, Bill Henning suggested ways to optimize away many of the huge number of stack ops (PUSH/POP) that ZPU uses. Remember that the ZPU architecture is stack based with no registers and therefore quite slow. When implementing the ZPU straight from the documentation, as in Zog currently, one ends up with a lot of redundant PUSH/POPs.
Bill's suggestions, while sound, were giving me a real headache to even think about! One cannot just optimize one ZPU op at a time in PASM and quickly test the results. It all has to be done in one go. It was giving me such a bad headache that I realized it's better to write the thing in C and test it on a PC, then use the resulting working C version as the "spec" for a PASM rewrite of Zog.
So, Bill, could you look this over and check that I have understood your suggestions correctly? I have introduced a top of stack "register" (TOS) such that a lot of ops go straight to that rather than doing PUSH/POP on the stack in ZPU memory space. I have tried to ensure that TOS is flushed/loaded to/from the real top of stack in memory in all places that require it. All in all I'm still worried there may be some instruction sequences where this falls down. I have not used any "PUSH pending" flag as I originally thought was required. In fact I have not used any new flag at all. There is one special case: a PUSH normally has to flush TOS to the real stack first, but on an empty stack that flush must be skipped. I have handled this by checking for SP = 0 and not doing the flush if so.
The C code is written as simply as possible so as to make for an easy transition to PASM.
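A minimal sketch of the TOS "register" idea described above, not the actual zpu.c. The 256-byte memory, the address wrap-around and the names `push`, `pop` and `zpu_add` are assumptions made for illustration; only `tos`, `sp`, the `memoryReadLong`/`memoryWriteLong` names and the SP = 0 check come from the posts.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 256u                 /* hypothetical tiny memory for the sketch */
static uint32_t mem[MEM_BYTES / 4];
static uint32_t sp;                    /* 0 means the stack is empty */
static uint32_t tos;                   /* cached top of stack, only meaningful when sp != 0 */

/* Long accesses; addresses wrap modulo MEM_BYTES (an assumption of this sketch). */
uint32_t memoryReadLong(uint32_t addr)
{
    assert((addr & 3u) == 0);
    return mem[(addr % MEM_BYTES) / 4];
}

void memoryWriteLong(uint32_t addr, uint32_t value)
{
    assert((addr & 3u) == 0);
    mem[(addr % MEM_BYTES) / 4] = value;
}

/* Push: flush the old TOS to its slot in memory, then cache the new value. */
void push(uint32_t value)
{
    if (sp != 0)                       /* empty stack: skip the flush, address 0 stays intact */
        memoryWriteLong(sp, tos);
    sp -= 4;
    tos = value;
}

/* Pop: hand back the cached value and reload the cache from the slot below. */
uint32_t pop(void)
{
    uint32_t value = tos;
    sp += 4;
    tos = memoryReadLong(sp);
    return value;
}

/* ADD with a cached TOS: one memory read instead of two reads and a write. */
void zpu_add(void)
{
    sp += 4;
    tos += memoryReadLong(sp);
}
```

After `push(2); push(3); zpu_add();` the result sits in `tos` and the long at address 0 has not been written.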
2) Anyone crazy enough can run this on the Prop using Catalina or ICC. One day Zog will also be able to run it; then we will have Zog running ZPU in C running ZPU in C....
3) I have no idea how the ZPU enthusiasts have implemented it in Verilog or VHDL for FPGAs, but there is the possibility that this approach would speed up Zog hardware implementations as well.
4) This will be used on a certain other multi-core embedded controller chip that also needs external RAM for large programs and a byte code interpreter to run the code from it. No names mentioned here for fear of being flogged.
Next up is the rewrite of Zog in line with this version, once Bill has OKed it. It might take a while to find time to fit this in.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I can't wait to run large Zog binaries on single-channel SPI RAM on PropCade, four-channel SPI on my other boards, and XMM on Morpheus.
I'll have fun benchmarking the effect of different memory interfaces on VMCOG, and later, comparing it to a native XMM interface on Morpheus.
Now where did I leave my copy of the old Byte benchmarks...
I have not had any coffee yet, and I've only taken a quick glance through the code, so I may be wrong... but:
    if (sp != 0)
        memoryWriteLong(sp, tos); // Flush tos in case it is read back below
    else
        printf("ADDSP : Stack empty\n");
I don't think all these stack-empty checks and flushes are needed. I will let you know later.
I can also see some slight optimizations for the C code.
Basically, you often use a local variable 'a' for what I tend to use a second cog variable 'nos' for.
I notice in zpu_memory.c that all accesses are aligned... nice!
Is there a nice test program that I can run that checks the ZOG instruction set for correctness? Then I could hack the C code, and send you the results.
Bill: One of the first instructions executed from the binaries built with the ZPU C compiler is naturally an IM. My logic goes like this:
1) IM wants to push a value onto the stack.
2) The stack pointer prior to the IM is zero.
3) Generally, before we let IM (or anyone) update the TOS variable, we have to flush that to the real top of stack in memory.
4) We can't let that happen because it would overwrite address 0, which happens to be important.
This could be prevented by the use of a flag or by starting with the SP not equal to zero. The former is extra housekeeping, the latter is contrary to the ZPU spec.
As it happens, when starting up code that "stack empty" condition occurs three times, all from IM. We could probably remove the SP=0 check in other ops though.
You should not worry about the local "a" vars. I will convert them into global "nos" or whatever. Just have to be careful that signed and unsigned ints are used in the right places otherwise comparisons will fail and sign extending by shifting up and down will fail. We can do that with casts at the right points anyway.
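A small sketch of the signedness point, assuming the stack cells are held as `uint32_t`. The function names are hypothetical. The cast to `int32_t` relies on two's complement conversion, which every compiler of interest here provides, and the sign extension uses an XOR/subtract trick rather than shifting up and down, so it does not depend on how the compiler shifts negative values right.

```c
#include <assert.h>
#include <stdint.h>

/* Signed compare: reinterpret both cells as signed before comparing. */
uint32_t less_signed(uint32_t a, uint32_t b)
{
    return ((int32_t)a < (int32_t)b) ? 1u : 0u;
}

/* Unsigned compare: the cells are already unsigned, no cast needed. */
uint32_t less_unsigned(uint32_t a, uint32_t b)
{
    return (a < b) ? 1u : 0u;
}

/* Sign-extend a 7-bit field to 32 bits without any signed shifts. */
uint32_t sign_extend7(uint32_t x)
{
    uint32_t v = x & 0x7Fu;
    assert(v <= 0x7Fu);
    return (v ^ 0x40u) - 0x40u;   /* flip the sign bit, then subtract its weight */
}
```

With these, 0xFFFFFFFF is less than 1 in the signed compare but not in the unsigned one, which is exactly the kind of mix-up that makes comparisons "fail".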
The ZPU architecture does not support unaligned memory access of words and longs and will ignore the lower address bits. Fits very nicely on the Propeller with RD/WRWORD etc. Strangely enough the test code I have run, compiled from C, does generate unaligned accesses. I have seen it in the startup sequence in Zog and now ZPU in C. Looks like the tool chain has a little bug somewhere.
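A sketch of what "ignore the lower address bits" could look like for long reads. The four-long `ram` array and the name `read_long` are made up for illustration; the warning text simply mirrors the "Read LONG not aligned" line that shows up in the test output in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define RAM_LONGS 4u
static uint32_t ram[RAM_LONGS] = { 0x11111111u, 0x22222222u, 0x33333333u, 0x44444444u };

/* Read a long, warning about and then ignoring the two low address bits. */
uint32_t read_long(uint32_t addr)
{
    if (addr & 3u)
        printf("Read LONG not aligned: addr=%08x\n", (unsigned)addr);
    addr &= ~3u;                      /* force long alignment */
    assert(addr / 4 < RAM_LONGS);     /* sketch only covers the tiny array */
    return ram[addr / 4];
}
```

A read from address 2 therefore returns the same long as a read from address 0.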
I have not found any program that checks ZPU instruction correctness. The only test I have run is the test.bin included.
What worries me about this optimized version is that individual instructions may appear to work, but since they post their results to the TOS register rather than to the real stack, mismanagement of TOS could cause some instruction sequences to fail.
Soon I will need a version of the VM for the TriBlade. I have promised myself never to put memory hardware access code directly into Zog.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I've added nos, and in-line push and pop functions - that makes the code much easier to read.
At the moment, I have converted it up to and including ZPU_MULT16X16. After I drop my wife off at her work, I will convert the rest and upload it.
As you guessed, I initialized the stack pointer to $FFFC - most of the code seems to grow the stack down, but in one place it looked like it was growing up...
I have had to use casts to deal with the signed variants of less than, and less than or equal.
I was VERY happy to see I don't need to support unaligned accesses in VMCOG yet :)
Do NOT worry about "posting the results to the real stack" at all!
TOS IS part of the real stack, the stack management code in each instruction makes sure of that. The only weirdness with this approach has to do with non-commutative operators such as -,/,<,<= but that is easily handled, as you will see in the changes I made.
More after I come back...
I purposely left out inline functions. They make the code easier to read, but I wanted to see exactly what is going on under my nose at all times.
Not sure I'm keen on initializing the stack pointer to anything other than zero as defined by the ZPU spec. But I guess we can live with it, who will know :)
Stack growing up!!! What? Where?
Casts are cool for the comparisons; that was on my TODO list.
TOS is part of the real stack, yes, from the ZPU's point of view and if it's coded correctly. But, for example, if you halt execution immediately after an arithmetic operation and look at the stack in memory, the result is not going to be there.
What's up with non-commutative operators? Thought I had them sorted OK. I guess you have something quicker up your sleeve.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Sorry, I like code clarity :) Besides, just have the two in-lines printed on one page :)
I think SP should be size_of_vm, which will initially be $10000 for the 64KB version, with $20000 to soon follow for the 128KB version.
We don't really need to waste the 4 bytes with initializing it to memsize-4.
I will re-check for the stack growing up in one place; that may have been my mistake, as I had not had coffee yet at that time... I have coffee now.
As long as you cannot halt execution in the middle of an operation, and as long as each opcode is implemented correctly, TOS+stack will always be correct.
You should not halt execution in the middle of a ZOG op.
Most ops are commutative, a few are not - this allows an optimization that you will see in the code, in the assembly version.
I think there may be a bug in your ZPU_LOADSP - I am suspicious of the XOR'ing of bit 5.
It looks like it is supposed to push one of the sixteen longs above SP, or the long at SP, or one of the 15 longs following SP - that would require sign extending the 5-bit value obtained from the instruction... this is why I am suspicious of the XOR. Can you check the spec?
    else if ((instruction & 0xE0) == ZPU_LOADSP) //BH: not 100% sure this is correct
    {
        uint32_t addr;
        addr = (instruction & 0x1F) ^ 0x10;
        addr = sp + 4 * addr;
        if (sp != 0)
            memoryWriteLong(sp, tos); // Flush tos in case it is read back below
        else
            printf("LOADSP : Stack empty\n");
        tos = memoryReadLong(addr);
        sp = sp - 4;
    }
Re: LOADSP -- Don't worry, it really is like that. The offset field in the instruction is NOT signed and for whatever reason bit 5 is upside down. Seriously.
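A sketch of just the offset decoding from the LOADSP code quoted above, pulled out so the effect of the XOR is easy to see. The function names are hypothetical; the mask 0x1F and the XOR with 0x10 (the top bit of the 5-bit field) are taken from that code, and the alignment assert mirrors the SP check in the Java simulator.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset from SP encoded in a LOADSP opcode: the low 5 bits, unsigned,
   with the top bit of the field (mask 0x10) inverted, times 4 bytes per long. */
uint32_t loadsp_offset(uint8_t instruction)
{
    uint32_t field = (instruction & 0x1Fu) ^ 0x10u;
    return 4u * field;
}

/* Address that LOADSP reads from, given a long-aligned SP. */
uint32_t loadsp_address(uint32_t sp, uint8_t instruction)
{
    assert((sp & 3u) == 0);
    return sp + loadsp_offset(instruction);
}
```

An encoded field of 0x10 addresses the long at SP itself, 0x11 the next long up, and 0x00 lands 16 longs above SP; the offset is never negative.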
I'll have a play with that ASAP.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
As far as ZPU_POPSP, I guess I misunderstood how it should work.
Your original version for POPSP was:
    case ZPU_POPSP:
    {
        sp = tos;
        tos = memoryReadLong(sp); // Ensure tos is actually from current top of stack
        break;
    }
It looks like what you meant to do is pop a new SP off the current stack, then grab a new TOS.
If that is the case, then the "new" sp should be decremented, as its top element is now in the TOS register - unless the new SP also needs to be on the stack.
Can you check it against the "original" Java based emulator?
"Pops value off top of stack and sets SP to that value. Used to allocate/deallocate space on stack for variables or when changing threads."
From the Java simulator:
    public void setSp(int sp) throws CPUException
    {
        if ((sp%4)!=0)
        {
            throw new IllegalInstructionException();
        }
        if (sp<minStack)
        {
            minStack=sp;
        }
        this.sp = sp;
    }

    public void changeSp(int sp) throws CPUException
    {
        setSp(sp);
        tracer.setSp(sp);
    }
    ...
    case POPSP:
        changeSp(popIntStack());
        intSp=0; // flush internal stack
        break;
So the pop increments the SP (the stack grows downward), but then SP is set to the value popped, so the increment becomes redundant.
Never did figure out what the internal stack was all about; it somehow gets used in their syscall implementation, I think.
P.S. That LOADSP thing about inverting bit 5 is quirky and weird and undocumented. According to the ZPU creator, it is there because it happened to save a few gates in some FPGA implementation some time.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Quickly hacking my old LSHIFTRIGHT back in there, as below, gets it running all the way through my test.bin :)
ZOG v0.7 plus Bill Henning mods
CONFIG indicates CPU type is 2
Hi
Zog say hello
0000000e
argc = 00000001
Address of argv[0] = 780c0000
_hardware = 00000000
_cpu_config = 00000002
Read LONG not aligned: addr=00000002
ZPU_ID = 0000000e
_use_syscall = 00000001
Please type 4 characters...
ffffYou typed...
ffff
fibo(00000017) = 00006ff1
fibo(00000018) = 0000b520
fibo(00000019) = 00012511
fibo(0000001a) = 0001da31
Bye
Breakpoint
PC=000009e8 SP=ffffffbc OP=00 DM=00 debug=00000000
    case ZPU_LSHIFTRIGHT:
    {
        uint32_t a;
        a = tos;
        sp = sp + 4;
        a = a & 0x3f;
        tos = memoryReadLong(sp);
        tos = tos >> a;
        //Bill's code   nos = pop() & 0x3f;
        //              tos = nos >> tos;
        break;
    }
Presumably the other shifts are bust also.
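A sketch of the operand roles in the working LSHIFTRIGHT above: the shift count is the old TOS, masked with 0x3f, and the value being shifted is the next entry on the stack. The commented-out lines appear to apply the mask to the value instead of the count. The function name is hypothetical. One extra caution: in C, shifting a 32-bit value by 32 or more is undefined, so this sketch returns 0 for counts of 32..63; that choice is an assumption, not something taken from the ZPU documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift right with the operands in stack order:
   count is the old top of stack, value is the next on stack. */
uint32_t lshiftright(uint32_t value_nos, uint32_t count_tos)
{
    uint32_t count = count_tos & 0x3Fu;   /* mask the count, not the value */
    assert(count <= 63u);
    if (count >= 32u)
        return 0u;                         /* assumed result; avoids undefined behavior in C */
    return value_nos >> count;
}
```

Because the two operands play different roles, swapping them or masking the wrong one gives a different answer, which is the kind of failure the register dump comparison picked up.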
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
POPSP should take a value off the stack, POP, and that value should end up in SP.
Therefore the increment done by POP is redundant.
No other increment need be considered no matter how you keep your stack, with nos and tos or not. The final value of SP must be whatever was on top of the stack, no matter where we keep that.
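A sketch comparing POPSP on a plain in-memory stack with POPSP when the top of stack is cached in a TOS variable, to illustrate the argument above. The `Cpu` struct, the 64-byte memory and the function names are assumptions for illustration; the cached variant follows the original POPSP case quoted earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 64u
static uint32_t mem[MEM_BYTES / 4];      /* hypothetical tiny memory */

typedef struct { uint32_t sp; uint32_t tos; } Cpu;

static uint32_t read_long(uint32_t addr)
{
    assert((addr & 3u) == 0 && addr < MEM_BYTES);
    return mem[addr / 4];
}

/* Plain stack: pop the value, then set SP to it. */
void popsp_plain(Cpu *c)
{
    uint32_t v = read_long(c->sp);
    c->sp += 4;                  /* the increment done by the pop ... */
    c->sp = v;                   /* ... is immediately overwritten */
}

/* Cached TOS: the value to pop is already in tos. */
void popsp_cached(Cpu *c)
{
    c->sp = c->tos;              /* final SP is whatever was on top of the stack */
    c->tos = read_long(c->sp);   /* reload the cache from the new top of stack */
}
```

Starting from the same logical stack, both variants end with the same SP and neither needs an extra adjustment of 4.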
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I've just been studying a diff of our two versions. Lots of little changes, but unless I've missed a point they are essentially the same, just with inline push/pop functions and using tos/nos instead of local vars spread around the ops.
So I'm even more convinced the SP+=4 is wrong.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Comments
FIBO BENCHMARK:
Spin and Java on 80MHz Prop
Spin fibo(23) takes about 2.4 seconds
Java fibo(23) takes between 3 and 4 seconds
Zog on 104MHz TriBlade. FIBO in C executing from HUB RAM is something like this:
FIBO(23) 4 Seconds
FIBO(24) 7 Seconds
FIBO(25) 11 Seconds
FIBO(26) 17 Seconds
Or all of them in 41 seconds. Measured by eye against my PC's clock, so give or take a second.
fibo(00000017) = 00006ff1
fibo(00000018) = 0000b520
fibo(00000019) = 00012511
fibo(0000001a) = 0001da31
Sorry Zog only knows how to print hex for now.
Seems Zog is not doing as badly as I expected.
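The benchmark source is not shown in the thread, so the following is only a guess at its shape: a naive doubly recursive fibo, which would fit the description of being very stack intensive, with the hex output format copied from the lines above. The function names and the overflow guard are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Naive recursive Fibonacci; deliberately heavy on calls and stack traffic. */
uint32_t fibo(uint32_t n)
{
    assert(n <= 47u);            /* fibo(48) no longer fits in 32 bits */
    return (n < 2u) ? n : fibo(n - 1u) + fibo(n - 2u);
}

/* Print one result in the same hex format as the Zog output. */
void print_fibo(uint32_t n)
{
    printf("fibo(%08x) = %08x\n", (unsigned)n, (unsigned)fibo(n));
}
```

Calling `print_fibo` for 23 through 26 should reproduce the four hex result lines quoted above.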
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never mind the speed but many would like to see Spin executed from external RAM.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
You are a complete lunatic
If I get time this weekend I will try it out.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
You are in good company. Many people have said so over the years.
If you can spot any logical errors in that C version I would be grateful. It's going to be much harder to spot them in the PASM version.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I will gladly check it out!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
Notice that IM hits the bottom of stack 3 times during start up.
Also notice the unaligned access warning.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I've taken an axe to zpu.c
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Note, I have not compiled or tested this - just re-wrote for clarity, with a bit of C level optimization.
I will keep an eye on this thread, but now I will go and work on VMCOG... so we can run at least 64KB ZOG programs ASAP
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I presume push should not be:
but:
Anyway after that it runs until POPSP. The code for which looks very wrong:
Given that the pop messes up the sp you have just set.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This should work:
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
With the SP+=4 it fails after 22 instructions when it hits a POPPC.
Without, which is how I would expect it to look, it fails when hitting POPPC after 138 instructions.
In both cases the preceding instruction is zpu_pushspadd.
PUSHSPADD in Java is:
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The last POPSP I sent needs the SP+=4 to adjust for keeping TOS in a register.
The PUSHSPADD I made is also correct.
The bug is actually in ZPU_PUSHSP.
Here is the fix:
Sorry that I am not debugging ZOG myself, I am debugging VMCOG so you have a 64KB VM (and shortly after, a 128KB VM)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Sorry I can't be thinking about this a bit harder. I've got a house full of guests for the weekend.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I'm dumping the PC, SP, TOS etc as it runs and comparing against the same for my version.
At that point it goes wrong on a LSHIFTRIGHT.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 4/17/2010 7:05:54 PM GMT
You are correct, all the other shifts will have the same problem.
I am still scratching my head about that SP+=4, but hey, it's working without it.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I'll tidy up a bit and post this working version.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔