BaCon converts BASIC into C that can then be compiled with GCC and run under Zog: www.basic-converter.org/
1M Byte BASIC programs here we come.
So this command:
$ bacon.bash -p -c zpu-elf-gcc fibo
directly translates the file fibo.bas into C and compiles it to run under Zog on the Prop.
You never have to look at the C code but this...
REM
REM Fibonacci numbers - 8/8/2009 - PvE.
REM Recursive function demonstration.
REM Revised November 2009.
REM
FUNCTION fibonacci(NUMBER n)
LOCAL x, y
IF n EQ 0 OR n EQ 1 THEN
RETURN n
ELSE
x = fibonacci(n - 1)
y = fibonacci(n - 2)
RETURN x + y
END IF
END FUNCTION
SPLIT ARGUMENT$ BY " " TO arg$ SIZE dim
IF dim < 2 THEN
PRINT "Usage: fibonacci <value>"
END
END IF
PRINT fibonacci(VAL(arg$)) FORMAT "%ld\n"
...becomes this...
/* Created with BaCon 1.0.8 - (c) Peter van Eerten - GPL v3 */
#include "fibo.bas.h"
int
main (int argc, const char **argv)
{
/* Default is: system traps signals */
signal (SIGILL, __b2c__catch_signal);
signal (SIGABRT, __b2c__catch_signal);
signal (SIGFPE, __b2c__catch_signal);
signal (SIGSEGV, __b2c__catch_signal);
/* Make sure internal string buffers are empty */
for (__b2c__sbuffer_ptr = 0; __b2c__sbuffer_ptr < 32; __b2c__sbuffer_ptr++)
__b2c__sbuffer[__b2c__sbuffer_ptr] = NULL;
__b2c__sbuffer_ptr = 0;
for (__b2c__stackptr = 0; __b2c__stackptr < 8; __b2c__stackptr++)
__b2c__stringstack[__b2c__stackptr] = NULL;
__b2c__stackptr = 0;
for (__b2c__rbuffer_ptr = 0; __b2c__rbuffer_ptr < 8; __b2c__rbuffer_ptr++)
__b2c__rbuffer[__b2c__rbuffer_ptr] = NULL;
__b2c__rbuffer_ptr = 0;
/* Setup the reserved variable 'ARGUMENT' */
for (__b2c__counter = 0; __b2c__counter < argc; __b2c__counter++)
{
__b2c__arglen += strlen (argv[__b2c__counter]) + 1;
}
__b2c__arglen++;
ARGUMENT$ = (char *) malloc (__b2c__arglen * sizeof (char));
strcpy (ARGUMENT$, "");
for (__b2c__counter = 0; __b2c__counter < argc; __b2c__counter++)
{
strcat (ARGUMENT$, argv[__b2c__counter]);
if (__b2c__counter != argc - 1)
strcat (ARGUMENT$, " ");
}
/* Rest of the program */
if (arg$ != NULL)
{
for (__b2c__ctr = 0; __b2c__ctr <= __b2c__split__arg$; __b2c__ctr++)
if (arg$[__b2c__ctr] != NULL)
{
free (arg$[__b2c__ctr]);
}
free (arg$);
arg$ = NULL;
__b2c__split__arg$ = 0;
}
if (ARGUMENT$ != NULL && " " != NULL && strlen (" ") > 0)
{
__b2c__split_tmp = strdup (ARGUMENT$);
__b2c__split = strtok (__b2c__split_tmp, " ");
if (__b2c__split != NULL)
{
arg$ =
(char **) realloc (arg$,
(__b2c__split__arg$ + 1) * sizeof (char *));
arg$[__b2c__split__arg$++] = strdup (__b2c__split);
while ((__b2c__split = strtok (NULL, " ")) != NULL)
{
arg$ =
(char **) realloc (arg$,
(__b2c__split__arg$ +
1) * sizeof (char *));
arg$[__b2c__split__arg$++] = strdup (__b2c__split);
}}
free (__b2c__split_tmp);
}
dim = __b2c__split__arg$ - 0;
if (__b2c__split__arg$ > 0)
__b2c__split__arg$--;
if (dim < 2)
{
fprintf (stdout, "%s", "Usage: fibonacci <value>");
fprintf (stdout, "\n");
fflush (stdout);
exit (EXIT_SUCCESS);
}
fprintf (stdout, "%ld\n", fibonacci (VAL (arg$)));
fflush (stdout);
__B2C__PROGRAM__EXIT:
return 0;
}
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Actually all those commands already work in the shell.
And there are about a dozen more :)
Note: they work for my custom flash file system, which I intend to port to SD cards. It uses a LOT less hub ram than FAT.
heater said...
Phew, that's all right then.
Largos sounds totally audacious, cat cd du df ls rm rmdir mv mkdir mkfs pwd touch, can't wait. So if Zog gets us there quicker that's excellent. Many things will be very happy on Prop II, I'm not very good at waiting though.
Have a look at that example conversion I posted above. There is no way anyone new to C is going to learn anything from that. Except that it is very easy to write C in such a way that no one can understand it. That machine generated C is not intended to ever be read by a user.
I only just discovered the world of BASIC to C translation recently so I have no idea but BCX looks quite polished. No good for Zog though as it generates C code targeted at Windows.
It's quite possible to use BaCon on a Windows machine by making use of CygWin www.cygwin.com/ which is a "...Linux-like environment for Windows".
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I agree with heater - I tried these BASIC to C translators (BaCon, BCX, B2C) because I thought they might be useful with Catalina - but they are universally abysmal.
They tend to work only with a very small and platform dependent subset of BASIC, and the C they generate is also non-standard and platform dependent. It is also hideously ugly and very inefficient!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
This is a fascinating little subtopic. Even though that is a machine translation, it is the sort of translation I would have come up with as well. That is probably because I don't understand C very well, so I would tend to use a small subset of instructions and force them to do things that probably can be done a lot more efficiently in other ways.
I've never really liked Mbasic with multiple statements on one line and GOTO and subroutines labelled by number instead of name. Sbasic is much more like C (and Pascal) in that there is only one statement per line, you never use GOTO and procedures and functions are named so you can start to build up libraries of functions that are easy to read. But - I think I am the only person in the world who uses SBasic!
C on the other hand has all these advantages, but simple variants of C seem to have hardly changed in the last thirty years or more, and there are millions of people using C.
Having said that, I use Basic mainly because of string manipulation, and I like strings because a lot of code then ends up working in "plain english" rather than obscure manipulations of single element arrays. Mike Green once kindly posted a Spin translation of some arbitrary string code and it certainly cleared a lot of things up, but it still seems that Spin struggles a bit with strings. I think C has libraries for string manipulation, and I think some of those libraries are being used in that code above.
Maybe the code I'm writing is a bit unusual, but I'm working with strings of text, searching for sub strings, chopping them up, putting them back together in other ways, saving them to disk (sd card) and sending and receiving them via serial ports. Here is a tiny bit of pseudo basic code that encapsulates some of this. I wonder what it would look like in C (both machine translated, and hand coded)?
var string1=string ' set up variable space. Mbasic reserves dynamic space. SBasic reserves 80 bytes in a zero terminated array
var string2=string ' some variants of basic use DIM rather than VAR to declare variables
var string3=string
var c=integer
var i=integer
string1="Hello"
string2="World"
string3=string1+string2 ' join two strings
string3=mid(string3,6,5) ' get 'World'
open "MYFILE.TXT" for output as #1 ' open a file for output
print #1,string3 ' save 'World'
close #1 ' close the file
for i=1 to len(string3) ' send characters to a serial port
c=asc(mid(string3,i,1)) ' get the ascii value of the next character in 'World'
out port1,c ' send to a serial port (syntax on this line would be highly variable between languages and computers)
next i ' repeat until the end of the string
There are many alternatives - one possible line by line translation is given below. I have not written a replacement for "out" as it is platform dependent - I just used 'putchar' to write it to the terminal:
#include <string.h>
#include <stdio.h>
int main(void) {
    char string1[80]; // var string1=string
    char string2[80]; // var string2=string
    char string3[80]; // var string3=string
    int c; // var c=integer
    int i; // var i=integer
    FILE *f;
    strcpy(string1,"Hello"); // string1="Hello"
    strcpy(string2,"World"); // string2="World"
    strcpy(string3,string1); strcat(string3,string2); // string3=string1+string2
    strncpy(string3, &string3[5],5); string3[5]='\0'; // string3=mid(string3,6,5)
    f = fopen("MYFILE.TXT", "w"); // open "MYFILE.TXT" for output as #1
    if (f == NULL) return 1; // give up if the file could not be opened
    fprintf(f,"%s", string3); // print #1,string3
    fclose(f); // close #1
    for (i = 0; i < (int)strlen(string3); i++) { // for i=1 to len(string3)
        c = string3[i]; // c=asc(mid(string3,i,1))
        putchar(c); //simulate output to serial port // out port1,c
    } // next i
    return 0;
}
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Wow, that looks really nifty. Almost all of it makes sense! I'm just a little fuzzy on the FILE *f and also the mid() function. I'm getting close to becoming a C convert here...
The FILE *f declares a variable (called f) that is a pointer to a file control block (the * indicates a pointer). Then f=fopen("filename", "mode") opens the file and returns a pointer to a file control block for the file, and fprintf(f, "format", <values>) writes values to the file pointed to by f (using the specified format string).
It would be easy to write a mid type function in C:
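A minimal sketch of what such a function could look like, assuming BASIC's 1-based start position and a caller-supplied destination buffer. The name mid and its parameter order are made up for illustration; nothing like it exists in the standard C library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to `len` characters of `src`, starting at
   1-based position `start`, into `dst` (capacity `dstsize`, which counts
   the terminating '\0'). Returns dst. */
char *mid(char *dst, size_t dstsize, const char *src, size_t start, size_t len)
{
    size_t srclen;

    assert(dst != NULL && src != NULL);
    srclen = strlen(src);

    if (dstsize == 0)
        return dst;                 /* no room even for the terminator */
    if (start < 1 || start > srclen) {
        dst[0] = '\0';              /* start lies outside the string */
        return dst;
    }
    if (len > srclen - (start - 1))
        len = srclen - (start - 1); /* clamp to the end of the source */
    if (len > dstsize - 1)
        len = dstsize - 1;          /* clamp to the destination size */

    memmove(dst, src + (start - 1), len); /* memmove: dst may overlap src */
    dst[len] = '\0';
    return dst;
}
```

With this, the line from the translation above could be written as mid(string3, sizeof string3, string3, 6, 5), which leaves "World" in string3.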
I really should pull my finger out and leave some of my Z80 with basic fixations behind. Unfortunately my first brush with C was a 4KB demo restricted on a 8052 variant. I did the first "Hello World" thing and found that the print library used up most of those 4KB. Then I found MCSBasic would run on it, with a bit of xram, and I regressed.
I won't be able to on the STM8S-Discovery that has just turned up, C is the only thing offered (I even managed to get it for free through work). It is 16KB restricted and only runs about 12MIPS, single cored, so I don't think I am posing a threat to the Prop.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
The // convention for comments is not supported by all compilers. It was introduced by C99, but was one of the few useful things introduced in that release, so most C89 compilers adopted it as well and simply ignored the other stuff.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Dr_A: Be aware that the BDS C compiler works to a different C dialect to modern C compilers. So there are some little differences to think about when you move up to Catalina, ICC, GCC + Zog whatever. Yes C has changed in the last 30 years.
What immediately comes to mind is the way you define functions.
Nowadays you might write:
int my_func (int param1, char param2); /* Function declaration */
int my_func (int param1, char param2) /* Function definition */
{
...
...
}
In the original old style C you might write:
double my_func(); /* Obsolete function declaration */
double my_func(param1, param2) /* Obsolete function definition */
int param1;
char param2;
{
...
...
}
Note the lack of parameter types in the old style declaration and how the parameter types are written in the definition.
Actually you can compile the old style with modern compilers (not recommended) but of course not the other way around.
Also be aware that the size of an int is machine specific. On BDS C it is only 16 bits.
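A small sketch of one way to keep code independent of the width of int, assuming a compiler that provides the C99 header <stdint.h> (BDS C does not, so this only applies once you have moved to a modern compiler). The function name sum_to is made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Sum the integers 1..n. The total for n = 1000 is 500500, which does not
   fit in a 16-bit int, so the accumulator and the loop counter are
   explicitly 32 bits wide instead of plain int. */
int32_t sum_to(int16_t n)
{
    int32_t total = 0;
    int32_t i;

    assert(n >= 0);               /* illustrative precondition */
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Written with plain int, the same loop would overflow on a 16-bit compiler and work on a 32-bit one, which is exactly the kind of difference that bites when moving code between them.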
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Well the new style should be easier to translate from sbasic
In C it is
int my_func (int param1, char param2); /* Function declaration */
and in sbasic it is
function my_func(param1 = integer, param2=char) = integer
Ok, Zog can do this which is great. CP/M has the advantage of being able to run wordstar to edit a program then recompile it on the board. But many times there is only one program you ever use, so whether it is C in CP/M or C in Zog or C in Catalina should not matter. (except for speed, where C in CP/M is going to be the slowest by a big margin).
Maybe with all the enhancements coming out the external memory might not be needed for some applications!
But there is one thing I'm finding exceptionally useful with CP/M and that is to have all the keyboard/vga display/LCD display/serial ports all working behind the scenes in cogs. Especially with buffers, so the program can be doing what it wants and it can get a byte from the keyboard or the serial port when it wants to. Right now, it is hard to think of another way to do keyboard drivers/multiple serial drivers in one cog etc in any other way except raw PASM. Some only fit with a few bytes to spare, so I'm not sure if rewriting them in C would necessarily fit. Spin code is very useful for binding things together, so if I use PRINT in basic or printf() in C, then it is possible to route that byte to the VGA display and to the small LCD display and to serial port 1. Or, indeed to turn one or more of those off (especially the serial port, then use it for other things) and to even do this turning on/off in software.
I'm wondering how such things might be accomplished in Zog or Catalina?
Part of the problem with basic/c translation is that some of the core language functionality is just plain different. For example...
#1 nearly all BASICs that live in environments with enough memory use Pascal style strings with a garbage collected common string area, which is poor for performance but great for preventing buffer overruns and allowing strings that contain zero bytes. C doesn't support strings at all; the so-called standard library that adds string support treats them as fixed dimension byte arrays with null termination, nobody guards against overrunning the dimensioned length, and the nonstandard libraries that work better aren't standard so you can't depend on them being available. Bridging this gap automatically just plain takes lots of code in C and won't always work. It is depressingly common, especially when working with modern point of sale style printers, to need null bytes in escape sequences to do simple things like resetting the font size to default, and for this reason it is just plain impossible to auto-translate fully working Basic string code into C code that also works.
#2 nearly all BASICs that have enough resources use single precision floating point math by default, and a lot of them use single precision math even when you're working entirely with integer declared variables. C uses processor integer math by default, and as with the strings other types of math require bolt on libraries that are ugly and inconvenient to use. Again, this means that some very common tasks, such as representing a price in dollars and cents, require completely different high level strategies; in Basic you can let the float carry cents, but in C you have to work in integer cents and adjust your display. Getting one strategy to work automatically with the other set of libraries will result in ugly code at best.
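A minimal sketch of the integer-cents strategy described in #2. The function name format_price is made up for illustration, and negative amounts are left out to keep it short:

```c
#include <assert.h>
#include <stdio.h>

/* Format a non-negative amount held in integer cents as "dollars.cents".
   Returns the number of characters written, as snprintf does. */
int format_price(char *buf, size_t bufsize, long cents)
{
    assert(buf != NULL);
    assert(cents >= 0);           /* this sketch handles non-negative amounts only */
    return snprintf(buf, bufsize, "%ld.%02ld", cents / 100, cents % 100);
}
```

All the arithmetic stays in whole cents (for example 1999 + 5 gives 2004) and the decimal point only appears when the value is displayed, here as "20.04".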
So in real life it doesn't really matter that, for example, both languages support very similar loop, branch, and case structures; God is in the details, and those are what kill you. BASIC was designed to be safe and convenient, C to be fast and powerful. They are as different as a pickup truck and a Ferrari, and that's why converters tend to be more interesting as curiosities than useful.
DR_A: Interesting. So far BDS C is the only C compiler that actually runs on the Prop. I can't imagine ever getting GNU C for ZPU to actually run on the Prop, and I don't see any other C compilers running on the Prop soon (note 1). So if the aim is to have a stand alone C development environment on the Prop then ZiCog and CP/M it is.
Now for all the support environment in CP/M and in the ZiCog emulator plus all the drivers you have added to it, well, at the moment Zog does not have any of that. It's early days yet.
However as you might have noticed we are progressing that direction.
Firstly Zog will use external memory so there will be space free in the Prop for UART, video drivers. Just as in ZiCog. With the help of Bill Henning's Virtual Memory system it will be easier to use all the different external memory hardware solutions including some we have not seen before like serial SPI RAM. The latter will be slower but free up a lot of pins.
Secondly I'm going to implement I/O for Zog using so called sys calls in C. The sys calls get you out of the Zog emulation into Spin code. Much the same way as the I/O works in Zicog but a bit more sophisticated. This means it will be just as easy to redirect input and output of the console, for example, by tweaking things in Spin. BUT it will also mean that I/O can be redirected from UART to Video from within the C program itself.
Thirdly, it looks like we will soon have Bill's Largos operating system. Think CP/M on steroids.
When all that is in place, Zog and Largos and Virtual memory, you will be able to use it like a souped up CP/M system. Of course unlike CP/M it will not have hundreds of ready made programs to run like WordStar etc etc etc.
Note 1: I suspect one day RossH might get Catalina to run on the Prop. and be able to compile stuff there.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Correct. What's stopping me is that Catalina uses Homespun to assemble the PASM it generates, so while Catalina itself can run on the Prop, Homespun can't - and without that there's not much point.
I look at Sphinx periodically, but the last time I looked (a while ago) the limitations were too severe. I'm contemplating writing my own PASM assembler.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
RossH said...
The // convention for comments is not supported by all compilers. It was introduced by C99, but was one of the few useful things introduced in that release, so most C89 compilers adopted it as well and simply ignored the other stuff.
I find short inline functions a useful and superior alternative to #define macros because of parameters
and scope. Some people use dynamic arrays which are harder to "digress" to C89 ... I don't use them.
I won't use a compiler that doesn't support // comments and will not do a job where they are prohibited
for any amount of money.
That conversion example "makes my eyes bleed" :)
You're right - inline is probably the other generally useful C99 feature.
As for variable length arrays - I use them in Ada where they were properly integrated with the language. In C they just seem 'tacked on' as a fairly clumsy afterthought.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Routing standard output or standard error to the LCD is feasible in Catalina - when I get time I will write a HMI plugin that supports both - standard output will go to VGA and standard error will go to the LCD.
All I need is time :)
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Zog version 0.5 posted to the first post in this thread.
v0.5 implements console I/O using the ZPU SYSCALL instruction. That is, the standard C read() and write() will go through SYSCALL into a syscall handler in Spin. There is still a print() in C that goes through an emulated memory mapped UART.
Currently only stdin, stdout and stderr are implemented. Later syscall will be expanded to handle SD files as well. Then other devices are also possible.
Many other stubs are in place in Spin for other syscalls, open, close, seek etc.
The test.c program exercises the syscall read() and write() and generates a FIBO sequence.
Next up is integration of Bill Henning's Virtual Memory system.
Bill: You will be disturbed to see that the return values from syscall, well actually all function calls, are left in a pseudo register R0, which is actually address zero in memory, rather than being left on the stack. Not sure why but that's what GCC does.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Bill, or anyone, I had a restless night and woke up with some ideas for optimizing the ZPU interpreter, can anyone point out the flaw in my plan?
Optimization 1: Last PUSH - first POP removal
My thinking goes like this:
1) Most ZPU ops have the pattern: a) Do some action, b) PUSH the result onto the stack.
2) Most ZPU ops have the pattern: a) POP something from the stack, b) Do some action.
Well, the simple idea is that if that is generally the case then the final PUSH of instruction "n" need not be done, and likewise the first POP of instruction "n+1" need not be done at all. Rather, instruction "n" leaves its result in a COG variable, "TOS" (or "data" as it is now). Then the following instruction "n+1" just picks up the data from that COG variable.
Let's consider an example, the ZPU instructions generated for this C call:
result = write(1, helloMsg, strlen(helloMsg));
620: 97 im 23 ;1 0
621: c4 im -60 ;2 0
622: 51 storesp 4 ;1 0
623: 86 im 6 ;1 0
624: 86 im 6 ;2 0
625: 3f callpcrel ;1 0
626: 80 im 0 ;1 0
627: 08 load ;1 0
628: 53 storesp 12 ;1 0
629: 97 im 23 ;1 0
62a: c4 im -60 ;2 0
62b: 52 storesp 8 ;1 0
62c: 81 im 1 ;1 0
62d: 51 storesp 4 ;1 0
62e: 87 im 7 ;1 0
62f: be im 62 ;2 0
630: 3f callpcrel ;1 0
An "im" instruction pushes a small value on the stack; a following "im" pops the value, extends it with another small value and then pushes it back again. So all those pairs of "im" are normally 3 stack operations that could be reduced to one. But wait, if the instruction following "im", say "callpcrel", uses the top of stack then we can skip all stack operations in "im" and skip the POP in "callpcrel". Result: No stack operations at all!
In that code example I have put in comments the stack operations of each instruction without and with this optimization. Total goes from 21 stack ops to none!
Well, that's extreme and often that last PUSH of an instruction should actually be done as the next instruction may not immediately POP the data. What to do?
I'm thinking this:
1) The last PUSH of all instructions removed and replaced by setting a flag that means "PUSH pending".
2) All instructions that would first POP some data only do so if "PUSH pending" is false (Then they clear the flag) otherwise they use the data in COG.
3) All instructions that don't do POP would perform the PUSH if "PUSH pending" is set and then clear the "PUSH pending" flag.
The COG's CARRY flag can be used as a "PUSH pending" flag between ZPU instruction steps. We can set or clear carry with a single instruction and use conditional execution on the POP calls so this is very efficient.
Bill: I'm sorry if this is a long-winded way of saying what you already said in your optimization posts. I have not gone back to check but it feels different because we are completely removing PUSH/POP pairs rather than just using the top of stack value from COG without a POP.
Are my three rules above sufficient to make this work?
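One way to test the three rules is to model them outside the emulator. The following is a toy C model only, not Zog's actual PASM: the structure, the helper names and the two sample instructions are all invented for the sketch, and "mem" merely stands in for the HUB or external stack.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy model: a stack in "slow" memory plus one value held back in the
   COG (tos) and a PUSH-pending flag. */
typedef struct {
    int32_t  mem[STACK_WORDS]; /* stands in for the HUB/external stack */
    int      sp;               /* number of words really stored in mem */
    int32_t  tos;              /* value held back in the COG */
    int      push_pending;     /* 1 if tos has not been written to mem yet */
    unsigned mem_ops;          /* count of real stack reads and writes */
} Vm;

/* Rule 3: an instruction that does not start with a POP must first
   complete any PUSH that was held back. */
static void flush(Vm *vm)
{
    if (vm->push_pending) {
        assert(vm->sp < STACK_WORDS);
        vm->mem[vm->sp++] = vm->tos;
        vm->mem_ops++;
        vm->push_pending = 0;
    }
}

/* Rule 2: a POP takes the held-back value if there is one, otherwise it
   really reads the stack. */
static int32_t pop(Vm *vm)
{
    if (vm->push_pending) {
        vm->push_pending = 0;
        return vm->tos;
    }
    assert(vm->sp > 0);
    vm->mem_ops++;
    return vm->mem[--vm->sp];
}

/* Rule 1: the final PUSH of an instruction is only recorded, not done. */
static void push_later(Vm *vm, int32_t value)
{
    assert(!vm->push_pending);
    vm->tos = value;
    vm->push_pending = 1;
}

/* Two sample instructions built on the helpers above. */
void op_const(Vm *vm, int32_t value)   /* pushes a constant, no POP first */
{
    flush(vm);
    push_later(vm, value);
}

void op_add(Vm *vm)                    /* POP, POP, add, PUSH */
{
    int32_t b = pop(vm);
    int32_t a = pop(vm);
    push_later(vm, a + b);
}
```

In this model, pushing 2, pushing 3 and adding costs 2 real stack accesses instead of 5, and the result 5 is still waiting in tos. Anything that addresses the stack relative to SP would presumably have to call flush first, so that SP is where the instruction expects it to be.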
Optimization 2: Low memory CACHE
I have discovered that the ZPU / GCC combination uses four "pseudo" registers R0, R1, R2, R3. These are not real CPU registers; they are just the first four LONGs of memory.
The pseudo register R0 is used to return values from functions. In the above code you see:
callpcrel
im 0
load
where the "im" is loading the address of R0 (0000) and the "load" reads from R0 and pushes it to stack. This is the return value from the strlen() call.
So, a possible optimization is to always keep the first 4 LONGs of memory in COG.
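A rough C sketch of that idea, assuming long-aligned byte addresses. The array names and sizes are placeholders for this example: slow_mem stands in for HUB or external RAM and low_cache for four COG registers.

```c
#include <assert.h>
#include <stdint.h>

#define CACHED_LONGS 4          /* R0..R3, the first four longs of memory */
#define MEM_LONGS    1024       /* arbitrary size for this sketch */

static uint32_t slow_mem[MEM_LONGS];       /* stands in for HUB/external RAM */
static uint32_t low_cache[CACHED_LONGS];   /* stands in for COG registers */

/* Read a long at a byte address; addresses 0..15 never touch slow_mem. */
uint32_t read_long(uint32_t addr)
{
    uint32_t index = addr >> 2;

    assert((addr & 3) == 0 && index < MEM_LONGS);
    return index < CACHED_LONGS ? low_cache[index] : slow_mem[index];
}

/* Write a long at a byte address, keeping R0..R3 in the cache only. */
void write_long(uint32_t addr, uint32_t value)
{
    uint32_t index = addr >> 2;

    assert((addr & 3) == 0 && index < MEM_LONGS);
    if (index < CACHED_LONGS)
        low_cache[index] = value;
    else
        slow_mem[index] = value;
}
```

The cost is one compare on every access, which only pays off if reads and writes of R0..R3 are frequent enough; the "im 0, load" pattern after each call suggests they may be.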
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Clusso: Currently the ZPU stack is in HUB, as is all the ZPU code and other data, the whole ZPU memory space.
Being able to skip PUSH/POP pairs might save about 40% of the executed PASM code when emulating!
For example, the "im" instruction emulation takes 24 PASM instructions (excluding the fetch/dispatch loop). 12 of those instructions are PUSH and POP code.
But we are aiming at using a variety of external memory solutions, both parallel and serial devices, via Bill's Virtual Memory system, VMCog. In that case stack ops become the primary time consumer and optimizing them away is essential. Even if there is some PASM overhead in managing the optimization dynamically, on an instruction by instruction basis, as I suggest.
I had thought about keeping a mini stack, say 16 LONGs, in COG or HUB that spills over to real memory but that seems a tad complicated just now.
I just have to convince myself that the 3 rules I put down for managing the PUSH/POPs and pending flag are sufficient in all cases. For example "loadsp" reads a LONG from some offset on the stack then pushes it. The SP had better be in the right place when it reads that LONG.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
No, I am not disturbed. I do find it a bit strange, but if that's what gcc wants, that's what gcc gets :)
Instead of a "PUSH pending", a "LASTOP_WAS_IM" flag that is set at the end of every IM instruction, and cleared by any non-IM instruction, combined with my TOS in a register scheme as detailed in my (long) message is the fastest yet.
Consider:
- only IM would need to check "LASTOP_WAS_IM" flag
- if "LASTOP_WAS_IM", no stack ops for a following IM, it can operate on TOS directly
- only the first IM of a sequence of IM's will cause a PUSH
Now if callpcrel is ALWAYS followed by an IM, you can have callpcrel set TOS to 0, and set "LASTOP_WAS_IM" to avoid a pop inside it. You can only do this if the next op is an IM so make sure you see what code is generated for void functions that don't have a return code!
With my previously suggested optimization:
- single operand instructions that leave a result cause NO stack accesses
- two operand instructions that leave a result only cause a "POP" stack operation
- with a flag as described above (inspired by your "PUSH PENDING" flag) only the first IM of a sequence of IM's causes a stack operation
New optimizations:
im0load
im1load
im2load
im3load
im0store
im1store
im2store
im3store
Extended single-byte instructions, if you can get GCC/linker to use them...
I agree, optimizing stack access out is a HUGE win - which is why I made that long optimization post about the benefits of TOS :)
I've played with mini-stacks, and it is simply not worth it. It actually ends up (at most) breaking even, due to the lack of indexed addressing!
Keeping the top of the stack (TOS) in a register is a HUGE win - I got the idea from the hardware implementation of some stack machines.
Having a NOS register into which you pop the second operand of dual operand instructions like ADD, SUB, AND etc. is also a big win, but remember to adjust for non-commutative instructions.
It is NOT worth it to maintain both TOS and NOS as registers all the time, the overhead is too big.
Bill: As it happens there is already a "LASTOP_WAS_IM" flag. Can't remember what it's called now. It's part of the ZPU architecture. It is required because when an IM follows an IM it shifts 7 more bits into the immediate value that is being built up on the stack top. When a non-IM instruction follows a sequence of IMs then that signals that the value is completed.
If you want to push two immediate values on to the stack you need two runs of IM's with something else in between, often a NOP.
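To make the IM behaviour concrete, here is a small C sketch that walks a byte stream and collects the immediates built up by runs of IM. It assumes the usual ZPU encoding as I understand it (top bit set means IM, low 7 bits are data, the first IM of a run is sign-extended) and uses 0x0B as the NOP byte in the test. The function name `collect_immediates` and the output array are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VALUES 16

/* Decode a byte stream and collect every immediate built by runs of IM.
 * Returns the number of values written to out. */
static size_t collect_immediates(const uint8_t *code, size_t len, int32_t *out)
{
    size_t count = 0;
    int last_op_was_im = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t op = code[i];
        if (op & 0x80u) {                 /* IM: top bit set, low 7 bits are data */
            uint32_t bits = op & 0x7Fu;
            if (!last_op_was_im) {
                /* First IM of a run: start a new value, sign-extended from bit 6. */
                assert(count < MAX_VALUES);
                out[count++] = (bits & 0x40u) ? (int32_t)bits - 128 : (int32_t)bits;
            } else {
                /* Following IM: shift 7 more bits into the value being built. */
                uint32_t prev = (uint32_t)out[count - 1];
                out[count - 1] = (int32_t)((prev << 7) | bits);
            }
            last_op_was_im = 1;
        } else {
            last_op_was_im = 0;           /* any other opcode completes the value */
        }
    }
    return count;
}
```

Feeding it the bytes 0x81, 0x80, 0x0B, 0xFF yields two values, 128 and -1. Without the non-IM byte in the middle, the third IM would simply have been shifted into the first value.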
Mini-stacks and spilling are out. Confuses me too much.
So that optimization can be put in place easily I guess.
I worry when you say "...make sure you see what code is generated for ...". Any optimizations have to work for all possible code sequences. We know not how the compiler may change.
>> - single operand instructions that leave a result cause NO stack accesses
>> - two operand instructions that leave a result only cause a "POP" stack operation
Hmm... that's what I'm looking at with the PUSH_PENDING idea. I have to go back and see how you suggest doing it with no flag.
>> - with a flag as described above (inspired by your "PUSH PENDING" flag) only the first IM of a sequence of IM's causes a stack operation.
Well, there is my thing. I think that with a PUSH_PENDING flag it is quite possible for most sequences of IM's to have ZERO stack operations. That is because immediates generally get pushed only to be used straight away by the next instruction. In which case, why PUSH the immediate at all? And why POP it again in the next op?
As you see from that sequence I posted the whole thing can be done with no stack ops.
Given that IM is the most prevalent instruction by a wide margin this could be a big win.
I counted 750 IM's in an assembler listing of 2000 instructions.
>> New optimizations:
>> im0load
>> im1load
No chance, I think. I'm not up to hacking GCC and it's best to stay compatible with the real ZPU.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
heater said...
As it happens there is already a "LASTOP_WAS_IM" flag. Can't remember what it's called now. It's part of the ZPU architecture. It is required because when an IM follows an IM it shifts 7 more bits into the immediate value that is being built up on the stack top. When a non-IM instruction follows a sequence of IMs then that signals that the value is completed.
Perfect. That is all that is required.
heater said...
If you want to push two immediate values on to the stack you need two runs of IM's with something else in between, often a NOP.
Mini-stacks and spilling are out. Confuses me too much.
So that optimization can be put in place easily I guess.
The simplest and cheapest (in terms of longs used and execution time) is to implement it like I suggested, with the TOS kept in a register. I've built about half a dozen virtual machines on the prop that I never published, and that technique is by far the best optimization I've been able to come up with over years of tweaking.
Yes, there are a couple of boundary cases where setting flags etc would win in a small percentage of cases, but in the vast majority of cases this wins.
If there is room to spare, keeping both TOS and NOS can be a win, but it chews a lot of memory, and at the end of the day, it makes the same number of stack ops. The only way for it to be faster is to delay actual pushes for TWO stacking operations, but then when you don't have a binary op, you end up doing a lot of extra moves and hub access "catching up", AND you end up being slower in single op cases.
heater said...
I worry when you say "...make sure you see what code is generated for ...". Any optimizations have to work for all possible code sequences. We know not how the compiler may change.
That is why I don't like the delayed-push flag. Too expensive in cog memory terms to implement, as every single emulated instruction would have to incorporate "handling logic" for it. Overall, I believe it would be a loss.
heater said...
>> - single operand instructions that leave a result cause NO stack accesses
>> - two operand instructions that leave a result only cause a "POP" stack operation
Hmm... that's what I'm looking at with the PUSH_PENDING idea. I have to go back and see how you suggest doing it with no flag.
With my method, there is absolutely no need for a flag, and my version is immune to compiler changes.
heater said...
>> - with a flag as described above (inspired by your "PUSH PENDING" flag) only the first IM of a sequence of IM's causes a stack operation.
Well, there is my thing. I think that with a PUSH_PENDING flag it is quite possible for most sequences of IM's to have ZERO stack operations. That is because immediates generally get pushed only to be used straight away by the next instruction. In which case, why PUSH the immediate at all? And why POP it again in the next op?
As you see from that sequence I posted the whole thing can be done with no stack ops.
Given that IM is the most prevalent instruction by a wide margin this could be a big win.
I counted 750 IM's in an assembler listing of 2000 instructions.
Given that there is an existing version of "LAST_OP_WAS_IM", only the first IM would cause a push, the rest would operate on the TOS (top of stack) register.
heater said...
>> New optimizations:
>> im0load
>> im1load
No chance, I think. I'm not up to hacking GCC and it's best to stay compatible with the real ZPU.
Ok, we will leave that for another day :)
Comments
And there are about a dozen more :)
Note: they work for my custom flash file system, which I intend to port to SD cards. It uses a LOT less hub ram than FAT.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com 5.0" VGA LCD in stock!
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory/IO kit $89.95, both kits $189.95 SerPlug $9.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler Largos - upcoming nano operating system
But 'BaCon is a free BASIC to C converter for Unix-based systems.'
I don't have a unix system. What do you suggest? BCX?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
Have a look at that example conversion I posted above. There is no way anyone new to C is going to learn anything from that, except that it is very easy to write C in such a way that no one can understand it. That machine-generated C is not intended ever to be read by a user.
I only just discovered the world of BASIC to C translation recently so I have no idea but BCX looks quite polished. No good for Zog though as it generates C code targeted at Windows.
It's quite possible to use BaCon on a Windows machine by making use of CygWin www.cygwin.com/ which is a "...Linux-like environment for Windows".
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I agree with heater - I tried these BASIC to C translators (BaCon, BCX, B2C) because I thought they might be useful with Catalina - but they are universally abysmal.
They tend to work only with a very small and platform-dependent subset of BASIC, and the C they generate is also non-standard and platform dependent. It is also hideously ugly and very inefficient!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
I've never really liked Mbasic with multiple statements on one line and GOTO and subroutines labelled by number instead of name. Sbasic is much more like C (and Pascal) in that there is only one statement per line, you never use GOTO and procedures and functions are named so you can start to build up libraries of functions that are easy to read. But - I think I am the only person in the world who uses SBasic!
C on the other hand has all these advantages, but simple variants of C seem to have hardly changed in the last thirty years or more, and there are millions of people using C.
Having said that, I use BASIC mainly because of string manipulation, and I like strings because a lot of code then ends up working in "plain English" rather than obscure manipulations of single-element arrays. Mike Green once kindly posted a Spin translation of some arbitrary string code and it certainly cleared a lot of things up, but it still seems that Spin struggles a bit with strings. I think C has libraries for string manipulation, and I think some of those libraries are being used in that code above.
Maybe the code I'm writing is a bit unusual, but I'm working with strings of text, searching for sub strings, chopping them up, putting them back together in other ways, saving them to disk (sd card) and sending and receiving them via serial ports. Here is a tiny bit of pseudo basic code that encapsulates some of this. I wonder what it would look like in C (both machine translated, and hand coded)?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
There are many alternatives - one possible line by line translation is given below. I have not written a replacement for "out" as it is platform dependent - I just used 'putchar' to write it to the terminal:
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The FILE *f declares a variable (called f) that is a pointer to a file control block (the * indicates a pointer). Then f=fopen("filename", "mode") opens the file and returns a pointer to a file control block for the file, and fprintf(f, "format", <values>) writes values to the file pointed to by f (using the specified format string).
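As a small illustration of that pattern, here is a sketch that opens a file, writes one formatted line and closes it again. The function name `write_reading`, the "label=value" format and the file name used when calling it are invented for the example; only fopen, fprintf and fclose are the standard library calls described above.

```c
#include <assert.h>
#include <stdio.h>

/* Write one formatted line to a text file.
 * Returns 0 on success, -1 on failure. */
static int write_reading(const char *path, const char *label, int value)
{
    assert(path != NULL && label != NULL);

    FILE *f = fopen(path, "w");   /* f points at the file control block, or is NULL */
    if (f == NULL)
        return -1;

    /* fprintf returns the number of characters written, or a negative value. */
    int ok = fprintf(f, "%s=%d\n", label, value) > 0;

    /* fclose flushes buffered output, so its result matters too. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling write_reading("log.txt", "temp", 21) should leave a file containing the single line temp=21.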
It would be easy to write a mid type function in C:
Note that this returns a copy of the string - which would have to be freed later using free(string).
Also note I have not tested it!
Ross.
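For readers who want something concrete, a minimal sketch of a BASIC-style mid function in C might look like the following. This is an illustration written for this thread, not the listing referred to above. The 1-based start position and the clipping behaviour are assumptions modelled on how MID$ usually behaves in BASIC.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* BASIC-style MID$: start is 1-based, count is the maximum length.
 * Returns a malloc'd copy that the caller must free(), or NULL if
 * memory could not be allocated. */
static char *mid(const char *s, size_t start, size_t count)
{
    assert(s != NULL);

    size_t len = strlen(s);
    if (start == 0 || start > len)
        count = 0;                       /* out of range: empty result */
    else if (count > len - (start - 1))
        count = len - (start - 1);       /* clip to the end of the source */

    char *out = malloc(count + 1);
    if (out == NULL)
        return NULL;

    if (count > 0)
        memcpy(out, s + (start - 1), count);
    out[count] = '\0';
    return out;
}
```

With this sketch, mid("propeller", 4, 3) gives "pel", and a start position past the end of the string gives an empty string rather than an error.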
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I won't be able to on the STM8S-Discovery that has just turned up; C is the only thing offered (I even managed to get it for free through work). It is 16KB restricted and only runs at about 12 MIPS, single core, so I don't think I am posing a threat to the Prop.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
The // convention for comments is not supported by all compilers. It was introduced by C99, but was one of the few useful things introduced in that release, so most C89 compilers adopted it as well and simply ignored the other stuff.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
What immediately comes to mind is the way you define functions.
Nowadays you might write:
In the original old style C you might write:
Note the lack of parameter types in the old style declaration and how the parameter types are written in the definition.
Actually you can compile the old style with modern compilers (not recommended) but of course not the other way around.
Also be aware that the size of an int is machine specific. On BDS C it is only 16 bits.
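To illustrate the difference, here is a sketch with a made-up function `scale`. The modern form is real code. The old-style form is shown only inside a comment, because recent compilers may refuse to compile it. The helper `int_bits` is also invented, to show one way of checking the width of int instead of assuming it.

```c
#include <assert.h>
#include <limits.h>

/* Modern (ANSI/ISO) style: parameter types appear in both the
 * declaration and the definition. */
long scale(int value, long factor);

long scale(int value, long factor)
{
    return (long)value * factor;
}

/*
 * The same function in the original (K&R) style:
 *
 *     long scale();               declaration: no parameter types
 *
 *     long scale(value, factor)   definition: names only...
 *     int value;                  ...with the types listed afterwards
 *     long factor;
 *     {
 *         return (long)value * factor;
 *     }
 */

/* The width of int is implementation-defined, so report it rather than
 * assume 32 bits. The C standard only guarantees at least 16. */
static int int_bits(void)
{
    assert(INT_MAX >= 32767);
    return (int)(sizeof(int) * CHAR_BIT);
}
```

On a 16-bit compiler such as BDS C, int_bits() would be expected to report 16, which is why the sketch widens to long before multiplying.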
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
in c it is
and in sbasic it is
Ok, Zog can do this which is great. CP/M has the advantage of being able to run wordstar to edit a program then recompile it on the board. But many times there is only one program you ever use, so whether it is C in CP/M or C in Zog or C in Catalina should not matter. (except for speed, where C in CP/M is going to be the slowest by a big margin).
Maybe with all the enhancements coming out the external memory might not be needed for some applications!
But there is one thing I'm finding exceptionally useful with CP/M and that is to have the keyboard, VGA display, LCD display and serial ports all working behind the scenes in cogs. Especially with buffers, so the program can be doing what it wants and can get a byte from the keyboard or the serial port when it wants to. Right now, it is hard to think of any way to do keyboard drivers, multiple serial drivers in one cog, etc. except in raw PASM. Some only fit with a few bytes to spare, so I'm not sure they would necessarily fit if rewritten in C. Spin code is very useful for binding things together, so if I use PRINT in BASIC or printf() in C, then it is possible to route that byte to the VGA display and to the small LCD display and to serial port 1. Or, indeed, to turn one or more of those off (especially the serial port, then use it for other things) and even to do this turning on/off in software.
I'm wondering how such things might be accomplished in Zog or Catalina?
addit (and I ask this in a very small voice as I understand Zog is a 1000 foot tall alien www.marvunapp.com/Appendix3/zogjim.htm )
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
#1 Nearly all BASICs that live in environments with enough memory use Pascal-style strings with a garbage-collected common string area, which is poor for performance but great for preventing buffer overruns and for allowing strings that contain zero bytes. C doesn't support strings at all: the so-called standard library that adds string support treats them as fixed-dimension byte arrays with null termination, nobody guards against overrunning the dimensioned length, and the libraries that work better are nonstandard, so you can't depend on them being available. Bridging this gap automatically just plain takes lots of code in C and won't always work. It is depressingly common, especially when working with modern point-of-sale style printers, to need null bytes in escape sequences to do simple things like resetting the font size to default, and for this reason it is just plain impossible to auto-translate working BASIC string code into C code that works.
#2 Nearly all BASICs that have enough resources use single-precision floating-point math by default, and a lot of them use single-precision math even when you're working entirely with integer-declared variables. C uses processor integer math by default and, as with the strings, other types of math require bolt-on libraries that are ugly and inconvenient to use. Again, this means that some very common tasks, such as representing a price in dollars and cents, require completely different high-level strategies; in BASIC you can let the float carry cents, but in C you have to work in integer cents and adjust your display. Getting one strategy to work automatically with the other set of libraries will result in ugly code at best.
So in real life it doesn't really matter that, for example, both languages support very similar loop, branch, and case structures; God is in the details, and those are what kill you. BASIC was designed to be safe and convenient, C to be fast and powerful. They are as different as a pickup truck and a Ferrari, and that's why converters tend to be more interesting as curiosities than useful.
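Both points can be shown in a few lines of C. The sketch below is only an illustration: the `counted` struct, the helper names and the example byte sequence are invented here, and the 32-byte buffer is an arbitrary size. It shows a counted byte string that can carry a zero byte, what the null-terminated view of the same bytes would see, and a price held in integer cents and formatted for display.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A counted byte string: the length is stored explicitly, so a zero
 * byte is ordinary data rather than a terminator. */
struct counted {
    size_t len;
    unsigned char data[32];
};

static struct counted counted_from(const void *bytes, size_t len)
{
    struct counted c = {0};
    assert(len <= sizeof c.data);
    memcpy(c.data, bytes, len);
    c.len = len;
    return c;
}

/* How many bytes a null-terminated view of the same data would see:
 * everything up to the first zero byte. */
static size_t visible_to_strlen(const struct counted *c)
{
    const unsigned char *nul = memchr(c->data, 0, c->len);
    return nul ? (size_t)(nul - c->data) : c->len;
}

/* Format a price held as integer cents, for example 1999 as "19.99". */
static void format_cents(long cents, char *out, size_t out_size)
{
    long whole = cents / 100;
    long frac = cents % 100;
    if (frac < 0)
        frac = -frac;
    /* A negative amount under one unit needs its sign added by hand. */
    snprintf(out, out_size, "%s%ld.%02ld",
             (cents < 0 && whole == 0) ? "-" : "", whole, frac);
}
```

A four-byte sequence such as ESC, '!', 0, 'A' keeps its full length of 4 in the counted form, while the null-terminated view stops after 2 bytes. That is the printer escape-sequence problem in miniature.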
Now for all the support environment in CP/M and in the ZiCog emulator plus all the drivers you have added to it, well, at the moment Zog does not have any of that. It's early days yet.
However as you might have noticed we are progressing that direction.
Firstly Zog will use external memory so there will be space free in the Prop for UART, video drivers. Just as in ZiCog. With the help of Bill Henning's Virtual Memory system it will be easier to use all the different external memory hardware solutions including some we have not seen before like serial SPI RAM. The latter will be slower but free up a lot of pins.
Secondly I'm going to implement I/O for Zog using so-called sys calls in C. The sys calls get you out of the Zog emulation into Spin code, much the same way as the I/O works in ZiCog but a bit more sophisticated. This means it will be just as easy to redirect input and output of the console, for example, by tweaking things in Spin. BUT it will also mean that I/O can be redirected from UART to video from within the C program itself.
Thirdly, it looks like we will soon have Bill's Largos operating system. Think CP/M on steroids.
When all that is in place, Zog and Largos and Virtual memory, you will be able to use it like a souped up CP/M system. Of course unlike CP/M it will not have hundreds of ready made programs to run like WordStar etc etc etc.
Note 1: I suspect one day RossH might get Catalina to run on the Prop. and be able to compile stuff there.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Correct. What's stopping me is that Catalina uses Homespun to assemble the PASM it generates, so while Catalina itself can run on the Prop, Homespun can't - and without that there's not much point.
I look at Sphinx periodically, but the last time I looked (a while ago) the limitations were too severe. I'm contemplating writing my own PASM assembler.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
and scope. Some people use dynamic arrays which are harder to "digress" to C89 ... I don't use them.
I won't use a compiler that doesn't support // comments and will not do a job where they are prohibited
for any amount of money.
That conversion example "makes my eyes bleed" :)
You're right - inline is probably the other generally useful C99 feature.
As for variable length arrays - I use them in Ada where they were properly integrated with the language. In C they just seem 'tacked on' as a fairly clumsy afterthought.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Routing standard output or standard error to the LCD is feasible in Catalina - when I get time I will write a HMI plugin that supports both - standard output will go to VGA and standard error will go to the LCD.
All I need is time :)
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
v0.5 implements console I/O using the ZPU SYSCALL instruction. That is, the standard C read() and write() will go through SYSCALL into a syscall handler in Spin. There is still a print() in C that goes through an emulated memory-mapped UART.
Currently only stdin, stdout and stderr are implemented. Later syscall will be expanded to handle SD files as well. Then other devices are also possible.
Many other stubs are in place in Spin for other syscalls, open, close, seek etc.
The test.c program exercises the syscall read() and write() and generates a FIBO sequence.
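For a feel of what such a test might look like, here is a sketch that builds a short Fibonacci listing as text and sends it out with write(). It is not the actual test.c from the Zog package. The helper names `fibo_text` and `write_all` are invented, and it assumes a POSIX-style write() from unistd.h, with file descriptor 1 being stdout.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Format the first n Fibonacci numbers as text, one per line.
 * Returns the number of bytes placed in buf (not counting the NUL). */
static size_t fibo_text(unsigned n, char *buf, size_t buf_size)
{
    size_t used = 0;
    unsigned long a = 0, b = 1;

    assert(buf != NULL && buf_size > 0);
    buf[0] = '\0';
    for (unsigned i = 0; i < n; i++) {
        int w = snprintf(buf + used, buf_size - used, "%lu\n", a);
        if (w < 0 || (size_t)w >= buf_size - used) {
            buf[used] = '\0';     /* drop the partial line and stop */
            break;
        }
        used += (size_t)w;
        unsigned long next = a + b;
        a = b;
        b = next;
    }
    return used;
}

/* Send a whole buffer to a file descriptor with write(), retrying
 * after short writes. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Calling fibo_text(6, buf, sizeof buf) followed by write_all(1, buf, n) should print 0, 1, 1, 2, 3 and 5 on separate lines. Under Zog the write() call is where the SYSCALL handler in Spin would take over.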
Next up is integration of Bill Henning's Virtual Memory system.
Bill: You will be disturbed to see that the return values from syscall, well actually from all function calls, are left in a pseudo register R0, which is actually address zero in memory, rather than being left on the stack. Not sure why, but that's what GCC does.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Optimization 1: Last PUSH - first POP removal
My thinking goes like this:
1) Most ZPU ops have the pattern: a) Do some action, b) PUSH the result onto the stack.
2) Most ZPU ops have the pattern: a) POP something from the stack, b) Do some action.
Well, the simple idea is that if that is generally the case then the final PUSH of instruction "n" need not be done and likewise the first POP of instruction "n+1" need not be done at all. Rather, instruction "n" leaves its result in a COG variable, "TOS" (or "data" as it is now). Then the following instruction "n+1" just picks up the data from that COG variable.
Let's consider an example in the following ZPU instructions for the C call shown:
An "im" instruction pushes a small value on the stack; a following "im" pops the value, extends it with another small value and then pushes it back again. So each pair of "im" instructions is normally 3 stack operations that could be reduced to one. But wait, if the instruction following "im", say "callpcrel", uses the top of stack then we can skip all stack operations in "im" and skip the POP in "callpcrel". Result: no stack operations at all!
In that code example I have put in comments the stack operations of each instruction with and without this optimization. The total goes from 21 stack ops to none!
Well, that's extreme and often that last PUSH of an instruction should actually be done as the next instruction may not immediately POP the data. What to do?
I'm thinking this:
1) The last PUSH of all instructions removed and replaced by setting a flag that means "PUSH pending".
2) All instructions that would first POP some data only do so if "PUSH pending" is false (Then they clear the flag) otherwise they use the data in COG.
3) All instructions that don't do POP would perform the PUSH if "PUSH pending" is set and then clear the "PUSH pending" flag.
The COG's CARRY flag can be used as a "PUSH pending" flag between ZPU instruction steps. We can set or clear carry with a single instruction and use conditional execution on the POP calls, so this is very efficient.
Bill: I'm sorry if this is a long-winded way of saying what you already said in your optimization posts. I have not gone back to check, but it feels different because we are completely removing PUSH/POP pairs rather than just using the top-of-stack value from COG without a POP.
Are my three rules above sufficient to make this work?
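One way to poke at that question is to model the three rules in C and count the real stack accesses. This is only a sketch under assumptions: the helper names are invented, `op_im` stands for a whole completed run of IM instructions, `op_loadsp` is a simplified stand-in for "loadsp" that indexes from the top of the stack, and its memory read is not counted as a PUSH or POP.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_LONGS 64

static int32_t stack_mem[STACK_LONGS]; /* stands in for the ZPU stack in HUB RAM */
static int sp;                          /* number of longs held in stack_mem */
static int32_t data;                    /* COG variable holding a pending result */
static int push_pending;                /* the "PUSH pending" flag */
static unsigned stack_ops;              /* counts real PUSH/POP memory accesses */

static void mem_push(int32_t v) { assert(sp < STACK_LONGS); stack_mem[sp++] = v; stack_ops++; }
static int32_t mem_pop(void) { assert(sp > 0); stack_ops++; return stack_mem[--sp]; }

/* Rule 1: the final PUSH is replaced by setting the flag. */
static void leave_result(int32_t v) { data = v; push_pending = 1; }

/* Rule 2: a leading POP takes the pending value if there is one. */
static int32_t pop_operand(void)
{
    if (push_pending) {
        push_pending = 0;
        return data;
    }
    return mem_pop();
}

/* Rule 3: an instruction that does not POP first performs the pending PUSH. */
static void flush_pending(void)
{
    if (push_pending) {
        mem_push(data);
        push_pending = 0;
    }
}

static void op_im(int32_t v) { flush_pending(); leave_result(v); }
static void op_not(void) { leave_result(~pop_operand()); }
static void op_add(void) { int32_t a = pop_operand(); int32_t b = pop_operand(); leave_result(a + b); }

/* Reads the stack in memory, so the pending value must be flushed first. */
static void op_loadsp(int n)
{
    flush_pending();
    assert(n >= 0 && n < sp);
    leave_result(stack_mem[sp - 1 - n]);
}

static int32_t peek_top(void)
{
    if (push_pending)
        return data;
    assert(sp > 0);
    return stack_mem[sp - 1];
}
```

In this model the sequence im 2, im 3, add, not needs two real stack accesses instead of the seven a naive emulation would make. The op_loadsp case is the one worried about above: because it does not POP, rule 3 forces the pending value out to memory before the stack is read, so the SP is in the right place.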
Optimization 2: Low memory CACHE
I have discovered that the ZPU / GCC combination uses four "pseudo" registers: R0, R1, R2 and R3. These are not real CPU registers; they are just the first four LONGs of memory.
The pseudo register R0 is used to return values from functions. In the above code you see:
callpcrel
im 0
load
where the "im" loads the address of R0 (0000) and the "load" reads from R0 and pushes the value onto the stack. This is the return value from the strlen() call.
So, a possible optimization is to always keep the first 4 LONGs of memory in COG.
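A minimal C sketch of that low-memory cache follows, assuming a byte-addressed ZPU memory held in an array that stands in for HUB or external RAM. The names `low_regs`, `zpu_read_long` and `zpu_write_long` are hypothetical, as is the memory size. In the real emulator the four cached longs would be COG registers.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_LONGS 1024u     /* size of the emulated memory in longs (arbitrary) */
#define CACHED_LONGS 4u     /* R0..R3: the first four longs of ZPU memory */

static uint32_t slow_mem[MEM_LONGS];     /* stands in for HUB or external RAM */
static uint32_t low_regs[CACHED_LONGS];  /* stands in for four COG registers */
static unsigned slow_accesses;           /* counts trips to the slow memory */

/* Read a long at a byte address. Addresses 0..15 never touch slow memory. */
static uint32_t zpu_read_long(uint32_t addr)
{
    uint32_t index = addr >> 2;
    assert(index < MEM_LONGS);
    if (index < CACHED_LONGS)
        return low_regs[index];
    slow_accesses++;
    return slow_mem[index];
}

/* Write a long at a byte address, with the same split as the read path. */
static void zpu_write_long(uint32_t addr, uint32_t value)
{
    uint32_t index = addr >> 2;
    assert(index < MEM_LONGS);
    if (index < CACHED_LONGS) {
        low_regs[index] = value;
        return;
    }
    slow_accesses++;
    slow_mem[index] = value;
}
```

With this split, storing a function's return value to address 0 and loading it back costs no slow-memory access at all. Any other address goes through the normal path.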
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 3/2/2010 8:01:13 AM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBlade,·RamBlade,·SixBlade, website
· Single Board Computer:·3 Propeller ICs·and a·TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc;· Terminals·VT100 etc; (Index) ZiCog (Z80) , MoCog (6809)·
· Prop OS: SphinxOS·, PropDos , PropCmd··· Search the Propeller forums·(uses advanced Google search)
My cruising website is: ·www.bluemagic.biz·· MultiBlade Props: www.cluso.bluemagic.biz
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
No, I am not disturbed. I do find it a bit strange, but if that's what gcc wants, that's what gcc gets :)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Re/ 1:
Instead of a "PUSH pending", a "LASTOP_WAS_IM" flag that is set at the end of every IM instruction, and cleared by any non-IM instruction, combined with my TOS in a register scheme as detailed in my (long) message is the fastest yet.
Consider:
- only IM would need to check "LASTOP_WAS_IM" flag
- if "LASTOP_WAS_IM", no stack ops for a following IM, it can operate on TOS directly
- only the first IM of a sequence of IM's will cause a PUSH
Now if callpcrel is ALWAYS followed by an IM, you can have callpcrel set TOS to 0, and set "LASTOP_WAS_IM" to avoid a pop inside it. You can only do this if the next op is an IM so make sure you see what code is generated for void functions that don't have a return code!
With my previously suggested optimization:
- single operand instructions that leave a result cause NO stack accesses
- two operand instructions that leave a result only cause a "POP" stack operation
- with a flag as described above (inspired by your "PUSH PENDING" flag) only the first IM of a sequence of IM's causes a stack operation
New optimizations:
im0load
im1load
im2load
im3load
im0store
im1store
im2store
im3store
Extended single-byte instructions, if you can get GCC/linker to use them...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Bill Henning) : 3/2/2010 6:17:13 PM GMT