This has various little optimizations to the ZPU engine.
Dispatch table is now only 128 bytes in HUB
13% faster running FIBO from HUB.
BEWARE - Has an issue with TriBlade external memory, random incorrect
results running fibo or dhrystone or even crashes. Seems to depend
on the number of working pages in VMCOG.
For example fibo(22) is wrong here even if all the other results are correct!
Anyone out there able to run this on TriBlade external memory so we can verify if it's a real problem or just my hardware?
Anyone able to report if it works on Hydra or other platform?
I am pretty sure there is a bug or two still lurking in VMCOG - so I am not convinced the problem is the TriBlade code.
If I had to guess, I am probably re-using a variable and occasionally corrupting the data. My guess would be that the problem is somewhere in the BUSERR area.
Why do I think it may be a VMCOG bug?
1) GuardDog showed no stack corruption
2) Different behaviour (other than length of run) with different number of pages
After UPEW I will be taking a close look...
FYI, I will be bringing a PropCade proto with SPI ram's to UPEW.
Some random thoughts were going round in my mind last night about Big Spin.
Consider, for no language in particular:
dimension myarray[100000]
mycode...
That gets hard on the propeller. The propeller has 32k of memory.
It gets hard on the Z80 emulations too, as they are limited to 64k.
Maybe one needs to start with a processor that is not necessarily limited to a small amount of memory. Maybe a stack based processor? Hmm - that led me to.... Zog.
So, first question, say we have some external memory. How does C++ cope with that?
function1
call function10000
function2
...
function9999 // using up 100000 bytes of memory so far
function10000
some code
Any problems with a big jump like that?
Any problems with a huge array at the beginning, then some code?
Just brainstorming here, but I'm wondering that if C++ can handle jumps within a much larger memory space, maybe one could think about a Spin to C translator, as a path to Big Spin. Many spin commands do translate 1:1 to C.
But I suspect I'm getting ahead of myself here. Probably need to do this in baby steps. So - have you got the dracblade ram driver code we used in zicog, and would it be possible to add this to Zog?
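Firstly, in C there is this weird thing that the size of an integer is not specified in the language; it is machine dependent and should be "the most natural size of integer for the machine". So an integer declaration:
int someInteger;
can be 16, 32 or 64 bits depending on the machine you are compiling for. See this page for details en.wikipedia.org/wiki/C_variable_types_and_declarations#Size
Not sure if other non-multiples of 8 bits have ever been used.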
Under BDSC an int is 16 bits which one might say is not the natural size for an 8 bit Z80. But 8 bits would be pretty useless.
All this machine dependence carries over to C++ as well.
The size of a pointer, memory address, need not be the same as an int but must not be smaller I believe.
Turning to the ZPU. It is a 32 bit architecture. Both ints and pointers in C/C++ are therefore 32 bits. So potentially you can have up to 4GB of code and data, just like a 32bit PC. No problem.
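To make the point about machine-dependent sizes concrete, here is a minimal C sketch. The helper names (int_bits, pointer_bits, address_space_bytes) are made up for illustration and are not part of Zog or zpu-gcc. It just reports what the compiler in use picked, and how much memory an address of a given width can reach.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width in bits of a plain int on whatever machine this is compiled for. */
unsigned int_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}

/* Width in bits of a data pointer on this machine. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Number of bytes reachable with an address of the given width (1..63 bits). */
uint64_t address_space_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}
```

With a 32 bit address, address_space_bytes(32) comes out at 4GB, which is the figure quoted for the ZPU; with 16 bits you get the 64K of the Z80.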
Practically, if your programs are getting anywhere near that big it's time to look for an Intel or ARM or other computer. The ZPU, despite being 32 bit, is designed to be very small when implemented in FPGA chips, leaving more FPGA logic for the rest of whatever application. Speed was not the priority.
Turning to the Zog version of ZPU. Due to the fact that ZPU was designed to be "logically" small it is a perfect fit into a COG in PASM. That's why I just had to do it.
Zog can be built to run in HUB RAM or external RAM. Given that its external RAM access goes through the VMCog interface, its external RAM space is limited by whatever VMCog provides. Currently that is 64K. Soon Bill will get that up to megabytes.
So as you see, big jumps are not an issue for C/C++/ZPU/ZOG. I do believe someone has created a Spin to C translator that works to some level of usefulness. Can't remember the name now. Zog is not unique in this; Catalina C is also 32 bit. Not sure if ICC for the Prop supports external memory.
The DracBlade driver: I'm glad you mentioned that, I was about to approach you to see if you have a moment to add DracBlade support. I'm ashamed to say my DracBlade board is still naked. I've been very slow at collecting the parts. I could add your DracBlade driver to VMCog to get Zog working on DracBlade but then you would have to test it.
We should continue that discussion on Bill's VMCog thread...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Dr_A: One thing to be aware of is that Zog itself does not have any external memory access code in it like ZiCog for example. Instead it relies on Bill Henning's VMCog to provide all physical external memory interface and "virtual memory"
Zog does have #ifdefs to select using HUB RAM or VMCog for external RAM. That's it.
I have no plans to add physical memory drivers with lots of #ifdefs to Zog. Perhaps it would speed it up a bit but it's not worth making that "#ifdef soup" in there for me just now.
What's this about "Virtual memory"? Well Bill's VM Cog fetches 512 byte blocks as fast as possible from the external RAM. It keeps many blocks in HUB, a user defined number from 2 upwards. It delivers BYTES/WORDS/LONGS to Zog or other application. If the app requires access to data that is currently not in HUB then VM Cog will fetch the appropriate block from ext RAM. Automatically writing out other blocks if it needs the space in HUB.
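For anyone who wants to see the idea in code, here is a rough C model of that kind of paging. It is only a sketch and not VMCog's actual algorithm: the array names, the 64K pretend external RAM, the two-page working set and the round-robin choice of which page to throw out are all assumptions made for illustration. Only the 512 byte page size and "write back if modified" come from the description above.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   512u            /* block size quoted above */
#define PAGE_SHIFT  9u              /* log2(512) */
#define NUM_SLOTS   2u              /* working set size; "from 2 upwards" */
#define EXT_SIZE    (64u * 1024u)   /* pretend external RAM (assumption) */

uint8_t  ext_ram[EXT_SIZE];                     /* stands in for the slow RAM */
uint8_t  hub[NUM_SLOTS][PAGE_SIZE];             /* pages currently held in "HUB" */
int32_t  slot_page[NUM_SLOTS] = { -1, -1 };     /* which page each slot holds */
uint8_t  slot_dirty[NUM_SLOTS];                 /* page modified since loaded? */
unsigned next_victim;                           /* round-robin eviction (assumption) */
unsigned page_faults;                           /* how often we had to go to ext RAM */

/* Return the slot holding 'page', loading it (and evicting another) if needed. */
unsigned find_slot(uint32_t page)
{
    unsigned s;

    for (s = 0; s < NUM_SLOTS; s++)
        if (slot_page[s] == (int32_t)page)
            return s;                            /* already in the working set */

    s = next_victim;
    next_victim = (next_victim + 1u) % NUM_SLOTS;

    /* Write the old page back only if it was modified. */
    if (slot_page[s] >= 0 && slot_dirty[s])
        memcpy(&ext_ram[(uint32_t)slot_page[s] << PAGE_SHIFT], hub[s], PAGE_SIZE);

    memcpy(hub[s], &ext_ram[page << PAGE_SHIFT], PAGE_SIZE);
    slot_page[s] = (int32_t)page;
    slot_dirty[s] = 0;
    page_faults++;
    return s;
}

uint8_t vm_read_byte(uint32_t addr)
{
    addr %= EXT_SIZE;
    return hub[find_slot(addr >> PAGE_SHIFT)][addr & (PAGE_SIZE - 1u)];
}

void vm_write_byte(uint32_t addr, uint8_t value)
{
    unsigned s;

    addr %= EXT_SIZE;
    s = find_slot(addr >> PAGE_SHIFT);
    hub[s][addr & (PAGE_SIZE - 1u)] = value;
    slot_dirty[s] = 1;                           /* must be written back on eviction */
}
```

The caller only ever sees vm_read_byte and vm_write_byte; whether a given access hits a page already in HUB or forces a block transfer is hidden, which is why the number of working pages changes speed but should never change results.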
The bottom line is that it speeds things up if you have slower RAM like serial SPI RAM, or RAM that is better suited to fast block access like the Hydra. It should work well for DracBlade. It also means things like Zog don't have to worry about what board they are running on, they just use VMCog. I plan to make a version of ZiCog that uses VMCog.
So VMCog is where the DracBlade driver code should go. It's quite easy to drop in there, just routines for read a block, write a block and maybe some initialization. There is plenty of space.
Have a look in the VMCog.spin that comes with Zog to see how it goes. Look for sections called BINIT, BSTART, BREAD, BWRITE and BDATA.
As for your offer of bits for the DracBlade that would be great. I guess the main obstacles are the RAM and SD card socket.
P.S. I haven't used anything but BST for a long time now. What with having a Linux only household.
Like you say, the dracblade ought to be possible to drop in.
Re "Perhaps it would speed it up a bit but it's not worth making that "#ifdef soup" in there for me just now."
Hmm - looks like #ifdef soup is already happily replicating itself in the code *grin*;
PUB start : okay | n
  stop                                          'Stop ZPU if it's running already
  ser.start(31,30,0,baud)                       'Start the debug Terminal
  ser.str(string("ZOG v0.18"))
  FDSVarAddr := ser.getVarBlock
#ifdef USE_VIRTUAL_MEMORY
  vm.start(@mailbox, $7E00, 5)                  'Start up VMCOG
  ser.str(string(" (VM)"))
#endif
#ifdef USE_HUB_MEMORY
  ser.str(string(" (HUB)"))
#endif
  crlf
  waitcnt(cnt+80_000_000)                       'Give serial terminal window time
#ifdef USE_VIRTUAL_MEMORY
  ser.str (string("Starting SD driver...",))
  err := sd.startSD(@ioControl)                 'Start the SD routines
  ser.hex(err, 8)
  crlf
  ser.str (string("Mount SD...",))
  err := \sd.mount_explicit(spiDO,spiClk,spiDI,spiCS) 'Mount the SD
  ser.hex(err, 8)
  crlf
#endif
just go with the flow. repeat until end "#ifdef soup is good for me..."
vmcog looks simple. But complex at the same time.
I'm up with reading and writing a byte;
PUB rdvbyte(adr)
  repeat while long[cmdptr]
  long[cmdptr] := (adr<<9)|READVMB
  repeat while long[cmdptr]
  return long[dataptr]

PUB rdvword(adr)
  repeat while long[cmdptr]
  long[cmdptr] := (adr<<9)|READVMW
  repeat while long[cmdptr]
  return long[dataptr]

PUB rdvlong(adr)
  repeat while long[cmdptr]
  long[cmdptr] := (adr<<9)|READVML
  repeat while long[cmdptr]
  return long[dataptr]

PUB wrvbyte(adr,dt)
  repeat while long[cmdptr]
  long[dataptr] := dt
  long[cmdptr] := (adr<<9)|WRITEVMB
  repeat while long[cmdptr]

PUB wrvword(adr,dt)
  repeat while long[cmdptr]
  long[dataptr] := dt
  long[cmdptr] := (adr<<9)|WRITEVMW
  repeat while long[cmdptr]

PUB wrvlong(adr,dt)
  repeat while long[cmdptr]
  long[dataptr] := dt
  long[cmdptr] := (adr<<9)|WRITEVML
  repeat while long[cmdptr]

PUB rdfbyte(adr)
  repeat while 0
  long[fcmdptr] := (adr<<9)|READVMB
  repeat while 0
  return 0

PUB wrfbyte(adr,dt)
  repeat while 0
  long[fdataptr] := dt
  long[fcmdptr] := (adr<<9)|WRITEVMB
  repeat while 0

PUB Flush
  repeat while long[cmdptr]
  word[cmdptr] := FLUSHVM
  repeat while long[cmdptr]
  return

PUB Look(adr)
  repeat while long[cmdptr]
  long[dataptr] := adr
  long[cmdptr] := DUMPVM
  repeat while long[cmdptr]
  return

PUB GetPhysVirt(vaddr)
  repeat while long[cmdptr]
  long[cmdptr] := (vaddr<<9)|VIRTPHYS
  repeat while long[cmdptr]
  return long[dataptr]

PUB GetVirtLoadAddr(vaddr)|va
  va:= vaddr&$7FFE00          ' 23 bit VM address - force start of page
  wrvbyte(vaddr,rdvbyte(va))  ' force page into working set, set dirty bit
  return GetPhysVirt(va)      ' note returned pointer only valid until next vm call

PUB GetVirtPhys(adr)
  ' waitcnt(cnt+80_000_000)
  return 0

PUB Lock(vaddr,pages)         ' should be called after Flush, before any other access
  return 0

PUB Unlock(vaddr,page)
  return 0
but I'm afraid my eyes glaze over when considering the subtleties of "GetPhysVirt" vs "GetVirtLoadAddr" vs "GetVirtPhys"
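For what it's worth, the pattern in those accessors is the same every time: wait for the command long to go to zero, write (address << 9) | command, then wait for it to go to zero again. Here is a small C sketch of just the packing part. The opcode numbers are placeholders, since the real values of READVMB, WRITEVMB and friends live in VMCog.spin and are not shown in this thread, and the function names are made up for illustration.

```c
#include <stdint.h>

/* The low 9 bits of the command long carry the opcode, the rest the address. */
enum { VM_CMD_BITS = 9, VM_CMD_MASK = (1 << VM_CMD_BITS) - 1 };

/* Placeholder opcodes: NOT the real VMCog constants. */
enum vm_op { OP_READ_BYTE = 1, OP_WRITE_BYTE = 2 };

/* Pack an address and opcode the way the Spin code does: (adr<<9)|cmd. */
uint32_t vm_encode(uint32_t addr, uint32_t op)
{
    return (addr << VM_CMD_BITS) | (op & (uint32_t)VM_CMD_MASK);
}

/* Recover the address part of a command long. */
uint32_t vm_cmd_addr(uint32_t cmd)
{
    return cmd >> VM_CMD_BITS;
}

/* Recover the opcode part of a command long. */
uint32_t vm_cmd_op(uint32_t cmd)
{
    return cmd & (uint32_t)VM_CMD_MASK;
}

/* Start of the 512 byte page containing vaddr, as in GetVirtLoadAddr. */
uint32_t vm_page_base(uint32_t vaddr)
{
    return vaddr & 0x7FFE00u;
}
```

Since the address shares a 32 bit long with 9 bits of command, only 23 bits are left for it, which matches the "23 bit VM address" comment in GetVirtLoadAddr.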
Hmm - rather than try to understand this - how about I send you a freebie soldered-up DracBlade? Maybe with some parts too, so you can build up one with the boards you have? OK, RAM, plus SD card socket. Do you have a VGA socket? And the switching regs? May as well include a few bits. Can you PM me with your address again? :)
Zog looks cool. I'll see what is out there for a Spin to C translator.
In general terms, I like the idea of starting with a flat 4 Gig memory space. Build a program with 32 bit integer addresses. OK, in reality you might only have 512k, but it is not that hard to add more memory. Certainly with the DracBlade it would be possible to go to 24 bit using just more RAM chips, as there are three 8 bit latches for addresses. 2k is very limiting, 32k is limiting, but with a few meg you can code away for months and not worry about running out of space.
The arguments for Big C (a few megs of code) ought to be applicable to Big Spin.
Mind you, I'm even more intrigued about the possibilities of vmcog, once I get to understand it. It ought to be almost as fast as real memory, and it is going to be a much smaller solution using less prop pins and opens up the possibility of tiny surface mount boards with Big C and Big Spin and a cost that is not much more than the demo board.
Dr_A: That's a fabulous offer and I will gladly take you up on it. Expect a PM shortly.
Re: "#ifdef soup is good for me...",
Yep it already looks a mess. However the plan is to:
a) Split the Zog PASM out into its own Spin file. As done in ZiCog.
b) Throw away all that Spin code and replace it with a simple Zog loader.
Result being that after that there is no Spin in the Prop, Zog can run C that takes over the whole machine. Just like Spin does normally.
Then for debugging purposes the current Spin code may get replaced with the C equivalent. Then one Zog Cog can be used for debugging another Zog and/or the C code it runs.
The little Zog in HUB will be able to start other little Zogs that use HUB RAM or a big Zog that uses ext RAM.
Of course the current style Spin + C will continue in case anyone wants to combine C and Spin objects.
Re: vmcog,
Actually I think it is complicated, Bill has done a great job here. Luckily one does not HAVE to use all its features. You will notice that Zog only uses rdvbyte/long and wrvbyte/long.
Re: The rest,
A flat memory space is great. However Zog will get a bit weird in that respect. Already Zog in its own HUB area, or Zog running from external RAM, has access to all of HUB memory by using addresses outside of 64K. This will change to 128K, 256K, whatever, when VMCog can handle it.
The reverse is also going to happen, Zog in HUB will be able to access ext RAM through VMCog. I'm just now trying to convert the VMCog Spin into C.
Hmm.. a 16 MByte DracBlade is on the cards then (24 bit) not sure if Bill can push VM Cog that far.
There have been so many changes recently I'm not sure where to start:
1) zog.spin is now reduced to mostly just the interpreter PASM and has a nice PAR block to configure it on start up. The intention is to be able to extract that PASM binary blob and then use it to start other Zog interpreters on other Cogs from C running under Zog. So Zog can start Zog like Spin can start Spin.
2) There is a new main module, run_zog.spin, that can start a Zog interpreter running some C/C++ code in its own space in HUB RAM. Alternatively run_zog can bootstrap Zog to take over the entire Propeller, displacing all vestiges of Spin. Set the CUCKOO_MODE define to do this.
Basically run_zog then starts a Cog running a PASM bootstrap code that moves the ZPU executable to $0000 in HUB, moves the Zog dispatch table to the last 128 bytes in HUB, sets the SP just below that and then starts Zog. Boom, no more Spin.
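As a quick illustration of where things end up in that mode, here is a small C sketch of the address arithmetic. It is not taken from run_zog.spin: the struct and function names are made up, and "just below" is assumed here to mean one long below the dispatch table, which may not be exactly what the bootstrap does.

```c
#include <stdint.h>

#define HUB_SIZE        0x8000u   /* 32K of HUB RAM on the Propeller */
#define DISPATCH_BYTES  128u      /* dispatch table size quoted in this thread */

/* Hypothetical summary of the memory layout after the bootstrap has run. */
struct zog_layout {
    uint32_t code_base;       /* where the ZPU executable is moved to */
    uint32_t dispatch_base;   /* where the dispatch table is moved to */
    uint32_t initial_sp;      /* initial ZPU stack pointer */
};

struct zog_layout zog_cuckoo_layout(void)
{
    struct zog_layout l;

    l.code_base     = 0x0000u;                    /* executable at $0000 */
    l.dispatch_base = HUB_SIZE - DISPATCH_BYTES;  /* last 128 bytes of HUB */
    l.initial_sp    = l.dispatch_base - 4u;       /* assumption: one long below */
    return l;
}
```

That puts the dispatch table at $7F80, with everything from $0000 up to the stack available to the C program.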
3) Zog now runs C++ quite happily. There is a new lib directory containing C++ replacements of FullDuplexSerial and VMCog. These use the PASM from the original Spin versions compiled and extracted into binary blobs by BST. The blobs are then converted to object files and linked with the C
Sorry my keyboard died during that last post. I had to just hit send.
Further to 3 above. There is a lot of magic going on in the Makefile in the lib directory. That Makefile builds C++ versions of FullDuplexSerial and VMCog and places them into a library libzog.a. It also builds the demo/test program test_libzog.bin.
You will see that the Makefile compiles the original Spin code with BST, extracting a binary blob of only the DAT section.
The blobs are converted to object files for GCC with objcopy. They are then linked against the compiled C++ modules to produce the library. A further compile/link builds the test program using the library.
Somehow I have to streamline that process to make it more generally useful.
As run_zog no longer has any of that support code for debugging zog, single step and syscall handling, it is no longer possible to use printf or iprintf in C programs. This means all the other test programs in the test directory will not run. This will be fixed at some point by making printf and friends use FullDuplexSerial or whatever other driver.
Running C progs from external memory via VMCog is broken at the moment.
Bill, if you look in run_zog you will see that it is easy to get the bootloader to position the zpu executable and the stack anywhere in HUB so it should adapt to Minos easily.
I will be back on VMCOG soon. Right now I am trying to figure out why VMCOG 'f' test fails on the new revision of PropCade. It passes the other tests...
I'm trying to use ZOG again since it will play an important role in some of my experiments.
Tried run_zog.spin with serial port set to 115200 and get junk characters on the terminal.
Set xinfreq = 5_000_000 in run_zog.spin for my board. What else do I need to do?
I have linux tools from zylin but am missing g++. Do I need to build g++ with sources?
Jazzed: "...Zog ... important..". I'm very curious about your experiments now.
Junk chars from serial are due to an incorrect clock frequency setting. If you are using run_zog there are the normal clock setting things in there. But that last release of zog is using the C++ version of FullDuplexSerial which currently knows nothing about what is set in the Spin start up and is set up for my 104MHz TriBlade.
So to change it you are going to have to change it in lib/propeller.c and rebuild the library and test program. Which is just a question of running Make in the lib directory and copying the resulting test_libzog.bin up to the top directory.
There are some paths in that Makefile you will have to fix up first to find the compiler.
So yes you need C++ enabled in the compiler. I built my compiler on Linux from the latest source I think. I used the attached script to run configure with some nice options, including C++ and FORTRAN. It also builds binutils. Use it from the "toolchain" directory.
Good luck.
I want to try running LMM from a variant of VMCOG and external SRAM having 64 byte pages.
There is more to do like trying the cached EEPROM with separate text/data.
I'm trying your build script now. Output looks kind of familiar especially when I get an error
Any idea what the error below means?
I've never heard of a program called 'no' and sudo apt-get install no doesn't find anything.
Are you working in Ubuntu? Don't do that, nothing ever works there :)
I don't know but I was guessing the "no" is the opposite to "yes" which prints "y" on most Unix boxes and so should print "n". However typing "no" here results in "command not found"
I could have a go at building zpu-gcc on an Ubuntu box at work in the morning (about 8 hours from now here).
I'm totally confused by the description of your experiment by the way.
heater said...
Are you working in Ubuntu? Don't do that, nothing ever works there :)
I don't know but I was guessing the "no" is the opposite to "yes" which prints "y" on most Unix boxes and so should print "n". However typing "no" here results in "command not found"
I could have a go at building zpu-gcc on an Ubuntu box at work in the morning (about 8 hours from now here).
I'm totally confused by the description of your experiment by the way.
Well, Fedora was mostly a disaster and I haven't tried Debian. Ubuntu has been very good to me since Hardy, where I built AVR and the native GCC chain. The 10.04 build has been a wonderful experience mostly ... until now.
I would like to try ZOG on a different external memory platform or three but first want to see some results with normal Propeller hardware. What is broken about external mode now?
Thanks,
--Steve
I was missing the gettext internationalization stuff. Waiting for the next error ... meanwhile reading LFS.pdf
1) In that latest version I split the basic ZPU interpreter out into its own Spin object in its own file, zog.spin. This has a minimal amount of Spin code in it. The idea is to be able to extract the interpreter PASM into a binary blob with "bst -c", which can then be loaded from anywhere and run by anyone, from Spin or C or whatever.
2) That meant that I had to make some changes to the way the ZPU interpreter PASM was set up prior to starting it. Instead of having Spin code "poke" parameters, like ZPU memory space address, I/O mailbox address etc, into the DAT section, these things had to be passed in via a PAR block.
3) A new feature there was to actually tell the interpreter where its own dispatch table is in HUB memory. As it runs now the dispatch table gets moved to high HUB RAM whilst the ZPU memory space gets moved to low HUB RAM. So the interpreter needs to know where all this is through PAR.
4) Just for show I wrapped up FullDuplexSerial's PASM in a C++ wrapper and it is now started and used directly from C++ code.
5) All of this means that the original Spin support code that provided I/O through a kind of mailbox with ZOG was broken.
However, since then I found the time to put all the original Spin support code into a Spin object of its own, debug_zog.spin, and it can at least run the new zog object from HUB RAM.
So, after all these changes I'm not quite sure what might be broken with zog running from external memory. For sure the C++ version of FullDuplexSerial will not work from ext RAM.
What I have to do is get back to the original C test programs that do I/O through a mailbox which is then handled by the new debug_zog.spin. Then I think we will have what you want.
Given the pressure of work just now this might take a few days.
Hope that made some sense for you.
Jazzed. I just checked out and tried to build the ZPU toolchain on Debian and Ubuntu machines.
# git clone git://www.ecosforeg.net:8100/zpu/toolchain.git.
# cd toolchain/toolchain
# .fixperms
# source ./env
# sh build.sh
The Debian one built cleanly, the Ubuntu stops with your "(" unexpected error. Never did get that "no" error.
Moving on I used my version of build.sh on Debian and it gets me a C++ compiler without hitch.
I have no idea what that "(" unexpected is all about. I tried changing ../../binutils/ld/emulparams/zpuelf.sh line 30 to use a simple hex value with no brackets around it. After that the build hangs a bit further along :(
I would put this update on the first post but I can't get in as "heater" just now.
This release:
The main thing is that following the separation of the zog pasm code away from all the spin support in v1.0 operation from external RAM is now working again.
Use debug_zog.spin as the top level module to run code that uses regular printf etc from libc. Define USE_VIRTUAL_MEMORY or USE_HUB_MEMORY as required.
As delivered it is set up to run the fibo test from external memory on a TRIBLADE.
Use run_zog.spin to run zog without any I/O support. This only works from HUB RAM for now. It starts zog and optionally takes over all HUB RAM for C code, or can continue with further Spin code with the C code being confined to its defined memory area.
When using zog this way one has to use the C++ versions of FullDuplexSerial in order to get any I/O. See the test_libzog.cpp for examples of how this works. There is also a C++ version of VMCog in there.
Can you please do me a favor and change your TOOLPATH references to use a shell variable?
This bash command executed from the zog directory will change the make files and save the original:
$ for X in `find . -name Makefile`; do sed 's/^TOOLPATH=/# TOOLPATH=/g' "$X" > tmp; cp "$X" "${X}.save"; cp tmp "$X"; done
To use it in ksh/bash, one should do this:
$ TOOLPATH=<your toolchain install bin path>
$ export TOOLPATH
If nothing else, I guess this post may be a good reference for replacing TOOLPATH for new users.
when creating the binary. This reverses the order of bytes within each long of the binary so that run_zog and debug_zog no longer have to do it when loading code.
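In case it helps to picture what that byte reversal does, here is a minimal C sketch of swapping the bytes within each 32 bit long of an image. The function names are made up for illustration and this is not the code from the Makefile or from run_zog.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of one long: 0x11223344 becomes 0x44332211. */
uint32_t swap_long(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Reverse the byte order of every long in an image, in place. */
void swap_longs(uint32_t *longs, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        longs[i] = swap_long(longs[i]);
}
```

Doing this once at build time means the loader can copy the image as is instead of fixing up every long at load time.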
Also adopted Jazzed's suggestion re: the TOOLPATH environment variable in the Makefiles. You will now need to set TOOLPATH to point to your zpu-gcc compiler installation before running make for the C programs. For example under BASH on Linux:
$ export TOOLPATH=/home/you/ZPU/toolchain/install/bin/
$ make clean
$ make
I don't know how this has happened but I only have 10 LONGs and 50 LONGs left in cog when Zog is compiled for HUB or external RAM respectively.
This could become about 30 and 70 if I reused some init code locations for variables.
There was a plan to adapt the float32 code to work as LMM and run it from an LMM kernel in Zog for float support in C. Looking at float32 I'm now inclined to just run it in another COG as normal but through a mailbox interface from C rather than Spin. This way Zog gets the native PASM speed of float32 and saves HUB memory by loading up a float Cog on start up and reusing the HUB space it came in.
Anyone have any preference here? Or even likely to use it?
If nothing else I have to have a result for the Whetstone floating point benchmark that RossH has run under Catalina:)
SDRAM access via VMCOG is horribly slow because I have ZERO room for tricks like using CTRA for clocking and bursting 32bytes at a time. The SdramCache code is much faster and when working will beat the TriBlade performance for fibo(20).
But if using vmcog v075 and 20 pages surely you should get the same OK result as me. Assuming SDRAM access is always working correctly?
I had to rewrite the driver to cram things into VMCOG. The behavior is pretty weird so I'll look at it again later. I want SdramCache working first.
SDRAM code is way too big for direct access. I've done lots of chopping today.
What I'm doing is adding code that communicates with the SdramCache manager COG. That code can burst 32 bytes in about 6us (5.3MB/s) and it provides a buffer for the user rather than relying on some multi-HUB transaction per byte/word/long scheme.
I had to un-inline as you say to make things fit and have no wiggle room (in the straitjacket again). Right now I see with the debugger that it's running "several" instructions before it does silly stuff. I'm too tired to do much more with it now.
Jazzed, I have a result for v0.976 with 20 pages which looks kind of similar to yours. It prints out a bit more but it is hitting a break point at the same place with the same stack value. Looks like you should be in luck with v0.976.
By the way that address 0x00005C5 is in the middle of the fibo() routine and the opcode read there should be 0xff which is load immediate -1.
In ZOG there are now about 20-30 instructions just reading PAR parameters during init. I had to add this when separating the Zog PASM code away from the Spin support code.
Zog has 20-30 variables that can be "overlaid" against that init code seeing as the init code only runs once. Perhaps that space saving helps you.
Comments
Post Edited (heater) : 6/23/2010 6:53:52 AM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
Just scanning through the instruction set repo.or.cz/w/zpu.git?a=blob_plain;f=zpu/docs/zpu_arch.html;hb=HEAD#instructionset there are some nice features in there. Not too many un-needed instructions. Lots of nice ones (multiply and divide - yum!)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
Firstly in C there is this weird thing that the size of an integer is not specified in the language it is machine dependant. And should be "The most natural size of integer for the machine". So an integer declaration:
int someInteger;
can be 16, 32, 64 bits depending on the machine you are compiling for. See this page for details en.wikipedia.org/wiki/C_variable_types_and_declarations#Size
I can send you some parts if you like - which bits would you need?
Re adding the driver code - I need about 20-30 longs of pasm space. Would you have that free?
If so, please point me to some demo code. I guess it might end up being an ifdef for different types of external memory - are you ok about using BST?
One cog driving external ram would be very nifty. For a start, it might free up the bulk of hub ram for video buffering.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
Zog does have #ifdefs to select using HUB RAM or VMCog for external RAM. That's it.
I have no plans to add physical memory drivers with lots of #ifdefs to Zog. Perhaps it would speed it up a bit but it's not worth making that "#ifdef soup" in there for me just now.
What's this about "Virtual memory"? Well, Bill's VMCog fetches 512 byte blocks as fast as possible from the external RAM. It keeps many blocks in HUB, a user defined number from 2 upwards. It delivers BYTES/WORDS/LONGS to Zog or other application. If the app requires access to data that is currently not in HUB then VMCog will fetch the appropriate block from ext RAM, automatically writing out other blocks if it needs the space in HUB.
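To make the paging idea concrete, here is a rough host-side sketch in C. It is not VMCog's code: the working set of 4 pages, the least-recently-used eviction and all the names (vm_locate, vm_rdvbyte, vm_wrvbyte, ext_ram, hub_pages) are assumptions for illustration only. Only the 512 byte block size, the 64K space and the "write out a block when the space is needed" behaviour come from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 512u           /* block size quoted in the post */
#define NUM_PAGES 4u             /* working set held in "HUB" (assumed value) */
#define EXT_SIZE  (64u * 1024u)  /* the 64K external space mentioned earlier */

static uint8_t ext_ram[EXT_SIZE];                /* stands in for external RAM */
static uint8_t hub_pages[NUM_PAGES][PAGE_SIZE];  /* page buffers kept in "HUB" */

typedef struct {
    int      valid;      /* slot holds a page */
    int      dirty;      /* page was written since it was loaded */
    uint32_t vpage;      /* which 512-byte block of ext_ram it holds */
    uint32_t last_used;  /* timestamp for least-recently-used eviction */
} page_slot;

static page_slot slots[NUM_PAGES];
static uint32_t  clock_tick;
static uint32_t  page_faults;

/* Return a pointer to the byte for addr, loading its page on a miss. */
static uint8_t *vm_locate(uint32_t addr, int for_write)
{
    uint32_t vpage = addr / PAGE_SIZE;
    uint32_t victim = 0;
    uint32_t i;

    assert(addr < EXT_SIZE);
    clock_tick++;

    for (i = 0; i < NUM_PAGES; i++) {          /* hit: page already in HUB */
        if (slots[i].valid && slots[i].vpage == vpage) {
            slots[i].last_used = clock_tick;
            slots[i].dirty |= for_write;
            return &hub_pages[i][addr % PAGE_SIZE];
        }
    }

    for (i = 0; i < NUM_PAGES; i++) {          /* miss: free slot, else LRU */
        if (!slots[i].valid) { victim = i; break; }
        if (slots[i].last_used < slots[victim].last_used) victim = i;
    }

    if (slots[victim].valid && slots[victim].dirty)   /* write back first */
        memcpy(&ext_ram[slots[victim].vpage * PAGE_SIZE],
               hub_pages[victim], PAGE_SIZE);

    memcpy(hub_pages[victim], &ext_ram[vpage * PAGE_SIZE], PAGE_SIZE);
    slots[victim].valid = 1;
    slots[victim].dirty = for_write;
    slots[victim].vpage = vpage;
    slots[victim].last_used = clock_tick;
    page_faults++;
    return &hub_pages[victim][addr % PAGE_SIZE];
}

/* Byte accessors, loosely modelled on the rdvbyte/wrvbyte idea. */
static uint8_t vm_rdvbyte(uint32_t addr)            { return *vm_locate(addr, 0); }
static void    vm_wrvbyte(uint32_t addr, uint8_t v) { *vm_locate(addr, 1) = v; }
```

The point of the sketch is that the caller only ever sees vm_rdvbyte and vm_wrvbyte; which page is resident and when a dirty page gets written back is hidden behind them, which is why the application need not care what board it runs on.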
Checkout the VMCog thread. Bill explains the hows and whys of all this better than I can http://forums.parallax.com/showthread.php?p=878382
The bottom line is that it speeds things up if you have slower RAM like serial SPI RAM or RAM that is better suited to fast block access like the Hydra. It should work well for DracBlade. It also means things like Zog don't have to worry about what board they are running on, just use VMCog. I plan to make a version of ZiCog that uses VMCog.
So VMCog is where the DracBlade driver code should go. It's quite easy to drop in there, just routines to read a block, write a block and maybe some initialization. There is plenty of space.
Have a look in the VMCog.spin that comes with Zog to see how it goes. Look for sections called BINIT, BSTART, BREAD, BWRITE and BDATA.
As for your offer of bits for the DracBlade that would be great. I guess the main obstacles are the RAM and SD card socket.
P.S. I haven't used anything but BST for a long time now. What with having a Linux only household.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Ok, got the download. Looking at vmcog.
Like you say, the dracblade ought to be possible to drop in.
Re "Perhaps it would speed it up a bit but it's not worth making that "#ifdef soup" in there for me just now."
Hmm - looks like #ifdef soup is already happily replicating itself in the code *grin*;
just go with the flow. repeat until end "#ifdef soup is good for me..."
vmcog looks simple. But complex at the same time.
I'm up with reading and writing a byte;
but I'm afraid my eyes glaze over when considering the subtleties of "GetPhysVirt" vs "GetVirtLoadAddr" vs "GetVirtPhys"
Hmm - rather than try to understand this - how about I send you a freebie soldered up Dracblade? Maybe with some parts too so you can build up one with the boards you have? Ok, ram, plus sd card socket. Do you have a vga socket? And the switching regs? May as well include a few bits. Can you PM me with your address again? :)
Zog looks cool. I'll see what is out there for a Spin to C translator.
In general terms, I like the idea of starting with a flat 4 Gig memory space. Build a program with 32 bit integer addresses. Ok, in reality you might only have 512k, but it is not that hard to add more memory. Certainly with the Dracblade it would be possible to go to 24 bit using just more ram chips as there are three 8bit latches for addresses. 2k is very limiting, 32k is limiting but with a few meg you can code away for months and not worry about running out of space.
The arguments for Big C (a few megs of code) ought to be applicable to Big Spin.
Mind you, I'm even more intrigued about the possibilities of vmcog, once I get to understand it. It ought to be almost as fast as real memory, and it is going to be a much smaller solution using less prop pins and opens up the possibility of tiny surface mount boards with Big C and Big Spin and a cost that is not much more than the demo board.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Re: "#ifdef soup is good for me...",
Yep it already looks a mess. However the plan is to:
a) Split the Zog PASM out into its own Spin file. As done in ZiCog.
b) Throw away all that Spin code and replace it with a simple Zog loader.
Result being that after that there is no Spin in the Prop, Zog can run C that takes over the whole machine. Just like Spin does normally.
Then for debugging purposes the current Spin code may get replaced with the C equivalent. Then one Zog Cog can be used for debugging another Zog and/or the C code it runs.
The little Zog in HUB will be able to start other little Zogs that use HUB RAM or a big Zog that uses ext RAM.
Of course the current style Spin + C will continue in case anyone wants to combine C and Spin objects.
Re: vmcog,
Actually I think it is complicated, Bill has done a great job here. Luckily one does not HAVE to use all its features. You will notice that Zog only uses rdvbyte/long and wrvbyte/long.
Re: The rest,
A flat memory space is great. However Zog will get a bit weird in that respect. Already Zog in its own HUB area or Zog running from external RAM have access to all of HUB memory by using addresses outside of 64K. This will change to 128K, 256K, whatever, when VMCog can handle it.
The reverse is also going to happen, Zog in HUB will be able to access ext RAM through VMCog. I'm just now trying to convert the VMCog Spin into C.
Hmm.. a 16 MByte DracBlade is on the cards then (24 bit) not sure if Bill can push VM Cog that far.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
There have been so many changes recently I'm not sure where to start:
1) zog.spin is now reduced to mostly just the interpreter PASM and has a nice PAR block to configure it on start up. The intention is to be able to extract that PASM binary blob and then be able to use that to start other Zog interpreters on other Cogs from C running under Zog. So Zog can start Zog like Spin can start Spin.
2) There is a new main module, run_zog.spin, that can start a Zog interpreter running some C/C++ code in its own space in HUB RAM. Alternatively run_zog can bootstrap Zog to take over the entire Propeller, displacing all vestiges of Spin. Set the CUCKOO_MODE define to do this.
Basically run_zog then starts a Cog running a PASM bootstrap code that moves the ZPU executable to $0000 in HUB, moves the Zog dispatch table to the last 128 bytes in HUB, sets the SP just below that and then starts Zog. Boom, no more Spin.
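The layout that bootstrap ends up with can be worked out with a few lines of C. This is only a sketch of the arithmetic, assuming the usual 32K of HUB RAM and the 128 byte dispatch table. The struct and function names are made up for illustration, and the exact initial SP is an assumption (taken here as the long immediately below the table; the real bootstrap may use a different offset).

```c
#include <assert.h>
#include <stdint.h>

#define HUB_SIZE            0x8000u  /* 32K of HUB RAM on the Propeller */
#define DISPATCH_TABLE_SIZE 128u     /* dispatch table size in bytes */

/* Hypothetical description of the HUB layout after the bootstrap has run. */
typedef struct {
    uint32_t zpu_base;        /* ZPU executable is moved to here */
    uint32_t dispatch_table;  /* first byte of the dispatch table */
    uint32_t initial_sp;      /* stack starts just below the table (assumed offset) */
} cuckoo_layout;

static cuckoo_layout cuckoo_mode_layout(uint32_t image_size)
{
    cuckoo_layout l;

    l.zpu_base       = 0x0000u;
    l.dispatch_table = HUB_SIZE - DISPATCH_TABLE_SIZE;  /* last 128 bytes of HUB */
    l.initial_sp     = l.dispatch_table - 4u;           /* one long below the table */

    /* The image must leave room for the stack below the table. */
    assert(image_size <= l.initial_sp);
    return l;
}
```

With these numbers the dispatch table lands at $7F80 and the stack grows down from $7F7C towards the end of the C image.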
3) Zog now runs C++ quite happily. There is a new lib directory containing C++ replacements of FullDuplexSerial and VMCog. These use the PASM from the original Spin versions compiled and extracted into binary blobs by BST. The blobs are then converted to object files and linked with the C
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Further to 3 above. There is a lot of magic going on in the Makefile in the lib directory. That Makefile builds C++ versions of FullDuplexSerial and VMCog and places them into a library libzog.a. It also builds the demo/test program test_libzog.bin.
You will see that the Makefile compiles the original Spin code with BST, extracting a binary blob of only the DAT section.
The blobs are converted to object files for GCC with objcopy. They are then linked against the compiled C++ modules to produce the library. A further compile/link builds the test program using the library.
Somehow I have to streamline that process to make it more generally useful.
As run_zog no longer has any of that support code for debugging zog, single step and syscall handling, it is no longer possible to use printf or iprintf in C programs. This means all the other test programs in the test directory will not run. This will be fixed at some point by making printf and friends use FullDuplexSerial or whatever other driver.
Running C progs from external memory via VMCog is broken at the moment.
Bill, if you look in run_zog you will see that it is easy to get the bootloader to position the zpu executable and the stack anywhere in HUB so it should adapt to Minos easily.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I will be back on VMCOG soon. Right now I am trying to figure out why VMCOG 'f' test fails on the new revision of PropCade. It passes the other tests...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
I'm trying to use ZOG again since it will play an important role in some of my experiments.
Tried run_zog.spin with serial port set to 115200 and get junk characters on the terminal.
Set xinfreq = 5_000_000 in run_zog.spin for my board. What else do I need to do?
I have linux tools from zylin but am missing g++. Do I need to build g++ with sources?
Thanks,
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM
Junk chars from serial are due to an incorrect clock frequency setting. If you are using run_zog there are the normal clock setting things in there. But that last release of zog is using the C++ version of FullDuplexSerial which currently knows nothing about what is set in the Spin start up and is set up for my 104MHz TriBlade.
So to change it you are going to have to change it in lib/propeller.c and rebuild the library and test program. Which is just a question of running Make in the lib directory and copying the resulting test_libzog.bin up to the top directory.
There are some paths in that Makefile you will have to fix up first to find the compiler.
So yes you need C++ enabled in the compiler. I built my compiler on Linux from the latest source I think. I used the attached script to run configure with some nice options, including C++ and FORTRAN. It also builds binutils. Use it from the "toolchain" directory.
Good luck.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
There is more to do like trying the cached EEPROM with separate text/data.
I'm trying your build script now. Output looks kind of familiar especially when I get an error
Any idea what the error below means?
I've never heard of a program called 'no' and sudo apt-get install no doesn't find anything.
Cheers.
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I don't know but I was guessing the "no" is the opposite to "yes" which prints "y" on most Unix boxes and so should print "n". However typing "no" here results in "command not found"
I could have a go at building zpu-gcc on an Ubuntu box at work in the morning (about 8 hours from now here).
I'm totally confused by the description of your experiment by the way.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I would like to try ZOG on a different external memory platform or three but first want to see some results with normal Propeller hardware. What is broken about external mode now?
Thanks,
--Steve
I was missing the gettext internationalization stuff. Waiting for the next error ... meanwhile reading LFS.pdf
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (jazzed) : 7/20/2010 11:57:21 PM GMT
How can I explain:
1) In that latest version I split the basic ZPU interpreter out into its own spin object in its own file zog.spin. This has a minimal amount of Spin code in it. The idea being to be able to extract the interpreter PASM into a binary blob with "bst -c" which can then be loaded from anywhere and run by anyone, from Spin or C or whatever.
2) That meant that I had to make some changes to the way the ZPU interpreter PASM was set up prior to starting it. Instead of having Spin code "poke" parameters, like ZPU memory space address, I/O mailbox address etc, into the DAT section these things had to be passed in via a PAR block.
3) A new feature there was to actually tell the interpreter where its own dispatch table is in HUB memory. As it runs now the dispatch table gets moved to high HUB RAM whilst the ZPU memory space gets moved to low HUB RAM. So the interpreter needs to know where all this is through PAR.
4) Just for show I wrapped up FullDuplexSerial's PASM in a C++ wrapper and it is now started and used directly from C++ code.
5) All of this means that the original Spin support code that provided I/O through a kind of mailbox with ZOG was broken.
However, since then I found the time to put all the original Spin support code into a spin object of its own, debug_zog.spin, and it can at least run the new zog object from HUB RAM.
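To picture the PAR block idea from points 2 and 3, here is a host-side sketch in C. The field names and their order are hypothetical, not the real zog.spin layout; it only shows the general pattern of a launcher writing consecutive longs into HUB and the interpreter's init code reading them back starting from the address it was handed. The one hard fact relied on is that the address passed in PAR must be long aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HUB_SIZE 0x8000u          /* 32K of HUB RAM */

static uint8_t hub[HUB_SIZE];     /* stands in for HUB RAM on a host build */

/* Hypothetical parameter block; field names and order are illustrative. */
typedef struct {
    uint32_t zpu_memory;      /* HUB address of the ZPU memory space */
    uint32_t io_mailbox;      /* HUB address of the I/O mailbox */
    uint32_t dispatch_table;  /* HUB address of the dispatch table */
} zog_par_block;

static uint32_t hub_rdlong(uint32_t addr)
{
    uint32_t v;
    assert(addr % 4u == 0u && addr <= HUB_SIZE - 4u);  /* long aligned, in range */
    memcpy(&v, &hub[addr], sizeof v);
    return v;
}

static void hub_wrlong(uint32_t addr, uint32_t v)
{
    assert(addr % 4u == 0u && addr <= HUB_SIZE - 4u);
    memcpy(&hub[addr], &v, sizeof v);
}

/* Launcher side: lay the block out as consecutive longs at address par. */
static void par_block_store(uint32_t par, const zog_par_block *p)
{
    hub_wrlong(par + 0u, p->zpu_memory);
    hub_wrlong(par + 4u, p->io_mailbox);
    hub_wrlong(par + 8u, p->dispatch_table);
}

/* Interpreter side: read the longs back one at a time, stepping by 4. */
static zog_par_block par_block_load(uint32_t par)
{
    zog_par_block p;
    p.zpu_memory     = hub_rdlong(par);  par += 4u;
    p.io_mailbox     = hub_rdlong(par);  par += 4u;
    p.dispatch_table = hub_rdlong(par);
    return p;
}
```

Compared with poking values into the DAT section, nothing in the interpreter image has to be patched before launch, so the same binary blob can be started by Spin, C or another Zog.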
So, after all these changes I'm not quite sure what might be broken with zog running from external memory. For sure the C++ version of FullDuplexSerial will not work from ext RAM.
What I have to do is get back to the original C test programs that do I/O through a mailbox which is then handled by the new debug_zog.spin. Then I think we will have what you want.
Given the pressure of work just now this might take a few days.
Hope that made some sense for you.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Now I'm down to the error below. Think I'll stop for today :)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
# git clone git://www.ecosforeg.net:8100/zpu/toolchain.git.
# cd toolchain/toolchain
# .fixperms
# source ./env
# sh build.sh
The Debian one built cleanly, the Ubuntu stops with your "(" unexpected error. Never did get that "no" error.
Moving on I used my version of build.sh on Debian and it gets me a C++ compiler without hitch.
I have no idea what that "(" unexpected is all about. I tried changing ../../binutils/ld/emulparams/zpuelf.sh line 30 to use a simple hex value with no brackets around it. After that the build hangs a bit further along :(
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I have put up a ZPU GCC tool chain that includes a C++ compiler at this location:
fits-server.rsm.fi/ZPU_toolchain.tgz
Only 162 MBytes !
This tool chain was built on Debian Lenny and has been tested on Ubuntu only so far as it will compile a hello.cpp program using the command:
#zpu-elf-g++ -phi hello.cpp
All this because I absolutely cannot figure out why trying to build this tool chain on Ubuntu fails.
This is a limited time offer as my superiors may take a dim view of using this server unless I can convince them we actually need ZOG for real work.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I would put this update on the first post but I can't get in as "heater" just now.
This release:
The main thing is that following the separation of the zog pasm code away from all the spin support in v1.0 operation from external RAM is now working again.
Use debug_zog.spin as the top level module to run code that uses regular printf etc from libc. Define USE_VIRTUAL_MEMORY or USE_HUB_MEMORY as required.
As delivered it is set up to run the fibo test from external memory on a TRIBLADE.
Use run_zog.spin to run zog without any I/O support. This only works from HUB RAM for now. It starts zog and optionally takes over all HUB RAM for C code or can continue with further spin code with the C code being confined to its defined memory area.
When using zog this way one has to use the C++ versions of FullDuplexSerial in order to get any I/O. See the test_libzog.cpp for examples of how this works. There is also a C++ version of VMCog in there.
Can you please do me a favor and change your TOOLPATH references to use a shell variable?
This bash command executed from the zog directory will change the make files and save the original:
$ for X in `find . -name Makefile`; do sed 's/^TOOLPATH=/# TOOLPATH=/g' "$X" > tmp; cp "$X" "${X}.save"; cp tmp "$X"; done
To use it in ksh/bash, one should do this:
$ TOOLPATH=<your toolchain install bin path>
$ export TOOLPATH
If nothing else, I guess this post may be a good reference for replacing TOOLPATH for new users.
Thanks.
--Steve
Realized many Makefiles were missing the command:
objcopy -I binary -O binary --reverse-bytes=4 $(BINARY)
when creating the binary. This reverses the order of bytes within each long of the binary so that run_zog and debug_zog no longer have to do it when loading code.
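For anyone wondering what that objcopy step actually does to the file, here is a small C equivalent of reversing the bytes within each long. The function name is made up for illustration; it is a sketch of the transformation, not code taken from run_zog or debug_zog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order inside every 4-byte group of buf, in place.
 * len is assumed to be a whole number of longs. */
static void reverse_bytes_in_longs(uint8_t *buf, size_t len)
{
    size_t i;

    assert(len % 4u == 0u);
    for (i = 0; i < len; i += 4u) {
        uint8_t t;
        t = buf[i];      buf[i]      = buf[i + 3u]; buf[i + 3u] = t;  /* outer pair */
        t = buf[i + 1u]; buf[i + 1u] = buf[i + 2u]; buf[i + 2u] = t;  /* inner pair */
    }
}
```

So the big-endian long 11 22 33 44 produced by zpu-gcc is stored as 44 33 22 11, which is the order the little-endian Propeller reads as $11223344. Doing it once at build time saves the loader from doing it on every start up.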
Also adopted Jazzed's suggestion re: the TOOLPATH environment variable in the Makefiles. You will now need to set TOOLPATH to point to your zpu-gcc compiler installation before running make for the C programs. For example under BASH on Linux:
$ export TOOLPATH=/home/you/ZPU/toolchain/install/bin/
$ make clean
$ make
This could become about 30 and 70 if I reused some init code locations for variables.
There was a plan to adapt the float32 code to work as LMM and run it from an LMM kernel in Zog for float support in C. Looking at float32 I'm now inclined to just run it in another COG as normal but through a mailbox interface from C rather than Spin. This way Zog gets the native PASM speed of float32 and saves HUB memory by loading up a float Cog on start up and reusing the HUB space it came in.
Anyone have any preference here? Or even likely to use it?
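To show roughly what a mailbox interface to a float Cog could look like from the C side, here is a host-side sketch. Everything in it is hypothetical: the command codes, the struct layout and the function names are invented for illustration and are not float32's actual interface. On real hardware float_cog_service would be PASM looping in its own COG, and the C side would spin until the command long returns to idle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command codes; 0 means the float cog is free. */
typedef enum { FCMD_IDLE = 0, FCMD_ADD, FCMD_SUB, FCMD_MUL, FCMD_DIV } float_cmd;

/* Hypothetical mailbox shared between the C program and the float cog. */
typedef struct {
    volatile uint32_t command;  /* written last by the client, cleared by the cog */
    volatile uint32_t a, b;     /* operands as raw 32-bit float patterns */
    volatile uint32_t result;   /* result as a raw 32-bit float pattern */
} float_mailbox;

static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Client side: fill in the operands, then post the command. */
static void mailbox_post(float_mailbox *mb, float_cmd cmd, float a, float b)
{
    assert(mb->command == FCMD_IDLE);   /* previous request must be finished */
    mb->a = float_to_bits(a);
    mb->b = float_to_bits(b);
    mb->command = (uint32_t)cmd;        /* the cog only acts once this is set */
}

/* Cog side: handle one pending request; returns 1 if one was serviced. */
static int float_cog_service(float_mailbox *mb)
{
    float a, b, r;

    if (mb->command == FCMD_IDLE)
        return 0;
    a = bits_to_float(mb->a);
    b = bits_to_float(mb->b);
    switch (mb->command) {
    case FCMD_ADD: r = a + b; break;
    case FCMD_SUB: r = a - b; break;
    case FCMD_MUL: r = a * b; break;
    case FCMD_DIV: r = a / b; break;
    default:       r = 0.0f;  break;    /* unknown command */
    }
    mb->result  = float_to_bits(r);
    mb->command = FCMD_IDLE;            /* signals completion to the client */
    return 1;
}

/* Client side: fetch the result once the command has gone back to idle. */
static float mailbox_result(const float_mailbox *mb)
{
    assert(mb->command == FCMD_IDLE);
    return bits_to_float(mb->result);
}
```

The cost of this scheme is a handful of HUB accesses per operation, in exchange for the arithmetic itself running at native PASM speed in the other COG.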
If nothing else I have to have a result for the Whetstone floating point benchmark that RossH has run under Catalina:)
I have a project in a big code base that most likely uses floating point.
Having an option to include floating point in a COG would be useful.
Meanwhile, I think I have something loading/running on SDRAM via VMCOG.
Too bad it doesn't finish.
Cheers.
--Steve
Is your vmcog_sdram.spin based off of vmcog.spin v0.975 ?
And are you using 20 pages ?
That's the only working combination I have so far.
I notice your SDRAM is ~8% behind TriBlade speed.
SDRAM: fibo(20) = 006765 (03005ms)
TriBlade: fibo(20) = 006765 (02782ms)
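The ~8% figure follows directly from the two timings. A tiny C helper (the name is made up) shows the arithmetic:

```c
#include <assert.h>

/* How much slower, in percent, slow_ms is compared with fast_ms. */
static double percent_slower(double slow_ms, double fast_ms)
{
    assert(fast_ms > 0.0);
    return (slow_ms - fast_ms) / fast_ms * 100.0;
}
```

percent_slower(3005, 2782) works out to 223 / 2782, which is a little over 8%.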
Re: "Having an option to include floating point in a COG would be useful."
Then it will be so.
SDRAM access via VMCOG is horribly slow because I have ZERO room for tricks like using CTRA for clocking and bursting 32bytes at a time. The SdramCache code is much faster and when working will beat the TriBlade performance for fibo(20).
I'll try to answer some of the VMCOG things here:
I had to rewrite the driver to cram things into VMCOG. The behavior is pretty weird so I'll look at it again later. I want SdramCache working first.
SDRAM code is way too big for direct access. I've done lots of chopping today.
What I'm doing is adding code that communicates with the SdramCache manager COG. That code can burst 32 bytes in about 6us (5.3MB/s) and it provides a buffer for the user rather than relying on some multi-HUB transaction per byte/word/long scheme.
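As a quick check on the burst figure, bytes per microsecond is the same number as megabytes per second (taking 1 MB as 10^6 bytes). The helper name below is made up for illustration:

```c
#include <assert.h>

/* Transfer rate in MB/s (10^6 bytes per second) for a burst of
 * `bytes` bytes that takes `microseconds` microseconds. */
static double burst_rate_mb_per_s(double bytes, double microseconds)
{
    assert(microseconds > 0.0);
    return bytes / microseconds;   /* bytes per microsecond equals MB/s */
}
```

burst_rate_mb_per_s(32, 6) gives 32 / 6, about 5.3, matching the number quoted above.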
I had to un-inline as you say to make things fit and have no wiggle room (in the straitjacket again). Right now I see with the debugger that it's running "several" instructions before it does silly stuff. I'm too tired to do much more with it now.
Cheers.
--Steve
BTW, I don't see any source for FIBO. Can you provide that?
By the way that address 0x00005C5 is in the middle of the fibo() routine and the opcode read there should be 0xff which is load immediate -1.
In ZOG there are now about 20-30 instructions just reading PAR parameters during init. I had to add this when separating the Zog PASM code away from the Spin support code.
Zog has 20-30 variables that can be "overlaid" against that init code seeing as the init code only runs once. Perhaps that space saving helps you.