Wow! Thanks for all of the detailed information. That should help a lot. I'll let you know when I get something running. I'm currently trying to see if ZOG will work for my bytecode BASIC compiler, which won't fit in HUB memory when built with Catalina C.
That BASIC compiler sounds intriguing. Not compiling to ZPU byte codes I guess:)
I didn't know about ZPU when I wrote the original compiler but I guess it wouldn't be hard to modify it to generate ZPU instructions. I'll have to look over the instruction set. Thanks for the suggestion!
Do you know if there is a PDF file describing the ZPU architecture? I found a web page describing it, but it would be nice to have a reference to print out.
There is no nice PDF document of the ZPU instruction set.
Some time ago I volunteered to write one. Perhaps now is the time to do so. Be aware that there are some errors and omissions in the instruction definitions on the Zylin web site.
As is often the case, the best reference is some working code. I started with the source code of the ZPU simulator written in Java. There is also my version in C, attached.
Do you already have an interpreter for your byte codes on the Prop?
Perhaps using ZPU byte codes and ZOG would be a short circuit in development of your BASIC, if the stack based ZPU machine is as efficient as you would like.
Yes, I already have an interpreter for the Propeller, but ZOG might still be a better choice. I have an interpreter for my bytecodes written in C that I have run on PIC and AVR processors; I ported it to the Propeller using Catalina C and it seems to work. I've also written one in Propeller assembly that I haven't had time to test yet. Still, ZOG would probably be the better choice. I could even support compiled Basic code linked with C/C++ code if I use ZOG!!
VMCOG comes with a built-in SPI SRAM driver for PropCade, and I am adding (hopefully today) a version for the SPI RAMs on Morpheus. Either would be very easy to modify for SPI RAM on other pins.
Later this week I am adding a driver for my FlexMem board (a four-bit-wide bus using four SPI RAMs for >2MB/sec burst transfers, up to 6.6MB/sec with the special two-cog driver I am working on).
VMCOG was designed to allow easily adding additional drivers for any sort of memory interface; I intended it to be a HAL for different memory interfaces. Please see the comments and #ifdef'ed sections in the code.
The currently released VMCOG only supports 64KB of VM; however, I am working on a version that will support at least 2MB of VM, much more if I go to a multi-level TLB.
Regards,
Bill
(p.s. I used XLISP many years ago...)
That sounds quite cool. Is there any chance it could be made to work with a pair of SRAMs in a two bit wide bus?
Maybe I should try porting XLISP to the Propeller using ZOG/VMCOG! I'm sure there are lots of people who would want to program the Propeller in Lisp! :-)
I could even support compiled Basic code linked with C/C++ code if I use ZOG!!
That would be brilliant. One could create applications in BASIC if that is one's preferred language and still make use of all kinds of C libraries. We would have common drivers for any device in both BASIC and C.
That sounds quite cool. Is there any chance it could be made to work with a pair of SRAMs in a two bit wide bus?
It would be pretty easy to do, however the speed gain over single bit access would be minimal due to the overhead.
The main reason I have not made an 8-bit-wide version is that it would require ten Prop pins, and the cost/benefit is not there: eight 23K256 chips cost $12 for 256KB of RAM, which would result in an unpalatable price for any board using eight such chips.
Maybe I should try porting XLISP to the Propeller using ZOG/VMCOG! I'm sure there are lots of people who would want to program the Propeller in Lisp! :-)
Now that would be interesting... I wonder if I still have my old XLISP expert system code in a readable format?
I could even support compiled Basic code linked with C/C++ code if I use ZOG!!
That would be brilliant. One could create applications in BASIC if that is one's preferred language and still make use of all kinds of C libraries. We would have common drivers for any device in both BASIC and C.
Wow! That would be great!
I'm just looking over the Zog code (1.3) for the first time, and it seems to me that the 'nos' (next on stack, I'm assuming) variable is only ever used right after it's set. Is it possible that 'nos' was going to be an optimization, but then got dropped? I think you could save yourself some cycles and longs by removing it.
Zog is looking awesome, btw! I'm really looking fwd to playing with it.
Jonathan
Edit: Also, it looks like the mult16x16 routine does a full 32 cycles; could you set the number of bits externally before the call?
Thank you Lonesock, well spotted. Every cycle counts and LONGs are precious.
In my C version of ZPU pretty much every instruction uses nos, which might as well have been called "temp" or something. A few of those survived the transition to PASM. I don't recall having an actual use for nos in mind, except to print a register dump compatible with that of the GHDL test harness for the VHDL ZPU, and even there it is not necessary.
I'll have a look at the 16x16 mult.
That reminds me: the ZPU in C should be made to run under ZOG on the Prop, just to complete the circle.
TOS holds the top of the stack, but you need to pop the next item off the stack to do any ADD, SUB, etc. operation, so you need NOS, which is really just a temp register.
So those redundant moves are gone and "nos" is now "data".
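A minimal C sketch of why a binary operation needs that temporary: it models a stack machine that caches the top of stack in a variable, as Zog does with its tos register. The struct and function names (struct vm, vm_push, vm_pop_next, op_add) are hypothetical and are not taken from the Zog source.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

/* Hypothetical stack machine with the top of stack cached in a register. */
struct vm {
    uint32_t tos;               /* cached top of stack (like Zog's tos) */
    uint32_t mem[STACK_WORDS];  /* items below the top; grows downward */
    int sp;                     /* index of the item just below tos */
};

static void vm_init(struct vm *m) {
    m->tos = 0;                 /* dummy value, spilled by the first push */
    m->sp = STACK_WORDS;
}

/* Push: spill the old tos to memory; the new value becomes tos. */
static void vm_push(struct vm *m, uint32_t value) {
    assert(m->sp > 0);
    m->mem[--m->sp] = m->tos;
    m->tos = value;
}

/* Pop the item below tos so the caller can hold it in a temporary. */
static uint32_t vm_pop_next(struct vm *m) {
    assert(m->sp < STACK_WORDS);
    return m->mem[m->sp++];
}

/* ADD: the popped item only lives in a temporary ("nos", later "data"). */
static void op_add(struct vm *m) {
    uint32_t nos = vm_pop_next(m);
    m->tos += nos;
}
```

In this model the temporary is consumed by the same operation that fills it and never has to survive to the next opcode, so any scratch register will do.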
Just finished making a test rig for the MCP23S17 port on the new boards; now I can finish my MCP23S17 object and easily test those ports in the future!
* Checking for zpu_im, you could do this instead:
                cmpsub  data, #$80 wc       'Check for IM instruction. This saves table lookup
          if_c  jmp     #zpu_im             'for the most common op. 7% fibo speed gain!
That way, you don't need the "and data, #$7F" line inside zpu_im, :next.
* And you can try this for the multiplication code:
                ' make x the smaller of the 2 parameters
                mov     t2, x
                max     x, y                ' x := unsigned min(x, y)
                min     y, t2               ' y := unsigned max(y, original x)
                mov     t1, #0              ' clear the product accumulator (skip if the caller already does)
mmul            shr     x, #1 wc, wz        ' C = bit shifted out of x, Z = no bits left
          if_c  add     t1, y
                shl     y, #1
          if_nz jmp     #mmul
Note that the first 3 ops are optional, but will increase the average speed, assuming you aren't using the mult operation exclusively for squaring values [8^)
Jonathan
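To check the loop's behaviour on a PC, here is a C model of the shift-and-add multiply above (a sketch, not Zog's code). The loop stops as soon as no set bits remain in x, so with the smaller operand in x it makes only as many passes as that operand has significant bits, rather than a fixed 32. The function name shift_add_mul and the passes counter are illustrative additions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of the PASM loop: returns the low 32 bits of x * y and, if
 * passes is not NULL, stores how many loop passes were made. */
static uint32_t shift_add_mul(uint32_t x, uint32_t y, unsigned *passes) {
    uint32_t product = 0;            /* t1 in the PASM version */
    unsigned n = 0;

    /* Optional step: put the smaller operand in x so the loop ends sooner. */
    if (y < x) {
        uint32_t t = x;
        x = y;
        y = t;
    }

    do {
        uint32_t carry = x & 1u;     /* shr x, #1 wc: C = old low bit */
        x >>= 1;
        if (carry)
            product += y;            /* if_c add t1, y */
        y <<= 1;                     /* shl y, #1 */
        n++;
    } while (x != 0);                /* if_nz jmp #mmul (Z from the shr) */

    if (passes != NULL)
        *passes = n;
    return product;
}
```

For example, 1000 * 3 takes 2 passes either way round because of the swap; without it, x = 1000 would need 10 passes.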
@Lonesock, Nice to see you and others interested in ZOG.
@Heater,
I was having malloc troubles ... now I don't.
Before, I could only get malloc to work about 3 times, and each of those happened to be right after I had run a HUB-based program. I've added some code to fill memory with garbage and then clear it to zero before I boot. If I don't clear memory before boot, malloc fails; if I do, malloc works reasonably. I'm still not sure why I can't successfully malloc more than 16.6MB, but I don't think that's a memory hardware problem. Results are posted below.
--Steve
Note below that "Free big buffer" after malloc means success, and "Oops ..." means malloc failed.
Filling first 1MB memory with garbage.
ZOG v1.2 (CACHE)
Starting SD driver...0000FFFF
Mounting SD...00000000
Booting mall.bin
00000000
Filling memory with garbage .....
Reading image... 17339 Bytes Loaded.
Done
Waiting 1 seconds before program check...
Starting SD driver...0000FFFF
Mounting SD...00000000
Checking image... 17339 Bytes Checked.
Program Load OK.
Running Program!
Malloc Testing.
_hardware = 0 _cpu_config = 2
_use_syscall = 0 ZPU_ID = 18
Malloc big buffer size: 1024
Oops, malloc failed.
Malloc big buffer size: 8192
Oops, malloc failed.
Malloc big buffer size: 8000000
Oops, malloc failed.
Malloc big buffer size: 16000000
Oops, malloc failed.
Malloc big buffer size: 30000000
Oops, malloc failed.
Malloc big buffer size: 16777216
Oops, malloc failed.
Malloc big buffer size: 16711680
Oops, malloc failed.
Malloc big buffer size: 16646144
Oops, malloc failed.
Malloc big buffer size: 16515072
Oops, malloc failed.
Malloc big buffer size: 16252928
Oops, malloc failed.
Malloc big buffer size: 16000000
Oops, malloc failed.
All done!
Filling first 1MB memory with garbage, then cleaning up before boot.
ZOG v1.2 (CACHE)
Starting SD driver...0000FFFF
Mounting SD...00000000
Booting mall.bin
00000000
Filling memory with garbage .....
Clearing memory .........
Reading image... 17339 Bytes Loaded.
Done
Waiting 1 seconds before program check...
Starting SD driver...0000FFFF
Mounting SD...00000000
Checking image... 17339 Bytes Checked.
Program Load OK.
Running Program!
Malloc Testing.
_hardware = 0 _cpu_config = 2
_use_syscall = 0 ZPU_ID = 18
Malloc big buffer size: 1024
Free big buffer.
Malloc big buffer size: 8192
Free big buffer.
Malloc big buffer size: 8000000
Free big buffer.
Malloc big buffer size: 16000000
Free big buffer.
Malloc big buffer size: 30000000
Oops, malloc failed.
Malloc big buffer size: 16777216
Oops, malloc failed.
Malloc big buffer size: 16711680
Oops, malloc failed.
Malloc big buffer size: 16646144
Free big buffer.
Malloc big buffer size: 16515072
Free big buffer.
Malloc big buffer size: 16252928
Free big buffer.
Malloc big buffer size: 16000000
Free big buffer.
All done!
FWIW, the value 16646144 == (16 * 1024 * 1024) - (128 * 1024), which is easy to see if you look at the number in hex (0xFE0000). Put another way, your maximum allocation seems to be one half of (32 MB - 256 KB). Perhaps that gives some hints about how the C runtime heap subsystem is managing memory.
In other words, the 16.6 MB figure is in decimal units (1000 * 1000 bytes per MB). That is not a "binary" 16.6 MB: 16646144 bytes is really about 15.87 MB in binary units (1024 * 1024 bytes per MB).
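A small C check of that arithmetic, using the sizes from the cleared-memory log: 16646144 was the largest request that succeeded and 16711680 the smallest of the 2^24 - 2^n requests that failed. The constant names and the show_size helper are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

#define KIB (1024UL)
#define MIB (1024UL * KIB)

/* Sizes taken from the malloc log above. */
static const unsigned long largest_ok   = 16646144UL; /* succeeded */
static const unsigned long smallest_bad = 16711680UL; /* failed */

/* Print a byte count in hex, decimal MB (10^6) and binary MiB (2^20). */
static void show_size(unsigned long bytes) {
    printf("%lu bytes = 0x%08lX = %.3f MB (decimal) = %.3f MiB (binary)\n",
           bytes, bytes, bytes / 1e6, (double)bytes / MIB);
}
```

Calling show_size(largest_ok) prints `16646144 bytes = 0x00FE0000 = 16.646 MB (decimal) = 15.875 MiB (binary)`, which is the 16.6 MB versus 15.87 MB distinction made above.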
Great stuff re: IM and CMPSUB.
IM is on the critical path as it is the most commonly used instruction. Removing the dispatch table lookup for IM made the fibo test 7% faster when running from HUB.
Looking at IM, I have now thrown out that decode_mask and used self-modifying code for the jmp in the execute loop to get to the right IM. Saves another LONG.
So IM is now:
zpu_im_next     shl     tos, #7
                or      tos, data
                jmp     #done_and_inc_pc
zpu_im_first    call    #push_tos
                mov     tos, data
                shl     tos, #(32 - 7)      'Sign extend
                sar     tos, #(32 - 7)
                movs    which_im, #zpu_im_next
And in the execute loop we have:
                cmpsub  data, #$80 wc       'Check for IM instruction. This saves table lookup
which_im  if_c  jmp     #zpu_im_first       'for the most common op. 7% fibo speed gain!
                movs    which_im, #zpu_im_first   'Self-modifying code at which_im selects first or subsequent IM.
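A C model of the same logic may make the self-modifying jump easier to follow. It assumes the standard ZPU IM semantics: the first IM in a run pushes its sign-extended 7-bit operand, and each following IM shifts the top of stack left 7 bits and ORs in 7 more. The in_im flag and the names step_im and struct im_state are hypothetical; in the PASM the flag is the jump target stored at which_im.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { IM_STACK_WORDS = 16 };

/* Hypothetical model: in_im stands in for the self-modified jump at which_im. */
struct im_state {
    int32_t stack[IM_STACK_WORDS];
    int sp;          /* number of items; stack[sp - 1] is the top */
    bool in_im;      /* true while inside a run of consecutive IMs */
};

/* Returns true if op was an IM and was handled here, false if the caller
 * should fall back to the normal dispatch table. */
static bool step_im(struct im_state *s, uint8_t op) {
    if (op < 0x80) {               /* cmpsub data, #$80 wc leaves C clear */
        s->in_im = false;          /* movs which_im, #zpu_im_first */
        return false;
    }
    uint32_t data = op - 0x80u;    /* low 7 bits, as left in data by cmpsub */
    if (!s->in_im) {
        /* zpu_im_first: push the 7-bit value, sign-extended from bit 6 */
        assert(s->sp < IM_STACK_WORDS);
        int32_t v = (data & 0x40u) ? (int32_t)data - 0x80 : (int32_t)data;
        s->stack[s->sp++] = v;
        s->in_im = true;           /* movs which_im, #zpu_im_next */
    } else {
        /* zpu_im_next: shift the top left 7 bits and OR in the new bits */
        assert(s->sp > 0);
        uint32_t top = (uint32_t)s->stack[s->sp - 1];
        s->stack[s->sp - 1] = (int32_t)((top << 7) | data);
    }
    return true;
}
```

For example, 300 needs two IMs: 0x82 pushes 2, then 0xAC ORs in 44, giving (2 << 7) | 44 = 300. A lone 0xFF pushes -1.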
Jazzed,
That's good. So something is not being initialized properly. That's not good, but perhaps we can live with zeroing RAM on start-up. How long does that take?
GCC malloc is pretty complicated. I can't imagine it needing so much memory for managing memory, but it is possible. Meanwhile I'll see what I can do with what malloc gives me. Maybe if I ask for memory in chunks rather than one big blob, it will give me more.
Those observations about hex numbers are dead-on. Here's part of my test code:
malloctest((1<<24));
for(n = 16; n < 20; n++)
    malloctest((1<<24)-(1<<n));
malloctest(16000000);
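The body of malloctest() is not shown in the thread. The following is a hypothetical version that prints the same messages as the logs above, plus a helper (big_sizes, also hypothetical) that generates the size sequence from the loop.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical malloctest(): prints the messages seen in the logs and
 * returns 1 if the allocation succeeded, 0 if it failed. */
static int malloctest(size_t size) {
    printf("Malloc big buffer size: %lu\n", (unsigned long)size);
    char *buf = malloc(size);
    if (buf == NULL) {
        printf("Oops, malloc failed.\n");
        return 0;
    }
    printf("Free big buffer.\n");
    free(buf);
    return 1;
}

/* Fill out[] with the sizes from the loop: 2^24, 2^24 - 2^n for
 * n = 16..19, then 16000000. Returns the number of sizes written. */
static size_t big_sizes(size_t out[6]) {
    size_t k = 0;
    out[k++] = (size_t)1 << 24;
    for (int n = 16; n < 20; n++)
        out[k++] = ((size_t)1 << 24) - ((size_t)1 << n);
    out[k++] = 16000000;
    return k;
}

/* Run malloctest() over those sizes; returns how many allocations succeeded. */
static int run_big_tests(void) {
    size_t sizes[6];
    size_t count = big_sizes(sizes);
    int ok = 0;
    for (size_t i = 0; i < count; i++)
        ok += malloctest(sizes[i]);
    return ok;
}
```

The generated sizes are 16777216, 16711680, 16646144, 16515072, 16252928 and 16000000, matching the tail of both logs. In the cleared-memory log, the last four of these succeeded.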
@Heater, I'll do a couple more days of testing and then look at some integration possibilities. You've seen the zog kernel changes I made... there are 3 places where I in-lined code rather than calling zpu_cache, to improve performance: execute, push_tos, and pop. Everything is enclosed in USE_JCACHED_MEM. I'm still working in v1_2. Do you want me to migrate to v1_3? I have to get back into hardware mode real soon....
I really would like to get this running with Catalina which is really complicated to me, but I think I'm safe waiting until after PCBs go out to FAB before getting back to it.
As far as zeroing at start-up goes, it's all done in Spin, and I'm just clearing 128KB for testing, which takes a couple of seconds. I'm not doing anything smart with it, such as looking for the .bss section, which (if I remember correctly) needs to be zeroed; I noticed heap_ptr lives there. The end of the binary appears to be the beginning of .bss, so you could probably get away with just clearing a chunk after the program load. I think an Intel HEX or other hex file would have the .bss size.
Cheers,
--Steve
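A sketch of the "clear a chunk after the program load" idea, written in C for illustration (the actual loader is Spin). It assumes, as suggested above, that .bss starts exactly where the loaded image ends and that its size is known from the linker output; the function name clear_bss_after_image is hypothetical. C requires static objects without an initializer to start at zero, which is why a heap pointer living in .bss matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical loader step: after copying a program image of image_size
 * bytes to the start of RAM, zero the bss_size bytes that follow it.
 * Assumes .bss begins right where the loaded image ends. */
static void clear_bss_after_image(uint8_t *ram, size_t ram_size,
                                  size_t image_size, size_t bss_size) {
    assert(image_size <= ram_size);
    if (bss_size > ram_size - image_size)
        bss_size = ram_size - image_size;   /* never run past the end of RAM */
    memset(ram + image_size, 0, bss_size);
}
```

Clearing only the .bss range rather than a fixed 128KB would keep the start-up cost proportional to what the program actually needs.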
I'm trying to get debug_zog.spin to work on my Hydra and am blocked by what seems like a trivial change. To get the clock set up correctly for the Hydra, I added the following code to debug_zog.spin, expecting to be able to use -DHYDRA on the bstc command line to select the Hydra clock settings (and maybe other Hydra-specific stuff later). Unfortunately, bstc doesn't like this code. What am I doing wrong? The bstc manual says that #ifdef, #else, and #endif are supported. I'm running bstc 0.15.3.
bstc -Ox debug_zog
bstc without arguments gives you a list of options.
Other popular bstc options
bstc -Ocgrux -d COM4 -p0 debug_zog
The bst IDE is also available; it has a similar look and feel to the Propeller Tool and supports the bstc options.
--Steve
Thanks for the tip on using -Ox. That worked great!
I also figured out that it doesn't seem to like -d COM11. I had to use -d \\.\COM11 instead. I guess that's really a problem with Windows not with bstc though.
I haven't tried the IDE because I want to run bstc from a Makefile. I'm an old geezer and tend to use command-line tools if I can. :-)
Thanks!
David
I got the bstc download working, and now I'm having trouble getting ZOG to run on the Hydra. I changed the clock speed and I'm using the updated test_libzog.bin file that was built for 80MHz. I also uncommented the setup of the serial port and the first few debug messages in debug_zog.spin, but I get no output on my serial terminal when I run the program. I'm using the test.bin file created by building the 'test' directory in the ZOG distribution archive, and PuTTY as the terminal program, set to 115200 baud and connected to the COM port the Propeller is attached to. Any idea what might be going wrong?
I've run into some trouble with malloc testing and USE_HUB_MEMORY. I'm attaching my source that has output for you to have a look. I'm scratching my head a bit over it.
--Steve
Hmm. I do the same test on SDRAM Cache and it works either way. I suppose some limit was breached using HUB memory; I don't know for sure. Now I'm testing 8MB SDRAM and other sizes. Will post results later.
PuTTY with a serial port? I *still* learn something new every day.
If you're using debug_zog.spin, test_libzog.bin doesn't seem to matter.
Still, it sounds like you've done everything right so far. One thing to try is using the fibo.bin or hello.bin instead of test.bin. Are there any LEDs blinking on your Hydra? Can you use the Parallax Serial Terminal for your serial port? It comes with Propeller Tool.
I lost my Hydra 10MHz crystal, so I can't really test your clock settings exactly. The fibo.bin works with my 5MHz Hydra though.
Perhaps that is best. We now have some changes in the pipe for v1_4, but they can be easily merged with your mods.
I'm surprised that anything needs zeroing prior to start-up. After all, ZPU code normally runs on a core in an FPGA with no loader or other OS support. Do FPGAs zero their RAM blocks on reset?
I'll have a look at your malloc test when I have a moment.
Catalina which is really complicated to me
Memo to Zog Inc. marketing department: "Zog is much simpler than other C solutions for the Propeller":)
A little check list:
1) Forget about run_zog and test_libzog for now. They are somewhat experimental and do not work from external RAM at the moment.
2) Start with debug_zog and fibo. Be sure you can compile it and make a fibo.bin, then copy fibo.bin to the zog directory.
3) Be sure you have "fibo.bin" in the file statement in debug_zog and also in the parameter to sd.popen in the load_bytecode method.
4) Be sure "USE_VIRTUAL_MEMORY" is commented out and "USE_HUB_MEMORY" is in effect.
5) When that works, simply reverse the defines to use external RAM.
6) If it does not work in either case, try uncommenting SINGLE_STEP; we should be able to see something happening with that. Just hit SPACE to execute single instructions.
This all assumes your serial connection is working (easy to see with debug_zog) and that your SD card is working for the ext RAM load (again, easy to see with debug_zog).
If you or anyone else has some juicy questions I'll try to get into the habit of adding the info there.
Here's my latest SDRAM fibo result.
I just couldn't help stumbling over an SDRAM Cache optimization.
fibo(00) = 000000 (00000ms)
fibo(01) = 000001 (00000ms)
fibo(02) = 000001 (00000ms)
fibo(03) = 000002 (00000ms)
fibo(04) = 000003 (00001ms)
fibo(05) = 000005 (00001ms)
fibo(06) = 000008 (00003ms)
fibo(07) = 000013 (00005ms)
fibo(08) = 000021 (00008ms)
fibo(09) = 000034 (00013ms)
fibo(10) = 000055 (00022ms)
fibo(11) = 000089 (00036ms)
fibo(12) = 000144 (00058ms)
fibo(13) = 000233 (00094ms)
fibo(14) = 000377 (00153ms)
fibo(15) = 000610 (00248ms)
fibo(16) = 000987 (00404ms)
fibo(17) = 001597 (00658ms)
fibo(18) = 002584 (01067ms)
fibo(19) = 004181 (01725ms)
fibo(20) = 006765 (02783ms)
fibo(21) = 010946 (04486ms)
fibo(22) = 017711 (07237ms)
fibo(23) = 028657 (11687ms)
fibo(24) = 046368 (18896ms)
Cheers.
--Steve
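The times above grow by a nearly constant factor from one n to the next (18896 / 11687 ≈ 1.617), close to the golden ratio. That is what you would expect if the benchmark is the naive doubly recursive fibo(); this is an assumption, since fibo.c is not shown here. A sketch that counts the calls makes per-call cost comparable across memory back ends:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the benchmark: the classic doubly recursive fibo(). */
static uint32_t fibo(uint32_t n) {
    return n <= 1 ? n : fibo(n - 1) + fibo(n - 2);
}

/* Number of fibo() calls made while evaluating fibo(n):
 * calls(n) = 1 + calls(n - 1) + calls(n - 2), calls(0) = calls(1) = 1. */
static uint64_t fibo_calls(uint32_t n) {
    uint64_t a = 1, b = 1;          /* calls(0), calls(1) */
    if (n == 0)
        return a;
    for (uint32_t i = 2; i <= n; i++) {
        uint64_t c = 1 + a + b;
        a = b;
        b = c;
    }
    return b;
}
```

For n = 24 the model gives 150049 calls, so 18896 ms works out to roughly 126 µs per call on this SDRAM cache setup, again assuming the benchmark has this shape.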