Will this work as a fast overlay loader?
lonesock
Posts: 917
Hi, everybody.
I have some untested code, but was hoping for some comments from those who have already done this sort of thing. It uses a counter PHSx to auto-increment the Hub address pointer, meaning the overlays themselves need to have the instructions spaced out every 4 longs. You can interleave 4 different overlays, of course, to minimize lost Hub RAM. If this idea doesn't look sound, I don't want to spend a bunch of time on a pre-processor or SPIN code to do the interleaving automatically.
{{
  Jonathan Dummer
  untested overlay code
}}

PUB dummy

DAT
                    ORG     0
{{
  Inputs
  * overlay_pointer - Hub location of the start of the overlay code
  * overlay_longs   - number of instructions in this overlay

  The actual overlay instructions must be every 4 longs, because we are
  using a counter to auto increment the Hub Pointer. So, feel free to
  interlace 4 different overlay functions. I should probably write some
  SPIN code to do the interleaving, unless it's in a pre-processor which
  would have to run on the PC.

  Warning: Overwrites the Z flag!
}}
QuadOverlay
                    ' Did we already load this?
                    cmp     overlay_pointer,last_ovr_pointer wz
        if_z        jmp     #Overlay_Execute
                    ' save the address for next time, and find the end location
                    mov     last_ovr_pointer,overlay_pointer
                    mov     ovr_jump_return,overlay_longs
                    add     ovr_jump_return,overlay_address
                    mov     ovr_jump_return,jump_instruction
                    ' start writing to the overlay's cog address
                    movd    Overlay_Load,overlay_address
                    ' and start here in the Hub
                    mov     phsb,overlay_pointer
Overlay_Load        rdlong  0-0,phsb
                    add     Overlay_Load,incDest
                    djnz    overlay_longs,#Overlay_Load
Overlay_Execute     jmpret  overlay_address,ovr_jump_return
QuadOverlay_ret     ret

{===== PASM Initialized variables, constants, or Parameters =====}
overlay_address     long    0       ' where in cog to start the overlay
incDest             long    512     ' add 1 to the Destination of an instruction
last_ovr_pointer    long    -1      ' this is guaranteed to not be valid
jump_instruction    jmp     #0      ' We'll add this to the end of the overlay

{===== PASM Scratch Variables =====}
overlay_pointer     res     1       ' Set this to the hub address of the 1st overlay inst.
overlay_longs       res     1       ' How many instructions
ovr_jump_return     res     1       ' the address of the end of this overlay

                    FIT     496
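The interleaving mentioned in the comments above could be done on the PC with something like the following. This is only a rough sketch, not part of the posted loader: it assumes the loader really does step the Hub pointer by 4 longs per read, so that instruction i of the overlay in lane k lands at long index 4*i + k. The names `interleave_overlays`, `overlay_pointer` and `LANES` are made up for illustration.

```python
from typing import List, Sequence

LANES = 4  # assumption: the loader advances the Hub pointer 4 longs per read


def interleave_overlays(overlays: Sequence[Sequence[int]], fill: int = 0) -> List[int]:
    """Pack up to 4 overlays so that overlay k's instruction i sits at long 4*i + k.

    Shorter overlays leave their unused slots set to `fill`.
    """
    if not 1 <= len(overlays) <= LANES:
        raise ValueError("expected 1 to 4 overlays")
    depth = max(len(overlay) for overlay in overlays)
    image = [fill] * (depth * LANES)
    for lane, overlay in enumerate(overlays):
        for i, instruction in enumerate(overlay):
            image[LANES * i + lane] = instruction & 0xFFFFFFFF
    return image


def overlay_pointer(base_address: int, lane: int) -> int:
    """Hub byte address of the first instruction of the overlay in `lane`.

    `base_address` is the (hypothetical) Hub byte address of the packed image;
    each long is 4 bytes, so lane k starts 4*k bytes in.
    """
    if not 0 <= lane < LANES:
        raise ValueError("lane must be 0..3")
    return base_address + 4 * lane
```

The value returned by `overlay_pointer` is what would be written to overlay_pointer before calling QuadOverlay, and overlay_longs would be the length of that one overlay, not of the packed image.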
Thanks in advance,
Jonathan
edit: fixed the line "mov phsb,overlay_pointer"
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.
Post Edited (lonesock) : 12/11/2009 4:53:40 PM GMT
Comments
http://forums.parallax.com/showthread.php?p=730815
Post Edited (kuroneko) : 12/9/2009 11:59:51 PM GMT
Half of the code size in the above post is a lazy overlay/execute. This lets me say (the equivalent of) "Overlay & execute Mult_32_32", and call the routine. If this was not the last overlay loaded, then the load is executed; otherwise it just jumps to the overlay and executes it (a total of 6 instructions of overhead to execute: call QuadOvr, cmp address, jmp, jmpret, ret_from_overlay, ret from QuadOvr). So I never have to explicitly track the last overlay function I called. I did not see a similar mechanism in the code that Cluso posted in his overlay thread (though, of course, this is probably because he doesn't need/use that feature [8^). If the lazy-overlay stuff is removed, the code is just 5 instructions (not including the initial set-Hub-address and set-number-of-longs).
thanks,
Jonathan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.
The Phil/Cluso overlay loader is 5 instructions in the start up and then 3 instructions per LONG loaded. Minus one, because the last LONG loaded overwrites the DJNZ.
So if you are getting down to 2 instructions in the start up you are winning.
The Phil/Cluso loader requires an even number of LONGS to be loaded. Yours does not, so that's 3 instructions shaved off the loading time.
The Phil/Cluso loader loop is 6 instructions, loading two longs each time. Yours is only 3.
Looks like 7 LONGS of code space saved. Starts to sound very interesting. ZiCog needs all the LONGS it can get.
In the emulation engine I don't think we want the overhead of lazy loads. There are lots of overlays being swapped all the time. It might cost more to do the checking than it saves on not loading occasionally.
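To put rough numbers on the comparison above, here is a small back-of-the-envelope model. It counts instructions only, using the figures quoted in this post (5 start-up plus 3 per LONG minus 1 for Phil/Cluso, 2 start-up plus 3 per LONG for the counter-based loader); real timing will differ because Hub reads have to wait for the Hub window. The 2-instruction cost for the lazy check (the cmp and the conditional jmp) is an assumption read off the listing, and all function names are invented for this sketch.

```python
def phil_cluso_instructions(longs: int) -> int:
    """Instructions executed by the Phil/Cluso loader, per the figures in this thread.

    The loader needs an even number of longs, so odd sizes are padded by one.
    5 start-up instructions, 3 per long, minus 1 because the last long
    overwrites the DJNZ.
    """
    padded = longs + (longs % 2)
    return 5 + 3 * padded - 1


def counter_loader_instructions(longs: int, startup: int = 2) -> int:
    """Instructions executed by the counter-based loader: start-up plus 3 per long."""
    return startup + 3 * longs


def lazy_check_break_even(longs: int, check_cost: int = 2, startup: int = 2) -> float:
    """Fraction of calls that must find the overlay already loaded for the check to pay.

    With hit rate h, the check costs check_cost on every call and saves one
    full load on a hit, so it pays off when h * load > check_cost.
    """
    load = counter_loader_instructions(longs, startup)
    return check_cost / load
```

For example, `lazy_check_break_even(10)` comes out at 2/32, so under these assumptions the check would only need to hit about one call in sixteen to break even on instruction count for a 10-LONG overlay. Whether that happens often enough in an emulator loop is the real question.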
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
It has already been done and tested here (heater uses it in ZiCog). Acknowledgements in the code. Any improvements/suggestions appreciated.
Assembly Overlay Loader for Cog FAST: http://forums.parallax.com/showthread.php?p=730815
It can be used as a call or just execute it. It checks if already loaded. The addresses and size are computed and optimised prior so it runs the fastest. The overlay has to be a multiple of 2 longs and code will automatically fill the wasted long if needed.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz   MultiBladeProp is: www.bluemagic.biz/cluso.htm
So the questions are: Does lonesock's overlay loader work? Does it save any execution time? (Seems not.) Does it save any space in the COG? Is it worth the effort to change, given the need for a pre-processor?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Post Edited (heater) : 12/11/2009 5:04:45 PM GMT
This would mean a few different things:
1 - You would require the return jmp to be part of the overlay instruction set inside Hub RAM, which in turn requires that the return jump be to a known location. If you have the Overlay code start at the beginning of the cog (right after the ORG 0), then you can terminate all your overlays with "jmp #6" (see requirement 3...it gets incremented by 1).
2 - You will be loading the final 0 to the cog, so you end up wasting one long of storage in the cog, and one long per overlay in the Hub.
3 - The other thing that placing the loader at location 0 means is that you have to fall through gracefully on the 1st pass. For simplicity, I'll put a jmp to the entry point as the first instruction, and the overlay will effectively start at 1 instead of 0, so I update my overlays' final instruction to be "jmp #7".
4 - if you really need a NOP inside the overlay, use something like "if_never jmp #0".
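A PC-side pre-processor could apply points 1, 2 and 4 mechanically. The sketch below assumes the scheme being discussed stops loading when it reads a zero long, which is how I read the list above; it is not tested against real hardware. The encodings are my assumption from the Propeller 1 instruction format ("jmp #0" as $5C7C0000, and the same word with the condition bits cleared for "if_never jmp #0"), and `prepare_zero_terminated` is a made-up name.

```python
NOP = 0x00000000
JMP_IMMEDIATE = 0x5C7C0000    # assumed encoding of "jmp #0"; low 9 bits hold the target
IF_NEVER_JMP_0 = 0x5C400000   # assumed encoding of "if_never jmp #0" (condition bits = 0000)


def prepare_zero_terminated(overlay, return_address):
    """Return the Hub image for one overlay that is loaded until a zero long is read.

    - every all-zero long (a plain NOP) is replaced with a non-zero instruction
      that never executes, so it cannot end the load early;
    - a "jmp #return_address" is appended as the overlay's final instruction;
    - a single zero long is appended as the terminator.

    The overlay must contain instructions only: a data long of 0 cannot be
    told apart from a NOP and would be rewritten too.
    """
    if not 0 <= return_address <= 0x1FF:
        raise ValueError("return_address must be a 9-bit cog address")
    body = [IF_NEVER_JMP_0 if word == NOP else word & 0xFFFFFFFF for word in overlay]
    return body + [JMP_IMMEDIATE | return_address, 0]
```

With the loader at location 0 as described in point 3, the call would be `prepare_zero_terminated(words, 7)`, matching the "jmp #7" above.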
NOTE: there was another typo in my first post (as well as my having callret operands backwards): the line initializing phsb should have read "mov phsb,overlay_pointer".
Jonathan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.