newbie question: Inline assembly?

Roger Milne · 2009-09-17 03:33

I am attempting to optimize some of my code from Spin to Assembly.

I hit the books and couldn't see any way to inline assembly within my Spin method. Nor could I find a way to call an assembly function stored in a DAT block without spawning another cog to execute that code. Is this true? Help! I'm itching to make my code faster!

Thanks!!

Roger

Bill Henning · 2009-09-17 03:36

You are right, it has to be launched in a different cog.

potatohead · 2009-09-17 05:45

You can have that other COG launched and waiting to run though. Depending on what you want to do, this can be damn quick! Look at how Chip did the graphics functions in the demo program.

Have your assembly cog watch for a one in some long in hub memory. When it sees it, it loads its parameters and does its thing, writing a zero when done.

Your parent cog watches for the 0 while doing its thing.

Share data between the two with a block of longs in the HUB.
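A minimal sketch of that flag-and-mailbox pattern, assuming a two-long mailbox in hub memory. The names `command`, `value`, `Double` and `entry` are made up for illustration, and the "work" (doubling a number) is only a stand-in for whatever you actually want to speed up:

```spin
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

VAR
  long  command                 ' mailbox flag: nonzero = request pending, 0 = idle/done
  long  value                   ' parameter in, result out (directly follows command)

PUB Main | result
  command := 0
  cognew(@entry, @command)      ' launch the worker once; PAR = hub address of command
  result := Double(21)          ' hand a job to the waiting cog
  repeat                        ' keep the Spin cog alive

PUB Double(n)
  value := n                    ' write the parameter first
  command := 1                  ' then raise the flag
  repeat while command          ' worker writes 0 when it is finished
  return value

DAT
              org     0
entry         mov     cmd_ptr, par            ' hub address of command
              mov     val_ptr, par
              add     val_ptr, #4             ' hub address of value
wait          rdlong  t1, cmd_ptr     wz      ' poll the flag
        if_z  jmp     #wait
              rdlong  t1, val_ptr             ' fetch the parameter
              shl     t1, #1                  ' the "work": multiply by two
              wrlong  t1, val_ptr             ' publish the result
              wrlong  zero, cmd_ptr           ' write 0 to signal completion
              jmp     #wait

zero          long    0
cmd_ptr       res     1
val_ptr       res     1
t1            res     1
```

The parent does not have to block in `repeat while command`; it can set the flag, go do something else, and check the flag later.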

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Safety Tip: Life is as good as YOU think it is!

Kye · 2009-09-17 21:37

Since Spin is not assembly, it cannot be inlined. The two have very different constructs and implementations.

Not really a problem, however, as you have 8 cores. Use 'em.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,

Peter Jakacki · 2009-09-17 22:44

The program memory for each cog is independent and only 512 longs, whereas the 32K bytes of hub memory are accessible only as slow data memory. So Spin object code resides in hub memory and is interpreted by a Spin interpreter loaded into a cog, which is of course written in assembler.

*Peter*

Roger Milne · 2009-09-18 00:51

Ah yes, this all makes much sense in hindsight. I really like the command-listener thing. To take it one step further, there could be a command that causes the cog to exit, freeing it up for the time that you don't need to send any commands to it (extended idle time).

Thanks to everyone!

Roger

ericball · 2009-09-18 13:29

Roger Milne said...
Ah yes, this all makes much sense in hind-sight. I really like the command-listener thing. To take it one step further, there could be a command to cause that cog to exit, and free-up the cog over the time that you don't need to send any commands to it (extended idle time).

COGSTOP will do that, but then you will need to use COGNEW/COGINIT to reload the cog (and endure the roughly 8000 CLK startup time).
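For reference, a sketch of the usual Start/Stop idiom for freeing and reloading a cog on demand. The `cog` and `mailbox` variables and the do-nothing worker at `entry` are placeholders, not code from any particular object:

```spin
VAR
  long  cog                     ' cog ID + 1, or 0 when no worker is running
  long  mailbox                 ' placeholder for whatever the worker shares with Spin

PUB Start : okay
  Stop                                          ' never leave a second copy running
  okay := cog := cognew(@entry, @mailbox) + 1   ' 0 means no free cog was available

PUB Stop
  if cog
    cogstop(cog~ - 1)           ' post-clear cog, then stop the old cog ID

DAT
              org     0
entry         jmp     #entry    ' placeholder worker: loops forever until stopped
```

Call Stop when the extended idle time begins and Start again before the next burst of commands; each Start pays the reload cost mentioned above.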


Ken Peterson · 2009-09-18 15:06

To bring the cog load time into perspective: if your Propeller is clocked at 80MHz, it takes approximately 102 microseconds to load a cog. That's about 1 1/2 scan lines of NTSC video. At that rate you can re-launch a cog about 9800 times a second.
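The arithmetic behind those figures, as a rough worked example. It uses the 16 clocks per long and 512 longs quoted later in this thread, and assumes an NTSC line time of about 63.6 microseconds:

```latex
\begin{aligned}
t_{\text{load}} &= 512\ \text{longs} \times 16\ \tfrac{\text{clocks}}{\text{long}} \times 12.5\ \text{ns}
                 = 8192 \times 12.5\ \text{ns} = 102.4\ \mu\text{s} \\
\frac{1}{t_{\text{load}}} &= \frac{1}{102.4\ \mu\text{s}} \approx 9766\ \text{loads per second} \\
\frac{t_{\text{load}}}{t_{\text{line}}} &\approx \frac{102.4\ \mu\text{s}}{63.6\ \mu\text{s}} \approx 1.6\ \text{NTSC scan lines}
\end{aligned}
```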

potatohead · 2009-09-18 15:14

Thanks for that quick reference. Perfect.


jazzed · 2009-09-18 15:22

Roger Milne said...
To take it one step further, there could be a command to cause that cog to exit, and free-up the cog over the time that you don't need to send any commands to it (extended idle time).

Hi Roger. Welcome to the Propeller forum.

As a newcomer you should probably learn PASM the way the Propeller manual describes. The vast majority of PASM code is written that way (this of course includes "spinning" in a command listener). What Ken suggests is a reasonable and simple method that is usable with normal PASM coding (he beat me to the post).

Still, what you have suggested is possible, and I do something similar in my PASM BMA debugger at some cost in extra HUB memory. I don't have to restart a cog to do it. Starting a COG takes about 102us at 80MHz (12.5ns per clock * 16 clocks per long * 512 longs). Basically, a stub for handling COG maintenance (COG server?) is loaded into a COG, and a Spin object (COG client?) manages what the COG does. A set of PASM instructions ending with a JMP #1 (the COG server reentry) can be sent to the COG as needed from any source (a Coglet?) and run there in real time. The source is available if you want to have a look in my .signature BMA debugger link. File PASMstub.spin has most of the methods to manage the COG server; the "GO" feature that tells the loaded code to run is in BMAutility.spin for version 1.3.

The LMM or Large Memory Model (as Bill Henning originally called it) is another method with a lot of developer history. LMM is slower than native PASM, can run up to 8K longs of code vs 512 in a COG, and is a little different from PASM. It is slower because a small COG loop grabs and "interprets" LMM-PASM from hub memory. Since the LMM-PASM code lives in the HUB, nearly 8K instructions (32KB / 4 bytes per instruction) of PASM can be used per program load. The LMM deviations from normal PASM have to do with the way jumps and data are handled and are a little hard to follow at first.
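To make the "COG loop" concrete, here is a bare-bones sketch of an LMM fetch-and-execute loop with a three-instruction LMM program. It is an illustration only, not any particular published kernel: the labels `lmm_loop`, `instr`, `pc` and `lmm_code` are made up, and pin P16 is just an assumed LED pin.

```spin
PUB Start
  cognew(@entry, @lmm_code)     ' PAR carries the hub address of the LMM program

DAT
              org     0
entry         mov     pc, par                 ' pc = hub address of first LMM instruction
lmm_loop      rdlong  instr, pc               ' fetch the next instruction from hub
              add     pc, #4                  ' advance to the following long
instr         nop                             ' the fetched instruction executes here
              jmp     #lmm_loop

led_mask      long    |< 16                   ' assumption: an LED on P16
pc            long    0

' LMM program: executed one long at a time out of hub memory
lmm_code      or      dira, led_mask          ' make the pin an output
              or      outa, led_mask          ' drive it high
              sub     pc, #4                  ' step pc back onto this long: parks here
```

Every LMM instruction costs a hub read plus the loop overhead, which is where the slowdown comes from. Ordinary instructions run unchanged, but a plain `jmp #label` would jump to a cog address rather than a hub address, which is why jumps and calls need special handling.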

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Propeller Tools

Ken Peterson · 2009-09-18 16:28

Speaking of LMM (not to hijack the thread): is there a comprehensive reference describing LMM and how it works in detail in one document?

Now...back to your Inline PASM conversation....

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"I have not failed. I've just found 10,000 ways that won't work. "
- Thomas A. Edison

jazzed · 2009-09-18 17:41

Ken Peterson said...
Speaking of LMM (not to hijack the thread) but is there a comprehensive reference describing LMM and how it works in detail in one document?

Now...back to your Inline PASM conversation....

Ken, there is nothing short and fully comprehensive that I know of (because different people like different approaches).

Here are some wiki pages:
propeller.wikispaces.com/Large+Memory+Model
propeller.wikispaces.com/LMM+Phil+Pilgrim+(PhiPi)
propeller.wikispaces.com/LMM+Pacito
propeller.wikispaces.com/LMM+AiChip+Industries

Many people have written variations. ImageCraft produced one of the first widely used working implementations of LMM. Bill is morphing LMM into LAS (or has morphed it) which is supposed to make LMM less abstract. The original thread is here: http://forums.parallax.com/forums/default.aspx?f=25&m=154421

Hippy was, I think, the first to do several projects with LMM, and I followed his work as examples. Here is an excerpt from Hippy's AI_Chip LMM version 1 (I can't find the thread, but I believe there are later versions), which is an original and well-thought-out description worthy of being included in a book (for royalties). Maybe someone will invite his participation some day.

' *******************************************************************************************************
' *                                                                                                     *
' *     AiChip_LmmVm_001.spin                                                                           *
' *                                                                                                     *
' *******************************************************************************************************

PRI Version
  return String("AiChip_LmmVm_001")
  
CON
{

This is a Virtual Machine (VM) which allows execution of Large Memory Model (LMM) programs. LMM programs
are those which can be larger than the 496 longs allowed for by a Cog's memory limitation. This VM is
designed to be easy to use with the Propeller Tool although it does necessitate some manual changes to
use a Propeller Assembler program as an LMM program.

This VM only supports user variables which are held within the executing Cog memory and the number of
variables is an absolute maximum of 496 and will be lower than that because of the space taken up by
the VM code itself. If the internal stack is used, the number of user variables is reduced further.

The current LMM VM footprint is approximately 72 longs ( excluding debugging code ) and can be reduced
to around 60 longs if conditional LMM instructions are not required ( LMM_Conditional removed ). There
are therefore around 420 Cog memory locations available for user registers and internal stack.

The following changes must be made to make a Propeller Assembly program LMM compatible -

                mov     <reg>,<longConstant>            jmp     #LMM_Load
                                                        long    <reg>
                                                        long    <longConstant>

                jmp     #<Label>                        jmp     #LMM_Jmp
                                                        long    @<Label>

                jmp     <reg>                           jmp     #LMM_Jmp_Reg
                                                        long    <reg>

                call    #<Label>                        jmp     #LMM_Call
                                                        long    @<Label>

                jmpret  <Label>_Ret,#<Label>            jmp     #LMM_Call
                                                        long    @<Label>

                ret                                     jmp     #LMM_Ret

                djnz    <reg>,#<Label>                  jmp     #LMM_Djnz
                                                        long    <reg>
                                                        long    @<Label>

                tjz     <reg>,#<Label>                  jmp     #LMM_Tjz
                                                        long    <reg>
                                                        long    @<Label>

                tjnz    <reg>,#<Label>                  jmp     #LMM_Tjnz
                                                        long    <reg>
                                                        long    @<Label>

If any of the above need to be conditionally executed they should be preceded by 'jmp #LMM_Conditional'
as in the following example ...

        IF_Z    call    #<Label>                        jmp     #LMM_Conditional
                                                IF_Z    jmp     #LMM_Call
                                                        long    @<Label>

The following Propeller Assembly instructions are not currently supported for LMM programs and must
not be used -

                jmpret  reg,reg
                djnz    reg,reg
                tjz     reg,reg
                tjnz    reg,reg

This VM uses a software stack rather than modifying 'ret' instructions as a genuine Propeller Cog
program would. An internal stack is used by default which grows down in Cog memory from $1EF. When
a large number of user Registers are defined care should be taken to avoid stack overflow corrupting
registers, or an external stack may be used. If an external stack is required it should be enabled
by using -

                jmp     #LMM_Load
                long    sp
                long    <stackAddress>

When an external stack is used, the stack will grow upwards in memory from the start of the array. To
switch back from the external stack to the internal stack, use -

                jmp     #LMM_Load
                long    sp
                long    0

Stack switching between internal and external can be done dynamically as required. Take care to return
from the same stack as any call was issued upon. There is little, if any, discernible deterioration
in performance when using an external ( hub memory ) stack as opposed to the internal stack.

Note that when converting 'jmp reg' the 'reg' should have been loaded with an '@Label' of where the
destination jump will be to. For example, the following are equivalent -

                jmp     #LMM_Jmp                        jmp     #LMM_Load
                long    @<Label>                        long    <reg>
                                                        long    @<Label>
                                                        :
                                                        jmp     #LMM_Jmp_Reg
                                                        long    <reg>

The current LMM substitutions for Propeller instructions have been chosen to be easy to enter by hand
and use within the Propeller Tool and these could be optimised. The 'jmp #LMM' commands can have an
associated register embedded within the command ( as the unused destination register ) and two word
arguments can be compacted as a single long. The consequence though is that any increased code density
will be offset by decreased speed of execution. The VM code for LMM_Djnz, LMM_Tjz and LMM_Tjnz could
be reduced to set the opcode bits of the required operation and make most of each routine common code 
and this would probably be worthwhile if few of those instructions are executed within the LMM program.

There is no overlay or 'block load and execute' capability in this VM. The VM is however very easily
extensible to add overlays or other special VM instructions such as Push and Pop register.

The major omission in this VM is the lack of register indirect operations which cannot be achieved by
using self-modifying code. Registers can be modified using movi, movd and movs but there is no 'execute
register as an instruction' operation. It would be easy to add one, and provide Get Register Indirect
and Put Register Indirect instructions.
 
LMM Instructions tested ...

  LMM_Load
  LMM_Jmp
  LMM_Jmp_Reg
  LMM_Call
  LMM_Ret
  LMM_Djnz
  LMM_Tjz
  LMM_Tjnz
  
LMM Instructions not tested ...

  LMM_Conditional
}
CON
{
  
PLEASE NOTE : If modifying and re-releasing this code, please change the prefix of the source file from
              AiChip so as to avoid any confusion as to the origin of the source and update all internal
              references as appropriate. If you wish to credit the original source, anything along the
              lines of, "Based on original AiChip_LmmVm_XXX from AiChip Industries", is acceptable.

              AiChip is a trademark of AiChip Industries.

}

A small performance enhancement done in ICC is to use "sub/add PC, offset" for jumps ... harder to read but faster.
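A sketch of what such a relative jump can look like, assuming the LMM loop keeps its hub program counter in a cog register called `pc` (a hypothetical name) that already points at the next long when each instruction executes. `count` and `limit` are likewise assumed cog registers:

```spin
DAT
' LMM code fragment, fetched from hub one long at a time
loop          add     count, #1               ' loop body
              cmp     count, limit    wz      ' reached the limit yet?
        if_nz sub     pc, #12                 ' no: back 3 longs (12 bytes) to 'loop'
              ' execution falls through to here once count = limit
```

The jump costs one long and no trip through a helper routine, and the condition code works natively. The price is counting byte offsets by hand, and a 9-bit immediate only reaches 511 bytes in either direction.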

Now, back to your regularly scheduled program ... brought to you by Ivory Soap!


Mike Green · 2009-09-18 19:05

Another small optimization that can be used with LMM is to use a JMPRET with the NR (no result) effect instead of a JMP followed by a long holding a "register address". The "register address" goes in the destination field of the JMPRET, where the LMM subroutine being called (such as LMM_Djnz, LMM_Tjz, LMM_Tjnz, LMM_Load, and LMM_Jmpret) can pick it up.
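Roughly, the call site shrinks by one long, and the VM routine recovers the register number from the instruction it has just executed. This is a sketch in the placeholder notation of the excerpt above, not code from the AiChip VM; `instr` (the cog location holding the fetched instruction) and `t1` are assumed names:

```spin
' Original form ( three longs )         Compact form ( two longs )
'
'       jmp     #LMM_Djnz                       jmpret  <reg>,#LMM_Djnz  nr
'       long    <reg>                           long    @<Label>
'       long    @<Label>

' Inside the VM routine: pull <reg> out of the destination field
              mov     t1, instr               ' copy of the instruction just executed
              shr     t1, #9                  ' destination field is bits 17..9
              and     t1, #$1FF               ' t1 = register number
```

With NR set, JMPRET does not write a return address, so it behaves like a plain JMP whose destination field is free to carry a parameter.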

jazzed · 2009-09-18 22:54

Mike Green said...
Another small optimization that can be used with LMM is to use a JMPRET with the NR result flag set instead of a JMP followed by a long with a "register address". The "register address" is put in the destination field of the JMPRET and the LMM subroutine used (like LMM_Djnz, LMM_Tjz, LMM_Tjnz, LMM_Load, and LMM_Jmpret) can use that.

Yes! I remember this now. Good for passing a parameter, return value, etc....
Funny how I used to think CALL was a separate instruction :)

