newbie question: Inline assembly? — Parallax Forums


Roger Milne Posts: 11
edited 2009-09-18 22:54 in Propeller 1
I am attempting to optimize some of my code from Spin to Assembly.

I hit the books and couldn't see any way to inline assembly within my Spin method. Nor could I find a way to call an assembly function stored in a DAT block without spawning another cog to execute that code. Is this true? Help! I'm itching to make my code faster!


Thanks!!

Roger

Comments

  • Bill Henning Posts: 6,445
    edited 2009-09-17 03:36
    You are right, it has to be launched in a different cog.
    Roger Milne said...
    I am attempting to optimize some of my code from Spin to Assembly.

    I hit the books, and couldn't see any way to inline assembly within my spin method. Nor could I find a way to call an assembly function stored in a DAT block, without spawning another cog to go execute that code? Is this true? Help! I'm itching to make my code faster!


    Thanks!!

    Roger
    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Please use mikronauts _at_ gmail _dot_ com to contact me off-forum, my PM is almost totally full
    Morpheus & Mem+dual Prop SBC w/ 512KB kit $119.95, 2MB memory IO board kit $89.95, both kits $189.95
    www.mikronauts.com - my site 6.250MHz custom Crystals for running Propellers at 100MHz
    Las - Large model assembler for the Propeller Largos - a feature full nano operating system for the Propeller
  • potatohead Posts: 10,261
    edited 2009-09-17 05:45
    You can have that other COG launched and waiting to run though. Depending on what you want to do, this can be damn quick! Look at how Chip did the graphics functions in the demo program.

    Have your assembly cog watch for a one written to some long in hub memory. When it sees it, it loads its parameters and does its thing, writing a zero when done.

    Your parent cog watches for the zero while doing its own thing.

    Share data between the two with a block of longs in the HUB.
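    The handshake potatohead describes can be simulated in Python, with a thread standing in for the assembly cog and a shared list standing in for the block of hub longs. This is a hypothetical sketch, not Propeller code: the mailbox layout (command long, two parameter longs, a result long), the command codes, and the names `worker_cog` and `call_worker` are illustrative assumptions. The STOP command models Roger's idea (later in the thread) of letting the worker exit when it is no longer needed.

```python
import threading
import time

# Hypothetical mailbox layout in "hub RAM": [command, param_a, param_b, result]
CMD, PARAM_A, PARAM_B, RESULT = range(4)
CMD_IDLE, CMD_ADD, CMD_STOP = 0, 1, 2  # hypothetical command codes

hub = [0, 0, 0, 0]  # stands in for the block of longs shared by both cogs


def worker_cog():
    """Plays the assembly cog: poll the command long until told to stop."""
    while True:
        cmd = hub[CMD]
        if cmd == CMD_IDLE:
            time.sleep(0)  # nothing to do yet; yield and poll again
            continue
        if cmd == CMD_STOP:
            hub[CMD] = CMD_IDLE  # acknowledge, then exit and free the "cog"
            return
        if cmd == CMD_ADD:
            hub[RESULT] = (hub[PARAM_A] + hub[PARAM_B]) & 0xFFFFFFFF  # 32-bit wrap
        hub[CMD] = CMD_IDLE  # writing zero signals "done"


def call_worker(cmd, a=0, b=0):
    """Plays the parent cog: post parameters, then the command, then wait for zero."""
    hub[PARAM_A] = a
    hub[PARAM_B] = b
    hub[CMD] = cmd  # written last, so the worker never sees stale parameters
    while hub[CMD] != CMD_IDLE:
        time.sleep(0)  # the parent could do useful work here instead
    return hub[RESULT]


worker = threading.Thread(target=worker_cog)
worker.start()
print(call_worker(CMD_ADD, 2, 40))  # prints 42
call_worker(CMD_STOP)
worker.join()
```

    Writing the command long last is the important design choice: the worker only looks at the parameters after the command becomes nonzero, so it never acts on a half-written request.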

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!
    Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
    Safety Tip: Life is as good as YOU think it is!
  • Kye Posts: 2,200
    edited 2009-09-17 21:37
    Since Spin is not assembly, it cannot be inlined; the two have very different constructs and implementations.

    Not really a problem, however, as you have 8 cores. Use 'em.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Nyamekye,
  • Peter Jakacki Posts: 10,193
    edited 2009-09-17 22:44
    The program memory for each cog is independent and only 512 longs, whereas the 32 KB of hub memory is accessible only as (slow) data memory. So Spin object code resides in hub memory and is interpreted by a Spin interpreter loaded into a cog, which is, of course, written in assembler.

    *Peter*
  • Roger Milne Posts: 11
    edited 2009-09-18 00:51
    Ah yes, this all makes much sense in hindsight. I really like the command-listener idea. To take it one step further, there could be a command that causes that cog to exit, freeing up the cog during periods when you don't need to send it any commands (extended idle time).

    Thanks to everyone!

    Roger
  • ericball Posts: 774
    edited 2009-09-18 13:29
    Roger Milne said...
    Ah yes, this all makes much sense in hind-sight. I really like the command-listener thing. To take it one step further, there could be a command to cause that cog to exit, and free-up the cog over the time that you don't need to send any commands to it (extended idle time).
    COGSTOP, but then you will need to use COGNEW/COGINIT to reload the cog (and endure the roughly 8,000-clock startup time).


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Composite NTSC sprite driver: Forum
    NTSC & PAL driver templates: ObEx Forum
    OnePinTVText driver: ObEx Forum
  • Ken Peterson Posts: 806
    edited 2009-09-18 15:06
    To put the cog load time in perspective: if your Propeller is clocked at 80 MHz, it takes approximately 102 microseconds to load a cog. That's about 1.6 scan lines of NTSC video. At that rate you could relaunch a cog about 9,800 times a second.
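    The load-time figures come from simple arithmetic. The Python sketch below reproduces them, assuming (as in jazzed's 12.5*16*512 later in the thread) 16 clocks per long and 512 longs per cog load; the NTSC line period of about 63.556 µs is a standard nominal value, not a number stated in the thread.

```python
CLOCK_HZ = 80_000_000   # 80 MHz system clock
CLOCKS_PER_LONG = 16    # one long per 16-clock hub rotation (the 16 in 12.5*16*512)
COG_LONGS = 512         # longs per cog load, as used in the thread
NTSC_LINE_US = 63.556   # nominal NTSC scan-line period in microseconds (assumption)

load_clocks = CLOCKS_PER_LONG * COG_LONGS       # clocks to load one cog
load_us = load_clocks / CLOCK_HZ * 1e6          # same, in microseconds
scan_lines = load_us / NTSC_LINE_US             # expressed in NTSC scan lines
launches_per_second = CLOCK_HZ / load_clocks    # back-to-back reloads per second

print(f"{load_clocks} clocks = {load_us:.1f} us = {scan_lines:.2f} NTSC lines; "
      f"{launches_per_second:.0f} launches/s")
```

    The 8,192-clock result is where ericball's "8000 CLK" and the 102 µs figure come from, and 80 MHz / 8,192 clocks gives about 9,766 reloads per second, which rounds to the 9,800 quoted above.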
  • potatohead Posts: 10,261
    edited 2009-09-18 15:14
    Thanks for that quick reference. Perfect.

  • jazzed Posts: 11,803
    edited 2009-09-18 15:22
    Roger Milne said...
    To take it one step further, there could be a command to cause that cog to exit, and free-up the cog over the time that you don't need to send any commands to it (extended idle time).
    Hi Roger. Welcome to the Propeller forum.

    As a newcomer, you should probably learn PASM the way the Propeller manual describes. The vast majority of PASM code is written that way (this of course includes "spinning" in a command listener). What Ken suggests is a reasonable and simple method that works with normal PASM coding (he beat me to the post).

    Still, what you have suggested is possible, and I do something similar in my PASM BMA debugger at some cost in extra HUB memory. I don't have to restart a cog to do it. (Starting a COG takes about 102 µs at 80 MHz: 12.5 ns × 16 clocks × 512 longs.) Basically, a stub for handling COG maintenance (a COG server?) is loaded into a COG, and a Spin object (a COG client?) manages what the COG does. A set of PASM instructions ending with a JMP #1 (the COG server reentry point) can be sent to the COG as needed from any source (a Coglet?) and run there in real time. The source is available through the BMA debugger link in my signature if you want to have a look. File PASMstub.spin has most of the methods to manage the COG server; the "GO" feature that tells the loaded code to run is in BMAutility.spin for version 1.3.
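    To make the COG server idea concrete, here is a toy Python model. It is not jazzed's PASMstub.spin: the three-instruction "ISA", the placeholder stub at cog address 1, and the hypothetical `LOAD_ADDR` where coglets are copied are all assumptions. What it does show is the shape of the technique: the cog stays loaded, each coglet is copied in and run, and a final `jmp` to address 1 returns control to the stub, so no COGSTOP/COGNEW (and no 102 µs reload) is needed between coglets.

```python
COG_RAM_LONGS = 512
STUB_ADDR = 1    # stub reentry point: coglets end with "jmp #1"
LOAD_ADDR = 16   # hypothetical cog address where coglets are copied


class CogServer:
    """Toy cog that stays running and executes downloaded code snippets."""

    def __init__(self):
        self.ram = [("nop",)] * COG_RAM_LONGS
        self.ram[STUB_ADDR] = ("stub",)   # placeholder for the stub's poll loop
        self.regs = {}                    # cog registers persist between coglets

    def run_coglet(self, coglet):
        # The client (a Spin object) hands over code; the stub copies it into cog RAM.
        if LOAD_ADDR + len(coglet) > COG_RAM_LONGS:
            raise ValueError("coglet does not fit in cog RAM")
        self.ram[LOAD_ADDR:LOAD_ADDR + len(coglet)] = coglet
        pc = LOAD_ADDR  # "GO": the stub jumps to the loaded code
        while True:
            op = self.ram[pc]
            if op[0] == "mov":              # mov reg, #value
                self.regs[op[1]] = op[2]
            elif op[0] == "add":            # add reg, #value (32-bit wrap)
                self.regs[op[1]] = (self.regs.get(op[1], 0) + op[2]) & 0xFFFFFFFF
            elif op[0] == "jmp":            # jmp #addr
                pc = op[1]
                if pc == STUB_ADDR:         # back in the stub: coglet finished
                    return dict(self.regs)
                continue
            pc += 1


server = CogServer()
print(server.run_coglet([("mov", "x", 5), ("add", "x", 37), ("jmp", STUB_ADDR)])["x"])  # prints 42
print(server.run_coglet([("add", "x", 1), ("jmp", STUB_ADDR)])["x"])  # prints 43: same cog, no restart
```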

    The LMM or Large Memory Model (as Bill Henning originally called it) is another method, one with lots of developer history. LMM is slower than native PASM running from cog RAM, can run up to 8K longs vs. 512 in a COG, and is a little different from PASM. LMM is slower because it uses a COG loop to fetch and "interpret" LMM-PASM from hub memory. Since the LMM-PASM code lives in the HUB, nearly 8K instructions (32 KB / 4 bytes per instruction) of PASM can be used per program load. The LMM deviations from normal PASM have to do with the way jumps and data are handled, and they are a little hard to follow at first.
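    The core of LMM is a tiny fetch/execute loop: read the long at PC from hub RAM, advance PC by 4, execute it in the cog. A classic P1 kernel typically does this with `rdlong instr, pc`, `add pc, #4`, and an execute slot. The Python model below mimics that loop; the tuple instructions and the kernel primitives `FJMP` and `FTJZ` (jumps whose target is stored in the following long) are hypothetical, chosen only to show why jumps have to be handled specially while ordinary ALU instructions run unchanged.

```python
def run_lmm(hub, pc, regs, max_steps=10_000):
    """Toy LMM kernel: pc is a hub byte address; every instruction occupies one long."""
    for _ in range(max_steps):
        instr = hub[pc]   # fetch from hub   (cf. rdlong instr, pc)
        pc += 4           # advance one long (cf. add pc, #4)
        op = instr[0]
        if op == "add":                      # ordinary ALU op, runs as-is in the cog
            regs[instr[1]] = (regs[instr[1]] + instr[2]) & 0xFFFFFFFF
        elif op == "sub":
            regs[instr[1]] = (regs[instr[1]] - instr[2]) & 0xFFFFFFFF
        elif op == "FJMP":                   # kernel jump: target is in the next long
            pc = hub[pc]
        elif op == "FTJZ":                   # kernel "jump if zero": target in next long
            pc = hub[pc] if regs[instr[1]] == 0 else pc + 4
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    raise RuntimeError("step limit reached")


# acc += 3, repeated n times; the program lives at hub byte addresses 0x100..0x118
program = {
    0x100: ("FTJZ", "n"), 0x104: 0x118,      # if n == 0: goto done
    0x108: ("add", "acc", 3),
    0x10C: ("sub", "n", 1),
    0x110: ("FJMP",),     0x114: 0x100,      # goto loop
    0x118: ("HALT",),                        # done
}
print(run_lmm(program, 0x100, {"n": 4, "acc": 0}))  # prints {'n': 0, 'acc': 12}
```

    Because the program lives in hub RAM, its size is bounded by the 32 KB hub (at most 8,192 longs, less whatever Spin code and data also occupy), not by the 512-long cog.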

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve

    Propeller Tools
  • Ken Peterson Posts: 806
    edited 2009-09-18 16:28
    Speaking of LMM (not to hijack the thread) but is there a comprehensive reference describing LMM and how it works in detail in one document?

    Now...back to your Inline PASM conversation....

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    "I have not failed. I've just found 10,000 ways that won't work. "
    - Thomas A. Edison
  • jazzed Posts: 11,803
    edited 2009-09-18 17:41
    Ken Peterson said...
    Speaking of LMM (not to hijack the thread) but is there a comprehensive reference describing LMM and how it works in detail in one document?

    Now...back to your Inline PASM conversation....

    Ken, there is nothing short and fully comprehensive that I know of (because different people like different approaches).

    Here are some wiki pages:
    propeller.wikispaces.com/Large+Memory+Model
    propeller.wikispaces.com/LMM+Phil+Pilgrim+(PhiPi)
    propeller.wikispaces.com/LMM+Pacito
    propeller.wikispaces.com/LMM+AiChip+Industries

    Many people have written variations. ImageCraft produced one of the first widely used working implementations of LMM. Bill is morphing LMM into LAS (or has morphed it) which is supposed to make LMM less abstract. The original thread is here: http://forums.parallax.com/forums/default.aspx?f=25&m=154421

    Hippy was, I think, the first to do several projects with LMM, and I followed his work as an example. Here is an excerpt from Hippy's AiChip LMM version 1 (I can't find the thread, but I believe there are later versions), which is an original and well-thought-out description worthy of being included in a book (for royalties). Maybe someone will invite his participation some day.

    ' *******************************************************************************************************
    ' *                                                                                                     *
    ' *     AiChip_LmmVm_001.spin                                                                           *
    ' *                                                                                                     *
    ' *******************************************************************************************************
    
    PRI Version
      return String("AiChip_LmmVm_001")
      
    CON
    {
    
    This is a Virtual Machine (VM) which allows execution of Large Memory Model (LMM) programs. LMM programs
    are those which can be larger than the 496 longs allowed for by a Cog's memory limitation. This VM is
    designed to be easy to use with the Propeller Tool although it does necessitate some manual changes to
    use a Propeller Assembler program as an LMM program.
    
    This VM only supports user variables which are held within the executing Cog memory and the number of
    variables is an absolute maximum of 496 and will be lower than that because of the space taken up by
    the VM code itself. If the internal stack is used, the number of user variables is reduced further.
    
    The current LMM VM footprint is approximately 72 longs ( excluding debugging code ) and can be reduced
    to around 60 longs if conditional LMM instructions are not required ( LMM_Conditional removed ). There
    are therefore around 420 Cog memory locations available for user registers and internal stack.
    
    The following changes must be made to make a Propeller Assembly program LMM compatible -
    
                    mov     <reg>,<longConstant>            jmp     #LMM_Load
                                                            long    <reg>
                                                            long    <longConstant>
    
                    jmp     #<Label>                        jmp     #LMM_Jmp
                                                            long    @<Label>
    
                    jmp     <reg>                           jmp     #LMM_Jmp_Reg
                                                            long    <reg>
    
                    call    #<Label>                        jmp     #LMM_Call
                                                            long    @<Label>
    
                    jmpret  <Label>_Ret,#<Label>            jmp     #LMM_Call
                                                            long    @<Label>
    
                    ret                                     jmp     #LMM_Ret
    
                    djnz    <reg>,#<Label>                  jmp     #LMM_Djnz
                                                            long    <reg>
                                                            long    @<Label>
    
                    tjz     <reg>,#<Label>                  jmp     #LMM_Tjz
                                                            long    <reg>
                                                            long    @<Label>
    
                    tjnz    <reg>,#<Label>                  jmp     #LMM_Tjnz
                                                            long    <reg>
                                                            long    @<Label>
    
    If any of the above need to be conditionally executed they should be preceded by 'jmp #LMM_Conditional'
    as in the following example ...
    
            IF_Z    call    #<Label>                        jmp     #LMM_Conditional
                                                    IF_Z    jmp     #LMM_Call
                                                            long    @<Label>
    
    The following Propeller Assembly instructions are not currently supported for LMM programs and must
    not be used -
    
                    jmpret  reg,reg
                    djnz    reg,reg
                    tjz     reg,reg
                    tjnz    reg,reg
    
    This VM uses a software stack rather than modifying 'ret' instructions as a genuine Propeller Cog
    program would. An internal stack is used by default which grows down in Cog memory from $1EF. When
    a large number of user Registers are defined care should be taken to avoid stack overflow corrupting
    registers, or an external stack may be used. If an external stack is required it should be enabled
    by using -
    
                    jmp     #LMM_Load
                    long    sp
                    long    <stackAddress>
    
    When an external stack is used, the stack will grow upwards in memory from the start of the array. To
    switch back from the external stack to the internal stack, use -
    
                    jmp     #LMM_Load
                    long    sp
                    long    0
    
    Stack switching between internal and external can be done dynamically as required. Take care to return
    from the same stack as any call was issued upon. There is little, if any, discernible deterioration
    in performance when using an external ( hub memory ) stack as opposed to the internal stack.
    
    Note that when converting 'jmp reg' the 'reg' should have been loaded with an '@Label' of where the
    destination jump will be to. For example, the following are equivalent -
    
                    jmp     #LMM_Jmp                        jmp     #LMM_Load
                    long    @<Label>                        long    <reg>
                                                            long    @<Label>
                                                            :
                                                            jmp     #LMM_Jmp_Reg
                                                            long    <reg>
    
    The current LMM substitutions for Propeller instructions have been chosen to be easy to enter by hand
    and use within the Propeller Tool and these could be optimised. The 'jmp #LMM' commands can have an
    associated register embedded within the command ( as the unused destination register ) and two word
    arguments can be compacted as a single long. The consequence though is that any increased code density
    will be offset by decreased speed of execution. The VM code for LMM_Djnz, LMM_Tjz and LMM_Tjnz could
    be reduced to set the opcode bits of the required operation and make most of each routine common code 
    and this would probably be worthwhile if few of those instructions are executed within the LMM program.
    
    There is no overlay or 'block load and execute' capability in this VM. The VM is however very easily
    extensible to add overlays or other special VM instructions such as Push and Pop register.
    
    The major omission in this VM is the lack of register indirect operations which cannot be achieved by
    using self-modifying code. Registers can be modified using movi, movd and movs but there is no 'execute
    register as an instruction' operation. It would be easy to add one, and provide Get Register Indirect
    and Put register Indirect instructions.
     
    LMM Instructions tested ...
    
      LMM_Load
      LMM_Jmp
      LMM_Jmp_Reg
      LMM_Call
      LMM_Ret
      LMM_Djnz
      LMM_Tjz
      LMM_Tjnz
      
    LMM Instructions not tested ...
    
      LMM_Conditional
    }
    CON
    {
      
    PLEASE NOTE : If modifying and re-releasing this code, please change the prefix of the source file from
                  AiChip so as to avoid any confusion as to the origin of the source and update all internal
                  references as appropriate. If you wish to credit the original source, anything along the
                  lines of, "Based on original AiChip_LmmVm_XXX from AiChip Industries", is acceptable.
    
                  AiChip is a trademark of AiChip Industries.
    
    }
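    The substitution table in the excerpt is mechanical enough to automate. Below is a rough Python sketch of a hypothetical rewriter (not part of the AiChip release) for four of the rows: `jmp #Label`, `call #Label`, `ret`, and `djnz reg,#Label`. It ignores conditions, comments, and `:local` labels, and passes every other line through unchanged, so rows such as the `mov <reg>,<longConstant>` to `LMM_Load` case would still need converting by hand.

```python
import re


def to_lmm(line):
    """Rewrite one PASM source line into AiChip-style LMM lines (subset only)."""
    code = line.strip()
    m = re.fullmatch(r"jmp\s+#(\w+)", code)
    if m:
        return ["jmp     #LMM_Jmp", f"long    @{m.group(1)}"]
    m = re.fullmatch(r"call\s+#(\w+)", code)
    if m:
        return ["jmp     #LMM_Call", f"long    @{m.group(1)}"]
    if code == "ret":
        return ["jmp     #LMM_Ret"]
    m = re.fullmatch(r"djnz\s+(\w+)\s*,\s*#(\w+)", code)
    if m:
        return ["jmp     #LMM_Djnz", f"long    {m.group(1)}", f"long    @{m.group(2)}"]
    return [code]  # other rows of the table are not handled here


for src in ["jmp #Loop", "djnz count, #Loop", "add x, #1"]:
    print(to_lmm(src))
```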
    
    



    A small performance enhancement done in ICC (ImageCraft's compiler) is to use "sub/add PC, offset" for jumps: harder to read, but faster.
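    Why relative jumps are faster can be seen in a toy fetch/execute model. In this hypothetical Python sketch (not ICC's actual encoding), an absolute jump stores its target in the following long, so the kernel routine must read hub RAM a second time; the relative form simply adjusts PC in place. The offset has to account for PC already pointing past the fetched instruction.

```python
def run(hub, pc, steps):
    """Execute `steps` instructions; return (final pc, hub longs read)."""
    hub_reads = 0
    for _ in range(steps):
        instr = hub[pc]
        hub_reads += 1
        pc += 4                        # pc already points past the fetched long
        if instr[0] == "FJMP":         # absolute: target stored in the next long
            pc = hub[pc]
            hub_reads += 1             # extra hub read inside the kernel routine
        elif instr[0] == "SUB_PC":     # relative backward jump, executes in place
            pc -= instr[1]
        elif instr[0] == "ADD_PC":     # relative forward jump
            pc += instr[1]
    return pc, hub_reads


# A one-instruction loop written both ways.
absolute = {0x200: ("FJMP",), 0x204: 0x200}
relative = {0x300: ("SUB_PC", 4)}  # offset is 4 because pc was already advanced
print(run(absolute, 0x200, 3))  # prints (512, 6)
print(run(relative, 0x300, 3))  # prints (768, 3)
```

    On real hardware the absolute form also pays for jumping into the kernel routine and back. With an immediate operand, `sub pc, #offset` is limited to the 9-bit source field (0 to 511 bytes); longer jumps would need the offset held in a register.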

    Now, back to your regularly scheduled program ... brought to you by Ivory Soap!

    Post Edited (jazzed): 9/18/2009 5:47:44 PM GMT
  • Mike Green Posts: 23,101
    edited 2009-09-18 19:05
    Another small optimization that can be used with LMM is to use a JMPRET with the NR result flag set instead of a JMP followed by a long with a "register address". The "register address" is put in the destination field of the JMPRET and the LMM subroutine used (like LMM_Djnz, LMM_Tjz, LMM_Tjnz, LMM_Load, and LMM_Jmpret) can use that.
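    Mike Green's trick relies on the P1 instruction layout: a 6-bit opcode, the Z, C, R, and I effect bits, a 4-bit condition, then 9-bit destination and source fields. The Python sketch below packs `jmpret reg, #routine nr` and shows how a kernel routine could recover `reg` from the fetched instruction's destination field instead of reading an extra `long reg` from hub RAM. The JMPRET opcode (%010111) and field positions reflect the P1 encoding as commonly documented and are worth checking against the Propeller manual; the addresses `LMM_DJNZ` and `COUNT_REG` are made-up values.

```python
OPCODE_JMPRET = 0b010111
COND_ALWAYS = 0b1111


def encode(opcode, z, c, r, i, cond, dest, src):
    """Pack a P1 instruction: iiiiii zcri cccc ddddddddd sssssssss."""
    assert 0 <= dest < 512 and 0 <= src < 512
    return (opcode << 26 | z << 25 | c << 24 | r << 23 | i << 22
            | cond << 18 | dest << 9 | src)


def jmpret_nr(reg, routine):
    """jmpret reg, #routine nr: jumps like jmp #routine (R=0, so reg is not
    written) but carries reg's address in the otherwise unused D field."""
    return encode(OPCODE_JMPRET, 0, 0, 0, 1, COND_ALWAYS, reg, routine)


def dest_field(instr):
    """What a kernel routine could compute: mov t, instr / shr t, #9 / and t, #$1FF."""
    return (instr >> 9) & 0x1FF


def src_field(instr):
    return instr & 0x1FF


LMM_DJNZ = 0x0A0   # hypothetical cog address of the kernel's DJNZ routine
COUNT_REG = 0x1C3  # hypothetical cog address of the loop counter register

word = jmpret_nr(COUNT_REG, LMM_DJNZ)
print(hex(word), hex(dest_field(word)), hex(src_field(word)))
```

    Compared with the AiChip form (`jmp #LMM_Djnz`, `long reg`, `long @Label`), this saves one long and one hub read per pseudo-instruction.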
  • jazzed Posts: 11,803
    edited 2009-09-18 22:54
    Mike Green said...
    Another small optimization that can be used with LMM is to use a JMPRET with the NR result flag set instead of a JMP followed by a long with a "register address". The "register address" is put in the destination field of the JMPRET and the LMM subroutine used (like LMM_Djnz, LMM_Tjz, LMM_Tjnz, LMM_Load, and LMM_Jmpret) can use that.

    Yes! I remember this now. Good for passing a parameter, return value, etc....
    Funny how I used to think CALL was a separate instruction :)
