Assembly Code Examples for the Beginner - Page 2 — Parallax Forums

Assembly Code Examples for the Beginner



  • Wurlitzer Posts: 237
    edited 2006-12-13 19:20
    Sorry, I'm thick today (well, maybe more than today). I have an application which will require 5 or more cogs (some in SPIN, others in assembly) to read and write to a common Byte Array[100] in main RAM.

    I cannot get my head around how to ensure all cogs know the starting address of this common array. Does it require the array to be declared in the Top Object, and then have the Top Object call all the subsequent "CogNew(@SpinOrAssyObjectName,@@ByteArray[0])", so that the PAR register in each cog will contain the proper starting address?

    I don't have any worries in this program regarding conflicting writes; I just need to be able to read/write any element in this 100-element ByteArray from any cog.

  • SailerMan Posts: 337
    edited 2006-12-14 14:47
    Hello... I've had my Propeller for a few weeks now and am starting to understand SPIN. Now, for the first time, I am ready to begin my quest into ASM... I thought the Propeller and its community would be a great place to start.

    Every time I begin to learn ASM I get discouraged and quit, because although I understand what some of the commands do (MOV, ADD, SHR), I get lost when I try to combine them into programs.

    I have been programming in dialects of BASIC since the days of the Commodore PET in 1977, but never took it upon myself to dive into ASM, so now my mind always thinks in the BASIC way of doing things.

    Since this is an ASM beginners' thread, can we start from the basics and compare the ASM to SPIN (or BASIC) equivalents?

    Thanks to anyone that is willing to lead me in the right direction.

  • Paul Baker Posts: 6,351
    edited 2006-12-14 22:09
    Eric, this discussion is best placed in the non-sticky portion of the forum: first, a lot of people don't regularly look to see if there are new posts in the sticky section, and second, we try to keep sticky discussions as on-topic as possible. Fundamentals questions fit here, but comparison/contrast with other languages doesn't.


    Paul Baker
    Propeller Applications Engineer

    Parallax, Inc.
  • SailerMan Posts: 337
    edited 2006-12-15 00:09
    I guess what I am after is more ASM code examples..... for the Beginner... The only reason I mentioned comparison was because sometimes things are easier to understand when you see ASM code with an Equivalent Spin code. Sorry.

  • Paul Baker Posts: 6,351
    edited 2006-12-15 01:08
    No problem, I didn't mean to sound harsh.

  • Paul Baker Posts: 6,351
    edited 2006-12-15 04:16
    Eric, in an effort to help you, can you name a specific Spin or SX/B command you'd like to see coded into assembly? I just ask you start with something a little simpler than SHIFTIN/OUT or other similarly complex commands.

  • SailerMan Posts: 337
    edited 2006-12-15 15:46
    Thanks... Why don't you move this section of the thread to another post?
    In any case, my problem lies in the fact that I have a hard time thinking low level.
    Let's start out really small.

    Pub Main|Index,Count
         Repeat Index From 0 to 10

  • Paul Baker Posts: 6,351
    edited 2006-12-16 00:05
    Here's the thread I started:

  • Karl Smith Posts: 50
    edited 2006-12-30 02:33
    How would I convert this line of Spin code to asm? I just can't seem to get it to work. I have tried all different variants of rdbyte and wrbyte; one variant is listed below:


                            org
                            mov r0, @Start
                            mov r3, @End
                            rdbyte r1,r0
                            wrbyte r0,r3
                            CogId   CogNum          'Get COG ID
                            CogStop CogNum          'Stop this COG

    CogNum              res     1               'Reserved variables
    R0                  long
    R1                  long
    R2                  long
    R3                  long
    R4                  long
    Start               byte "ABCD"
    End                 byte "EFGH"
  • Mike Green Posts: 23,101
    edited 2006-12-30 02:45
    You can't copy bytes individually in assembly the way you might do in SPIN because the cog's memory is not byte addressable. Each location is a long word. If you want to copy bytes in HUB (main - SPIN) memory, you could do it like this:
    VAR byte Start, End               ' Must be in this order
    PUB main
       cognew(@begin, @Start)         ' PAR will hold the hub address of Start
       repeat                         ' Wait for operation to finish
    DAT
                  org     0
    begin         mov     addr,PAR    ' get address of Start
                  rdbyte  temp,addr   ' get byte value
                  add     addr,#1     ' move to next location (End)
                  wrbyte  temp,addr   ' store byte value
                  cogid   temp        ' stop cog
                  cogstop temp
    addr          res     1
    temp          res     1

    If what you want to do is to copy bytes from one location in a cog's memory to another, you will have to use AND/OR and shift instructions and keep track of which byte in a word you're copying.
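The AND/OR-and-shift approach for bytes packed inside a 32-bit cog register can be sketched on a PC. This is a Python model, not Propeller code; the helper names `get_byte`, `set_byte` and `copy_byte` are made up for illustration, and byte 0 is taken to be the least significant byte.

```python
MASK32 = 0xFFFFFFFF  # cog registers are 32 bits wide

def get_byte(long_value, index):
    """Return byte `index` (0 = least significant) of a 32-bit value: shift, then AND."""
    return (long_value >> (8 * index)) & 0xFF

def set_byte(long_value, index, byte):
    """Return `long_value` with byte `index` replaced: AND to clear, OR to insert."""
    shift = 8 * index
    cleared = long_value & ~(0xFF << shift) & MASK32
    return cleared | ((byte & 0xFF) << shift)

def copy_byte(src, src_index, dst, dst_index):
    """Copy one byte from `src` into `dst`, keeping track of both byte positions."""
    return set_byte(dst, dst_index, get_byte(src, src_index))

abcd = 0x44434241  # the bytes "ABCD" as they sit in one little-endian long
efgh = 0x48474645  # the bytes "EFGH"
print(hex(copy_byte(abcd, 0, efgh, 3)))
```

In assembly each helper would become a short SHR/AND or ANDN/SHL/OR sequence; the index bookkeeping is the part that has to be done by hand.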
  • Karl Smith Posts: 50
    edited 2006-12-30 03:15
    Thanks Mike, I will give it a try.
  • Jello Posts: 9
    edited 2007-04-23 23:37
    Hi everybody,
    Does anyone have a simple example of a case where a Spin method calls an asm method that calls another asm method?
    I need to clear my LCD screen and do other functions fast (via SPI), so I am starting with a cls method to get the hang of asm, in hopes of eventually refactoring all of my LCD Spin code to asm.
    I figure a cls would be a good place to start wrapping my brain around it all.

    Something like:
    'in spin
     pub CLS(color) 'to clear screen with given color
        'call asm _CLS method
        _cls(color)

    'in asm
       loop n times (calling asm spi engine shiftout method)
          shiftout(...)

    I'm sure it's a simple matter (just not simple to me) :)

  • Kaio Posts: 253
    edited 2007-04-24 12:02

    You don't have methods in assembly code. Your assembly code runs in a separate cog, independently of the cog that is interpreting the Spin code. You can have functions in your assembly code which start with a label and end with a ret instruction. Then you can use a call instruction to invoke a function, much like calling a method in Spin. When you want to pass arguments to a function, you have to declare them as long data.

    But you can't call such a function directly from Spin. Instead, you need some assembly code that communicates with the Spin code through main memory. It waits for a command, and it can also take arguments, which must likewise be passed through main memory. When a command is received, it calls the corresponding function.
                            org     0
    entry                   mov     Arg1,#$20
                            call    #cout
                            jmp     #entry                  'endless loop only for this example
    cout                    mov     Temp,Arg1               'get argument
                            'do something
    cout_ret                ret
    Arg1                    res     1                       'or long    0
    Temp                    res     1                       'or long    0                                               

    For an example of such a communication routine in assembly please have a look at file AsmDebug.spin from POD. This routine can also have a return value for some functions. To see how easy it is to call from Spin have a look at file PropDebugger.spin and there at method getFlags.
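The command-and-arguments exchange through main memory can be modelled on a PC. The sketch below is Python, not Propeller code, and it is not how AsmDebug.spin is written; the command codes, the three-entry mailbox layout and the function names are all assumptions made for illustration.

```python
CMD_IDLE = 0          # hypothetical command codes
CMD_COUT = 1

# Three longs of shared "main memory": command, argument, result.
mailbox = {"cmd": CMD_IDLE, "arg": 0, "result": 0}
printed = []

def cout(value):
    """Stand-in for the assembly function reached with `call #cout`."""
    printed.append(chr(value))
    return value

def cog_poll(box):
    """One pass of the assembly cog's loop: look for a command, run it, acknowledge."""
    if box["cmd"] == CMD_IDLE:
        return False
    if box["cmd"] == CMD_COUT:
        box["result"] = cout(box["arg"])
    box["cmd"] = CMD_IDLE       # clearing the command signals that the result is ready
    return True

def spin_call(box, cmd, arg):
    """Spin side: write the argument first, then the command, then wait for the acknowledge."""
    box["arg"] = arg
    box["cmd"] = cmd
    while box["cmd"] != CMD_IDLE:
        cog_poll(box)           # on real hardware the other cog runs by itself
    return box["result"]

print(spin_call(mailbox, CMD_COUT, 0x20))
```

Writing the argument before the command matters: the assembly side may pick up the command in the very next hub access, so everything it needs has to be in place already.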

    Post Edited (Kaio) : 4/24/2007 4:44:58 PM GMT
  • Jello Posts: 9
    edited 2007-04-24 14:44
    What you said makes sense and the examples you cited are helpful.
    I haven't quite digested it all yet :) but working on it. I have a lot to learn.
    Thanks for the help Kaio!
  • ericball Posts: 774
    edited 2007-06-13 13:31
    Bean (Hitt Consulting) said...
    How long does it take to start/stop a new cog with an assembly program ?
    If I have a routine that isn't fast enough in spin, I know it would be faster in assembly, but I don't know what the time delay is to launch a new cog.


    CogInit/CogNew forces the cog to execute a RDLONG for each word of cog RAM. So the startup delay will be 512*16 = 8192 cycles (probably a few more for the initial HUB access and other startup delays).
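To turn that cycle count into time, a quick calculation in Python (on a PC). The 80 MHz system clock is an assumption; the 512 longs and 16 clocks per hub access are the figures from the post.

```python
COG_LONGS = 512              # longs copied into the cog, per the post
CLOCKS_PER_HUB_ACCESS = 16   # one hub access every 16 clocks, per the post
CLOCK_HZ = 80_000_000        # assumption: 80 MHz system clock

def cog_start_cycles(longs=COG_LONGS, clocks=CLOCKS_PER_HUB_ACCESS):
    """Minimum clock cycles needed to load a cog."""
    return longs * clocks

def cog_start_microseconds(clock_hz=CLOCK_HZ):
    """The same delay expressed in microseconds at the given clock."""
    return cog_start_cycles() / clock_hz * 1e6

print(cog_start_cycles(), cog_start_microseconds())
```

At that clock the load takes roughly 100 microseconds, so starting a fresh cog for each call only pays off when the routine saves noticeably more than that.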
  • ErNa Posts: 1,743
    edited 2007-06-23 18:32
    Does this also mean that all other cogs are blocked from global access during this time?
  • Paul Baker Posts: 6,351
    edited 2007-06-23 20:08
    No, hub accesses are performed in round-robin, non-blocking style (that's why there's the *16 factor in ericball's equation).

  • deSilva Posts: 2,967
    edited 2007-07-05 10:20
    Well, not strictly for the beginner... However....
    It is not widely known that SPIN allows full recursion of calls! This can be emulated within an assembly program by installing an ad-hoc stack mechanism.
    It will be instructive anyhow to have a look at the many "patches". Note that you must never "patch" across JMPRETs (aka CALLs), as the code has to stay re-entrant!

    The speed-up is about a factor of 40, which is not so overwhelming compared to the general speed-up from SPIN to handmade assembly (about a factor of 80 in my experience), and which reveals very efficient stack management within SPIN!

    This innocent looking piece of SPIN.....
    PUB spinFibo(n)
      if n>2
         return spinFibo(n-1)+ spinFibo(n-2)
      return 1

    ... has thus created this "assembly-monster":
    ' PAR shall contain a reference to 2 longs
    '  [ 0 ] Argument for fibo (0: result ready)
    '  [ 1 ] Result
    fiboasm                 ' idle loop: poll the argument long
        mov a, #$1ff
        add a, cnt
        waitcnt a,#0    ' save energy while idling
        rdlong a, par
        tjz a,#fiboasm
      ' organize a stack
        mov stackP, #stack
        jmpret retaddr, #fibo   ' result = fibo(a)
       ' result available
        mov a, par
        add a, #4
        wrlong result, a
        mov a,#0
        wrlong a, par
        jmp #fiboasm
    fibo                    ' recursive entry point, reached via jmpret retaddr, #fibo
    ' if a<3 return 1
        cmps a, #3   wc
        mov result, #1
        if_c jmp retaddr
        add stackP, #1   ' points to the LAST USED entry
        movd :f1, stackP
        add stackP, #1       
        movd :f2, stackP  
    :f1 mov 0-0, retaddr ' push return address 
    :f2 mov 0-0, a       ' push argument
        sub a, #1 
        jmpret retaddr, #fibo ' call fibo(a-1)
        movs :f3, stackP 
        movd :f4, stackP 
    :f3 mov a, 0-0      ' get argument
                        '... and substitute by result
    :f4 mov 0-0, result                    
        sub a, #2
        jmpret retaddr, #fibo  ' call fibo(a-2)         
    ' add both results
        movs :f5, stackP
        sub stackP, #1
    :f5 add result, 0-0  
        movs :f6, stackP   ' return to caller
        sub stackP, #1     ' adjust stack
    :f6 jmp 0-0    
    retaddr  res 1
    result res 1
    a  res 1
    ' The stack runs from lower to higher addresses; stackP always points to the last used entry!
    stackP res 0    ' a little bit over-optimized
    stack res 100     ' ... or as long as it will go

    If you are interested in the general timing without trying yourself:
    fibo(29) needed:
    26 sec with SPIN
    1.8 sec with PHP on my mid-range Windows Notebook
    800 ms with the above posted piece of code
    30 ms with a very efficient FORTH Implementation on my mid-range Windows Notebook
    BTW: I am well aware that there are simple algorithms to compute the n-th Fibonacci number in O(1) - this is obviously not the point :)

    Edit a long time later:
    1.1 sec PureBasic in Interpreter/Debugger Mode (on same Notebook)
    15 ms PureBasic compiled to 16kB EXE-file on same Notebook
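The core idea, replacing recursive calls with a hand-managed stack, can be tried out on a PC first. The Python sketch below is a simplification and not a translation of the assembly above: it pushes only pending arguments rather than return addresses, and the function names are made up for illustration.

```python
def spin_fibo(n):
    """Recursive reference, same shape as the Spin method."""
    if n > 2:
        return spin_fibo(n - 1) + spin_fibo(n - 2)
    return 1

def fibo_explicit_stack(n):
    """Same result without recursion: pending arguments live on our own stack."""
    stack = [n]                 # work still to do
    result = 0
    while stack:
        a = stack.pop()
        if a < 3:
            result += 1         # base case contributes 1, as in "if a<3 return 1"
        else:
            stack.append(a - 1) # stands in for the call fibo(a-1)
            stack.append(a - 2) # stands in for the call fibo(a-2)
    return result

print(fibo_explicit_stack(10), spin_fibo(10))
```

The stack never holds more than about n entries at once, which is why a fixed-size stack such as `stack res 100` is plenty for arguments like 29.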

    Post Edited (deSilva) : 12/28/2007 6:12:59 PM GMT
  • mirror Posts: 322
    edited 2007-07-05 23:39
    Just out of curiosity - which Spin to assembler compiler did you use?

    Spin to bytecodes I understand. Spin to handcoded assembler I understand.

    Are you saying that you have manually handcoded the Spin to assembler, and simulated a stack type machine in the process?

  • deSilva Posts: 2,967
    edited 2007-07-06 06:53
    mirror said...
    Are you saying that you have manually handcoded the Spin to assembler, and simulated a stack type machine in the process?
    Yes, I hand-translated it and - necessarily - needed a "stack".

    A SPIN-to-machine-code compiler, however, is an interesting idea:
    (1) Without a working LMM impossible, but.....
    (2) ... within the range of 1.5 k generated code quite feasible
    (3) As SPIN is a grammatically (and semantically as well) extremely simple language, this can be done in a few weekends
    (4) You could restrict the semantics somewhat to simplify the translation, and would not necessarily need a "stack" at all for it

    The main benefits of such a rudimentary compiler would be:
    - automatically speed up your simple "hardware drivers" - written in SPIN for the sake of clarity and/or for lack of assembly skill
    - standardize the SPIN - COG data exchange interface ("PAR"), which has been mostly ad-hoc in the past.

    But my posting had nothing to do with all this! I just wanted to:
    - prove the feasibility of an advanced programming concept such as recursion in Propeller Assembler
    - show again the huge speed-up using machine code even in this case
    - mention that - in this case - the optimized code runs faster than a PHP program on a GHz Windows machine (which - of course - is not the slowest of all script languages, but comes close to it :) )
  • Kaio Posts: 253
    edited 2007-07-06 10:51

    Nice example of recursion, even in assembly code. And the time it takes is also very interesting in comparison with a routine running on a GHz PC.

  • Cats92 Posts: 149
    edited 2007-08-15 09:58

    As a beginner in assembly, I found your commented code examples very useful.

    And I hope others do too.


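  • deSilva Posts: 2,967
    edited 2008-04-13 09:56
    This is not strictly for beginners, but might help understanding indexed addressing
    aTable             LONG 339999, -1, 66,1, 255, 777
    ' How to fetch entry 'theIndex' from 'aTable'
                        ADD     :mod1, theIndex    ' add the index to the source field of :mod1
                        SUB     :mod1, theIndex    ' restore :mod1; takes effect only after :mod1 has been fetched
    :mod1               MOV     theData, aTable    ' executes with source = aTable + theIndex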
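Why adding a number to an instruction changes which register it reads can be seen in a small model. The sketch below is Python on a PC, not Propeller code. It relies only on the source field occupying bits 8..0 of an instruction long; the table address `A_TABLE` and the upper instruction bits are arbitrary placeholders, not a real encoding of MOV.

```python
S_MASK = 0x1FF                 # source field occupies bits 8..0 of an instruction long
MASK32 = 0xFFFFFFFF

cog = [0] * 512                # model of the 512 longs of cog RAM
A_TABLE = 0x100                # hypothetical cog address of aTable
for offset, value in enumerate([339999, -1, 66, 1, 255, 777]):
    cog[A_TABLE + offset] = value & MASK32

# Placeholder instruction long: only the low 9 bits (source field) matter here.
UPPER_BITS = 0xA0BC0000        # arbitrary stand-in for opcode, flags and destination
mod1 = UPPER_BITS | A_TABLE

def fetch_indexed(instr, index):
    """Model of the patched instruction: add the index, then read via the source field."""
    patched = (instr + index) & MASK32      # effect of ADD :mod1, theIndex
    return cog[patched & S_MASK]            # the register named by the source field

print(fetch_indexed(mod1, 2))
```

The same model shows the limit of the trick: if the table address plus the index exceeds 511, the carry spills out of the source field into the neighbouring bits of the instruction.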

    Post Edited (deSilva) : 4/13/2008 10:04:05 AM GMT
  • Clemens Posts: 236
    edited 2008-04-13 15:55
    That's similar to what Mike showed in his example halfway down on page 1 of this thread - a little bit easier to comprehend, though. I wish I had had this two days ago, when I pondered for hours over the secret behind "0-0" in ":inline    mov    data,0-0" :-)
  • darkxceed Posts: 34
    edited 2008-09-26 17:39

    I have made an assembly program that writes data to an array; this is handled with PAR.
    A[1000] is the long array which contains the data, but if I also have B[1000], how can I read and write both A and B with rdlong and wrlong?

    And I have 2 cogs; can they read and write at the same time on A[100], for example?

  • hippy Posts: 1,981
    edited 2008-09-26 20:54
    1) You could make the two arrays contiguous, which would place B[ 0..999 ] as A[ 1000..1999 ], or you could create a secondary array, the first entry of which holds the address of A[0] and the second the address of B[0], then pass the address of this 'pointer array' using PAR.

    2) Yes, two or more Cogs can read or write at the same time ( ignoring that such access won't happen simultaneously due to the way each Cog gets access to Hub memory in sequence ).
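The 'pointer array' option can be pictured with a small model of hub memory. This is a Python sketch on a PC, not Propeller code; the addresses `ADDR_A`, `ADDR_B` and `POINTERS` are invented for illustration, and `rdlong`/`wrlong` here are Python stand-ins for the assembly instructions of the same name.

```python
import struct

hub = bytearray(32 * 1024)          # model of the 32 KB hub RAM

def wrlong(addr, value):
    """Store a 32-bit little-endian long at a hub address."""
    struct.pack_into("<I", hub, addr, value & 0xFFFFFFFF)

def rdlong(addr):
    """Load a 32-bit little-endian long from a hub address."""
    return struct.unpack_from("<I", hub, addr)[0]

# Hypothetical layout: A and B do not need to be contiguous.
ADDR_A = 0x1000
ADDR_B = 0x3000
POINTERS = 0x0100                   # two longs: address of A[0], address of B[0]
wrlong(POINTERS, ADDR_A)
wrlong(POINTERS + 4, ADDR_B)

def cog_main(par):
    """What the assembly cog would do with PAR = address of the pointer array."""
    base_a = rdlong(par)                   # first entry: where A starts
    base_b = rdlong(par + 4)               # second entry: where B starts
    for i in range(1000):
        wrlong(base_a + 4 * i, i)          # A[i] := i
        wrlong(base_b + 4 * i, 2 * i)      # B[i] := 2*i

cog_main(POINTERS)
print(rdlong(ADDR_A + 4 * 100), rdlong(ADDR_B + 4 * 100))
```

The cog only ever receives one address through PAR; everything else it needs is found by following the pointers stored there, so more arrays can be added later without changing how the cog is started.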
  • darkxceed Posts: 34
    edited 2008-09-27 15:52
    OK, so the hub makes it impossible for 2 cogs to manipulate or read the data at the same hub memory address at the same time.

    Yes, I tried something similar to what you mention about using 2 arrays: I made A the even entries (0,2,4...) and B the odd ones (1,3,5...).

    But is it then possible to write to PAR? I thought that it wasn't possible to write to PAR like

    -mov PAR,#A

    Btw, where can I find some multiply and divide examples, so I don't have to rewrite them if they already exist? They will probably be faster than if I wrote such a function myself.

    If there aren't any, I will have to write it.

    Post Edited (darkxceed) : 9/27/2008 3:58:15 PM GMT
  • hippy Posts: 1,981
    edited 2008-09-27 16:43
    The hub mechanism prevents genuinely simultaneous access to the same hub memory location, but that won't stop you getting into a mess if it is used inappropriately. Without taking care, if two Cogs write a value to the same location, you'd not know which value was written. If one Cog is writing, any others can read without the problem of getting 'half written' values, provided the value is written in its entirety in one go with wrlong etc.
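The 'half written' case can be made visible with a small simulation. This is a Python sketch on a PC, not Propeller code; it treats each `wrlong`, `wrbyte` and `rdlong` as one indivisible hub access and places a reader between the writer's accesses by hand.

```python
import struct

hub = bytearray(8)

def wrlong(addr, value):
    """One hub access: all four bytes change together."""
    struct.pack_into("<I", hub, addr, value & 0xFFFFFFFF)

def wrbyte(addr, value):
    """One hub access: a single byte changes."""
    hub[addr] = value & 0xFF

def rdlong(addr):
    return struct.unpack_from("<I", hub, addr)[0]

OLD, NEW = 0x11111111, 0x22222222

# Writer uses a single wrlong: a reader sees either OLD or NEW, never a mixture.
wrlong(0, OLD)
seen_atomic = [rdlong(0)]
wrlong(0, NEW)
seen_atomic.append(rdlong(0))

# Writer uses four wrbytes: a reader slipping in between sees mixtures.
wrlong(4, OLD)
seen_torn = []
for i in range(4):
    wrbyte(4 + i, NEW >> (8 * i))
    seen_torn.append(rdlong(4))

print([hex(v) for v in seen_atomic], [hex(v) for v in seen_torn])
```

Values that span more than one long have the same problem as the byte-by-byte case, which is where extra care is needed.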

    PAR is read-only, you are right you cannot update it with 'mov', but you can alter what PAR will be set to as the second parameter of CogNew().

    There are multiply and divide routines already written. A forum search is the best thing there.
  • darkxceed Posts: 34
    edited 2008-10-06 16:39
    I could not find multiplication code which can handle, for example, an 18-bit value * a 4-bit value.

    By the way, which is faster: Spin (interpreted) or assembly?

  • Mike Green Posts: 23,101
    edited 2008-10-06 17:08
    I don't believe there is ready-made code for anything other than 16-bit x 16-bit multiplication. The assembly code for the Spin interpreter is available and that includes 32-bit x 32-bit multiplication. The 16-bit x 16-bit multiplication routine can easily be extended to a 32-bit x 16-bit routine by using two 32-bit locations for the product.

    Assembly is always faster by quite a lot. Remember that the Spin interpreter is written in assembly language and it has additional overhead beyond the code required to do the actual operations. Multiplication and division have to be done with subroutines in either event. The Propeller doesn't have multiply or divide instructions to do it in hardware.
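A multiply subroutine of this kind is usually a shift-and-add loop. The Python sketch below (on a PC, not Propeller code) shows the general technique for a 32-bit result; it is not the routine from the Spin interpreter, and the function name is made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def multiply(multiplicand, multiplier):
    """Shift-and-add multiply; returns the low 32 bits of the product."""
    product = 0
    while multiplier:
        if multiplier & 1:                           # low bit set: add the shifted multiplicand
            product = (product + multiplicand) & MASK32
        multiplicand = (multiplicand << 1) & MASK32  # next bit is worth twice as much
        multiplier >>= 1                             # move on to the next multiplier bit
    return product

print(multiply(0x3FFFF, 0xF))   # largest 18-bit value times largest 4-bit value
```

The loop runs once per significant bit of the multiplier, so for an 18-bit by 4-bit product it pays to pass the 4-bit value as the multiplier: at most four passes, and the 22-bit result fits easily in one long.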