Assembly Code Examples for the Beginner

Wurlitzer · 2006-12-13 19:20

Sorry, I'm thick today (well, maybe more than today). I have an application which will require 5 or more cogs (some in SPIN, others in Assy) to read and write to a common Byte Array[100] in Main RAM.

I cannot get my head around how to ensure all cogs know the starting address of this common array. Does it require the array to be declared in the Top Object, and then have the Top Object call all the subsequent "CogNew(@SpinOrAssyObjectName,@@ByteArray[0])", so that the PAR register in each cog will contain the proper starting address?

I don't have any worries in this program regarding conflicting writes; I just need to be able to read/write to any element in this 100 element ByteArray from any cog.

Thanks

SailerMan · 2006-12-14 14:47

Hello... I've had my Propeller for a few weeks now and am starting to understand SPIN. Now, for the first time, I am ready to begin my quest into ASM... I thought the Propeller and its community would be a great place to start.

Every time I begin to learn ASM I get discouraged and quit, because although I understand what some of the commands do (MOV, ADD, SHR), I get lost when I try to combine them to make programs.

I have been programming in dialects of BASIC since the days of the Commodore PET (1977), but never took it upon myself to dive into ASM, so now my mind always thinks in the BASIC way of doing things.

Since this is an ASM beginners thread, can we start from the basics and compare the ASM to SPIN (or BASIC) equivalents?

Thanks to anyone that is willing to lead me in the right direction.

Regards,
Eric

Paul Baker · 2006-12-14 22:09

Eric, this discussion is best placed in the non-sticky portion of the forum. First, a lot of people don't regularly look to see if there are new posts in the sticky section, and second, we try to keep sticky discussions as on-topic as possible. Fundamentals questions fit here, but comparison/contrast with other languages doesn't.

-Thanks

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

SailerMan · 2006-12-15 00:09

I guess what I am after is more ASM code examples..... for the Beginner... The only reason I mentioned comparison was because sometimes things are easier to understand when you see ASM code with an Equivalent Spin code. Sorry.


Paul Baker · 2006-12-15 01:08

No problem, I didn't mean to sound harsh.


Paul Baker · 2006-12-15 04:16

Eric, in an effort to help you, can you name a specific Spin or SX/B command you'd like to see coded into assembly? I just ask you start with something a little simpler than SHIFTIN/OUT or other similarly complex commands.


SailerMan · 2006-12-15 15:46

Thanks... Why don't you move this section of the thread to another post?
In any case, my problem lies in the fact that I have a hard time thinking low level.
Let's start out really small.


Pub Main|Index,Count
     Repeat Index From 0 to 10
       Count+=1


Paul Baker · 2006-12-16 00:05

Here's the thread I started: http://forums.parallax.com/showthread.php?p=621432
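For readers who stay in this thread, here is a rough sketch of how the small Spin loop above might look in Propeller assembly. It is only one possible translation and not the code from the linked thread. The labels entry, index and count are made up for the illustration, and starting count at zero is an assumption, since the Spin snippet does not say.

```spin
DAT
              org     0
entry         mov     index, #0            ' Index := 0
              mov     count, #0            ' start Count at 0 (an assumption)
:loop         add     count, #1            ' Count += 1
              add     index, #1            ' step Index
              cmp     index, #11  wc       ' C is set while Index < 11 (unsigned compare)
        if_c  jmp     #:loop               ' so the body runs for Index = 0..10
:done         jmp     #:done               ' nothing else to do: idle here

index         res     1                    ' cog registers used as variables
count         res     1
```

Such a block would be started from Spin with something like cognew(@entry, 0). Note that count lives in the cog's own memory, so Spin cannot see it unless the cog writes it back to main memory with wrlong.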


Karl Smith · 2006-12-30 02:33

How would I convert this line of Spin code to asm? I just can't seem to get it to work. I have tried all different variants of rdbyte and wrbyte; one variant is listed below:

bytemove(@End,@Start,1)

DAT
                        org
                        mov r0, @Start
                        mov r3, @End
                        rdbyte r1,r0
                        wrbyte r0,r3

                        CogId   CogNum          'Get COG ID
                        CogStop CogNum          'Stop this COG

CogNum                  res     1               'Reserved variables
R0                      long
R1                      long
R2                      long
R3                      long
R4                      long
Start                   byte "ABCD"
End                     byte "EFGH"

Mike Green · 2006-12-30 02:45

You can't copy bytes individually in assembly the way you might do in SPIN because the cog's memory is not byte addressable. Each location is a long word. If you want to copy bytes in HUB (main - SPIN) memory, you could do it like this:

VAR byte Start, End       ' Must be in this order
PUB main                  ' not named "start": Spin is case-insensitive, so that would clash with the VAR Start
   cognew(@begin,@Start)
   repeat                 ' Wait for operation to finish
DAT
         org     0
begin    mov     addr,PAR    ' get address of Start
         rdbyte  temp,addr   ' get byte value
         add     addr,#1     ' move to next location (End)
         wrbyte  temp,addr   ' store byte value
         cogid   temp        ' stop cog
         cogstop temp
addr     res     1
temp     res     1

If what you want to do is to copy bytes from one location in a cog's memory to another,
you will have to use AND/OR and shift instructions and keep track of which byte in a word
that you're copying.
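As a rough illustration of that last point, here is a minimal sketch of copying one byte between two longs inside cog memory with shifts and masks. All names (copybyte, srcLong, dstLong, srcIdx, dstIdx and so on) are invented for the example, and the byte index is assumed to be 0..3 with byte 0 in the lowest bits.

```spin
DAT
              org     0
entry         call    #copybyte            ' copy byte srcIdx of srcLong into byte dstIdx of dstLong
:done         jmp     #:done               ' idle afterwards

copybyte      mov     temp, srcLong
              mov     shift, srcIdx
              shl     shift, #3            ' byte index * 8 = bit offset
              shr     temp, shift          ' bring the wanted byte down to bits 7..0
              and     temp, #$FF           ' isolate it
              mov     shift, dstIdx
              shl     shift, #3            ' bit offset of the destination byte
              shl     temp, shift          ' move the byte into position
              mov     mask, #$FF
              shl     mask, shift          ' mask covering the destination byte
              andn    dstLong, mask        ' clear the destination byte
              or      dstLong, temp        ' merge in the new byte
copybyte_ret  ret

srcLong       long    $44434241            ' "ABCD", with "A" in the lowest byte
dstLong       long    $48474645            ' "EFGH", with "E" in the lowest byte
srcIdx        long    0
dstIdx        long    0
temp          res     1
shift         res     1
mask          res     1
```

With both indexes at 0 this replaces the "E" in dstLong with the "A" from srcLong, which is the in-cog counterpart of the one-byte bytemove asked about. It costs a dozen instructions, which is why byte work is usually done on hub memory with rdbyte/wrbyte instead.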

Karl Smith · 2006-12-30 03:15

Thanks Mike, I will give it a try.

Jello · 2007-04-23 23:37

Hi everybody,
Does anyone have a simple example of a case where a Spin method calls an asm method that calls
another asm method?
I need to clear my LCD screen and do other functions fast (via SPI).
So I am starting with a cls method to get the hang of asm, in hopes of eventually
refactoring all of my LCD Spin code to asm.
I figure a cls would be a good place to start wrapping my brain around it all.

Something like:
'in spin
 pub CLS(0) 'to clear screen with given color
    'call asm _CLS method
    _cls(color)

'in asm
 _cls(color)
   loop n times (calling asm spi engine shiftout method)
      shiftout(...)

I'm sure it's a simple matter (just not simple to me) :)

thanks
j

Kaio · 2007-04-24 12:02

Jello,

you don't have methods in assembly code. Your assembly code is running in a separate Cog independently from the Cog which is interpreting the Spin code. You could have some functions in your assembly code which are starting with a label and ending with a ret-instruction. Then you can use a call-instruction to perform a function like a method in Spin. When you want to pass arguments to a function you have to declare these as long data.

But you can't call such a function directly from Spin. Instead, you need some assembly code that communicates with the Spin code over main memory. It waits for a command, and can also use arguments, which must likewise be passed over main memory. When a command is received, it calls the corresponding function.

DAT
                        org     0
                        
entry                   mov     Arg1,#$20
                        call    #cout
                        jmp     #entry                  'endless loop only for this example

cout                    mov     Temp,Arg1               'get argument
                        'do something
cout_ret                ret

Arg1                    res     1                       'or long    0
Temp                    res     1                       'or long    0

For an example of such a communication routine in assembly please have a look at file AsmDebug.spin from POD. This routine can also have a return value for some functions. To see how easy it is to call from Spin have a look at file PropDebugger.spin and there at method getFlags.
http://forums.parallax.com/showthread.php?p=639020
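To make the idea concrete without opening POD, here is a minimal sketch of such a command mailbox. It is not the code from AsmDebug.spin; the names command, argument, cls and clear, and the convention that command value 1 means "clear", are all assumptions made up for this example.

```spin
VAR
  long  command                 ' 0 = idle, non-zero = command for the cog
  long  argument                ' must directly follow command

PUB start
  command := 0
  cognew(@entry, @command)      ' PAR will hold the hub address of command

PUB cls(color)
  argument := color             ' store the argument first ...
  command := 1                  ' ... then post the command
  repeat while command          ' wait until the cog has cleared it

DAT
              org     0
entry         rdlong  cmd, par  wz         ' poll the command long
        if_z  jmp     #entry               ' zero: nothing to do yet
              mov     ptr, par
              add     ptr, #4              ' hub address of argument
              rdlong  arg1, ptr            ' fetch the argument
              cmp     cmd, #1  wz          ' is it the (hypothetical) clear command?
        if_z  call    #clear
              wrlong  zero, par            ' command := 0 tells Spin we are done
              jmp     #entry

clear         nop                          ' placeholder: do something with arg1 here
clear_ret     ret

zero          long    0
cmd           res     1
ptr           res     1
arg1          res     1
```

From Spin's point of view, cls(color) then behaves like an ordinary method call, although the work is done by the other cog. A return value could be passed back the same way through a third long.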

Post Edited (Kaio) : 4/24/2007 4:44:58 PM GMT

Jello · 2007-04-24 14:44

Kaio,
What you said makes sense and the examples you cited are helpful.
I haven't quite digested it all yet :) but working on it. I have a lot to learn.
Thanks for the help Kaio!
J

ericball · 2007-06-13 13:31

Bean (Hitt Consulting) said...
Beau,
How long does it take to start/stop a new cog with an assembly program ?
If I have a routine that isn't fast enough in spin, I know it would be faster in assembly, but I don't know what the time delay is to launch a new cog.

Bean.

CogInit/CogNew forces the cog to execute a RDLONG for each word of cog RAM. So the startup delay will be 512*16 = 8192 cycles (probably a few more for the initial HUB access and other startup delays).
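As a worked number: assuming an 80 MHz system clock, 8192 cycles come to roughly 102 µs. If you would rather measure than calculate, something along the lines of the following sketch might do. The names measure, stamp and t are invented for the example, and the figure it returns also includes the Spin interpreter's own overhead for executing cognew, so expect somewhat more than 8192.

```spin
VAR
  long  stamp                   ' written by the new cog

PUB measure | t0
  stamp := 0
  t0 := cnt                     ' system counter before the launch
  cognew(@entry, @stamp)
  repeat until stamp            ' wait for the cog to report in
  return stamp - t0             ' elapsed system clocks, Spin overhead included

DAT
              org     0
entry         mov     t, cnt               ' timestamp taken by the cog's first instruction
              wrlong  t, par               ' hand it to Spin through hub memory
              cogid   t
              cogstop t                    ' stop this cog again
t             res     1
```

One caveat of this simple version: in the very unlikely case that the timestamp is exactly 0, the Spin loop would keep waiting.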

ErNa · 2007-06-23 18:32

Does this also mean, that all other cogs are blocked for global access during this time?

Paul Baker · 2007-06-23 20:08

No, hub accesses are performed in round-robin, non-blocking style (that's why there's the *16 factor in ericball's equation).


deSilva · 2007-07-05 10:20

Well, not strictly for the beginner... However....
It is not widely known that SPIN allows full recursion of calls! This can be emulated within an assembly program by installing an ad-hoc stack mechanism.
It will be instructive anyhow to have a look at the many "patches". Note that you never shall "patch" crossing JMPRETs (aka CALLs), as the code has to stay re-entrant!

The speed-up is about 40, which is not so overwhelming compared to the general speed-up from SPIN to handmade assembly (about 80 according to my experience) which discloses a very efficient stack management within SPIN!

This innocent looking piece of SPIN.....

PUB spinFibo(n)
  if n>2
     return spinFibo(n-1)+ spinFibo(n-2)
  else
     return 1

... has thus created this "assembly-monster":

DAT
fiboasm
' PAR shall contain a reference to 2 longs
'  [ 0 ] Argument for fibo (0: result ready)
'  [ 1 ] Result
 
    mov a, #$1ff
    add a, cnt
    waitcnt a,#0    ' save energy while idling
    rdlong a, par
    tjz a,#fiboasm
  ' organize a stack
    mov stackP, #stack
    
    jmpret retaddr, #fibo   ' result = fibo(a)

   ' result available
    mov a, par
    add a, #4
    wrlong result, a
    mov a,#0
    wrlong a, par
    jmp #fiboasm

fibo
' if a<3 return 1
    cmps a, #3   wc
    mov result, #1
    if_c jmp retaddr

    
    add stackP, #1   ' points to the LAST USED entry
    movd :f1, stackP
    add stackP, #1       
    movd :f2, stackP  
:f1 mov 0-0, retaddr ' push return address 
:f2 mov 0-0, a       ' push argument

    sub a, #1 
    jmpret retaddr, #fibo ' call fibo(a-1)

    movs :f3, stackP 
    movd :f4, stackP 
:f3 mov a, 0-0      ' get argument
                    '... and substitute by result
:f4 mov 0-0, result                    

    sub a, #2
    jmpret retaddr, #fibo  ' call fibo(a-2)         

' add both results
    movs :f5, stackP
    sub stackP, #1
:f5 add result, 0-0  
    movs :f6, stackP   ' return to caller
    sub stackP, #1     ' adjust stack
:f6 jmp 0-0    


retaddr  res 1
result res 1
a  res 1

' The stack runs from lower to higher addresses; stackP always points to the last used entry!
stackP res 0    ' a little bit over-optimized :)
stack res 100     ' ... or as long as it will go

----
If you are interested in the general timing without trying yourself:
fibo(29) needed:
26 sec with SPIN
1.8 sec with PHP on my mid-range Windows Notebook
800 ms with the above posted piece of code
30 ms with a very efficient FORTH Implementation on my mid-range Windows Notebook
----
BTW: I am well aware that there are simple algorithms to compute the n-th Fibonacci number in o(1) - this is obviously not the point

---
Edit a long time later:
1.1 sec PureBasic in Interpreter/Debugger Mode (on same Notebook)
15 ms PureBasic compiled to 16kB EXE-file on same Notebook

Post Edited (deSilva) : 12/28/2007 6:12:59 PM GMT

mirror · 2007-07-05 23:39

Just out of curiosity - which Spin to assembler compiler did you use?

Spin to bytecodes I understand. Spin to handcoded assembler I understand.

Are you saying that you have manually handcoded the Spin to assembler, and simulated a stack type machine in the process?


deSilva · 2007-07-06 06:53

mirror said...
Are you saying that you have manually handcoded the Spin to assembler, and simulated a stack type machine in the process?

Yes, I hand-translated it and - necessarily - needed a "stack".

A SPIN-to-machine-code compiler however is an interesting idea:
(1) Without a working LMM impossible, but.....
(2) ... within the range of 1.5 k generated code quite feasible
(3) As SPIN is a grammatically (and semantically as well) extremely simple language, this can be done in a few weekends
(4) You could restrict the semantics somewhat to simplify the translation, and will not necessarily need a "stack" at all for it

The main benefit for such a rudimentary compiler will be:
- automatically speed up your simple "hardware drivers" - written in SPIN for the sake of clarity and/or missing assembly skill
- standardize the SPIN - COG data exchange interface ("PAR") which had been mostly ad-hoc in the past.

But my posting had nothing to do with all this! I just wanted to:
- prove the feasibility of an advanced programming concept as recursion in Propeller Assembler
- show again the huge speed-up using machine code even in this case
- mention that - in this case - the optimized code runs faster than a GHz Windows PHP program (which - of course - is not the slowest of all script languages, but comes close to it)

Kaio · 2007-07-06 10:51

deSilva,

nice example of recursion even in assembly code. And the time it takes is also very interesting in comparison with a routine running on a GHz PC.

Thomas

Cats92 · 2007-08-15 09:58

Parkso,

as a beginner in assembly, I found your commented code examples very useful.

And I hope others do too.

Thanks

Cats92

deSilva · 2008-04-13 09:56

This is not strictly for beginners, but might help understanding indexed addressing

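aTable             LONG 339999, -1, 66,1, 255, 777

' How to fetch entry 'theIndex' from 'aTable'

                    ADD     :mod1, theIndex   ' patch the source field of :mod1
                    SUB     :mod1, theIndex   ' restore it; :mod1 has already been fetched in its patched form
:mod1               MOV     theData, aTable   ' register source (no #), so the table entry itself is read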
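For comparison, the more commonly seen way of doing indexed access patches the source field with movs and marks the patched field with the 0-0 placeholder. The following is only a sketch; the labels entry and :fetch and the index value 2 are chosen for the illustration.

```spin
DAT
              org     0
entry         movs    :fetch, #aTable      ' source field := cog address of the table
              add     :fetch, theIndex     ' source field += index
              nop                          ' at least one instruction between patching and executing
:fetch        mov     theData, 0-0         ' 0-0 is just a placeholder for the patched field
:done         jmp     #:done               ' idle

aTable        long    339999, -1, 66, 1, 255, 777
theIndex      long    2                    ' selects the third entry
theData       res     1
```

Because movs rewrites the whole source field each time, this version does not need to be restored afterwards, at the cost of one more instruction than the ADD/SUB trick.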

Post Edited (deSilva) : 4/13/2008 10:04:05 AM GMT

Clemens · 2008-04-13 15:55

That's similar to what Mike showed in his example halfway down on page 1 of this thread. - A little bit easier to comprehend, though. I wish I had this two days ago when I pondered for hours over what's the secret behind "0-0" in ":inline    mov    data,0-0" :-)
thanks.

darkxceed · 2008-09-26 17:39

Hello,

I have made an assembly program that writes data to an array; the array's address is handed over via PAR.
A[1000] is the long array which contains the data, but if I also have B[1000], how can I read and write to both A and B with rdlong and wrlong?

And I have 2 cogs; can they read and write at the same time on A[100], for example?

Bart

hippy · 2008-09-26 20:54

1) You could make the two arrays contiguous, which would place B[ 0..999 ] as A[ 1000..1999 ], or you could create a secondary array whose first entry holds the address of A[0] and whose second entry holds the address of B[0], then pass the address of this 'pointer array' using PAR.

2) Yes, two or more Cogs can read or write at the same time ( ignoring that such access won't happen simultaneously due to the way each Cog gets access to Hub memory in sequence ).
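A minimal sketch of the 'pointer array' variant from point 1 could look like this. The names pointers, aBase, bBase, ptr and value are made up for the example, and copying element 5 is just an arbitrary demonstration.

```spin
VAR
  long  A[1000]
  long  B[1000]
  long  pointers[2]             ' [0] = address of A[0], [1] = address of B[0]

PUB start
  pointers[0] := @A
  pointers[1] := @B
  cognew(@entry, @pointers)     ' PAR will hold the hub address of pointers[0]

DAT
              org     0
entry         mov     ptr, par
              rdlong  aBase, ptr           ' hub address of A[0]
              add     ptr, #4
              rdlong  bBase, ptr           ' hub address of B[0]

              ' demonstration: B[5] := A[5]
              mov     ptr, aBase
              add     ptr, #5*4            ' longs are 4 bytes apart in hub memory
              rdlong  value, ptr
              mov     ptr, bBase
              add     ptr, #5*4
              wrlong  value, ptr

              cogid   value                ' done: stop this cog
              cogstop value

aBase         res     1
bBase         res     1
ptr           res     1
value         res     1
```

The same pattern extends to any number of arrays or variables: only one address travels through PAR, and everything else is looked up from there.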

darkxceed · 2008-09-27 15:52

Okay, so the hub makes it impossible for 2 cogs to manipulate or read the data at the same hub memory address at the same moment.

Yes, I tried something similar to what you mention about using 2 arrays: I made A the even elements (0,2,4...) and B the odd ones (1,3,5...).

But is it then possible to write to PAR? I thought that it wasn't possible to write to PAR like

-mov PAR,#A

Btw, where can I find some multiply and divide examples, so I don't have to rewrite them if they exist? They will probably be faster than if I made such a function myself.

If there aren't any, I will have to write them.

Post Edited (darkxceed) : 9/27/2008 3:58:15 PM GMT

hippy · 2008-09-27 16:43

The hub mechanism prevents genuinely simultaneous access to the same hub memory location but that won't stop you getting into a mess if used inappropriately. Without taking care, if two Cogs write a value to the same location you'd not know which value were written. If one Cog is writing, any others can read without problems of them getting 'half written' values, providing the value is written in entirety in one go with wrlong etc.

PAR is read-only, you are right you cannot update it with 'mov', but you can alter what PAR will be set to as the second parameter of CogNew().

There are multiply and divide routines already written. A forum search is the best thing there.

darkxceed · 2008-10-06 16:39

I could not find multiplication code which can handle, for example, 18bit(value) * 4bit(value).

Btw, which is faster for these: Spin (interpreter) or assembly?
A=B*C
A=A+B
A=B/C


Mike Green · 2008-10-06 17:08

I don't believe there is ready-made code for anything other than 16-bit x 16-bit multiplication. The assembly code for the Spin interpreter is available and that includes 32-bit x 32-bit multiplication. The 16-bit x 16-bit multiplication routine can easily be extended to a 32-bit x 16-bit routine by using two 32-bit locations for the product.

Assembly is always faster by quite a lot. Remember that the Spin interpreter is written in assembly language and it has additional overhead beyond the code required to do the actual operations. Multiplication and division have to be done with subroutines in either event. The Propeller doesn't have multiply or divide instructions to do it in hardware.
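As an illustration of what such a subroutine can look like, here is a plain shift-and-add multiply. This is a generic textbook sketch, not the 16-bit x 16-bit routine mentioned above and not the Spin interpreter's code; the names multiply, x, y and product and the sample values are invented for the example. It keeps only the low 32 bits of the product and destroys x and y.

```spin
DAT
              org     0
entry         call    #multiply            ' product := x * y
:done         jmp     #:done               ' idle

multiply      mov     product, #0
:loop         shr     y, #1  wc            ' lowest bit of y goes into C
        if_c  add     product, x           ' add x if that bit was set
              shl     x, #1                ' x := x * 2 for the next bit
              tjnz    y, #:loop            ' repeat while y still has bits set
multiply_ret  ret

x             long    200_000              ' fits in 18 bits
y             long    13                   ' fits in 4 bits
product       res     1
```

The loop runs once per significant bit of y, so for an 18-bit x 4-bit case it pays to put the 4-bit value in y: at most four passes are needed.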
