Sorry, I'm thick today (well, maybe more than today). I have an application which will require 5 or more cogs (some running SPIN, others assembly) to read and write a common Byte Array[100] in main RAM.
I cannot get my head around how to ensure all cogs know the starting address of this common array. Does it require the array to be declared in the top object, with the top object then calling all the subsequent "CogNew(@SpinOrAssyObjectName,@@ByteArray[0])", so that the PAR register in each cog will contain the proper starting address?
I don't have any worries in this program regarding conflicting writes; I just need to be able to read/write any element of this 100-element ByteArray from any cog.
Hello... I've had my Propeller for a few weeks now and am starting to understand SPIN. Now, for the first time, I am ready to begin my quest into ASM... I thought the Propeller and its community would be a great place to start.
Every time I begin to learn ASM I get discouraged and quit, because although I understand what some of the instructions do (MOV, ADD, SHR), when I try to combine them into programs I get lost.
I have been programming in dialects of BASIC since the days of the Commodore PET in 1977, but never took it upon myself to dive into ASM; now my mind always thinks the BASIC way.
Since this is an ASM beginners thread, can we start from the basics and compare the ASM to its SPIN (or BASIC) equivalents?
Thanks to anyone who is willing to lead me in the right direction.
Eric, this discussion is best placed in the non-sticky portion of the forum. First, a lot of people don't regularly check for new posts in the sticky section, and second, we try to keep sticky discussions as on-topic as possible. Fundamentals questions fit here, but comparison/contrast with other languages doesn't.
I guess what I am after is more ASM code examples for the beginner... The only reason I mentioned comparison was that things are sometimes easier to understand when you see ASM code next to its equivalent Spin code. Sorry.
Eric, in an effort to help you, can you name a specific Spin or SX/B command you'd like to see coded into assembly? I just ask you start with something a little simpler than SHIFTIN/OUT or other similarly complex commands.
Thanks... Why don't you move this section of the thread to another post?
In any case, my problem is that I have a hard time thinking at a low level.
Let's start out really small.
Pub Main|Index,Count
  Repeat Index From 0 to 10
    Count+=1
How would I convert these lines of Spin code to ASM? I just can't seem to get it to work. I have tried all different variants of rdbyte and wrbyte; one variant is listed below:
bytemove(@End,@Start,1)

DAT
              org
              mov     r0, @Start
              mov     r3, @End
              rdbyte  r1,r0
              wrbyte  r0,r3

              CogId   CogNum          'Get COG ID
              CogStop CogNum          'Stop this COG

CogNum        res     1               'Reserved variables
R0            long
R1            long
R2            long
R3            long
R4            long
Start         byte    "ABCD"
End           byte    "EFGH"
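For the first part of the question: `Repeat Index From 0 to 10` runs 11 times, because both bounds are inclusive. In PASM a counted loop is usually written as a countdown with DJNZ (decrement and jump if not zero), along the lines of `mov idx,#11` / `:loop add count,#1` / `djnz idx,#:loop`. The Python model below is only a sketch of that control flow (the register names `idx` and `count` are illustrative, not from the thread); it checks that the countdown form performs the same number of additions as the Spin loop.

```python
def spin_repeat(count: int) -> int:
    """Spin: Repeat Index From 0 to 10 / Count += 1 (both bounds inclusive)."""
    for _index in range(0, 10 + 1):
        count += 1
    return count


def pasm_djnz(count: int) -> int:
    """Model of a PASM countdown loop:
            mov   idx, #11
    :loop   add   count, #1
            djnz  idx, #:loop
    """
    idx = 11                                  # mov idx, #11
    while True:
        count = (count + 1) & 0xFFFFFFFF      # add count, #1 (32-bit register)
        idx = (idx - 1) & 0xFFFFFFFF          # djnz decrements first...
        if idx == 0:                          # ...and falls through at zero
            break
    return count


print(spin_repeat(0), pasm_djnz(0))           # 11 11
```

Counting down to zero is the natural PASM shape because DJNZ combines the decrement, the test and the branch in one instruction.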
You can't copy bytes individually in assembly the way you might in SPIN, because the cog's memory is not byte-addressable: each location is a 32-bit long. If you want to copy bytes in hub (main, SPIN) memory, you could do it like this:
VAR
  byte Start, End             ' Must be in this order

PUB main
  cognew(@begin, @Start)      ' PAR := hub address of Start
  repeat                      ' Wait for operation to finish (loops forever)

DAT
              org     0
begin         mov     addr, PAR     ' get address of Start (PAR holds long-aligned
                                    ' addresses only; Start is the first VAR byte)
              rdbyte  temp, addr    ' get byte value
              add     addr, #1      ' move to next location
              wrbyte  temp, addr    ' store byte value
              cogid   temp          ' stop cog
              cogstop temp

addr          res     1
temp          res     1
If what you want to do is copy bytes from one location in a cog's memory to another,
you will have to use AND/OR and shift instructions and keep track of which byte within the long
you're copying.
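Since a cog register is one 32-bit long, a "byte" inside it is just 8 bits at bit offset 0, 8, 16 or 24. Here is a host-side Python sketch of the shift/AND/OR sequence such a copy needs. The byte numbering (byte 0 = least significant, matching the Propeller's little-endian hub layout) and the helper names are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def get_byte(reg: int, n: int) -> int:
    """Extract byte n (0 = least significant): shift right, then AND #$FF."""
    return (reg >> (8 * n)) & 0xFF


def put_byte(reg: int, n: int, value: int) -> int:
    """Overwrite byte n: clear it with an inverted mask, then OR in the shifted value."""
    shift = 8 * n
    cleared = reg & ~(0xFF << shift) & MASK32
    return cleared | ((value & 0xFF) << shift)


def copy_byte(src: int, src_n: int, dst: int, dst_n: int) -> int:
    """Copy byte src_n of register src into byte dst_n of dst; returns the new dst."""
    return put_byte(dst, dst_n, get_byte(src, src_n))


# "ABCD" and "EFGH" each packed little-endian into one long: 'A' is byte 0
start = int.from_bytes(b"ABCD", "little")
end = int.from_bytes(b"EFGH", "little")
end = copy_byte(start, 0, end, 0)
print(end.to_bytes(4, "little"))   # b'AFGH'
```

In PASM each helper becomes a few SHR/SHL, AND/ANDN and OR instructions, with the shift count chosen by which byte you are working on.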
Hi everybody,
Does anyone have a simple example of a case where a Spin method calls an asm method that calls
another asm method?
I need to clear my LCD screen and do other functions fast (via SPI).
So I am starting with a cls method to get the hang of asm, in hopes of eventually
refactoring all of my LCD Spin code to asm.
I figure a cls would be a good place to start wrapping my brain around it all.
Something like:
'in spin
pub CLS(0) 'to clear screen with given color
  'call asm _CLS method
  _cls(color)
You don't have methods in assembly code. Your assembly code runs in a separate cog, independently of the cog that is interpreting the Spin code. You can have functions in your assembly code that start with a label and end with a RET instruction; you then use a CALL instruction to perform such a function like a method in Spin. When you want to pass arguments to a function, you have to declare them as long data.
But you can't call such a function directly from Spin. Therefore you must use some assembly code that communicates with the Spin code through main memory: it waits for a command, which can also carry arguments passed through main memory, and when a command is received it calls the corresponding function.
DAT
              org     0
entry         mov     Arg1, #$20
              call    #cout
              jmp     #entry        'endless loop only for this example

cout          mov     Temp, Arg1    'get argument
              'do something
cout_ret      ret

Arg1          res     1             'or long 0
Temp          res     1             'or long 0
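How CALL and RET work under the hood explains several later posts. In Propeller assembly, `call #cout` is assembled as `jmpret cout_ret, #cout`: the return address is written into the source field of the `ret` instruction at `cout_ret`, and `ret` is simply a jump through that field. There is exactly one return slot per routine, so a routine that calls itself overwrites its own return address. That is why the recursive Fibonacci example further down builds its own stack. A minimal Python model of the field patching (the address 7 for `cout_ret` is hypothetical, and the jump itself is not simulated):

```python
S_MASK = 0x1FF   # bits 8..0 of an instruction word: the source (S) field


def jmpret(cog: list, ret_addr: int, pc: int) -> None:
    """Model of 'jmpret ret_addr, #target' executed at address pc:
    only the S field of cog[ret_addr] is replaced with pc + 1."""
    cog[ret_addr] = (cog[ret_addr] & ~S_MASK) | ((pc + 1) & S_MASK)


def ret_target(cog: list, ret_addr: int) -> int:
    """The 'ret' at ret_addr jumps to whatever JMPRET stored in its S field."""
    return cog[ret_addr] & S_MASK


cog = [0] * 512            # cog RAM: 512 longs
COUT_RET = 7               # hypothetical address of 'cout_ret ret'

jmpret(cog, COUT_RET, pc=1)          # 'call #cout' executed at address 1
first = ret_target(cog, COUT_RET)    # ret resumes right after the call

jmpret(cog, COUT_RET, pc=5)          # cout now calls itself from address 5...
second = ret_target(cog, COUT_RET)   # ...and the return address 2 is overwritten
print(first, second)                 # 2 6
```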
For an example of such a communication routine in assembly please have a look at file AsmDebug.spin from POD. This routine can also have a return value for some functions. To see how easy it is to call from Spin have a look at file PropDebugger.spin and there at method getFlags. http://forums.parallax.com/showthread.php?p=639020
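The command mailbox described above can be modeled on a PC with two threads sharing a small "hub" array. The Spin side writes the argument, then a non-zero command, and waits. The assembly cog polls the command long, runs the matching routine, writes a result and clears the command to signal completion. This is a sketch under assumptions: the mailbox layout (command, argument, result) and the command codes are hypothetical and not taken from AsmDebug.spin.

```python
import threading
import time

CMD, ARG, RESULT = 0, 1, 2                  # hypothetical layout: three longs in hub RAM
CMD_CLS, CMD_DOUBLE, CMD_QUIT = 1, 2, 99    # hypothetical command codes

hub = [0, 0, 0]
cleared_colors = []


def cog_main() -> None:
    """Plays the PASM cog: poll, dispatch, write the result, clear the command."""
    while True:
        cmd = hub[CMD]                      # rdlong cmd, par
        if cmd == 0:                        # nothing to do yet: keep polling
            time.sleep(0.0001)
            continue
        if cmd == CMD_QUIT:
            hub[CMD] = 0
            return
        arg = hub[ARG]
        if cmd == CMD_CLS:
            cleared_colors.append(arg)      # stand-in for 'call #cls'
            hub[RESULT] = 0
        elif cmd == CMD_DOUBLE:
            hub[RESULT] = (arg * 2) & 0xFFFFFFFF
        hub[CMD] = 0                        # writing zero tells Spin we are done


def spin_call(cmd: int, arg: int = 0) -> int:
    """Plays a Spin method: write the argument first, then the command, then wait."""
    hub[ARG] = arg
    hub[CMD] = cmd
    while hub[CMD] != 0:                    # wait until the cog clears the command
        time.sleep(0.0001)
    return hub[RESULT]


worker = threading.Thread(target=cog_main)
worker.start()
print(spin_call(CMD_DOUBLE, 21))            # 42
spin_call(CMD_CLS, 3)
spin_call(CMD_QUIT)
worker.join()
print(cleared_colors)                       # [3]
```

Writing the argument before the command matters: the cog may see the command the moment it becomes non-zero, so everything it needs must already be in place.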
Kaio,
What you said makes sense, and the examples you cited are helpful.
I haven't quite digested it all yet :) but I'm working on it. I have a lot to learn.
Thanks for the help Kaio!
J
Bean (Hitt Consulting) said...
Beau,
How long does it take to start/stop a new cog with an assembly program ?
If I have a routine that isn't fast enough in spin, I know it would be faster in assembly, but I don't know what the time delay is to launch a new cog.
Bean.
CogInit/CogNew forces the cog to execute a RDLONG for each long of cog RAM, and a cog gets a hub access slot only once every 16 clocks. So the startup delay will be 512*16 = 8192 cycles (probably a few more for the initial hub access and other startup delays).
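To turn those cycles into time: the conversion below assumes an 80 MHz system clock (a common setup, e.g. a 5 MHz crystal with PLL16X); at a different clock, scale accordingly. Startup overhead beyond the 512 hub reads is ignored.

```python
COG_LONGS = 512         # cog RAM size in longs
HUB_WINDOW = 16         # clocks between successive hub slots for one cog
CLKFREQ = 80_000_000    # assumed system clock (5 MHz crystal x PLL16X)


def cog_load_time(clkfreq: int = CLKFREQ) -> tuple:
    """Approximate (clocks, microseconds) to load a cog, ignoring startup overhead."""
    cycles = COG_LONGS * HUB_WINDOW
    return cycles, cycles / clkfreq * 1e6


cycles, usec = cog_load_time()
print(cycles, round(usec, 1))    # 8192 102.4
```

So launching a cog costs on the order of 0.1 ms at 80 MHz, which is worth it only if the assembly routine saves more than that or keeps running.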
Well, not strictly for the beginner... However....
It is not widely known that SPIN allows full recursion of calls! This can be emulated within an assembly program by installing an ad-hoc stack mechanism.
It will be instructive anyhow to have a look at the many "patches" (self-modifying MOVS/MOVD instructions). Note that a "patch" must never be relied on across a JMPRET (aka CALL), because the recursive call patches the same instructions again; the code has to stay re-entrant!
The speed-up is about 40, which is not so overwhelming compared to the general speed-up from SPIN to handmade assembly (about 80, in my experience). This reveals very efficient stack management within SPIN!
This innocent-looking piece of SPIN...
PUB spinFibo(n)
  if n>2
    return spinFibo(n-1)+ spinFibo(n-2)
  else
    return 1
... has thus created this "assembly-monster":
DAT
              org     0
fiboasm
' PAR shall contain a reference to 2 longs
' [0] Argument for fibo (0: result ready)
' [1] Result
              mov     a, #$1ff
              add     a, cnt
              waitcnt a, #0           ' save energy while idling
              rdlong  a, par
              tjz     a, #fiboasm
' organize a stack
              mov     stackP, #stack
              jmpret  retaddr, #fibo  ' result = fibo(a)
' result available
              mov     a, par
              add     a, #4
              wrlong  result, a
              mov     a, #0
              wrlong  a, par
              jmp     #fiboasm

fibo
' if a<3 return 1
              cmps    a, #3 wc
              mov     result, #1
        if_c  jmp     retaddr
              add     stackP, #1      ' points to the LAST USED entry
              movd    :f1, stackP
              add     stackP, #1
              movd    :f2, stackP
:f1           mov     0-0, retaddr    ' push return address
:f2           mov     0-0, a          ' push argument
              sub     a, #1
              jmpret  retaddr, #fibo  ' call fibo(a-1)
              movs    :f3, stackP
              movd    :f4, stackP
:f3           mov     a, 0-0          ' get argument
                                      ' ... and substitute by result
:f4           mov     0-0, result
              sub     a, #2
              jmpret  retaddr, #fibo  ' call fibo(a-2)
' add both results
              movs    :f5, stackP
              sub     stackP, #1
:f5           add     result, 0-0
              movs    :f6, stackP     ' return to caller
              sub     stackP, #1      ' adjust stack
:f6           jmp     0-0

retaddr       res     1
result        res     1
a             res     1
' The stack runs from lower to higher addresses; stackP always points to the last used entry!
stackP        res     0               ' a little bit over-optimized :) (stackP shares stack[0])
stack         res     100             ' ... or as long as it will go
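The same stack discipline, minus the self-modifying code, can be modeled in Python. Instead of recursing, the routine pushes a frame holding the argument, replaces the argument's slot with fibo(n-1) once that call returns, then adds fibo(n-2) and pops. This is a sketch of the technique that mirrors the stack usage of the PASM above; it is not an instruction-by-instruction translation.

```python
def fibo_explicit_stack(n: int) -> int:
    """fibo(n) = 1 for n < 3, else fibo(n-1) + fibo(n-2), without host recursion."""
    # Each frame: [state, argument, saved_first_result]
    #   state 0 = not started, 1 = fibo(a-1) returned, 2 = fibo(a-2) returned
    stack = [[0, n, 0]]
    result = 0
    while stack:
        frame = stack[-1]
        state, a, saved = frame
        if state == 0:
            if a < 3:                      # cmps a,#3 wc / if_c: result = 1, return
                result = 1
                stack.pop()
                continue
            frame[0] = 1
            stack.append([0, a - 1, 0])    # call fibo(a-1)
        elif state == 1:
            frame[2] = result              # keep fibo(a-1) where the argument was
            frame[0] = 2
            stack.append([0, a - 2, 0])    # call fibo(a-2)
        else:
            result = saved + result        # add both results
            stack.pop()                    # return to caller
    return result


print([fibo_explicit_stack(i) for i in range(1, 11)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

The stack never holds more frames than the recursion depth (about n), which is why a fixed-size `stack res 100` in cog RAM is enough for arguments like 29.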
----
If you are interested in the general timing without trying yourself:
fibo(29) needed:
26 sec with SPIN
1.8 sec with PHP on my mid-range Windows Notebook
800 ms with the above posted piece of code
30 ms with a very efficient FORTH Implementation on my mid-range Windows Notebook
----
BTW: I am well aware that there are simple algorithms to compute the n-th Fibonacci number in O(1) - this is obviously not the point
---
Edit a long time later:
1.1 sec PureBasic in Interpreter/Debugger Mode (on same Notebook)
15 ms PureBasic compiled to 16kB EXE-file on same Notebook
mirror said...
Are you saying that you have manually handcoded the Spin to assembler, and simulated a stack type machine in the process?
Yes, I hand-translated it and necessarily needed a "stack".
A SPIN-to-machine-code compiler, however, is an interesting idea:
(1) Impossible without a working LMM, but...
(2) ... quite feasible within the range of 1.5 k of generated code
(3) As SPIN is a grammatically (and semantically as well) extremely simple language, this can be done in a few weekends
(4) You could restrict the semantics somewhat to simplify the translation, and would not necessarily need a "stack" at all for it
The main benefit for such a rudimentary compiler will be:
- automatically speed up your simple "hardware drivers" - written in SPIN for the sake of clarity and/or missing assembly skill
- standardize the SPIN - COG data exchange interface ("PAR") which had been mostly ad-hoc in the past.
But my posting had nothing to do with all this! I just wanted to:
- prove the feasibility of an advanced programming concept such as recursion in Propeller Assembler
- show again the huge speed-up of machine code even in this case
- mention that, in this case, the optimized code runs faster than a PHP program on a GHz Windows machine (PHP is of course not the slowest of all script languages, but comes close to it)
That's similar to what Mike showed in his example halfway down page 1 of this thread, though a little easier to comprehend. I wish I had had this two days ago, when I pondered for hours over the secret behind "0-0" in ":inline mov data,0-0" :-)
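On the "0-0" question: `0-0` is just the number zero, written that way by convention to flag a field that will be overwritten at run time. MOVS replaces the 9-bit source field (bits 8..0) of the destination register, and MOVD replaces the 9-bit destination field (bits 17..9); the rest of the instruction word is left untouched. On the real chip the patching instruction must not be immediately followed by the patched one, because the next instruction has already been fetched. A small Python model of the two instructions; the register addresses and the opcode placeholder bits are hypothetical, not a real encoding.

```python
S_FIELD = 0x1FF            # bits 8..0: source field
D_FIELD = 0x1FF << 9       # bits 17..9: destination field


def movs(instr: int, value: int) -> int:
    """MOVS: copy the low 9 bits of value into the S field of instr."""
    return (instr & ~S_FIELD) | (value & S_FIELD)


def movd(instr: int, value: int) -> int:
    """MOVD: copy the low 9 bits of value into the D field of instr."""
    return (instr & ~D_FIELD) | ((value & S_FIELD) << 9)


def fields(instr: int) -> tuple:
    """Return the (D, S) register addresses encoded in an instruction word."""
    return (instr >> 9) & S_FIELD, instr & S_FIELD


DATA = 0x020                  # hypothetical cog address of 'data'
TABLE = 0x150                 # hypothetical register the source should point at
OPCODE_BITS = 0xFF << 24      # placeholder for opcode/flag bits (left untouched)

inline = OPCODE_BITS | (DATA << 9) | 0   # ':inline mov data, 0-0' -> S starts as 0
inline = movs(inline, TABLE)             # patch the source field at run time
print(hex(inline & 0x3FFFF), fields(inline))   # 0x4150 (32, 336)
```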
thanks.
I have made an assembly program that writes data to an array; the array's address is passed via PAR.
A[1000] is the long array which contains the data, but if I also have B[1000], how can I read and write both A and B with rdlong and wrlong?
And if I have 2 cogs, can they read and write at the same time on A[100], for example?
1) You could make the two arrays contiguous, which would place B[0..999] as A[1000..1999], or you could create a secondary array whose first entry holds the address of A[0] and whose second holds the address of B[0], then pass the address of this 'pointer array' using PAR.
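The "pointer array" option can be sketched with a Python model of hub RAM as a bytearray of little-endian longs. Spin stores the addresses of A[0] and B[0] in a two-long table and passes the table's address in PAR; the cog reads both addresses with RDLONG and then accesses element i at base + 4 × i. The hub addresses used below are arbitrary assumptions.

```python
import struct

hub = bytearray(32 * 1024)      # Propeller 1 hub RAM: 32 KB, little-endian longs


def rdlong(addr: int) -> int:
    assert addr % 4 == 0, "longs are long-aligned in hub RAM"
    return struct.unpack_from("<I", hub, addr)[0]


def wrlong(addr: int, value: int) -> None:
    assert addr % 4 == 0, "longs are long-aligned in hub RAM"
    struct.pack_into("<I", hub, addr, value & 0xFFFFFFFF)


# Hypothetical hub layout chosen for illustration
A_BASE, B_BASE, TABLE = 0x1000, 0x2000, 0x0F00
wrlong(TABLE + 0, A_BASE)       # table[0] := @A[0]   (done in Spin)
wrlong(TABLE + 4, B_BASE)       # table[1] := @B[0]
par = TABLE                     # cognew(@entry, @table) puts this in PAR

# --- what the cog does ---
a_ptr = rdlong(par)             # rdlong a_ptr, par
b_ptr = rdlong(par + 4)         # mov t, par / add t, #4 / rdlong b_ptr, t
wrlong(a_ptr + 4 * 100, 1234)                          # A[100] := 1234
wrlong(b_ptr + 4 * 999, rdlong(a_ptr + 4 * 100) + 1)   # B[999] := A[100] + 1
print(rdlong(A_BASE + 400), rdlong(B_BASE + 4 * 999))  # 1234 1235
```

The same table approach scales to any number of arrays, and every cog started with the table's address sees the same shared data, which also answers the opening question about a common array.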
2) Yes, two or more Cogs can read or write at the same time ( ignoring that such access won't happen simultaneously due to the way each Cog gets access to Hub memory in sequence ).
The hub mechanism prevents genuinely simultaneous access to the same hub memory location, but that won't stop you getting into a mess if it is used inappropriately. Without taking care, if two cogs write a value to the same location you'd not know which value was written. If one cog is writing, any others can read without the problem of getting 'half-written' values, provided the value is written in its entirety in one go with wrlong etc.
PAR is read-only, you are right you cannot update it with 'mov', but you can alter what PAR will be set to as the second parameter of CogNew().
There are multiply and divide routines already written. A forum search is the best thing there.
I don't believe there is ready-made code for anything other than 16-bit x 16-bit multiplication. The assembly code for the Spin interpreter is available and that includes 32-bit x 32-bit multiplication. The 16-bit x 16-bit multiplication routine can easily be extended to a 32-bit x 16-bit routine by using two 32-bit locations for the product.
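Since there is no multiply instruction, such routines use shift-and-add: for each set bit of the multiplier, add the correspondingly shifted multiplicand to the product. The Python sketch below models that loop with explicit 32-bit registers, including the extension to a 32 × 16 multiply whose product is kept in two longs (high and low), with the carry handled the way an ADD with WC followed by ADDX would. It models the algorithm; it is not the routine from the forum or the Spin interpreter.

```python
MASK32 = 0xFFFFFFFF


def mul16x16(a: int, b: int) -> int:
    """Shift-and-add: 16-bit x 16-bit -> 32-bit product, one multiplier bit per step."""
    a &= 0xFFFF
    b &= 0xFFFF
    product = 0
    for _ in range(16):
        if b & 1:                                  # shr b, #1 wc -> C = low bit
            product = (product + a) & MASK32       # if_c add product, a
        a = (a << 1) & MASK32                      # shl a, #1
        b >>= 1
    return product


def mul32x16(a: int, b: int) -> tuple:
    """32-bit x 16-bit -> 48-bit product kept in two longs, returned as (hi, lo)."""
    a_lo, a_hi = a & MASK32, 0          # multiplicand spread over two longs
    b &= 0xFFFF
    lo = hi = 0
    for _ in range(16):
        if b & 1:
            total = lo + a_lo
            carry = total >> 32                    # 'add lo, a_lo wc' sets C on overflow
            lo = total & MASK32
            hi = (hi + a_hi + carry) & MASK32      # 'addx hi, a_hi'
        # shift the two-long multiplicand left: shl a_lo, #1 wc / rcl a_hi, #1
        a_hi = ((a_hi << 1) | (a_lo >> 31)) & MASK32
        a_lo = (a_lo << 1) & MASK32
        b >>= 1
    return hi, lo


print(mul16x16(1234, 5678))                 # 7006652
hi, lo = mul32x16(0xFFFFFFFF, 0xFFFF)
print(hex(hi), hex(lo))                     # 0xfffe 0xffff0001
```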
Assembly is always faster by quite a lot. Remember that the Spin interpreter is written in assembly language and it has additional overhead beyond the code required to do the actual operations. Multiplication and division have to be done with subroutines in either event. The Propeller doesn't have multiply or divide instructions to do it in hardware.
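Division works the same way in reverse, by shift-and-subtract (restoring division). Shift the remainder left one bit at a time, bring in the next dividend bit, and subtract the divisor whenever it fits, setting a quotient bit. Below is a Python sketch for a 32-bit dividend and a non-zero 16-bit divisor. On the chip the compare-and-subtract step maps naturally onto CMPSUB. Treat it as an illustration of the algorithm rather than a drop-in routine.

```python
MASK32 = 0xFFFFFFFF


def div32by16(dividend: int, divisor: int) -> tuple:
    """Restoring shift-and-subtract division: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    dividend &= MASK32
    divisor &= 0xFFFF
    quotient = remainder = 0
    for bit in range(31, -1, -1):                  # most significant bit first
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:                   # cmpsub remainder, divisor wc
            remainder -= divisor
            quotient |= 1                          # ...and shift C into the quotient
    return quotient, remainder


print(div32by16(1_000_000, 7))    # (142857, 1)
```

Each of the 32 steps costs a handful of instructions, which is why a division is far slower than an ADD in assembly, yet still much faster than the same operation in Spin.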
Comments
Thanks
Regards,
Eric
-Thanks
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer
Parallax, Inc.
'in asm
_cls(color)
  loop n times (calling asm spi engine shiftout method)
    shiftout(...)
I'm sure it's a simple matter (just not simple to me) :)
thanks
j
Post Edited (Kaio) : 4/24/2007 4:44:58 PM GMT
Post Edited (deSilva) : 12/28/2007 6:12:59 PM GMT
Spin to bytecodes I understand. Spin to handcoded assembler I understand.
Nice example of recursion, even in assembly code. And the time it takes is also very interesting in comparison with a routine running on a GHz PC.
Thomas
As a beginner in assembly, I found your commented code examples very useful,
and I hope others do too.
Thanks
Cats92
Post Edited (deSilva) : 4/13/2008 10:04:05 AM GMT
Bart
Yes, I tried something similar to what you mention about using 2 arrays: I made A the even elements (0, 2, 4, ...) and B the odd ones (1, 3, 5, ...).
But is it then possible to write to PAR? I thought it wasn't possible to write to PAR, like
-mov PAR,#A
By the way, can I find some multiply and divide examples, so I don't have to rewrite them if they exist? They would probably be faster than a function I wrote myself.
If there aren't any, I will have to write them.
Post Edited (darkxceed) : 9/27/2008 3:58:15 PM GMT
By the way, which is faster, Spin (interpreter) or assembly, for:
A=B*C
A=A+B
A=B/C