mixing spin and assembly?
mike56
Posts: 22
I'm writing some assembly and it seems that inline assembly can't be mixed with Spin? Nor can functions be written in assembly?
For example, I can declare "Pub Myfunc" then call it from the "pub main". This is normal subroutine behavior like in C.
But if I write an assembly code routine in DAT called "MyAsm" then I need to run a cognew(@MyAsm,0)
I can't just call "MyAsm" like a regular subroutine?
Is there any way around this?
Also, if I write some fast code in assembly and run a "cognew(@MyAsm,0)", Spin just carries on, because cognew runs in parallel. I want Spin to continue only after the assembly has finished.
Is there any way to call an assembly subroutine without starting a new cog?
If not, is there a way to pause operation until a cog is finished?
For example, it would be the same as:
Pub Main
  code1
  code2
  cognew(@MyAsm,0)
  don't process any more lines until MyAsm is finished
Dat
  org 0
MyAsm code3
      code4
Any hints? Also, are there more examples for the assembly besides the manual?
There are many quirks that I have found only through trial and error.
The first is that the dira register needs to be set for each cog. In the manual it says the registers are all ORed together, but if I set dira[5]=1 in one cog and then set outa[5]=1 from another cog, it doesn't work unless I specifically set the dira for that cog, even though it has been set in the first cog -> weird!
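A minimal Spin sketch of the behaviour described above, assuming pin 5 from the post; the method name Blink and the stack size are illustrative only. Each cog has its own DIRA and OUTA, and a cog's OUTA bit only reaches the pin when that same cog's DIRA bit is set, so the second cog sets its own direction bit before driving the pin:

```spin
CON
  PIN = 5                         ' pin number taken from the post (assumption)

VAR
  long  stack[16]                 ' stack space for the second Spin cog (illustrative size)

PUB Main
  cognew(Blink, @stack)           ' run Blink in another cog

PRI Blink
  dira[PIN] := 1                  ' this cog's own DIRA; required even if another cog set its DIRA
  repeat
    !outa[PIN]                    ' toggle this cog's OUTA bit
    waitcnt(clkfreq / 2 + cnt)    ' wait half a second
```

Without the dira line inside Blink, the writes to outa in that cog would have no effect on the pin.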
Comments
Yes you can, but in a roundabout way.
What you need to do is load a cog with the assembly code and let it run there.
Now there is the problem, of course, with the cog running parallel to the cog that is
running the Spin code.
What you need is then to have the spin code WAIT for the PASM code to do its work
and return the results.
You can do this by having a sentinel variable that the Spin code will keep polling in a loop
until it is flagged and thus the Spin code knows that the PASM code has finished its job.
The PASM code has to, of course, flag the sentinel variable to indicate it has finished its work.
So you can have as many PASM routines as you need. Then every time you want to call one
you need to load it into a cog, wait for it to finish and then go on.
So this is as if you have called a subroutine but with a little extra work.
It is a minor shortcoming of the Propeller/SPIN that PASM code cannot be inline and has to be
run in a COG.
This is not really much of a shortcoming since there is the workaround as above, but also
most of the time you need PASM to do fast work and most of this work USUALLY needs to
be run in Parallel (e.g. UART, PWM, etc. etc.) to the normal program.
Sam
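The flag-polling approach described above can be sketched roughly as follows. This is only an outline under stated assumptions: the names done, RunAsm and t1 are made up for illustration, and the real work would replace the placeholder comment in MyAsm.

```spin
VAR
  long  done                      ' sentinel in hub RAM: 0 = busy, non-zero = finished

PUB Main
  ' code1, code2 would go here
  RunAsm
  ' execution reaches this point only after MyAsm has finished

PUB RunAsm : ok
  done := 0                       ' clear the flag before launching
  if cognew(@MyAsm, @done) => 0   ' cognew returns -1 when no cog is free
    repeat until done             ' poll until the PASM code sets the flag
    ok := true

DAT
              org     0
MyAsm         mov     t1, #1      ' (the real work, code3/code4, would go before this)
              wrlong  t1, par     ' PAR holds the address of done; write non-zero to it
              cogid   t1
              cogstop t1          ' stop this cog so it can be reused next time

t1            res     1
```

Each call pays the full cog start-up time, so this only makes sense when the assembly routine saves more time than the launch costs.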
LMM is not native PASM, but it is close. You have to learn a few extra things about how to do jmp, etc... and it's about 1/8th as fast.
You can find examples for ImageCraft C in the obex, and Catalina on one of the "release" threads.
[Edit] Lest I create the impression that I came up with this structure... I didn't. After studying several interesting examples from some of the better known contributors on this forum I settled on this as my starting point.
Post Edited (JonnyMac) : 8/10/2009 3:21:40 AM GMT
I am very surprised you can't just run an Asm subroutine. Seems like the compiler should be written with this in mind. Oh well, can't have everything.
The suggestion of a Spin routine that calls cognew to run the PASM code and then waits until a variable is flagged makes sense.
Is there an example of this that I can just cut and paste my assembly code into the DAT section?
I think JonnyMac's example is close but my asm and prop skills aren't that good yet so more examples of using the structure would be great!
The actual cog instructions are 32-bit RISC instructions that are completely different from the Spin byte codes. They have to be executed in cog memory. They can't be executed from the shared hub memory which is actually treated as a special I/O device by the cog, accessed with special instructions for reading and writing.
There are several assembly tutorials with links in one of the "sticky threads" at the top of the thread list in this forum. Look at "Propeller: Getting Started and Key Thread Index". Have a look at them.
I have been away from the propeller for over 3 years now, but with the EOL announcement of the SX, I'm having another look at whether the Prop can do what I need it to do. So I've been starting to experiment again, and reading some of the Prop threads, in particular assembler related ones. This particular thread indicates a very long time to launch an assembly routine, and my results vary greatly...... but perhaps I'm doing something wrong (I still consider myself a "Prop Newbie") and I will defer to the experts here.
My results in launching 7 consecutive identical assembly cogs with COGNEW show exactly 354 clocks between each launch..... not the 8300 posted above!
Cheers,
Peter (pjv)
Post Edit: OOPS, hit the wrong key... the number of clocks I see is 754, not 354. Sorry about that!
Post Edited (pjv) : 8/10/2009 8:08:17 PM GMT
or similar.
Jonathan
I'm not sure we understand each other.
I follow that the COGNEW instruction may be non-blocking, but clearly each whole cog needs to load, regardless of its program size, before it is released to run. On launching these identical cogs, I see a consistent delay of 754 clocks from cog to cog to cog as each starts running. Would this then not be the measurement of "how long it takes to launch a cog"?
Cheers,
Peter (pjv)
The COGINIT cog instruction (which is executed by the Spin interpreter when you do a COGNEW or COGINIT) only initiates the operation before continuing. There's the usual variable hub synchronization plus another few clock cycles. Once initiated, the new cog independently loads up its memory from the hub, clears the I/O registers, and begins executing from location zero. All of this takes about 100us as others have mentioned. The 754 clocks you're seeing is probably the interpreter overhead to finish the COGNEW Spin operation and continue execution of the cog initiating the COGNEW. The newly started cog would still be in the process of loading up its memory and would not actually be starting execution of the new code for a while yet.
The best way to demonstrate this would be to use COGNEW to start an assembly routine where the first thing done would be to store the system clock in some known hub memory location and set the following location to zero. The Spin program that does the COGNEW would initialize the two longs involved to -1, store the system clock in another variable, do the COGNEW, then wait for the assembly program to set the 2nd long to zero. The difference between the two system clock values would give you a good idea of the total startup time for an assembly routine.
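The measurement described above might look something like this. It is a sketch, not the code that was originally posted in the thread; the names stamp, t0, Measure, Probe, zero, t1 and ptr are all invented for illustration.

```spin
VAR
  long  stamp[2]                  ' stamp[0] = CNT seen by the new cog, stamp[1] = ready flag
  long  t0

PUB Measure : clocks
  stamp[0] := -1
  stamp[1] := -1
  t0 := cnt                       ' system clock just before the launch
  if cognew(@Probe, @stamp) < 0
    return -1                     ' no free cog, nothing to measure
  repeat while stamp[1]           ' wait for the new cog to clear the flag
  clocks := stamp[0] - t0         ' start-up latency in system clocks

DAT
              org     0
Probe         mov     t1, cnt     ' first instruction: capture the system clock
              mov     ptr, par
              wrlong  t1, ptr     ' stamp[0] := captured CNT
              add     ptr, #4
              wrlong  zero, ptr   ' stamp[1] := 0 tells Spin the value is valid
              cogid   t1
              cogstop t1          ' free the cog again

zero          long    0
t1            res     1
ptr           res     1
```

The result includes the Spin interpreter's time to issue the COGNEW as well as the cog load itself, so it should be read as an upper bound on the load time rather than an exact figure.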
Just to be clear, I'm saying there's a difference between how long it takes for the cognew command to return in Spin, and how long it takes before the new cog is active.
yields the following numbers:
Note that I added the "dec" function to Simple_Serial.spin, so this won't compile "out of the box".
Jonathan
<edit> Mike's idea of storing the system counter value directly was a better idea! [8^) Edited the code snippet to match.
Post Edited (lonesock) : 8/10/2009 9:03:08 PM GMT
I'm sorry, but I'm not yet up to the level of fully understanding the implications of SPIN code, and I can only grasp assembler, so your code examples, although appreciated, are not of great help to me. I have to revert to as close to the hardware as I can get, and for me that means assembler.
So my test code is as below, and the timings I observe are from the first rising edge of port pin1 to port pin2 to pin3 and so on.
If, to illustrate my point, we assumed for a moment that it took only one clock cycle for the launching cog0 to trigger the load of a cog (we know this is not correct), and then measured the time in clock cycles from the rise of the cog1 pin to the rise of the cog2 pin, then the time for the second cog to be loaded and released to run would equal the time measured from cog1 to cog2 minus one clock. Now the cognew instruction takes more than 1 cycle to trigger the load, but isn't the total time to trigger the load and the loading itself equal to my observed time?
If so, is that not considered the "launch time" for a cog?
What am I not seeing here?
Thanks for your interest in helping me understand this!
Cheers,
Peter (pjv)
Jonathan
You are seeing the time for the Spin interpreter to process a COGNEW call because the time difference between the execution of cog One and the execution of cog Two is offset by the difference in the execution times of their corresponding COGNEWs. Try what I suggested:
So, there's a 754 clock delay between cognews, and an 8300 clock delay from the cognew to the new cog starting to execute.
Let me wrap my head around this for a few cycles!
Cheers,
Peter (pjv)
I followed Jonathan's suggestion and measured again... the outa instruction takes 370 clocks, OUCH! And one cog launch procedure measures 8950 clocks.
Wow, that is UGLY !
So, can assembly cogs reasonably launch other assembly cogs? (I suppose I should start another thread..... have done enough damage here)
Cheers,
Peter (pjv)
but that will not remove the ~8000 clks from starting a cog - that is loading the cog ram - 512 * 16 clks = 8192 clks
Post Edited (Timmoore) : 8/11/2009 12:02:48 AM GMT
That's a long thread, and I did not find the example. I know I messed with this over 3 years ago, but have forgotten much about the propeller, and focussed on squeezing performance out of the SX.
Has anyone found a faster way to load assembly programs from EEPROM without going through the HUB, or by using the counters or video facilities?
Cheers,
Peter (pjv)
Any of the I/O functions (including FullDuplexSerial or the SPI engine or the I2C / SPI driver) can be modified to use space in the cog as their buffer. FullDuplexSerial and the I2C / SPI driver use most of the memory in the cog, so they'd have to be significantly simplified to have much buffer memory available. It would be straightforward to make a cog loader that would load from EEPROM or from SPI flash or SRAM since all of these routines actually input the data first to a location in cog memory, then transfer it to hub memory. In most cases, you'd want to pack the data, 4 bytes per long word, but that's easy.
Yes, the PASM COGINIT instruction may be used by an assembly cog to launch another assembly cog. However:
Therefore, it's typically easier for the SPIN code to start the assembly cogs. (Unless your app is doing things like reloading all of HUB RAM from the EEPROM and decompressing on the fly.) The typical design loads all of the cogs on startup (with PASM or another SPIN interpreter), then shuffles data between the threads via HUB RAM.
Oh. Before I forget - that 8000+ clock cycle latency also applies to SPIN threads started via COGNEW.
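For reference, a rough sketch of one assembly cog launching another with the PASM COGINIT instruction. The names flag, launch, Parent, Child, t1 and t2 are hypothetical. The operand packs the PAR value (long address) into bits 31..18, the code's long address into bits 17..4, and sets bit 3 to ask for the next free cog; here it is assembled in Spin before the parent is loaded, since the hub addresses are easy to obtain there.

```spin
VAR
  long  flag                      ' set by Child once it is running

PUB Main
  ' Build the COGINIT operand: PAR long address, code long address, "new cog" bit
  launch := ((@flag >> 2) << 18) | ((@Child >> 2) << 4) | %1000
  cognew(@Parent, 0)
  repeat until flag               ' hangs if no cog was free for Child

DAT
              org     0
Parent        coginit launch      ' start Child in the next free cog
              cogid   t1
              cogstop t1          ' Parent is done, release its cog
launch        long    0           ' filled in by Spin before Parent is loaded
t1            res     1

              org     0
Child         mov     t2, #1
              wrlong  t2, par     ' PAR points at flag
              cogid   t2
              cogstop t2
t2            res     1
```

The child cog still takes the same start-up time to load, so launching from PASM saves the Spin interpreter overhead but not the load itself.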
http://forums.parallax.com/showthread.php?p=694135 SpinForth thread...
ps, @Graham, SpinForth ought to be added to the good thread list, yes?
Post Edited (Fred Hawkins) : 8/13/2009 2:58:13 PM GMT