P2 Execute PASM COG-CODE in hRAM
pic18f2550
in Propeller 2
Hello,
I noticed that in a SPIN project COG code can be executed directly.
I would like to know what are the requirements to use this under PASM.
How does it behave with the addresses since the COG supports only 9 bits?
Comments
You can start a cog using COGINIT to run a PASM program.
For a cog that is running the Spin2 interpreter, registers $000..$123 are available for PASM code as well, via in-line assembly and the REGEXEC, REGLOAD, and CALL commands.
I just found the piece of CODE I was referring to in #001.
What are the restrictions on the PASM code between "org" and "end"?
Which constructs should be avoided because they change the timing of the code? For example, cases where the IDE has to split one instruction in the source code into several.
@pic18f2550 I think we're calling this "inline assembly". There probably are some restrictions and it's good you asked because it doesn't appear to be documented in the Spin2 docs yet. I guess that's where it should go...
In the FlexProp C version, you have to use local variables only in the inline assembly code. You can't use things like global variables or global constants.
The QMUL generates three instructions due to the ## and ##: AUGD, AUGS, QMUL. The other lines each generate one instruction.
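The expansion described above can be modeled on a PC with a few lines of Python. This is only a sketch of the rule as stated in this post (one extra AUGD for a ## destination, one extra AUGS for a ## source); the function name and the operand strings are made up for illustration and are not part of any Parallax tool.

```python
def expand(mnemonic: str, dest: str, src: str) -> list[str]:
    """Return the instructions emitted for one source line.

    Models the rule from the post: a '##' operand costs one extra
    prefix instruction (AUGD for the destination, AUGS for the source).
    The order AUGD, AUGS, <instruction> follows the post.
    """
    emitted = []
    if dest.startswith("##"):
        emitted.append("AUGD")   # prefix for a 32-bit D literal
    if src.startswith("##"):
        emitted.append("AUGS")   # prefix for a 32-bit S literal
    emitted.append(mnemonic.upper())
    return emitted


# Hypothetical operands, chosen only to show the two cases.
print(expand("qmul", "##100_000", "##70_000"))
print(expand("add", "x", "#1"))
```

Counting the entries in the returned list shows how many instruction slots a source line occupies, which is what matters for timing and for the size limit of the inline block.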
The code between ORG and END must fit in $000..$123.
I seem to recall you don’t need a RET line at the end of the code because a “ret” is automatically added for you.
This inline assembly feature is really nice.
There are underscores on either side of the “ret” above but they don’t show up for some reason
I am currently working with the Propeller Tool.
Between org and end, $12F instructions fit before the IDE grumbles.
Try using ".exit" instead of "exit" in two places. It may be that only local labels are allowed... Could be wrong though...
I've also not seen things like "a1 long 0" in inline assembly. Might be better to have them as local Spin2 variables.
I did a little reformatting and used DEBUG statements -- other than the compiler not seeming to like the use of 'exit,' everything behaves as expected. Note, too, that I traded your RAM array for a DAT array so I could pre-load it with known values.
I also called go() from another method since inline methods are designed to return to a caller. In your case, I'm not sure what would happen given it's the only code in the program. If you want to run pure assembly, you can do that -- just don't put the code into a Spin2 method.
Here's the DEBUG output after running.
Hello JonnyMac,
I noticed two things:
1. The IDE (Propeller Tool) does not check values against their permissible range where it actually could.
This concerns the value "b".
"rdlong b3, b" should only accept a 9-bit value, because no "long b 0" was defined in the COG code.
"rdlong b3, b": no value check
"rdlong b3, MP": value check OK
2. The code is not executed directly in the hRAM; it is loaded into the COG RAM with a "rdfast" and executed only there.
I thought that it gets a segment address like the i8086 and uses this as COG-Ram except for the special registers.
That would be maybe an option for the P3?
b is defined as a local long in Spin and can be used in the assembler, so it is defined and is allowed to hold values larger than 9 bits. Locals have no guaranteed initial value, though; only return values are initialized to 0.
And no, there are no segment registers. For code, addresses $000-$1FF execute from COG RAM, $200-$3FF execute from LUT RAM, and addresses >= $400 execute from HUB RAM.
You need to jmp over borders (no problem for real Germans): your code cannot simply run on from COG RAM into LUT RAM, or from LUT RAM into HUB RAM. You need to jmp/call.
For data access, rdlong/wrlong and the word/byte variants treat $000-$400 as HUB RAM (no code execution). There was some discussion a long time ago that code execution in HUB RAM below $400 would work with odd (not even) addresses; I'm not sure where that went or whether it is still valid.
Enjoy!
Mike
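As a quick way to check the execution map from the post above, here is a small Python sketch. It only restates the three address ranges given there; the function names are hypothetical, and the "needs a branch" rule is taken from the post, not from the silicon documentation.

```python
def exec_region(addr: int) -> str:
    """Classify a code address by where the cog would execute it from."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    if addr <= 0x1FF:
        return "cog"   # $000-$1FF: COG RAM execution
    if addr <= 0x3FF:
        return "lut"   # $200-$3FF: LUT RAM execution
    return "hub"       # >= $400: HUB RAM execution (hubexec)


def needs_branch(from_addr: int, to_addr: int) -> bool:
    """Per the post: crossing a region border requires a jmp/call."""
    return exec_region(from_addr) != exec_region(to_addr)


print(exec_region(0x123), exec_region(0x200), exec_region(0x400))
print(needs_branch(0x1FF, 0x200))
```

Note that this covers code addresses only. For data instructions such as rdlong, the same low numbers refer to HUB RAM, as described in the post.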
Yes, it was -- by the compiler. When inline code is passed to the cog, all of the parameters, return value(s), and local variable(s) are passed, too. When the routine is finished, all of those [potentially modified] values are moved back so they can be accessed by high-level Spin code that might follow.
This example shows how the inline PASM can modify variables defined in the high-level code of the method.
The 9-bit limitation is for literal values, and even that can be modified by using ##.
Have a look. Note that when using constants in PASM they must be prefaced by # or ## (>511).
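The # versus ## rule can be written down as a tiny Python helper for anyone generating PASM source from a script. This is a sketch of the rule as stated here (values up to 511 fit the 9-bit literal field, larger ones need ##); the helper names are made up for illustration.

```python
def literal_prefix(value: int) -> str:
    """Pick '#' or '##' for an unsigned constant used as a PASM literal."""
    if not 0 <= value <= 0xFFFF_FFFF:
        raise ValueError("constant must fit in 32 bits")
    # 9-bit literal field: 0..511 fits directly, anything larger needs '##'
    return "#" if value <= 511 else "##"


def literal_operand(value: int) -> str:
    """Format the constant as a PASM operand string, e.g. '#42' or '##1000'."""
    return f"{literal_prefix(value)}{value}"


print(literal_operand(42))
print(literal_operand(1000))
```

Keep in mind that each ## operand adds an AUGS or AUGD instruction, so it also changes the code size and timing.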
Don't hold your breath -- the P2 was 12 years in development because Chip accommodated nearly every request thrown at him (this is not a sustainable development process). The problem with those of us with experience is that we bring our biases. Give the P2 a try for what it is, not what you wish it was. With your experience I'm sure you'll be able to do really neat things that will benefit your clients and the Propeller community.
CogRAM cannot be compared to segmentation, or any other mapping tricks. There's no way for hubRAM to be accessed with low latency like cogRAM is. Even if caching was thrown at it you still don't get guarantees of deterministic read latencies.
EDIT: I guess the term "inline assembly" slightly misrepresents what is actually happening, since it is only inline in the source. The byte-coded Pnut/Proptool output is more disjointed than that.
On the other hand, I think the default for Flexspin, since it compiles to native machine code, does produce truly inlined code as compiled hubexec. It can be given directives to use lutRAM for inline assembly routines if desired.
Okay, so it's not that simple.
So I'll stay with my "rdfast" method, using a loading routine to switch to other drivers.
My goal was to save the clocks needed for loading the new code by accessing the hRAM directly.
JonnyMac, don't panic; I don't want to force more work on Chip.
I can imagine how such a wave of wishes and ideas rolls over you.
If you look at the P2 like this, you can only take your hat off and say thank you.
I don't think Flexspin uses LUT RAM any more... I was just reading the attached document, and it says FCACHE now goes to COG RAM:
COG RAM from $00 to $ff is used for FCACHE
@pic18f2550 You might want to read the "Restrictions in inline assembly" in the attached. I'm guessing that most of this applies to Spin2 as well...
Oops. This was the old version.
I have attached the loader.
Ah, I don't think RDFAST is the instruction you're looking for ....
There are two general fast solutions for that. One is a SETQ+RDLONG combo for a data block copy from hubRAM to cogRAM, which can then be branched into from hubexec. The other is COGINIT, for relaunching the same cog with its self-copy feature.
Cool, thanks. I've used Spin so little I've got behind on latest changes.
I see this in that "general.pdf":
So lutRAM still gets used as before. What I think is new is functions can individually be assigned to cogRAM and lutRAM and, relatedly, a lot more of cogRAM is available now ... so that "Most of COG RAM ..." assertion is also out of date.
And flexspin's inline assembly blurb:
Correction: if only a small routine is being copied with SETQ+RDLONG, then the branch can be from cogexec too.
@evanh Could you have an old version?
Here's what I have:
This is ver. 5.4.3 from 08May21
Reading further, it looks like you can also force things into the first half of LUT RAM.
I'm just reading the PDF you provided. I've hardly written any Spin code.
Ok, I posted an old version there... From way back in January... Sorry about that. Just deleted above.
Here's the new version.
Thanks, and good. I had thought about the change in memory uses when you brought it up. That leaves lutRAM unused by default. Leaves it free for application purposes, including streamer ops.
EDIT: Err, or not. Second half is now vaguely "used for internal purposes":
Looks like the different languages all have ways to force things into LUT...
Looks like there's a new "IN-LINE PASM CODE" section in the Spin2 docs:
https://docs.google.com/document/d/16qVkmA6Co5fUNKJHF6pBfGfDupuRwDtf-wyieh_fbqw
This section would be interesting for me, but unfortunately it is not completely readable.
Can anything be done about it?
Yes, just download it as a PDF. It then displays fine. Maybe there are other ways around it as well.