Announcing CLMM (pronounced "Clem"): Execute Code from the CLUT
ctwardell
The P2 continues to bear fruit!
Attached is a sample file containing an execution engine that executes code from the CLUT, increasing the amount of code that can run within a COG.
In order to not take advantage of the preview that Parallax has given us, the code is released under the MIT license.
What I call CLMM Engine#1 is in the attached spin file.
The CLMM #1 has three NOPs between the POPAR instruction that reads the instruction to execute and the slot where the instruction will be executed.
This timing supports using SETSPA, ADDSPA, and SUBSPA for absolute and relative branching.
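For anyone who wants to see the idea without the hardware, here is a minimal sketch in Python of a fetch-and-execute loop that branches by changing the fetch pointer. It is a toy model, not the engine in the attached file: the tuple "instructions", the `run_clmm` name, and the rule that the pointer advances by one per fetch are all assumptions made for illustration, and no pipeline timing (the three NOPs) is modeled.

```python
# Toy model only: instructions are Python tuples, not P2 opcodes.
# Assumption: the fetch pointer advances by one slot per fetch.

def run_clmm(clut, max_steps=100):
    """Run the toy program stored in `clut` and return the emitted values."""
    spa = 0                      # stands in for the SPA stack pointer
    out = []
    for _ in range(max_steps):
        op, arg = clut[spa]      # stands in for the POPAR fetch
        spa += 1                 # pointer already points at the next slot
        if op == "emit":
            out.append(arg)
        elif op == "setspa":     # absolute branch: load the pointer
            spa = arg
        elif op == "addspa":     # relative branch forward
            spa += arg
        elif op == "subspa":     # relative branch backward
            spa -= arg
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown toy opcode: {op}")
    return out


program = [
    ("emit", 1),      # slot 0
    ("addspa", 1),    # slot 1: skip slot 2
    ("emit", 99),     # slot 2: never executed
    ("emit", 2),      # slot 3
    ("setspa", 6),    # slot 4: jump to slot 6
    ("emit", 98),     # slot 5: never executed
    ("emit", 3),      # slot 6
    ("halt", 0),      # slot 7
]

print(run_clmm(program))  # [1, 2, 3]
```

On real hardware the relative offsets would also have to account for how far the pointer has moved by the time the branch instruction executes, which this sketch ignores.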
The CLMM #2 will use a standard loop instead of REP so that it can run while multitasking.
The CLMM #3 will use a tighter loop that precludes using SETSPA, ADDSPA, and SUBSPA but executes inline code faster.
Let's have fun with this and get back to exploring the P2.
Chris Wardell
CLMM_Engine1.spin
Please also see the following thread:
http://forums.parallax.com/showthread.php?144675-Challenge-Execute-Prop-II-code-from-it-s-CLUT-space-(CLMM)
Comments
I'll get better docs out, just wanted this out under MIT ASAP.
I've been seeing a lot more possibilities as I work on examples. For instance, a loop that alternates between SPA and SPB would support two "threads" while consuming a single COG thread. So you could get 5 threads going: the 4 new COG multitasking threads, with CLMM running on one of them and providing two threads, for a total of 5.
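A rough sketch of the alternating idea, in Python rather than PASM. Everything here is hypothetical (the `run_interleaved` name, the string "instructions", and the strict one-instruction-per-turn schedule); it only shows two pointers sharing one loop, plus the thread count arithmetic.

```python
# Toy model only: two instruction streams share one loop, one fetch per turn.

def run_interleaved(stream_a, stream_b):
    """Alternate between two streams and return the order of execution."""
    streams = {"SPA": stream_a, "SPB": stream_b}
    pointers = {"SPA": 0, "SPB": 0}   # one fetch pointer per soft thread
    trace = []
    active = "SPA"
    while any(pointers[name] < len(streams[name]) for name in streams):
        if pointers[active] < len(streams[active]):
            trace.append((active, streams[active][pointers[active]]))
            pointers[active] += 1
        # Swap pointers every turn, whether or not this stream had work left.
        active = "SPB" if active == "SPA" else "SPA"
    return trace


trace = run_interleaved(["a0", "a1", "a2"], ["b0", "b1"])
print(trace)

# Thread count from the post: 4 COG tasks, one of which hosts 2 soft threads.
cog_tasks = 4
soft_threads_on_clmm_task = 2
total_threads = (cog_tasks - 1) + soft_threads_on_clmm_task
print(total_threads)  # 5
```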
You could also provide for "soft interrupts" by testing a condition in the CLMM loop while running on a thread using SPA; the interrupt handler would switch to using SPB, and the end of the handler would switch back to SPA.
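The soft interrupt idea could look something like the following Python sketch. It is only an illustration under assumptions: `run_with_soft_interrupt`, the `"resume"` marker that ends the handler, and the `pending_at` set of loop iterations at which the tested condition is true are all made up for the example.

```python
# Toy model only: the loop polls a condition each pass; when it is true the
# loop fetches from the handler (SPB) until the handler says "resume".

def run_with_soft_interrupt(main, handler, pending_at):
    """Return the order in which main and handler instructions execute.

    `handler` must end with "resume"; `pending_at` holds the loop iteration
    numbers at which the polled condition is true.
    """
    spa = 0            # main-code fetch pointer
    spb = 0            # handler fetch pointer
    use_spb = False    # which pointer the loop is currently fetching through
    trace = []
    step = 0
    while spa < len(main) or use_spb:
        if not use_spb and step in pending_at:
            use_spb = True     # condition seen: switch to the handler
            spb = 0
        if use_spb:
            op = handler[spb]
            spb += 1
            if op == "resume":
                use_spb = False    # end of handler: back to SPA
            else:
                trace.append(op)
        else:
            trace.append(main[spa])
            spa += 1
        step += 1
    return trace


result = run_with_soft_interrupt(
    main=["m0", "m1", "m2"],
    handler=["h0", "h1", "resume"],
    pending_at={1},
)
print(result)  # ['m0', 'h0', 'h1', 'm1', 'm2']
```

Because SPA is left untouched while the handler runs, the main code picks up exactly where it stopped.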
C.W.
12/17/2012 - I believe Engine #4 Rev 2 is the most stable and should be used for testing as of now. I'm seeing a few pipeline issues on the others as I've made some changes.
Engine #2
This engine supports multi-tasking by using jmp instead of rep and branching via direct SP manipulation.
Test 1, simple test of two engines using SPA and SPB
CLMM_Engine2 Test1.spin
Engine #3
This faster engine forgoes direct SP manipulation in exchange for improved speed and uses JMP instead of REP for potential multi-tasking support.
Test 1, simple test of two engines using SPA and SPB
CLMM_Engine3 Test1.spin
Engine #4
This faster engine forgoes direct SP manipulation in exchange for improved speed and uses REP instead of JMP for even more speed at the expense of multi-tasking support.
Test 1, simple test of two engines using SPA and SPB
CLMM_Engine4 Test1.spin
Rev 2 Test 1: added move immediate and an initial CLMM pipeline flush.
CLMM_Engine4 Revision 2 Test1.spin
Yes, that too. Lots of ways to slice and dice.
I'm not sure how to name all the variants of "engines".
C.W.
I'm going to add a Move Immediate instruction to 1 through 4 and the missing pipeline flush to 3 & 4.
C.W.
Now it is positive competition instead of a dumb war.
Thanks
Is it possible to fill the CLUT from HUB with instructions and execute them? I'm trying to make a better on-cog cache.
Is threading impossible with this model? Fetching and executing in separate threads could offer some advantage for LMM style programs.
Yes, you can load the CLUT from any source of instructions you wish. Once one of the CLMM engines is chosen, a user will need to know the COG addresses that engine uses for extension function calls and returns so the CLUT code can be encoded accordingly.
I think Clusso has some code he worked out with Chip to fill the CLUT for overlays; that may be what you are looking for. Clusso doesn't have a DE0-Nano yet, so he can't work with it directly.
I think we will be able to use threading with one of the versions that uses JMP to close the CLMM loop instead of REPS. I just need to work out some pipeline issues.
C.W.
The CLUT is certainly turning out to be a great asset. I remember Chip discussing it and all the features last year as well as adding push/pop access. In many ways it's a generic array data structure. Thank goodness we don't have to use self modifying code anymore for such basic data access within the COG. I can't wait to use it as a packet buffer.
PUSH & POP are really MOV instructions after all, just that they use SPx as one of the addresses.
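To illustrate that point, here is a small Python sketch where push and pop are just a move to or from whatever address the stack pointer holds. The `Clut` class, its size, the direction the pointer moves, and the wrap-around at the ends are all assumptions for the example, not a description of the actual P2 behavior.

```python
# Toy model only: PUSH/POP expressed as a plain move plus a pointer update.
# Assumptions: push post-increments, pop pre-decrements, addresses wrap.

class Clut:
    def __init__(self, size=8):
        self.mem = [0] * size
        self.sp = 0                      # stands in for SPA or SPB

    def mov_to(self, addr, value):
        """Plain indexed write, like using the CLUT as an array."""
        self.mem[addr % len(self.mem)] = value

    def mov_from(self, addr):
        """Plain indexed read."""
        return self.mem[addr % len(self.mem)]

    def push(self, value):
        self.mov_to(self.sp, value)      # the move, with SP as the address
        self.sp = (self.sp + 1) % len(self.mem)

    def pop(self):
        self.sp = (self.sp - 1) % len(self.mem)
        return self.mov_from(self.sp)    # the move, with SP as the address


clut = Clut()
clut.push(10)
clut.push(20)
print(clut.pop(), clut.pop())  # 20 10
```

The same memory stays reachable through `mov_to` and `mov_from`, which is what makes it usable as a general array or packet buffer.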
Next we will be asking Chip for more SPx registers, and more INDx and PTRx registers. Oh, and 2KB. I didn't see this as being the first item we'd be asking for.