TLMM: Threaded LMM - four, nine or even 19 drivers in one cog!

Over in the i2c prop to prop thread http://forums.parallax.com/showthread.php?p=920681 I asked for a combination mouse/keyboard/serial driver, implemented on top of Peter's great scheduler.
Peter thought it was a great idea, and mentioned he was thinking about loading scheduled code on the fly - and asked if that was similar to LMM. It would be, if the LMM code used FCACHE :)
Why is this relevant?
Because it reminded me of the idea for a deterministic LMM threading model. Mind you, to keep it deterministic, there would be strict limitations - but if those are acceptable, it would be possible to have four 1MIPS or nine 0.5MIPS threads running in a single cog! It would also be possible to have 19 0.25MIPS threads.
This would allow four to nine low speed drivers to co-exist in a single cog - and allow them to be easy to write!
Mind you, Peter's scheduler is MUCH nicer for a larger number of tasks - but TLMM would be more deterministic.
I don't want to take too much time away from debugging the new PCBs, but I will post some simple source code later to illustrate how it would work.
Limitations required for deterministic timing:
- each thread is limited in size (about 1K instructions is the practical maximum), and only relative jumps of -127..+128 instructions would be permitted
- no FCACHE, no calling native pasm "primitives"
- doing a CALL will take multiple instructions, so will a RET - this model is mainly meant for in-line code without subroutines
- no WAITxxx instructions
- must use very few RD* and WR* instructions, and only where the impact is small, as each one will offset all other threads by 200ns
Note how none of the limitations would really impact keyboard, mouse and serial drivers!
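The thread counts quoted above can be reproduced with simple hub arithmetic, sketched here in Python. The assumptions are not stated in the post: an 80MHz clock, one hub access window per cog every 16 clocks (the 200ns mentioned in the list), one LMM instruction per window, and one window per pass spent jumping back to the top of the loop. The function name is hypothetical.

```python
# Back-of-envelope model of TLMM thread rates.
# Assumed: 80 MHz clock, one hub window every 16 clocks,
# one window per pass lost to the jump back to the top of the loop.
CLOCK_HZ = 80_000_000
CLOCKS_PER_HUB_WINDOW = 16

def mips_per_thread(num_threads, loop_slots=1):
    """Each thread gets one LMM instruction per pass through the loop."""
    windows_per_second = CLOCK_HZ / CLOCKS_PER_HUB_WINDOW   # 5 million
    slots_per_pass = num_threads + loop_slots
    return windows_per_second / slots_per_pass / 1e6

for n in (4, 9, 19):
    print(f"{n:2d} threads: {mips_per_thread(n):.2f} MIPS each")
```

Under those assumptions four, nine and 19 threads come out at 1, 0.5 and 0.25 MIPS each, matching the figures in the post.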
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
Post Edited (Bill Henning) : 7/9/2010 12:08:37 AM GMT
Comments
Post Edited (Bill Henning) : 7/9/2010 12:08:30 PM GMT
Just a thought...
The keyboard (and, I presume, the mouse, although I have not tested it) is really a serial device. I have written the keyboard driver using a single pin, and therefore presume the mouse could be done similarly. This would save 2 pins (we are always short of pins).
The issues with a 1-pin keyboard are:
* you cannot reset the keyboard (not really required anyway)
* you cannot set the LEDs on the keyboard (I don't have them on a laptop anyway)
* you need to test the timing of the keyboard initially. It can then be fixed in the program if required.
* note that it still works with the existing 2-pin hardware interface
This would be a really neat driver to handle a 1-pin keyboard, a 1-pin mouse and serial in a single cog.
With a splitter cable, an existing keyboard socket could be used to interface to both the 1-pin keyboard and the 1-pin mouse.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Prop OS: SphinxOS, PropDos, PropCmd · Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz · MultiBlade Props: www.cluso.bluemagic.biz
Since posting, I have been doing some additional thinking on this matter. If a lower MIPS rating is acceptable, I can see how to make the threads jitterless while still allowing them to do a hub read and a hub write per execution cycle.
But first, a problem. The flags are not preserved across threads, so the conditional branch I showed above can't work.
One simple solution is to use two slots for every thread, and preserve/restore C and Z for each thread.
I hope to have a better solution RSN.
The basis of this deterministic threaded LMM is to stay synchronized to the hub, and have a number of "hub slots" assigned to every thread.
For argument's sake, let's assume there are 20 slots.
Each slot can contain one of:
- LMM fetch, increment thread_pc, execute instruction
- hub read for one of the threads
- hub write for one of the threads
- jump to top of TLMM loop (call it TLOOP slot)
Note that if at least one slot contains a hub read or hub write, there is no need to waste a slot on a TLOOP slot!
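One way to picture the slot table is as a fixed list that is walked once per pass, one entry per hub window. The sketch below assumes the 20 slots mentioned above and a 200ns hub window (80MHz clock); the slot names and the particular allocation are made up for illustration. Because the table contains hub read and write slots, it follows the note above and spends no slot on TLOOP.

```python
from collections import Counter

HUB_WINDOW_NS = 200          # assumed: 16 clocks at 80 MHz
NUM_SLOTS = 20               # "let's assume there are 20 slots"

# Hypothetical allocation: each entry is (kind, thread). "exec" is an LMM
# fetch/increment/execute, "rd"/"wr" are hub accesses on behalf of a thread.
slots = []
for thread in range(6):
    slots += [("exec", thread), ("exec", thread), ("rd", thread)]
slots += [("wr", 0), ("wr", 1)]
assert len(slots) == NUM_SLOTS

def thread_rates(slot_table):
    """Return {thread: MIPS}, counting one LMM instruction per 'exec' slot."""
    pass_time_ns = len(slot_table) * HUB_WINDOW_NS
    execs = Counter(t for kind, t in slot_table if kind == "exec")
    # instructions per microsecond is the same number as MIPS
    return {t: n * 1000 / pass_time_ns for t, n in execs.items()}

rates = thread_rates(slots)
for thread, mips in sorted(rates.items()):
    print(f"thread {thread}: {mips:.2f} MIPS")
```

With this allocation each of the six threads gets two instructions and one hub read per 4us pass, i.e. 0.5 MIPS, and every slot always falls on the same hub window.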
Post Edited (Bill Henning) : 7/9/2010 12:22:42 AM GMT
I also feel a challenge coming on ...
Who will be the first to get a single cog to support a keyboard, mouse, sd card and rtc? All these drivers are individually quite small, and it should be possible to fit them all into a single cog.
I would include a display driver as well, but I suspect this would be pushing things a bit too far even for the Propeller. Or would it?
I don't think we really need a prize, but I'll happily give anyone who achieves this feat a copy of the new Catalina Code Optimizer (10% smaller code with up to 15% speed improvement!).
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
Actually, I think Peter's scheduler is significantly more sophisticated, and better for most applications. His scheduler actually schedules... TLMM time-slices.
Where TLMM may be a bit better is on finer-grained determinism. I am still thinking through all the implications...
I can easily see a single TLMM cog handling:
- keyboard
- mouse
- two 38.4kbps serial ports
- maybe even a 100khz I2C read/write engine instead of the serial ports
As for your challenge... I am afraid I have too much on my plate to compete :)
I think I am better off coming up with enabling technologies for stretching the propeller's limits (LMM, VMCOG, TLMM, more in the future) and letting others implement apps with them
As an aside, if I saved/restored flags, TLMM could run four non-deterministic (think Catalina <grin>) threads at about 0.5MIPS each! Or two at 1MIPS.
Now that I think about it... what about interleaving the code in the cog itself... is there any benefit to doing that? It would basically make it unreadable, but a tool could easily be built to do it for us after we code them independently.
Is there some mechanism or memory model where this would be optimal?
BTW... I really love this model!!! I'm going to have to rethink what we're doing with our little toy, now! I think this can really save us a bunch of cost associated with our new upcoming hardware and software requirements. Instead of adding a second prop, I think we can get it to fit quite easily in the one we have with this technique. For example, we're spinning up 3 cogs for different serial stuff right now... which, with this, I'm sure can be done in one now!
Bill
Sorry - I thought your solution used Peter's scheduler. But in any case, the challenge itself is still valid - it could be done either using your time slicer, Peter's scheduler - or simply some very efficient hand-crafted code.
My main interest is simply in reducing the number of cogs currently required for Catalina drivers - this would give Catalina programs more cogs on which to run application code.
While multithreading Catalina is something I've long wanted to do, it hasn't proved practical yet. My original design for Catalina included a subsidiary kernel that was going to support multithreading - but my attempts to do this so far have required me to remove so much of the kernel functionality that I need yet another code generator to generate code for it. This really complicates everything, since it means Catalina now has to know at compile time where the code is to be executed (whereas the ideal situation is for this to be determined at run time- i.e. only running multiple threads on the one cog when there are no more available cogs). This in turn eliminates one of Catalina's main features - i.e. that it provides a sophisticated "Hardware Abstraction Layer" which allows the same Catalina program to run unmodified on any Propeller platform (since the number of cogs available changes depending on the device drivers - which are platform dependent).
However, while typing this I've just thought of a fairly simple idea that may make this unnecessary. I'll have to give this some more thought.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
8x8 color 80 Column NTSC Text Object
Wondering how to set tile colors in the graphics_demo.spin?
Safety Tip: Life is as good as YOU think it is!
Bill: I got an up-close and detailed demo of Peter's threading stuff along with Chip during the late night hours of UPEW, and it seemed to me to be reasonably deterministic. I guess it just depends on the speed you are going after and how well behaved your "threads" are...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Check out the Propeller Wiki and contribute if you can.
This shows just how important it is for everyone
to learn and use PASM.
With the prop you just have to know PASM to do
really good work. The prop's assembly language
is a joy to work with compared to some other
processors.
Interleaving would be useful for a non-LMM threaded cog where there was no need for branching or looping - but that is of marginal utility.
I'm glad to hear you like TLMM - the whole intention was to be able to easily write multi-driver cogs, so that we don't "waste" cogs in our products!
RossH:
No worries! Peter's scheduler and TLMM are both great approaches for getting more work done in a single cog, precisely so that we can all save cogs.
Hmm.. maybe Peter and I should start a "Save the COGs!" foundation...
For multi-threading Catalina, after careful thought, I think you would be better off having a small scheduler and swapping out the state of the LMM kernel every X*100 LMM instructions. Less overhead, but less fine grained than TLMM - but that is fine for "business logic".
kwin:
Thank you!
For a lot of uses Peter's scheduler is actually superior - ie if you have threads that need to be woken up at specific times or based on specific events (pins going high/low).
TLMM may be somewhat better when you are trying to fit several high speed serial ports into a cog.
potatohead:
TLMM is not intended to do video drivers!
Having said that, a TV driver *might* be doable if assigned several thread slots in order to give it about half of the cog's time. I'd have to do timing calculations to verify this.
It should however be possible to write a PASM TV driver that executed one LMM instruction per WAITVID, for a slow background thread.
Roy Eltham:
I totally agree - that's why in my previous post I said:
I am aware that Peter has a pretty sophisticated scheduler, however I don't believe it could reliably schedule (with more than a single thread running) so that four threads run every 1us.
Conceptually, you can think of Peter's scheduler as a "real" scheduler for multiple threads, and you can think of TLMM as a way of turning one cog into four/nine/... baby slow cogs, which are implemented by round-robin time-slicing a cog, giving each thread one LMM instruction before passing control to the next.
HollyMonkowski
Thank you!
I could not agree with you more about the importance of learning PASM, and how powerful it is.
Anyway, my point was that it is dealt with in the keyboard driver software, so unless the led is required then a pin can be saved.
Actually the more I think about it, the more I like interleaving two LMM threads (one per waitvid) with a TV driver... perfect for also doing the keyboard and mouse driver in that cog!
Heck, there may even be enough time for a 38.4kbps serial driver too.
What's nice about TLMM is that the threads are pretty much normal PASM (as long as I save/restore the flags) - no need for SLEEP, no need to call a scheduler regularly. The down side is that threads need to busy wait for pin states or time if they need them as trigger events - so Peter's scheduler and TLMM are complementary, not competing with each other.
(i figured it was something like that for the don't care)
This would be especially handy where we are using the same serial code for multiple serial connections, etc. We could then initialize the VM thread with the code block to use as well as the stack block to use. Doing that, we could easily have something like 3 serial drivers + 1 keyboard driver = 4 stacks - giving us 4 logical threads, but only using two sets of instructions in cog ram.
Bill
This would fetch the code for the different threads from inside the cog; however any hub reference will cause a 200ns-300ns "offset" to running the next threads.
Post Edited (Bill Henning) : 7/9/2010 4:25:45 PM GMT
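' save flags (two instructions, one flags register per thread)
        muxnz   flags,#1        ' bit 0 := NOT Z
        rcl     flags,#1        ' shift left one bit, C goes into bit 0
' restore flags
        rcr     flags,#1 wz     ' shift right one bit (current C enters bit 31); Z := (result == 0)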
Note, the above are untested, and are probably exactly the same as in the thread about saving/restoring the flags that I can't find....
Or they simply might not work.
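As far as it can be read, the intent of the snippet is to pack both flags into the low bits of a per-thread register: after the save, bit 1 holds NOT Z and bit 0 holds C, so that a right shift can hand back C as the bit shifted out and Z as "what remains is zero". The Python below models only that bit layout. It is not a cycle-accurate model of MUXNZ, RCL or RCR, and the function names are hypothetical.

```python
def save_flags(c, z):
    """Pack flags the way the save sequence appears to: bit 1 = NOT Z, bit 0 = C."""
    flags = 0 if z else 1                     # store NOT Z in bit 0 ...
    flags = (flags << 1) | (1 if c else 0)    # ... then shift left, C into bit 0
    return flags

def restore_flags(flags):
    """Undo the packing: C is the bit shifted out, Z is 'the rest is zero'."""
    c = bool(flags & 1)
    z = (flags >> 1) == 0
    return c, z

# Check every combination of the two flags.
for c in (False, True):
    for z in (False, True):
        assert restore_flags(save_flags(c, z)) == (c, z)
print("all four C/Z combinations round-trip")
```

Storing NOT Z rather than Z is what lets a plain zero test recover Z. The model also assumes that nothing else ends up in the register when the flags are restored, which the real instruction sequence would have to guarantee.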
Yep, I agree, but it could still allow a nice approach if speed weren't important. The thing to point out here is that it's always going to be slower than PASM; that's the trade-off. But this process is actually using cycles that were going to be wasted by the cog anyway. So, any gain we get is still a gain, and it's a real bonus if we're eliminating the need for additional cogs, right?
Bill
Where something like that is useful is in what I suggested to heater, swapping out the context every 100us or so, for "high level" threading of C apps.
Bill
Without indirect addressing capabilities in the Propeller, re-using code for several iterations is a real bear. It probably takes a similar amount of code to deal with that as it does to just make several copies.
That said however, it IS what I am pursuing with loading the tiny drivers from HubRam, so maybe an efficient method can be uncovered.
Cheers,
Peter (pjv)
I tend to agree :)
FYI, I think your scheduler is far superior to TLMM for most scheduled tiny drivers, and I plan on using your scheduler soon :)
The i2c thread just reminded me of my musings in '06 about making a multi-threaded LMM, which led me to think of how to make it deterministic at the 1us level. Then frustrations with debugging SPI RAM issues on a new PCB led me to whip up a quick sample TLMM implementation :) :) :)
Unfortunately having to save/restore flags reduces the throughput per thread; two threads would run at 1.25MIPS each, three at 0.714MIPS each, four at 0.55MIPS each, five at 0.45MIPS... seven threads at 0.333333MIPS
MIPS PER THREAD = (COG_MIPS) / (8*NUM_THREADS + 4)
For TCOG, the variant that runs totally in-cog but is less deterministic (hub access messes up determinism big time here):
MIPS PER THREAD = (COG_MIPS) / (6*NUM_THREADS + 1)
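The two formulas are easy to tabulate. The sketch below assumes COG_MIPS = 20 (80MHz clock, 4 clocks per cog instruction), which the post does not state explicitly; the function names are hypothetical.

```python
COG_MIPS = 20.0   # assumed: 80 MHz clock, 4 clocks per cog instruction

def tlmm_mips(num_threads):
    """Flag-saving TLMM: COG_MIPS / (8*NUM_THREADS + 4)."""
    return COG_MIPS / (8 * num_threads + 4)

def tcog_mips(num_threads):
    """In-cog TCOG variant: COG_MIPS / (6*NUM_THREADS + 1)."""
    return COG_MIPS / (6 * num_threads + 1)

print("threads  TLMM   TCOG")
for n in range(2, 8):
    print(f"{n:7d}  {tlmm_mips(n):.3f}  {tcog_mips(n):.3f}")
```

With that value the TLMM formula reproduces the quoted figures for three to seven threads (0.714, 0.556, 0.455, ..., 0.333). For two threads it gives 1.0 rather than the quoted 1.25; 1.25 is what 20/16 yields, so the two-thread case may have been computed without the +4 term. The post does not say which is intended.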
Regards,
Bill
Post Edited (Bill Henning) : 7/9/2010 6:48:36 PM GMT
Keyboard and mouse interfaces have a clock frequency between 10 and 16.7kHz.
0.333MIPS / 10kHz = 33.3 instructions per keyboard bit - piece of cake.
0.333MIPS / 16.7kHz = 20 instructions per bit
So we know that a single TLMM cog can easily support up to seven keyboards or mice!
Here is a hypothetical thread allocation for a seven-thread cog:
thread 0 = keyboard
thread 1 = mouse
thread 2 = 19.2KBPS comm port
leaving four more threads available for additional drivers for which 0.333MIPS is fast enough!
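The per-bit budget above is just the thread's instruction rate divided by the bit rate. A small Python sketch, assuming the seven-thread figure of 20/60 MIPS per thread; the 19.2kbps line is not worked out in the post and is added here for comparison, and the function name is hypothetical.

```python
def instructions_per_bit(thread_mips, bit_rate_hz):
    """How many LMM instructions a thread can execute during one bit time."""
    return thread_mips * 1_000_000 / bit_rate_hz

THREAD_MIPS = 20 / 60    # seven flag-saving threads, about 0.333 MIPS each

budgets = {
    "keyboard/mouse clock, slowest (10 kHz)": instructions_per_bit(THREAD_MIPS, 10_000),
    "keyboard/mouse clock, fastest (16.7 kHz)": instructions_per_bit(THREAD_MIPS, 16_700),
    "comm port (19.2 kbps)": instructions_per_bit(THREAD_MIPS, 19_200),
}
for name, n in budgets.items():
    print(f"{name}: {n:.1f} instructions per bit")
```

By this estimate the serial port has the tightest budget of the three, at roughly 17 instructions per bit. Whether that is enough depends on how the driver is written.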
Post Edited (Bill Henning) : 7/9/2010 7:00:39 PM GMT
I hope we can help Peter with his driver loader.
I've never heard of anyone trying to do multi-thread chunk overlay loading
except for this TLMM which is essentially a one long at a time overlay loader.
I guess a question is: do the scheduler code threads need to be swapped often?
Having a tiny unrestricted overlay loader would be useful. Many of us swap
out blocks of memory as required to run PASM chunks, the most efficient
cases appear to have predetermined begin/end points. I cheat and zero
terminate chunks (NOP must be non-zero and data is not interleaved in code).
I use the unrolled read 4 long then jump method because several fragments
required by the JVM have less than 6 instructions and use on COG macros.
Loading 6 instructions with an unrolled loop is essentially as fast and may
be faster than a perfect window timing loader because of lower overhead.
The things good about chunks over LMM style are that once loaded the PASM
can run at speed, natural jumps can be used, and on COG service macro
routines are accessible directly. The bad thing about chunks is that for the
most part they have to be predefined. LMM does not suffer from predefinition.
If there was a way for chunks to be compiled and used generically that would
be great ... FCACHE does this a little, but one still needs to use LMM macros
and if I remember correctly, you have to declare code as FCACHE-able.
BTW: how about a little golf challenge (maybe another thread):
How many instructions and registers does it take you to save/restore C & Z?
Cheers,
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM
Nope. Can't be.
LMM's theoretical maximum execution rate is 5MIPS for "pure" LMM code.
With a four-way unrolled LMM inner loop, 80% of that ideal is achieved, ie 4MIPS for a single thread.
An eight-way unrolled LMM inner loop would hit (8/9)*5MIPS ie 4.44MIPS for "pure" LMM code
Mind you, it is more complicated than that. Executing hub instructions slows things down, using FCACHE properly speeds things up dramatically.
A very good optimizing compiler, or good programmer, should be able to hit 15-19 MIPS, depending on how much advantage it can take of FCACHE.
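The unrolled-loop figures quoted above follow one pattern: of every k+1 hub windows, k execute an LMM instruction and one is spent on the jump back to the top. A Python sketch of that pattern, taking the 5MIPS peak as given; the function name is hypothetical, and hub instructions and FCACHE are ignored.

```python
PEAK_LMM_MIPS = 5.0    # one instruction per hub window, as quoted above

def unrolled_lmm_mips(unroll):
    """`unroll` fetch/execute slots plus one slot for the jump back to the top."""
    return PEAK_LMM_MIPS * unroll / (unroll + 1)

for k in (4, 8):
    print(f"{k}-way unrolled: {unrolled_lmm_mips(k):.2f} MIPS")
```

This gives 4.00 and 4.44 MIPS for the four-way and eight-way cases, in line with the numbers in the post.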
I agree!
Yep, Peter's approach is very interesting.
Umm.. that's not how I'd describe it, nor do I think of it that way.
Basically, TLMM is an unrolled LMM loop (unroll factor = # threads) where instead of running the same thread, each read/exec cycle executes a different thread.
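To make the "unrolled loop, one thread per slot" idea concrete, here is a toy Python model. It only mimics the control flow (fetch, increment the thread's pc, execute, move on to the next thread). The instruction representation, the per-thread state and all names are invented for the illustration, and the real kernel would be PASM.

```python
def run_tlmm(threads, passes):
    """Round-robin: each pass fetches and runs one instruction from every thread.

    `threads` is a list of programs; each program is a list of callables that
    take the thread's state dict (a stand-in for LMM instructions in hub RAM).
    """
    pcs = [0] * len(threads)                  # one program counter per thread
    states = [{"acc": 0} for _ in threads]    # hypothetical per-thread registers
    trace = []
    for _ in range(passes):
        for tid, program in enumerate(threads):        # one "slot" per thread
            instr = program[pcs[tid] % len(program)]   # fetch (wraps for the demo)
            pcs[tid] += 1                              # increment thread pc
            instr(states[tid])                         # execute
            trace.append(tid)
    return states, trace

def add(n):
    """Build a toy 'instruction' that adds n to the thread's accumulator."""
    def instr(state):
        state["acc"] += n
    return instr

states, trace = run_tlmm([[add(1)], [add(10)], [add(100)]], passes=3)
print(trace)                        # [0, 1, 2, 0, 1, 2, 0, 1, 2]
print([s["acc"] for s in states])   # [3, 30, 300]
```

Each thread advances by exactly one instruction per pass regardless of what the others do, which is where the fine-grained determinism comes from.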
Theoretically, for non-highly deterministic threads, TLMM could use kernel primitives, FCACHE, etc, to run multiple compiled LMM threads on the same cog.
Heck, it could be independent processes!
Vi would probably run usably at 0.5MIPS!
This would trade almost-determinism for running large multi-threaded user code ("business logic"). That's the role I originally envisioned for multi-threaded LMM kernels a couple of years ago; it only dawned on me to use it for (almost) deterministic medium-speed drivers when I started this thread.
That would depend on the application.
Sounds like a perfect fit for your JVM!
FCACHE does exactly that.
I think you may wish to re-read my postings on FCACHE in my original thread :)
In my original kernel, FCACHE loads code into the $080-$0FF range.
Code that is loaded there runs at full speed, and can use regular jumps, djnz etc within that range.
The intention was, for example, that str*() functions would be FCALL'ed
LMM code would do the initialization (as it is faster than loading the FCACHE'd block)
but the actual working loop would be FCACHE'd - so all the body of the str*() functions, the loops, would run at full speed!
Same for memcpy() and friends.
Heck, FltDIV / FltMul loops would also approach raw pasm speeds!
A few posts above I showed some code that should save in two instructions, and restore in only one, and requires one flag register per thread. I am pretty sure it is exactly the same (or functionally the same) as the code posted in the original threads that talked about saving/restoring flags... which I could not find!
I'd love to see a way to save both C and Z in one instruction... but I just don't see how that could be done.
(Having fun)
Bill