TLMM: Threaded LMM - four, nine or even 19 drivers in one cog!
Bill Henning
Posts: 6,445
Over in the i2c prop to prop thread http://forums.parallax.com/showthread.php?p=920681 I asked for a combination mouse/keyboard/serial driver, implemented on top of Peter's great scheduler.
Peter thought it was a great idea, and mentioned he was thinking about loading scheduled code on the fly - and asked if that was similar to LMM. It would be, if the LMM code used FCACHE :)
Why is this relevant?
Because it reminded me of the idea for a deterministic LMM threading model. Mind you, to keep it deterministic, there would be strict limitations - but if those are acceptable, it would be possible to have four 1MIPS or nine 0.5MIPS threads running in a single cog! It would also be possible to have 19 0.25MIPS threads.
This would allow four to nine low speed drivers to co-exist in a single cog - and allow them to be easy to write!
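A rough way to check the thread counts quoted above, as a sketch rather than a measurement. It assumes an 80MHz Propeller 1 where a cog gets a hub window every 16 clocks (200ns), and that each thread slot (hub fetch, PC increment, execute) fills exactly one window while the closing jump costs one more. The function name is made up for illustration.

```python
# Assumption (not stated in the post): 80 MHz system clock, one hub window
# per cog every 16 clocks.
CLOCK_HZ = 80_000_000
CLOCKS_PER_HUB_WINDOW = 16

def tlmm_mips_per_thread(num_threads: int) -> float:
    """Per-thread MIPS for an N-thread TLMM loop.

    Each thread slot uses one hub window; the jump back to the top of the
    loop makes the next fetch miss its window, so a full pass takes
    num_threads + 1 windows and every thread runs one instruction per pass.
    """
    clocks_per_pass = (num_threads + 1) * CLOCKS_PER_HUB_WINDOW
    passes_per_second = CLOCK_HZ / clocks_per_pass
    return passes_per_second / 1_000_000

for n in (4, 9, 19):
    print(f"{n:2d} threads: {tlmm_mips_per_thread(n):.2f} MIPS each")
```

Under those assumptions the model reproduces the 1, 0.5 and 0.25MIPS figures for four, nine and 19 threads.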
Mind you, Peter's scheduler is MUCH nicer for a larger number of tasks - but TLMM would be more deterministic.
I don't want to take too much time away from debugging the new PCBs, but I will post some simple source code later to illustrate how it would work.
Limitations required for deterministic timing:
- each thread is limited in size, about 1K instructions is the practical maximum, and only -127..+128 instruction relative jumps would be permitted
- no FCACHE, no calling native pasm "primitives"
- doing a CALL will take multiple instructions, so will a RET - this model is mainly meant for in-line code without subroutines
- no WAITxxx instructions
- must use very few RD* and WR* instructions, and only where it has little impact as it will offset all other threads by 200ns
Note how none of the limitations would really impact keyboard, mouse and serial drivers!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
Post Edited (Bill Henning) : 7/9/2010 12:08:37 AM GMT
Comments
' TLMM - Threaded Large Memory Model
'
' This version of LMM is intended to be reasonably deterministic, and as such, does NOT support FCACHE
'
' Copyright 2010 by William Henning
'
' Distributable under the terms of the MIT license, attribution required in source and documentation.
'
' T4LMM - four thread version, 1MIPS per thread
' T8LMM - nine thread version, 0.5MIPS per thread
'
CON

  version = 1_00
  threads = 4

VAR

  long mbox[4]                          ' one start address per thread

PUB launch(addr1,addr2,addr3,addr4)     ' parameters renamed so they do not clash with the DAT labels

  mbox[0]:=addr1
  mbox[1]:=addr2
  mbox[2]:=addr3
  mbox[3]:=addr4

  cognew(@tlmm_init,@mbox)

' invoke as launch(@thread1,@thread2,@thread3,@thread4)

DAT

ptr           long    0                 ' 1+9*4 = 37 longs for registers

pc1           long    0
r1a           long    0                 ' A register for thread 1
r1b           long    0                 ' B register for thread 1
r1c           long    0                 ' C register for thread 1
r1d           long    0                 ' D register for thread 1
r1e           long    0                 ' E register for thread 1
r1f           long    0                 ' F register for thread 1
r1g           long    0                 ' G register for thread 1
r1h           long    0                 ' H register for thread 1

pc2           long    0
r2a           long    0                 ' A register for thread 2
r2b           long    0                 ' B register for thread 2
r2c           long    0                 ' C register for thread 2
r2d           long    0                 ' D register for thread 2
r2e           long    0                 ' E register for thread 2
r2f           long    0                 ' F register for thread 2
r2g           long    0                 ' G register for thread 2
r2h           long    0                 ' H register for thread 2

pc3           long    0
r3a           long    0                 ' A register for thread 3
r3b           long    0                 ' B register for thread 3
r3c           long    0                 ' C register for thread 3
r3d           long    0                 ' D register for thread 3
r3e           long    0                 ' E register for thread 3
r3f           long    0                 ' F register for thread 3
r3g           long    0                 ' G register for thread 3
r3h           long    0                 ' H register for thread 3

pc4           long    0
r4a           long    0                 ' A register for thread 4
r4b           long    0                 ' B register for thread 4
r4c           long    0                 ' C register for thread 4
r4d           long    0                 ' D register for thread 4
r4e           long    0                 ' E register for thread 4
r4f           long    0                 ' F register for thread 4
r4g           long    0                 ' G register for thread 4
r4h           long    0                 ' H register for thread 4

'------------------------------------------------------------------------------------------

              org     0
tlmm_init                               ' later overlaid by program counters & registers
              mov     ptr,par
              rdlong  pc1,ptr
              add     ptr,#4
              rdlong  pc2,ptr
              add     ptr,#4
              rdlong  pc3,ptr
              add     ptr,#4
              rdlong  pc4,ptr
              jmp     #tloop

              long    0[28]

tloop         rdlong  ins1,pc1          ' label was "next", which is a reserved word in Spin
              add     pc1,#4
ins1          nop
              rdlong  ins2,pc2
              add     pc2,#4
ins2          nop
              rdlong  ins3,pc3
              add     pc3,#4
ins3          nop
              rdlong  ins4,pc4
              add     pc4,#4
ins4          nop
              jmp     #tloop

'------------------------------------------------------------------------------------------
' area from here on is available as scratch registers for the threaded code!
' i recommend
' reg 100-199 for thread 1 scratch area
' reg 200-299 for thread 2 scratch area
' reg 300-399 for thread 3 scratch area
' reg 400-495 for thread 4 scratch area
'
' branches: pcN already points at the FOLLOWING instruction when a thread instruction
' executes, so a backward branch subtracts 4*(distance in instructions + 1)
'------------------------------------------------------------------------------------------
' sample thread 1 - blink an LED on P0 at 50Hz
'------------------------------------------------------------------------------------------
              org     0
thread1       or      dira,#%000000001
              mov     r1a,#100
iloop1        mov     r1b,#100
              sub     r1b,#1 wz
        if_nz sub     pc1,#8            ' branch back to the decrement (was #12, which reloaded r1b forever)
              sub     r1a,#1 wz
        if_nz sub     pc1,#20           ' branch back to iloop1
              xor     outa,#%000000001
              sub     pc1,#32           ' start over at mov r1a

'------------------------------------------------------------------------------------------
' sample thread 2 - blink an LED on P1 at 25Hz
'------------------------------------------------------------------------------------------
              org     0
thread2       or      dira,#%000000010
              mov     r2a,#50
iloop2        mov     r2b,#100
              sub     r2b,#1 wz
        if_nz sub     pc2,#8            ' branch back to the decrement
              sub     r2a,#1 wz
        if_nz sub     pc2,#20           ' branch back to iloop2
              xor     outa,#%000000010
              sub     pc2,#32

'------------------------------------------------------------------------------------------
' sample thread 3 - blink an LED on P2 at 12.5Hz
'------------------------------------------------------------------------------------------
              org     0
thread3       or      dira,#%000000100
              mov     r3a,#50
iloop3        mov     r3b,#100
              sub     r3b,#1 wz
        if_nz sub     pc3,#8            ' branch back to the decrement
              sub     r3a,#1 wz
        if_nz sub     pc3,#20           ' branch back to iloop3
              xor     outa,#%000000100
              sub     pc3,#32

'------------------------------------------------------------------------------------------
' sample thread 4 - generate a square wave on P3
' (one toggle every two thread instructions = every 2us at 1MIPS, i.e. 250kHz, not 500kHz)
'------------------------------------------------------------------------------------------
              org     0
thread4       or      dira,#%000001000
iloop4        xor     outa,#%000001000
              sub     pc4,#8            ' back to iloop4 (#4 would jump to this instruction itself)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Bill Henning) : 7/9/2010 12:08:30 PM GMT
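The relative branches in the sample threads are plain `sub pcN,#bytes` instructions. Since the TLMM loop adds 4 to the thread's PC before the fetched instruction executes, the immediate has to be measured from the instruction after the branch. Below is a small sketch of that arithmetic; the helper name and the instruction-index convention are assumptions for illustration, not part of the posted code.

```python
MAX_IMMEDIATE = 511   # PASM immediates are 9 bits wide
BYTES_PER_LONG = 4

def back_branch_immediate(branch_index: int, target_index: int) -> int:
    """Byte count for `sub pcN,#imm` to branch backwards.

    branch_index and target_index are instruction positions within the
    thread (0 = first instruction). The +1 accounts for the PC already
    pointing at the instruction after the branch.
    """
    if target_index > branch_index:
        raise ValueError("this helper only handles backward branches")
    distance = (branch_index + 1 - target_index) * BYTES_PER_LONG
    if distance > MAX_IMMEDIATE:
        raise ValueError("target too far for a single sub with an immediate")
    return distance

print(back_branch_immediate(4, 3))   # branch at index 4 back to index 3
print(back_branch_immediate(6, 2))   # branch at index 6 back to index 2
print(back_branch_immediate(8, 1))   # branch at index 8 back to index 1
```

The 9-bit immediate caps a single backward branch at 127 instructions, which is consistent with the jump-range limitation listed in the first post.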
Just a thought...
Keyboard (and I presume mouse because I have not tested it) are really serial devices. I have written the keyboard driver using a single pin and therefore presume the mouse could be done similarly. This would save 2 pins (we are always short of pins).
Now the issues with a 1pin keyboard are:
* you cannot reset the keyboard (not really required anyway)
* you cannot set the leds on the keyboard (don't have them on a laptop anyway)
* need to test the timing of the keyboard initially. It can then be fixed in the program if required.
* Note it still works with the existing hardware 2pin interface
This would be a really neat driver to handle 1pin keyboard, 1pin mouse and serial in a single cog.
With a splitter cable, an existing Keyboard socket could be used to interface to both the 1pin keyboard and 1pin mouse.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBlade,·RamBlade,·SixBlade, website
· Single Board Computer:·3 Propeller ICs·and a·TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc;· Terminals·VT100 etc; (Index) ZiCog (Z80) , MoCog (6809)·
· Prop OS: SphinxOS·, PropDos , PropCmd··· Search the Propeller forums·(uses advanced Google search)
My cruising website is: ·www.bluemagic.biz·· MultiBlade Props: www.cluso.bluemagic.biz
Since posting, I have been doing some additional thinking on this matter. If a lower MIPS rating is acceptable, I can see how to make the threads jitterless while still allowing them to do a hub read and a hub write per execution cycle.
But first, a problem. The flags are not preserved across threads, so the conditional branch I showed above can't work.
One simple solution is to use two slots for every thread, and preserve/restore C and Z for each thread.
I hope to have a better solution RSN.
The basis of this deterministic threaded LMM is to stay synchronized to the hub, and have a number of "hub slots" assigned to every thread.
For argument's sake, let's assume there are 20 slots.
Each slot can contain one of:
- LMM fetch, increment thread_pc, execute instruction
- hub read for one of the threads
- hub write for one of the threads
- jump to top of TLMM loop (call it TLOOP slot)
Note that if at least one slot contains a hub read or hub write, there is no need to waste a slot on a TLOOP slot!
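To make the slot idea concrete, here is a toy model of a slot table. It assumes every slot occupies exactly one 200ns hub window (80MHz clock); the slot kinds mirror the list above, while the tuple layout and function name are invented for this sketch.

```python
from collections import Counter

HUB_WINDOW_NS = 200  # assumption: 80 MHz clock, 16 clocks per hub window

def per_thread_mips(slots):
    """slots is a list of (kind, thread_id) tuples, one per hub window.

    kind is "lmm" (fetch/increment/execute), "rd" (hub read),
    "wr" (hub write) or "tloop" (jump to top, thread_id None).
    Only "lmm" slots execute a thread instruction.
    """
    loop_ns = len(slots) * HUB_WINDOW_NS
    executed = Counter(tid for kind, tid in slots if kind == "lmm")
    # instructions per loop divided by loop time in microseconds
    return {tid: count * 1000 / loop_ns for tid, count in executed.items()}

# Hypothetical 20-slot layout: six threads with three LMM slots each,
# plus one hub read and one hub write for thread 0 (so no TLOOP slot).
layout = [("lmm", t) for _ in range(3) for t in range(6)]
layout += [("rd", 0), ("wr", 0)]

print(len(layout), "slots")
for tid, mips in sorted(per_thread_mips(layout).items()):
    print(f"thread {tid}: {mips:.2f} MIPS")
```

In this layout the loop takes 4us, so each thread gets 0.75MIPS and thread 0 can also do one hub read and one hub write per pass.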
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Bill Henning) : 7/9/2010 12:22:42 AM GMT
I also feel a challenge coming on ...
Who will be the first to get a single cog to support a keyboard, mouse, sd card and rtc? All these drivers are individually quite small, and it should be possible to fit them all into a single cog.
I would include a display driver as well, but I suspect this would be pushing things a bit too far even for the Propeller. Or would it?
I don't think we really need a prize, but I'll happily give anyone who achieves this feat a copy of the new Catalina Code Optimizer (10% smaller code with up to 15% speed improvement!).
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
Actually, I think Peter's scheduler is significantly more sophisticated, and better for most applications. His scheduler actually schedules... TLMM time-slices.
Where TLMM may be a bit better is on finer-grained determinism. I am still thinking through all the implications...
I can easily see a single TLMM cog handling:
- keyboard
- mouse
- two 38.4kbps serial ports
- maybe even a 100kHz I2C read/write engine instead of the serial ports
As for your challenge... I am afraid I have too much on my plate to compete :)
I think I am better off coming up with enabling technologies for stretching the propeller's limits (LMM, VMCOG, TLMM, more in the future) and letting others implement apps with them.
As an aside, if I saved/restored flags, TLMM could run four non-deterministic (think Catalina <grin>) threads at about 0.5MIPS each! Or two at 1MIPS.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Now that I think about it... what about interleaving the code in the cog itself... is there any benefit to doing that? It would basically make it unreadable, but a tool could easily be built to do it for us after we code them independently.
Is there some mechanism or memory model where this would be optimal?
BTW... I really love this model!!! I'm going to have to rethink what we're doing with our little toy, now! I think this can really save us a bunch of cost associated with our new upcoming hardware and software requirements. Instead of adding a second prop, I think we can get it to fit quite easily in the one we have with this technique. For example, we're spinning up 3 cogs for different serial stuff right now... which, with this, I'm sure can be done in one now!
Bill
Sorry - I thought your solution used Peter's scheduler. But in any case, the challenge itself is still valid - it could be done either using your time slicer, Peter's scheduler - or simply some very efficient hand-crafted code.
My main interest is simply in reducing the number of cogs currently required for Catalina drivers - this would give Catalina programs more cogs on which to run application code.
While multithreading Catalina is something I've long wanted to do, it hasn't proved practical yet. My original design for Catalina included a subsidiary kernel that was going to support multithreading - but my attempts to do this so far have required me to remove so much of the kernel functionality that I need yet another code generator to generate code for it. This really complicates everything, since it means Catalina now has to know at compile time where the code is to be executed (whereas the ideal situation is for this to be determined at run time- i.e. only running multiple threads on the one cog when there are no more available cogs). This in turn eliminates one of Catalina's main features - i.e. that it provides a sophisticated "Hardware Abstraction Layer" which allows the same Catalina program to run unmodified on any Propeller platform (since the number of cogs available changes depending on the device drivers - which are platform dependent).
However, while typing this I've just thought of a fairly simple idea that may make this unnecessary. I'll have to give this some more thought.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
8x8 color 80 Column NTSC Text Object
Wondering how to set tile colors in the graphics_demo.spin?
Safety Tip: Life is as good as YOU think it is!
Bill: I got an up-close and detailed demo of Peter's threading stuff along with Chip during the late night hours of UPEW, and it seemed to me to be reasonably deterministic. I guess it just depends on the speed you are going after and how well behaved your "threads" are...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Check out the Propeller Wiki·and contribute if you can.
This shows just how important it is for everyone
to learn and use PASM.
With the prop you just have to know PASM to do
really good work. The prop's assembly language
is a joy to work with compared to some other
processors.
Interleaving would be useful for a non-LMM threaded cog where there was no need for branching or looping - but that is of marginal utility.
I'm glad to hear you like TLMM - the whole intention was to be able to easily write multi-driver cogs, so that we don't "waste" cogs in our products!
RossH:
No worries! Peter's scheduler and TLMM are both great approaches for getting more work done in a single cog, precisely so that we can all save cogs.
Hmm.. maybe Peter and I should start a "Save the COGs!" foundation...
For multi-threading Catalina, after careful thought, I think you would be better off having a small scheduler and swapping out the state of the LMM kernel every X*100 LMM instructions. Less overhead, but less fine grained than TLMM - but that is fine for "business logic".
kwin:
Thank you!
For a lot of uses Peter's scheduler is actually superior - ie if you have threads that need to be woken up at specific times or based on specific events (pins going high/low).
TLMM may be somewhat better when you are trying to fit several high speed serial ports into a cog.
potatohead:
TLMM is not intended to do video drivers!
Having said that, a TV driver *might* be doable if assigned several thread slots in order to give it about half of the cog's time. I'd have to do timing calculations to verify this.
It should however be possible to write a PASM TV driver that executed one LMM instruction per WAITVID, for a slow background thread.
Roy Eltham:
I totally agree - that's why in my previous post I said:
I am aware that Peter has a pretty sophisticated scheduler, however I don't believe it could reliably schedule (with more than a single thread running) so that four threads run every 1us.
Conceptually, you can think of Peter's scheduler as a "real" scheduler for multiple threads, and you can think of TLMM as a way of turning one cog into four/nine/... baby slow cogs, which are implemented by round-robin time slicing a cog, giving each thread one LMM instruction before passing control to the next.
HollyMonkowski
Thank you!
I could not agree with you more about the importance of learning PASM, and how powerful it is.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Anyway, my point was that it is dealt with in the keyboard driver software, so unless the led is required then a pin can be saved.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Actually the more I think about it, the more I like interleaving two LMM threads (one per waitvid) with a TV driver... perfect for also doing the keyboard and mouse driver in that cog!
Heck, there may even be enough time for a 38.4kbps serial driver too.
What's nice about TLMM is that the threads are pretty much normal PASM (as long as I save/restore the flags) - no need for SLEEP, no need to call a scheduler regularly. The down side is that threads need to busy wait for pin states or time if they need them as trigger events - so Peter's scheduler and TLMM are complementary, not competing with each other.
(i figured it was something like that for the don't care)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This would be especially handy where we are using the same serial code for multiple serial connections, etc. We could then initialize the VM thread with the code block to use as well as the stack block to use. Doing that, we could easily have something like 3 serial drivers + 1 keyboard driver = 4 stacks - giving us 4 logical threads, but only using two sets of instructions in cog ram.
Bill
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This would fetch the code for the different threads from inside the cog; however any hub reference will cause a 200ns-300ns "offset" to running the next threads.
' i know - flags are not preserved... but interesting thought experiment - 5x >1MIPS in-cog threads!
' a thread can perform an absolute jump by 'movs threadN,#label'

              org     0
start         movs    thread1,#thread1_code
              movs    thread2,#thread2_code
              movs    thread3,#thread3_code
              movs    thread4,#thread4_code
              movs    thread5,#thread5_code

thread1       mov     ins1,0-0
              add     thread1,#1
ins1          nop
thread2       mov     ins2,0-0
              add     thread2,#1
ins2          nop
thread3       mov     ins3,0-0
              add     thread3,#1
ins3          nop
thread4       mov     ins4,0-0
              add     thread4,#1
ins4          nop
thread5       mov     ins5,0-0
              add     thread5,#1
ins5          nop
              jmp     #thread1

thread1_code  ' do stuff
thread2_code  ' do stuff
thread3_code  ' do stuff
thread4_code  ' do stuff
thread5_code  ' do stuff
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Bill Henning) : 7/9/2010 4:25:45 PM GMT
' save flags
muxnz flags,#1
rcl flags,#1
' restore flags
rcr flags,#1 wz
Note, the above are untested, and are probably exactly the same as in the thread about saving/restoring the flags that I can't find....
Or they simply might not work.
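For comparison, here is a bit-level model of a different two-instruction save / one-instruction restore scheme, written in Python so the logic can be checked on paper. It is not the code posted above and has not been run on a Propeller. The PASM sequence it is meant to model is an assumption: `muxc flags,#1` and `muxnz flags,#2` to save, then `shr flags,#1 wc,wz,nr` to restore (C taken from bit 0, Z set when the shifted value is zero, destination left unchanged).

```python
def save_flags(c: bool, z: bool) -> int:
    """Pack C into bit 0 and NOT Z into bit 1 of a per-thread flags register."""
    flags = 0
    if c:
        flags |= 0b01      # models: muxc  flags,#1
    if not z:
        flags |= 0b10      # models: muxnz flags,#2
    return flags

def restore_flags(flags: int) -> tuple:
    """Recover (C, Z) the way a right shift by one with wc,wz would."""
    c = bool(flags & 1)    # wc: C takes the bit shifted out (old bit 0)
    z = (flags >> 1) == 0  # wz: Z is set when the shifted result is zero
    return c, z

for c in (False, True):
    for z in (False, True):
        assert restore_flags(save_flags(c, z)) == (c, z)
print("all four C/Z combinations round-trip")
```

The model relies on the flags register holding nothing but those two bits, which is why the save side writes bits individually instead of rotating.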
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Yep, I agree, but it could still allow a nice approach if speed weren't important. The thing to point out here is it's always going to be slower than PASM, that's what this trade-off is, but this process is actually using cycles that were going to be wasted by the cog anyway. So, any gain we get is still a gain, and it's a real bonus if we're eliminating the need for additional cogs, right?
Bill
Where something like that is useful is in what I suggested to heater, swapping out the context every 100us or so, for "high level" threading of C apps.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Bill
Without indirect addressing capabilities in the Propeller, re-using code for several iterations is a real bear. It probably takes a similar amount of code to deal with that as it does to just make several copies.
That said however, it IS what I am pursuing with loading the tiny drivers from HubRam, so maybe an efficient method can be uncovered.
Cheers,
Peter (pjv)
I tend to agree :)
FYI, I think your scheduler is far superior to TLMM for most scheduled tiny drivers, and I plan on using your scheduler soon :)
The i2c thread just reminded me of my musings in '06 about making a multi-threaded LMM, which led me to think of how to make it deterministic at the 1us level. Then frustrations with debugging SPI RAM issues on a new PCB led me to whip up a quick sample TLMM implementation :) :) :)
Unfortunately, having to save/restore flags reduces the throughput per thread: two threads would run at 1MIPS each, three at 0.714MIPS each, four at 0.55MIPS each, five at 0.45MIPS... seven threads at 0.333MIPS
MIPS PER THREAD = (COG_MIPS) / (8*NUM_THREADS + 4)
For TCOG, the less deterministic variant that runs totally in-cog (hub access messes up determinism big time here):
MIPS PER THREAD = (COG_MIPS) / (6*NUM_THREADS + 1)
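The two formulas are easy to tabulate. This sketch assumes COG_MIPS = 20 (an 80MHz cog at four clocks per instruction), which the post does not state explicitly; the function names are made up.

```python
COG_MIPS = 20.0  # assumption: 80 MHz clock, 4 clocks per instruction

def tlmm_mips(num_threads: int) -> float:
    """Hub-fetched TLMM with flag save/restore: COG_MIPS / (8*N + 4)."""
    return COG_MIPS / (8 * num_threads + 4)

def tcog_mips(num_threads: int) -> float:
    """In-cog TCOG variant with flag save/restore: COG_MIPS / (6*N + 1)."""
    return COG_MIPS / (6 * num_threads + 1)

print("threads  TLMM    TCOG")
for n in range(2, 8):
    print(f"{n:7d}  {tlmm_mips(n):.3f}  {tcog_mips(n):.3f}")
```

With that clock assumption the TLMM column gives 1.000, 0.714, 0.556, 0.455, 0.385 and 0.333MIPS for two to seven threads.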
Regards,
Bill
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Bill Henning) : 7/9/2010 6:48:36 PM GMT
Keyboard and mouse drivers have a clock frequency between 10 and 16.7kHz
0.333MIPS / 10kHz = 33.3 instructions per keyboard bit - piece of cake.
0.333MIPS / 16.7kHz = 20 instructions per bit
So we know that a single TLMM cog can easily support up to seven keyboards or mice!
Here is a hypothetical seven-thread cog allocation:
thread 0 = keyboard
thread 1 = mouse
thread 2 = 19.2KBPS comm port
leaving four more threads available for additional drivers for which 0.333MIPS is fast enough!
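The per-bit budgets above are simply the thread rate divided by the bit rate. A short sketch of that division, extended to the 19.2kbps port from the allocation list; the seven-thread rate of 20/60 MIPS assumes an 80MHz cog, and the helper name is hypothetical.

```python
THREAD_MIPS = 20.0 / 60.0  # seven threads with flag save/restore, 80 MHz assumed

def instructions_per_bit(thread_mips: float, bit_rate_hz: float) -> float:
    """How many thread instructions fit into one bit time."""
    return thread_mips * 1_000_000 / bit_rate_hz

for label, rate in (("keyboard/mouse, slow clock", 10_000),
                    ("keyboard/mouse, fast clock", 16_700),
                    ("19.2kbps serial", 19_200)):
    print(f"{label}: {instructions_per_bit(THREAD_MIPS, rate):.1f} instructions per bit")
```

The serial port is the tightest of the three at roughly 17 instructions per bit, so it is the one to check first when sizing a driver loop.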
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Bill Henning) : 7/9/2010 7:00:39 PM GMT
I hope we can help Peter with his driver loader.
I've never heard of anyone trying to do multi-thread chunk overlay loading
except for this TLMM which is essentially a one long at a time overlay loader.
I guess a question is: do the scheduler code threads need to be swapped often?
Having a tiny unrestricted overlay loader would be useful. Many of us swap
out blocks of memory as required to run PASM chunks, the most efficient
cases appear to have predetermined begin/end points. I cheat and zero
terminate chunks (NOP must be non-zero and data is not interleaved in code).
I use the unrolled read 4 long then jump method because several fragments
required by the JVM have less than 6 instructions and use on COG macros.
Loading 6 instructions with an unrolled loop is essentially as fast and may
be faster than a perfect window timing loader because of lower overhead.
The things good about chunks over LMM style are that once loaded the PASM
can run at speed, natural jumps can be used, and on COG service macro
routines are accessible directly. The bad thing about chunks is that for the
most part they have to be predefined. LMM does not suffer from predefinition.
If there was a way for chunks to be compiled and used generically that would
be great ... FCACHE does this a little, but one still needs to use LMM macros
and if I remember correctly, you have to declare code as FCACHE-able.
BTW: how about a little golf challenge (maybe another thread):
How many instructions and registers does it take you to save/restore C & Z?
Cheers,
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM
Nope. Can't be.
LMM's theoretical maximum execution rate is 5MIPS for "pure" LMM code.
With a four-way unrolled LMM inner loop, 80% of that ideal is achieved, ie 4MIPS for a single thread.
An eight-way unrolled LMM inner loop would hit (8/9)*5MIPS ie 4.44MIPS for "pure" LMM code
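The unrolling figures follow one pattern: n fetch/execute pairs plus one jump cost n + 1 hub windows. A minimal sketch, assuming the 5MIPS ceiling quoted above (one hub window per instruction); the function name is made up.

```python
LMM_PEAK_MIPS = 5.0  # one LMM instruction per hub window, as quoted above

def unrolled_lmm_mips(unroll: int) -> float:
    """'Pure' LMM rate for an inner loop unrolled `unroll` times.

    The jump back to the top of the loop costs one extra hub window, so
    only unroll out of every unroll + 1 windows execute an instruction.
    """
    return LMM_PEAK_MIPS * unroll / (unroll + 1)

for n in (1, 4, 8, 16):
    print(f"{n:2d}-way unrolled: {unrolled_lmm_mips(n):.2f} MIPS")
```

The returns diminish quickly: going from four-way to eight-way unrolling gains about 0.44MIPS for twice the loop size.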
Mind you, it is more complicated than that. Executing hub instructions slows things down, using FCACHE properly speeds things up dramatically.
A very good optimizing compiler, or good programmer, should be able to hit 15-19 MIPS, depending on how much advantage it can take of FCACHE.
I agree!
Yep, Peter's approach is very interesting.
Umm.. that's not how I'd describe it, nor do I think of it that way.
Basically, TLMM is an unrolled LMM loop (unroll factor = # threads) where instead of running the same thread, each read/exec cycle executes a different thread.
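That description can be mimicked in a few lines of Python: one pass of the loop gives every thread exactly one instruction, each with its own program counter. This is only a toy model of the dispatch order; instruction names are placeholders and the `("back", n)` tuple stands in for a `sub pcN,#4*n` branch.

```python
def run_tlmm(threads, passes):
    """Round-robin dispatch: one instruction per thread per pass.

    threads is a list of instruction lists. An instruction is either a
    string (executed as-is) or ("back", n), which moves that thread's PC
    back n instructions, counted from the instruction after the branch.
    Every thread must end in a branch so its PC never runs off the end.
    """
    pcs = [0] * len(threads)
    trace = []
    for _ in range(passes):
        for tid, code in enumerate(threads):
            op = code[pcs[tid]]
            pcs[tid] += 1              # PC is bumped before the instruction runs
            if isinstance(op, tuple):
                pcs[tid] -= op[1]      # relative branch
                trace.append((tid, "branch"))
            else:
                trace.append((tid, op))
    return trace

threads = [
    ["a0", "a1", ("back", 3)],   # three-instruction loop
    ["b0", ("back", 2)],         # two-instruction loop
]
for tid, op in run_tlmm(threads, passes=4):
    print(tid, op)
```

The trace alternates strictly between the two threads regardless of what each one executes, which is where the fixed per-thread rate comes from.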
Theoretically, for non-highly deterministic threads, TLMM could use kernel primitives, FCACHE, etc, to run multiple compiled LMM threads on the same cog.
Heck, it could be independent processes!
Vi would probably run usably at 0.5MIPS!
This would trade almost-determinism for running large multi-threaded user code ("business logic"). That's the role I originally envisioned for multi-threaded LMM kernels a couple of years ago; it only dawned on me to use it for (almost) deterministic medium-speed drivers when I started this thread.
That would depend on the application.
Sounds like a perfect fit for your JVM!
FCACHE does exactly that.
I think you may wish to re-read my postings on FCACHE in my original thread :)
In my original kernel, FCACHE loads code into the $080-$0FF range.
Code that is loaded there runs at full speed, and can use regular jumps, djnz etc within that range.
The intention was, for example, that str*() functions would be FCALL'ed
LMM code would do the initialization (as it is faster than loading the FCACHE'd block)
but the actual working loop would be FCACHE'd - so all the body of the str*() functions, the loops, would run at full speed!
Same for memcpy() and friends.
Heck, FltDIV / FltMul loops would also approach raw pasm speeds!
A few posts above I showed some code that should save in two instructions, and restore in only one, and requires one flag register per thread. I am pretty sure it is exactly the same (or functionally the same) as the code posted in the original threads that talked about saving/restoring flags... which I could not find!
I'd love to see a way to save both C and Z in one instruction... but I just don't see how that could be done.
(Having fun)
Bill
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔