Quick-n-dirty float32 hack for sharing it between multiple objects
M. K. Borri
This is a quick and dirty hack that lets multiple objects share a "math coprocessor" cog. It's the best I could think of; please tell me if it's been done before, as it's probably been done better :)
Very simple -- basically it starts a cog (waiting if there's none available), does the operation, then stops it.
Post Edited (M. K. Borri) : 10/20/2006 4:35:35 PM GMT
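The start/operate/stop pattern described in the post can be modelled roughly as follows. This is only a Python sketch, not the posted Spin object: `free_cogs`, `to_float32` and `shared_fadd` are hypothetical names, a semaphore stands in for the pool of eight cogs, and the cost of loading the cog is ignored.

```python
import struct
import threading

NUM_COGS = 8  # the Propeller has eight cogs
# Hypothetical stand-in for the cog pool: one permit per free cog.
free_cogs = threading.BoundedSemaphore(NUM_COGS)

def to_float32(x):
    """Round a Python float to single precision, as a float32 routine would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def shared_fadd(a, b):
    """Start a 'cog' (blocking until one is free), do one add, then stop it."""
    free_cogs.acquire()              # wait here if no cog is available
    try:
        return to_float32(to_float32(a) + to_float32(b))
    finally:
        free_cogs.release()          # stop the cog so other objects can use it

result = shared_fadd(1.5, 2.25)
print(result)
```

Because the cog is claimed and released around every call, any number of objects can call `shared_fadd` without coordinating with each other; the price is paying the cog start-up cost on each operation.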
Comments
Post Edited (M. K. Borri) : 10/20/2006 4:55:47 PM GMT
512 longs to be read * 16 clock cycles per hub access * 12.5 ns per clock = 102.4 us.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
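The load-time estimate above can be checked with a few lines of Python. The 80 MHz clock is an assumption inferred from the 12.5 ns per clock figure; the constant and function names are made up for illustration.

```python
CLOCK_HZ = 80_000_000        # assumed: 80 MHz, i.e. 12.5 ns per clock
LONGS_PER_COG = 512          # longs to be read, as stated in the post
CLOCKS_PER_HUB_ACCESS = 16   # clock cycles per hub access, as stated in the post

def cog_load_time_us(longs=LONGS_PER_COG, clock_hz=CLOCK_HZ):
    """Time to copy `longs` longs from hub RAM into a cog, in microseconds."""
    clocks = longs * CLOCKS_PER_HUB_ACCESS
    return clocks * 1_000_000 / clock_hz

load_us = cog_load_time_us()
print(f"{load_us:.1f} us")   # prints "102.4 us"
```

At a slower clock the same 8192 cycles take proportionally longer, so the overhead per operation grows accordingly.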
Since it's just 1/10th of a millisecond more, I'm keeping this downloadable: it's still faster than doing everything in Spin, some people may have a use for it, and I'm going to use it until I do something better optimized :)
It's still mostly worth it if you're not using the cog anyway...
What I'm thinking is: instead of messing with locks (the rest of the program may be using them), just upload the particular function I need into a cog. Would that be cheaper, or does the hub have to send over the full 512 longs anyway?
Post Edited (M. K. Borri) : 10/20/2006 8:40:47 PM GMT
Reserve an area of shared RAM as a message queue. Also reserve a 'head' and 'tail' pointer.
Any Cog that needs floating-point will:
- Reserve an additional chunk of shared RAM (say 1 word) for the result. (Note that modifying the queue from more than one cog may require a lock, unless you're very careful.)
- Wait until there is room in the queue.
- Write a message (format of your choosing) into the queue indicating the operation to perform, operands (immediate or address), and a pointer to that cog's result word.
- Wait until the FPU Cog notifies you that the operation is done (either by setting yet another word, incrementing the 'tail' pointer, or by simply changing the result word).
The FPU Cog would:
- Wait until there are messages in the queue.
- Process.
- Repeat.
This may be overly complicated, depending on how much floating point you're doing. The main advantage is that floating point operations can be made asynchronous -- rather than waiting for the lock, you can issue the operation when you have the operands, and block on the result only when you really need it. It also scales better: if throughput becomes an issue, you can dedicate additional Cogs to floating point, reading from the same message queue.
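The queue scheme described above might look something like this when modelled in Python, with threads standing in for cogs. Everything here is an assumption made for illustration: the names `FpuQueue`, `submit`, `serve` and `wait_for`, the four-slot queue, the `(op, a, b, result)` message format, and the use of Python floats (doubles) rather than float32. Completion is signalled by changing the result word, one of the options listed above.

```python
import operator
import threading
import time

QUEUE_SLOTS = 4  # hypothetical queue size
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}

class FpuQueue:
    def __init__(self):
        self.slots = [None] * QUEUE_SLOTS  # the reserved message area
        self.head = 0                      # count of messages written by clients
        self.tail = 0                      # count of messages consumed by the FPU
        self.lock = threading.Lock()       # guards head against concurrent clients
        self.running = True

    def submit(self, op, a, b):
        """Client side: queue a request and return its result slot."""
        result = [None]                    # this client's private result "word"
        while True:
            with self.lock:
                if self.head - self.tail < QUEUE_SLOTS:  # room in the queue?
                    self.slots[self.head % QUEUE_SLOTS] = (op, a, b, result)
                    self.head += 1         # publish only after the slot is filled
                    return result
            time.sleep(0)                  # queue full: yield and retry

    def serve(self):
        """FPU side: wait for messages, process, repeat."""
        while self.running or self.tail != self.head:
            if self.tail == self.head:     # queue empty
                time.sleep(0)
                continue
            op, a, b, result = self.slots[self.tail % QUEUE_SLOTS]
            result[0] = OPS[op](a, b)      # changing the result word signals completion
            self.tail += 1                 # frees the slot for the next message

def wait_for(result):
    """Block only at the point where the value is really needed."""
    while result[0] is None:
        time.sleep(0)
    return result[0]

fpu = FpuQueue()
worker = threading.Thread(target=fpu.serve)
worker.start()
pending = [fpu.submit("mul", float(i), 0.5) for i in range(10)]  # issue early
answers = [wait_for(r) for r in pending]                         # collect late
fpu.running = False
worker.join()
print(answers)
```

This sketch has a single consumer, so only `head` needs a lock. With more than one FPU cog reading the same queue, taking a message and advancing `tail` would need a lock as well.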
Having now reread something I hadn't looked at for more than six months, it's apparent that I could have explained things better. I'll try to answer any questions that come up as best I can.
A question came up in that thread about whether this object could be used for inter-Propeller communication. I believed it might be helpful and answered thus:
Another question came up regarding the advantages/disadvantages of this method, as opposed to just using global variables to manage the calling and to pass the data. Here, in part, is my answer:
Cheers!
Phil