Using threads for assembly code in a Cog

Kaio · 2007-03-02 22:54

Hi all,

I have developed a multithreading kernel that lets you run several threads of assembly routines concurrently in the same Cog. The code is well documented, and a usage demo is included in the attached file.

Thomas

Jasper_M · 2007-03-03 12:07

I've been working on a similar project. Now I'm busy with other things, but I'll do a total rewrite of it when I have free time again.
Your code looks very clean compared to mine... But here are some ideas (from my project), even though they probably don't serve the purpose of your code:

- Loading threads from HUB memory. It allows larger threads, but decreases performance a lot. Also self-modifying code might become a problem. It would also require some functions in the runtime for farjump, farcall etc.

- Not executing a single instruction at a time, but loading the code in pages from hub RAM. For example, my code uses a page size of 128 longs. The code then calls a runtime function from time to time, and that function decides whether to dump the context to shared RAM and load the next thread.

- External "supervisor" program running on a separate cog and checking whether the runtime on the "thread cog" is alive. If it has not updated its status for some time (i.e. a thread has deadlocked), the supervisor restarts the cog without the thread that caused the lock. (Not really useful for embedded applications, but great for my purposes.) This is of course not useful at all with your current code, since the virtual processor approach efficiently prevents this (with the exception of wait instructions).

EDIT: One more idea - as the wait instructions are problematic, how about implementing wait states for threads? That way the virtual processor wouldn't even begin running a thread's code if the thread is in a wait state and the wait state's condition isn't met.

Post Edited (Jasper_M) : 3/3/2007 12:19:49 PM GMT

Kaio · 2007-03-03 22:47

Jasper_M said...

- Loading threads from HUB memory. It allows larger threads, but decreases performance a lot. Also self-modifying code might become a problem. It would also require some functions in the runtime for farjump, farcall etc.

- Not executing a single instruction at a time, but loading the code in pages from hub RAM. For example, my code uses a page size of 128 longs. The code then calls a runtime function from time to time, and that function decides whether to dump the context to shared RAM and load the next thread.

You can find all of this in Bill Henning's concept; please refer to http://forums.parallax.com/showthread.php?p=615022.

Jasper_M said...

- External "supervisor" program running on a separate cog and checking whether the runtime on the "thread cog" is alive. If it has not updated its status for some time (i.e. a thread has deadlocked), the supervisor restarts the cog without the thread that caused the lock.

You can do this by having the supervisor listen on a pin for a level change within a defined time. In the thread Cog you must toggle the pin in the main loop. The supervisor cannot use waitpeq and waitpne for this, because it would sleep forever if a deadlock occurred. So it should use a timeout loop in which it checks for the level change on the pin.

Jasper_M said...

This is of course not useful at all with your current code, since the virtual processor approach efficiently prevents this (with the exception of wait instructions).

You are right about the exception, but I have warned about it in the documentation at the beginning of the Spin file, so the developer must take care.

Somebody said...

One more idea - as the wait instructions are problematic, how about implementing wait states for threads? That way the virtual processor wouldn't even begin running a thread's code if the thread is in a wait state and the wait state's condition isn't met.

Yes, I have already implemented this. For the waitcnt instruction you can use the equivalent function vWaitCnt in the kernel of my code. I have also used it in the demo code in threads 4, 6, 7 and 8. Currently the wait time is not exact, but I'm working on this.
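To make the wait-state idea concrete, here is a minimal sketch in Python rather than Propeller assembly. It is not the code from the attached kernel: the `Thread` class, the `run` function and the one-tick-per-slot cost model are all assumptions made for illustration. A thread that issues a wait is skipped by the scheduler until the counter has reached its wake-up value, so it resumes no earlier than requested, but possibly later.

```python
# Hypothetical sketch of wait states in a round-robin virtual processor.
# None of these names come from the attached kernel.

class Thread:
    def __init__(self, name, program):
        self.name = name
        self.program = program      # list of ("work", label) or ("wait", ticks)
        self.pc = 0                 # index of the next instruction
        self.wake_at = None         # counter value that ends the wait state

    def done(self):
        return self.pc >= len(self.program)


def run(threads, max_ticks=1000):
    """Give each thread one instruction per round; skip waiting threads."""
    cnt = 0                         # stands in for the system counter
    log = []                        # (counter, thread name, label) per "work"
    while cnt < max_ticks and not all(t.done() for t in threads):
        for t in threads:
            if t.done():
                continue
            if t.wake_at is not None:
                if cnt < t.wake_at:
                    cnt += 1        # assumed cost of checking the condition
                    continue        # condition not met: do not run the thread
                t.wake_at = None    # condition met: leave the wait state
            op, arg = t.program[t.pc]
            t.pc += 1
            if op == "wait":
                t.wake_at = cnt + arg
            else:
                log.append((cnt, t.name, arg))
            cnt += 1
    return log
```

Running two threads where one waits for 3 ticks shows the other thread continuing to make progress in the meantime, and the waiting thread resuming at or after its wake-up value.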

Jasper_M · 2007-03-04 10:17

Kaio said...

Jasper_M said...

- Loading threads from HUB memory. It allows larger threads, but decreases performance a lot. Also self-modifying code might become a problem. It would also require some functions in the runtime for farjump, farcall etc.

- Not executing a single instruction at a time, but loading the code in pages from hub RAM. For example, my code uses a page size of 128 longs. The code then calls a runtime function from time to time, and that function decides whether to dump the context to shared RAM and load the next thread.

You can find all of this in Bill Henning's concept; please refer to http://forums.parallax.com/showthread.php?p=615022.

Yes, I have read that thread. Your code is awesome for someone who wants to fit some relatively small drivers (A/D, LCD, sensors) on the same cog. It isn't enough for my purposes though (for example, I have a filesystem driver and an EEPROM driver on the same cog and they don't fit in 384 longs).

By the way, what are you going to use your code for? (just curious ;))

Kaio said...

Jasper_M said...

- External "supervisor" program running on a separate cog and checking whether the runtime on the "thread cog" is alive. If it has not updated its status for some time (i.e. a thread has deadlocked), the supervisor restarts the cog without the thread that caused the lock.

You can do this by having the supervisor listen on a pin for a level change within a defined time. In the thread Cog you must toggle the pin in the main loop. The supervisor cannot use waitpeq and waitpne for this, because it would sleep forever if a deadlock occurred. So it should use a timeout loop in which it checks for the level change on the pin.

Actually I mean changing a value in hub RAM. There'd be one long that would have the index of the current thread and another with the number of threads running. If number of threads > 1, the supervisor would keep reading the thread index (at the rate of about 1/2 of the timeslice) and if it hasn't changed for too long a time, stop the cog and restart it. But still this isn't useful with the virtual processor approach and with the assumption that the code works (which is quite often the case in embedded programming).
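The check described above can be sketched roughly as follows, in Python rather than Propeller assembly. The function name `supervise`, its parameters and the idea of passing the reads in as a list are all made up for illustration; a real supervisor would read the two longs from hub RAM in a timed loop and then stop and restart the cog itself.

```python
# Hypothetical sketch of the hub-RAM supervisor check (names are invented).

def supervise(samples, thread_count, max_unchanged):
    """Return the index of a thread that appears stuck, or None.

    samples        -- successive reads of the "current thread index" long
    thread_count   -- the "number of threads running" long
    max_unchanged  -- how many consecutive identical reads are tolerated
    """
    if thread_count <= 1:
        return None                 # with one thread the index never changes
    last = None
    unchanged = 0
    for index in samples:
        if index == last:
            unchanged += 1
            if unchanged > max_unchanged:
                return index        # stuck: restart the cog without this thread
        else:
            last = index
            unchanged = 0
    return None
```

Sampling at about half the timeslice means a healthy runtime shows each index roughly twice in a row, so `max_unchanged` has to be chosen comfortably above that to avoid false alarms.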

Kaio said...

Somebody said...

One more idea - as the wait instructions are problematic, how about implementing wait states for threads? That way the virtual processor wouldn't even begin running a thread's code if the thread is in a wait state and the wait state's condition isn't met.

Yes, I have already implemented this. For the waitcnt instruction you can use the equivalent function vWaitCnt in the kernel of my code. I have also used it in the demo code in threads 4, 6, 7 and 8. Currently the wait time is not exact, but I'm working on this.

In a multitasking environment it shouldn't matter if the timing is not exact (exact timing would be nearly impossible to achieve), as long as the time waited is at least the time specified (it would be bad for reliability if you drove the EEPROM or some other serial device too fast).

Kaio · 2007-03-04 21:11

Jasper_M said...

It isn't enough for my purposes though (for example, I have a filesystem driver and an EEPROM driver on the same cog and they don't fit in 384 longs).

You can comment out the code of _v_jmpret and _v_wait. This would free 21 longs. With some modifications you can free a further 5 longs, which are only used in connection with vWaitCnt. If this would help, I could make a minimized version for you.

Jasper_M said...

By the way, what are you going to use your code for? (just curious ;))

Currently I don't have any requirements for it. But I have read some threads from people who are using the Propeller to control stepper motors, e.g. on their robots. They have used Spin routines, but they need more speed, and such a routine is very small in assembly. I think that one Cog could control up to 8 stepper motors.

Jasper_M said...

In a multitasking environment it shouldn't matter if the timing is not exact (exact timing would be nearly impossible to achieve), as long as the time waited is at least the time specified (it would be bad for reliability if you drove the EEPROM or some other serial device too fast).

Yes, you are right. This is also the current behaviour of my code. But then I should name this function sleep instead of waitcnt, because the name suggests that it waits for exactly the defined time. That is true for the assembly instruction but not possible in a multitasking environment.

Jasper_M · 2007-03-04 21:49

No thanks, I don't need a minimized version as I'm still working on my own code with totally different purposes. Stepper motor control sounds like a good application, I'd not thought before about running multiple threads with the same code (you meant that, right? One thread per motor?). I also assume you're talking about micro-stepping... Maybe servo-control is a similar application, I think I saw some post about using one cog for each servo...

Kaio · 2007-03-05 16:41

Jasper_M said...

Stepper motor control sounds like a good application, I'd not thought before about running multiple threads with the same code (you meant that, right? One thread per motor?)

Yes, I think it is possible. I would try to implement the algorithm from BTX in assembly.
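As a rough illustration of "one thread per motor, all running the same code", here is a small Python sketch using generators as cooperative threads. It is not BTX's algorithm and not code from the attached kernel: the wave-drive coil sequence, `motor_thread` and `run_round_robin` are assumptions chosen only to show several instances of one routine being interleaved.

```python
from itertools import cycle

# Wave-drive sequence (one coil energised at a time) for a 4-coil stepper.
# This is an assumed drive scheme for illustration, not BTX's algorithm.
WAVE_DRIVE = (0b0001, 0b0010, 0b0100, 0b1000)


def motor_thread(steps):
    """One thread: yields the next coil pattern each time it is scheduled."""
    patterns = cycle(WAVE_DRIVE)
    for _ in range(steps):
        yield next(patterns)


def run_round_robin(threads):
    """Run every thread in turn until all have finished; collect outputs."""
    outputs = {name: [] for name in threads}
    active = dict(threads)
    while active:
        for name in list(active):
            try:
                outputs[name].append(next(active[name]))
            except StopIteration:
                del active[name]    # this motor has finished its move
    return outputs
```

Each motor gets its own generator and therefore its own position in the sequence, while the routine itself exists only once. For example, `run_round_robin({"m0": motor_thread(5), "m1": motor_thread(2)})` interleaves the two motors until both moves are complete.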
