If marking a function with HUBTEXT makes it about as fast as LMM mode, that's absolutely fine for me. I never would have thought that this would work.
That's a good idea :-) I think it's quite powerful: you can have a function in HUB while other parts (for example, another function it calls that isn't tagged HUBTEXT) stay in XMM memory.
The parts that need to be fast go in HUB, and the parts that don't need to run fast can live in XMM memory.
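A minimal sketch of that split, assuming PropGCC's propeller.h supplies the HUBTEXT attribute macro discussed in this thread. The function names and the fill-and-sum task are made up for illustration; outside an XMM build (including on a PC) the fallback defines HUBTEXT as empty so the file still compiles.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__PROPELLER_USE_XMM__)
#include <propeller.h>   /* assumed to define HUBTEXT in PropGCC */
#endif
#ifndef HUBTEXT
#define HUBTEXT          /* fallback: no placement attribute (LMM build or host) */
#endif

/* Not tagged: in an XMM build this stays in external memory and runs through the cache. */
uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Tagged HUBTEXT: placed in hub memory so the tight fill loop avoids cache misses. */
HUBTEXT uint32_t fill_and_sum(uint8_t *buf, size_t len, uint8_t value)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = value;
    /* Calling an untagged function is fine; that call simply runs from XMM again. */
    return sum_bytes(buf, len);
}
```

Only the hot loop pays for hub space; the helper it calls can stay in external memory, which is exactly the trade-off described above.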
That's the idea. We also put things in hub memory that must not be interrupted by a cache miss. By the way, each XMM COG requires its own cache, and I believe it will currently be the same size as the cache in the original XMM COG. So if you think you're going to use multiple XMM COGs, you probably want to reduce the cache size from 8k, even if those COGs are going to run HUBTEXT functions.
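A back-of-the-envelope check of why the cache size matters, assuming, as described above, that every XMM COG gets its own cache of the configured size, and using the Propeller 1's 32 KB of hub RAM. The helpers are hypothetical and ignore everything else that lives in hub (program data, stacks, cache tags).

```c
#include <stddef.h>

#define HUB_RAM_BYTES 32768u   /* Propeller 1 hub RAM */

/* Hub bytes consumed when every XMM COG has its own cache of cache_bytes. */
size_t cache_hub_bytes(unsigned xmm_cogs, size_t cache_bytes)
{
    return (size_t)xmm_cogs * cache_bytes;
}

/* Hub bytes left for everything else (negative means the caches alone do not fit). */
long hub_bytes_left(unsigned xmm_cogs, size_t cache_bytes)
{
    return (long)HUB_RAM_BYTES - (long)cache_hub_bytes(xmm_cogs, cache_bytes);
}
```

With the default 8 KB cache, four XMM COGs would use the entire hub (hub_bytes_left(4, 8192) is 0), while 2 KB caches for the same four COGs leave 24 KB.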
I think I will try a 4k or 2k cache size. I may come back with a question about the cache size settings in the config file...
By the way: how does the release branch handle the cognew function when running in XMM mode? All the code is in external memory, and the release branch only supports one XMM cog.
So when running a program in XMM mode, does the release branch support only one running cog (apart from .cogc programs)?
As you can probably guess, cognew only works if the code you're loading is in hub memory. There is a set of macros that I defined to load drivers that might reside in external memory. I'll try to dig out a description of them.
Thanks David! It would also be good to know which macro starts a new cog with code in XMM memory (default branch).
The macros are at the end of propeller.h. Basically, they are use_cog_driver and load_cog_driver. They each take the root name of the file containing the driver as a parameter. For example, if the driver is compiled into a file called mydriver.dat, the symbol to use will be "mydriver". I'll try to put together a better description later. If you want to see an example of this in use, look at demos/ebasic3 in the file db_vmint.c.
#if defined(__PROPELLER_USE_XMM__)
/**
 * @brief Make the load symbols available for a driver
 * @param id The COG driver name
 */
#define use_cog_driver(id) extern uint32_t binary_##id##_dat_start[], binary_##id##_dat_end[]
/**
 * @brief Get a hub memory buffer containing a driver image
 * @param id The COG driver name
 */
#define get_cog_driver(id) \
    get_cog_driver_xmm( \
        binary_##id##_dat_start, \
        binary_##id##_dat_end - binary_##id##_dat_start)
/**
 * @brief Load a COG driver
 * @param id The COG driver name
 * @param param Parameter to pass to the driver
 * @returns the id of the COG that was loaded
 */
#define load_cog_driver(id, param) \
    load_cog_driver_xmm( \
        binary_##id##_dat_start, \
        binary_##id##_dat_end - binary_##id##_dat_start, \
        (uint32_t *)(param))
uint32_t *get_cog_driver_xmm(uint32_t *code, uint32_t codelen);
int load_cog_driver_xmm(uint32_t *code, uint32_t codelen, uint32_t *params);
#else
/**
 * @brief Make the load symbols available for a driver
 * @param id The COG driver name
 */
#define use_cog_driver(id) extern uint32_t binary_##id##_dat_start[]
/**
 * @brief Get a hub memory buffer containing a driver image
 * @param id The COG driver name
 */
#define get_cog_driver(id) (binary_##id##_dat_start)
/**
 * @brief Load a COG driver
 * @param id The COG driver name
 * @param param Parameter to pass to the driver
 * @returns the id of the COG that was loaded
 */
#define load_cog_driver(id, param) cognew(binary_##id##_dat_start, (uint32_t *)(param))
#endif
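To make the naming convention concrete, this host-side sketch mimics what the token pasting in these macros does for a driver called mydriver. On real hardware the binary_mydriver_dat_start/_end symbols are presumably generated when mydriver.dat is turned into an object file (objcopy's binary input mode creates symbols of this form); here they are faked with an ordinary array, and driver_start/driver_longs are hypothetical helpers. In an actual PropGCC program you would write use_cog_driver(mydriver); at file scope and then call load_cog_driver(mydriver, &params) with your own parameter block.

```c
#include <stdint.h>

/* Stand-in for the driver image; the contents are placeholders, not real PASM. */
static uint32_t mydriver_image[] = { 0x11111111u, 0x22222222u, 0x33333333u };

/* Fake the start/end symbols for the host build. */
#define binary_mydriver_dat_start (mydriver_image)
#define binary_mydriver_dat_end \
    (mydriver_image + sizeof mydriver_image / sizeof mydriver_image[0])

/* Same pasting as load_cog_driver: "mydriver" becomes binary_mydriver_dat_start/_end,
   which are then expanded again by the preprocessor. */
#define driver_start(id) (binary_##id##_dat_start)
#define driver_longs(id) ((uint32_t)(binary_##id##_dat_end - binary_##id##_dat_start))
```

Note that the length expression is a difference between two uint32_t pointers, so it counts 32-bit longs, not bytes.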
So "driver" doesn't only mean a cogc driver that has to fit in COG RAM; it can also be a bigger program in external memory if the project is compiled in XMM mode?
Do I understand correctly that the "driver" loaded with load_cog_driver has to be a separate program with its own main function, like cogc programs? So starting a new cog whose code is in XMM can't be done like cognew, where you just hand a pointer to the function in XMM to a cognewXMM function?
In your ebasic code, the program you start is a Spin program that is converted by spin2cpp with the --dat option. If I don't use Spin code, how can I create such a .dat file from C source code?
Have I understood the load_cog_driver approach correctly? It looks quite complicated ;-)
I thought I could just start a new cog with a pointer to a function in XMM, and this new code could simply access the global variables of the code running in COG0, just like using cognew in LMM mode. :-)
No, it is a COGC or PASM driver that fits entirely within a COG. But the code can be in external memory initially and be copied temporarily to a hub memory buffer before calling cognew to load it into a COG. Putting it in external memory saves hub memory if there are a number of drivers to load: only a single hub buffer is required, which should take less space than keeping several COG images in hub memory. If your program only uses a single COG driver, it won't help much, if at all.
The program I load in ebasic3 is a PASM program contained in a .spin file. It has no Spin code in it and I use spin2cpp just to extract the binary COG image into a .dat file.
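The hub-saving idea can be sketched as follows: keep every driver image in external memory and copy whichever one you need into a single shared hub buffer just before starting it. This is a host-side model, not PropGCC's load_cog_driver_xmm; hub_staging, stage_and_start, the start_cog_fn callback (standing in for cognew) and fake_cognew are all hypothetical. The 496-long limit is the part of COG RAM available for code on the Propeller 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COG_CODE_LONGS 496   /* COG RAM longs available for code ($000-$1EF) */

/* One hub buffer shared by all drivers, instead of one hub copy per driver image. */
static uint32_t hub_staging[COG_CODE_LONGS];

/* Stand-in for cognew(code, par): returns a COG id, or -1 on failure. */
typedef int (*start_cog_fn)(const uint32_t *hub_code, void *param);

/* Copy an image (which may live in external memory) into hub, then start it. */
int stage_and_start(const uint32_t *image, size_t longs, void *param, start_cog_fn start_cog)
{
    if (image == NULL || longs == 0 || longs > COG_CODE_LONGS)
        return -1;                                   /* would not fit in a COG */
    memcpy(hub_staging, image, longs * sizeof image[0]);
    return start_cog(hub_staging, param);
}

/* Host stand-in for cognew: pretends the driver started in COG 1. */
int fake_cognew(const uint32_t *hub_code, void *param)
{
    (void)param;
    return hub_code == hub_staging ? 1 : -1;
}
```

On real hardware the staging buffer must not be reused for the next driver until the new COG has finished copying it into COG RAM, which takes roughly 496 × 16 clock cycles on a Propeller 1.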
Ah, I see. So load_cog_driver saves hub space because the COG driver can be in XMM; that's good to know. I will have some COG drivers that then won't use hub space (apart from the buffer).
But how do I start a new cog that executes code from XMM memory just like COG 0? Have I completely misunderstood the multi-cog XMM mode?
I thought it was possible to have a very big program in XMM memory and have multiple cogs execute parts of it, just like in LMM mode, where the code is in hub and cognew starts a cog with a pointer to code in hub.
So that isn't possible with multiple XMM cogs? I always thought it was.
I think that's done with cogstart, although Eric wrote that code, so I'm not sure.
$ less cogstart.c
#include <propeller.h>
#include <sys/thread.h>
#include <errno.h>
/*
 * start C code running in another cog
 * returns -1 on failure, otherwise the
 * id of the new cog
 * "func" is the function to start running
 * "arg" is the argument
 * "stack" is the base of the new process' stack
 * "stack_size" is the size of the stack area in bytes
 * NOTE: this is a raw low-level function; the
 * pthreads functions may be more useful
 */
#if defined(__PROPELLER_USE_XMM__)
#define EXTRA_STACK_SIZE (1024+128+32) /* space for cache lines and tags */
#else
#define EXTRA_STACK_SIZE 16
#endif
int cogstart(void (*func)(void *), void *arg, void *stack, size_t stack_size)
{
    _thread_state_t *tls;
    unsigned int *sp;
    /* check the stack size */
    if (stack_size < sizeof(_thread_state_t) + EXTRA_STACK_SIZE) {
        errno = EINVAL;
        return -1;
    }
    /* put the thread local storage structure onto the stack */
    tls = (_thread_state_t *)((char *)stack + stack_size - sizeof(_thread_state_t));
    sp = (unsigned int *)tls;
    return _start_cog_thread(sp, func, arg, tls);
}
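To size the stack you hand to cogstart, it can help to mirror its check. This sketch copies the EXTRA_STACK_SIZE values from cogstart.c above; min_cogstart_stack and cogstart_would_reject are hypothetical helpers, and tls_size stands in for sizeof(_thread_state_t), which is only available with PropGCC's headers.

```c
#include <stddef.h>

/* Same reserve as cogstart.c: XMM COGs need room for cache lines and tags. */
#if defined(__PROPELLER_USE_XMM__)
#define COGSTART_EXTRA (1024 + 128 + 32)
#else
#define COGSTART_EXTRA 16
#endif

/* Smallest stack_size cogstart accepts, plus the bytes the thread itself needs. */
size_t min_cogstart_stack(size_t tls_size, size_t user_bytes)
{
    return tls_size + COGSTART_EXTRA + user_bytes;
}

/* Mirrors cogstart's rejection test (where it sets errno = EINVAL and returns -1). */
int cogstart_would_reject(size_t stack_size, size_t tls_size)
{
    return stack_size < tls_size + COGSTART_EXTRA;
}
```

In an XMM build the reserve alone is 1184 bytes per COG, on top of the thread state and whatever the function itself uses, so the 2000-byte margin in Christian's test program later in the thread clears it comfortably.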
Does the #if defined(__PROPELLER_USE_XMM__) check mean that cogstart can also be used in XMM mode? That would be exactly the feature I was looking for.
The documentation states that cogstart cannot be used in XMM mode, but the code has XMM-specific checks and a different stack size calculation in XMM mode.
Maybe the documentation is not up to date?
Yes, cogstart should work in XMM modes, but it will work best in xmmc mode. The other modes, where data can be in external memory, will be hard to manage: each XMM COG has its own cache, so shared variables won't work between XMM COGs because of cache coherency problems. We need to add a way to flush the cache to support shared data, but that isn't done yet. Also, some library functions like malloc use globals that would have to be protected from cache coherency problems as well for a general solution. Best to stick with xmmc if you're going to have multiple XMM COGs, unless they are relatively independent and don't use the standard C library functions that have globals.
As for the documentation: right, it's out of date. Updating it is one of the things that needs to be done before the default branch can be released.
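A toy model of the cache coherency problem described above. Each simulated COG has a private one-line write-back cache in front of a shared "external memory" array; the types and functions are hypothetical and far simpler than PropGCC's real cache driver, but they show the same symptom Christian reports for XMM-SPLIT: the writer's update sits in its own cache, so the other COG keeps reading 0 until both caches are flushed.

```c
#include <string.h>

#define LINE_WORDS 4                 /* toy cache line: 4 words */
#define MEM_WORDS  16                /* toy external RAM: 4 lines */

static int ext_mem[MEM_WORDS];       /* shared "external memory" */

/* One COG's private write-back cache holding a single line. */
typedef struct {
    int      valid, dirty;
    unsigned line;                   /* which line is cached */
    int      data[LINE_WORDS];
} cog_cache;

/* Copy a modified line back to external memory. */
static void write_back(cog_cache *c)
{
    if (c->valid && c->dirty)
        memcpy(&ext_mem[c->line * LINE_WORDS], c->data, sizeof c->data);
    c->dirty = 0;
}

/* Make sure the line holding addr is cached; a hit never touches ext_mem. */
static void cache_fill(cog_cache *c, unsigned addr)
{
    unsigned line = addr / LINE_WORDS;
    if (c->valid && c->line == line)
        return;
    write_back(c);                   /* evict whatever was cached before */
    memcpy(c->data, &ext_mem[line * LINE_WORDS], sizeof c->data);
    c->valid = 1;
    c->line = line;
}

int cog_read(cog_cache *c, unsigned addr)
{
    cache_fill(c, addr);
    return c->data[addr % LINE_WORDS];
}

void cog_write(cog_cache *c, unsigned addr, int value)
{
    cache_fill(c, addr);
    c->data[addr % LINE_WORDS] = value;
    c->dirty = 1;                    /* visible to other COGs only after write-back */
}

/* Write back and invalidate: the kind of flush the thread says is still missing. */
void cog_flush(cog_cache *c)
{
    write_back(c);
    c->valid = 0;
}
```

In this model the writer's flush alone is not enough: the reader must also drop its stale copy before it sees 1000.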
#include <stdlib.h>
#include <stdio.h>
#include <propeller.h>
/**
 * This is the main XMMTest program file.
 */
extern _Driver _SimpleSerialDriver;
_Driver *_driverlist[] = {
    &_SimpleSerialDriver,
    NULL
};
static int changeMe = 0;
void runMeInXmmMode(void *arg)
{
    changeMe = 1000;
}
int main(void)
{
    waitcnt(80000000 + CNT);   // wait 1 s (80 MHz clock)
    printf("Main started in XMM mode...\n");
    printf("Start function in xmm mode...\n");
    // Some big stack size to exceed the minimum stack size defined in cogstart.c
    int stacksize = sizeof(_thread_state_t) + 2000;
    int *stack = (int *) malloc(stacksize);
    int cog;
    // start the cog (runMeInXmmMode ignores its argument, so pass NULL)
    cog = cogstart(runMeInXmmMode, NULL, stack, stacksize);
    printf("Started cog id %d\n", cog);
    waitcnt(160000000 + CNT);  // give the other cog 2 s to update changeMe
    printf("changeMe = %d\n", changeMe);
    return 0;
}
In XMMC mode it works well: the changeMe variable is 1000 after the 2-second wait.
In XMM-SPLIT mode the variable is still 0. Is this the cache coherency problem you mentioned? Is the value 0 still in the main XMM COG's cache even though runMeInXmmMode changed it?
So basically it wouldn't work to run XMM-SPLIT with multiple XMM cogs? My code will be quite independent, but I planned to have SD access using stdio.h in different XMM cogs. So do standard C functions like these cause problems in XMM-SPLIT because they have static variables, meaning only one cog should use them, and the other XMM cogs should run only my own code that has no global variables?
There will be a problem with any global variable that is accessed by multiple XMM COGs whether that variable is in your own code or from a library.
I have a question about starting a new cog in the default branch. When the sources are compiled for XMM, does the new cog run in XMM mode by default, or do I have to use another function for that?
Is there also a way to start a new cog in LMM mode?
New COGs will start in whatever mode the program was compiled for (so if it was compiled for xmm, then the new COG will also be xmm).
There's not really any way to mix LMM and XMM, since the libraries have to be built differently. As Steve pointed out it is possible to force a function to be in HUB memory, which would give you some of the benefits of LMM mode (but it's still really XMM mode, just running from HUB).
Regarding your simple test program, where changeMe ends up as 1000 in XMMC mode but stays 0 in XMM-SPLIT mode: yes, that's the cache coherency problem. Practically speaking, it's not possible to use multiple C COGs in XMM-SPLIT mode (or any XMM mode other than XMMC). I say "practically speaking" because in theory it will work if all shared variables are explicitly placed in HUB memory to avoid cache coherency problems. Actually getting this right will be difficult for any non-trivial program, because there are some variables in the libraries whose location the user can't change. If you avoid using library functions in the other COGs, it might work.
stdio will definitely not work in XMM-SPLIT mode when run on multiple COGs. If you really need XMM-SPLIT and multiple COGs then you'll have to restrict all the library calls to one COG (probably the first one started).
Fixing this is not easy at all -- it's not even really practical to disable the cache, because the cache is the only way to read data from external memory.
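One way to follow this advice is to let worker COGs post requests into a small mailbox and have the first COG, the only one allowed to call stdio, service them. This is a single-threaded host sketch; the mailbox layout and function names are hypothetical. In XMMC mode all data is in hub, so the pattern applies directly; in xmm-split it also depends on hub-resident shared data (for example via HUBDATA) behaving reliably across XMM COGs.

```c
#include <stdio.h>

#define MSG_LEN 32

/* One request slot per worker COG; "busy" is the handshake flag. */
typedef struct {
    volatile int  busy;             /* set by the worker, cleared by the stdio COG */
    volatile char text[MSG_LEN];
} log_mailbox;

/* Worker side: never calls stdio itself; returns 0 if the slot is still in use. */
int post_log(log_mailbox *mb, const char *msg)
{
    int i;
    if (mb->busy)
        return 0;
    for (i = 0; i < MSG_LEN - 1 && msg[i] != '\0'; i++)
        mb->text[i] = msg[i];
    mb->text[i] = '\0';
    mb->busy = 1;                   /* volatile stores keep program order: text first */
    return 1;
}

/* Stdio COG side: the only code that touches the C library's FILE state. */
int service_logs(log_mailbox *boxes, int count, FILE *out)
{
    char local[MSG_LEN];
    int handled = 0, i, j;
    for (i = 0; i < count; i++) {
        if (!boxes[i].busy)
            continue;
        for (j = 0; j < MSG_LEN - 1 && boxes[i].text[j] != '\0'; j++)
            local[j] = boxes[i].text[j];   /* copy out of the volatile slot */
        local[j] = '\0';
        fprintf(out, "mailbox %d: %s\n", i, local);
        boxes[i].busy = 0;                 /* hand the slot back to its worker */
        handled++;
    }
    return handled;
}
```

The first COG would call service_logs periodically in its main loop; workers only ever touch their own slot.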
I tried placing the shared variable in HUB memory by adding the HUBDATA annotation to the global variable definition:
HUBDATA static int changeMe = 0;
But this didn't solve the issue: in XMM-SPLIT the variable is still 0 after the second XMM cog has changed its value to 1000.
Is there anything else I have to consider?
It's absolutely fine for me to use the standard library functions in only one cog, with the other XMM cogs executing 100% my own user code.
The plan was to use some shared variables in HUB that can be accessed by all cogs, but my test shows that shared variables in HUB also cause problems.
Did I do something wrong?
I'm sorry, I didn't want to ask too many questions. I just saw the light at the end of the tunnel when I read this:
"...if all the variables that are shared are explicitly placed in HUB memory so as to avoid cache coherency problems then it will work"
I just wanted to share that I tried it and it doesn't work. So maybe it just doesn't work, or there is an issue where the cache gets touched, even for a variable in hub, when it shouldn't be.
That's all.
Again, thanks to you, Dave and Eric, who helped me a lot. Thanks to all the help, I've now made a big step forward!
Christian, I'm not trying to discourage you. Your questions are very valuable.
It's just that the default branch is not in a release state, and you have discovered a major limit of a new feature (which has been discussed before).
Parallax Education wanted multi-COG XMM so that LMM programs with multiple C function COG threads that outgrow the Propeller have a way to continue growing. This is possible with XMMC. We discussed deprecating xmm-single and xmm-split before, precisely because of what you are seeing.
So, my question is: do we deprecate xmm-single and xmm-split or do we go back to the single c function model? The answer is greatly dependent on what Parallax wants with input from users.
I think that if it isn't possible to share variables in HUB between XMM cogs, then xmm-single and xmm-split don't make sense with multiple XMM cogs. I would not completely deprecate or remove these features, though; maybe just allow only one XMM cog in -single and -split mode.
If it's just a bug that shared variables living in HUB can't be accessed by different XMM cogs, then xmm-single and xmm-split might be quite useful once it's fixed.
But then the documentation has to clearly state that the standard C libraries are not thread safe and can only be used in one cog, most likely the first one.
I will use XMMC for now and put my SRAM to sleep, because I don't need it ;-)
The variable changeMe probably shouldn't be "static", and should in fact be marked "volatile" since it can be changed by another COG. But doing that still doesn't help, which is very strange. Theoretically it should work, at least according to my understanding, but XMM-SPLIT is not a supported configuration and we haven't tested that mode much at all. Obviously there is some other issue that's preventing your program from working.
The toggle demo runs, but that's the only one I've actually tested in XMM mode.
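Eric's suggested declaration would look roughly like this: external linkage, volatile, placed in hub. The fallback definitions let it compile outside PropGCC, and wait_for_value is a hypothetical polling helper with a bounded number of reads so main can't hang forever. As reported above, even this form did not make the XMM-SPLIT test pass, so treat it as the intended declaration rather than a fix.

```c
#if defined(__PROPELLER_USE_XMM__)
#include <propeller.h>          /* assumed to provide HUBDATA */
#endif
#ifndef HUBDATA
#define HUBDATA                 /* fallback for non-XMM or host builds */
#endif

/* Shared with another COG: not static, forced into hub, and volatile so the
   compiler re-reads it from memory on every access. */
HUBDATA volatile int changeMe = 0;

/* Poll until changeMe == expected or max_polls reads have been made; 1 on success. */
int wait_for_value(int expected, long max_polls)
{
    long i;
    for (i = 0; i < max_polls; i++) {
        if (changeMe == expected)
            return 1;
    }
    return 0;
}
```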
But you can still use your SRAM for XMMC. I suppose you were planning to use it for data though.
It is also still possible to use a separate external SRAM for data given the right circumstances. What do you need it for exactly?
My plan was to have all the program data and objects in SRAM and the fast-access data in hub RAM. I have two big video buffers that need to be in hub for speed reasons. One is filled with content from the SD card while the other one is shifted out to the DMD by a cogc program, and vice versa. The XMM cog cache also takes up some hub, as do the WAV audio buffers and the planned shared variables.
So I feared that I might run out of hub if all the program data, objects, and variables also have to be in HUB.
I could use the SRAM for video buffering, but access may be too slow, because the data has to be shifted out quite fast to get a refresh rate of at least 100 Hz on the DMD.
I will continue with my program using XMMC, and I hope I won't find out that I've run out of HUB when I'm 90% finished :-)
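That worry can be checked early with a rough hub budget. All sizes in the usage below are hypothetical placeholders (a 128x32 DMD at 4 bits per pixel, a 2 KB XMM cache, two 1 KB WAV buffers); substitute the real ones. The helpers only add up sizes and compare them with the Propeller 1's 32 KB of hub RAM; they ignore stacks and library data, so the real headroom will be smaller.

```c
#include <stddef.h>

#define HUB_RAM_BYTES 32768u   /* Propeller 1 hub RAM */

typedef struct {
    const char *name;
    size_t      bytes;
} hub_item;

/* Bytes needed for one frame of a width x height display at bpp bits per pixel. */
size_t frame_bytes(unsigned width, unsigned height, unsigned bpp)
{
    return ((size_t)width * height * bpp + 7) / 8;
}

/* Sum the planned allocations; returns remaining hub bytes (negative if over budget). */
long hub_remaining(const hub_item *items, size_t count)
{
    size_t used = 0, i;
    for (i = 0; i < count; i++)
        used += items[i].bytes;
    return (long)HUB_RAM_BYTES - (long)used;
}
```

With those placeholder sizes, the two frame buffers, cache, and audio buffers take 8 KB, and refreshing a 2048-byte frame at 100 Hz means shifting about 200 KB per second to the display.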
I've been thinking about this and it's actually worse than you think. Even if your two XMM COGs share no variables at all they might still get in trouble if a variable from one COG is in the same cache line as a variable from another COG and both are updating their private variables at the same time. I'm beginning to think there is no practical use for xmm-single and xmm-split in multi-COG XMM mode. Maybe Jeff was right that we should remove them.
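The worse case, two COGs with no shared variables still corrupting each other, comes from whole-line write-back. In this toy model (hypothetical, simplified write-back caches over a shared array), COG A owns word 0 and COG B owns word 1, but both words sit in the same cache line, so whichever cache writes its line back last silently undoes the other's update.

```c
#include <string.h>

#define LINE_WORDS 4
static int ext_mem[LINE_WORDS];     /* one shared line of "external RAM" */

/* A COG's private cached copy of that line. */
typedef struct {
    int data[LINE_WORDS];
} line_copy;

/* Cache miss: the whole line is read into the COG's cache. */
void load_line(line_copy *c)
{
    memcpy(c->data, ext_mem, sizeof c->data);
}

/* Eviction or flush: the whole line is written back, including words this COG never touched. */
void write_back_line(const line_copy *c)
{
    memcpy(ext_mem, c->data, sizeof c->data);
}

/* Both COGs load the line, each updates only its own word, then both write back. */
void run_false_sharing(int a_value, int b_value)
{
    line_copy cog_a, cog_b;
    memset(ext_mem, 0, sizeof ext_mem);
    load_line(&cog_a);
    load_line(&cog_b);
    cog_a.data[0] = a_value;        /* COG A's private variable */
    cog_b.data[1] = b_value;        /* COG B's private variable */
    write_back_line(&cog_a);
    write_back_line(&cog_b);        /* restores word 0 from B's stale copy */
}

int read_word(unsigned i)
{
    return ext_mem[i];
}
```

After run_false_sharing(1000, 2000), word 1 holds 2000 but word 0 is back to 0. In this model, giving each COG's data its own cache line avoids the lost update, but that requires knowing the line size and controlling placement, which is part of why XMMC is the practical choice for multiple XMM COGs.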
Is there a way to completely disable the cache (for testing purposes)?
As far as I can tell, -mno-fcache doesn't disable the cache itself, only the fast cache (fcache)?
It's pretty clear by now that xmm-single and xmm-split from the default branch will not do what you want.