Board design and performance
macca
Hello All,
I would like to design a board that could use GCC's XMM modes in order to allow more than 32k of code and data.
I'm mostly concerned about the performance of this solution, and it seems I can't find many details about the impact of the various designs.
What I would like to do is:
- Be able to write programs with more than 32k of code and data; these programs may reside on an EEPROM or an SD card (not yet decided which is best)
- Code should run from the external memory, but some portions must run from hub memory for maximum performance; if I understand correctly, this is already possible using attributes like HUBTEXT
- Data should reside in hub RAM for maximum performance, but with the ability to swap (or overwrite) variables with data stored in the external memory. Say, for example, the program needs to work on different data arrays: it moves the first array into hub memory, works with it, then reads another array from the external memory, works on that, and so on. It doesn't necessarily need to save the data back to external memory; I think these will be mostly read-only data.
What kind of design do you suggest?
Is there some performance data I could use to compare the impact of each design? For example, the time needed to run a chunk of code from hub RAM vs. external SPI RAM vs. EEPROM, etc.
Thanks for your help.
Regards,
Marco.
Comments
However, if you go with SPI or SQI flash, keep in mind that it only gives you more code space. Your data and stack will still have to fit in hub memory, reduced by the size of the cache. We usually configure for an 8k cache, although smaller cache sizes can be used.
Using XMM-SPLIT or XMM-SINGLE slows things significantly. If you only need to read tables or other data from external memory, they can be forced into the code section the same way we add PASM driver code or by other means.
Still, I don't understand the performance impact. Compared to the LMM model, how does XMM with, say, a quad-SPI RAM affect the speed? Half? 1/10th? 1/100th?
Steve has posted some good comparisons before. Maybe he'll post a link to them. Unfortunately, the performance you actually get will depend a lot on your code as it does on any cached system. There are actually two levels of cache involved with an XMM solution. The code is cached in hub memory and GCC will cache code in COG memory if it's small enough to fit into its fcache buffer. This means that you can get performance that is nearly as good as LMM or even CMM if you have small loops that fit in fcache. On the other hand, you can get horrible performance if your code branches a lot over a wide range of addresses. It's hard to predict the performance of a program without trying it. Sorry, I guess that isn't a very helpful answer.
So basically, with a cache of 8k, which seems to be the default setting, if the code fits within that range I could expect performance comparable to LMM, right? Code can be organized so functions that call each other frequently are placed near each other. This could work.
How does the cache read other code chunks? Does it always read the whole 8k, or smaller segments as needed?
XMMC can be about the same speed as LMM with fcached code.
XMMC (single-bit SPI flash) can be up to 10 times slower than LMM, depending on locality and size.
Dual quad-SPI flash (10-pin interface) is at least 30% faster than regular SPI flash in most cases.
HUBTEXT tagged XMMC functions are the same speed as LMM of course since they are really LMM.
Optimized LMM is at least 10x faster than SPIN according to Eric.
I imagine, although I don't really know, that dual SQI would be faster with larger cache sizes...
Also, for xmm-split mode, is the cache still 8k? Or, do you make it bigger then?
Many thanks to you all for the information.