Now I think I'm confused about LARGE and SMALL stored in flash on the C3.
As I understand it, LARGE programs put their code and cnst segments in flash and init and data in SPI SRAM.
SMALL seems to put its code in flash but cnst, init, and data in hub memory.
Why doesn't SMALL also put cnst in flash? It seems like that would save some of the precious hub memory and also speed up program launching because cnst wouldn't have to be copied from flash to hub memory.
SMALL uses a different data addressing scheme for code and data - basically, all data addresses have to be in the hub so that they are accessible via RDLONG and WRLONG (or RDBYTE and WRBYTE etc) instructions. That's what makes SMALL programs faster than LARGE programs.
Ross.
What would it take to make a MEDIUM module where code and cnst are in flash and init and data are in hub memory?
I think I'm getting confused again. What exactly is the "prolog"? Does it include the 0x18 bytes starting at .binary file offset 0x8000? I had thought that the prolog was just the part that starts with two longs of zero and is followed by the jmp table.
Technically, I suppose you are correct - but when compiling XMM programs it always ends up at offset $18 in the file, so it is easier just to include those first $18 bytes as well (e.g. when transferring a program serially, this makes the "prologue" just the sector of the file that starts at $8000).
Here is yet another thing I don't understand. When I look at xbasic.lst I see this segment table:
0168(0054): ' segtable
0168(0054): f0 81 01 00 ' long @Catalina_Code
016c(0055): 74 01 00 00 ' long @Catalina_Cnst
0170(0056): 58 16 00 00 ' long @Catalina_Init
0174(0057): 70 19 00 00 ' long @Catalina_Data
0178(0058): 84 44 03 00 ' long @Catalina_Ends
017c(0059): b0 5b 00 00 ' long @Catalina_RO_Base
0180(005a): 74 01 00 00 ' long @Catalina_RW_Base
But later on in the file if I look for "Catalina_Code" I find this:
18200(607a): ' long ' align long
18200(607a): ' Catalina_Code
18200(607a): ' long ' align long
18200(607a): ' C__exit
18200(607a): 80 66 fc a0' mov r0,#$80
18204(607b): 00 66 7c 0c' clkset r0
This suggests that Catalina_Code is at address 0x18200 but the entry in segtable says it's at 0x181f0. Why the difference?
The Spin compiler always adds an offset of $10 to the binary output that I have to take into account. This crops up all over the place - it drives me crazy too!
Ross.
So I guess that means that I have to add 0x10 to the values in the segtable, right?
Seriously though, it would be a useful mode on boards that have flash. In the case of xbasic it would save over 5k of hub memory.
I tried it, but it means that instead of using a simple RDLONG and WRLONG, every data access has to be explicitly checked to see if the address is in hub or in Flash. Performing this check eliminates much of the benefit this mode might have over the current LARGE mode.
I agree it might help on a platform like the C3 where XMM access is so much slower than Hub access - but the cost (in my time) is simply not justifiable.
If you want to have a go at it yourself, you'll need a new code generator, a new kernel, a new set of libraries, a new set of target files, a new SD loader, a new serial loader, a new EEPROM loader, a new debugger mode ... and probably a few other things I've forgotten. But be my guest.
Ross.
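To make the cost of that per-access check concrete, here is a rough sketch in plain C (not kernel code). Everything in it is an assumption for illustration: the two arrays stand in for hub RAM and Flash, Flash is imagined to be mapped directly above the 32 KB hub, and the function names are made up. The only point is that the "MEDIUM" style read needs a compare and branch on every access, where the SMALL style read does not.

```c
#include <stdint.h>
#include <string.h>

#define HUB_SIZE   0x8000u   /* Propeller 1 hub RAM is 32 KB */
#define FLASH_SIZE 0x1000u   /* arbitrary size for this sketch */

/* Hypothetical stand-ins for the two memories. */
static uint8_t hub_ram[HUB_SIZE];
static uint8_t flash_mem[FLASH_SIZE];

/* Test helpers: place a long in the simulated hub or flash. */
void poke_hub(uint32_t addr, uint32_t value)
{
    memcpy(&hub_ram[addr], &value, sizeof value);
}

void poke_flash(uint32_t offset, uint32_t value)
{
    memcpy(&flash_mem[offset], &value, sizeof value);
}

/* SMALL style: the address is known to be in hub, so one access is enough. */
uint32_t read_long_small(uint32_t addr)
{
    uint32_t value;
    memcpy(&value, &hub_ram[addr], sizeof value);
    return value;
}

/* Hypothetical MEDIUM style: every read must first decide where the
   address lives. No bounds checking here; it is only a sketch. */
uint32_t read_long_medium(uint32_t addr)
{
    uint32_t value;
    if (addr < HUB_SIZE) {
        memcpy(&value, &hub_ram[addr], sizeof value);
    } else {
        memcpy(&value, &flash_mem[addr - HUB_SIZE], sizeof value);
    }
    return value;
}
```

On real hardware the Flash branch would be a much slower XMM access, so the check itself is only part of the cost.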
It probably won't be that useful on the C3 because it has SPI SRAM. It will be useful on jazzed's new SpinSocket-Flash board though since it has flash but no SRAM.
Is there supposed to be padding out to a 0x200 boundary after each of the data segments or just between what goes in flash and what has to be copied to SRAM or hub memory? I think I'm finally writing everything into the correct place in flash as I understand it but my flash program still won't load.
Thanks,
David
No - the only padding is between the read-only and read/write segments.
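For reference, rounding an offset up to the 0x200 boundary mentioned in the question might look like the sketch below. The function names are made up for illustration, and where exactly the loader applies the padding is not shown here; this is only the arithmetic.

```c
#include <stdint.h>

#define PAD_BOUNDARY 0x200u   /* boundary mentioned in the question above */

/* Round an offset up to the next multiple of PAD_BOUNDARY.
   Works because PAD_BOUNDARY is a power of two. */
uint32_t round_up_to_boundary(uint32_t offset)
{
    return (offset + PAD_BOUNDARY - 1u) & ~(PAD_BOUNDARY - 1u);
}

/* Number of pad bytes needed after a segment ending at 'offset'. */
uint32_t padding_needed(uint32_t offset)
{
    return round_up_to_boundary(offset) - offset;
}
```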
The way I debug changes to the loader is to first load the program with a known working loader, then use the ram test program to dump the whole of Flash to a terminal emulator window on the PC. Then I save the contents of the window to a text file. Then I load the program again with the non-working loader and do the same again. Then I simply compare the two text files (e.g. with vim).
Ross.
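If the two dumps are available as raw bytes rather than terminal text, the comparison step described above could be sketched in C like this. The function name is hypothetical and it is not part of Catalina or payload; it simply reports the first offset where the known-good and suspect images differ.

```c
#include <stddef.h>

/* Return the offset of the first byte that differs between a known-good
   dump and a suspect dump, or -1 if the first 'len' bytes are identical. */
long first_difference(const unsigned char *good,
                      const unsigned char *bad,
                      size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (good[i] != bad[i]) {
            return (long)i;
        }
    }
    return -1;
}
```

Knowing the first bad offset makes it easier to tell which segment (or which sector write) went wrong.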
Do you have a way of dumping the entire flash without having to press 'Y' a million times? Also, I tried just holding the "Y" key but the RAM test eventually hung and wouldn't respond to characters anymore.
Yes, that happens to me too when I use the Parallax Serial Terminal (PST). Is that what you're using? I never bothered to figure it out, but it seems to be a Windows or PST issue - restarting PST seems to fix it.
I would suggest just loading a small executable - a couple of KB is usually enough to spot the load error without wearing out your Y key.
Or you could modify the source (in target\Catalina_XMM_RamTest.spin) to just continue dumping FLASH without needing the key press (or perhaps to continue until it sees a key press).
I've reread the documentation and it is a little clearer now.
Maybe the best thing here is for me to write some code and you can tell me what I am doing wrong?
The task - pause catalina running in XMM mode.
At the moment, I just want this program *not* to get to the end.
#include <stdio.h>
#include "utilities.h" // copy this from the custom_demo folder to the include folder
#include "generic_plugin.h" // copy this from the custom_demo folder to the include folder
#include "plugin_array.h" // copy this from custom_demo folder to the include folder. This is the plugin data
#define OUTPUTS 0x00000001
#define DUM 8 // the generic plugin has this type
void main ()
{
    long plugin_type = 8;
    long code = 22;
    long param = 0;
    int return_value;
    unsigned reg;
    int cog;

    printf("Test program to pause catalina\n");
    // get the address of the registry (so we can pass it to the plugin)
    reg = _registry();
    printf("Registry is at %x \n",reg);
    // see if the generic plugin is already loaded
    cog = _locate_plugin(DUM);
    printf("Cog number = %i \n",cog);
    return_value = _sys_plugin (plugin_type, (code<<24) + (param & 0x00FFFFFF));
    printf("Return value = %i \n",return_value);
    printf("End program\n");
    while (1); // Prop reboots on exit from main()
}
The values I get are that the registry is at 0x7edc, cog number is -1, return_value is -1 and it *does* get to the end.
To get catalina to pause, this is what I think I need to do: I think I need to call a plugin. But I don't want to actually put any plugin into a cog.
(later I'll tackle the cog code, but for now the task is to pause catalina).
So I don't think I need to put any code into a cog. But I do think I need to register a plugin of some sort.
If this logic is correct, maybe I am only one line away from getting this working? Maybe all I need is a line of code that registers a plugin?
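As an aside, the second argument to _sys_plugin in the program above is built as (code<<24) + (param & 0x00FFFFFF), which reads like an 8-bit code in the top byte and a 24-bit parameter below it. Here is a small stand-alone sketch of that packing. The helper names are mine, not Catalina's, and the field meaning is inferred only from that expression.

```c
#include <stdint.h>

/* Pack an 8-bit service code and a 24-bit parameter into one long,
   mirroring (code<<24) + (param & 0x00FFFFFF) from the program above.
   The extra mask on 'code' is an addition for safety in this sketch. */
uint32_t pack_request(uint32_t code, uint32_t param)
{
    return ((code & 0xFFu) << 24) + (param & 0x00FFFFFFu);
}

/* Recover the two fields again. */
uint32_t request_code(uint32_t request)
{
    return request >> 24;
}

uint32_t request_param(uint32_t request)
{
    return request & 0x00FFFFFFu;
}
```

With code = 22 and param = 0, as in the test program, the packed value is 0x16000000.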
Hmm - I think I might be stuck. This is from the complex_test program
cog = _coginit((int)reg>>2, (int)catalina_plugin_array>>2, ANY_COG);
if (cog > 0) {
    // now we can register the plugin (so various functions can find it)
    // NOTE that we don't need to do this if the plugin registers itself:
    // _register_plugin(cog, LMM_DUM);
But I think this is loading up a cog before registering it. If I run this next line, I am going to need to pass a real cog number?
You want advice from me on how to make your program *not* work? What an unusual request!
Actually, you're nearly there - you just need to call _register_plugin to register a plugin of type DUM on a free cog before you call _sys_plugin. This will make Catalina think the plugin exists, and it will pause - forever - waiting for it to respond.
Note that I said earlier that _sys_plugin will return 1 if the plugin is not registered, but I should have said -1 (which is what you are seeing). Same for _locate_plugin.
However, it surprises me that your registry is at 0x7edc - which version of Catalina are you using?
Yes, *cough*, I want you to stop my program working!
I'm using an older version of catalina - I'll need to look at upgrading.
When I get home from work I'll try the register. I did try _register_plugin(9,8); but you probably have a check on that being an invalid cog number. I'll try _register_plugin(7,8); as that cog is free.
As an aside, I added -D PLUGIN to my default batch file and all my existing programs run fine. Is there any 'cost' associated with adding this? ie extra code added or something?
Actually, no I don't - doing that will overwrite a high memory location somewhere off the end of the registry. I suppose I'd better add a check!
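The kind of check being talked about might look something like the sketch below. This is not the real _register_plugin: the registry is modelled here as a plain 8-entry array (one slot per cog), and both function names are made up. It only shows why a cog number like 9 needs to be rejected before anything is written.

```c
#define NUM_COGS 8   /* the Propeller 1 has 8 cogs, numbered 0..7 */

/* Hypothetical model of the registry: one plugin type per cog, -1 = none. */
static int registry_model[NUM_COGS] = { -1, -1, -1, -1, -1, -1, -1, -1 };

/* Register a plugin type against a cog, refusing out-of-range cog numbers
   so that nothing past the end of the registry gets overwritten. */
int register_plugin_checked(int cog, int plugin_type)
{
    if (cog < 0 || cog >= NUM_COGS) {
        return -1;   /* invalid cog: write nothing */
    }
    registry_model[cog] = plugin_type;
    return 0;
}

/* Find the first cog registered with the given type, or -1 if none. */
int locate_plugin_model(int plugin_type)
{
    int cog;
    for (cog = 0; cog < NUM_COGS; cog++) {
        if (registry_model[cog] == plugin_type) {
            return cog;
        }
    }
    return -1;
}
```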
All the -D PLUGIN does is define the symbol. There is no cost to this - and no consequences - unless you are compiling against a target that understands that symbol. The lmm_default target in the custom target directory is one such - if it sees that symbol defined it loads the generic plugin and starts it. But if you never call any of the services provided by that plugin, it will just sit there doing nothing. But yes, it will cost some code space in an LMM program (but not in an XMM program).
I dumped both my bad load and the one generated by payload and there are differences. It seems like payload is putting another copy of the prolog right after the padding that follows the code. Also, it looks like the segments may not be written in the order that I expect. I am writing cnst, init, data for layout 4 but it seems like payload may be writing cnst after one or both of the others (maybe last). Would it be possible for you to verify the actual order you're writing these segments in and also whether there is supposed to be a second copy of the prolog following the padding?
By the way, I did these dumps by using an old test program I wrote for my cache driver. I modified it so it can dump as many flash blocks as needed all in one command.
Edit: I just looked more closely at the data that follows the padding and second prolog in the dump I generated after using payload to load the xbasic.binary file and I can't find that data anywhere in the xbasic.binary file. Does the payload loader put data into the flash that did not come from the binary file it is loading? Maybe data from the Spin part of the loader itself?
Ok, in that case what I think I will do in the IDE is leave this out for LMM but include it for the default for XMM. I think that is logical because the sort of programs that are loading and unloading multiple plugins are likely to be big programs.
I am testing this little code fragment
#include <stdio.h>
#include "utilities.h" // copy this from the custom_demo folder to the include folder
#include "generic_plugin.h" // copy this from the custom_demo folder to the include folder
#include "plugin_array.h" // copy this from custom_demo folder to the include folder. This is the plugin data
// compile with catalina -lcx -D PLUGIN -lm -x5 -M 256k -d DRACBLADE -D HIRES_VGA myprog.c
void main ()
{
    long plugin_type = 8;
    long code = 22;
    long param = 0;
    int return_value;
    unsigned reg;
    int cog;

    printf("Test program to pause catalina\n");
    // get the address of the registry (so we can pass it to the plugin)
    reg = _registry();
    printf("Registry is at %x \n",reg);
    // see if the generic plugin is already loaded - 8 = LMM_DUM
    cog = _locate_plugin(8);
    printf("Cog number = %i \n",cog);
    _register_plugin(7, 8); // try registering a fake cog number, e.g. 7 is free, with type 8 = DUM
    cog = _locate_plugin(8);
    printf("Cog number = %i \n",cog);
    return_value = _sys_plugin (plugin_type, (code<<24) + (param & 0x00FFFFFF));
    printf("Return value = %i \n",return_value);
    printf("End program\n");
    while (1); // Prop reboots on exit from main()
}
Near the beginning is the command line.
It tells me the registry is at 7edc, it returns the cog number as -1, after the plugin has been registered it returns the cog number as 7 (yay!), but then the "return_value" for _sys_plugin is still -1. So something is not quite right with that.
I'm a bit stuck here and your advice would be most appreciated!
Hi Dr_A,
Odd - it works for me (or rather doesn't work for me) - i.e. it hangs after printing "Cog number = 7", which is what I would expect. This means it is never returning from the _sys_plugin call.
Note I had to change -d to -D from your command line. Can you paste the output of your catalina command here so I can see what version of Catalina you have, and also what plugins are being loaded?
Hi Ross, good to hear it works on your setup. It could be because I have not upgraded catalina for a while - I'll download the latest version.
Hi Dr_A,
Your program should work (or rather not work - how confusing) on any recent version of Catalina - but upgrading is a good idea anyway - it will make it easier for me to help.
Just watch out for the change in registry location - aren't you hardcoding it in some of your code?
Ummm... I just tried a tiny example instead of the rather large xbasic program and my loader seems to work on that program. It just doesn't work on xbasic.binary. Also, my dump program seems to show the data byte swapped. I'm not sure why that happens either. This is using my C3 cache driver to do the dumping.
Hi David,
Any progress is good news! In the small working program, try adding stuff to each segment (e.g. add a big constant array, and/or a couple of functions) to make sure each segment needs multiple sectors/pages/writes etc - that may be where the issue is.
As to the byte swapping - I seem to remember that your caching driver stores and loads things in big-endian format. So if you load the Flash using Catalina's caching driver and then dump it with your caching driver, you might see this.
Ross.
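To illustrate the byte swapping effect in isolation, here is a small sketch in portable C. It has nothing to do with either caching driver's actual code; the function names are made up. It writes a long out byte by byte in little-endian order and then reads the same four bytes back as big-endian, which is the mismatch described above.

```c
#include <stdint.h>

/* Store a 32-bit value one byte at a time, least significant byte first. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Read four bytes back as little-endian (matches store_le32). */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read the same four bytes as big-endian (the mismatched case). */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write little-endian, read big-endian: the value comes back swapped. */
uint32_t write_le_read_be(uint32_t v)
{
    uint8_t buf[4];
    store_le32(buf, v);
    return load_be32(buf);
}

/* Matched write and read: the value survives the round trip. */
uint32_t write_le_read_le(uint32_t v)
{
    uint8_t buf[4];
    store_le32(buf, v);
    return load_le32(buf);
}
```

A dump that shows every long with its bytes reversed is a strong hint that the writer and the reader disagree in exactly this way.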
I see what you mean about the endian problems in my cache driver. It seems I write flash in little-endian order but read it in big-endian order. I wonder why ZOG doesn't have a problem with that? Anyway, I'm going to fix my driver to do everything in little-endian order. I guess you did that by doing byte-at-a-time transfers instead of long-at-a-time like I was doing (copied from the VMCOG code).
Well, I fixed the endian problem in my C3 cache driver and now my dump program seems to work. Unfortunately, as might be expected, ZOG no longer works with it. Looks like I have some work to do! :-)
Comments
What would it take to make a MEDIUM module where code and cnst are in flash and init and data are in hub memory?
My sanity!
So I guess that means that I have to add 0x10 to the values in the segtable, right?
Yes, unless you are just using them to calculate segment sizes.
Ross.
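Using the numbers from the xbasic.lst listing quoted earlier, the adjustment can be sketched as below. The helper names are made up for this example; the only facts used are the $10 offset and the listing bytes (f0 81 01 00 for Catalina_Code, which the listing shows at 0x18200).

```c
#include <stdint.h>

/* The offset the Spin compiler adds, as described above. */
#define SPIN_BINARY_OFFSET 0x10u

/* Assemble a long from four bytes in the order they appear in the
   listing (little-endian, least significant byte first). */
uint32_t le32_from_bytes(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Convert a raw segtable value to the address shown in the listing. */
uint32_t segtable_to_listing_addr(uint32_t entry)
{
    return entry + SPIN_BINARY_OFFSET;
}

/* Segment sizes are differences between entries, so the constant
   offset cancels and the raw values can be used directly. */
uint32_t segment_size(uint32_t start_entry, uint32_t next_entry)
{
    return next_entry - start_entry;
}
```

For example, the Catalina_Code entry decodes to 0x181f0, and adding $10 gives the 0x18200 seen in the listing. The distance from Catalina_Cnst (0x174) to Catalina_Init (0x1658) is 0x14e4 with or without the offset.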
I think I've already lost that myself! :-)
Seriously though, it would be a useful mode on boards that have flash. In the case of xbasic it would save over 5k of hub memory.
Hi Dr_A,
Odd - it works for me (or rather doesn't work for me) - i.e. it hangs after printing "Cog number = 7", which is what I would expect. This means it is never returning from the _sys_plugin call.
The command I used to compile it was:
Note I had to change -d to -D from your command line. Can you paste the output of your catalina command here so I can see what version of Catalina you have, and also what plugins are being loaded?
Ross.
Hi David,
I'll check this out when I get home tonight.
Ross.
Thanks Ross!