We have been thinking wrongly about cog+lut code!
cgracey
Posts: 14,208
I think it is somewhat impractical to make contiguous cog+lut programs.
The reason is that cog memory has to contain all reserved (RES) register variables. These are allocated registers with no compile-time data. Using RES allows cog registers to be declared without them occupying the cog image sitting in hub memory.
In order for cog programs to consistently start at the first possible cog location, RES's must be declared AFTER all compile-time cog code and data. This inhibits the possibility of having contiguous cog+lut programs in the same way as having the special cog registers mapped at $1F8..$1FF did.
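A rough way to see the RES ordering issue is the Python sketch below. It is a toy model, not how the real assembler works: the `assign` function, the item names, and the rule that the image is padded through the last LONG are all assumptions made for illustration.

```python
# Hypothetical model of cog address assignment.
# LONG items carry compile-time data and are stored in the hub image;
# RES items only reserve register addresses and add nothing to the image.

def assign(items, base=0x000):
    """items: list of (name, kind, count), kind is 'LONG' or 'RES'.

    Returns (addresses, image_longs): the cog address of each name, and the
    number of longs a single contiguous load must copy to cover every LONG.
    Padding the image across RES gaps is an assumption of this sketch.
    """
    addresses = {}
    addr = base
    image_end = base
    for name, kind, count in items:
        addresses[name] = addr
        addr += count
        if kind == "LONG":
            image_end = addr  # image must extend through the last LONG
    return addresses, image_end - base

# RES declared after all code and data: image stays small, code starts at base.
res_last = [("entry", "LONG", 20), ("mask", "LONG", 1), ("temp", "RES", 4)]
# RES declared first: code is pushed up and the image grows by the RES size.
res_first = [("temp", "RES", 4), ("entry", "LONG", 20), ("mask", "LONG", 1)]

addrs_last, image_last = assign(res_last)
addrs_first, image_first = assign(res_first)
print(image_last, hex(addrs_last["entry"]))    # 21 0x0
print(image_first, hex(addrs_first["entry"]))  # 25 0x4
```

With RES last, the code begins at the first location and the image is 21 longs. With RES first, the code no longer starts at the first location and, in this model, the image has to grow to 25 longs.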
Really, lut execution is much closer to hub execution than either is to cog execution, since only cog execution overlaps with register space.
In light of these matters, do you guys think it would be better to move the special registers back up to the top of cog memory, so that cog code begins at $000, like it used to?
I think lut exec needs to be treated more as a fast way to execute hub code than a way to extend cog code.
On the other hand, if we leave the special cog registers at $000, so that cog code starts at $008, we leave open the possibility of contiguous cog+lut programs. However, such programs would have to be specially prepared and loaded, so that anything like RES'd registers would PRECEDE contiguous cog+lut code. For these kinds of programs, we almost need the assembler to top-justify all cog+lut program code against $3FF, and bottom-justify all LONG data and RES registers against $008, since these types must be accessible as D and S registers. Maybe the special cog registers should remain at $000..$007.
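The top-justify/bottom-justify idea can be sketched in Python as below. This is only an illustration: the `place` function and its return keys are made up, and the sketch assumes the D/S-addressable registers end at $1FF, with lut occupying $200..$3FF.

```python
COG_LUT_TOP = 0x3FF   # last long of the combined cog+lut space
DATA_BASE = 0x008     # first register after the special registers at $000..$007
COG_REGS_END = 0x1FF  # assumption: last register reachable as D or S

def place(code_longs, data_longs, res_longs):
    """Bottom-justify LONG data and RES registers against $008 and
    top-justify code against $3FF. Raises ValueError if they do not fit."""
    data_start = DATA_BASE
    res_start = data_start + data_longs
    regs_end = res_start + res_longs - 1       # last register in use
    code_start = COG_LUT_TOP - code_longs + 1  # code ends exactly at $3FF
    if regs_end > COG_REGS_END:
        raise ValueError("data/RES registers extend past D/S-addressable space")
    if code_start <= regs_end:
        raise ValueError("code overlaps data/RES registers")
    return {
        "data_start": data_start,
        "res_start": res_start,
        "regs_end": regs_end,
        "code_start": code_start,
    }

layout = place(code_longs=600, data_longs=10, res_longs=30)
print({name: hex(addr) for name, addr in layout.items()})
```

Here 600 longs of code start at $1A8 and run across the cog/lut boundary up to $3FF, while the 40 registers sit at $008..$02F, with unused space left in the middle.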
I've just been thinking about this a lot.
Comments
I say go for it!
The loss would be no possibility of having contiguous cog+lut programs.
(so you can use immediate addressing without offset)
Maybe that could be useful for P2 also?
Never mind, I see Chip already pointed this out...
I was mentally testing the value of loading both cog AND lut.
The assembler can collect items as needed very easily,
and I would split RES items into two camps
a) RES registers that must be initialized can be included in the load-image.
b) RES registers that have no initial loading, can go into REG space.
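As a rough Python sketch of that two-camp split (the `build` function, the variable names, and the base address of $008 are assumptions for illustration, not an existing tool):

```python
def build(code, variables, base=0x008):
    """code: list of longs; variables: list of (name, initial_value_or_None).

    Camp (a): variables with an initial value are appended to the load image.
    Camp (b): variables without one get addresses after the image and cost
    nothing in hub memory.
    """
    image = list(code)
    addresses = {}
    for name, value in variables:
        if value is not None:
            addresses[name] = base + len(image)
            image.append(value)
    next_free = base + len(image)
    for name, value in variables:
        if value is None:
            addresses[name] = next_free
            next_free += 1
    return image, addresses

code = [0] * 5  # placeholder for five instruction longs
variables = [("count", 100), ("scratch", None), ("mask", 0xFF), ("temp", None)]
image, addresses = build(code, variables)
print(len(image), {name: hex(addr) for name, addr in addresses.items()})
```

The image is 7 longs (five code plus `count` and `mask`), while `scratch` and `temp` land just past it without adding to the load.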
It is common for small MCUs to include a compact (few bytes) ClearRAM routine, so that a default RAM_VAR value of 0000 has no individual code-overhead.
Most just clear all var-space first, then init later in code, but the COG allows a slightly more compact approach.
If you need some VARs to preserve across a reset, then the coverage of that clear can be user controlled.
A plus-side of special cog registers $000..$007, is that the init value of those can be loaded with the code image, saving total code size, but that would need more smarts in the ASM - not hard to do ?
LUT code is most likely going to be called subroutines, and that makes a seamless COG-LUT boundary less vital, but forcing a break, does limit the largest subroutine.
It would need SEGMENT support in the ASM to say code must be LOW_COG or can be placed anywhere.
The other time a contiguous RAM area is useful, is for larger arrays. A split array is a royal pain.
Question is, can COG code index into all of LUT/COG in a seamless manner ? (reach any/all of the 1024L ?)
I don't see "specially prepared and loaded" as much of an issue, as anything is actually "specially prepared and loaded", and PC's are great at the simple librarian tasks, and that is solved off-chip.
What has happened to interrupt vectors ?
They need to be initialized, so need to go in the code image, which suggests a low mapping ?
Interrupt vectors can be loaded automatically, if they are at $00A..$00F. Not so, if they are at the top of memory.
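A small Python check makes the point concrete. It assumes the cog is loaded as one contiguous block starting at $008; the `covered` helper, the 100-long image size, and the top-of-memory vector range are hypothetical.

```python
def covered(image_base, image_longs, first, last):
    """True if registers first..last all fall inside one contiguous load."""
    return image_base <= first and last < image_base + image_longs

LOW_VECTORS = (0x00A, 0x00F)   # placement mentioned in the thread
HIGH_VECTORS = (0x1F2, 0x1F7)  # hypothetical top-of-cog placement

# A 100-long image loaded at $008 covers $008..$06B.
print(covered(0x008, 100, *LOW_VECTORS))   # True
print(covered(0x008, 100, *HIGH_VECTORS))  # False
```

Low vectors ride along with any image that reaches $00F. High vectors would only be covered by an image that fills nearly the whole cog, otherwise they need a second load or code that writes them.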
Seems logical, but it's not convenient to do in hardware.
If we move the special registers to the top of cog memory, it would dispel that notion.
Not really, to properly dispel the notion, the DOCs need to have two memory maps, one for PC.Code and another one for Indexing.
Meanwhile, you impose an undesirable break in code-space that is not required by the hardware.
Users also do not expect manual VAR init of interrupt vectors, which adds another surprise...
So, you are, at least, saying that registers should stay at $000?
Yes, I think that has fewer downsides,
* it allows clean/expected 'auto' Interrupt vector init
* it allows code flow over the COG-LUT boundary that the hardware nicely supports.
* User VAR init is also possible and CODE images can be kept small with non-init VARs not loaded. ASM manages placement.
* I think it also allows Special reg 0-7 direct init too ?
(Q: is that Special reg 0-7 direct init valid/safe ?)
Result is largest possible user code space, with least caveats.
Makes it more like P1 and just seems like the right way.
Running code from LUT seems like a second order priority at best to me.
And, seamless code from COG to LUT even lower priority...
However, P1 totally lacks interrupt vectors, and it is those that dictate placement, more than Special registers.
P2 is never going to clone a P1.
Do we have an option to load COG memory and not start the COG? If we had one, and it were possible to load at a given address, loading vectors would be simple.
Or, we just initialize them like we do things in the COG now, overwriting that when the process is done.
Not all cog code will use interrupts. It may make more sense for them to be loaded easily than it does to break lower COG RAM.
If the INT vectors split-off to the top, they need either a split-code load, or extra code (that's both deadwood & unexpected) that copies values to them.
HLL users just want to code away, and be told as they run out of space.
They are not used to complex code/data overlay gymnastics.
Sure, 0 index is nice, but the P2 has much more powerful opcodes, and lacks COG counters, so a significant re-write is going to be needed on any P1 design.
I'm fine with them being at the end of cog ram and having LUT be isolated.
Losing the automatic int vector loading via the built-in coginit is minor. We can still have our own fancier cog loading code in hubexec, and coginit that to start a cog with more complex loading needs.
Jmg, that is a sweet speed use case that we know will get used. And if one is writing intense, highly optimized COG code, having that case available as an option is more important than the more mundane, not performance impacting interrupt vector setup.
Besides, not all COGS will even use interrupts.
Setup on vectors is a totally "code away" solvable problem.
Not worth the trade.
I agree with Roy, and exposing them at the top of COG is the next best and most familiar case.
As for complex, I don't think so. PASM is not a hard assembly language. Users able to understand a COG and how it is different from an ordinary CPU and the HUB won't have any trouble with either the vectors, or how to make a zero table.
There will be easy, no-brainer examples of both to work from too.