We have been thinking wrongly about cog+lut code!

cgracey · 2015-10-04 20:00

I think it is impractical, somewhat, to make contiguous cog..lut programs.

The reason is that cog memory has to contain all reserved (RES) register variables. These are allocated registers with no compile-time data. Using RES allows cog registers to be declared without them occupying the cog image sitting in hub memory.

In order for cog programs to consistently start at the first possible cog location, RES's must be declared AFTER all compile-time cog code and data. This inhibits the possibility of having contiguous cog+lut programs in the same way as having the special cog registers mapped at $1F8..$1FF did.
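To make the RES ordering argument concrete, here is a minimal Python sketch of a layout pass. It is not the real assembler: the `layout` function, the item names, and the sizes are all hypothetical, and it assumes a loader that copies one contiguous image starting at the base address.

```python
# Hypothetical model of an assembler's cog layout pass (not the real tool).
# Each item is (name, kind, size_in_longs); kind is "code", "long", or "res".

def layout(items, base=0x000):
    """Assign cog addresses in declaration order and report the hub image size.

    Assumption: the loader copies one contiguous image starting at `base`,
    so the image must cover every long up to the last item that carries
    compile-time data. RES items declared after that point cost nothing.
    """
    addr = base
    addresses = {}
    image_end = base  # one past the last long that must be stored in hub
    for name, kind, size in items:
        addresses[name] = addr
        addr += size
        if kind != "res":
            image_end = addr
    return addresses, image_end - base


# Same program, RES declared last versus first.
res_last = [("main", "code", 100), ("table", "long", 16), ("scratch", "res", 32)]
res_first = [("scratch", "res", 32), ("main", "code", 100), ("table", "long", 16)]

_, image_res_last = layout(res_last)
_, image_res_first = layout(res_first)
print(image_res_last, image_res_first)  # 116 148
```

With RES last, the 32 reserved longs never appear in the hub image. With RES first, a simple contiguous loader would have to carry them as padding, or the program would need the special preparation and loading described in this post.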

Really, lut execution is much closer to hub execution than either is to cog execution, since only cog execution overlaps with register space.

In light of these matters, do you guys think it would be better to move the special registers back up to the top of cog memory, so that cog code begins at $000, like it used to?

I think lut exec needs to be treated more as a fast way to execute hub code than as a way to extend cog code.

On the other hand, if we leave the special cog registers at $000, so that cog code starts at $008, we leave open the possibility of contiguous cog+lut programs. However, such programs would have to be specially prepared and loaded, so that anything like RES'd registers would PRECEDE contiguous cog+lut code. For these kinds of programs, we almost need the assembler to top-justify all cog+lut program code against $3FF, and bottom-justify all LONG data and RES registers against $008, since these types must be accessible as D and S registers. Maybe the special cog registers should remain at $000..$007.
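The top-justify / bottom-justify idea can be sketched as a small placement calculation. This is only an illustration in Python: the `place` function and the constant names are hypothetical, and it assumes cog registers occupy $000..$1FF with lut following at $200..$3FF.

```python
# Hypothetical placement pass; sizes are in longs and all names are illustrative.
COG_FIRST_FREE = 0x008   # first long after special registers at $000..$007
COG_LAST = 0x1FF         # assumed last cog register usable as a D or S operand
LUT_TOP = 0x3FF          # last long of the combined cog+lut range


def place(data_longs, res_longs, code_longs):
    """Bottom-justify LONG/RES registers at $008, top-justify code at $3FF.

    Returns (regs_start, regs_end, code_start, code_end), all inclusive.
    """
    regs_start = COG_FIRST_FREE
    regs_end = regs_start + data_longs + res_longs - 1
    code_start = LUT_TOP - code_longs + 1
    if regs_end > COG_LAST:
        raise ValueError("registers spill past cog space; not reachable as D/S")
    if code_start <= regs_end:
        raise ValueError("code and registers overlap")
    return regs_start, regs_end, code_start, LUT_TOP


regs_start, regs_end, code_start, code_end = place(16, 32, 600)
print(f"registers ${regs_start:03X}..${regs_end:03X}, code ${code_start:03X}..${code_end:03X}")
# registers $008..$037, code $1A8..$3FF
```

In this example the 600 longs of code start in cog memory at $1A8 and run straight through into lut, while every LONG and RES register stays low enough to be used as a D or S operand.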

I've just been thinking about this a lot.

Seairth · 2015-10-04 20:21

Now that you mention the issue concerning RES (and LONG/WORD/BYTE, for that matter), I suspect that code spanning cog and LUT would be highly unlikely.

I say go for it!

cgracey · 2015-10-04 20:33

The win from moving the special registers back up to $1F8 is having programs start at $000, which is simple. The other win is saving an ADD instruction on cog lookups, since they could be $000-based.
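As a rough illustration of the lookup point, the Python sketch below counts the extra ADD needed when a table does not start at $000. The `table_address` function is hypothetical and only models the address arithmetic, not actual instruction timing.

```python
def table_address(index, table_base):
    """Return (cog_address, extra_adds) for entry `index` of a table at `table_base`.

    Assumption: with the table at $000 the index can be used directly as the
    register address; any other base costs one ADD to apply the offset.
    """
    if table_base == 0:
        return index, 0
    return table_base + index, 1


print(table_address(5, 0x000))  # (5, 0)
print(table_address(5, 0x008))  # (13, 1)
```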

The loss would be no possibility of having contiguous cog+lut programs.

Baggers · 2015-10-04 20:34

I second this, go for it, it makes more sense to have the cog code start at 0. Just my 2c worth.

Baggers · 2015-10-04 20:37

You don't need contiguous cog+lut programs. Besides, it doesn't mean you can't put subroutines in lut and just have the main body of the program in cog, or jump over the registers should you need to go to lut.

Rayman · 2015-10-04 20:43

For P1, people have made use of tables starting at cog reg #0...
(so you can use immediate addressing without offset)
Maybe that could be useful for P2 also?

Never mind, I see Chip already pointed this out...

cgracey · 2015-10-04 20:44

This whole concern came up because I was thinking about adding cog loading to COGINIT via bit 6 of D. It is a simple creature comfort that makes loading and running cog code much simpler, since always starting in hub has its complexities.

I was mentally testing the value of loading both cog AND lut.

jmg · 2015-10-04 21:07

cgracey wrote: »

On the other hand, if we leave the special cog registers at $000, so that cog code starts at $008, we leave open the possibility of contiguous cog+lut programs. However, such programs would have to be specially prepared and loaded, so that anything like RES'd registers would PRECEDE contiguous cog+lut code. For these kinds of programs, we almost need the assembler to top-justify all cog+lut program code against $3FF, and bottom-justify all LONG data and RES registers against $008, since these types must be accessible as D and S registers. Maybe the special cog registers should remain at $000..$007.

The assembler can collect items as needed very easily,
and I would split RES items into two camps
a) RES registers that must be initialized can be included in the load-image.
b) RES registers that have no initial loading, can go into REG space.

It is common for small MCUs to include a compact (few bytes) ClearRAM routine, so that a default RAM_VAR value of 0000 has no individual code-overhead.
Most just clear all var-space first, then init later in code, but the COG allows a slightly more compact approach.
If you need some VARs to preserve across a reset, then the coverage of that clear can be user controlled.
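A minimal Python sketch of the two-camp split and the compact clear routine might look like the following. The `split_registers` and `clear_ram` helpers and the variable names are hypothetical, chosen only to illustrate the idea.

```python
def split_registers(variables):
    """Split (name, initial_value_or_None) pairs into the two camps.

    a) variables with an initial value go into the load image
    b) variables with no initial value go into register space only
    """
    load_image = [(name, value) for name, value in variables if value is not None]
    reg_space = [name for name, value in variables if value is None]
    return load_image, reg_space


def clear_ram(cog, start, count):
    """Zero `count` registers from `start`; the range is user controlled so
    that variables meant to survive a reset can be left out of it."""
    for addr in range(start, start + count):
        cog[addr] = 0


variables = [("baud", 115200), ("rx_head", None), ("rx_tail", None), ("mask", 0xFF)]
load_image, reg_space = split_registers(variables)
print(load_image)  # [('baud', 115200), ('mask', 255)]
print(reg_space)   # ['rx_head', 'rx_tail']

cog = [0xDEADBEEF] * 16       # stand-in for uninitialized registers
clear_ram(cog, start=4, count=8)
```

Only the first camp adds to the image size; the second camp costs nothing in hub memory and can be zeroed by the clear routine if a default of 0 is wanted.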

A plus-side of special cog registers $000..$007, is that the init value of those can be loaded with the code image, saving total code size, but that would need more smarts in the ASM - not hard to do ?

LUT code is most likely going to be called subroutines, and that makes a seamless COG-LUT boundary less vital, but forcing a break, does limit the largest subroutine.
It would need SEGMENT support in the ASM to say code must be LOW_COG or can be placed anywhere.

The other time a contiguous RAM area is useful, is for larger arrays. A split array is a royal pain.

Question is, can COG code index into all of LUT/COG in a seamless manner ? (reach any/all of the 1024L ?)

I don't see "specially prepared and loaded" as much of an issue, as anything is actually "specially prepared and loaded", and PC's are great at the simple librarian tasks, and that is solved off-chip.

jmg · 2015-10-04 21:15

cgracey wrote: »

The win from moving the special registers back up to $1F8 is having programs start at $000, which is simple. The other win is saving an ADD instruction on cog lookups, since they could be $000-based.

The loss would be no possibility of having contiguous cog+lut programs.

What has happened to interrupt vectors ?
They need to be initialized, so need to go in the code image, which suggests a low mapping ?

cgracey · 2015-10-04 21:22

Jmg, cog code cannot index seamlessly from cog to lut. They are different animals.

cgracey · 2015-10-04 21:25

jmg wrote: »

cgracey wrote: »

The win from moving the special registers back up to $1F8 is having programs start at $000, which is simple. The other win is saving an ADD instruction on cog lookups, since they could be $000-based.

The loss would be no possibility of having contiguous cog+lut programs.

What has happened to interrupt vectors ?
They need to be initialized, so need to go in the code image, which suggests a low mapping ?

Interrupt vectors can be loaded automatically, if they are at $00A..$00F. Not so, if they are at the top of memory.

jmg · 2015-10-04 21:36

Interrupt vectors can be loaded automatically, if they are at $00A..$00F. Not so, if they are at the top of memory.

-so that points to keeping the low-mapping you have already ?

jmg · 2015-10-04 21:38

cgracey wrote: »

Jmg, cog code cannot index seamlessly from cog to lut. They are different animals.

Given the published memory map, and that PC can flow from COG to LUT, users will expect indexing to be able to do the same ?

cgracey · 2015-10-04 21:40

Well, it's kind of funny. They can be loaded automatically, but there are longs between them that will hold interrupt-return addresses at run time. So, those empty longs could be spent on 'MOV intjump,introutine' instructions, making it a wash.

cgracey · 2015-10-04 21:42

jmg wrote: »

cgracey wrote: »

Jmg, cog code cannot index seamlessly from cog to lut. They are different animals.

Given the published memory map, and that PC can flow from COG to LUT, users will expect indexing to be able to do the same ?

Seems logical, but it's not convenient to do in hardware.

If we move the special registers to the top of cog memory, it would dispel that notion.

jmg · 2015-10-04 21:58

cgracey wrote: »

Seems logical, but it's not convenient to do in hardware.

If we move the special registers to the top of cog memory, it would dispel that notion.

Not really, to properly dispel the notion, the DOCs need to have two memory maps, one for PC.Code and another one for Indexing.

Meanwhile, you impose an undesirable break in code-space that is not required by the hardware.

jmg · 2015-10-04 22:02

cgracey wrote: »

Well, it's kind of funny. They can be loaded automatically, but there are longs between them that will hold interrupt-return addresses at run time. So, those empty longs could be spent on 'MOV intjump,introutine' instructions, making it a wash.

? but you still need the reserved INT vector space, so the total COG footprint is larger in the not-low case ? ( ==less user code space)

Users also do not expect manual VAR init of interrupt vectors, which adds another surprise...

cgracey · 2015-10-04 22:05

jmg wrote: »

cgracey wrote: »

Seems logical, but it's not convenient to do in hardware.

If we move the special registers to the top of cog memory, it would dispel that notion.

Not really, to properly dispel the notion, the DOCs need to have two memory maps, one for PC.Code and another one for Indexing.

Meanwhile, you impose an undesirable break in code-space that is not required by the hardware.

So, you are, at least, saying that registers should stay at $000?

jmg · 2015-10-04 22:16

cgracey wrote: »

So, you are, at least, saying that registers should stay at $000?

Yes, I think that has fewer downsides,
* it allows clean/expected 'auto' Interrupt vector init
* it allows code flow over the COG-LUT boundary that the hardware nicely supports.
* User VAR init is also possible and CODE images can be kept small with non-init VARs not loaded. ASM manages placement.
* I think it also allows Special reg 0-7 direct init too ?
(Q: is that Special reg 0-7 direct init valid/safe ?)

Result is largest possible user code space, with least caveats.

Rayman · 2015-10-04 22:20

I really think they should be moved to the end.
Makes it more like P1 and just seems like the right way.

Running code from LUT seems like a second order priority at best to me.
And, seamless code from COG to LUT even lower priority...

jmg · 2015-10-04 22:27

Rayman wrote: »

I really think they should be moved to the end.
Makes it more like P1 and just seems like the right way.

However, P1 totally lacks interrupt vectors, and it is those that dictate placement, more than Special registers.
P2 is never going to clone a P1.

Rayman · 2015-10-04 22:53

Why can't we move interrupt vectors to the end too?

potatohead · 2015-10-04 22:56

I would prefer COG programs start at 0 myself, just for the cog table case being possible without an add instruction. We used that a lot in P1. It's still relevant in P2.

Do we have an option to load COG memory and not start the COG? If we had one, and it were possible to load at a given address, loading vectors would be simple.

Or, we just initialize them like we do things in the COG now, overwriting that when the process is done.

Not all cog code will use interrupts. It may make more sense for them to be loaded easily than it does to break lower COG RAM.

jmg · 2015-10-04 23:04

potatohead wrote: »

I would prefer COG programs start at 0 myself, just for the cog table case being possible without an add instruction.

There is a new JMPREL opcode, that handles case tables anywhere

potatohead · 2015-10-04 23:05

Not the same. If you want the tightest loop indexing tabular data for canned values, etc... low COG RAM is the best case and I would keep that case and just init vectors easily. Those aren't hard.

jmg · 2015-10-04 23:09

Rayman wrote: »

Why can't we move interrupt vectors to the end too?

HW wise they can, but Chip was wanting a single-code load, of smallest size.
If the INT vectors split-off to the top, they need either a split-code load, or extra code (that's both deadwood & unexpected) that copies values to them.

potatohead · 2015-10-04 23:10

Yes, either is fine. Both can be accomplished in COG with no waste. Init first, then reclaim for storage and computation.

jmg · 2015-10-04 23:54

potatohead wrote: »

...Init first, then reclaim for storage and computation.

Perhaps, but now that's getting quite complex in setup and user understanding, and not really a C-Compiler friendly flow at all.
HLL users just want to code away, and be told as they run out of space.
They are not used to complex code/data overlay gymnastics.

Sure, 0 index is nice, but the P2 has much more powerful opcodes, and lacks COG counters, so a significant re-write is going to be needed on any P1 design.

Roy Eltham · 2015-10-05 00:12

If it were possible, I would like the special registers to not be in the memory map at all, and just be addressable by another bit in the instruction.

I'm fine with them being at the end of cog ram and having LUT be isolated.

Losing the automatic int vector loading via the built-in coginit is minor and okay. We can still have our own fancier cog loading code in hubexec and coginit that to start a cog with more complex loading needs.

Electrodude · 2015-10-05 00:34

Can you make execution from lutram roll over into cogram instead of doing bad things?

potatohead · 2015-10-05 00:40

Jmg, that is a sweet speed use case that we know will get used. And if one is writing intense, highly optimized COG code, having that case available as an option is more important than the more mundane, not performance impacting interrupt vector setup.

Besides, not all COGS will even use interrupts.

Setup on vectors is a totally "code away" solvable problem.

Not worth the trade.

I agree with Roy, and exposing them at the top of COG is the next best and most familiar case.

As for complex, I don't think so. PASM is not a hard assembly language. Users able to understand a COG and how it is different from an ordinary CPU and the HUB won't have any trouble with either the vectors, or how to make a zero table.

There will be easy, no brainer examples of both to work from too.
