If it were possible, I would like the special registers to not be in the memory map at all, and just be addressable by another bit in the instruction.
Good idea, other MCUs have SFRs in separate space, so that could be worth exploring.
Could the interrupt vectors even join them, freeing up all COG space for more valuable VARs and CODE ?
I'm fine with them being at the end of cog ram and having LUT be isolated.
Losing automatic loading of the interrupt vectors by the built-in COGINIT is a minor loss. We can still have our own fancier cog-loading code in hubexec, and COGINIT that to start a cog with more complex loading needs.
It would be great to get the special function registers out of the map, but then they would not be treatable as D and S registers, which is too great a compromise. We need to be able to do ALU operations on them. Just where they are mapped is the question.
Which specific ALU operations ?
I've seen ARMs expand the memory map, so they have one location for R/W and two more (mask write only) for atomic bit-wise Set/Clear and IIRC some even expand the BITS to one memory slot each, for no-mask-needed boolean level atomic control.
That gives a way to have Peripherals under fine control, without needing to consume read-modify-write memory space.
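To make the set/clear alias idea concrete, here is a minimal sketch in Python rather than anything hardware-specific. The class name `AliasedRegister` and its method names are hypothetical, invented for illustration; the sketch only models the behaviour described above (one plain R/W location plus two mask-write-only locations) and is not taken from any ARM or Propeller documentation.

```python
class AliasedRegister:
    """Toy model of a peripheral register with write-only SET and CLR aliases."""

    def __init__(self, value=0, width=32):
        self.mask = (1 << width) - 1
        self.value = value & self.mask

    def write(self, data):
        # Plain R/W location: replaces the whole register.
        self.value = data & self.mask

    def write_set(self, bits):
        # SET alias: every 1 bit in `bits` sets the matching register bit.
        self.value |= bits & self.mask

    def write_clr(self, bits):
        # CLR alias: every 1 bit in `bits` clears the matching register bit.
        self.value &= ~bits & self.mask


reg = AliasedRegister(0b1010)
reg.write_set(0b0101)   # no read-modify-write needed by the caller
reg.write_clr(0b0011)
```

The point of the aliases is that the caller never reads the register first, so a set or clear is a single write.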
Can you make execution from lutram roll over into cogram instead of doing bad things?
Interesting. If the PC advances from $3FF, set it to $000. Special registers would be at $1F8. That's simple to do.
What would programs for that look like? I think they would start in the lut and wrap into the cog. Maybe too weird. As a feature, though, it would provide a known outcome for that singularity of a PC advance. Nobody may ever notice if it worked either way. It would be a strange thing to code for.
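A small sketch of the rollover being discussed, assuming a 10-bit PC in which cog RAM occupies $000..$1FF and lut RAM occupies $200..$3FF. The names `PC_MASK`, `next_pc` and `region` are made up for this illustration; it models the idea only, not actual hardware.

```python
PC_MASK = 0x3FF  # assumed 10-bit PC: cog at $000..$1FF, lut at $200..$3FF


def next_pc(pc):
    """Advance the PC by one instruction, wrapping from $3FF back to $000."""
    return (pc + 1) & PC_MASK


def region(pc):
    """Name the memory the PC is currently fetching from."""
    return "cog" if pc < 0x200 else "lut"


last_lut = 0x3FF
after_wrap = next_pc(last_lut)
```

Under these assumptions, code that runs off the top of the lut lands on cog $000, which is why a program written for the wrap would start in the lut and finish in the cog.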
Chip,
What I meant was if we had 2 bits in the instruction to indicate if D or S were special register indices instead of normal register indices, kind of like the I flag. Then all the ops could work on them. I would allow for a pile more special registers. Of course, those bits could instead be used for more registers/code memory overall, so blah.
Anyway, I think having the special registers at the end of COG will be better in the long run. It clearly separates COG and LUT and lets free memory/registers start at 0.
I imagine a "full" cog load/init being accomplished by having a small amount of hubexec code that loads up the COG RAM with code/data and any interrupt vectors needed, then loads the LUT with any additional code/data desired, then jumps to 0 in the cog to start it going.
The simple version just being like it was on P1 with COGINIT loading up the cog code/data and running it.
Yep. A smaller COG load can just stuff the vectors first thing and then use that instruction area for tables or variable storage, if required. Given it's a smaller COG load, it probably won't be.
We can't grow our map, but we could make special instructions to write, OR, ANDN, and XOR the pin control registers. It is nice to have a mapped register to read the pins, though, for quick tests and data extracts. Then, there's the PTR registers which really only need to be set, added to, subtracted from, and read back. That way, we could have a pure cog RAM map. It seems easier to just keep them mapped.
I never liked the special registers and interrupt vectors being moved to $000 in the first place, for the reasons stated by others above.
IMHO it is fine to delineate LUT from COG space such that code cannot just flow between them. I still think that it should be possible to jmp/call/ret #abs or @rel between COG & LUT though.
As for COGINIT, what if there is a variant that...
* Loads the whole cog from hub for $000..$1EF, clears the special registers and interrupt vectors at $1F0..$1FF, and starts at cog $000
Any other load can be performed by a hub routine, maybe even a few loaded from the ROM and called via COGINIT hubexec.
While we are discussing this, Why do we need SETQ2 ??? Wouldn't SETQ suffice if...
* RD/WR-LONG/WORD/BYTE just uses 11-bits in the D register to address cog/lut ???
Then it would be nice for SETQ count to be 11-bits too!
The win from moving the special registers back up to $1F8 is having programs start at $000, which is simple. The other win is saving an ADD instruction on cog lookups, since they could be $000-based.
The loss would be no possibility of having contiguous cog+lut programs.
How likely is it that you would have something in hub that would span from cog to lut? Would the special registers be at $1F8?
If I understand your question correctly...
I see two possibilities for cog & lut code and tables
1. COG & LUT contain programs. They can (and hopefully will) compile as a block where we just RES out the special registers. In this case, my cog would contain (where possible) all the fast code subroutines. The LUT would contain the main routine and slower code subroutines. So if possible, I would just start my code in LUT, probably at $200, after hubexec has loaded the required code etc.
2. LUT contains tables, vectors, fonts, and perhaps stack(s), while cog contains code.
Pretty straightforward. But the cog can also contain fast vectors based at cog $000.
While I am unsure of the implications, a larger/alternative ALTDS instruction might be nice...
where D and/or S could be replaced with 11-bit addresses so that D and/or S could address LUT space.
It would really be nice to be able to utilise all those mov/and/xor/add/etc instructions with LUT space. I realise this would most likely need extra clocks for access.
Maybe this is too complex, but certainly worth considering???
We would no longer require a RD/WRLUT instruction???
What would programs for that look like? I think they would start in the lut and wrap into the cog. Maybe too weird. As a feature, though, it would provide a known outcome for that singularity of a PC advance. Nobody may ever notice if it worked either way. It would be a strange thing to code for.
If it wraps into $000, that crashes into the 0 based data everyone wants ?!
Not quite a 'known outcome' there.
Of course, you could flip so LUT is $000-$1ff, and then COG is $200..$3ff-Top.
( The CODE mapping is arbitrary, and separate from the Data indexes. )
Now someone can forgo 0-based data to get a larger subroutine. C code generators can be simpler.
The tools could catch code spilling into reserved data, so that could work.
Not only that, it removes the somewhat confusing split now in CODE : DATA mapping.
Code and PC flows across the COG:LUT (or LUT:COG) but the data indexing is 0 based in both spaces.
A consequence of being one-bit-short.
Unfortunately, it does not sound a trivial change.
It would not be trivial, at all!
I think it is best if people figure that the lut memory is a separate animal for separate applications, that just happens to be able to serve as a code RAM. It is not integrated into the COG register system and, so, zero based addressing is appropriate for it. Besides, we only have 9 bit constant fields in the instructions. If we had more, the cog RAM would be bigger, itself.
That 9-bit index is key, as it forces PC (code) and data to be managed differently, so what if we get a little more creative in how the PC maps... like:
What would programs for that {LUT->COG} look like? I think they would start in the lut and wrap into the cog. Maybe too weird.
A deliberate wrap would be rather weird, but you could easily swap LUT:COG order so that boundary is no longer a wrap.
Now, PC flow makes more sense, plus users get a choice of 0 based COG ram or larger code block, and you do not lose that nice feature of flow-over.
The suggested simple swap of LUT:COG Code mapping means
a) There is no need to "RES out the special registers."
Top-mapped, they are now at the true code top, not in the middle where they cause problems.
b) LUT code starts at $000, and simply grows upward.
Having one unified code body between LUT and COG isn't a big deal. With the COG at higher addresses, don't we still lose the zero table without add case anyway? That's a sweet spot for a variety of things, and it's important we have it.
The LUT isn't intended as a default "holds code" use case. It's for data buffering, and can optionally hold code where needed to improve on a COG. With things arranged this way, we end up with some odd scenario where the LUT is more prominent as a CODE device than a DATA oriented one, and what about registers?
As it is right now, one can put some overflow code into the LUT for those cases where a COG program is just a bit too large, and not appropriate for HUB code. Perfect. The majority of cases will be data oriented ones.
With the COG at higher addresses, don't we still lose the zero table without add case anyway? That's a sweet spot for a variety of things, and it's important we have it.
Nope, the special registers move to the top. COG_0 is free for data or code. User choice.
Having one unified code body between LUT and COG isn't a big deal.
It's not a 'big deal', but the Hardware does support it.
Fragmented memory does make HLL design harder, and also constrains ASM designs.
It is simple: from the CODE_PC side, memory becomes:
  LUT_CODE from $000 to $1ff    LUT_DATA is $000 to $1ff as before
  COG_CODE from $200 to $3ff    COG_DATA is $000 to $1ff as before
  Special Registers are at $3f0 in code, $1f0 in COG_DATA space
  INT_VECT are at $3f8 in code, $1f8 in COG_DATA space
I've also figured a way to load a single block that also cleanly inits INT_VECT, which was the other down-side of moving high.
ASM builds a block image with INT_VECT first, and the HUBEXEC loader starts the DATA pointer at $3f8 ($1f8 in 9-bit terms); it wraps to COG_DATA $000 and fills COG from the base.
Saves code, and INT_VECT init is cleanly managed by the tools, not the user.
Result:
Tight ASM gets COG_0 back, but keeps the option of simple continual code for HLL designs.
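The proposed LUT:COG code mapping can be sketched as a simple address translation. This is only a model of the proposal as written in this post; the function name `map_code_pc` is hypothetical, and nothing here describes existing hardware.

```python
def map_code_pc(pc):
    """Translate a 10-bit code PC into (memory, 9-bit data index).

    Models the proposed ordering: LUT at code $000..$1ff,
    COG at code $200..$3ff. Data indexing stays 0-based in both.
    """
    if not 0 <= pc <= 0x3FF:
        raise ValueError("PC out of 10-bit range")
    if pc < 0x200:
        return ("LUT", pc)
    return ("COG", pc - 0x200)


sfr_base = map_code_pc(0x3F0)   # special registers, as listed in the post
vect_base = map_code_pc(0x3F8)  # interrupt vectors, as listed in the post
```

With this ordering the special registers and vectors fall at the very top of the code space, so a program growing upward from LUT $000 only meets them at the end.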
We can't grow our map, but we could make special instructions to write, OR, ANDN, and XOR the pin control registers. It is nice to have a mapped register to read the pins, though, for quick tests and data extracts. Then, there's the PTR registers which really only need to be set, added to, subtracted from, and read back. That way, we could have a pure cog RAM map. It seems easier to just keep them mapped.
If they are mapped, can they be loaded by coginit?
Maybe that is useful...
I don't think they are moving outside of COG space - Chip has already indicated that is too complex, and last indication was they go to the Top again.
That just leaves the question of LUT:COG or COG:LUT PC-domain ordering.
I think he was asking if the special registers can be set via coginit.
What would programs for that look like? I think they would start in the lut and wrap into the cog. Maybe too weird. As a feature, though, it would provide a known outcome for that singularity of a PC advance. Nobody may ever notice if it worked either way. It would be a strange thing to code for.
It would be useful for JMPREL. Put a JMPREL near the beginning of LUTRAM and you won't have special registers in the middle of your large jump tables.
I think he was asking if the special registers can be set via coginit.
Not how I read it, but there is a good case for load-init of the INT_Vectors, and I think that can be done if they sit above the SFRs. The SFR order could then be chosen so that the ones most likely to need init sit nearest the INT_Vectors, and a single block copy can init them and load the COG code.
yes, if special registers can be set via coginit...
Seems that could save some time and instructions...
It would save some time and instructions, and is compelling for INT_VECT, but maybe not all SFRs need init ?
If the block-copy (HUBEXEC?) routine sets a start pointer anywhere in SFR/VECT space, then you can init as much as you want in a single block copy.
The pointer wraps to COG 0 and continues...
The tools track the Init Size and required Pointer value, and collate the block of data.
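As a rough illustration of the wrapping block copy described here, the sketch below loads one image into a 512-long cog model, starting at $1F8 so the first entries land on the vectors and the rest wraps to cog $000. The names `COG_SIZE` and `load_block`, the eight-vector layout, and the data values are all placeholders assumed for the example.

```python
COG_SIZE = 0x200  # 512 longs, indexed with 9 bits


def load_block(cog, image, start):
    """Copy `image` into `cog` beginning at `start`, wrapping past $1FF to $000.

    Returns the address following the last long written.
    """
    addr = start
    for value in image:
        cog[addr] = value
        addr = (addr + 1) % COG_SIZE
    return addr


cog = [0] * COG_SIZE
vectors = [0xA0 + i for i in range(8)]   # placeholder vector values for $1F8..$1FF
code = [0x100, 0x101, 0x102]             # placeholder code longs for $000 upward
end = load_block(cog, vectors + code, 0x1F8)
```

Moving the start pointer lower into the SFR area would initialise more registers with the same single copy, which is the part the tools would need to track.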
I absolutely HATE the PIC bank switching, or special instructions.