For one, the LUT is not a primary code area. It's an option. Programs there have the same general limits that HUB programs do, and one of those is using COG registers for storage. A LUT program will be somewhat confusing for this reason alone.
add a, b happens with "a" and "b" as COG register targets, not LUT "register" targets. There are no LUT registers, and the memory isn't the same either.
Putting some instructions as COG overflow into the LUT makes a ton of sense! Considering it all one larger COG space does not make any sense. It's just not one larger COG space any more than HUB code is a larger COG space.
If there were no LUT execute, then having the COG at 0 would make great sense, as would moving the registers and interrupt vectors to the top of COG RAM where they have been the whole time. Given that LUT execute is entirely optional, and that it actually does work differently from COG code, the benefit of having COG at 0 is that it keeps the "index COG data directly, without having to add an offset" use case, which has been used a lot and may well still get used that way. Maybe somebody wants to quickly combine LUT data and COG data, for example...
If LUT is $000..$1FF and COG is $200..$3FF then
1. The programming model will be confusing !!!
- cog registers will still be $000..$1FF because there are only 9 bits.
- so registers are going to look like they are in lut space but they are not.
2. Where does LUT go if it gets expanded in a later P2? Chip already said there may be room to expand it to 4KB.
Saw that example, and I don't want mixed up COG execute and data addresses, particularly when the LUT code won't actually work the same as the COG code.
That's actually an everyday mess, where the "I want to run code in the LUT" case isn't an everyday use case. It's not worth it to have different addressing schemes.
To be frank, I would rather not have LUT execute at all if it means yet another contrived and goofy address mapping. The point of the very simple and easy COG addressing is to be easy!
Besides, it's just a load and jump. No brainer. Easy as it is right now.
particularly when the LUT code won't actually work the same as the COG code.
Why not ? Chip says COG code can run into LUT and vice versa.
Plenty have said they will prefer LUT code for workhorse tasks and use COG for where CODE-in-Data makes sense.
Other MCUs keep code out of Data space, keeping Data for more valuable things.
HUBEXEC will be the main HLL code space, followed by LUT, then COG when LUT is used for LUT.
Only a small percentage of COGS will use LUT as LUT.
1. The programming model will be confusing !!!
- cog registers will still be $000..$1FF because there are only 9 bits.
- so registers are going to look like they are in lut space but they are not.
Nope, registers are clearly in COG 9b data space.
You can just as easily claim LUT is confusing, as LUT DATA address appears to be in COG space.
DOCs are already going to need to show two memory maps, one for PC_Space (10b) and one for Data Space as no matter what order LUT:COG are done, the data space in each is 9b.
The self modifying instructions (which are used a ton in PASM code in the COG) won't work in LUT space.
You'll have to create a way to force code that needs self modifying to work into cog code space.
I think it's better to have COG space first, and LUT space an extra if you need it for code. I think it'll be used for LUT and non-code a lot more than it's used for code.
The self modifying instructions (which are used a ton in PASM code in the COG) won't work in LUT space.
You'll have to create a way to force code that needs self modifying to work into cog code space.
That is simply a tool issue, and it will be needed anyway.
I think it's better to have COG space first, and LUT space an extra if you need it for code. I think it'll be used for LUT and non-code a lot more than it's used for code.
The order does not affect who uses what, but it DOES affect whether JMPREL tables break.
It does affect who uses what, and the precise effect is losing COG execute and data at 0. I don't even consider the cross address mapping example as a starter. It's simply not needed and adds a lot of confusion to the COG, which is supposed to be easy and clear.
potatohead,
IMHO there will be a lot of objects that use LUT as additional code and memory space.
Compared with hubexec, LUT code has these advantages:
- deterministic timing
- faster
- lower power
- keeps objects within the cog
In fact, I bet more cogs will use LUT code than hubexec code.
But I agree, we don't want complex mapping schemes.
LUT_CODE from $000 to $1ff LUT_DATA is $000 to $1ff as before
COG_CODE from $200 to $3ff COG_DATA is $000 to $1ff as before
Special Registers are $3f0 in Code, $1f0 in COG_Data space
INT_VECT are $3f8 in code, $1f8 in COG_Data,
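For anyone trying to picture the LUT-first map listed above, here is a rough sketch of it. It is Python purely for illustration, not anything from the tools or the silicon; the function name `split_pc` and the constant names are made up here, and it assumes 512 longs each for LUT and COG.

```python
# Hypothetical model of the proposed LUT-first code map (names are invented).
LUT_CODE_BASE = 0x000   # proposed: LUT code at PC $000..$1FF
COG_CODE_BASE = 0x200   # proposed: COG code at PC $200..$3FF
DATA_MASK = 0x1FF       # D/S fields are 9 bits in either memory

def split_pc(pc):
    """Map a 10-bit code address to (memory, 9-bit data address)."""
    if not 0 <= pc <= 0x3FF:
        raise ValueError("PC outside the 10-bit cog/lut code space")
    memory = "COG" if pc >= COG_CODE_BASE else "LUT"
    return memory, pc & DATA_MASK

for pc in (0x3F0, 0x3F8, 0x010):
    memory, data_addr = split_pc(pc)
    print(f"PC ${pc:03X} -> {memory} data ${data_addr:03X}")
```

The point of contention shows up directly: under this map the special registers sit at $3F0 for the PC but $1F0 for D/S, so the same location has two different numbers.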
Sure!
But that's a mess, and it's not needed. LUT isn't a primary code target. It's a lookup table that can optionally contain some code. Code there won't work like COG code either.
Assuming we don't go for that mess, then it's simple: COG data and execute at 0 preserves the direct-index-without-an-add case, which we know will get used for speed.
IMHO there will be a lot of objects that use LUT as additional code and memory space.
Compared with hubexec, LUT code has these advantages:
- deterministic timing
- faster
- lower power
- keeps objects within the cog
In fact, I bet more cogs will use LUT code than hubexec code.
I agree. Likely one main COG will run HUBEXEC, and the many co-processor real-time cogs will run local code, in any user-chosen mix of LUT/COG.
Hey, if we end up stuffing a lot of code into the LUT, fine! That's why having it be a target for instructions makes sense. I'm not opposed to instructions there at all. It's a great feature, and you may be right on how much it's used too Cluso.
But it's not an extension of the COG RAM, because it doesn't work the same way as COG code, nor is it integrated with the COG registers like COG code actually is. Trying to make it look that way isn't doing anyone any good.
And that's why I don't want the funky mapping scheme at all, nor the LUT being addressed at 0. It's just fine where it is.
What about having LUT addressing be mapped to the end of address space so that it wraps into cog space (which would stay starting at 0)?
That seems more acceptable to me...
Do you mean LUT at the end of Code-address space ?
Sure, that works just as well
The PC value is arbitrary anyway, but it would need to still work with JMPREL, and an index value would need to wrap the address, which I think would happen ?
Addit: Plus short jump relative opcodes would need to wrap across, but again, I think that would happen.
To me, that is harder to explain, but it does have the psychological effect of mapping COG to 0000.
It means that labels will be the same value for branch targets as well as D/S targets (self modifying code instructions) in cog space, which is cleaner.
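A small sketch of the label point, assuming a label is just an offset into COG RAM and that D/S fields are 9 bits. `label_views` is a made-up helper for illustration, not a tool feature.

```python
DATA_MASK = 0x1FF  # 9-bit D/S field

def label_views(cog_offset, cog_code_base):
    """Return (branch target, D/S address) for a label at cog_offset in COG RAM."""
    branch_target = cog_code_base + cog_offset   # what a jump would use
    ds_address = cog_offset & DATA_MASK          # what D/S (self-modifying code) would use
    return branch_target, ds_address

# COG code at 0: both views of the label are the same number.
print(label_views(0x042, cog_code_base=0x000))
# COG code at $200 (LUT first): the two views differ by $200.
print(label_views(0x042, cog_code_base=0x200))
```

With COG at 0 the tools can hand out one value per label; with COG at $200 they would have to track two, or mask off the upper bit.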
It means that labels will be the same value for branch targets as well as D/S targets (self modifying code instructions) in cog space, which is cleaner.
The 10th bit would be ignored by the tools anyway, but I see your point.
If the tools define a LUT_BASE constant then a simple change would allow other Memory sizes. ( Thinking ahead to FPGA versions and future variants here.)
Roy,
I don't really understand what you are getting at.
For LUT being $200..3FF, jmp/call/djxx/etc can all be relative +/-8bits and be able to go between cog and lut.
This would not work for LUT being $100000..$1001FF.
The latest is cog $000..1FF, lut $200..3FF.
Currently SETQ loads up to 511 longs into cog ram, SETQ2 loads up to 511 longs into lut ram (does 0=512 longs???).
With this, the compiler could have an ORG LUT which is $200.
Cluso, note I said wrap into cog space. This works both ways. Relative branches with negative values would wrap below 0 to LUT space.
It wouldn't be $100000 to $1001FF, it would be $FFE00 - $FFFFF (remember PC is 20bits and would wrap at $FFFFF back to $0).
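To make the wrap idea concrete, here is a rough sketch assuming a 20-bit PC and 512 longs of LUT sitting at $FFE00..$FFFFF. The names `rel_target` and `in_lut` are invented, and whether the offset is taken from the branch itself or from the following instruction is not modelled.

```python
PC_MASK = 0xFFFFF    # 20-bit program counter
LUT_BASE = 0xFFE00   # assumed: 512 longs of LUT at the top of code space

def rel_target(pc, offset):
    """Relative branch target with 20-bit wraparound."""
    return (pc + offset) & PC_MASK

def in_lut(pc):
    """True if a code address falls in the assumed LUT window."""
    return LUT_BASE <= pc <= PC_MASK

back = rel_target(0x00002, -4)    # branch backwards from low COG
fwd = rel_target(0xFFFFF, 1)      # step forwards off the end of LUT
print(f"${back:05X} in LUT: {in_lut(back)}")
print(f"${fwd:05X} in LUT: {in_lut(fwd)}")
```

A negative offset from low COG lands in the top of LUT, and stepping past $FFFFF lands back at COG $00000, which is the "works both ways" behaviour described above.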
Roy,
I don't really understand what you are getting at.
For LUT being $200..3FF, jmp/call/djxx/etc can all be relative +/-8bits and be able to go between cog and lut.
This would not work for LUT being $100000..$1001FF.
I think Roy means down from $100000, so then relative opcodes simply wrap.
I think Chip might be saving the top of address space in case he can squeeze in a full 1MB of HUB RAM...
That would be ok, with something like LUTSEG and COGSEG the large numbers would be insulated from users anyway.
Alternatively, think of LUT org as -0x200 & the tools manage the wrap to match the silicon.
Regardless of where lutram ends up, relative branches from cogexec or lutexec could be made to always stay in cog or lut and never go into hubexec. For example, supposing lutram still has $200 longs, "jmp @$400" and "jmp @-$400" would both jump to themselves.
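A quick sketch of that "stay local" behaviour, treating cog plus lut as one $400-long window and wrapping the branch within it. This only illustrates the arithmetic; `local_rel_target` is a made-up name, and `pc` here is an index into the combined cog+lut window rather than a real hardware address.

```python
def local_rel_target(pc, offset, cog_longs=0x200, lut_longs=0x200):
    """Relative branch that wraps inside the combined cog+lut window."""
    span = cog_longs + lut_longs      # $400 longs with the sizes assumed here
    return (pc + offset) % span       # Python's % is non-negative for a positive span

here = 0x123
print(hex(local_rel_target(here, 0x400)))    # jmp @$400
print(hex(local_rel_target(here, -0x400)))   # jmp @-$400
```

Both calls return the starting address, matching the "would both jump to themselves" example, and no offset can ever produce a target outside the window.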
Regardless of where lutram ends up, relative branches from cogexec or lutexec could be made to always stay in cog or lut and never go into hubexec. For example, supposing lutram still has $200 longs, "jmp @$400" and "jmp @-$400" would both jump to themselves.
Interesting idea. I think Chip has said JMP/CALL is needed anyway to cross into HUBEXEC, but I think he also said RJMPs would do that?
Q : Assuming it is even possible, does the change in RJMP reach affect the Verilog much ?
Roy,
OK, I understand now. I would still prefer LUT to follow on from COG.
I would like to leave 8KB of hub free for 2KB Cog and up to 6KB of LUT just in case there is space, or later version.
In fact if there's space, 1 or 2 cogs with 6KB LUT would be nice. I know the purists will object although I disagree.
So we'd start hubexec at $02000 (byte address).
Cluso,
If the lut memory were mapped at the end of code address space, then it could grow without changing where hub code space starts.
In fact, hub code address space would start earlier in hub with this change too. $200 instead of $400. In the 512KB case we lose less hub exec space.
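The numbers being traded here can be checked with a little arithmetic. This sketch assumes COG is always $200 longs at the bottom and only asks where hub execution would begin; `hubexec_start` is a hypothetical helper, and the 6KB LUT figure is the "just in case" size floated earlier in the thread.

```python
COG_LONGS = 0x200   # 2KB of COG RAM, addressed as longs
BYTES_PER_LONG = 4

def hubexec_start(lut_longs, lut_at_top):
    """First long address that would execute from hub."""
    if lut_at_top:
        return COG_LONGS              # only COG shadows the bottom of hub
    return COG_LONGS + lut_longs      # COG then LUT shadow the bottom of hub

for lut_longs, at_top in ((0x200, False), (0x200, True), (0x600, False)):
    start = hubexec_start(lut_longs, at_top)
    print(f"LUT ${lut_longs:03X} longs, top={at_top}: "
          f"hubexec from ${start:03X} (byte ${start * BYTES_PER_LONG:05X})")
```

With LUT following COG the hole is $400 longs (4KB), or $02000 bytes if 6KB is reserved for LUT; with LUT at the top of code space the hole at the bottom stays at $200 longs however large LUT grows.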
Cluso,
If the lut memory were mapped at the end of code address space, then it could grow without changing where hub code space starts.
In fact, hub code address space would start earlier in hub with this change too. $200 instead of $400. In the 512KB case we lose less hub exec space.
I see it differently...
If it sits at the end of hub and down, then it impacts two sets of addressing blocks.
I don't see an 8KB "hole" at the bottom of hub for cog/lut being a problem (without expansion, it would only be 4KB).
I think it's also a simpler explanation too.
However, I can live with cog & lut mapped anywhere in hub. I just prefer the bottom for both.
Comments
add a, b happens with "a" and "b" as COG register targets, not LUT "register" targets. There are no LUT registers, and the memory isn't the same either.
Putting some instructions as COG overflow into the LUT makes a ton of sense! Considering it all one larger COG space does not make any sense. It's just not one larger COG space any more than HUB code is a larger COG space.
If there were no LUT execute, then having the COG at 0 would make great sense, as would moving the registers and interrupt vectors to the top of COG RAM where they have been the whole time. Given that LUT execute is entirely optional, and that it actually does work differently from COG code, the benefit of having COG at 0 is that it keeps the "index COG data directly, without having to add an offset" use case, which has been used a lot and may well still get used that way. Maybe somebody wants to quickly combine LUT data and COG data, for example...
That data at 0 case is more than jump tables.
? Err, Data at 0 has not gone away.
The example I gave was for larger jump tables.
Properly & easily managed with LUT:COG, but a mess COG:LUT.
LUT is an extension of the cog space or lookup.
If LUT is $000..$1FF and COG is $200..$3FF then
1. The programming model will be confusing !!!
- cog registers will still be $000..$1FF because there are only 9 bits.
- so registers are going to look like they are in lut space but they are not.
2. Where does LUT go if it gets expanded in a later P2? Chip already said there may be room to expand it to 4KB.
That's actually an everyday mess, where the "I want to run code in the LUT" case isn't an everyday use case. It's not worth it to have different addressing schemes.
To be frank, I would rather not have LUT execute at all if it means yet another contrived and goofy address mapping. The point of the very simple and easy COG addressing is to be easy!
Besides, it's just a load and jump. No brainer. Easy as it is right now.
Plenty have said they will prefer LUT code for workhorse tasks and use COG for where CODE-in-Data makes sense.
Other MCUs keep code out of Data space, keeping Data for more valuable things.
HUBEXEC will be the main HLL code space, followed by LUT, then COG when LUT is used for LUT.
Only a small percentage of COGS will use LUT as LUT.
Nope, registers are clearly in COG 9b data space.
You can just as easily claim LUT is confusing, as LUT DATA address appears to be in COG space.
DOCs are already going to need to show two memory maps, one for PC_Space (10b) and one for Data Space as no matter what order LUT:COG are done, the data space in each is 9b.
You'll have to create a way to force code that needs self modifying to work into cog code space.
I think it's better to have COG space first, and LUT space an extra if you need it for code. I think it'll be used for LUT and non-code a lot more than it's used for code.
The order does not affect who uses what, but it DOES affect whether JMPREL tables break.
COG execute is not gone, and COG Data at 00 is free. Nothing is lost.
IMHO there will be a lot of objects that use LUT as additional code and memory space.
Compared with hubexec, LUT code has these advantages:
- deterministic timing
- faster
- lower power
- keeps objects within the cog
In fact, I bet more cogs will use LUT code than hubexec code.
But I agree, we don't want complex mapping schemes.
LUT_CODE from $000 to $1ff LUT_DATA is $000 to $1ff as before
COG_CODE from $200 to $3ff COG_DATA is $000 to $1ff as before
Special Registers are $3f0 in Code, $1f0 in COG_Data space
INT_VECT are $3f8 in code, $1f8 in COG_Data,
Sure!
But that's a mess, and it's not needed. LUT isn't a primary code target. It's a lookup table that can optionally contain some code. Code there won't work like COG code either.
Assuming we don't go for that mess, then it's simple: COG data and execute at 0 preserves the direct-index-without-an-add case, which we know will get used for speed.
I agree - flipping the Code PC.10 bit is very far from a 'complex mapping scheme'.
But it's not an extension of the COG RAM, because it doesn't work the same way as COG code, nor is it integrated with the COG registers like COG code actually is. Trying to make it look that way isn't doing anyone any good.
And that's why I don't want the funky mapping scheme at all, nor the LUT being addressed at 0. It's just fine where it is.
That seems more acceptable to me...
Sure, that works just as well
The PC value is arbitrary anyway, but it would need to still work with JMPREL, and an index value would need to wrap the address, which I think would happen ?
Addit: Plus short jump relative opcodes would need to wrap across, but again, I think that would happen.
To me, that is harder to explain, but it does have the psychological effect of mapping COG to 0000.
It means that labels will be the same value for branch targets as well as D/S targets (self modifying code instructions) in cog space, which is cleaner.
If the tools define a LUT_BASE constant then a simple change would allow other Memory sizes. ( Thinking ahead to FPGA versions and future variants here.)
I don't really understand what you are getting at.
For LUT being $200..3FF, jmp/call/djxx/etc can all be relative +/-8bits and be able to go between cog and lut.
This would not work for LUT being $100000..$1001FF.
The latest is cog $000..1FF, lut $200..3FF.
Currently SETQ loads up to 511 longs into cog ram, SETQ2 loads up to 511 longs into lut ram (does 0=512 longs???).
With this, the compiler could have an ORG LUT which is $200.
It wouldn't be $100000 to $1001FF, it would be $FFE00 - $FFFFF (remember PC is 20bits and would wrap at $FFFFF back to $0).
1MB of hub is still usable as data (via RDxxxx/WRxxxx), we have 1MB minus 1k (I believe, maybe it's 4k?) for hubexec.
That would be ok, with something like LUTSEG and COGSEG the large numbers would be insulated from users anyway.
Alternatively, think of LUT org as -0x200 & the tools manage the wrap to match the silicon.
Q : Assuming it is even possible, does the change in RJMP reach affect the Verilog much ?
OK, I understand now. I would still prefer LUT to follow on from COG.
I would like to leave 8KB of hub free for 2KB Cog and up to 6KB of LUT just in case there is space, or later version.
In fact if there's space, 1 or 2 cogs with 6KB LUT would be nice. I know the purists will object although I disagree.
So we'd start hubexec at $02000 (byte address).
If the lut memory were mapped at the end of code address space, then it could grow without changing where hub code space starts.
In fact, hub code address space would start earlier in hub with this change too. $200 instead of $400. In the 512KB case we lose less hub exec space.
That's a good point, and we are likely to see LUT variants in FPGA versions, for example
Another benefit..
If it sits at the end of hub and down, then it impacts two sets of addressing blocks.
I don't see an 8KB "hole" at the bottom of hub for cog/lut being a problem (without expansion, it would only be 4KB).
I think it's also a simpler explanation too.
However, I can live with cog & lut mapped anywhere in hub. I just prefer the bottom for both.