I prefer the old model, with hubexec only being $1000 and above. I think it's totally fine that the first chunk of hub ram can't be hubexec'd.
Also, Seairth, the cog memory needs to be in the first block so that immediate mode numbers (9 bits) can address cog memory space without some funky remapping.
The data addressing wouldn't change. Only the instruction addressing. And none of those use immediate mode.
I prefer the old model, with hubexec only being $1000 and above. I think it's totally fine that the first chunk of hub ram can't be hubexec'd.
Also, Seairth, the cog memory needs to be in the first block so that immediate mode numbers (9 bits) can address cog memory space without some funky remapping.
Exactly my thought.
Hubexec will need addresses longer than 9 bits anyway. And we may still be able to use addresses above 512K without restrictions in future FPGA versions. Any ROM could be placed at the end of the address space?
Losing 4K of 512K for Hubexec is not so bad, compared to a complicated memory system.
Another thought would be that there is no hub-memory below $1000. So HUB starts at $1000 and ends at $1000+512K. Then there would be space for the ROM at $0.
But sharing the same address space, depending on whether an address is aligned or not, just to 'save' 4K more for HubExec needs a lot of explanation and documentation, and is not worth it. To say HubExec cannot start below $1000 is one sentence.
And I am quite sure that any compiler writer will be happier with a simpler memory model.
This came up because the ROM gets loaded into $00000..$03FFF on startup and I needed to be able to execute it in place.
I think having these new address rules is not bad, at all. It does not impinge upon anything that existed before, but allows you to execute from the hub in what was cog/LUT-only area, but at offsets that you never would have used in cog/LUT code. In fact, the assembler errors out if you are in cog/LUT mode (ORG, as opposed to ORGH) and try to assemble anything at non-long-aligned addresses. That is within the context of a cog's memory, though, and not the hub's memory.
You can think of it like this: everything is hub-executable, except addresses: %0000000000xxxxxxxxxx00. Is that really so bad? It doesn't affect anything you might do with hub memory. It just means that if the cog's program counter is in the address range %0000000000xxxxxxxxxx00, it gets its instruction from its own memory, as opposed to the hub's.
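For anyone who wants to poke at the rule, here is a rough Python sketch of the fetch decision as described above. The names `fetch_source` and `COG_LUT_LIMIT` are made up for illustration, and the cog/LUT split at $800 is taken from the ranges quoted in this thread. It models the proposal only, not the actual Verilog.

```python
COG_LUT_LIMIT = 0x1000  # byte addresses below this overlap cog/LUT space (per the thread)


def fetch_source(pc: int) -> str:
    """Return which memory supplies the instruction for program counter `pc`.

    Assumption: long-aligned addresses below $1000 fetch from the cog's own
    memory ($000..$7FF = cog, $800..$FFF = LUT); everything else is hub-exec.
    """
    long_aligned = (pc & 0b11) == 0
    if pc < COG_LUT_LIMIT and long_aligned:
        return "cog" if pc < 0x800 else "lut"
    return "hub"
```

So `fetch_source(0x000)` lands in cog RAM, while `fetch_source(0x001)` is hub-exec even though it is below $1000.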
Here is what code (initially hub-exec, remember) looks like when it wants to load the cog with executable code immediately following, and then run it:
            orgh
            <this is the entry point of your hub-exec program>
hcode       setq    #$1F7                   'ready to load $1F8 registers (can be reduced, of course)
            rdlong  0,ptrb[ccode - hcode]   'load registers $000..$1F7 (doesn't need to start at 0)
            jmp     #0                      'jump to loaded code in cog RAM
ccode
            org
            <your cog-exec program goes here>
I can see that we are going to need three ORG instructions:
ORGH - set or switch to hub origin
ORGC - set or switch to cog origin
ORGL - set or switch to LUT origin
Maybe we could just make ORGH into plain old ORG, since hub is the default now.
I was thinking about how addresses above $1000 are hub-exec, while addresses $0..$7FF are cog-exec and $800..$FFF are LUT-exec, and what a pain it is that you can't have hub-executable code below $1000. Then, it dawned on me that cog/LUT-exec could be restricted to long-aligned addresses, only, allowing hub exec to occur on non-long-aligned addresses below $1000. Here's the new way:
My initial reaction was just like the rest of you... No way!
In fact I wrote a post outlining some other suggestions but before I posted it I had second thoughts...
I love it !!!
The writeup just says that hub-exec is limited to hub memory above 4KB (this may go up in later revisions).
Then describe a quirk that can be used to get hub-exec to work in the lower 4KB.
This makes it easy to understand.
Besides, most code that will likely end up in the lower 4KB Hub space will be some form of hub-exec bootup & monitor code that will likely be provided in an object (or copied from ROM). So the normal user can forget about this.
As a secondary benefit, a number of us wanted a space for mailboxes to pass info between cogs. This could now be forced upon us
(4.) The hub ROM is read via COGID WC, sequentially. This happens at boot-up and the contents are loaded into the first 16KB of RAM and executed by cog0. 16KB is complete overkill, for now, but it is sufficient room for a complete on-chip development system in the future.
Does that mean you have a price for a ROM-revision mask ?
I was thinking about how addresses above $1000 are hub-exec, while addresses $0..$7FF are cog-exec and $800..$FFF are LUT-exec, and what a pain it is that you can't have hub-executable code below $1000. Then, it dawned on me that cog/LUT-exec could be restricted to long-aligned addresses, only, allowing hub exec to occur on non-long-aligned addresses below $1000. Here's the new way:
Interesting approach, users can still use LOWEST HUB as Arrays/Data right ?
Does adding that wider decode have much of a speed penalty ?
What about access outside HUB - does/can that generate a trap (interrupt?) like some MCUs do.
Doing that would give a low-level way to manage off-chip access for large data.
Jmg, you just gave me a hideous idea that is sure to be a hit:
ORGH - set or switch to hub origin
ORGC - set or switch to cog origin
ORGL - set or switch to LUT origin
Maybe we could just make ORGH into plain old ORG, since hub is the default now.
So...
ORG [offset] ' hub (any value below $1000 whose 2 LSBs are zero is an error)
ORGC [offset] ' cog (any value outside of $000-$7FF or whose 2 LSBs are not zero is an error)
ORGL [offset] ' lut (any value outside of $800-$FFF or whose 2 LSBs are not zero is an error)
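The three error rules listed above can be sketched as a small checker. This is only an illustration in Python: `check_org` is a hypothetical helper, not part of any real assembler, and it treats offsets as byte addresses the way the list does.

```python
def check_org(directive: str, offset: int) -> int:
    """Validate an ORG/ORGC/ORGL offset against the rules proposed above.

    Returns the offset unchanged, or raises ValueError when the rule
    for that directive is violated.
    """
    long_aligned = (offset & 0b11) == 0
    if directive == "ORG":
        # hub: long-aligned values below $1000 would decode as cog/LUT
        ok = not (offset < 0x1000 and long_aligned)
    elif directive == "ORGC":
        # cog: must be inside $000..$7FF and long-aligned
        ok = 0x000 <= offset <= 0x7FF and long_aligned
    elif directive == "ORGL":
        # LUT: must be inside $800..$FFF and long-aligned
        ok = 0x800 <= offset <= 0xFFF and long_aligned
    else:
        raise ValueError(f"unknown directive {directive!r}")
    if not ok:
        raise ValueError(f"{directive} ${offset:05X} is not allowed")
    return offset
```

With this, `check_org("ORG", 0x400)` is rejected, while `check_org("ORG", 0x401)` and `check_org("ORGC", 0x400)` pass.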
Alternatively, you could get rid of DAT and add HUB, COG, and LUT. Then ORG is dependent on which section you are in. This might provide a bit better documentation than trying to spot ORGx directives to differentiate between code/data locations.
This keeps us sane. Plus, it cuts the special-awareness zone down to $00000..$007FF.
Maybe something even better is possible...
An appeal of that new idea, is the assembler can easily check every instruction within the 3 Memory segments you mentioned above & report that in a MAP file.
I can see that we are going to need three ORG instructions:
ORGH - set or switch to hub origin
ORGC - set or switch to cog origin
ORGL - set or switch to LUT origin
Most ASM's have an align operator, here
ORGC has an implicit align xx00
ORGL has an implicit align xx01
ORGH is neither of the above, but needs a value, maybe x10
What does xx11 then do ?, alias onto xx10 ? for xx1x decode ?
Is there a spare bit not an address bit somewhere to make it modal?
That bit has to be generated from something. The PC, while not ideal, seems to be the best mechanism to exploit for determining cog/lut/hub instruction space.
Alternatively, you could get rid of DAT and add HUB, COG, and LUT. Then ORG is dependent on which section you are in. This might provide a bit better documentation than trying to spot ORGx directives to differentiate between code/data locations.
Actually, in the cases of COG and LUT, it would just be implied that they start with the appropriate ORG ($0000 for COG, $0800 for LUT). For HUB, it starts at whatever the current hub address is (basically the same as what DAT is now). ORG would only be used to override the defaults.
This keeps us sane. Plus, it cuts the special-awareness zone down to $00000..$007FF.
Maybe something even better is possible...
An appeal of that new idea, is the assembler can easily check every instruction within the 3 Memory segments you mentioned above
I can see that we are going to need three ORG instructions:
ORGH - set or switch to hub origin
ORGC - set or switch to cog origin
ORGL - set or switch to LUT origin
Most ASM's have an align operator, here
ORGC has an implicit align xx00
ORGL has an implicit align xx01
ORGH is neither of the above, but needs a value, maybe x10
What does xx11 then do ?, alias onto xx10 ? for xx1x decode ?
All addresses have a hub address (19 bits) and a contextual cog/LUT address (9 bits). They needn't have common LSB's.
Why can't the ROM entry point just be at $01000? So when the first cog starts up it starts running the ROM code at $01000 instead of $00000 (or now $00001). Then the cog/lut image can be in the first 4k if you want, and the code at $01000 can load that into the cog and jump to $00000.
This seems a lot cleaner and simpler to me than concocting this oddball thing where jumping to odd addresses below $01000 = hub exec, but jumping to long aligned addresses goes to cog.
Code generation for hub exec code in the bottom 4k now has to make sure everything stays not long aligned for destinations for branches. Which will be the case if there is no embedded data, I guess, but still it's now a concern where it wasn't before. At least in the hand coding space you have to make sure you specify (or calculate) non-long aligned addresses for branches when in the first 4k so as not to accidentally branch into cog or lut code (which is perfectly reasonable to do in hub exec code).
To me it makes things more complex than they need to be, and that's not fun.
This should make the verilog very simple. In all cases, the 2 LSBs are set to zero. This means that hub instructions will still be long aligned, and the addressing is contiguous. This shouldn't be an issue for the cog or lut for the most part. Your code above would instead look like:
            orgh
            <this is the entry point of your hub-exec program>
hcode       setq    #$1F7                   'ready to load $1F8 registers (can be reduced, of course)
            rdlong  0,ptrb[ccode - hcode]   'load registers $000..$1F7 (doesn't need to start at 0)
            jmp     #%10                    'jump to loaded code in cog RAM
ccode
            org
            <your cog-exec program goes here>
More likely, like:
            orgh
            <this is the entry point of your hub-exec program>
hcode       setq    #$1F7                   'ready to load $1F8 registers (can be reduced, of course)
            rdlong  0,ptrb[ccode - hcode]   'load registers $000..$1F7 (doesn't need to start at 0)
            jmp     #entry                  'jump to loaded code in cog RAM
ccode
            org
entry
            <your cog-exec program goes here>
Code generation for hub exec code in the bottom 4k now has to make sure everything stays not long aligned for destinations for branches. Which will be the case if there is no embedded data, I guess, but still it's now a concern where it wasn't before. At least in the hand coding space you have to make sure you specify (or calculate) non-long aligned addresses for branches when in the first 4k so as not to accidentally branch into cog or lut code (which is perfectly reasonable to do in hub exec code).
To me it makes things more complex than they need to be, and that's not fun.
Surely an assembler (or compiler) manages all that housekeeping, once you specify the segment.
It only appears complex, if you try to go deeper & code in hex, but PCs are great at this sort of low level house keeping.
Why can't the ROM entry point just be at $01000? So when the first cog starts up it starts running the ROM code at $01000 instead of $00000 (or now $00001). Then the cog/lut image can be in the first 4k if you want, and the code at $01000 can load that into the cog and jump to $00000.
This seems a lot cleaner and simpler to me than concocting this oddball thing where jumping to odd addresses below $01000 = hub exec, but jumping to long aligned addresses goes to cog.
Code generation for hub exec code in the bottom 4k now has to make sure everything stays not long aligned for destinations for branches. Which will be the case if there is no embedded data, I guess, but still it's now a concern where it wasn't before. At least in the hand coding space you have to make sure you specify (or calculate) non-long aligned addresses for branches when in the first 4k so as not to accidentally branch into cog or lut code (which is perfectly reasonable to do in hub exec code).
To me it makes things more complex than they need to be, and that's not fun.
I agree that complex is not good. Simple linear address mapping is easiest to understand and doesn't tax one's mind unnecessarily.
The ROM could be made to load starting at $1000.
I'm also thinking that we don't need a special ORGL instruction, but just the ORG we use for the cog, allowing values (long-index) $000..$1F7, then $200..$3FF.
Man, I wish we could use simple long-index addresses within cog space to get around this everything-times-4 issue. It's a brain-bender.
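To make the everything-times-4 issue concrete, here is a small Python sketch of the long-index view of cog space. The helper names are hypothetical; the only facts taken from the post are the two index ranges ($000..$1F7 and $200..$3FF) and the times-4 relation between long index and byte address.

```python
def long_index_to_byte(index: int) -> int:
    """Convert a long index inside cog space to its byte address (index * 4)."""
    return index << 2


def org_region(index: int) -> str:
    """Classify a long-index ORG value using the ranges quoted in the post."""
    if 0x000 <= index <= 0x1F7:
        return "cog"
    if 0x200 <= index <= 0x3FF:
        return "lut"
    # $1F8..$1FF and anything above $3FF fall outside the quoted ranges
    raise ValueError(f"long index ${index:03X} is outside $000..$1F7 and $200..$3FF")
```

For example, long index $200 (the first LUT long) corresponds to byte address $800, which matches the $800..$FFF LUT range mentioned earlier in the thread.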
Chip,
The ROM could still load into RAM starting at $00000. Just make the cog start hub exec at $01000. Then the image to be loaded into cog/lut could be in the first 4k, and the code at $01000 in hub could load that first 4k into the cog/lut and jump to it.
This of course is just one example of a simple startup, but in any case, I think keeping stuff simpler here is the wise choice.
%000000000_xxxxxxxxx_00 = HUB instruction addresses (must be long aligned for lower 2KB Hub)
%000000000_xxxxxxxxx_01 = COG instruction addresses & Register space
%000000000_xxxxxxxxx_10 = LUT instruction addresses
%000000000_xxxxxxxxx_11 = reserved for future 4KB COG or LUT instruction addresses
%xxxxxxxxx_xxxxxxxxx_xx = HUB instruction addresses >%000000000_FFFFFFFFF_00
This just means that when the lower 2KB HUB is used in hub-exec mode, instructions MUST be long aligned.
I expect mostly we will desire long aligned hub-exec to minimise clock stalls.
Is there any reason that the ROM could not just be loaded into HUB 2KB (%000000001_000000000_00) and above, and execution start at this address ???
Maybe the FUSES could be read into HUB $0 upwards, and the secure section cleared if security is enabled.
As for ORGC/ORGL/ORGH...
I would prefer to use ORGx rather than plain ORG for one of them (i.e. always declare the space explicitly).
Actually ORGCOG/ORGLUT/ORGHUB is more intuitive.
Why do we need an ORGL (or ORGLUT) ???
Couldn't ORGC (ORGCOG) just be expanded to 4KB where the lower 2KB is COG/Registers and the upper 2KB is LUT ?
Also, won't LUT be used more for cog-exec or extended cog memory, rather than LUT usage?
If so, then maybe we need to think of a better name than LUT. Extended COG Memory is a bit of a mouthful though.
Comments
Also, Seairth, the cog memory needs to be in the first block so that immediate mode numbers (9 bits) can address cog memory space without some funky remapping.
The data addressing wouldn't change. Only the instruction addressing. And none of those use immediate mode.
Exactly my thought.
Hubexec will need addresses longer than 9 bits anyway. And we may still be able to use addresses above 512K without restrictions in future FPGA versions. Any ROM could be placed at the end of the address space?
Losing 4K of 512K for Hubexec is not so bad, compared to a complicated memory system.
Another thought would be that there is no hub-memory below $1000. So HUB starts at $1000 and ends at $1000+512K. Then there would be space for the ROM at $0.
But sharing the same address space, depending on whether an address is aligned or not, just to 'save' 4K more for HubExec needs a lot of explanation and documentation, and is not worth it. To say HubExec cannot start below $1000 is one sentence.
And I am quite sure that any compiler writer will be happier with a simpler memory model.
Enjoy!
Mike
It's only hard if somebody wants to put code there.
Enjoy!
Mike
This way the original model is maintained, but above 512KB/1MB/whatever the PC folds back and code could still be executed from the lower 4KB.
Also nice.
Mike
Thanks for all your passionate responses!
This came up because the ROM gets loaded into $00000..$03FFF on startup and I needed to be able to execute it in place.
I think having these new address rules is not bad, at all. It does not impinge upon anything that existed before, but allows you to execute from the hub in what was cog/LUT-only area, but at offsets that you never would have used in cog/LUT code. In fact, the assembler errors out if you are in cog/LUT mode (ORG, as opposed to ORGH) and try to assemble anything at non-long-aligned addresses. That is within the context of a cog's memory, though, and not the hub's memory.
You can think of it like this: everything is hub-executable, except addresses: %0000000000xxxxxxxxxx00. Is that really so bad? It doesn't affect anything you might do with hub memory. It just means that if the cog's program counter is in the address range %0000000000xxxxxxxxxx00, it gets its instruction from its own memory, as opposed to the hub's.
I can see that we are going to need three ORG instructions:
ORGH - set or switch to hub origin
ORGC - set or switch to cog origin
ORGL - set or switch to LUT origin
Maybe we could just make ORGH into plain old ORG, since hub is the default now.
In fact I wrote a post outlining some other suggestions but before I posted it I had second thoughts...
I love it !!!
The writeup just says that hub-exec is limited to hub memory above 4KB (this may go up in later revisions).
Then describe a quirk that can be used to get hub-exec to work in the lower 4KB.
This makes it easy to understand.
Besides, most code that will likely end up in the lower 4KB Hub space will be some form of hub-exec bootup & monitor code that will likely be provided in an object (or copied from ROM). So the normal user can forget about this.
As a secondary benefit, a number of us wanted a space for mailboxes to pass info between cogs. This could now be forced upon us
I just know that it's possible.
Jmg, you just gave me a hideous idea that is sure to be a hit:
%000000000xxxxxxxxx00 = cog instruction addresses
%000000000xxxxxxxxx01 = LUT instruction addresses
all others = hub instruction addresses
This keeps us sane. Plus, it cuts the special-awareness zone down to $00000..$007FF.
Maybe something even better is possible...
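The LSB-based mapping a few lines up (LSBs %00 = cog, %01 = LUT, everything else hub) can be modelled in a few lines of Python. This is a sketch under assumptions: `fetch_source_lsb` is an invented name, and `pc` is taken to be a non-negative 20-bit byte address whose top 9 bits must be zero for the cog/LUT cases.

```python
def fetch_source_lsb(pc: int) -> str:
    """Decode %000000000xxxxxxxxx00 as cog, ...01 as LUT, all others as hub."""
    top_bits_zero = (pc >> 11) == 0   # only $00000..$007FF can be cog or LUT
    low_bits = pc & 0b11
    if top_bits_zero and low_bits == 0b00:
        return "cog"
    if top_bits_zero and low_bits == 0b01:
        return "lut"
    return "hub"
```

Only addresses in $00000..$007FF ever decode as cog or LUT here, which is the smaller special-awareness zone mentioned above.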
Later, when there is an on chip system, it may prove wise to make a lower RAM map for some specific purposes.
On the plus side, we got lots of ideas to choose from!
So...
Alternatively, you could get rid of DAT and add HUB, COG, and LUT. Then ORG is dependent on which section you are in. This might provide a bit better documentation than trying to spot ORGx directives to differentiate between code/data locations.
An appeal of that new idea, is the assembler can easily check every instruction within the 3 Memory segments you mentioned above & report that in a MAP file.
Most ASM's have an align operator, here
ORGC has an implicit align xx00
ORGL has an implicit align xx01
ORGH is neither of the above, but needs a value, maybe x10
What does xx11 then do ?, alias onto xx10 ? for xx1x decode ?
That bit has to be generated from something. The PC, while not ideal, seems to be the best mechanism to exploit for determining cog/lut/hub instruction space.
Actually, in the cases of COG and LUT, it would just be implied that they start with the appropriate ORG ($0000 for COG, $0800 for LUT). For HUB, it starts at whatever the current hub address is (basically the same as what DAT is now). ORG would only be used to override the defaults.
All addresses have a hub address (19 bits) and a contextual cog/LUT address (9 bits). They needn't have common LSB's.
Why can't the ROM entry point just be at $01000? So when the first cog starts up it starts running the ROM code at $01000 instead of $00000 (or now $00001). Then the cog/lut image can be in the first 4k if you want, and the code at $01000 can load that into the cog and jump to $00000.
This seems a lot cleaner and simpler to me than concocting this oddball thing where jumping to odd addresses below $01000 = hub exec, but jumping to long aligned addresses goes to cog.
Code generation for hub exec code in the bottom 4k now has to make sure everything stays not long aligned for destinations for branches. Which will be the case if there is no embedded data, I guess, but still it's now a concern where it wasn't before. At least in the hand coding space you have to make sure you specify (or calculate) non-long aligned addresses for branches when in the first 4k so as not to accidentally branch into cog or lut code (which is perfectly reasonable to do in hub exec code).
To me it makes things more complex than they need to be, and that's not fun.
Since you have pointed out that hubexec is the "default", how about:
%xxxxxxxxxxxxxxxxxx00 = hub instruction addresses
%xxxxxxxxxxxxxxxxxx01 = hub instruction addresses
%000000000xxxxxxxxx10 = cog instruction addresses
%000000000xxxxxxxxx11 = LUT instruction addresses
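As a rough model of the four patterns just listed, here is a Python sketch. `decode_pc` is a hypothetical name, and the behaviour for addresses with LSBs %10 or %11 and non-zero upper bits is not stated in the list, so the sketch reports those as "unspecified" rather than guessing.

```python
def decode_pc(pc: int) -> tuple[str, int]:
    """Return (space, address with the 2 LSBs cleared) for the mapping above."""
    low_bits = pc & 0b11
    aligned = pc & ~0b11              # the 2 LSBs are set to zero in all cases
    if low_bits in (0b00, 0b01):
        return ("hub", aligned)
    if (pc >> 11) == 0:               # top 9 bits zero: cog or LUT
        return ("cog" if low_bits == 0b10 else "lut", aligned)
    return ("unspecified", aligned)   # assumption: not covered by the list
```

Under this model `decode_pc(0b10)` gives `("cog", 0)`, i.e. the first cog long.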
This should make the verilog very simple. In all cases, the 2 LSBs are set to zero. This means that hub instructions will still be long aligned, and the addressing is contiguous. This shouldn't be an issue for the cog or lut for the most part. Your code above would instead look like:
More likely, like:
Surely an assembler (or compiler) manages all that housekeeping, once you specify the segment.
It only appears complex, if you try to go deeper & code in hex, but PCs are great at this sort of low level house keeping.
I agree that complex is not good. Simple linear address mapping is easiest to understand and doesn't tax one's mind unnecessarily.
The ROM could be made to load starting at $1000.
I'm also thinking that we don't need a special ORGL instruction, but just the ORG we use for the cog, allowing values (long-index) $000..$1F7, then $200..$3FF.
Man, I wish we could use simple long-index addresses within cog space to get around this everything-times-4 issue. It's a brain-bender.
LSB order mapping choice can be anything, but the merit I see in COG at xx00 is that it does not lose a single byte of very valuable COG memory.
The ROM could still load into RAM starting at $00000. Just make the cog start hub exec at $01000. Then the image to be loaded into cog/lut could be in the first 4k, and the code at $01000 in hub could load that first 4k into the cog/lut and jump to it.
This of course is just one example of a simple startup, but in any case, I think keeping stuff simpler here is the wise choice.
Neither does the above mapping. Also, it's not the same, as this does not support non-aligned hub instructions.
%000000000_xxxxxxxxx_00 = HUB instruction addresses (must be long aligned for lower 2KB Hub)
%000000000_xxxxxxxxx_01 = COG instruction addresses & Register space
%000000000_xxxxxxxxx_10 = LUT instruction addresses
%000000000_xxxxxxxxx_11 = reserved for future 4KB COG or LUT instruction addresses
%xxxxxxxxx_xxxxxxxxx_xx = HUB instruction addresses >%000000000_FFFFFFFFF_00
This just means that when the lower 2KB HUB is used in hub-exec mode, instructions MUST be long aligned.
I expect mostly we will desire long aligned hub-exec to minimise clock stalls.
Is there any reason that the ROM could not just be loaded into HUB 2KB (%000000001_000000000_00) and above, and execution start at this address ???
Maybe the FUSES could be read into HUB $0 upwards, and the secure section cleared if security is enabled.
As for ORGC/ORGL/ORGH...
I would prefer to use ORGx rather than plain ORG for one of them (i.e. always declare the space explicitly).
Actually ORGCOG/ORGLUT/ORGHUB is more intuitive.
Why do we need an ORGL (or ORGLUT) ???
Couldn't ORGC (ORGCOG) just be expanded to 4KB where the lower 2KB is COG/Registers and the upper 2KB is LUT ?
Also, won't LUT be used more for cog-exec or extended cog memory, rather than LUT usage?
If so, then maybe we need to think of a better name than LUT. Extended COG Memory is a bit of a mouthful though.