For a finer geometry, I bet we will be asking for a lot more hub RAM than 1MB.
We may get away with having multiple cog/hub combinations with LUT sharing between them, though.
I've got the A9 compiled. Now, I'm compiling the DE2-115, while I make sure the example files are all up-to-date. Maybe I'll be able to get this update out tonight.
How many Smart Pin Cells does this A9 build have ?
Thirty-eight. Here's the mask:
64'hFC00_FFFF_0000_FFFF
It could almost fit 64, but not quite. This many smart pins compiles quickly.
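As a quick check on the mask above, counting its set bits gives the smart pin count, and the bit positions show which pins have a smart pin cell. This is only a minimal Python sketch; it assumes bit n of the mask corresponds to pin n, which the post does not state, and `smart_pins` is a hypothetical helper name.

```python
MASK = 0xFC00_FFFF_0000_FFFF  # 64'hFC00_FFFF_0000_FFFF from the post above


def smart_pins(mask: int) -> list:
    """Return the pin numbers whose mask bit is set (assumes bit n = pin n)."""
    return [pin for pin in range(64) if (mask >> pin) & 1]


pins = smart_pins(MASK)
print(len(pins))  # number of smart pins in this build
print(pins)       # which pins they are, under the bit n = pin n assumption
```

Under that assumption the populated pins fall into three groups: 0..15, 32..47 and 58..63.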
One thing that's been kind of a headache is that the physical hub RAM map wraps around within the 1MB hub map. For example, a 32KB physical hub repeats 32 times in the hub space. If you write just beyond the physical RAM limit, you overwrite early addresses. This creates horrible failures that bring everything to a halt.
To fix this problem, I made the egg-beater qualify the addresses to the hub RAM slices before it enables them. If you remember, there are 64 bytes at the end of the hub ($FFFC0..$FFFFF) that hold 16 debug interrupt vectors. You really don't want to overwrite those accidentally.
So, now, if you read from the physical RAM top, minus 63, to $FFFBF, you get $0000_0000's. Writes to that nether region don't do anything.
Reads and writes, from $00000 to physical RAM top minus 64, access the RAMs, as well as reads and writes from $FFFC0..$FFFFF.
I wanted to get this in before the coming release, as it simplifies the development tools, as well.
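The address qualification described above can be modelled in a few lines. This is a rough Python sketch, not the actual egg-beater logic: the `Hub` class and `physical_offset` helper are hypothetical names, it works on bytes rather than longs, and it assumes the top 64 bytes of physical RAM are what back $FFFC0..$FFFFF.

```python
HUB_SPACE = 0x100000   # 1MB hub address space ($00000..$FFFFF)
VECTOR_BASE = 0xFFFC0  # 16 debug vectors live at $FFFC0..$FFFFF
VECTOR_BYTES = 64


def physical_offset(addr: int, phys_size: int):
    """Map a hub address to a physical RAM offset, or None if unmapped.

    Assumption: the top 64 bytes of physical RAM back $FFFC0..$FFFFF.
    """
    addr &= HUB_SPACE - 1
    if addr < phys_size - VECTOR_BYTES:
        return addr                      # normal RAM, no wrapping
    if addr >= VECTOR_BASE:
        return phys_size - VECTOR_BYTES + (addr - VECTOR_BASE)
    return None                          # the "nether region"


class Hub:
    def __init__(self, phys_size: int):
        self.phys_size = phys_size
        self.ram = bytearray(phys_size)

    def read_byte(self, addr: int) -> int:
        off = physical_offset(addr, self.phys_size)
        return 0 if off is None else self.ram[off]   # unmapped reads give 0

    def write_byte(self, addr: int, value: int) -> None:
        off = physical_offset(addr, self.phys_size)
        if off is not None:                          # unmapped writes are dropped
            self.ram[off] = value & 0xFF


hub = Hub(0x8000)            # 32KB physical hub
hub.write_byte(0x8000, 0xAA) # just past physical RAM: ignored, no wrap to $00000
print(hub.read_byte(0x0000))
```

With a 32KB hub, the write to $08000 no longer lands on $00000, which is the failure mode the change is meant to remove.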
Out of curiosity, why couldn't this debug jump table be located at the bottom of hub memory, below $400? Normally, you can't execute hub code out of that area. But, in this case, you know you are in the "debug" state. So, would it be possible to use this knowledge to force a sort of "hub-only" execution mode? This could be advantageous for a couple reasons:
* The debug registers are no longer in the "normal" hub memory area.
* With 256 longs (16 of which would be the jump table), debugger code can also be stored below $400 without risk of it being executed outside of the debug ISR.
* This further enforces the idea of debugging being stealthy and isolated from normal programming, since the debugger code would never be able to execute in cog or lut exec modes.
* This gives the lower $400 bytes of hub memory a commonly-acknowledged purpose
* This simplifies your rules above to: reading outside of physical hub memory returns $0000_0000, writing outside of physical hub memory does nothing.
* This same scheme will work for all P2 variants (various hub memory sizes).
? but the Debug code does have to be able to run ?
I think you are suggesting what amounts to a page scheme, similar to how some MCUs manage Boot loaders.
In those, there is a gateway to jump into the 2nd page, and code runs normally from there
I guess you could see it that way, though I wouldn't consider this paging. Normally, hub bytes $00000-003FF are not executable, because those addresses are used for cog exec and lut exec modes. However, when you are in the debug ISR, I believe there's an internal state bit that tracks this (e.g. to allow SETBRK to behave differently). Using that debug state bit, you could disable cog exec and lut exec altogether and therefore be able to execute from that low hub memory. Since the debug ISR is always switching to hub exec mode anyhow, using the low hub memory for this purpose seems like a natural fit.
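To make the proposal concrete, here is a small Python sketch of how the execution mode could be chosen from the program counter. The split between cog and LUT at $200 follows the address ranges listed in this thread; the `in_debug_isr` override is only the behaviour being proposed here, not something the hardware is known to do, and `exec_mode` is a hypothetical name.

```python
def exec_mode(pc: int, in_debug_isr: bool = False) -> str:
    """Classify where an instruction fetch at `pc` would come from.

    $000..$1FF is cog exec, $200..$3FF is LUT exec, $400 and up is hub exec.
    The in_debug_isr override models the proposal only (an assumption).
    """
    if in_debug_isr:
        return "hub"   # proposal: inside the debug ISR, always fetch from hub
    if pc < 0x200:
        return "cog"
    if pc < 0x400:
        return "lut"
    return "hub"


print(exec_mode(0x040))                     # normal rules
print(exec_mode(0x040, in_debug_isr=True))  # proposed debug-state rule
```

The point of the override is that low hub memory, normally shadowed by the cog and LUT ranges, would become executable only while the debug state bit is set.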
Whatever you want to call the selective enable of execution modes by memory area, this seems a good idea, as not only does it move the vectors, but it allows some Debug code space that is somewhat protected.
Now, was there some issue about HUB Data indexing from base 00, for speed reasons in some code ? (or was that only in COG ?, I can't recall )
I don't recall anything. I suppose there is the ability to use an immediate address for the RDxxx/WRxxx instructions, which is only useful for the first $200 bytes. On the other hand, using immediate addresses would be a perfect use for setting the debug ISR entrypoints and other debug state information! That would give you 448 bytes (112 longs) for data and 128 longs for debugger code!
Anyhow, Chip has yet to weigh in on the technical feasibility/difficulty of this, so there may be something we are overlooking and/or forgetting about...
Those lower RAM addresses are where the initial program resides. Putting the debug vectors down there would mean that we wouldn't be able to have our initial program sitting at $00000, waiting to be loaded into cog 0 and run. We'd have to bump it up to maybe $00040, to fit the vectors below. So, your programs would not start at $00000, anymore, which is a mild nuisance, but maybe more of a nuisance than just keeping the vectors at $FFFC0.
I remember going around and around over where to put all these, before. There is no perfect solution. It seemed best to separate some addresses to top and bottom, as we had all this to deal with:
We put the first two at the bottom and the last two at the top of hub. Now, #3 is gone, so we just have those vectors at the top of memory. It was going to be a bear to explain how everything mixed together, if they were all in the same place. Now, we have the simplest story, I think.
RD/WRxxxx immediate addressing is for hub $00000-$000FF, IIRC, because the top bit is used for another mode?
Right. Bit 8 turns on PTRA/PTRB mode.
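A small sketch of what that means for the immediate range: if the immediate field is 9 bits wide and bit 8 selects PTRA/PTRB mode, only the low 8 bits can carry a literal hub address. The 9-bit width and the function name `decode_immediate_s` are assumptions for illustration, and the PTRA/PTRB encoding itself is deliberately left undecoded.

```python
def decode_immediate_s(s_field: int):
    """Interpret an immediate S field of a RDxxxx/WRxxxx instruction.

    Per the thread, bit 8 selects PTRA/PTRB mode, so a literal hub
    address can only span $00..$FF. A 9-bit field is assumed here.
    """
    s_field &= 0x1FF
    if s_field & 0x100:
        return ("ptr", s_field & 0xFF)  # PTRA/PTRB expression, not decoded here
    return ("addr", s_field)            # literal hub address $00000..$000FF


print(decode_immediate_s(0x0FF))  # highest literal address
print(decode_immediate_s(0x100))  # bit 8 set: pointer mode instead
```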
I lost almost a day over a crazy bug that was hard to find. USB wasn't working and I finally figured out that the optimization I had made to save one clock on the WRPIN/WXPIN/WYPIN instructions was disastrous when back-to-back WxPIN instructions were executing on different pins, as it would not have an extra cycle to finish the first 4-bit command stream with %0000, before switching to another pin. I put it back to the way it was, before. Now, I'm recompiling for the A9 and I should be able to get the update out tonight.
On the update, rather than have a stripped-down 16-cog Prop2 running on every FPGA board, the eggbeater will be optimized for each FPGA, so that the number of cogs and hub RAM slices are identical. We should be able to get back up to two cogs on the DE0-Nano, without CORDIC, of course.
I don't see why you can't still have the boot code load at $00000. You would never use the debug vectors then anyhow. And if you booted the chip with the debug bit set for cog 0, you could just execute the boot code in hub exec mode, then "exit" the debug mode to start the second-stage boot.
As for explanation, I think the people who actually care to write debuggers will have no problem understanding this stuff.
The good news is, execute from that space is already there, for boot.
Seems it just needs some co-operation between Boot and Debug, as you suggest ?
The debug vectors are ALWAYS used. They are a hook, to grab hold of a cog as it launches, to set up some kind of debugging, if desired. If not desired, they need to be an RETI0 instruction, in order for the cog to continue on its way.
That's not what I mean. The ROM boot code itself will never require the debug vectors.
I read over your posts, but I'm not getting it. It might be because I'm kind of tired and it's 100 degrees in my office. Could you please explain it really simply one more time? Sorry to ask this. I'm just not understanding.
It's been raining a blustery cold rain for the last week and a half here. The ski fields are rejoicing a good start to the season.
Chip,
Searith is talking about how the ROM boot code could get plastered right over the vector space. The power up reset won't be triggering any vectors itself so they could be suitably filled soon after by the boot code.
It probably also means a plain warm reset wipes out any programmed vectors.
I'm comfy with the way it is too. It would be difficult to extend beyond 1MB without trading in some useful single instructions for slower prefixed versions.
The long immediate branching instructions would probably integrate with the register direct versions. The register direct versions would gain an immediate address mode and be prefix-able with AUGD to reach beyond #$1FF.
LOC stands out as an oddball that has only one form.
EDIT: Intriguing, I just noticed that the conditional branches, like TJZ, already have immediate addressing while the unconditionals, like CALL, don't. EDIT2: Doh, of course, that's because there are dedicated long immediate variants of those instructions in separate instruction encodings.
About those interrupt vectors (instruction spaces, really), they are initially loaded up by the cold-boot firmware built into the cog logic. They could be reset after the base program loads from $00000. They do occupy 64 bytes, so I'd hope that base program is at least that size.
Right now, I don't want to change this around. It's not a hard thing to do, but I'd need to think more about it to be confident that it wouldn't generate any subtle problems.
Thanks for thinking about this, in any case, Guys.
I just saw the prior comment. Does this mean you figured out what I was saying?
Comments
That's right. It's paging/banking beyond 20 bits (1MB).
Out of curiosity, why couldn't this debug jump table be located at the bottom of hub memory, below $400? Normally, you can't execute hub code out of that area. But, in this case, you know you are in the "debug" state. So, would it be possible to use this knowledge to force a sort of "hub-only" execution mode? This could be advantageous for a couple reasons:
* The debug registers are no longer in the "normal" hub memory area.
* With 256 longs (16 of which would be the jump table), debugger code can also be stored below $400 without risk of it being executed outside of the debug ISR.
* This further enforces the idea of debugging being stealthy and isolated from normal programming, since the debugger code would never be able to execute in cog or lut exec modes.
* This gives the lower $400 bytes of hub memory a commonly-acknowledged purpose
* This simplifies your rules above to: reading outside of physical hub memory returns $0000_0000, writing outside of physical hub memory does nothing.
* This same scheme will work for all P2 variants (various hub memory sizes).
That sounds a good idea.
? but the Debug code does have to be able to run ?
I think you are suggesting what amounts to a page scheme, similar to how some MCUs manage Boot loaders.
In those, there is a gateway to jump into the 2nd page, and code runs normally from there
Those lower RAM addresses are where the initial program resides. Putting the debug vectors down there would mean that we wouldn't be able to have our initial program sitting at $00000, waiting to be loaded into cog 0 and run. We'd have to bump it up to maybe $00040, to fit the vectors below. So, your programs would not start at $00000, anymore, which is a mild nuisance, but maybe more of a nuisance than just keeping the vectors at $FFFC0.
I remember going around and around over where to put all these, before. There is no perfect solution. It seemed best to separate some addresses to top and bottom, as we had all this to deal with:
1) Cog execution addresses ($000..$1FF)
2) LUT execution addresses ($200..$3FF)
3) Address-sensitive addresses for 16 longs in hub ($FFF80..$FFFBF)
4) Debug interrupt vectors ($FFFC0..$FFFFF)
We put the first two at the bottom and the last two at the top of hub. Now, #3 is gone, so we just have those vectors at the top of memory. It was going to be a bear to explain how everything mixed together, if they were all in the same place. Now, we have the simplest story, I think.
I don't see why you can't still have the boot code load at $00000. You would never use the debug vectors then anyhow. And if you booted the chip with the debug bit set for cog 0, you could just execute the boot code in hub exec mode, then "exit" the debug mode to start the second-stage boot.
As for explanation, I think the people who actually care to write debuggers will have no problem understanding this stuff.
The good news is, execute from that space is already there, for boot.
Seems it just needs some co-operation between Boot and Debug, as you suggest ?
Good Point.
I just saw the prior comment. Does this mean you figured out what I was saying?
I believe so, but I could have it wrong, still.