Hub Execution Model Thread (split from blog) - Page 16 — Parallax Forums

Hub Execution Model Thread (split from blog)


Comments

  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 09:30
    Thanks Chip - great news. Sorry I am just now catching up with last night's postings.
    cgracey wrote: »
    I think it's coming together really well now. I've got all the PC's expanded to 16 bits and made the AUX-based CALL/RET instructions store a 16-bit PC value, along with Z and C, plus one bit that holds hub vs cog mode. You'll be able to call from hub to cog and vice-versa, since the caller's mode will be saved on CALL and restored on RET. I also used the BIG-constant idea to make an instruction which puts the PC value into the BIG buffer so that relative lookups can be done for branch tables, etc. There are some details I still need to work out before I get to implementing the actual cache lines, but I feel like it's a very natural extension of the architecture and not, at all, some last-minute kludge. Someone looking at it for the first time will think it was all designed from inception to work this way.
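The saved state Chip describes (a 16-bit PC, the Z and C flags, and one hub-vs-cog mode bit) can be pictured as a single packed word. The following is only a sketch: the bit positions (PC in bits 0-15, Z at bit 16, C at bit 17, mode at bit 18) and the names `ReturnState`, `pack_return` and `unpack_return` are assumptions made for illustration, not the actual P2 layout.

```python
from typing import NamedTuple

# Assumed (hypothetical) bit layout of the word saved by CALL.
PC_MASK = 0xFFFF   # 16-bit program counter in bits 0..15
Z_BIT = 16         # Z flag
C_BIT = 17         # C flag
HUB_BIT = 18       # 1 = caller was in hub mode, 0 = cog mode


class ReturnState(NamedTuple):
    pc: int
    z: bool
    c: bool
    hub_mode: bool


def pack_return(state: ReturnState) -> int:
    """Pack PC, flags and caller mode into one integer word."""
    if not 0 <= state.pc <= PC_MASK:
        raise ValueError("PC must fit in 16 bits")
    return (
        state.pc
        | (int(state.z) << Z_BIT)
        | (int(state.c) << C_BIT)
        | (int(state.hub_mode) << HUB_BIT)
    )


def unpack_return(word: int) -> ReturnState:
    """Recover PC, flags and caller mode, as RET would need to."""
    return ReturnState(
        pc=word & PC_MASK,
        z=bool((word >> Z_BIT) & 1),
        c=bool((word >> C_BIT) & 1),
        hub_mode=bool((word >> HUB_BIT) & 1),
    )
```

Because the caller's mode travels with the return address, a RET can restore hub or cog mode without the callee knowing who called it.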
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 09:33
    Thanks Eric, good link.

    I knew it was not exactly like that - it has been about a year since I last peeked at the gcc code generator, which is why I wrote "exact syntax does not matter for discussion"

    Whether gcc calls them patterns, templates, or pseudo instructions, it boils down to some way of mapping common functionality (call, jump, add etc) onto the native instruction set.
    ersmith wrote: »
    That's not how GCC's code generator works. The machine code generation is based on a pattern matching algorithm. There are a number of predefined patterns that GCC can understand: see http://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names. One of these is "decrement_and_branch_until_zero". If an instruction pattern exists in the target with that name, GCC will use it for loops; otherwise it will attempt to synthesize it with other instructions (in this case sub and then conditional branch). Technically you could argue that DJNZ is actually easier for GCC to emit, but since the machine-independent code already knows how to synthesize it from other instructions it's a wash for us.

    Incidentally the list of standard names is a good starting point for figuring out what hardware features GCC can "easily" be made to support.
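The fallback Eric describes (use a named pattern if the target provides it, otherwise synthesize the same effect from simpler instructions) can be modelled with a toy selector. This is a sketch of the decision only, not GCC's actual matcher; the function name `emit_counted_loop_end` and the P1-style mnemonic strings it returns are assumptions chosen for illustration.

```python
def emit_counted_loop_end(target_patterns, counter, label):
    """Return the instruction strings that close a counted loop.

    target_patterns: set of standard pattern names the (hypothetical)
    target description provides.
    """
    if "decrement_and_branch_until_zero" in target_patterns:
        # The target has a single instruction for this pattern.
        return [f"djnz {counter}, #{label}"]
    # Otherwise synthesize it: subtract, then branch on the result.
    return [
        f"sub {counter}, #1 wz",
        f"if_nz jmp #{label}",
    ]


if __name__ == "__main__":
    print(emit_counted_loop_end({"decrement_and_branch_until_zero"}, "r0", "loop"))
    print(emit_counted_loop_end(set(), "r0", "loop"))
```

Either path produces a working loop, which is the sense in which the presence of DJNZ is "a wash" for the compiler.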
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 09:38
    No worries, I am just a stickler for technically accurate descriptions :)

    Actually, the LMM kernel adds helper functions to do things like jumps, calls and returns, very much the same as system calls... which is why I called it a kernel :)

    On the P1, where if there was no WC, WR and NR was set, it was safe to execute hub addresses, and jumps/calls could have been done with in-line code - but that would have wasted precious hub memory.

    My wife complains all the time at how much of a stickler I am for correct technical definitions... and before her, my sisters. But it is a habit I can't seem to shake.
    jazzed wrote: »
    LOL.

    Call it what you will Bill, the LMM kernel has to interpret jumps, etc.... I'm not trying to offend you.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 09:39
    Chip: Can you clarify how you intend to handle calling functions or branching to hub addresses? Bill has suggested HCALL/HRET/HJMP that take only hub addresses but I'm wondering how that would work with mixing hub mode and COG mode. If I'm executing in hub mode and then decide to call a COG resident function, what instruction do I use and how are the return addresses handled? Is this where your hub vs. COG bit comes in? Can I do a normal CALL from hub mode and make the transition back to COG mode? If so, how does the return address get handled since the CALL instruction in hub mode will have a 16 bit address and the RET instruction can only store a 9 bit value? I tried to handle this by morphing Bill's HCALL instruction into LCALL that could address either COG or hub memory but I'm happy with any solution that solves the problem of mixed hub and COG code. What is your solution to this problem?

    Thanks,
    David
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 09:41
    Thanks David,

    It was fun, and needed for the P1.

    Fortunately we won't need it for P2 (and later)!!!
    David Betz wrote: »
    That's a very good way to put it. LMM is a hack in the very best sense of the word, a clever way to get around a difficult problem!
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 09:46
    I am looking forward to how Chip handles it as well.

    See my post #2... perhaps I should have explained in greater detail:

    From hubexec mode, if you use the cog "CALL D/#n9", you go back to cog mode, execute the cog code, and when you hit the "JMPRET" logic will detect that you are returning to hub mode, and take you back to the next hubexec instruction. Since the instruction cache is not mapped, Chip could use say $1FF as an address to indicate return to hubexec mode.

    From cog mode, if you use HCALL/HCALLA/HCALLB, it will put you into hubexec mode until you hit the corresponding HRET/HRETA/HRETB instruction, and return you to cog mode.

    This works, because the cog already has to have a 'mode' bit in order to know where to fetch the instructions from, and can switch mode as needed based on the "Hxxx" or native cog mode instruction.

    The most important part is to embed the hub address into HJMP/HCALL/HCALLA/HCALLB so that an extra long of memory is not wasted with every jump/call. That is also why small/medium programs can use the AUX stack: it avoids the extra memory needed to hold the instructions that push/pop the return address from a hub stack, and it avoids the extra cycles.

    A hub based stack is needed for large programs and for large stack frames. I just wanted to ensure that smaller code (libraries, drivers etc) that does fit in a cog stack stays as small and as fast as possible (thus the embedded hub address, the need to avoid extra instructions wherever possible, and hubexec using "wasted" hub cycles). Therefore AUX based stack support is needed just as much as hub based support.

    I have fond visions of efficient, small TCP/IP and USB stacks running through my head...

    Sorry if my earlier postings were not clear.
    David Betz wrote: »
    Chip: Can you clarify how you intend to handle calling functions or branching to hub addresses? Bill has suggested HCALL/HRET/HJMP that take only hub addresses but I'm wondering how that would work with mixing hub mode and COG mode. If I'm executing in hub mode and then decide to call a COG resident function, what instruction do I use and how are the return addresses handled? Is this where your hub vs. COG bit comes in? Can I do a normal CALL from hub mode and make the transition back to COG mode? If so, how does the return address get handled since the CALL instruction in hub mode will have a 16 bit address and the RET instruction can only store a 9 bit value? I tried to handle this by morphing Bill's HCALL instruction into LCALL that could address either COG or hub memory but I'm happy with any solution that solves the problem of mixed hub and COG code. What is your solution to this problem?

    Thanks,
    David
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 09:52
    I am looking forward to how Chip handles it as well.

    See my post #2... perhaps I should have explained in greater detail:

    From hubexec mode, if you use the cog "CALL D/#n9", you go back to cog mode, execute the cog code, and when you hit the "JMPRET" logic will detect that you are returning to hub mode, and take you back to the next hubexec instruction.

    From cog mode, if you use HCALL/HCALLA/HCALLB, it will put you into hubexec mode until you hit the corresponding HRET/HRETA/HRETB instruction, and return you to cog mode.

    This works, because the cog already has to have a 'mode' bit in order to know where to fetch the instructions from, and can switch mode as needed based on the "Hxxx" or native cog mode instruction.

    Sorry if my earlier postings were not clear.
    What I don't understand is how a hub mode PC can be stored in the 9 bit field of a COG mode RET instruction. Can you explain how that works in your proposal? I didn't even try to handle that but instead said that COG mode code had to be called with a CALL instruction that pushes the full 16 bit address on a stack or places it in a register. In other words, CALL could not be used in hub mode. Some instruction that could store the full 16 bit address would have to be used. I proposed LCALL which was a renamed version of your HCALL instruction but that could also address COG memory as well as hub memory. As I said, I don't care if it's done my way but I'd like the transitions to work with no corner cases.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 09:56
    I've used that description as well :) Virtual machine fits quite nicely, since, say, x86 virtual machines execute small routines to replace supervisor instructions.
    potatohead wrote: »
    I think LMM is best described as a virtual machine. It's a simple one. Hybrid of native execute and interpreted, depending on what the LMM kernel actually does with program flow and or software defined instructions.

    Awesome hack too. I still remember that first post Bill. Read it, thought for a moment, then "oh yeah! sweet! Big programs, here we come!" :)
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 10:01
    Ok, thank you, now I understand what you don't see.

    It's very simple. The hub mode PC is never stored in the 9 bit field of a cog mode RET instruction.

    It can be handled several ways. The simplest probably is:

    - save the 16 bit return address on the aux stack
    - when you hit a cog return with the special '$1FF' flag instruction, pop the 16 bit return address from the return stack, and continue in hub mode

    Alternately

    - save the 16 bit return address in a hidden 'cog return link register' :)
    - return to it.

    The first option is better, as it would allow mixed nesting of cog/hubexec code... one could "automagically" hubexec->cog->hubexec->cog and climb back out of the nested call stack.

    The second option would allow cog->hubexec, and hubexec->cog, but have problems with multiple mixed nestings (short of helper code)
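The difference between the two options shows up as soon as calls nest across modes. The sketch below models only the bookkeeping, under the assumption that each call records a (caller mode, return PC) pair; the class names `ReturnStackCog` and `LinkRegisterCog`, the helper `unwind` and the example addresses are hypothetical.

```python
class ReturnStackCog:
    """Option 1: every call pushes (mode, pc) onto an AUX-like stack."""

    def __init__(self):
        self.stack = []

    def call(self, caller_mode, return_pc):
        self.stack.append((caller_mode, return_pc))

    def ret(self):
        return self.stack.pop()


class LinkRegisterCog:
    """Option 2: one hidden link register holds only the latest return."""

    def __init__(self):
        self.link = None

    def call(self, caller_mode, return_pc):
        # A nested call overwrites the previous return address.
        self.link = (caller_mode, return_pc)

    def ret(self):
        return self.link


def unwind(cog, calls):
    """Perform the calls in order, then return once per call."""
    for mode, pc in calls:
        cog.call(mode, pc)
    return [cog.ret() for _ in calls]


# hubexec -> cog -> hubexec -> cog: three calls, each saving the caller.
NESTED = [("hub", 0x4000), ("cog", 0x020), ("hub", 0x4100)]

if __name__ == "__main__":
    print(unwind(ReturnStackCog(), NESTED))   # unwinds in reverse order
    print(unwind(LinkRegisterCog(), NESTED))  # only the last return survives
```

With the stack, the returns come back in reverse call order; with a single link register, the outer return addresses are lost unless helper code saves them first.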

    David Betz wrote: »
    What I don't understand is how a hub mode PC can be stored in the 9 bit field of a COG mode RET instruction. Can you explain how that works in your proposal? I didn't even try to handle that but instead said that COG mode code had to be called with a CALL instruction that pushes the full 16 bit address on a stack or places it in a register. In other words, CALL could not be used in hub mode. Some instruction that could store the full 16 bit address would have to be used. I proposed LCALL which was a renamed version of your HCALL instruction but that could also address COG memory as well as hub memory. As I said, I don't care if it's done my way but I'd like the transitions to work with no corner cases.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 10:06
    Ok, thank you, now I understand what you don't see.

    It's very simple. The hub mode PC is never stored in the 9 bit field of a cog mode RET instruction.

    It can be handled several ways. The simplest probably is:

    - save the 16 bit return address on the aux stack
    - when you hit a cog return with the special '$1FF' flag instruction, pop the 16 bit return address from the return stack, and continue in hub mode

    Alternately

    - save the 16 bit return address in a hidden 'cog return link register' :)
    - return to it.

    The first option is better, as it would allow mixed nesting of cog/hubexec code... one could "automagically" hubexec->cog->hubexec->cog and climb back out of the nested call stack.

    The second option would allow cog->hubexec, and hubexec->cog, but have problems with multiple mixed nestings (short of helper code)
    This is where I think it would be easier to require hub mode code to use stack or register based instructions to call COG mode code. That way there is nothing magic happening under the hood. I guess this could be done with the existing CALLA/CALLB instructions. Maybe that is what Chip is planning since he said he had modified the COG to store all 16 bits on AUX memory-based instructions. I think he also added a hub/COG bit so that means he doesn't have to use my hack of overlaying the first 512 longs of the 16 bit address space with COG addresses. The bit lets you distinguish which is a COG and which is a hub address on return from the function.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 10:10
    LOL... too bad you edited, I actually found "Single Line Overlay Loader" to be infinitely more accurate than "interpreter"! The difference there is that there is no explicit loading of overlays (normally variable sized). Mind you, FCACHE is essentially an overlay loader...

    Steve, I believe that you are not trying to offend me. I just have this bug about being (sometimes painfully) technically accurate.

    Having said that :)

    (IMHO) jumps, calls, mvi are also not interpreted, there is no dispatch table to interpret them... they are helper functions that could have been inlined at a great waste of precious hub memory - but they are closer to being interpreted than the fetch execute loop.

    I think the best, closest, term is virtual machine (like potatohead pointed out) - virtual machines such as VMWARE, VirtualBox etc natively execute the vast majority of native instructions, and replace the supervisor instructions with helper routines.

    Emulators decode and emulate instructions.

    Yep, there goes my technical accuracy bug again - even though it makes zero difference to 99% of the population :)
    jazzed wrote: »
    LOL.

    Call it what you will Bill, the LMM kernel has to interpret jumps, etc.... I'm not trying to offend you.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 10:15
    Note the example I gave above solves that... or do you mean hubexec code should always use

    CALLAR #cogaddress ' existing P2 stack based instruction

    to call assembly language?

    For non-leaf cog functions, that is I think what Chip intends, and a good solution, one I would certainly use

    For leaf cog functions, using the old style link-and-return saves a long of precious hub space.

    Even with 256KB of hub I like saving hub longs :)

    I fully predict as soon as we have 256KB we will start wishing for 512KB... 1MB... more...
    David Betz wrote: »
    This is where I think it would be easier to require hub mode code to use stack or register based instructions to call COG mode code. That way there is nothing magic happening under the hood. I guess this could be done with the existing CALLA/CALLB instructions. Maybe that is what Chip is planning since he said he had modified the COG to store all 16 bits on AUX memory-based instructions. I think he also added a hub/COG bit so that means he doesn't have to use my hack of overlaying the first 512 longs of the 16 bit address space with COG addresses. The bit lets you distinguish which is a COG and which is a hub address on return from the function.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 10:20
    Note the example I gave above solves that...
    Yes but at the expense of having CALL behave differently when executed in hub mode.
    or do you mean hubexec code should always use

    CALLAR #cogaddress ' existing P2 stack based instruction

    to call assembly language?
    I think that is what Chip was proposing but he hasn't said what his instruction set changes would be. Actually, it wouldn't use an extra hub long because the return address is pushed on the AUX stack.
    For non-leaf cog functions, that is I think what chip intends, and a good solution, one I would certainly use

    For leaf cog functions, using the old style link-and-return saves a long of precious hub space.

    Even with 256KB of hub I like saving hub longs :)

    I fully predict as soon as we have 256KB we will start wishing for 512KB... 1MB... more...
    How does using CALLA or CALLAR use extra hub memory? Or are you talking about my link register suggestion? I don't see any evidence that Chip is actually considering that and even if he did the LR would always be set to the COG location currently used by PropGCC for the same purpose so no additional space would be consumed.
  • ctwardellctwardell Posts: 1,716
    edited 2013-12-09 10:22
    I fully predict as soon as we have 256KB we will start wishing for 512KB... 1MB... more...

    I think you would be safe writing that prediction in stone.

    C.W.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 10:26
    Yes, I was referring to non-leaf functions having to explicitly push the LR to the hub stack, and explicitly pop it.

    Even if there were separate "HCALLH" / "HRETH" (that got rid of LR, and pushed/popped the return address to/from a hub stack directly) it would still incur extra hub cycles (somewhat lessened if hubexec vacuums unused slots)

    I am guessing the reason he has not said anything about the final instructions is that he may be trying out multiple variations, seeing what he likes best.
    David Betz wrote: »
    Yes but at the expense of having CALL behave differently when executed in hub mode.

    I think that is what Chip was proposing but he hasn't said what his instruction set changes would be. Actually, it wouldn't use an extra hub long because the return address is pushed on the AUX stack.

    How does using CALLA or CALLAR use extra hub memory? Or are you talking about my link register suggestion? I don't see any evidence that Chip is actually considering that and even if he did the LR would always be set to the COG location currently used by PropGCC for the same purpose so no additional space would be consumed.
  • jazzedjazzed Posts: 11,803
    edited 2013-12-09 10:27
    SLOL !!! :)

    Ya, I decided to drop it because further discussion of what SLOL is called does not really add any value since we won't have to use it on P2.

    On the other hand, I still believe that things should be covered ... I.E. leverage what we can from the model experience.

    Is it possible to use any cog assets for the registers for example? I mean, today PropellerGCC uses R1 to R15, etc... which live in COG in the LMM model. Would R1...R64 make any sense?

    I haven't paid that much attention to the P2 instruction set because I have other things going on. Is it possible to quickly access a COG memory location without using the P1's 3 step self-modifying code for example? Is it faster to use AUX for temporary register storage? At least registers could be guaranteed to fit there.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 10:29
    Yes, I was referring to non-leaf functions having to explicitly push the LR to the hub stack, and explicitly pop it.

    Even if there were separate "HCALLH" / "HRETH" (that got rid of LR, and pushed/popped the return address to/from a hub stack directly) it would still incur extra hub cycles (somewhat lessened if hubexec vacuums unused slots)
    Non-leaf functions are always going to push and pop LR. If all we have are AUX-stack-based call instructions then the return address will have to be popped from the AUX stack and pushed on the hub stack. That's why I asked for a call instruction that directly writes the return address to a COG register.

    I'd really rather that there not be separate hub mode call instructions and COG mode call instructions if possible but I guess that may be difficult to implement in hardware.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 10:34
    David Betz wrote: »
    Non-leaf functions are always going to push and pop LR. If all we have are AUX-stack-based call instructions then the return address will have to be popped from the AUX stack and pushed on the hub stack. That's why I asked for a call instruction that directly writes the return address to a COG register.

    Absolutely correct if LR is used, and the hub code's stack does not fit in AUX.

    For code whose stack will fit in the AUX stack, much faster to use AUX, and saves quite a bit of hub space.

    Therefore, having both makes the most sense.

    FYI, I do see where you are coming from - large programs such as GCC etc will never fit in AUX stack and I totally agree that for those cases, we need a hub based stack, but many drivers and programs with more modest stack usage will fit in an AUX stack, and be smaller & faster as a result.

    The difference in our points of view is that I need to consider the second case, and I believe you only see a need to address the first case.

    Given that the propeller is a microcontroller, the second case is just as important as the first case.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 10:37
    Absolutely correct if LR is used, and the hub code's stack does not fit in AUX.

    For code whose stack will fit in the AUX stack, much faster to use AUX, and saves quite a bit of hub space.
    Unless we create a compiler for a subset of C, I don't think that GCC will ever use the AUX stack as the C stack no matter how little stack space the user's program requires. The AUX memory access mechanism just doesn't match the C memory model well enough to be useful.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 10:39
    FYI, I do see where you are coming from - large programs such as GCC etc will never fit in AUX stack and I totally agree that for those cases, we need a hub based stack, but many drivers and programs with more modest stack usage will fit in an AUX stack, and be smaller & faster as a result.

    The difference in our points of view is that I need to consider the second case, and I believe you only see a need to address the first case.

    Given that the propeller is a microcontroller, the second case is just as important as the first case.
    I agree that the second case is important for drivers and maybe for a stripped down C that we might create at some point in the future. However, the model that C needs should not be compromised by being forced to use the AUX stack instructions if at all possible.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 10:41
    With hubexec as I've described it in post#1/#2 (subject to changes Chip may make)

    Theoretically, GCC could use R0...R1F1 - and GCC is good with using many registers (as long as it does not keep trying to push/pop all of them LOL)

    Having said that, a good compromise would be to use say $1D0-$1EF as 32 registers for GCC, and leave $000-$1CF for FCACHE/FLIB style usage - but that is to be explored. The wonderful thing is no need for helper routines or LMM loop!

    GCC could use any cog location directly with the appropriate instructions, but ideally the code generator would have to embed the register addresses directly, without leaving it to the linker.

    AUX registers could also be used like register variables due to the new instructions for fetching/storing them

    RDAUX r12,#123

    WRAUX r12,#234

    Both of the above would take a single cycle.

    There are many opportunities for optimizing GCC to P2 with hubexec and tons of registers, once a basic port is done, and the time/budget is available for optimizations.
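The suggested split of cog memory can be checked with a few lines of arithmetic. This is only a sketch of the layout proposed above; the names `GCC_REGS`, `FCACHE_AREA` and `register_address` are made up for illustration.

```python
# Proposed split of cog longs, taken from the post above.
GCC_REGS = range(0x1D0, 0x1EF + 1)     # $1D0-$1EF: registers for GCC
FCACHE_AREA = range(0x000, 0x1CF + 1)  # $000-$1CF: FCACHE/FLIB style usage


def register_address(n):
    """Map a compiler register number to its fixed cog address."""
    if not 0 <= n < len(GCC_REGS):
        raise IndexError("no such register")
    return GCC_REGS[n]


if __name__ == "__main__":
    print(len(GCC_REGS), "registers,", len(FCACHE_AREA), "longs for FCACHE")
    print(hex(register_address(12)))
```

Fixing the mapping this way is what would let a code generator embed register addresses directly rather than leaving them to the linker.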
    jazzed wrote: »
    SLOL !!! :)

    Ya, I decided to drop it because further discussion of what SLOL is called does not really add any value since we won't have to use it on P2.

    On the other hand, I still believe that things should be covered ... I.E. leverage what we can from the model experience.

    Is it possible to use any cog assets for the registers for example? I mean, today PropellerGCC uses R1 to R15, etc... which live in COG in the LMM model. Would R1...R64 make any sense?

    I haven't paid that much attention to the P2 instruction set because I have other things going on. Is it possible to quickly access a COG memory location without using the P1's 3 step self-modifying code for example? Is it faster to use AUX for temporary register storage? At least registers could be guaranteed to fit there.
  • Dave HeinDave Hein Posts: 6,347
    edited 2013-12-09 10:45
    How big is the AUX stack? As long as you don't put arrays on the stack, and just use it for local variables it should allow for a few levels of calling depth. The threaded chess program requires about 236 bytes per call, but that's because I include a 160-byte struct on the stack. Without that it would only need about 76 bytes per call. A 1024-byte stack would support 13 levels of calling. There may be some library routines that use a lot of stack. Those would need to be avoided.
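The depth estimate above is simple integer division, sketched here with the figures from the post. The function name `max_call_depth` is an assumption for illustration, and the model ignores any per-call overhead beyond the quoted frame sizes.

```python
def max_call_depth(stack_bytes, frame_bytes):
    """Number of whole call frames that fit in the stack."""
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    return stack_bytes // frame_bytes


STACK_BYTES = 1024                      # stack size used in the post
FRAME_WITH_STRUCT = 236                 # bytes per call, including the struct
STRUCT_BYTES = 160                      # the struct kept on the stack
FRAME_WITHOUT_STRUCT = FRAME_WITH_STRUCT - STRUCT_BYTES  # 76 bytes

if __name__ == "__main__":
    print(max_call_depth(STACK_BYTES, FRAME_WITHOUT_STRUCT))  # 13 levels
    print(max_call_depth(STACK_BYTES, FRAME_WITH_STRUCT))     # 4 levels
```

Keeping the 160-byte struct off the stack is what moves the limit from 4 levels to 13.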
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 10:46
    Even the original 3 instruction version that only used the AUX stack would not have compromised GCC.

    POPA LR
    WRLONG LR, --SP

    is functionally exactly equivalent to

    WRLONG LR, --SP ... and usually the POPA would be hidden in waiting for the hub cycle

    Leaving out AUX stack support for hubexec would severely compromise drivers built in assembly and hubexec code from other compilers.

    From the beginning I knew (even before my first post) that a hub stack would be required for large gcc programs.

    Perhaps without meaning to, your posts constantly came across as wanting to get rid of the AUX stack model for hubexec, which would be an extremely costly mistake.
    David Betz wrote: »
    I agree that the second case is important for drivers and maybe for a stripped down C that we might create at some point in the future. However, the model that C needs should not be compromised by being forced to use the AUX stack instructions if at all possible.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 10:48
    Exactly.

    Currently the AUX stack is 256 longs deep, and perhaps it will be increased (if there is room)

    In my experience, a LOT of C code, at some depth, can find that useful.

    Like you say, a bit of care, not using the stack to hold arrays, perhaps a sprinkling of 'static' locals (locals can easily be static in non-recursive functions), and not having 11 arguments to a function... and all of a sudden a lot of very useful C code fits in a small stack.
    Dave Hein wrote: »
    How big is the AUX stack? As long as you don't put arrays on the stack, and just use it for local variables it should allow for a few levels of calling depth. The threaded chess program requires about 236 bytes per call, but that's because I include a 160-byte struct on the stack. Without that it would only need about 76 bytes per call. A 1024-byte stack would support 13 levels of calling. There may be some library routines that use a lot of stack. Those would need to be avoided.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 10:52
    Even the original 3 instruction version that only used the AUX stack would not have compromised GCC.

    POPA LR
    WRLONG LR, --SP

    is functionally exactly equivalent to

    WRLONG LR, --SP ... and usually the POPA would be hidden in waiting for the hub cycle
    You're right. the LR instructions aren't absolutely necessary but they match the GCC model better.
    Leaving out AUX stack support for hubexec would severely compromise drivers built in assembly and hubexec code from other compilers.

    From the beginning I knew (even before my first post) that a hub stack would be required for large gcc programs.

    Perhaps without meaning to, your posts constantly came across as wanting to get rid of the AUX stack model for hubexec, which would be an extremely costly mistake.
    I don't think I ever suggested leaving the AUX instructions out. I always included them as well as my LR scheme which, as you point out, is not absolutely necessary. However, I think it is desirable if it is possible to implement it easily.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 10:53
    Exactly.

    Currently the AUX stack is 256 longs deep, and perhaps it will be increased (if there is room)

    In my experience, a LOT of C code, at some depth, can find that useful.

    Like you say, a bit of care, not using the stack to hold arrays, perhaps a sprinkling of 'static' locals (locals can easily be static in non-recursive functions), and not having 11 arguments to a function... and all of a sudden a lot of very useful C code fits in a small stack.
    You continually ignore the problem that C allows you to take the address of local variables and that won't be easy to support for variables that are on the AUX stack.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 10:56
    We have a Zen moment of agreement.

    The best possible world is having both sets, for the two separate, important use cases.

    In the extremely unlikely event that both don't fit, the AUX stack version will still work fine with GCC (albeit looking different from the P1 version), whereas an LR-only version would cripple the non-gcc small stack use case.

    I strongly suspect we can have our cake and eat it too :)
    David Betz wrote: »
    You're right. the LR instructions aren't absolutely necessary but they match the GCC model better.

    I don't think I ever suggested leaving the AUX instructions out. I always included them as well as my LR scheme which, as you point out, is not absolutely necessary. However, I think it is desirable if it is possible to implement it easily.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 11:00
    Sigh. There goes the Zen moment.

    No, I don't ignore it. (I've previously stated that is not needed for a lot of usage cases somewhere, but I don't have time to re-read huge threads to find it)

    First of all, I already agree that a hub stack is needed for large programs, where taking variable addresses is relatively easy and is already implemented in propgcc.

    Second, I do not find it relevant for the fast driver / medium size code case as

    - If gcc does not support the "small stack" mode, code that needs to be smaller and faster can be written in assembly, or other languages that will support it.

    - If gcc does support a "small stack" mode, it would be documented that in "small stack" mode it is illegal to take the address of non-static local variables; you could still take the address of static ones.
    David Betz wrote: »
    You continually ignore the problem that C allows you to take the address of local variables and that won't be easy to support for variables that are on the AUX stack.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-09 11:06
    Sigh. There goes the Zen moment.

    No, I don't ignore it.

    First of all, I already agree that a hub stack is needed for large programs, where taking variable addresses is relatively easy and is already implemented in propgcc.
    True.
    Second, I do not find it relevant for the fast driver / medium size code case as

    - If gcc does not support the "small stack" mode, code that needs to be smaller and faster can be written in assembly, or other languages that will support it.

    - If gcc does support a "small stack" mode, it would be documented that in "small stack" mode it is illegal to take the address of non-static local variables; you could still take the address of static ones.
    It isn't really an issue of small vs. large stack. It's an issue with the stack addresses being completely disjoint from the hub addresses, making it difficult to have a pointer that could point to either type of memory. This could be done by encoding address space bits into the pointer but then the pointers become very cumbersome to construct and use. Also, as I think Eric pointed out as well, GCC isn't really designed to support disjoint stack and main memory addresses so you'd pretty much have to construct a new compiler for a subset of C that would use the AUX stack. Actually, maybe you could work on this? I'm serious. You've got lots of ideas about how the AUX stack could be used in compiled code and, since it doesn't fit the GCC model very well, maybe you could help out by creating a C subset compiler that will take advantage of the AUX hardware stack.
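To make the "address space bits in the pointer" idea concrete, here is a minimal sketch of what such tagged pointers would involve. Everything in it is an assumption for illustration: the tag position (bit 31), the names `make_pointer` and `load`, and the use of a dict and a list to stand in for hub and AUX memory.

```python
AUX_TAG = 1 << 31  # hypothetical tag bit: set means "address is in AUX"


def make_pointer(space, address):
    """Build a pointer that records which memory it refers to."""
    if space == "aux":
        return AUX_TAG | address
    if space == "hub":
        return address
    raise ValueError("unknown address space")


def load(pointer, hub, aux):
    """Dereference a tagged pointer; every access must test the tag."""
    if pointer & AUX_TAG:
        return aux[pointer & ~AUX_TAG]
    return hub[pointer]


if __name__ == "__main__":
    hub = {0x1000: 11}   # stand-in for hub memory
    aux = [0] * 256      # stand-in for AUX memory
    aux[5] = 22
    print(load(make_pointer("hub", 0x1000), hub, aux))
    print(load(make_pointer("aux", 5), hub, aux))
```

The tag test on every dereference is the cost being described: a plain hub pointer needs none of it, which is why a single flat address space fits the C model more naturally.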
  • jazzedjazzed Posts: 11,803
    edited 2013-12-09 11:15
    I'm pretty sure Chip is already using AUX as a stack for SPIN. It will definitely benefit languages that can handle it.

    BTW, has anyone attempted to write up a HUB-EXEC prototype with the current instruction proposals? That would pay more dividends than arguing over AUX. Just call it a thought experiment Albert.