Hub Execution Model Thread (split from blog) - Page 17 — Parallax Forums

Hub Execution Model Thread (split from blog)


Comments

  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 11:18
    And thus, we come back to you looking like you don't want the small stack hubexec mode implemented.

    If it is not easy for gcc to support, and I admit it is non-trivial for gcc to support it, simply don't support it. It's that easy. I believe I mentioned this before - several times.

    You said you are not trying to fight against inclusion of this mode, yet your messages always come across like "gcc does not need it, so don't put it in", so to other readers, it looks like you are fighting it tooth and nail.

    But don't try to fight against it, to prevent those of us who can make use of it, from writing better/smaller/faster code (without using gcc) than GCC will be able to make (at least at first).

    I know you are serious, but I already got a lecture from wifey yesterday about how much time I recently spent on the forum instead of my other projects. Don't know how she found out, unless she lurks here with her iPad.

    FACTS:

    - small stack mode will save hub memory for programs that can use small stack mode
    - small stack mode is faster than hub stack mode
    - many useful programs and drivers do not need deep/large stacks and can use the speed and memory savings
    - many programs can live without taking the address of local (stack) variables

    I guarantee that I will use the AUX stack mode, regardless of gcc supporting it or not.
    David Betz wrote: »
    It isn't really an issue of small vs. large stack. It's an issue with the stack addresses being completely disjoint from the hub addresses, making it difficult to have a pointer that could point to either type of memory. This could be done by encoding address space bits into the pointer, but then the pointers become very cumbersome to construct and use. Also, as I think Eric pointed out as well, GCC isn't really designed to support disjoint stack and main memory addresses, so you'd pretty much have to construct a new compiler for a subset of C that would use the AUX stack. Actually, maybe you could work on this? I'm serious. You've got lots of ideas about how the AUX stack could be used in compiled code and, since it doesn't fit the GCC model very well, maybe you could help out by creating a C subset compiler that will take advantage of the AUX hardware stack.
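    To make the pointer problem concrete, here is a minimal sketch of the "address space bits in the pointer" idea. Everything in it is an assumption made for illustration: the tag bit position, the modelled memory sizes, and the helper names `make_aux_ptr`, `load` and `store` do not come from any P2 proposal. It only shows why every dereference gets more expensive once a pointer may refer to either memory.

```python
# Hypothetical model of two disjoint address spaces. Sizes, the tag bit and
# all function names are assumptions made for this sketch.
HUB_LONGS = 1024         # modelled hub size in longs (assumed)
AUX_LONGS = 256          # AUX depth mentioned in this thread
AUX_TAG = 1 << 31        # assumed tag bit: set means "points into AUX"

hub = [0] * HUB_LONGS
aux = [0] * AUX_LONGS

def make_aux_ptr(index):
    """Build a tagged pointer to an AUX long."""
    return AUX_TAG | index

def load(ptr):
    """Every dereference has to test the tag before touching memory."""
    if ptr & AUX_TAG:
        return aux[ptr & ~AUX_TAG]
    return hub[ptr]

def store(ptr, value):
    """Write through a pointer that may refer to either memory."""
    if ptr & AUX_TAG:
        aux[ptr & ~AUX_TAG] = value
    else:
        hub[ptr] = value

store(5, 111)                 # hub long 5
store(make_aux_ptr(5), 222)   # AUX long 5: same index, different memory
print(load(5), load(make_aux_ptr(5)))
```

    The extra test and mask on each access is the overhead being described, and a compiler that assumes one flat address space has no natural place to put it.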
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 11:20
    jazzed wrote: »
    I'm pretty sure Chip is already using AUX as a stack for SPIN. It will definitely benefit languages that can handle it.

    BTW, has anyone attempted to write up a HUB-EXEC prototype with the current instruction proposals? That would pay more dividends than arguing over AUX. Just call it a thought experiment Albert.
    I'm still not convinced that even what Chip did in Spin will be as easy to use as we'd like. He ended up with local variable pointers that can not be passed to another COG. This breaks some of the techniques people have used in P1 Spin code where the initialization parameters for a COG are built up in a set of local variables and then the address of the first variable is passed as PAR to the new COG. This, of course, won't work in Spin2. The parameters will have to be passed in global variables.
  • Dave HeinDave Hein Posts: 6,347
    edited 2013-12-09 11:27
    Currently the AUX stack is 256 longs deep, and perhaps it will be increased (if there is room)
    Oh boy, I can't wait to run my threaded chess program on it. I'll be able to search 7 levels deep on all 8 cogs at the same time.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 11:30
    I eagerly await that version!

    I am only a casual chess player... but my father was almost a grand master (he would have been, but my mom did not want him to travel, so he turned it down).
    Dave Hein wrote: »
    Oh boy, I can't wait to run my threaded chess program on it. I'll be able to search 7 levels deep on all 8 cogs at the same time.
  • cgraceycgracey Posts: 14,133
    edited 2013-12-09 11:34
    I eagerly await that version!

    I am only a casual chess player... but my father was almost a grand master (he would have been, but my mom did not want him to travel, so he turned it down).

    I can't play chess because it feels like a dead short to my brain.

    I see that in addition to AUX stacks for hub/cog calling, we are going to need a hub stack mechanism, too. Maybe it can be made for both hub (obviously) and cog code, as well.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 11:38
    That would be the best of all possible worlds!

    Simplest way I can think of it is:

    (for discussion purposes: HCALLH <-- first H signifies hubexec instruction, last H signifies uses hub stack)

    HCALLH #hubaddr

    - WRWORD (internal pc), --SP

    HRETH

    - RDWORD (internal pc), SP++


    For local variables within the scaled index range, they can be addressed with the indexed addressing mode.

    I did not ask for this at first, to limit the amount of changes (thus the 3 original hubexec mode instructions I suggested).
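    The HCALLH / HRETH pair above can be sketched as a small simulation. This is only a model under stated assumptions: the return address is a 16-bit word stored little-endian, SP moves by 2 bytes per word, the return address is the call's own address plus 4 bytes, and the `Cog` class and method names are invented for the example.

```python
# Sketch of a hub-stack call/return. The byte layout, the "+4" return address
# and the class itself are assumptions for illustration only.
hub = bytearray(64)   # tiny modelled hub RAM

class Cog:
    def __init__(self, sp):
        self.pc = 0
        self.sp = sp  # hub stack pointer, grows downward

    def hcallh(self, target):
        """WRWORD (internal pc), --SP followed by a jump to target."""
        return_pc = self.pc + 4                 # assumed: next instruction
        self.sp -= 2                            # pre-decrement by one word
        hub[self.sp:self.sp + 2] = return_pc.to_bytes(2, "little")
        self.pc = target

    def hreth(self):
        """RDWORD (internal pc), SP++."""
        self.pc = int.from_bytes(hub[self.sp:self.sp + 2], "little")
        self.sp += 2                            # post-increment by one word

cog = Cog(sp=64)
cog.pc = 0x10
cog.hcallh(0x200)   # outer call
cog.hcallh(0x300)   # nested call
cog.hreth()         # back into the outer routine
inner_return = cog.pc
cog.hreth()         # back to the original caller
print(hex(inner_return), hex(cog.pc), cog.sp)
```

    Nesting depth here is limited only by how much hub memory is set aside for the stack, which is the contrast with the fixed-depth AUX stack.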
    cgracey wrote: »
    I can't play chess because it feels like a dead short to my brain.

    I see that in addition to AUX stacks for hub/cog calling, we are going to need a hub stack mechanism, too. Maybe it can be made for both hub (obviously) and cog code, as well.
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 11:39
    And thus, we come back to you looking like you don't want the small stack hubexec mode implemented.
    Where did I say that AUX stack mode should not be implemented? All I've ever said is that it can't be easily used by PropGCC. Please find a single place where I've ever said that it shouldn't be implemented. I have asked for the LR version of CALL but I've never said I want that instead of the AUX stack instructions. In fact, I suspect I will not get the LR instructions and I admit that that will not prevent PropGCC from working on the P2. The LR instructions are a convenience and a slight optimization of both space and speed but not an absolute requirement.

    Also, I welcome your help in making a limited version of C that can use the AUX stack. I think that would be a great option. I just don't think it will be easily built using GCC.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 11:40
    Chip,

    how about exposing the cog program counter at $1F0 ? (or other address you pick)

    All of a sudden:

    ADD PC,#offset - relative forward short jump, up to 511*4 bytes away

    SUB PC,#offset - reverse short jump, up to 511*4 away

    BIG #upperoffset - far relative jump
    ADD PC,#lowoffset

    etc etc etc?
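    A rough model of what a writable PC would allow is sketched below. The assumptions are mine, not part of the proposal: the PC is counted in longs, immediates are 9 bits (hence the 511 limit), BIG supplies the upper bits for the immediate of the very next instruction, and offsets are applied to whatever value the PC register currently holds. The class and method names are hypothetical.

```python
# Sketch of relative jumps done as arithmetic on an exposed PC register.
IMM_BITS = 9
IMM_MAX = (1 << IMM_BITS) - 1   # 511, the largest short offset

class Cog:
    def __init__(self):
        self.pc = 0     # program counter in longs (assumed unit)
        self.big = 0    # pending upper bits supplied by BIG

    def _imm(self, low):
        """Combine any pending BIG prefix with a 9-bit immediate."""
        if not 0 <= low <= IMM_MAX:
            raise ValueError("immediate does not fit in 9 bits")
        value = (self.big << IMM_BITS) | low
        self.big = 0    # the prefix applies to one instruction only
        return value

    def big_prefix(self, upper):
        """BIG #upperoffset"""
        self.big = upper

    def add_pc(self, low):
        """ADD PC,#offset : relative forward jump"""
        self.pc += self._imm(low)

    def sub_pc(self, low):
        """SUB PC,#offset : relative backward jump"""
        self.pc -= self._imm(low)

cog = Cog()
cog.pc = 100
cog.add_pc(511)      # longest short forward jump
after_forward = cog.pc
cog.sub_pc(11)       # short backward jump
after_backward = cog.pc
cog.big_prefix(2)    # far jump: BIG then ADD
cog.add_pc(5)
print(after_forward, after_backward, cog.pc)
```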
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 11:42
    cgracey wrote: »
    I can't play chess because it feels like a dead short to my brain.

    I see that in addition to AUX stacks for hub/cog calling, we are going to need a hub stack mechanism, too. Maybe it can be made for both hub (obviously) and cog code, as well.
    I think we can already use PTRA and PTRB to implement a hub stack, can't we? I think that the CALL instruction that stores its return address in a COG register is all that we need to support a hub stack based calling convention for C (the convention we already support for P1). As Bill has correctly pointed out, even the LR version of CALL is not absolutely necessary. It can be implemented by starting each C function with a pop of the return address off the AUX stack into an LR register. It would just be a little more space and time efficient to have the CALL_LR instruction or whatever it would be called.
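    The equivalence described here can be sketched as follows. This is a toy model with invented names (`call_aux`, `pop_lr`, `call_lr`, `ret_lr`) and an assumed return address of "call address plus one long"; it only shows that an AUX-stack CALL followed by a one-instruction prologue leaves the same state as a hypothetical CALL_LR.

```python
# Toy model: AUX-stack CALL plus a prologue pop versus a direct CALL_LR.
class Cog:
    def __init__(self, pc):
        self.pc = pc      # program counter in longs (assumed unit)
        self.lr = 0       # link register, an ordinary cog register
        self.aux = []     # AUX hardware stack

    def call_aux(self, target):
        """CALL that pushes its return address onto the AUX stack."""
        self.aux.append(self.pc + 1)
        self.pc = target

    def pop_lr(self):
        """Prologue instruction: move the return address from AUX to LR."""
        self.lr = self.aux.pop()

    def call_lr(self, target):
        """Hypothetical CALL_LR: return address goes straight into LR."""
        self.lr = self.pc + 1
        self.pc = target

    def ret_lr(self):
        """Return by jumping to the address held in LR."""
        self.pc = self.lr

a = Cog(pc=10)
a.call_aux(100)
a.pop_lr()          # the extra instruction each function would start with

b = Cog(pc=10)
b.call_lr(100)      # same result in one instruction

print(a.lr, b.lr, a.aux)
a.ret_lr()
b.ret_lr()
```

    The only difference is the extra prologue instruction per function, which matches the "little more space and time efficient" remark.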
  • jazzedjazzed Posts: 11,803
    edited 2013-12-09 11:42
    That would be the best of all possible worlds!

    Simplest way I can think of it is:

    (for discussion purposes: HCALLH <-- first H signifies hubexec instruction, last H signifies uses hub stack)

    That's silly. HCALL defaults to HUB stack.

    HCALL (HUB call uses HUB stack)
    HCALLA (HUB call uses AUXA stack)
    HCALLB (HUB call uses AUXB stack)
  • Heater.Heater. Posts: 21,230
    edited 2013-12-09 11:42
    Chip,
    I can't play chess because it feels like a dead short to my brain.
    I suspect you think about it too hard. I bet you instinctively start to try and visualize every possible move and counter move to arrive at the correct answer and the old stack immediately overflows. Anyway, I'm glad it's not just me that gets that "dead short" feeling with chess.

    What were we talking about, oh yeah, stacks...
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 11:42
    I did not say you said it - I said you came across as implying it due to your dedication for getting the LR mode (that I did not oppose) as it is definitely easier for GCC than a stack mode.

    I am already working on a compiler that will make use of the AUX stack mode. It is not gcc, but that is all I will say about my compiler until it is ready and debugged.
    David Betz wrote: »
    Where did I say that AUX stack mode should not be implemented? All I've ever said is that it can't be easily used by PropGCC. Please find a single place where I've ever said that it shouldn't be implemented. I have asked for the LR version of CALL but I've never said I want that instead of the AUX stack instructions. In fact, I suspect I will not get the LR instructions and I admit that that will not prevent PropGCC from working on the P2. The LR instructions are a convenience and a slight optimization of both space and speed but not an absolute requirement.

    Also, I welcome your help in making a limited version of C that can use the AUX stack. I think that would be a great option. I just don't think it will be easily built using GCC.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 11:44
    And I agreed a long time ago, that a CALL variant that stored into an LR would make life easier for the current hub stack implementation, and I definitely support it (see post #2)
    David Betz wrote: »
    I think we can already use PTRA and PTRB to implement a hub stack, can't we? I think that the CALL instruction that stores its return address in a COG register is all that we need to support a hub stack based calling convention for C (the convention we already support for P1). As Bill has correctly pointed out, even the LR version of CALL is not absolutely necessary. It can be implemented by starting each C function with a pop of the return address off the AUX stack into an LR register. It would just be a little more space and time efficient to have the CALL_LR instruction or whatever it would be called.
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 11:45
    I did not say you said it - I said you came across as implying it due to your dedication for getting the LR mode (that I did not oppose) as it is definitely easier for GCC than a stack mode.
    Okay, please don't read anything into my statements. What I said is all I meant to say. I didn't mean to imply the removal of any features.
    I am already working on a compiler that will make use of the AUX stack mode. It is not gcc, but that is all I will say about my compiler until it is ready and debugged.
    This is good news! Can you say if it is a C-like language or subset of C?
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 11:52
    David Betz wrote: »
    Okay, please don't read anything into my statements. What I said is all I meant to say. I didn't mean to imply the removal of any features.

    Unfortunately readers would take it that way, which is why I've been trying to get a clear statement from you clarifying that you are not for removing the AUX stack capability. That's all.
    David Betz wrote: »
    This is good news! Can you say if it is a C-like language or subset of C?

    Sorry, I've rather painfully learned not to talk about unannounced products, but I can let you know that it is not C, but it is a well-structured modern language, one I have been working on-and-off on for about 15 years (for other processors); for the P2 it just needs an optimizing back end (which is decidedly non-trivial) ... I had one running at decent optimization, but the FPGA changes rendered it moot. I do believe the P2 improvements will make it well worthwhile for me to adapt it though :)

    It has not been released because I am trying to find a way of making money from it; it incorporates too many of my "tricks" to just open source it (and I am not open to debating the merits of open sourcing it).
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 11:55
    Unfortunately readers would take it that way, which is why I've been trying to get a clear statement from you clarifying that you are not for removing the AUX stack capability. That's all.
    I've said that about 10 times already and I haven't noticed anyone else being confused about what I'm saying.
  • jazzedjazzed Posts: 11,803
    edited 2013-12-09 11:56
    Unfortunately readers would take it that way, which is why I've been trying to get a clear statement from you clarifying that you are not for removing the AUX stack capability. That's all. ...
    Take a break Bill :)
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 11:56
    Sorry, I've rather painfully learned not to talk about unannounced products, but I can let you know that it is not C, but it is a well-structured modern language, one I have been working on-and-off on for about 15 years (for other processors); for the P2 it just needs an optimizing back end (which is decidedly non-trivial) ... I had one running at decent optimization, but the FPGA changes rendered it moot. I do believe the P2 improvements will make it well worthwhile for me to adapt it though :)

    It has not been released because I am trying to find a way of making money from it; it incorporates too many of my "tricks" to just open source it (and I am not open to debating the merits of open sourcing it).
    Sounds interesting. I look forward to hearing more about it when you're ready.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 12:03
    David,

    If you do not make further posts about how the AUX stack is difficult to implement in GCC (you have convinced me and everyone), and how getting GCC running is helped by an LR register (you have convinced everyone including me), then there will be no need for me to make the counterpoint that assembly language and other languages can still work well with an AUX stack.

    My fear is that people coming to the thread, and reading such a post last, will also think the AUX stack usage is not needed, and press for its removal - thus my posts in support of the AUX stack for hubexec.

    You have my support for a hub based stack for gcc, and I am counting on your support for the AUX stack for assembly / other languages.

    And thus peace breaks out.
    David Betz wrote: »
    I've said that about 10 times already and I haven't noticed anyone else being confused about what I'm saying.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 12:04
    I will :)

    I've been sneaking peeks at this thread between working on testing four new designs for which I just received the PCBs.

    One has a P1 on it :):):)

    I still love the P1, it is a fantastic chip by Chip. That's why I so look forward to the P2!
    jazzed wrote: »
    Take a break Bill :)
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-09 12:07
    Thank you. You will be among the first to know, as I do value your opinion.

    We just disagree sometimes, usually due to a lack of sufficient clarity on one (or both) of our parts causing a misunderstanding.
    David Betz wrote: »
    Sounds interesting. I look forward to hearing more about it when you're ready.
  • Heater.Heater. Posts: 21,230
    edited 2013-12-09 12:10
    Bill,

    That's very interesting. I don't want to try and convince you to open-source anything, but my gut tells me that any new language for anything would have to contain some really compelling features to get people to pay anything for it.

    There is one open source pitch though:

    Gordon Williams created a tiny JavaScript interpreter for micro-controllers and a simple little board to run it on, the Espruino. His interpreter was closed source. He put his board up on Kickstarter along with the promise to open-source the interpreter if he reached a goal of 20,000 pounds. He ended up with 100,000 pounds worth of pledges!

    Damien George created Micro Python for micro-controllers and a board to run it. He also put his board up on Kickstarter, again with the promise to open-source it. His goal is 15,000 pounds and he is already up to nearly 70,000 pounds of pledges!!

    (These similar projects were a coincidence, they were not aware of each other when they started).

    Anyway, the point is that if you can get some fame and traction for your language perhaps you can hold it to ransom in some similar way.
  • jazzedjazzed Posts: 11,803
    edited 2013-12-09 12:11
    I have a new PCB with a P1 on it too :)

    Getting ready to make 100 pieces now.

    I will :)

    I've been sneaking peeks at this thread between working on testing four new designs for which I just received the PCBs.

    One has a P1 on it :):):)

    I still love the P1, it is a fantastic chip by Chip. That's why I so look forward to the P2!
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 12:17
    We just disagree sometimes, usually due to a lack of sufficient clarity on one (or both) of our parts causing a misunderstanding.
    Sounds like that may not actually be true. If you accept that I'm not suggesting removing any of the features that you think are important, you've already stated that you agree with the things I've suggested adding. If we can just get both sets of features we'll both be happy! :-)

    Of course, reality may get in the way of that but only Chip can say what we will end up with in the end.
  • potatoheadpotatohead Posts: 10,254
    edited 2013-12-09 12:38
    BTW, has anyone attempted to write up a HUB-EXEC prototype with the current instruction proposals? That would pay more dividends than arguing over AUX. Just call it a thought experiment Albert.

    Yeah, I know Bill has put fragments out there, and David did a great first pass a while back. To be honest, I would enjoy seeing this too. The current dialog is confusing... Would be nice to think on it.
  • cgraceycgracey Posts: 14,133
    edited 2013-12-09 12:40
    Chip,

    how about exposing the cog program counter at $1F0 ? (or other address you pick)

    All of a sudden:

    ADD PC,#offset - relative forward short jump, up to 511*4 bytes away

    SUB PC,#offset - reverse short jump, up to 511*4 away

    BIG #upperoffset - far relative jump
    ADD PC,#lowoffset

    etc etc etc?

    That would be optimal, as it's the simplest. I'll try to make it work like that.
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 12:45
    cgracey wrote: »
    That would be optimal, as it's the simplest. I'll try to make it work like that.
    How would that work with the instruction pipeline? Won't you have to kill instructions in the pipeline if someone modifies the PC directly? I guess it gets treated like a JMP.
  • cgraceycgracey Posts: 14,133
    edited 2013-12-09 12:51
    David Betz wrote: »
    How would that work with the instruction pipeline? Won't you have to kill instructions in the pipeline if someone modifies the PC directly? I guess it gets treated like a JMP.

    You're right. That didn't occur to me. We would have to decide to cancel same-task instructions in the pipeline, or not. I think cancelling would be the safe bet, as some other cases would be indeterminate with multitasking enabled. So, a write to PC at $1F0 would cancel same-task instructions in the pipeline. Very cool!!! This would reduce table branches to 1 instruction! Works with hub execution, too. Awesome! I think I'll implement this right now, before I go any further. Now we've got one orphaned register in the last 16. Any quick idea for what we could use it for?
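    The cancelling rule and the one-instruction table branch can be sketched like this. It is a toy model, not the actual pipeline: the pipeline is a list of (task, instruction) slots, a cancelled slot is shown as `None`, table entries are assumed to be one long each, and the function names are invented.

```python
# Toy model: a write to PC cancels in-flight instructions of the same task
# only; other tasks' instructions are left alone.
def write_pc(pipeline, pcs, task, new_pc):
    """Model a write to the PC register by `task`."""
    pcs[task] = new_pc
    # Cancel (replace with None) every in-flight instruction of the same task.
    return [(t, None if t == task else instr) for (t, instr) in pipeline]

def table_branch(pipeline, pcs, task, index):
    """ADD PC,index modelled as a single PC write (entries assumed 1 long)."""
    return write_pc(pipeline, pcs, task, pcs[task] + index)

pcs = {0: 0x10, 1: 0x80}    # per-task program counters
pipeline = [(0, "mov"), (1, "add"), (0, "shl"), (1, "rdlong")]
pipeline = table_branch(pipeline, pcs, task=0, index=3)
print(hex(pcs[0]), pipeline)
```

    Task 1 keeps running undisturbed while task 0 restarts from its new PC, which is the determinism argument for cancelling when multitasking is enabled.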
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 12:56
    cgracey wrote: »
    You're right. That didn't occur to me. We would have to decide to cancel same-task instructions in the pipeline, or not. I think cancelling would be the safe bet, as some other cases would be indeterminate with multitasking enabled. So, a write to PC at $1F0 would cancel same-task instructions in the pipeline. Very cool!!! This would reduce table branches to 1 instruction! Works with hub execution, too. Awesome! I think I'll implement this right now, before I go any further. Now we've got one orphaned register in the last 16. Any quick idea for what we could use it for?
    Great! That sounds like a very useful feature!
  • David BetzDavid Betz Posts: 14,511
    edited 2013-12-09 12:57
    cgracey wrote: »
    Now we've got one orphaned register in the last 16. Any quick idea for what we could use it for?
    How about making that the LR register and adding a CALL instruction that stores its return address in $1f1? This would be an instruction like Bill's HCALL with a 16 bit immediate target address.