Propeller II update - BLOG

potatohead · 2013-12-06 09:58

Why, if the alternative is not having HUBEXEC at all, which seems to be the opinions of a few here.
If we can get HUBEXEC operating at max efficiency (hub loading in the background, giving full speed to the hub execution mode) then why is switching to cog mode necessary at all?

Do you mean a switch to HUBEXEC mode is a one-time thing, until the COG is restarted by another COG? If so, then Sapieha already mentioned doing that: HUBEXEC is a one-way trip for the COG until it's restarted.

Sure, if that is what is required, but I don't think it is.

BTW: I did express the opinion that it shouldn't be done, if it's a mess.

However, I don't think it has to be a mess. What I do think is the discussion about how to do it simply needs to continue right now. I really want HUBEXEC too. Just don't want a mess.

The other statement was simply a priority preference, aimed at Chip really.

ctwardell · 2013-12-06 10:02

I think the HUBEXEC changes are too much at this point and should be put off until the P3 unless Parallax wants to spend the time and risk of further delays to do it right.

By 'do it right' I mean get it worked out, in a non-kludgy way, and get GCC updated to work with it and have at least a couple months of testing on the FPGAs along with GCC.

To do something that 'looks like it will work' and commit to silicon before verifying it covers the needed cases with GCC would be begging for disaster.

C.W.

Bill Henning · 2013-12-06 10:02

David,

That is not what you originally proposed in

http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1225215&viewfull=1#post1225215

"I was thinking that if we have the BIG instruction we could use that to extend the range of the existing CALLA/CALLB destination. Any destination with bits set in 31:9 could be considered a hub address and cause an automatic switch to hub execute mode using a bit-extended version of the PC register as the program counter rather than PTRA. A CALLA/CALLB instruction that didn't use BIG or that had zeros in bits 31:9 would be considered a COG location."

That is what made me get upset, and I did not see where you revised it to say let's just use BIG in front of the large embedded address in the combined HJMP/HCALL/HCALLA/HCALLB (which already uses only a single instruction slot). I must have missed it from seeing red.

Had I seen it, you would have gotten the following response (instead of my yelling) - so I will give you that response now, even without having seen the revised proposal.

EXCELLENT idea David!

Using BIG to extend the 18 bit address from HJMP/HCALL/HCALLA/HCALLB is another extremely good idea from you!

Glad that we cleared that up. And I do mean it, it is an excellent idea. Sorry I did not see your revision, it is not like me to blow up.

David Betz wrote: »

You don't seem to be listening to what I'm saying at all. Because of the BIG instruction, a 64 bit CALLA/CALLB will be available without any extra work. It can be used in a future chip if we get more hub memory than can be addressed by LCALL but until then there is no need to use it. I only pointed it out because it falls out of the existence of BIG and if there was no space for the LCALL instruction then it would work. For P2 with 256k of hub memory only LCALLA/LCALLB/LCALLREG would be used. So in that sense it isn't any different from your proposal. The place where it differs is that the same call instructions can be used to call hub mode code or COG mode code and the determination is made based on the upper bits of the destination address. This allows the same RETA/RETB/REGREG/etc instructions to be used so that special HRETx instructions are not needed. It also regularizes the addressing so there is no conflict between hub addresses and COG addresses. They are all in the same address space. This seems like a big win to me and solves the problem of calling COG code from hub code without having to resort to a visible window into hub memory wasting space in the COG register memory map.

Cluso99 · 2013-12-06 10:04

potatohead wrote: »

If I'm not mistaken, Cluso wrote about "always execute from HUB"

I took that to mean *always* as in we simply don't fetch the data for a COG to run with like we do now, which seriously changes things.

Surely, you don't think I really meant for every cog???

Just in case, no, I mean for the cog(s) that want to only run in HUB mode. And if this new mode means there is no hw tasking, I am fine with that. If it impacts large memory transfers, I am fine with that too. It can be a restriction for using this mode. The benefits of having this mode (as an option) mean almost 100% throughput compared to 25% or less for LMM.

BTW My LMM model seems unlikely to achieve 25% because it looks like it will miss the hub window due to RD delays and jmp delays in the loop. Maybe time will flesh this out, or Bill has solved it.

David Betz · 2013-12-06 10:04

Bill Henning wrote: »

David,

That is not what you originally proposed in

http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1225215&viewfull=1#post1225215

"I was thinking that if we have the BIG instruction we could use that to extend the range of the existing CALLA/CALLB destination. Any destination with bits set in 31:9 could be considered a hub address and cause an automatic switch to hub execute mode using a bit-extended version of the PC register as the program counter rather than PTRA. A CALLA/CALLB instruction that didn't use BIG or that had zeros in bits 31:9 would be considered a COG location."

That is what made me get upset, and I did not see where you revised it to say let's just use BIG in front of the large embedded address in the combined HJMP/HCALL/HCALLA/HCALLB (which already uses only a single instruction slot). I must have missed it from seeing red.

Had I seen it, you would have gotten the following response (instead of my yelling) - so I will give you that response now, even without having seen the revised proposal.

EXCELLENT idea David!

Using BIG to extend the 18 bit address from HJMP/HCALL/HCALLA/HCALLB is another extremely good idea from you!

Glad that we cleared that up. And I do mean it, it is an excellent idea. Sorry I did not see your revision, it is not like me to blow up.

Sorry I didn't mention LCALL in my original post. I was trying to propose an absolutely minimal change that would allow hub execution, and LCALL isn't absolutely necessary, although it is highly desirable, as you point out.

potatohead · 2013-12-06 10:05

Smile, you two sorted some of it out. Would still love to see a sample.

Would you, David and Bill, take a moment and flesh out some sample code? I know this is how Chip generally works.

Let's see the two in more friendly terms, if that makes sense. Maybe show PASM to enter HUBEXEC, a loop bigger than the WIDE doing a memory move with an index pointer, a call to a HUB subroutine, a call to a COG routine (if valid in your scheme), and finally a return to COG mode (if valid in your scheme).

I sure would like to see these myself to think on them better. Maybe doing this would help clarify the discussion some. Maybe I'm not the only one unable to see it in concrete enough terms.

Use NOPs for most of it, and just show how it would appear in Pnut.exe today?

potatohead · 2013-12-06 10:08

@Cluso.

Sure. One way trip then. If that's necessary, then it's necessary. Or specify it in the COGNEW? Yeah, if that's necessary, then it's necessary.

I'm not an advocate of either of those, but I'm not opposed. And I'm in agreement about the benefit. We really need this feature, which is precisely why I put my priority preferences here.

Refining it with robust discussion is the best possible thing right now. Maybe that helps? I'm not a naysayer on it at all.

Sorry for the confusion.

David Betz · 2013-12-06 10:08

Bill Henning wrote: »

That is what made me get upset, and I did not see where you revised it to say let's just use BIG in front of the large embedded address in the combined HJMP/HCALL/HCALLA/HCALLB (which already uses only a single instruction slot). I must have missed it from seeing red.

Actually, I was suggesting BIG to extend CALLx not HCALLx. I was trying to propose a solution that required a minimum of changes. After you complained about the 64 bit CALL instructions I mentioned in another message that if it were possible to add new instructions with larger immediate fields then LCALL would solve your "code bloat" complaint. I guess BIG could be used to extend LCALL as well but it seems like it would fit into the architecture better to have it always apply to instructions with 9 bit S fields. In any case, there is no difference in code size if BIG is applied to CALL instead of LCALL. Both would result in 64 bit instructions.

Also, my proposal replaces HCALLx with LCALLx and doesn't require HRETx at all.

Bill Henning · 2013-12-06 10:10

ct,

I have been deliberately not responding to your negative posts, but now I will respond:

- next earliest shuttle run is in April, as per Chip's previous postings
- the new process would not fit P2, so video dac bus had to go
- Chip decided to use that space for 256KB hub
- Chip went from QUAD hub bus to OCTAL, and started musing about hub execution
- I played with the idea, and reduced it to just three simple instructions
- David and Eric need a link register to make the GCC changes easier, so I added LR variants to my suggestion
- Beau has much hand-relaying work to do until then, leaving Chip time to play with the synthesized logic
- Chip is VERY fast with Verilog and had the 256KB change implemented in the FPGA one day after deciding to double the hub
- the simple proposed instructions, and Chip's suggested auto reload take little Verilog and are low risk
- an FPGA configuration file would test these instructions to the same level as the far more complex CORDIC engine etc
- if the FPGA implementation of hubexec is not working by the time synthesis for the shuttle run has to start, the new instructions can simply go unused
- therefore the risk is minimal to non-existent
- having 160-200MIPS execution from hub all of a sudden makes P2 stand above Cortex M0 and M3 performance wise
- it would result in many more design wins
- I would not be surprised if Chip already has hubexec working

It is only Dec. 6; even if synthesis has to start March 1, Chip has nearly three months to play with the FPGA implementation to decide what goes to synthesis.

Therefore, there is no need to be negative. Worst case, the new hubexec mode does not make this shuttle. Best case is that it works in FPGA and makes the shuttle.

No difference to the shuttle time line, essentially no risk.

ctwardell wrote: »

I think the HUBEXEC changes are too much at this point and should be put off until the P3 unless Parallax wants to spend the time and risk of further delays to do it right.

By 'do it right' I mean get it worked out, in a non-kludgy way, and get GCC updated to work with it and have at least a couple months of testing on the FPGAs along with GCC.

To do something that 'looks like it will work' and commit to silicon before verifying it covers the needed cases with GCC would be begging for disaster.

C.W.

David Betz · 2013-12-06 10:14

potatohead wrote: »

Smile, you two sorted some of it out. Would still love to see a sample.

Would you, David and Bill, take a moment and flesh out some sample code? I know this is how Chip generally works.

Let's see the two in more friendly terms, if that makes sense. Maybe show PASM to enter HUBEXEC, a loop bigger than the WIDE doing a memory move with an index pointer, a call to a HUB subroutine, a call to a COG routine (if valid in your scheme), and finally a return to COG mode (if valid in your scheme).

I sure would like to see these myself to think on them better. Maybe doing this would help clarify the discussion some. Maybe I'm not the only one unable to see it in concrete enough terms.

Use NOPs for most of it, and just show how it would appear in Pnut.exe today?

This is a very good idea but I won't be able to get to this until the weekend. Sorry!

Bill Henning · 2013-12-06 10:16

As long as the single long {H|L|X|?} hub jump, call (3 variants), returns (3 variants) exist, I of course have no objection to alternate means, especially as they would not take up a full opcode.

But eliminating HRETx requires eliminating HCALLx, which leads to code bloat.

So, to summarize:

If you want to add your BIG mode for the cog CALL instructions, which can use the cog RET, I will be happy to support your suggestion as long as it does not try to eliminate the single long HJMP/HCALL*/HRET* instructions. I honestly do not see an advantage for the P2 for the two long variant, but if you see a use for it, by all means. For the P3... we have plenty of time to brainstorm.

David Betz wrote: »

Actually, I was suggesting BIG to extend CALLx not HCALLx. I was trying to propose a solution that required a minimum of changes. After you complained about the 64 bit CALL instructions I mentioned in another message that if it were possible to add new instructions with larger immediate fields then LCALL would solve your "code bloat" complaint. I guess BIG could be used to extend LCALL as well but it seems like it would fit into the architecture better to have it always apply to instructions with 9 bit S fields. In any case, there is no difference in code size if BIG is applied to CALL instead of LCALL. Both would result in 64 bit instructions.

Also, my proposal replaces HCALLx with LCALLx and doesn't require HRETx at all.

potatohead · 2013-12-06 10:17

Yeah, no worries. I expected that, because it takes some thinking through, and I honestly think it would help this discussion.

I really want you two to get this sorted, because it's best if you do.

David Betz · 2013-12-06 10:21

potatohead wrote: »

Yeah, no worries. I expected that, because it takes some thinking through, and I honestly think it would help this discussion.

I really want you two to get this sorted, because it's best if you do.

I think the main difference between what Bill is suggesting and what I'm suggesting is that I've unified the COG and hub address spaces so I can use common instructions like CALLx, LCALLx, RETx, and JMP (and maybe LJMPx) to address either COG or hub memory. Bill, instead, has two separate address spaces, one for COG addresses and a separate one for hub addresses. It may be that Chip tells us it isn't feasible to merge the COG and hub address spaces. In that case my proposal will have to be abandoned. However, if what I'm proposing is possible, it presents a simpler model to the programmer.

Bill Henning · 2013-12-06 10:29

David,

Please show me how your proposal can do HJMP, HCALL, HCALLA, HCALLB in a single long. Based on my reading of your previous messages, I am missing it where you described that, but I may be missing the tree from the forest.

David Betz wrote: »

I think the main difference between what Bill is suggesting and what I'm suggesting is that I've unified the COG and hub address spaces so I can use common instructions like CALLx, LCALLx, RETx, and JMP (and maybe LJMPx) to address either COG or hub memory. Bill, instead, has two separate address spaces, one for COG addresses and a separate one for hub addresses. It may be that Chip tells us it isn't feasible to merge the COG and hub address spaces. In that case my proposal will have to be abandoned. However, if what I'm proposing is possible, it presents a simpler model to the programmer.

David Betz · 2013-12-06 10:32

Bill Henning wrote: »

David,

Please show me how your proposal can do HJMP, HCALL, HCALLA, HCALLB in a single long. Based on my reading of your previous messages, I am missing it where you described that, but I may be missing the tree from the forest.

That's easy. I just use the exact same bit encoding that you've proposed except I interpret the immediate addresses differently. :-)

David Betz · 2013-12-06 10:39

David Betz wrote: »

That's easy. I just use the exact same bit encoding that you've proposed except I interpret the immediate addresses differently. :-)

LCALL D/#addr

TTTTTTT ZC I CCCC 01AAAAAAAAA AAAAAAA HCALL

LR = ++PC

Saves next hub instruction address value into a link register, then PC = specified address

The difference is that I'm saying that if the upper bits 15:9 of the AAAA field are zero then the address is a COG address and a transition is made to COG mode. If they are non-zero, the address is a hub address and a transition is made to hub mode possibly also loading PTRA if that is required.
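The decoding rule described above can be sketched in a few lines. This is only an illustration in Python, not Verilog and not anything Chip has implemented. The 16-bit field width is read off the encoding line above, and the function name `target_mode` is hypothetical.

```python
COG_ADDR_BITS = 9   # a COG holds 512 longs, so bits 8:0 select a COG location
FIELD_BITS = 16     # width of the AAAA field as drawn in the encoding above (assumption)


def target_mode(addr_field: int) -> str:
    """Classify an immediate call target as 'cog' or 'hub'.

    The target is a COG address when bits 15:9 are all zero,
    and a hub address when any of those bits is set.
    """
    if not 0 <= addr_field < (1 << FIELD_BITS):
        raise ValueError("address does not fit in the immediate field")
    upper_bits = addr_field >> COG_ADDR_BITS
    return "cog" if upper_bits == 0 else "hub"


if __name__ == "__main__":
    for addr in (0x000, 0x1FF, 0x200, 0x1000):
        print(f"${addr:04X} -> {target_mode(addr)}")
```

One consequence of this rule, visible in the sketch, is that the lowest 512 addresses can never name hub code, since they always decode as COG locations.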

Bill Henning · 2013-12-06 10:42

Excellent idea!

- For encodings, see post #2

- cog state will have a flip flop for "cog mode" or "hub mode"

- calling cog code from the hub code, the verilog knows it is a hub->cog call, due to the opcode used and the "mode" flip-flop. Either a fixed window is used for the 8-long hubexec cache, and the cog return address back into the window triggers the return to hub mode, or the cog return address is stored on the stack, or in LR - whichever is easier for Chip - and hub mode resumes

- calling hub code from cog code saves the cog return address (stack or LR, does not matter, whichever is easiest for Chip), and the HRET variant returns to the cog address as it was called from cog mode.

There are many ways to accomplish the cross-mode calls & returns, Chip knows which is easiest to implement - but they are not complicated.

Pasm to enter hubexec:

HJMP D/#address (one long)

Pasm to exit hubexec:

JMP D/#cogaddress

Pasm to call a hub subroutine:

HCALL D/#address (one long)

or

HCALLA D/#address (one long)

or

HCALLB D/#address (one long)

Pasm to return from a hub subroutine:

HRET

or

HRETA

or

HRETB

Pasm to call a cog subroutine from hub code:

CALL D/#addr (regular cog instruction)

Pasm to return from a cog subroutine to hub code:

RET (actually the normal jump)

Memory move routine crossing wide:

Automagic, as the hub program counter (ptra or pc, whichever Chip uses) would change based on the slot.

See my post #3 for load cog / save cog code; as Chip is using PTRA for PC, best way would be to use the sample code in #3 to load a cog routine that saves PTRA, and uses both PTRA and PTRB for the load, restores PTRA, then returns to hub mode.

Failing that, easiest is load a block into the cog, then write it out again - do it many longs at a time instead of one.

I can write an example this weekend if you like, don't have time right now.

potatohead wrote: »

Smile, you two sorted some of it out. Would still love to see a sample.

Would you, David and Bill, take a moment and flesh out some sample code? I know this is how Chip generally works.

Let's see the two in more friendly terms, if that makes sense. Maybe show PASM to enter HUBEXEC, a loop bigger than the WIDE doing a memory move with an index pointer, a call to a HUB subroutine, a call to a COG routine (if valid in your scheme), and finally a return to COG mode (if valid in your scheme).

I sure would like to see these myself to think on them better. Maybe doing this would help clarify the discussion some. Maybe I'm not the only one unable to see it in concrete enough terms.

Use NOPs for most of it, and just show how it would appear in Pnut.exe today?

Bill Henning · 2013-12-06 10:44

Thanks, makes sense now. It can store the return address in LR, or stack A or stack B, depending on call variant.

I guess that - for me - your original proposal was not clear enough, and I interpreted it in the wrong way, causing me to blow my stack.

I apologize for that - but the vision of wasted hub space, loss in performance, and lost design wins blew my mind.

Having lost the neat optimization tricks (visible big, big holding low 23 bits, fixed location for the WIDE cache) I could have used in the compiler I've been working on since last November made me a bit touchy, especially since Spin and all other compilers would also have benefitted.

(To all readers: I will not release any information about my compiler until it is fully ready and debugged, no matter how nicely I am asked. Sorry)

David Betz wrote: »
LCALL D/#addr

TTTTTTT ZC I CCCC 01AAAAAAAAA AAAAAAA HCALL

LR = ++PC

Saves next hub instruction address value into a link register, then PC = specified address
The difference is that I'm saying that if the upper bits 15:9 of the AAAA field are zero then the address is a COG address and a transition is made to COG mode. If they are non-zero, the address is a hub address and a transition is made to hub mode possibly also loading PTRA if that is required.

David Betz · 2013-12-06 10:48

Okay, I'll try a quick example:

    org $1000 ' hub mode address

    ' do some stuff
    LCALLA #my_hub_fcn
    ' do some more stuff
    HALT ' I know this isn't an instruction but I just wanted to indicate the end of this instruction sequence

my_hub_fcn
    ' do some stuff
    LCALLA #my_cog_fcn
    ' do some more stuff
    RETA

    org $0 ' COG mode address

my_cog_fcn
    ' do some stuff
    CALL #my_other_cog_fcn
    ' do some other stuff
    RETA

my_other_cog_fcn
    ' do some stuff
my_other_cog_fcn_ret
    RET

So you see that you can call either hub or COG functions with LCALLA and you can use RETA to return whether it's a COG mode or hub mode function. However, you can only use CALL/RET from COG mode code.

Note: The call from the main code to my_hub_fcn remains in hub mode so no mode transition happens. The call from my_hub_fcn to my_cog_fcn transitions from hub to COG mode. The call from my_cog_fcn to my_other_cog_fcn doesn't make any mode transition.

The RET in my_other_cog_fcn makes no mode transition since it is returning to COG mode code but the RETA in my_cog_fcn makes a mode transition from COG mode to hub mode on returning to my_hub_fcn. Lastly, there is no mode transition when returning from my_hub_fcn to the main code.

All of these transitions are made based on looking at bits 31:9 of the target address. If those bits are zero, then the target is in COG mode. If they are non-zero, the target is in hub mode. Based on the current mode and the target mode, transitions happen if necessary. This all happens automatically.
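To make the sequence of transitions concrete, here is a small Python model that walks through the same calls and returns as the PASM example. It is only a sketch under several assumptions: the label addresses are made up (only their upper bits matter), a single return stack stands in for both the CALL/RET and CALLA/RETA mechanisms, and the return address is simplified to PC + 1. The names `ModeTracker` and `trace_example` are hypothetical.

```python
COG_ADDR_BITS = 9  # COG memory is 512 longs, so COG addresses fit in bits 8:0


def mode_of(addr: int) -> str:
    """'cog' when bits 31:9 are zero, otherwise 'hub'."""
    return "cog" if (addr >> COG_ADDR_BITS) == 0 else "hub"


class ModeTracker:
    """Tracks only the PC, one return stack, and the current execution mode."""

    def __init__(self, start_pc: int) -> None:
        self.pc = start_pc
        self.mode = mode_of(start_pc)
        self.stack: list[int] = []
        self.log: list[str] = []

    def _goto(self, target: int, why: str) -> None:
        # Compare the current mode with the mode implied by the target address.
        new_mode = mode_of(target)
        if new_mode == self.mode:
            self.log.append(f"{why}: stay in {self.mode}")
        else:
            self.log.append(f"{why}: {self.mode} -> {new_mode}")
        self.pc, self.mode = target, new_mode

    def call(self, target: int, name: str) -> None:
        self.stack.append(self.pc + 1)  # simplified return address (assumption)
        self._goto(target, f"call {name}")

    def ret(self, name: str) -> None:
        self._goto(self.stack.pop(), f"return from {name}")


def trace_example() -> list[str]:
    # Hypothetical label addresses chosen to match the org $1000 / org $0 layout.
    my_hub_fcn, my_cog_fcn, my_other_cog_fcn = 0x1010, 0x000, 0x010
    cog = ModeTracker(start_pc=0x1000)
    cog.call(my_hub_fcn, "my_hub_fcn")
    cog.call(my_cog_fcn, "my_cog_fcn")
    cog.call(my_other_cog_fcn, "my_other_cog_fcn")
    cog.ret("my_other_cog_fcn")
    cog.ret("my_cog_fcn")
    cog.ret("my_hub_fcn")
    return cog.log


if __name__ == "__main__":
    for line in trace_example():
        print(line)
```

The model never stores the mode on the stack: the return address alone is enough to decide whether a return crosses between hub and COG mode, which is the point of putting both in one address space.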

ctwardell · 2013-12-06 10:49

Bill,

I'm not being negative, I'm being pragmatic.

To your points:

- next earliest shuttle run is in April, as per Chip's previous postings
* Synthesis and hooking up the synthesized core to the outer ring need to fit in, and there needs to be time to deal with any issues that might arise.

- the new process would not fit P2, so video dac bus had to go
* To be more precise, other changes made the core too big.

- Chip decided to use that space for 256KB hub
* Yes

- Chip went from QUAD hub bus to OCTAL, and started musing about hub execution
* OCTAL was an easy side effect of going to 256KB since they reused already-defined blocks.

- I played with the idea, and reduced it to just three simple instructions
* That are still evidently being argued over, and over, and over.

- David and Eric need a link register to make the GCC changes easier, so I added LR variants to my suggestion
* This is the kind of thing that needs to be thoroughly worked out to make absolutely sure GCC will port with nothing kludgy.
I'm not calling any of this a kludge, but until everything is looked at and GCC is ported we won't know.

- Beau has much hand-relaying work to do until then, leaving Chip time to play with the synthesized logic
* Don't forget about the SERDES/CRC needing to be worked out, and that may eat some opcode space as well.

- Chip is VERY fast with Verilog and had the 256KB change implemented in the FPGA one day after deciding to double the hub
* Again, reusing existing blocks helped there.

- the simple proposed instructions, and Chip's suggested auto reload take little Verilog and are low risk
* I believe that is a guesstimate until it is actually done.

- an FPGA configuration file would test these instructions to the same level as the far more complex CORDIC engine etc
* Yes, but time is needed to work with the FPGA versions.

- if the FPGA implementation of hubexec is not working by the time synthesis for the shuttle run has to start, the new instructions can simply go unused
* That assumes they caused no negative side effects by being implemented.

- therefore the risk is minimal to non-existent
* I obviously do not agree.

- having 160-200MIPS execution from hub all of a sudden makes P2 stand above Cortex M0 and M3 performance wise
* If it works properly.

- it would result in many more design wins
* If it works properly.

- I would not be surprised if Chip already has hubexec working
* Maybe, but then again, having something that is assumed to be what is needed and having something that is verified to be what is needed are not the same thing.

I don't want to see what could be a smooth rollout turn into anything like the rollout of a certain website in recent news...

I think it would be a good idea to do the following:

1) Wrap up the design as is, with the 256KB Hub and optionally the SERDES if that can be wrapped up soon, send it through synthesis to see if there are any issues and have that ready to go for the shuttle run if needed.

2) After the first step is off to synthesis resume working on the HUBEXEC, if that is worked out in time and makes it through synthesis prior to the shuttle run then go with the updated design.

I realize synthesis isn't cheap, but it might be better this way than to risk missing a shuttle run.

C.W.

Ken Gracey · 2013-12-06 11:16

Just spoke with Chip and he says that maybe this weekend he'll have hubexec working. I'm sure he'll figure out the right way to do it and will report back with his results.

Now I'm on the "more features bandwagon" for P2, too. Who'd have thought that would ever happen?* This is especially true since I got a few messages from Bill Henning and David Betz about how much more function and performance large languages (like Propeller C) could deliver on P2 with hubexec. And beyond performance, they mentioned the GCC design cost...now they're really speaking my language.

Ken Gracey

*Still not interested in DDRAM support that was born over Thanksgiving Holiday while Chip slept twice.

Bill Henning · 2013-12-06 11:21

Ken,

Thanks for the update, and for supporting hubexec.

This will take Prop2 "to infinity and beyond!"

It will also make it faster than Cortex M0 & Cortex M3 :-)

(agreed re/ DDR2 for Prop 2. Suggest starting Prop 2.1 with DDR2 or DDR3 as soon as Prop 2 goes to production <grin>)

Ken Gracey wrote: »

Just spoke with Chip and he says that maybe this weekend he'll have hubexec working. I'm sure he'll figure out the right way to do it and will report back with his results.

Now I'm on the "more features bandwagon" for P2, too. Who'd have thought that would ever happen?* This is especially true since I got a few messages from Bill Henning and David Betz about how much more function and performance large languages (like Propeller C) could deliver on P2 with hubexec. And beyond performance, they mentioned the GCC design cost...now they're really speaking my language.

Ken Gracey

*Still not interested in DDRAM support that was born over Thanksgiving Holiday while Chip slept twice.

K2 · 2013-12-06 11:24

Cluso99 wrote: »

SETSLOT is, I understand, quite simple. It uses otherwise unused bandwidth, so really it's a no-brainer. If you don't want to tell everyone, that's fine. But don't miss the opportunity.
This is a simple implementation...
(a) Each COG can YIELD (other cog takes priority) or GIFT (this cog has priority) its slot to another COG
SETSLOT #0_0_y_g_ccc
(b) Each COG can accept other COG(s) YIELD/GIFT slot(s), and/or accept any AVAILABLE slots
SETSLOT #p_a_0_0_000

Most of the slot sharing discussion took place while Europe was asleep. When certain influential members woke up and read post #3156 as their first exposure to the idea, they had a cow. Understandable. Chip was just proposing an all-encompassing solution. In reality it need not be that full-featured. Any of the simpler solutions, including this one, would be just fine with me! Any provision for slot sharing is better than no provision for slot sharing.

Unless Chip has found a reason why it shouldn't be done (e.g., it's more complicated than originally thought or it impacts something else adversely) it really does seem like a no-brainer to include it.
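For readers trying to picture the SETSLOT operands quoted above, here is a rough Python sketch of how the two forms might be packed. The bit positions are read literally from the underscore-separated patterns `#0_0_y_g_ccc` and `#p_a_0_0_000`. Treating `p` as "accept YIELD/GIFT slots" and `a` as "accept AVAILABLE slots" is an assumption based on the order of the wording, and the function names are hypothetical.

```python
def setslot_donate(target_cog: int, yield_slot: bool = False, gift: bool = False) -> int:
    """Build the 7-bit operand 0_0_y_g_ccc for handing this cog's slot to another cog."""
    if not 0 <= target_cog <= 7:
        raise ValueError("target cog must be 0..7")
    # y sits in bit 4, g in bit 3, and the target cog number in bits 2:0.
    return (int(yield_slot) << 4) | (int(gift) << 3) | target_cog


def setslot_accept(donated: bool = False, available: bool = False) -> int:
    """Build the 7-bit operand p_a_0_0_000 for accepting extra slots."""
    # p sits in bit 6 and a in bit 5 (assumed meanings, see the note above).
    return (int(donated) << 6) | (int(available) << 5)


if __name__ == "__main__":
    print(f"gift slot to cog 3:    %{setslot_donate(3, gift=True):07b}")
    print(f"yield slot to cog 5:   %{setslot_donate(5, yield_slot=True):07b}")
    print(f"accept everything:     %{setslot_accept(donated=True, available=True):07b}")
```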

cgracey · 2013-12-06 11:42

K2 wrote: »

Most of the slot sharing discussion took place while Europe was asleep. When certain influential members woke up and read post #3156 as their first exposure to the idea, they had a cow. Understandable. Chip was just proposing an all-encompassing solution. In reality it need not be that full-featured. Any of the simpler solutions, including this one, would be just fine with me! Any provision for slot sharing is better than no provision for slot sharing.

Unless Chip has found a reason why it shouldn't be done (e.g., it's more complicated than originally thought or it impacts something else adversely) it really does seem like a no-brainer to include it.

The argument against implementing it is philosophical, not technical.

Many of us are worried that by offering a door to increased performance, which cannot be relied upon as objects are mixed into an application, people would wind up frustrated after having casually written a bunch of code with that door open, only to find that performance falls as their cog usage grows. By enforcing a single-hub-slot-per-cog rule, there is no performance degradation as cog usage increases; hence, no disappointment, and a reliable store of objects can be developed.

It was a long, drawn-out discussion. My thought at the moment is this: maybe implement it, but don't make it public, at all. Only tell the sort of people who really want to know, maybe requiring participation in some weekend retreat deep in the woods.
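The worry about performance falling as cog usage grows can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of any real arbitration logic: it assumes 8 cogs, one hub slot per cog per rotation, and a sharing policy in which a single greedy cog picks up every slot belonging to an idle cog. The function name `slots_per_rotation` is hypothetical.

```python
NUM_COGS = 8  # one hub slot per cog in each rotation


def slots_per_rotation(active_cogs: int, sharing: bool) -> int:
    """Hub slots one greedy cog gets per rotation.

    Without sharing, every cog always gets exactly its own slot.
    With sharing (assumed policy), the greedy cog also takes the
    slots of all idle cogs, so its share shrinks as cogs are started.
    """
    if not 1 <= active_cogs <= NUM_COGS:
        raise ValueError("active_cogs must be 1..8")
    if not sharing:
        return 1
    return 1 + (NUM_COGS - active_cogs)


if __name__ == "__main__":
    print("active  fixed  shared")
    for n in range(1, NUM_COGS + 1):
        print(f"{n:6d}  {slots_per_rotation(n, False):5d}  {slots_per_rotation(n, True):6d}")
```

In this model, code tuned with one cog running sees eight slots per rotation, and the same code sees one slot once all eight cogs are busy, while the fixed rule gives one slot in both cases.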

David Betz · 2013-12-06 11:43

Ken Gracey wrote: »

Just spoke with Chip and he says that maybe this weekend he'll have hubexec working. I'm sure he'll figure out the right way to do it and will report back with his results.

Now I'm on the "more features bandwagon" for P2, too. Who'd have thought that would ever happen?* This is especially true since I got a few messages from Bill Henning and David Betz about how much more function and performance large languages (like Propeller C) could deliver on P2 with hubexec. And beyond performance, they mentioned the GCC design cost...now they're really speaking my language.

Ken Gracey

*Still not interested in DDRAM support that was born over Thanksgiving Holiday while Chip slept twice.

That's very encouraging. It will be interesting to see what Chip comes up with! Some of the optimizations Bill has been talking about will probably not make it into PropGCC anytime soon but certainly the hub execution feature can be used to great benefit.

T Chap · 2013-12-06 11:46

I really don't get this logic. Just add the feature if it is easy to do, and ban objects that include it from OBEX.

Ken Gracey · 2013-12-06 11:48

cgracey wrote: »

My thought at the moment is this: maybe implement it, but don't make it public, at all. Only tell the sort of people who really want to know, maybe requiring participation in some weekend retreat deep in the woods.

That's sorta weird. How about choosing the Arctic Circle so Heater and Baggers can go, and so that Bill Henning is also at home in a cold northern climate.

Bill Henning · 2013-12-06 11:52

LOL!

I actually like T Chap's suggestion of banning objects that use it from OBEX, and maybe only document it in an app note on Parallax Semiconductor. The normal docs could just say:

SETSL not recommended for normal software. See App Note #242423432 (not even giving the link)

Reason to shorten the name: make function less obvious. Hmm... maybe call it XYZZY ?

Ken Gracey wrote: »

That's sorta weird. How about choosing the Arctic Circle so Heater and Baggers can go, and so that Bill Henning is also at home in a cold northern climate.

ctwardell · 2013-12-06 11:56

Bill Henning wrote: »

Hmm... maybe call it XYZZY ?

How about in honor of Canada, and Rush of course, call it YYZ...

Now that I'm listening to it, the pace of YYZ fits the idea pretty well too!

C.W.

msrobots · 2013-12-06 11:56

I can provide the woods, if you want to see something other than walnut trees! Even a nice Lake View.

But it won't work.

cgracey wrote: »

It was a long, drawn-out discussion. My thought at the moment is this: maybe implement it, but don't make it public, at all. Only tell the sort of people who really want to know, maybe requiring participation in some weekend retreat deep in the woods.

Once the code is out, the code is out.

Alas, I need to say that there MAY be a technical reason FOR implementing it.

Since the fast ADC/DAC now has a binding between COG and pins, we cannot use 2 COGs to drive the same pins fast. A lot of video drivers use multiple cogs on P1. Is this still needed on P2?

Enjoy!

Mike
