It seems to me that just making them fully shared (via writes) is adequate. There's no complex mode that might need switching and the whole matter becomes, "do we allow writes from the companion cog, or not?". It's very easy to understand. The only argument for more complexity may be that it would be good to not share the entire LUT, so that we can have more LUT space of our own. In most applications, I would assume that only a few, or all, locations would be useful to share.
If you keep two control bits, then the 01 setting above allows COG to use its LUT as private space and LUT+1 as a place to pass data to COG+1. Seems like a useful mode. No?
LUT+1? Do you mean higher addresses than $000..$1FF?
Sorry, no. LUT+1 was meant to be LUT2, the second COG's LUT. With 2 bits, the 01 mode is still able to let COG1 write just to LUT2. That seems useful and would go away with just one mode bit.
0 = other cog's LUT writes are ignored (default)
1 = other cog's LUT writes are permitted
This way, all cogs send their LUT writes to their companion cog, but the companion cog must permit them. This keeps both LUTs the same, as writes occur.
Is there much value in being able to control whether you write to your own LUT, the other LUT, or both? It makes for more memory, but then the juggling becomes more complex.
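A minimal Python sketch of the one-bit scheme described above, meant only as a behavioral model and not as the actual hardware: every LUT write updates the writer's own LUT and is also offered to the companion cog, which accepts it only if its permit bit is set. The names `Cog`, `set_lut_share`, `wrlut` and `make_pair` are invented for illustration.

```python
LUT_SIZE = 512  # $000..$1FF, as discussed in the thread


class Cog:
    """Behavioral model of one cog's LUT with a single permit bit (hypothetical names)."""

    def __init__(self, cog_id):
        self.cog_id = cog_id
        self.lut = [0] * LUT_SIZE
        self.permit_companion_writes = False  # 0 = other cog's writes ignored (default)
        self.companion = None

    def set_lut_share(self, enable):
        # 1 = other cog's LUT writes are permitted
        self.permit_companion_writes = bool(enable)

    def wrlut(self, addr, value):
        # The writer always updates its own LUT...
        self.lut[addr] = value
        # ...and the same write is sent to the companion, which may ignore it.
        other = self.companion
        if other is not None and other.permit_companion_writes:
            other.lut[addr] = value


def make_pair(even_id):
    """Wire up an even/odd pair of cogs as companions."""
    a, b = Cog(even_id), Cog(even_id + 1)
    a.companion, b.companion = b, a
    return a, b


even, odd = make_pair(2)
odd.set_lut_share(1)        # only the odd cog accepts companion writes
even.wrlut(0x10, 0xCAFE)    # lands in both LUTs
odd.wrlut(0x11, 0xBEEF)     # lands only in the odd cog's own LUT
```

With both permit bits set, the two LUTs stay identical as writes occur; with only one set, data flows in one direction.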
Can this control bit be automatically set when you request an adjacent cog start?
Seems to me that shared LUT writes is implied by the type of startup?
To launch 2 cogs, D[4] is set and D[0] is set. If an unused odd/even pair of cogs is found, it starts them both up with the same parameters.
You mean that they both load the same code? If so, correct me if I'm wrong, but that doesn't look very good to me for the intended application.
That single program will be written for both cogs. The first thing it does is a COGID to check the LSB of the cog number. The even cog branches one way and the odd cog goes another. There are no hardware facilities for pointing to two different programs, so the program must figure it out.
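As a rough illustration of the dual launch and the COGID check, here is a Python sketch. It assumes a simple list of busy flags per cog; `find_free_pair`, `role_for` and `launch_pair` are made-up helper names, not part of any real tool.

```python
def find_free_pair(busy):
    """Return the even cog number of the first free even/odd pair, or None."""
    for even in range(0, len(busy) - 1, 2):
        if not busy[even] and not busy[even + 1]:
            return even
    return None


def role_for(cog_id):
    # Same image in both cogs; the LSB of the cog number picks the branch.
    return "odd" if cog_id & 1 else "even"


def launch_pair(busy):
    """Claim a free even/odd pair and report which branch each cog would take."""
    even = find_free_pair(busy)
    if even is None:
        return None
    busy[even] = busy[even + 1] = True
    return {even: role_for(even), even + 1: role_for(even + 1)}


# Cog 1 is free, but its partner cog 0 is busy, so the pair (2, 3) is chosen.
busy = [True, False, False, False, True, True, False, False]
started = launch_pair(busy)
```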
Looks similar to the fork function where the returned value tells the program what thread it is running on.
The only complication I see here is that if the cogs need to run very different code, so that it is not possible or practical to include both in the same image, the other cog must restart itself pointing to the correct code. Or the common code is just a shared stub that restarts both cogs pointing to the actual code. I guess this won't be a problem.
Good idea. I agree. It could be implied by the dual-cog COGNEW. What about plain COGINIT, though? Right now, I made an instruction "SETLUT D/#", where the LSB of D/# enables LUT writes from the other cog. It's pretty simple.
Everyone, this feature is pretty much done, as of tonight. There are a few things to sort out, still, like what kind of event is associated with this, and is simple/full sharing okay? Maybe just one event is useful: a LUT write from the other cog occurred. If we keep it simple with full LUT sharing on/off and have one event on other-cog write, this LUT thing is done.
The only other thing that I think needs any attention is the 'attention' mechanism. It should maybe be expanded to enable seeing which cog(s) called attention. After that, we need to ensure that the USB smart pin mode is sufficient. I think we need to add an N-bit-period delay to the output, if anything.
So, we are almost there (again, but with some new refinements). On Monday, the test chip layout should be off to OnSemi for fabrication. The big thing that we can all do is use the design on FPGA boards and see if there are any bugs that need fixing. I want to start working on tools soon.
If so, correct me if I'm wrong, but that doesn't look very good to me for the intended application.
Here is what you do.
1) Start two COGS with the same code.
2) Have that code, running in both COGs, check its COGID for odd/even.
3) One of them can now reload itself with an entire COG's worth of code with COGINIT.
4) The other just gets on and runs the code it was loaded with.
Or of course one could start two COGs with a tiny little bit of code that checks odd/even COGID and then they both reload themselves with the real COG codes you want to run. The addresses of the codes to run would be passed in as PAR parameters.
Heck, this could be generalized and built into Spin as a dual COGSTART feature.
Edit: The above assumes we are talking about PASM, where you want to fill both COGs with as much of your own PASM as possible. If you are just starting two COGs to run Spin, this reloading is not required; they both run the same Spin engine. But then Spin is as slow as hell, so you probably would not be into this LUT sharing and dual COG starting anyway.
Hey, it's a bit like doing a fork() in Unix. A single program calls fork(), the system then starts another process to run exactly the same code. But the code knows if it is a parent or child so it branches and runs appropriate parts of itself.
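The "tiny stub" variant can be sketched in Python as follows. It is only a model: `par` stands in for a pointer to a two-entry table of code addresses, the addresses are made up, and the returned tuple merely records what a real stub would hand to COGINIT.

```python
def stub(cog_id, par):
    """Tiny common stub: pick this cog's real program from a table passed via PAR."""
    # `par` models a pointer to two code addresses: (even_code, odd_code).
    target = par[cog_id & 1]
    # On real hardware this would be a COGINIT of the current cog with `target`;
    # here we only return what would be loaded.
    return ("COGINIT", cog_id, target)


par = (0x1000, 0x2000)  # made-up hub addresses of the two programs
plan = [stub(cog_id, par) for cog_id in (4, 5)]
```

Like the fork() analogy, both cogs start in identical code and diverge only on the value they read back about themselves.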
Has anyone discussed how COGSTOP should be used in these shared arrangements? It might be wise for a dual-launched Cog to stop both Cogs and free up any shared LUT at the same time, all with the one instruction. In effect, the reverse of Oz's idea.
I guess the problem with being done with design is that then you have to work on documentation. Personally, I hate doing the documentation on stuff, very painful compared to doing design work...
Has anyone discussed how COGSTOP should be used in these shared arrangements? It might be wise for a dual launched Cog to stop both Cogs together. Freeing up any shared LUT at the same time.
Cogs have to enable LUT writes from their companion cog. So, if one cog stops, it doesn't hurt anything.
The compile finished in 1.75 hours and has everything, but 16 smart pins. It takes 93% of the A9. It looks good. No routing problems with even/odd LUT sharing.
I'll test this after I sleep.
Hopefully, tomorrow, we can get the design into a finished state, again. Just a few loose ends to address.
Any recommendations on LUT events? We have two free events now. We already have a read-LUT-end event.
Seairth pointed out that the hub r/w events are now antiquated by ATN (attention). They can go away and save a nice chunk of logic, while freeing up two events. I hope to get all the current changes done tonight.
The ATN also needed a 'who called' information field, to make it more usable.
Yes, I believe Seairth has made a nice proposal about how to handle that.
I don't remember what that proposal is, but I'm not sure we need to know which cog set ATN...
There are so many ways for cogs to cross talk now...
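For what a 'who called' field might look like, here is a small Python sketch. It is an assumption, not the proposal referred to above: one bit per calling cog, read and cleared together. The 16-cog width and the names `AttentionFlags`, `atn_from` and `poll` are hypothetical.

```python
NUM_COGS = 16  # assumed cog count for this sketch


class AttentionFlags:
    """Per-cog attention field with one bit per caller (hypothetical model)."""

    def __init__(self):
        self.callers = 0  # bit n set => cog n has called attention

    def atn_from(self, cog_id):
        # Repeated calls from the same cog collapse into one bit.
        self.callers |= 1 << cog_id

    def poll(self):
        """Return the list of calling cogs and clear the field."""
        mask, self.callers = self.callers, 0
        return [n for n in range(NUM_COGS) if (mask >> n) & 1]


flags = AttentionFlags()
flags.atn_from(3)
flags.atn_from(7)
flags.atn_from(3)
who = flags.poll()
```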
I don't really understand why the option to keep 'my' LUT to myself, use it and write to LUT+1 for connection ... was dropped. To save a config bit? No?
Can mode %11 be an OR operation with result in other COG LUT? Maybe result target determined by the write mode is best.
That may prove useful when both cogs could be put to work on their part of a long. Each one does its thing, and no masking is needed to combine results.
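A short Python sketch of the suggested OR behavior, assuming a hypothetical %11 mode in which a write is ORed into the existing long instead of replacing it. `wrlut_or` is an invented name.

```python
def wrlut_or(lut, addr, value):
    # Hypothetical %11 behavior: OR the written value into the existing long.
    lut[addr] = (lut[addr] | value) & 0xFFFFFFFF


shared = [0] * 512
wrlut_or(shared, 0x20, 0x0000ABCD)  # one cog contributes the low word
wrlut_or(shared, 0x20, 0x12340000)  # the other contributes the high word
```

Neither cog has to read, mask and rewrite the other's half; the combined long simply accumulates. The location would need clearing before reuse.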
We don't have the concept of system wide interrupts to worry about. The events are all cog local. It is reasonable for the programmer to manage those and LUT modes. Once they do, it will just work.
I am in favor of the modes as opposed to the mind meld only.
Chip,
I am pleased with the result. My preference was to have separate LUT addresses for my cog $000-$1FF, and the other shared cog would be $200-$3FF, but your implementation is fine.
In the current cases that I can think of, I don't require any interrupts, as the second cog will be in a two-instruction loop
RDLUT WZ
IF_Z JMP $-1
Waiting for the WRLUT.
I need this for minimum latency. Perhaps an interrupt on write will be as fast, but it's not necessary.
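The polling loop above can be modeled step by step in Python. This is a deterministic sketch, not a timing model: it treats zero as "empty" and assumes one poll per step. `run_mailbox` and its parameters are invented for illustration.

```python
def run_mailbox(write_at_step, value, max_steps=100):
    """Simulate a reader polling one LUT location until the writer fills it."""
    assert value != 0, "zero means 'empty' in this sketch"
    mailbox = 0  # the shared LUT location, initially empty
    polls = 0
    for step in range(max_steps):
        if step == write_at_step:
            mailbox = value  # companion's WRLUT lands here
        polls += 1           # models RDLUT ... WZ
        if mailbox != 0:     # IF_Z JMP $-1 falls through once non-zero
            return mailbox, polls
    return None, polls


received, polls = run_mailbox(write_at_step=3, value=0x55)
```

The reader sees the value on the first poll after the write lands, which is the minimum-latency property being asked for.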
Any recommendations on LUT events? We have two free events now. We already have a read-LUT-end event.
If this is saying a Write-LUT and a Read-LUT, then yes that is a minimum. That allows either side to handshake, with minimal cycles.
It's already complex, just by its nature...
This time, I think I'll hope compile works...
Any recommendations on LUT events? We have two free events now. We already have a read-LUT-end event.
Before answering this, can someone summarize what was finally implemented?
I don't remember what that proposal is, but I'm not sure we need to know which cog set ATN...
There are so many ways for cogs to cross talk now...
See this post.
Dropping the option to keep my own LUT private while writing to LUT+1 seems an unnecessary restriction.
We haven't even left the garage.
Yes! We are!
...but it will take 4-6 months to find parking ;-)
Sorry! Had to do it. Feel free to use that one on your kids :-) They'll love it!! Honest :->
That sounds useful.
If this is saying a Write-LUT and a Read-LUT, then yes that is a minimum.
That allows either side to handshake, with minimal cycles.
Agreed, that detail is important.
Well, my one and only kid has grown to be a man whilst we were waiting for the P2.
Perhaps I can bequeath the waiting to him should I shuffle off this mortal coil in the mean time.