No. The cogs are all completely independent of each other (intentionally, and desirably), and the QUAD registers (that make up the cache) are in each cog. If you want cogs to coordinate with each other, then it's up to you to do that coordination (via the LOCK instructions or some other means). Also, there are many cases (if not most) where one would not want the behavior you described.
It's usually best to think of the HUB as some external dumb device that just delivers and accepts memory, because that's pretty much all it is/does.
So if one cog writes to hub memory that matches the cached QUADs of another cog, does a RDxxxxC automatically detect this and reload the cache?
Nope. It's really simple. RDBYTEC/RDWORDC/RDLONGC/RDQUADC only do a RDQUAD if the address has changed since the last RDQUAD or CACHEX invalidated the cache. That's it.
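To make the rule above concrete, here is a minimal Python sketch of one cog's read cache. It is only a toy model, not the actual hardware logic: the class and method names are hypothetical, and it assumes the "address" being compared is the 16-byte-aligned quad address. It shows why a hub write from another cog is not noticed until the address changes or CACHEX is issued.

```python
class QuadCacheModel:
    """Toy model of one cog's QUAD read cache (all names are hypothetical)."""

    QUAD_BYTES = 16  # one quad = four 32-bit longs

    def __init__(self, hub):
        self.hub = hub            # shared hub memory, modelled as a bytearray
        self.cached_base = None   # quad-aligned address of the cached quad; None = invalid
        self.quad = b""           # snapshot of the 16 bytes fetched by the last RDQUAD
        self.hub_reads = 0        # number of real hub fetches performed

    def rdquad(self, addr):
        # Fetch the whole quad containing addr and remember where it came from.
        base = addr & ~(self.QUAD_BYTES - 1)
        self.quad = bytes(self.hub[base:base + self.QUAD_BYTES])
        self.cached_base = base
        self.hub_reads += 1

    def cachex(self):
        # Invalidate the cache so the next cached read must go to the hub.
        self.cached_base = None

    def rdbytec(self, addr):
        # Only go to the hub if the quad address differs from the cached one
        # (assumption: the comparison is made on the quad-aligned address).
        base = addr & ~(self.QUAD_BYTES - 1)
        if base != self.cached_base:
            self.rdquad(addr)
        return self.quad[addr - base]


hub = bytearray(64)
hub[5] = 0x11
cog_a = QuadCacheModel(hub)

first = cog_a.rdbytec(5)   # miss: fetches the quad at addresses 0..15
hub[5] = 0x22              # "another cog" writes to hub memory
stale = cog_a.rdbytec(5)   # hit: same quad address, so no re-fetch happens
cog_a.cachex()             # explicit invalidation
fresh = cog_a.rdbytec(5)   # miss again: the new value is finally seen
```

In this model `stale` still holds the old byte, which is the "it's up to you to coordinate" point made earlier in the thread.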
Chip, the referenced post is a suggestion I made for the HMAC/SHA-256 engine. I'm just going to leave the original post there to avoid duplication.
I don't want to leave cog 1 running, since it's most likely that it will not be needed again. It's most likely that AES-128 would be executed by the ensuing loader and that needs different code. Of course, it's all up to the loader programmer. One thing, though - a cog boot (load program from hub memory and begin execution) only takes 1,016 clock cycles. That's only ~50us at the RCFAST 20MHz clock rate.
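As a quick check of the boot-time figure quoted above, the conversion from clock cycles to time can be written out. The helper name is just for illustration; the 1,016 cycles and 20 MHz come from the post.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / clock_hz * 1e6


# Cog boot: 1,016 clocks at the 20 MHz RCFAST rate.
boot_us = cycles_to_microseconds(1016, 20_000_000)
print(f"{boot_us:.1f} us")  # prints "50.8 us"
```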
That was my understanding. What actually prompted this line of questions was that RDBYTE, RDWORD, and RDLONG take 3..10 clock cycles while the RDQUAD only takes 1..8 cycles. Those two extra cycles made me wonder if the QUAD registers were still somehow involved (e.g. all reads from the hub were quad reads with an additional couple cycles to copy to D). I guess not.
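One way to picture the two ranges quoted above is as a fixed base cost plus a variable wait for the cog's hub slot. This is only a guess at a model that reproduces the numbers: the 8-clock hub rotation and the function name are assumptions, not something confirmed in this thread.

```python
HUB_WINDOW = 8  # assumption: a cog's hub slot comes around once every 8 clocks


def hub_op_cycles(base_cycles, clocks_until_window):
    """Total clocks for a hub read: fixed base cost plus the wait for the slot."""
    if not 0 <= clocks_until_window < HUB_WINDOW:
        raise ValueError("wait must be between 0 and HUB_WINDOW - 1")
    return base_cycles + clocks_until_window


# Base costs chosen so the model matches the quoted ranges.
rdquad = [hub_op_cycles(1, w) for w in range(HUB_WINDOW)]
rdlong = [hub_op_cycles(3, w) for w in range(HUB_WINDOW)]

rdquad_range = (min(rdquad), max(rdquad))  # (1, 8)
rdlong_range = (min(rdlong), max(rdlong))  # (3, 10)
```

Under this reading, the extra two clocks are a constant overhead of the RDBYTE/RDWORD/RDLONG path rather than a sign that the QUAD registers are involved.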
From some prior posts, I was under the impression that the second-stage loaders were likely to authenticate before decryption. Since most non-trivial applications were likely to use multiple cogs, I figured that the HMAC cog would be re-purposed after authentication was complete. Performance might not have been a concern, but freeing up a few instructions (to start the HMAC engine) in the second-stage loader might be.
But that's fine. I wasn't attached to the idea. Just throwing it out there.
As I have worked through the documentation for the P2, it has become fairly obvious that the timing requirements have gotten much more complex. I am wondering how difficult it will be to write truly deterministic code for this chip. After all, determinism was one of the strongest arguments for the Propeller not having interrupts (and was part of my "elevator pitch"). Thoughts? And for those playing with the FPGAs, experiences?
AES-128 will provide authentication as a function of decryption. There are several forms of authentication that can be applied to the cipher block chaining to also validate the decrypted contents. If you don't use the correct key, the cleartext result is garbage. Authentication allows error detection. I originally conceived the design as straight CBC; however, after seeing Chip's efforts with SHA-256, it may be possible to have XCBC or another variant in the allotted space.
Seairth,
Not sure why you think the P2 is going to be harder to write deterministic code on. I think it'll be pretty easy to write "truly" deterministic code for it. Even with all the new stuff.
I'm assuming by deterministic we are talking about timing.
For the P2 it will really be a matter of design choices, more so than with the P1.
For COGs running a single task, the P2 looks like it will be as easy as or easier than the P1 for writing code with deterministic timing.
For COGs running multiple tasks, *perfect* timing determinism will be much more difficult, if not impossible, due to instructions like the HUB operations that take multiple cycles and delay the pipeline.
The timing jitter due to the hub operations is relatively small, though, and in many applications it will likely not be an issue.
Some design choices when timing is critical could be:
1) Single Task
2) Multi-task where the non-critical tasks are limited to using single cycle ops so they don't cause any delays for the critical task.
The P2 has a lot of new features, but in some cases their usage may be mutually exclusive for the problem to be solved. This is when we get to put on our engineering hat to determine the solution to the problem at hand.
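The two design choices listed above can be illustrated with a very rough Python simulation. Everything in it is a simplifying assumption made for the sketch: two tasks alternating instruction slots, a multi-cycle instruction stalling the whole pipeline for its full duration, and a hub op costing a fixed 6 clocks. It is not a model of the real P2 pipeline, only of the general effect being described.

```python
def critical_issue_times(other_task_costs, rounds):
    """Clock at which the critical task issues each of its instructions.

    The critical task always runs single-cycle instructions. The other task's
    instruction costs (in clocks) are taken cyclically from other_task_costs.
    """
    clock = 0
    times = []
    for r in range(rounds):
        times.append(clock)                                   # critical task issues here
        clock += 1                                            # its single-cycle instruction
        clock += other_task_costs[r % len(other_task_costs)]  # other task holds the pipeline
    return times


def jitter(times):
    """Difference between the longest and shortest gap between issue times."""
    intervals = [b - a for a, b in zip(times, times[1:])]
    return max(intervals) - min(intervals)


# Option 2 from the list: the non-critical task uses only single-cycle ops.
single_cycle_only = jitter(critical_issue_times([1], 16))

# Non-critical task issues a (hypothetical) 6-clock hub op every fourth instruction.
with_hub_op = jitter(critical_issue_times([1, 1, 1, 6], 16))
```

In this toy model the first case has zero jitter and the second has a jitter of 5 clocks, which is the kind of small but nonzero variation the post is pointing at.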
Here are the latest metal density plots for the Propeller II ... If you remember when this happened for the November Test DIE, then you'll know that something good is going to happen very soon.
Just to give you a perspective of size... each of the little 'colored' squares is 100um X 100um ... In terms of a 300 DPI printer, ONE dot has a diameter of about 85um
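For anyone who wants to check the printer comparison, the arithmetic is short. This sketch treats the dot "diameter" as the dot pitch (one inch divided by the DPI), which is an assumption about what was meant.

```python
MICRONS_PER_INCH = 25_400  # 1 inch = 25.4 mm

# Dot pitch of a 300 DPI printer, in microns.
dot_pitch_um = MICRONS_PER_INCH / 300

# How many printer dots fit across one 100 um density square.
dots_per_square = 100 / dot_pitch_um

print(f"{dot_pitch_um:.1f} um per dot")  # prints "84.7 um per dot"
```

So each density square is only slightly wider than a single printed dot.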
Interesting, Beau. Why does the density have to be locally controlled like that? Is it something to do with an etching style process where you don't want large open areas?
The density has to do with planarization between metal layers and helps to increase yield by making the overall content of the chip relatively even in terms of metal distribution.
It does help to reduce mechanical stress due to heating and cooling expansion, but for the most part I think it has to do with fringing effects. Think of a bead of water on a highly waxed surface... the fringe areas will be 'rounded' compared to the center... especially a large water bead, where the weight of the water causes the tension at the top to flatten. In the case of the metal deposition on the chip, this is what we want, the 'flattened top' ... for similar reasons this is also why 'dummy' structures are placed on the periphery of sensitive areas of the design.
Hi everyone, I really do apologize if I missed this while reading, but it is still unclear to me when we might see the Prop II ready to ship? I absolutely cannot wait to try one and am holding my breath for the moment!! Thanks
Mark this as a major milestone for the Propeller II
We have a tapeout submission on the 5th of February, 2013 that we are ready for ... The foundry requires 15 days for tooling and proper design setup, so the actual shuttle run will be on the 20th of February.
Everyone wish us luck and cross your fingers. This initial run will be a small batch of Propeller IIs (40 or so). Approximately three weeks after the 20th of February (some time around mid-March) we will have chips back from the fab so that we can test various key parameters of the Propeller II. If all goes well, then full-production chips should be underway.
Comments
I absolutely agree. I'm not advocating that approach, just making sure I understand the rules.
C.W.
http://forums.parallax.com/showthread.php/145201-Shuttle-today?
November Test DIE Density plots for reference:
http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=947002&viewfull=1#post947002
Enjoy!!
I see. It's like I don't want all the pepperoni on my pizza all stacked up in one spot :)
You know, some of those Prop II images would make nice big glossy posters.
Is this similar to balancing the copper on different layers on a PC board so it doesn't warp ?
Thanks for the explanation, I have always wondered why the dummy structures are on most dies.
Good luck!
I stand corrected!
Good luck!
Thanks for the update. I pictured that being slower. Apparently, when it's all done, rendering actual chips happens fairly quickly!
Congratulations!