We are thinking that we should have packaged chips by the beginning of September. Either there will be 75% good chips, or 0% good chips. A single design error can render all chips useless. If there is any design error, let's hope it's one that doesn't leave the chip unable to be programmed.
Just wondering...
For the P1 wafers, do you know statistically where the good ~400 come from (ie their position on the wafers)?
Do you package all the ~524 before you can test for the ~400 good ones?
You're welcome.
We do have all die packaged first, then we test. We would love to do a wafer test, but it is too expensive (in money and time) to consider setting up that test at this time.
The packager doesn't give us any information about die placement per packaged part, so we're only going off of what we know from others' experience and technical knowledge shared in the industry to say that it's likely a high percentage of the good parts were near the center and farther from the edge.
Chip, on the (hopefully improbable) chance that there are problems but they are not catastrophic, leaving the prototypes at least partially usable, is there any chance some of us who couldn't justify the investment in an FPGA board could get samples so we'd finally have something to work with?
The packager doesn't give us any information about die placement per packaged part
I realize it is far too late to do this for either Propeller, but I'm curious: given this limitation, it would seem that a cheap workaround would be to give each die on the wafer a different ID code which could be queried, so that the chips that function at all could announce their position on the wafer. This would also let you test something else I'm curious about: whether things like overclocking and temperature tolerances are affected by die placement. Is there any reason this would be unworkable?
Chip, on the (hopefully improbable) chance that there are problems but they are not catastrophic, leaving the prototypes at least partially usable, is there any chance some of us who couldn't justify the investment in an FPGA board could get samples so we'd finally have something to work with?
You bet. If it's at all usable, we can make little boards with the chips on them.
I realize it is far too late to do this for either Propeller, but I'm curious: given this limitation, it would seem that a cheap workaround would be to give each die on the wafer a different ID code which could be queried, so that the chips that function at all could announce their position on the wafer. This would also let you test something else I'm curious about: whether things like overclocking and temperature tolerances are affected by die placement. Is there any reason this would be unworkable?
This is not really possible, since the mask set is stepped along the wafer for lithography purposes, repeating the same image over and over. There is no way to introduce any unique patterns per chip.
It's 2:00am here and it looks like the Spin2 compiler and interpreter are working together now. I'm sure there are going to be more bugs to fix and things to add, but it's finally 'alive'.
Hopefully, I'll have a Spin2 version of PNUT.EXE to post in the next week, or so. I'll need to write some documentation to explain Spin2.
Making it cache them would be really easy - have a long for each snippet, the bottom 9 bits holding the snippet's cogram address or 0 if it's not loaded, and the rest holding when this snippet was last used. Whenever a new snippet is needed and there isn't enough room for it, the oldest one would be unloaded and replaced with the new one.
Obviously, caching them cuts down on the amount of user PASM space you have, but maybe you could specify how much space you needed and it would make the cache size fill only the remainder?
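To make the proposal concrete, here is a minimal host-side C model of that table: one word per snippet, low 9 bits for the cog RAM address (0 = not loaded), the remaining bits for a last-used counter, with the least-recently-used entry evicted on a miss. It is only a sketch of the bookkeeping, not actual PASM, and all names are illustrative.

#include <stdint.h>

#define NUM_SNIPPETS 256        /* one table entry per bytecode snippet */
#define ADDR_MASK    0x1FFu     /* low 9 bits: cog RAM address, 0 = not loaded */

static uint32_t table[NUM_SNIPPETS];  /* [31:9] last-used counter, [8:0] cog address */
static uint32_t now;                  /* monotonically increasing use counter */

/* Stub: copy a snippet from hub RAM into cog RAM, return its cog address. */
extern uint16_t load_snippet_into_cog(int bytecode);

uint16_t get_snippet(int bytecode)
{
    uint32_t entry = table[bytecode];
    if ((entry & ADDR_MASK) == 0) {
        /* Miss. For simplicity this model always evicts the least-recently-
           used entry; the real scheme would only evict when cog RAM is full. */
        int oldest = -1;
        uint32_t oldest_time = UINT32_MAX;
        for (int i = 0; i < NUM_SNIPPETS; i++) {
            uint32_t e = table[i];
            if ((e & ADDR_MASK) != 0 && (e >> 9) < oldest_time) {
                oldest_time = e >> 9;
                oldest = i;
            }
        }
        if (oldest >= 0)
            table[oldest] = 0;                    /* unload the oldest snippet */
        entry = load_snippet_into_cog(bytecode);  /* load the one we need */
    }
    table[bytecode] = (++now << 9) | (entry & ADDR_MASK);  /* stamp this use */
    return (uint16_t)(entry & ADDR_MASK);
}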
Caching also reduces the efficiency of the code execution mechanism. The cache table has to be searched. Any saved (cached) snippet has to be copied within the cog since there are no relative jumps, etc. Caches are great when there's a high cost to fetch the code snippet, like from an SD card or EEPROM. It's pretty cheap to fetch a small block of code from hub RAM.
Mike's right. The hassle of determining whether or not what you wanted was already cached would take longer than just loading what are often 4-long snippets (using RDLONGC, which caches). This is where some kind of content-addressable memory subsystem would help a lot.
Hi Chip,
Are you planning to document the instruction set used by the Spin interpreter so that it could be used as a target for other languages?
Thanks,
David
I haven't thought about it. Do you think documentation would be necessary for someone to use the code? I figure they'd probably modify it in some ways after figuring out how it works. The resident portion of the interpreter spans from $147 to $1F5, with some of that being variable space, so it's not that much to learn. Some instruction snippets push stuff, others pop stuff, and some do both. It's RAM-based, so it's nothing that would have to be adhered to strictly.
I was thinking that the bytecodes could be made dynamic at compile time, so that only those used need be included in the interpreter. That would shorten the descriptor table and eliminate unused snippets. Right now the whole interpreter is just over 5KB. Most programs would use way less than that. The advantage of keeping the whole interpreter resident is that it leaves the door open for dynamic overlays.
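As a rough illustration of the structure being described (a descriptor table mapping bytecodes to push/pop snippets, with only the used entries emitted), here is a tiny table-driven interpreter in C. It is a conceptual model only; the names, opcodes, and stack are invented and nothing here is Spin2 or PASM.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    int32_t        stack[64];   /* operand stack */
    int            sp;          /* next free stack slot */
    const uint8_t *pc;          /* next bytecode to fetch */
} vm_t;

typedef void (*snippet_fn)(vm_t *);   /* one handler ("snippet") per bytecode */

static void op_push_imm(vm_t *vm) {   /* pushes the byte that follows the opcode */
    vm->stack[vm->sp++] = *vm->pc++;
}
static void op_add(vm_t *vm) {        /* pops two values, pushes their sum */
    int32_t b = vm->stack[--vm->sp];
    vm->stack[vm->sp - 1] += b;
}
static void op_print(vm_t *vm) {      /* pops one value and prints it */
    printf("%d\n", vm->stack[--vm->sp]);
}

/* The "descriptor table": only the bytecodes this program actually uses. */
static const snippet_fn descriptors[] = { op_push_imm, op_add, op_print };

static void run(vm_t *vm, const uint8_t *code, int len)
{
    const uint8_t *end = code + len;
    vm->pc = code;
    while (vm->pc < end)
        descriptors[*vm->pc++](vm);   /* fetch a bytecode, dispatch its snippet */
}

int main(void)
{
    const uint8_t program[] = { 0, 2, 0, 3, 1, 2 };   /* push 2, push 3, add, print */
    vm_t vm = { .sp = 0 };
    run(&vm, program, sizeof program);                /* prints 5 */
    return 0;
}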
I would also hope that the Spin2 instruction set is documented so it could be used as a target for other languages. I would also put a "bid" in for similar documentation for the Prop1 Spin instruction set. I know a lot has been "reverse engineered" (Spin1), but it would be nice to have it organized and vetted / filled in for the same purpose.
Sure, most of the Spin2 interpreter is hub-RAM based, and there's some potential benefit in trimming down the instruction set on an application-by-application basis, but then there's no run-time commonality from application to application. That's not a bad thing if you think of the interpreter as a run-time library, with the Spin2 compiler managing the packaging of the interpreter with the program and the run-time assumption that there will always be a native interpreter or other "main" native program given control initially. This is the opposite of what Spin has assumed on the Prop1: that there's a built-in Spin machine that gets control initially and can surrender control to one of the native processors.
I think the idea of multiple interpreters (for Spin2 or C or whatever) is fine, but the conceptual base then needs to shift to the native instruction set as the primary or initial program to be given control when booted.
I think that as time goes by and the interpreter is improved, there will be many changes to it, making it a moving target for documentation. I think what you said about shifting focus to the native instruction set is the way to go. Spin2 is writ in water.
By packaging the interpreter as part of the run-time library and making the assumption that some kind of native program is what is initially started by the bootstrap, the bootstrap doesn't care what version of the Spin interpreter or C interpreter or whatever is packaged as part of the program. Our programming systems and operating systems don't care which run-time library is packaged with the program, only that the format of the program on the storage device is known and any calls to ROM are at permanently standard locations and follow documented calling conventions.
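A sketch of what that contract might look like, assuming the bootstrap only understands a fixed image format and an entry point; every field and name below is hypothetical and just illustrates the idea in C.

#include <stdint.h>

typedef struct {
    uint32_t magic;         /* identifies a valid program image             */
    uint32_t image_size;    /* total bytes to load into hub RAM             */
    uint32_t entry_point;   /* hub address of the native "main" code        */
    uint32_t runtime_size;  /* bytes of packaged runtime (interpreter etc.) */
    /* ...followed by the native runtime, then bytecode/data for it         */
} image_header_t;

typedef void (*native_entry_fn)(void);

/* The bootstrap only validates the header and jumps to the entry point;
   it never needs to know which interpreter, if any, is packaged inside. */
void boot_image(const image_header_t *hdr)
{
    if (hdr->magic != 0x32505350u)    /* arbitrary made-up tag */
        return;
    ((native_entry_fn)(uintptr_t)hdr->entry_point)();
}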
I would like to see better docs so that it could become part of a library to support other languages too. IOW, I would like to see a BASIC frontend, and I would like to see a variant of Spin where indentation is not forced (i.e. using endif, endrepeat/endloop, endcase, or the horrid "{" and "}" from C, etc.). Switching between the forced indentation and the alternatives should even be possible on the fly by extending the editor. Don't get me wrong, I love the enforced indentation, but so many don't like it that I think it would be great to provide an alternative.
The overhead of testing to see whether small "snippets" are already loaded is a large percentage of the load time, so, as others have said, it is detrimental to the timing and a waste of code space.
By being "soft", improvements can be made. Great times ahead.
This (identifying the chip position on wafer post hoc -- LR) is not really possible, since the mask set is stepped along the wafer for lithography purposes, repeating the same image over and over. There is no way to introduce any unique patterns per chip.
Fascinating... I suppose I always envisioned the wafer being exposed all at once by a single great big mask for each step. I am amazed at the multiple disciplines you had to master to make these chips real. I might have thought to design my own CPU and maybe give it cores, but the PLLs and counter-timers are genius totally out of my pay grade, and the translation to IC masks... HEAD EXPLODES.
As far as ID'ing the location of each die on a wafer, everything comes at a cost. Laser etching is an option, but not cost-effective for our operation. Expanding on that, the laser etching could simply be an enumerated mark on the individual dies, or strategically aligned fuses readable from the chip itself during normal operation, but again that is cost-prohibitive for our focused efforts.
If you did wafer probe, then you could burn fuses to uniquely ID each die during test and keep a record of their locations. Some companies do this. (I could see the military wanting tracking like that, but I have no experience with military electronics.) But it doesn't always make sense to do wafer probe even if you're set up to do it. Someone needs to run the numbers to see if it pays off (e.g. not packaging bad parts). For WLBGAs you typically test in wafer form (and hopefully several die in parallel), and then typically no testing is done after dicing the wafer. The chips that I've worked on lately are WLBGAs, and you can see the ATE pass/fail results mapped onto an image of the wafer.
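For illustration only, here is a small C sketch of that last point: recording per-die pass/fail results by wafer coordinate and printing a crude text wafer map. The grid size and data are made up; real probers and ATE report this in their own formats.

#include <stdio.h>

#define COLS 12
#define ROWS 12

/* 1 = die passed, 0 = die failed, -1 = no die at this position */
static int result[ROWS][COLS];

static void record(int row, int col, int passed)
{
    if (row >= 0 && row < ROWS && col >= 0 && col < COLS)
        result[row][col] = passed ? 1 : 0;
}

static void print_wafer_map(void)
{
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            int v = result[r][c];
            putchar(v < 0 ? ' ' : (v ? '.' : 'X'));   /* '.' pass, 'X' fail */
        }
        putchar('\n');
    }
}

int main(void)
{
    /* mark every position as "no die", then record a few example results */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            result[r][c] = -1;

    record(5, 5, 1);      /* center die passes */
    record(5, 6, 1);
    record(0, 3, 0);      /* edge die fails    */
    record(11, 8, 0);

    print_wafer_map();
    return 0;
}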
With the bigger address space, SPIN2 can become a general purpose programming language.
That gives me a wonderfully evil idea. Retarget Spin for the XMOS devices so those poor users have something easier to work with. :)
How do you imagine this working:
1) Compile Spin to native code for your target, say x86, ARM etc? Like a C compiler does.
2) Compile Spin to the standard byte codes and have a native x86, ARM, etc. program interpret them? Like a JVM.
One thing that worries me with Spin as a general purpose language is that I'm used to the tight and seamless integration of the high level Spin syntax with the low level PASM. So much so that in my mind that Spin is Spin + PASM. Which leads to:
3) Adopt the instruction set of the target machine for the PASM parts? Results in non-portable, hence non-general-purpose, use.
4) Keep to the PASM instruction set? Requires a Prop simulator on the target.
One way of tracking the yield of different locations on the wafer could be to have the cut chips, and then the packaged chips, placed in a carrier with the same layout as the wafer. As long as the chips are always placed correctly, you'd know which position they came from.
Of course, no one will do that...
The other option is to make certain that when the wafer is cut, the individual chips are always picked in the same order, and that a new tray is used for each wafer.
(This assumes that when the chips are packaged, they're always picked in a predictable order off the tray, and also placed in a known order on the output tray)
The problem is that it'll probably cost too much in man-hours to plan and verify this to make it worthwhile.
(Unless there's one or more chips with the same faults from every wafer)
Because the main struggle in a wafer fab is to get yields up to an acceptable level. Information on the location and type of defects is vital to improving each process step. And almost all processing steps (25-30 lithography steps, 10-15 material depositions) have defects and inaccuracies that have a distribution over the wafer surface. If a production line exhibits bad yields, a team of process engineers will step in, try to pin down the problem ("process step 6, aluminium oxide deposition, has too low a thickness on the top right side of the wafer") and adjust the equipment for that step.
Comments
Singular: Die
Plural: Dice or Dies
Looks like we'll all know in a couple weeks.
Jeff
Here's the interpreter:
Spin2interpreter.spin
Nice work.
I only took a quick glance so far, looks like plenty to chew over
'org snippet' looks interesting...
With the bigger address space, SPIN2 can become a general purpose programming language.
Someone will figure out how to make it interact easily with OS resources and GUI window managers like TK.
Chip, you may as well face it, and accept it now so SPIN2 can grow ....
Good question ;-)
Thanks again Chip, well done!
Interesting idea...