especially since the LMM is already so well-accommodated in the P2 design.
Now that rdwide/wrwide are available, it certainly would be better; however, there is still the little matter of fetch then exec, and all the LMM jump macro kludges. If I understand it correctly, having to do fetch and then exec by itself reduces performance (and potentially causes a hub miss, because a jump back to the LMM next label is needed for a stream of 8 or more instructions). Having instructions to replace the kludges would certainly be worthwhile, if nothing else.
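For anyone who hasn't followed the LMM discussion, here is a rough sketch of the fetch-then-exec pattern being described. It is a toy Python model, not PASM: the opcode names, the tuple encoding, and the `run_lmm` function are all made up for illustration, and it says nothing about real hub timing. It only shows that every instruction costs a separate fetch step before it can execute, and that a jump simply rewrites the hub pointer.

```python
# Toy model of an LMM-style loop: code lives in "hub" memory, and a small
# resident loop fetches one instruction at a time and then executes it.
# All names and the instruction encoding here are hypothetical.

def run_lmm(hub, max_steps=1000):
    pc = 0        # hub address of the next instruction
    acc = 0       # a single accumulator register
    fetches = 0   # number of separate fetch steps performed
    while pc < len(hub) and fetches < max_steps:
        op, arg = hub[pc]   # FETCH: read the instruction from hub memory
        fetches += 1
        pc += 1
        # EXEC: only now can the fetched instruction actually run
        if op == "add":
            acc += arg
        elif op == "jmp":    # a jump just rewrites the hub pointer
            pc = arg
        elif op == "djnz":   # decrement acc, jump if the result is not zero
            acc -= 1
            if acc != 0:
                pc = arg
        elif op == "halt":
            break
    return acc, fetches

program = [
    ("add", 3),    # 0: acc = 3
    ("djnz", 1),   # 1: loops on itself until acc reaches 0
    ("add", 10),   # 2: acc = 10
    ("halt", 0),   # 3: stop
]
result, fetches = run_lmm(program)
print(result, fetches)
```

Four instructions in hub turn into six fetch/exec round trips here because of the little loop, which is the overhead dedicated instructions would be meant to trim.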
I promise I'll FedEx the first P2 samples to Leavenworth, Kansas no matter how many Atmels or ARMs you are using.
The real estate market looks reasonable in Leavenworth. I've found a nice 10 acre plot upon which to build a P2 commune. Now to convince the wife to let me liquidate my 401k...
I know all the smoke and clamor in the kitchen have some on the verge of dialing 911, but we are on track for a much better Prop2 than we were last April. These last several months, with all the input from the forum members, have been as productive as the prior two years were, working on my own. This is the way to get a project done well.
Please stick around.
Chip
I know the temptation of wanting to get it right, but there is never a right that is as good as the next one. I really wanted to see P2, Mk 1 even, just some silicon with more memory and more I/O and faster. What is happening is that while the product may still be called P2, from my point of view it is in fact a P3 we will be getting. We could have already had a P2, with a better P3 coming out soon as well. With P2 Mk 1 I could have mopped the floor many times over, and the migration to P3 would have been a lot easier too. Now P1 to P2 (which is really a P3) is a much bigger jump code-wise, although it will run all those "emulators" out there very nicely. I've got other processors to do what P2 would have done. If P2 was just a project for fun, or I could wait a few years, then I would not be holding back with suggestions.
The P2 is real and has been real for some time. I have had one generating a 24-bit screen saver for the past couple of months. Most of the functionality that corresponds to what we know of the P1 hasn't changed very much, except it is faster and better. If you look at pre-release software from the bigs, it goes through a very similar process. So, what we have in our hands is a pre-release version of the P2. It is still a P2. Anyone can have one. It is dirt cheap (for what you get), and you can order one tomorrow. If that is a mirage, it is pretty convincing to me :) Give it a try. You will smile as much as you did when you discovered the P1. Promise.
Peter, at the risk of giving you the proverbial kiss of death: spot on. (Hopefully you won't now find yourself stalked by an e-thug.)
Whatever the real, tangible P2 is going to be, it needs to enter ASAP a tight cycle of: TEST & VERIFY -> FIX (AS NEEDED) -> DOCUMENT -> RINSE & REPEAT.
Those things out of the way, a question that you're among the best here to ask. How do you feel about the new stack instructions, which, based on what's being said anyway, would allow the Prop to operate as a stack machine (or rather, multiple stack machines)? This would seem a boon to Forth development on the Propeller, would that be correct?
That AUX memory area in general is a nice addition. Video ends up being a scan line affair, and on P1 the most expensive operations were color lookups so that color redirection would be possible instead of the often absolute color display drivers we ended up with at higher color depths, and pixel masking. Nobody even attempted blending that I know of.
Now, the AUX memory can serve as an area for these things to happen, and it means waitvid can operate on nice, big chunks of data, freeing the COG to do a lot of things during a scanline.
The P1 seemed kind of pure as it lacked index registers and that stack. Now having both, COG performance is way up with code size down. Couple that with the spiffy REP instruction, which will repeat a block of assembly code X times, and it's just insane how much more concise and speedy things are. A P2 COG, just in terms of capabilities, not comparing speeds / clock or anything like that, is basically 2X a P1 COG. Add in the tasks, and it's potentially 3X+ a P1 COG in terms of what can be packed in there, IMHO.
A look at Chip's monitor listing hints at some of the savings. Ozpropdev's invaders game shows off a bunch more. It's nice, because we can start out writing P1-looking code, then when things start to get tight, or as skill improves, code size will drop over and over, until we are writing very dense code. I think people will like that dynamic.
Can't wait to see what Peter does with a P2 COG and Forth. We will all probably be amazed at what one COG can do, just as many of us are now.
I think there is a dynamic here, which if unappreciated leads to undue concern. If waiting improves the performance by about a log unit and decreases cost by a smaller fraction…wouldn't we all be willing to wait just a little?
There are two choices to be made: WHAT to build and HOW to build it. Only a very small group can possibly know for sure what the trade-offs in time and cost actually are and what the benefits might be worth in the near and far term. These choices are in good hands.
Well, I once read of a university wanting to do some intense physics computations. The supercomputer they had at the time would take X at a performance of Y, resulting in a run time of a few years. The computer next year would perform at Y*A, and waiting to do that computation would actually mean it gets done sooner, or something like that.
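The arithmetic behind that story can be sketched in a few lines. The numbers below are invented purely for illustration (a 3-year job, a 1-year wait, a machine twice as fast), and `finish_time` is a hypothetical helper, not anything from the original account.

```python
def finish_time(work_years_now, wait_years=0.0, speedup=1.0):
    """Years from today until the job is done.

    work_years_now: run time on today's machine (X at performance Y)
    wait_years:     how long we wait before starting
    speedup:        the factor A by which the later machine is faster
    """
    return wait_years + work_years_now / speedup

# Hypothetical numbers: a 3-year job today, or wait 1 year for a 2x machine.
start_now = finish_time(3.0)
wait_a_year = finish_time(3.0, wait_years=1.0, speedup=2.0)
print(start_now, wait_a_year)
```

With these made-up figures, waiting wins (2.5 years versus 3), but the result flips as soon as the speedup is small relative to the wait, which is the whole argument about when to latch the design.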
IMHO, it's worth it to pack in what we can, latch it and get this iteration done.
Something I picked up on, which JMG actually put out here by inference, is that this is more like taking times at bat than it is chasing a moving goal. Always keep building the chip, and latch the design at a synthesis / fab / shuttle attempt. The first one was unsuccessful for a variety of reasons. So improve again; now we will latch it for the second attempt, ideally a successful one. If successful, fork things and work on P3, potentially continuing with both that and a future P2 variant as well.
Could be a great process. A lot depends on the next shuttle.
I know the temptation of wanting to get it right, but there is never a right that is as good as the next one. I really wanted to see P2, Mk 1 even, just some silicon with more memory and more I/O and faster.
Absolutely everybody did! And Parallax spent a chunk o' change to make it so.
But as time moves on (which it did), so does the competition and so does your own ability to improve and enhance.
KC_Rob makes a good point, I think. Perhaps the improved design will pay off big with TF.
I sincerely hope that Peter will see this, investigate, and answer. He may not be up on all the latest "stack" developments, which he certainly can't be faulted for, considering how much time and effort it takes to keep up on the threads here. I'd like to see his feedback, though. If what I describe, which is based only on what I've been told, is truly the case, P2 could end up being a Forth programmer's dream machine. But Peter will have to be the judge of that.
If the P2 recipe has been agreed upon, and I know that they have put it together and it's been put into the oven, then I will be planning my dinner accordingly. Whether this is the P2.0 or the P3, I will work with what I've got, but I can't work with it if I ain't got it. Just like with P1, I tailored the Forth to suit the hardware (and even the tools), so even the basic P2 from last year with TF would have resulted in more than a few gahs. As I stated before, I'm just a little annoyed that P2 has been delayed, but if "improvements" delay it much longer I may end up having McDonald's for dinner before I starve. So lock down the specs and get that P2 in the oven so we can plan a feast, is what I say.
Yes, P2TF will be a monster. In the same way that developers use embedded PCs because they can develop their code on the target system, I think this will be the way with TF. Even if you don't connect a keyboard and screen to it, you can still use a serial or Bluetooth terminal on a tablet in screen edit mode. Maybe a simple network connection would allow the P2 to access source files from Dropbox itself, so you could even edit the Dropbox file from a tablet (thinking DroidEdit here) and tell "a" P2 via Telnet to load that remote file. Either way, it just means that the file and network layers will be part of the kernel, and you develop your code on top of those layers as well as use those layers during development.
I can easily imagine how an interactive, almost OS-like programming environment like this (with very, very little performance penalty!) in which to do real-time/embedded work would be desirable. Which is why all this talk has my curiosity piqued.
They're used to enable threaded code (not threading in the sense of multitasking, but this kind: http://en.wikipedia.org/wiki/Threaded_code). Since Forth is a zero-address virtual machine, threaded code for Forth consists of just a list of subroutine addresses. Here's a post I wrote about the needed instructions a short time ago:
As you point out, the Super-8 has special instructions that enhance Forth performance but is not a stack machine per se. (Whether P2 actually has/will have instructions that enhance Forth performance remains to be seen - no little confusion there right now.)
The stack machines that I remember from (limited) first-hand experience years ago were 4-bitters similar to, for example, the Atmel ATAM894 (programmer's guide).
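For readers new to the threaded-code idea mentioned above (a Forth program as just a list of subroutine addresses), here is a minimal sketch in Python. It is only an illustration under loose assumptions: Python function references stand in for subroutine addresses, nested lists stand in for colon definitions, and the word names (`lit5`, `square`, and so on) are invented. It is not how any particular Forth, Super-8, or P2 implementation works.

```python
# A tiny threaded-code interpreter: a "thread" is a list of word references,
# and every word takes its operands from a shared data stack (zero-address).
# Word names and structure are hypothetical, for illustration only.

stack = []

def lit5():
    stack.append(5)

def lit7():
    stack.append(7)

def add():
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def dup():
    stack.append(stack[-1])

def mul():
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)

def execute(thread):
    """Walk the thread, calling each word in turn (the inner interpreter)."""
    for word in thread:
        if callable(word):
            word()           # primitive: run it directly
        else:
            execute(word)    # nested definition: the recursion plays the
                             # role of the return stack

square = [dup, mul]                      # a word built from other words
program = [lit5, lit7, add, square]      # (5 + 7) squared
execute(program)
print(stack)
```

Instructions that speed up the "fetch next address, call it, return" step are the kind of thing that would help this style of interpreter, since that step runs once per word.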
LOL! I hereby disclaim any incidental or consequential damages to the P2's schedule.
-Phil
Sort of like the cement mixer, or some other construction vehicle, zooming down the highway, stuff flying off everywhere, with a sign that says "Not responsible for damages caused by debris." =D
Another perspective/reference, for anyone interested enough, would be the J1 Forth CPU, a stack-based CPU implemented in 200 lines of Verilog.
The last time I looked at the J1 I didn't have two FPGA boards collecting dust. It looks like a great project for a cold winter.
Perhaps Chip could bury the J1 functionality deep in the recesses of each COG2, to be enabled by an unpublished code word communicated only in the aforementioned secret meeting in the deep woods.
Guys, there haven't been any Super-8s available for more than 20 years. The only ones I ever used were NMOS (requiring a heatsink), and they were samples. It was a real shame: Zilog killed an amazing chip. I used one of the samples to drive a 4-axis CNC. The backend was written in S8 assembler; the frontend, in Forth. It was a happy marriage, and both were a pleasure to program in.
The IA88C00 is a form, fit and function replacement for the original Zilog Z88C00 microcontroller.
Innovasic Semiconductor produces replacement ICs using its MILES™, or Managed IC Lifetime
Extension System, cloning technology. This technology produces replacement ICs far more complex than
"emulation" while ensuring they are compatible with the original IC. MILES™ captures the design of a
clone so it can be produced even as silicon technology advances. MILES™ also verifies the clone against
the original IC so that even the "undocumented features" are duplicated.
These chips are just convenience life extenders for those poor suckers who were early adopters of the S8. That probably explains the high price.
Comments
-Phil
I think this could work out well for Parallax.
;-)
1. "Better is the enemy of Good," and
2. "Gratification delayed is gratification denied."
You can't imagine how at odds with my upbringing both of those are, and it's a daily struggle to observe them.
-Phil
Rich
Rich
-Phil
Phil, what do those do? I know you are now wishing you hadn't brought it up, but I'm curious. Others may be, too.
http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1225609&viewfull=1#post1225609
http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1214465&viewfull=1#post1214465
So... let's make n bake.
-Phil
http://www.ultratechnology.com/chips.htm
Zero stock levels, and Digikey tags it as an "Obsolete item". Still, a price in the $20 region is something the P2 could compete with!!
-Tor
-Phil
Besides Zilog, Samsung also dabbled in Super-8, and I see Innovasic offers an alternative:
http://www.innovasic.com/Products/ia88c00
It does not look like any of the Samsung parts that were sold to Zilog were Super8, just Z8 variants.
- still an expensive part, for ROMLESS.
-Phil