Did we bite off more than we can chew?
cgracey
Posts: 14,152
Just wondering aloud here.
I see that there's been a drop-off in activity over the last week here.
Are you guys finding that the Prop2 is not as fun as it could be?
Could it be that the chip is complex enough that you can't just go blazing into it like you could the Prop1? Or, maybe it's complex in unfriendly ways?
Could it be made simpler somehow, but still have lots of features? I know that complexity kills fun.
Is the lack of documentation the reason?
Are we getting bogged down in details? Does the chip's functionality seem ambiguous and not certain? Things have to feel solid to inspire confidence.
Maybe we need to rethink some basics.
I believe that if we had the right formula, people would be all over it, all the time. Maybe some formaldehyde seeped in. Maybe this whole egg-beater thing is giving people vertigo.
Do you have any notions about this?
Comments
Eager to write some programs. I had been working on an interrupt driven, simple video driver. Dropped it when I knew you were gonna put the circuit from hot in.
For me, it's just been timing.
I'm also eager to look at all the PASM done so far. Peter and 78, others have been going to town! I'm envious.
Honestly, hot seemed a bit easier at around this stage, but not by too much. I think we all need to settle in some and write programs.
This one is different enough from P1 to require some thought.
I do think we may be too into the details, but we need to just attempt things to know for sure.
Most of the detail discussion has made a lot of sense.
I get bogged down because something doesn't work as I think it should, such as the wfbyte for example, and I try all kinds of stuff because I think I'm not doing it right, just to find out that I don't understand how it works. Once I find out, then I understand. It's this coming to understand the P2 that is getting me bogged down, although the process is "fun" in retrospect, if frustrating at the time.
Complex? Perhaps, but it is the elegance in its complexity that makes it simple too.
The "complexity" is not getting in the way of doing simple things, we don't need a "CMIS" just to blink an LED for instance. Nothing stopping anyone with the hardware from having fun.
Although I would love more documentation, writing it means that you can't work on the P2 at the same time. The fact that you are on the forum, read the buried posts, and respond is great compensation as it is.
Looking at it from my point of view in writing Tachyon for P2 for the first time I didn't really have any problems at all. I wasn't forced to use any special features to get it to boot for the first time but I was able to introduce special P2 features into the kernel bit by bit. So no problems here from my POV.
Please keep moving forward. The P2 is a breath of fresh air in an otherwise AVR/ARM microcontroller world. The big guns mostly only have one big barrel but P2 has 16 wonderfully flexible and powerful barrels to do battle with.
Maybe you're tired Chip, and we would be surprised if you weren't. You have been at this night and day for many years and it is hard to be able to step back and see it as we do. Your egg-beater is a good solution although we all know it will never be the perfect solution but there is no easy solution for 16 cogs all wanting first dibs on hub access.
There is something truly unique about the P2, even aside from the architecture. How many chips of this calibre are designed mostly by one man from a relatively small company, assisted with ideas from a multitude of people from many backgrounds around the world, who also take it on themselves to commit time, money, and resources to test the design while it is still on paper, so to speak, and the documentation is sketchy at best? The big guns would love to have this kind of following, although they would never want to be involved at this one-on-one level with anyone except maybe the big corporations.
Yes, I've got plenty of notions. Keep up the good work.
P.S. I may edit this post a few times to further clarify my response.
This would be really fun and everyone could get into it:
A very simple 32-bit microcontroller that only has 16 logic, math, and branch instructions, but runs at 10,000 MIPS.
Everyone would grasp that in a minute and be off to the races. I would love something like that and it's dirt simple. It's just that technology doesn't quite allow it, yet.
It seems that when you want to make complicated things possible, you need lots of special hardware, which complicates the whole story.
So a very simple 32-bit microcontroller to me is the P1. The biggest mistake that was made with P2 was losing sight of keeping it simple because of trying to please too many people for too many esoteric situations. This is a tough call, as you have your head buried in the project, and sometimes we really need "project managers" or other pressures to force us to be more realistic. So many times we are forced by external pressures to bring a project to fruition while we still have years and years of things we want to do. In hindsight we often appreciate that it happened that way, as we can now see it in operation, how it is "typically" used, and how all those esoteric features are neglected.
The sooner P2 is realized then we can move on to a P3 and then a P4 and so on. The experience gained from letting our babies discover the world shapes the way we deal with the next "baby".
We are here now, and I have both high confidence and plans for this one. It's pretty good.
I have been using P2 every day since the FPGA release on both Prop123-A7 and DE2-115.
So far I have found P2 to be a joy to use and can't wait for the video/smart pins journey to begin.
My "chewing difficulty" seems to be more associated with other stuff taking up precious P2 time.
Keep up the good work Chip, it's much appreciated.
To be honest, I think everyone would have been happy with the P1 with more IO, COGs and RAM. We have that now, what we do have is amazing, and we're all happy with it. It just changes too much to start writing projects, especially with the mnemonics changing too. Maybe if you lock it down and save extra features for P3? It's gonna need to be better than P2, so best to save some stuff for it :P
A lot of people stopped doing P1 projects because P2 was imminent (little did we know back then that "coming soon" wasn't soon), so projects have been held back in preparation for P2's release.
I do want to get started as soon as you release the next FPGA image, though, now that it has the RGB converter in.
As for documentation, you can't document as you go along, as that will slow the progress down, you need to finalise the chip then do documentation, otherwise the changes make the part of the documentation that changes a waste of valuable development time.
PS, this isn't criticism or anything just my thoughts on why you're thinking people aren't having fun.
PPS, I'm sure Ken probably needs to start selling P2s ASAP :P
Non-geniuses like me are big fans. We check progress regularly.
I found the interrupt code a bit confusing at first. The more I worked with it the more it makes sense. The sample code that's been posted has been very helpful in understanding how the P2 instructions work. I think things will become clearer as more people post sample code.
For instructional purposes, it would be good to keep the sample code as P1-like as possible, and only use P2-specific instructions for the feature that is being demonstrated. Sample code that uses every unique feature of the P2 will be hard to follow for beginners that are just trying to understand a specific feature of the P2. Of course, we want to see code that is also highly tweaked for the P2, but for someone starting out on the P2 they'll need to understand the basics first.
I've had concerns about the eggbeater architecture from the first minute it was proposed. The streaming FIFO helps to alleviate my concerns about it, but it might need some tweaking to make it more useful. I like the hubexec mode. The P2 will be a much more powerful chip with hubexec than if it didn't have it. I really like the LUT exec mode. It's nice to have twice as much memory for running cog code.
Edit: Also, I've not had enough time to keep up with all of the P2 changes. To be honest, I don't think I understand the new design very well at all. I'm sure it's great but I need some time to get my head wrapped around it.
Seems more a natural pause for the next release.
Certainly the Assembler side needs a clean up pass, but that is not silicon.
There is also a trade off there, with "rough enough" being tolerable to some, but left too long, you will never be able to clean things, due to the code done already.
Just like code can run in HUB, LUT, or COG, users will expect the same ASM examples to be able to run in GAS and P2PNUT - ie be truly paste portable.
(Anything less is no fun at all.)
- maybe that means GAS needs to be updated to support P2, so the long term important compatibility across both assemblers can be verified.
Remember, it is still in flux, plus currently, is FPGA only, so that will naturally limit the audience to the fringe.
That's why it is important to have the tools and silicon manage the complexities.
Mostly, the Silicon does that nicely, with HUB/COG/LUT being binary compatible.
There is scope for the tools to improve to better simplify ambiguities surrounding addressing.
I think part of the issue is that we have an incomplete product. The smart pins are a critical piece of the P2 design, and will heavily influence how we implement I/O. Because this isn't available, it somewhat necessitates us holding back (at least for some things). Bit banging I/O (like the FDS demo I wrote) was good for hunting down bugs, but I hesitate to do much more than that until the smart pins are available.
I also think part of the issue is premature optimization and fine-tuning that has been going on. Fine-tuning is a worthwhile goal, but I think it needs to wait until *after* the design is complete and initial documentation has been written. Once we know all of what the P2 is capable of, then we can clean up the rough edges!
I do not yet think the P2 has gone too far or in the wrong direction. And I still think it has the potential to be really fun to write code for. Just get the overall design completed. Then we can fine-tune. Then we can really play!
I think it's mostly because things have been changing at a fairly rapid rate, and there isn't full documentation yet.
I think once changes slow down and you have more documentation available, people will be more comfortable working with the P2.
There is also the fact that we don't have the smart pins yet, and those offer a lot of the really fun things.
I know what you are up against, and there are a huge number of details that need to be managed properly. As you move forward, life happens around you whether you are aware of it or not. Today is Sunday ... take a break and spend time with your family this Sunday and every Sunday. Your kids are growing up so fast you may not recognize them and "know them". I say this with love and respect, and from the number of hours I have spent working on a project of this nature, as it has definitely taken a toll on my family. I can only imagine the same is true for you.
In addition I am finalising my RD/WR BYTE/WORD/LONG unit tests program. Nearly there and ready to release it into the wild.
I think the P2 is a "wonderful beastie". Yes, it is complex, but that complexity permits different approaches to the task in hand. With 16 processors that task can also be large and complex. The cog and lut execution modes provide deterministic timing, when needed, whilst hubexec caters for larger control and general purpose usage, shared libraries, cog images. User interface control would suit hubexec well with its large address space.
As others have stated, documentation would be useful, *but* it is a somewhat pointless exercise with a moving target.
I am also dabbling in a small macro preprocessor for use prior to PNut for assembly. I am sure jmg will be pleased to know I will be using the GAS "dot names", ie, ".macro". There will be a handful of local labels disallowed when it is used.
It's always darkest before the dawn. Just keep plugging away and things will work out okay. Remember though that if you make this one perfect you'll be out of the chip designing business so leave a little something to be included in P3.
Sandy
but being able to try P2 without FPGA board sounds great.
I hope Tachyon-P2 will run on it then ;-)
It's true I've been working on this a lot, especially lately. You did tons of time on it, too.
Believe it or not, I actually do take Sundays off, but I work so late into Saturday night that things often have Sunday dates on them. I might peruse the forums on Sunday, but last weekend I willed myself not to. If I can manage to not work on Sunday, the whole week goes a lot better. If I cheat, Monday starts off like a sick horse. It's better to be refreshed and have a good conscience going into the new week.
Thanks for your good advice, Beau.
I think the number of changes has been taxing to those engaged and dissuading the bystanders. I really need to get some docs written.
Actually, I feel pretty good about where things are, technically. All your inputs have stretched and improved things tremendously over the course of this project. Making WFBYTE and WFWORD not wait for whole longs before writing to the hub is a perfect example. I didn't even realize what a problem it was going to be in some cases, but that's because I haven't written much software for it, yet. Good thing you guys experienced it and we got that straightened out.
I know that if this is done right, it will be contagious. If it's not, it will fizzle. We should be able to make it go pretty decently.
Hubexec, Lutexec and cogexec are great must have features. But the design is still very much in a state of flux.
Meanwhile I have realised we are not going to see a real P2 till later next year at the earliest.
I have had a number of P1 board designs that I started and then stopped each time I thought the P2 was close. Since the P2 is not going to happen soon, I decided to resurrect my P1 designs. Those PCBs and parts are now arriving. This, plus other things, is keeping me over-occupied!
While I was looking at P1 designs, I took a look at some other chips out there.
ARM (including multi cores) are a dime a dozen, and with a kitchen sink full of peripherals. Some have large Flash and RAM. But they are a complex mess to program. You need a few months just to read the manuals! There's no fun in this!
MIPS:
PIC32MX... Series can be had with up to 512KB Flash and 64KB RAM for $5-$10.
PIC32MZ... series can be had with up to 2MB Flash and 512KB RAM with USB LS/FS, Ethernet, CAN, etc, 200MHz, for $10-$15.
But again you need to cut down a forest to make the paper just to print the manual! No fun here either.
Of course these chips are supported with HL compilers. Some are free and some are expensive (for hobbyists).
What this means to me is that 512KB Hub RAM is no longer considered big. I am concerned about the commercial market for the P2. The forums (both P1 and P2) are no longer a hive of activity, and we seem to have lost many regulars.
Geometries below 180nm seem to have made successive improvements in power consumption.
I am concerned that the egg-beater will use a lot of power if it's fully utilised. If I am not mistaken, it's possible for 16 x 32bit hub accesses on every clock. If users let all cogs run hubexec then we might have another hot chip. I know it's not the only cause, but it was a biggie IIRC.
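To put a rough number on that worst case, here is a back-of-the-envelope sketch in Python. The 16 cogs and 32-bit (4-byte) accesses come from the paragraph above; the 160 MHz clock is purely an assumption for illustration (borrowed from the low end of the 160-200MHz range in this post's wish list), not a confirmed P2 figure, and the function name is made up for this example.

```python
def peak_hub_bandwidth(cogs: int, bytes_per_access: int, clock_hz: float) -> float:
    """Upper bound on hub traffic in bytes/second, assuming every cog
    completes one hub access on every clock (the worst case for power)."""
    return cogs * bytes_per_access * clock_hz


COGS = 16                 # from the post: 16 cogs
BYTES_PER_LONG = 4        # from the post: 32-bit hub accesses
ASSUMED_CLOCK_HZ = 160e6  # ASSUMPTION: illustrative clock, not a P2 spec

# All 16 cogs hitting the hub on every clock.
all_cogs = peak_hub_bandwidth(COGS, BYTES_PER_LONG, ASSUMED_CLOCK_HZ)

# For comparison: only one cog served per clock (a 1:16 round-robin hub).
one_cog = peak_hub_bandwidth(1, BYTES_PER_LONG, ASSUMED_CLOCK_HZ)

print(f"all cogs every clock: {all_cogs / 1e9:.2f} GB/s")
print(f"one cog per clock:    {one_cog / 1e6:.0f} MB/s")
```

Under those assumptions the fully utilised case moves 16 times the data of a one-cog-per-clock hub, which is the scale of the extra switching activity being worried about here. It says nothing about actual power draw, which depends on the process and the memory design.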
We know the P1 is easy to program. The P2 will still be easy, depending on what you want to do. But it will mostly be harder although many concepts remain true - 16 cogs/cores means simple programs can be made in standalone objects.
IMHO, I believe the P2 should be suspended temporarily, and a new simpler specified P1+ should be made quickly. Something like...
P1 instruction set plus multiply and divide, and a relative jmpret
No hubexec, but relative jmpret can use additional cog ram like Lutexec
16 cogs with 4KB RAM (16KB in cog 0)
Hub 512KB RAM (or more)
64 I/O
Security fuses
Boot from SPI Flash
160-200MHz
Cog/hub access 1:16 32bit
Simple UART x2/4 in two cogs instead of VGA
No ADC???
Just my 2c
Please don't take my post the wrong way. What you have been doing is absolutely miraculous. I just think we need something out quickly to fill the gap while P2 (call it P3) continues to evolve.
Meanwhile this P1+ (call it P2) would prove OnSemi process, and generate revenue!
And who knows, OnSemi could buy a plant with finer geometry in the meantime.
You are kind of describing what we all wanted years ago. A Prop 1 but more so. With 64 IO pins working. Say 16 cogs and half meg of RAM. Perhaps a few of those other embellishments you mentioned thrown in. Personally I would have ditched the video for serial/SPI/whatever on each COG. Nice, simple, like a Prop!
I'm afraid that your 2c worth, if adopted, would set the PII back another 5 years, cause Chip to have a nervous breakdown (and Ken) and bankrupt Parallax.
No, we have to go with what we have at this stage.
I have not been following the minute details of all the changes going on in the PII architecture for a long while. I'm hoping a perceived complexity can be dealt with in the documentation:
1) Make the ASM mnemonics as much like the PI as possible.
2) Separate documentation for the simple and advanced stuff. That is to say a document containing only all the operations a PI user would be familiar with and allow him to get started with. Another document containing all the weird stuff: interrupt handling, FIFO, LUT, HUB Exec, etc, etc.
Perhaps even another document dedicated to HUB exec as that seems to be a somewhat different machine.
Oh, and can we change the name of that WFBYTE operation? I can't help but read it with an extra "T" in there
Lol, WWF came to mind for me.