Okay, I have hit a snag... Does anyone know of a common FBGA-54 pinout for 8-bit (x8) I/O, or of other x8 BGA-54 SDRAMs, before I gamble away half of the memory size by ditching the upper byte of an x16 part? I tried Micron: all FBGA-60... T___T I thought they stocked x8 FBGA-54 parts, as ISSI does... I have tried others: all TSOP-54...
You guys won't have any trouble finding TSOP-54 SDRAM, but I am stuck with the BGA-54 ones...
Still, I am going to keep trying.
Also, I may have to change the BGA land pattern and redo the vias on the PCB in KiCad... I have to ask first, though (I did ask ISSI and will be waiting for the answer tomorrow... and Micron doesn't have a direct request service; only registered users can use that... T____T)
One last thing: Humanoido and Heater, you guys said that a video card's AVUs (Advanced Vector Units, i.e. stream processors) aren't worth messing with, right? Well, yes: the kernels are a b*tch to work with. One tiny mistake and you are greeted with the funky "broken LCD" syndrome that I kept running into on my AMD Radeon HD 4670 under Mandriva Linux 2009 x64 last year (before I updated that stupid firmware with a hacked version)... It really takes a lot of attention, but if done right you can even do something like a 3D galaxy simulation. I got the card cheap because of the firmware bugs that affected a lot of Gigabyte Radeon HD 4670 cards. The firmware updater was stubborn, so I hacked the firmware with a hex editor, fixed it, and I'm enjoying it.
Before the Propeller II version of the board even makes it into the KiCad schematic/PCB editor, I may experiment on my Radeon HD 4670 to get an idea of what the raw horsepower is like: 32 AVUs for 32 cogs and more. (And I am glad I got the 1 GB version of this card, as loading a Spin emulator will eat up some memory before anything else is touched...) I would use OpenCL to make my life a bit easier, if that's asked for.
I finally got my workstation out of storage and swapped the Athlon 64 X2 (Brisbane) CPU for a Phenom II X4 940. It's insanely fast and very powerful, good enough for my kind of work (because I like to work quickly).
I also decided to do two versions of the BGA-54 PCB: one for native x8 parts and one with an x16 part rewired for x8 operation. Better to do both in case I run into more trouble finding native x8 BGA-54 SDRAM.
Got a reply from the rep: IS4245583200D / G is the best bet, and at about 32 MB that's not too bad. Hopefully I can get a one-off; if not, it's time to gamble away half of the memory...
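To put numbers on that trade-off, here is a small Python sketch of the capacity arithmetic. The row/column/bank organisations are assumptions based on typical 256 Mbit SDRAMs (8K rows, 1K columns, 4 banks for a 32M x 8 part; 8K rows, 512 columns, 4 banks for a 16M x 16 part), not figures taken from the ISSI datasheet, so check the actual part before relying on them.

```python
MB = 1024 * 1024


def sdram_bytes(rows: int, cols: int, banks: int, width_bits: int) -> int:
    """Capacity of one SDRAM chip in bytes: rows x columns x banks x data bits / 8."""
    return rows * cols * banks * width_bits // 8


# Assumed organisation of a 256 Mbit x8 part (32M x 8): 8K rows, 1K columns, 4 banks.
native_x8 = sdram_bytes(rows=8192, cols=1024, banks=4, width_bits=8)

# Assumed organisation of a 256 Mbit x16 part (16M x 16): 8K rows, 512 columns, 4 banks.
x16_full = sdram_bytes(rows=8192, cols=512, banks=4, width_bits=16)

# Same x16 part wired for x8 use: only the lower byte lane is connected and the
# upper lane is masked off (UDQM held high), so every word loses half its bits.
x16_as_x8 = sdram_bytes(rows=8192, cols=512, banks=4, width_bits=8)

print(native_x8 // MB, x16_full // MB, x16_as_x8 // MB)  # 32 32 16
```

Both chips hold 32 MB, but the x16 part run through only its lower byte lane has half as many bits per address, so it yields 16 MB: that is the "half of the memory" being gambled.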
And I have been moving the 74LVT573 around to suit the circuitry without violating real-world design rules (I disabled DRC in KiCad because it's quite annoying...) and to keep the board area to a minimum. I still have to figure it out... TSOP is quite simple, but TQFP-208 can be a PITA to route in CAD...
I've been trying to port ext2 to the Prop, as I do not like/want to use anything FAT. Simple, easy read-only support exists in the GRUB project. Nowadays that code is "contaminated" with support for ext3/ext4, and stripping it out could be laborious, so I suggest starting from an older version from before ext3 existed. The code has to be translated/rewritten from C to... something else (unless you use XMM/LMM with Catalina, of course). Still on the drawing board.
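Whatever language the port ends up in, the first thing a read-only ext2 driver does is read the superblock, which always sits 1024 bytes into the volume. Here is a rough Python sketch of just that step; the field offsets follow the ext2 on-disk layout, while the dataclass, the function names and the synthetic test image are made up for illustration.

```python
import struct
from dataclasses import dataclass

SUPERBLOCK_OFFSET = 1024  # the ext2 superblock always starts 1024 bytes into the volume
EXT2_MAGIC = 0xEF53       # s_magic, stored at offset 56 inside the superblock


@dataclass
class Ext2Superblock:
    inodes_count: int
    blocks_count: int
    first_data_block: int
    block_size: int
    blocks_per_group: int
    inodes_per_group: int


def read_superblock(volume: bytes) -> Ext2Superblock:
    """Decode the handful of superblock fields a read-only driver needs first."""
    sb = volume[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        raise ValueError("volume too small to hold an ext2 superblock")
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04X}: not an ext2 volume")
    inodes_count, blocks_count = struct.unpack_from("<II", sb, 0)
    first_data_block, log_block_size = struct.unpack_from("<II", sb, 20)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 32)
    (inodes_per_group,) = struct.unpack_from("<I", sb, 40)
    return Ext2Superblock(
        inodes_count=inodes_count,
        blocks_count=blocks_count,
        first_data_block=first_data_block,
        block_size=1024 << log_block_size,  # s_log_block_size is a shift applied to 1024
        blocks_per_group=blocks_per_group,
        inodes_per_group=inodes_per_group,
    )


def group_count(sb: Ext2Superblock) -> int:
    """Number of block groups (ceiling division over the data blocks)."""
    data_blocks = sb.blocks_count - sb.first_data_block
    return (data_blocks + sb.blocks_per_group - 1) // sb.blocks_per_group


# Usage with a synthetic 2 KiB image: 1 KiB blocks, 1024 blocks, one group.
image = bytearray(2048)
struct.pack_into("<II", image, SUPERBLOCK_OFFSET, 128, 1024)
struct.pack_into("<II", image, SUPERBLOCK_OFFSET + 20, 1, 0)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 32, 8192)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 40, 128)
struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 56, EXT2_MAGIC)
sb = read_superblock(bytes(image))
print(sb.block_size, group_count(sb))  # 1024 1
```

ext3 and ext4 volumes carry the same magic number, so a real reader would also check the incompatible-feature flags (s_feature_incompat) and refuse volumes that use features it can't handle.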
I agree. Maybe you can try IBM's i-FUSE file system generator, as long as SDRAM is used (which I would). I was thinking about trying ATFS (from my own OS project), but I really doubt the P8X32A would be able to read the 512-bit hash table (where the encryption table is; even when not encrypted, it is helpful, especially when defragmenting the whole volume, similar to the way Linux does that job at boot), so ATFS may be shot, but I may keep trying... Vinculum II firmware is a bit difficult to write, so I may opt for read-through pipelining (to let the P8X32A read the contents of the HDD without the firmware intercepting too much). If both ext2 and i-FUSE are shot, it looks like we may have to write a custom FS. (I may change the way ATFS is written, or just do the "s= hook" thing to make the FS handler small enough.)
Propeller II may be able to do ATFS encryption/decryption, maybe... But for now, we're stuck with the P8X32A for a few months...
(BTW, ATFS = Advanced Technology File System. My own encryption method for this volume is a bit harsh on the P8X32A: 512-bit RHC [Random Hash Clustering], basically a randomly generated hash. The encryption table is saved in the file directory partition and holds the encryption keys, which are about 4,096 bits wide, hiding everything securely; only the boot sector and the firmware know where to find it. I will strip the encryption support from this FS for the P8X32A.)
Things can change very fast in this field, especially when working on projects, doing research and learning new things. I now fully believe in integrating GPUs as peripherals into large Propeller machines, because I may have found a way to do it (more detail was given in the Big Brain thread, and more will be discussed in the future). Previously, you had to learn new and complicated software, exactly match the hardware flavor, buy the required software and hardware at great expense, and climb a steep learning curve on your own. If you got stuck, that could be the end. If you pull these cards out of the computer, or buy cards individually to run on their own, you'll run into a great number of challenges.
AMD Radeon has moved at lightning speed and increased its number of stream processors dramatically. As you have noted, 1 GB is now available, and 2 GB as an upgrade. You can get over 5,000 processors, but their 900-range boards are more cost effective. I'm currently working on methods and plans to integrate these with Propellers. There are numerous gains, not to mention TFLOPS speed, a high processor count, and added floating point. As someone pointed out, it's not an all-Prop project, but sometimes you just have to add peripherals; it's still a Prop machine. Currently I have one of these GPU cards in the works for the Big Brain.
So am I. Propeller II doesn't have an FPU, so I may have to give it "hooks" for using the AVUs on a Radeon HD processor (whatever series becomes cheap by the time Dendou Oni is completed), so users won't have to touch the messy stuff.
And I like using AMD Radeon HD series cards; they're great! I have an HD 4670 in the workstation I am writing this on right now. Of course, you can actually modify the firmware on Radeon cards (very hazardous, but I did it on this card because it had a firmware bug, which is why I got it cheap at the computer shop).
Oh, and one more thing: it can do everything out of order (vertices, floating point, polygons and color matrices). That's what gives this GPU its prowess. At the same time it uses less power, except for the beefy or "on steroids" models, which draw more but are much faster, so the extra watts buy extra performance.
I'm having a hard time understanding this thread. It sounds like Dr. Mario or Humanoido know how to interface the Prop with an AMD Radeon HD video card.
Dr. Mario or Humanoido, can you interface the Prop with an AMD Radeon HD video card, or have you already done it?
Dr. Mario, as you can see, we're on trial! We'd better plead the fifth! LOL
Mike G, I'll post a reply in a while in the Big Brain thread, because my GPU project is part of it.
The issues with PCIe: PCIe has strict trace-routing rules (if you're familiar with the DDR2 routing rules, this will be obvious to you as well), and full-size PCIe x16 is difficult to use because it is TOO FAST: about 128 gigabits per second for PCIe 2.0 x16 counting both directions, i.e. 8 GB/s each way. Thankfully, though, a Radeon will still be happy on an x1 link (but you will need to supply power to all of the x16 slot's power pins to avoid problems in that department). You can also try a Cyclone III FPGA with PCIe IP, or stack as many Propellers as possible and link them to TTL-to-LVDS converter chips.
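The arithmetic behind those link figures is short enough to sketch in Python. It uses the published per-lane signalling rates for PCIe 1.x (2.5 GT/s) and 2.0 (5 GT/s), both with 8b/10b line coding, and deliberately ignores packet overhead, so real throughput is somewhat lower; the function name is made up for illustration.

```python
# Per-lane transfer rate in MT/s and line-code efficiency (payload bits / line bits).
PCIE_GENERATIONS = {
    1: (2500, 8, 10),  # PCIe 1.x: 2.5 GT/s, 8b/10b encoding
    2: (5000, 8, 10),  # PCIe 2.0: 5.0 GT/s, 8b/10b encoding
}


def pcie_gbit_per_direction(gen: int, lanes: int) -> float:
    """Raw link bandwidth after line coding, one direction, in Gbit/s.

    Packet (TLP/DLLP) overhead is ignored, so real throughput is lower.
    """
    mtps, payload_bits, line_bits = PCIE_GENERATIONS[gen]
    return mtps * lanes * payload_bits / line_bits / 1000


for lanes in (1, 16):
    gbit = pcie_gbit_per_direction(2, lanes)
    print(f"PCIe 2.0 x{lanes}: {gbit:g} Gbit/s, {gbit / 8:g} GB/s per direction")
# PCIe 2.0 x1: 4 Gbit/s, 0.5 GB/s per direction
# PCIe 2.0 x16: 64 Gbit/s, 8 GB/s per direction
```

Doubling the x16 figure for both directions gives the 128 Gbit/s total, while an x1 link needs only a sixteenth of that, which is why it is the realistic target for a Propeller- or FPGA-based host.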
Google "PCIe pinout" and it should be obvious. I would post the link, but I need to check first, as there's fine print in the posting rules... If it's okay, I will post it ASAP.
PCIe, on the other hand, is very easy to interface on the software side, as it still uses the same run-of-the-mill PCI bus driver model (you can find one in any Linux distro and study the code). It should be simple to build a PCI driver from scratch, but you may also need to create a firmware kernel to boot specific software such as the PCI driver and a modified VESA/FOSS driver to bring up the GPU.
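The "run-of-the-mill PCI" part is mostly configuration space: a PCIe card still presents the same 64-byte type-0 header as a PCI card, and that is what a bus driver reads first to identify the device and find its memory windows (BARs). Below is a minimal Python sketch that decodes such a header from raw bytes (on Linux, for example, from the `config` file of a device under `/sys/bus/pci/devices/`). Only the vendor ID 0x1002 and display class code 0x03 are real identifiers; the device ID and BAR values in the usage example are made up, and enumeration, BAR sizing and actually bringing up the GPU are left out.

```python
import struct
from typing import List, Tuple

VENDOR_ATI_AMD = 0x1002   # PCI vendor ID used by ATI/AMD graphics parts
CLASS_DISPLAY = 0x03      # base class code 0x03 = display controller


def decode_bars(raw: Tuple[int, ...]) -> List[Tuple[str, int, bool]]:
    """Turn the six raw BAR dwords into (kind, base address, prefetchable) entries."""
    bars = []
    i = 0
    while i < len(raw):
        bar = raw[i]
        if bar == 0:                       # unimplemented BAR
            i += 1
        elif bar & 0x1:                    # bit 0 set: I/O space
            bars.append(("io", bar & 0xFFFFFFFC, False))
            i += 1
        elif (bar >> 1) & 0x3 == 0x2:      # type 10b: 64-bit memory BAR, uses two slots
            upper = raw[i + 1] if i + 1 < len(raw) else 0
            bars.append(("mem64", (upper << 32) | (bar & 0xFFFFFFF0), bool(bar & 0x8)))
            i += 2
        else:                              # 32-bit memory BAR
            bars.append(("mem32", bar & 0xFFFFFFF0, bool(bar & 0x8)))
            i += 1
    return bars


def parse_config_header(cfg: bytes) -> dict:
    """Decode the first 64 bytes of a type-0 PCI/PCIe configuration header."""
    vendor, device, command, status = struct.unpack_from("<HHHH", cfg, 0x00)
    revision, prog_if, subclass, base_class = struct.unpack_from("<BBBB", cfg, 0x08)
    return {
        "vendor": vendor,
        "device": device,
        "command": command,
        "status": status,
        "revision": revision,
        "class": (base_class, subclass, prog_if),
        "header_type": cfg[0x0E] & 0x7F,   # bit 7 only flags a multi-function device
        "is_amd_display": vendor == VENDOR_ATI_AMD and base_class == CLASS_DISPLAY,
        "bars": decode_bars(struct.unpack_from("<6I", cfg, 0x10)),
    }


# Usage with a synthetic header (device ID 0x1234 and the BAR values are made up).
cfg = bytearray(64)
struct.pack_into("<HH", cfg, 0x00, VENDOR_ATI_AMD, 0x1234)
cfg[0x0B] = CLASS_DISPLAY                  # subclass 0x00 = VGA-compatible controller
struct.pack_into("<6I", cfg, 0x10, 0xD000000C, 0, 0xFE9F0004, 0, 0x0000C001, 0)
info = parse_config_header(bytes(cfg))
print(info["is_amd_display"], info["bars"])
```

The same header layout is what a from-scratch driver on an FPGA or Propeller host would have to read over the link before it can touch the card's registers.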
A PCIe x1 video card is easy to handle with a cheap 65 nm FPGA (like Cyclone III) acting as the host interface controller.
And... Humanoido... you're fast. Your reply came in before mine (I added this sentence afterwards with the post editor...)
I read the posted link, and you pretty much covered my supercomputer plans: pairing lots of Propellers with a VPU card. Looks like we are in the same boat. ^___^;
Mine's a bit different, though: I plan to start out with 8 Propeller IIs, each with its own 16 to 32 MB of SDRAM, in a configuration similar to how the INMOS transputers were set up. With specific software, it may break the rules of code ordering, saving cycles and computing time. For now I am still on the P8X32A (I have the DIP-40 version too), so I may whip up test out-of-order schedulers based on branch prediction for each cog, basically skipping the WAITCNT so that already-impatient threads can use the cycle gaps, shortening the time needed to finish their jobs.
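The cycle-gap part of that idea can be illustrated without any out-of-order hardware. The Python sketch below swaps in a much simpler technique, a cooperative event-driven scheduler, so it is neither branch prediction nor the scheduler planned above: instead of stalling in a WAITCNT, a thread yields how many ticks it wants to wait, and the scheduler runs whichever other thread is already due. `STEP_COST`, `ticker` and the tick values are hypothetical.

```python
import heapq
from typing import Generator, Iterable, List, Tuple

# A "thread" is a generator: each yield is one unit of work followed by a request
# to sleep for that many clock ticks (a stand-in for a WAITCNT-style delay).
Thread = Generator[int, None, None]

STEP_COST = 1  # hypothetical cost, in ticks, of running one unit of work


def run(threads: Iterable[Tuple[str, Thread]]) -> List[Tuple[int, str]]:
    """Run threads cooperatively; return (start tick, thread name) for each step."""
    ready = []   # min-heap of (wake tick, tie-breaker, name, generator)
    seq = 0
    for name, thread in threads:
        heapq.heappush(ready, (0, seq, name, thread))
        seq += 1
    clock = 0
    trace = []
    while ready:
        wake, _, name, thread = heapq.heappop(ready)
        clock = max(clock, wake)          # idle only if nobody else is runnable
        try:
            delay = next(thread)          # run one unit of work
        except StopIteration:
            continue                      # thread finished
        trace.append((clock, name))
        clock += STEP_COST
        heapq.heappush(ready, (clock + delay, seq, name, thread))
        seq += 1
    return trace


def ticker(period: int, count: int) -> Thread:
    """Hypothetical thread that does `count` steps, sleeping `period` ticks after each."""
    for _ in range(count):
        yield period


print(run([("slow", ticker(10, 2)), ("fast", ticker(3, 3))]))
# [(0, 'slow'), (1, 'fast'), (5, 'fast'), (9, 'fast'), (11, 'slow')]
```

While "slow" sleeps for 10 ticks, "fast" gets three turns in the gap instead of the whole cog sitting idle.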
I guess it's fine, as I found the messy details about the video cards in Humanoido's links. No wonder I like to work with AMD parts... (AMD is also apparently interested in open-source computing, at least from what I heard.)
I do not mind. It's alright: as long as we're talking supercomputers, it's awesome. Something here could be useful too!
And I am just as interested in what makes PCIe tick, in both software and hardware terms. (BTW, Dendou Oni will have its own thread, which I will start as soon as the prototype board is baked.)
Thanks. I look forward to seeing the Dendou Oni Supercomputer thread.
One thing looked at briefly but not entirely resolved with multiple AMD cards is the bus frame. One could link several of these cards together on a bus of the right flavor, but beyond that number it could be challenging. So for now, one could be entirely content to stick with one, or possibly up to four, linked AMD cards using available bus technology.
It will most likely be one unit at first, possibly mid-range, due to expense. When these boards drop in price and lose their newness, it will be a field day for cheaply getting multiple boards with over 5,000 processors each. (There's an entire store in Taiwan devoted to these lower-cost GPU boards, and competition and obsolescence continue to drive costs lower and lower.) However, by that time, top-end boards could have around 50,000 to 100,000 processors.
Look at those cards now: they've gone from 320 processors to over 5,000 in a short period of time. This is all thanks to gamers! Did you ever think gaming would advance technology so dramatically? You could take 500 of these 2 TFLOPS boards and end up with a petaFLOPS supercomputer. Maybe you already have that in mind...
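A quick back-of-envelope check of those numbers, as a Python sketch. Peak figures assume one multiply-add (2 flops) per stream processor per clock, and the HD 4670's 750 MHz is taken as its reference core clock; sustained performance on real kernels is far below these theoretical peaks.

```python
import math


def peak_gflops(stream_processors: int, clock_mhz: int, flops_per_clock: int = 2) -> float:
    """Theoretical single-precision peak; 2 flops/clock assumes one multiply-add per SP."""
    return stream_processors * clock_mhz * flops_per_clock / 1000


# Radeon HD 4670: 320 stream processors at an assumed 750 MHz reference core clock.
hd4670 = peak_gflops(320, 750)                 # 480.0 GFLOPS

# The 500-board example: 500 boards x 2 TFLOPS each = 1000 TFLOPS = 1 PFLOPS.
cluster_tflops = 500 * 2.0

# How many HD 4670-class boards the same petaFLOPS would take (1 PFLOPS = 1,000,000 GFLOPS).
boards_needed = math.ceil(1_000_000 / hd4670)

print(hd4670, cluster_tflops, boards_needed)   # 480.0 1000.0 2084
```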
Jazzed, yeah, I have been thinking about it. It would be nice too.
Humanoido: not only gamers, but scientists too depend on video cards to carry out very "f***ed my brain up" complicated floating-point math in a shorter time. And yes, 5,000 AVUs is nice, but also consider Moore's law: for 5,000 AVUs, the GPU die could contain up to 15 billion 11 nm transistors. It's startling how rapidly the technology keeps advancing.
And about the bus frame... apparently LucidLogix has nailed this part. Too bad they won't sell it (the Hydra series chipset) as a one-off...
But still, there are PCIe switches we can use. For a few cards a "dumb" switch is alright, but at our scale they need to be a bit more intelligent (in this case, a switch chip containing a CPU core).
I am not sure about PLX Technology Inc.: Mouser apparently sells their parts, but PLX still wants an NDA, which is a bit odd (to me)...
And one post in the Big Brain thread piqued my interest: the Turing-complete programming language Brainf***. It was especially interesting and amusing.
However, Wikipedia says this programming language is useless. In my opinion, "not in the right hands": it can be quite useful once it's understood, pretty much like learning to drive a car.
A few people also talked about using it on the Propeller, and I am interested in this idea too. (It's a code-space saver, too.)
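For anyone who hasn't seen it, the whole language is eight commands acting on a tape of byte cells, which is why it is so compact. Below is a minimal interpreter sketch in Python (not any Propeller implementation discussed on the forum). The 30,000-cell tape follows the classic default and would leave little room in the P8X32A's 32 KB of hub RAM, so a Propeller version would pick a smaller `tape_len`; treating end of input as 0 is an assumption, since implementations differ there.

```python
COMMANDS = set("+-<>.,[]")   # every other character is a comment


def run_bf(program: str, stdin: bytes = b"", tape_len: int = 30000) -> bytes:
    """Interpret a Brainf*** program with 8-bit wrapping cells; return its output."""
    code = [c for c in program if c in COMMANDS]

    # Match brackets once up front so each loop jump is a dictionary lookup.
    jump, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                raise SyntaxError(f"unmatched ']' at command {i}")
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        raise SyntaxError("unmatched '['")

    tape = bytearray(tape_len)
    ptr = pc = inp = 0
    out = bytearray()
    while pc < len(code):
        c = code[pc]
        if c == "+":
            tape[ptr] = (tape[ptr] + 1) & 0xFF
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) & 0xFF
        elif c in "<>":
            ptr += 1 if c == ">" else -1
            if not 0 <= ptr < tape_len:
                raise IndexError("data pointer moved off the tape")
        elif c == ".":
            out.append(tape[ptr])
        elif c == ",":
            # Assumption: end of input reads as 0.
            tape[ptr] = stdin[inp] if inp < len(stdin) else 0
            inp += 1
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]          # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]          # go back to the top of the loop
        pc += 1
    return bytes(out)


print(run_bf("ASCII A: ++++++++[>++++++++<-]>+."))  # b'A'
```

The example computes 8 x 8 + 1 = 65 in the second cell and prints it; the text before the colon contains no command characters, so it is simply ignored as a comment.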
Yup. It's especially interesting, but as the name of the language implies, it really can mess with your mind when you try to understand how it works.
Really interesting, and I am wondering if I could write an operating system kernel for the Propeller in this language, provided I am willing to risk that P8X32A-D40 chip in case I make a fatal mistake.
And the best part of brainf***ing the P8X32A is that you don't have to worry about the keyboard layout, like a Russian keyboard or any other layout: there are just eight command symbols and that's it (everything else is a comment). That's one of the nicest things I have ever seen.
Still, I am getting more and more curious...
I also watched a YouTube video titled "AVR gets brainf***ed", and it shows just how simple this programming language is.
Practically speaking, all it takes is trying to figure out how the 99bottles.bf demo works to deter any serious human development. For me BF is a "write once only" language because once you write it, you can't easily maintain it past a few days. A more general compiler could be written to make it more useful, but execution speed would likely be the next problem.
Looks like I will stick with ASM, then add BF executor threads after the kernel threads (started by the ASM-assembled OS kernel).
One problem with BF... Jazzed really got me on this one: if I somehow got the cog to execute the code out of order, how would I keep the original code while optimizing it on the fly? From the look of it, it's best to run it in order, unmodified.
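One way to square "optimize on the fly" with "keep the code" is never to touch the source at all: translate it once into a separate op list, fold runs of `+ - < >` into single ops, and execute that list strictly in order. The Python sketch below does just that. It is a common peephole trick, not out-of-order execution, and the op names (`add`, `move`, `jz`, `jnz`, ...) are made up for illustration.

```python
from typing import List, Tuple

Op = Tuple[str, int]   # (operation, argument)


def _fold(ops: List[Op], kind: str, delta: int) -> None:
    """Merge a run of identical +/- or </> commands into one op."""
    if ops and ops[-1][0] == kind:
        ops[-1] = (kind, ops[-1][1] + delta)
    else:
        ops.append((kind, delta))


def compile_bf(program: str) -> List[Op]:
    """Translate BF source into a folded op list; the source string is left untouched."""
    ops: List[Op] = []
    loops: List[int] = []
    for c in program:
        if c in "+-":
            _fold(ops, "add", 1 if c == "+" else -1)
        elif c in "<>":
            _fold(ops, "move", 1 if c == ">" else -1)
        elif c == ".":
            ops.append(("out", 0))
        elif c == ",":
            ops.append(("in", 0))
        elif c == "[":
            loops.append(len(ops))
            ops.append(("jz", -1))           # target patched at the matching ']'
        elif c == "]":
            if not loops:
                raise SyntaxError("unmatched ']'")
            start = loops.pop()
            ops[start] = ("jz", len(ops))    # jump to the matching 'jnz'
            ops.append(("jnz", start))       # jump back to the matching 'jz'
    if loops:
        raise SyntaxError("unmatched '['")
    return ops


def run_ops(ops: List[Op], stdin: bytes = b"", tape_len: int = 30000) -> bytes:
    """Execute the folded ops strictly in order, one op at a time."""
    tape = bytearray(tape_len)
    ptr = pc = inp = 0
    out = bytearray()
    while pc < len(ops):
        op, arg = ops[pc]
        if op == "add":
            tape[ptr] = (tape[ptr] + arg) & 0xFF
        elif op == "move":
            ptr += arg
            if not 0 <= ptr < tape_len:
                raise IndexError("data pointer moved off the tape")
        elif op == "out":
            out.append(tape[ptr])
        elif op == "in":
            tape[ptr] = stdin[inp] if inp < len(stdin) else 0   # assumption: EOF reads as 0
            inp += 1
        elif op == "jz" and tape[ptr] == 0:
            pc = arg      # land on the matching 'jnz'; pc += 1 below steps past it
        elif op == "jnz" and tape[ptr] != 0:
            pc = arg      # land on the matching 'jz'; pc += 1 below enters the body
        pc += 1
    return bytes(out)


source = "++++++++[>++++++++<-]>+."
ops = compile_bf(source)
print(len(source), len(ops), run_ops(ops))   # 24 10 b'A'
```

The 24 source commands shrink to 10 ops, while `source` itself is never modified, so the original program is always there to re-read or recompile. Folding never crosses a bracket, because the `jz`/`jnz` op always sits between the runs on either side.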
Humanoido, which benchmark do you think is best for the Propeller? I may want to get it, extract the Spin source code, and create an out-of-order execution demo.
Mike, you can try Humanoido's notes as soon as they get published.
Humanoido, I will look at it too.
A schematic or block diagram might help us follow along the road map here.
http://forums.parallax.com/showthread.php?124495-Fill-the-Big-Brain&p=1008443&viewfull=1#post1008443
You will see I completely agree: http://forums.parallax.com/showthread.php?117194-Urban-M%FCller-s-Brainf***-Language-on-Propeller