144 core micro, looks like it runs fourth - Page 3 — Parallax Forums

Comments

  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-01-29 23:26
    Well, you say that data serialization and routing are not there -- but it seems to me that they did spend quite a bit of thought on gaining high speed and efficiency of both.

    In terms of i/o -- inputs are handled immediately upon arrival due to the lack of clocking.

    I do admit that I don't know all the alternatives available, but it seems that CM felt this was a good one. Read the F18A documentation in detail to comprehend what the i/o really is and does.

    I suspect what is needed is for someone to present a good tutorial scheme for distributing data and code. Since this is Forth, it is all about stacks and not the usual memory maps. That seems to be the bottleneck. And I wonder how to implement a Forth dictionary.

    I guess I will just have to get a Schmartboard and try it.
  • LeonLeon Posts: 7,620
    edited 2013-01-30 01:05
    It will need some tricky wiring to get it working on the Schmartboard.
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-01-30 02:50
    Leon wrote: »
    It will need some tricky wiring to get it working on the Schmartboard.


    @Leon
    At this point, I am thinking of just having the GA144 operate on a serial USB and dependent on a Forth OS and/or C in a generic computer.

    It would seem that I might have to attach a ROM to boot the USB link and a power supply. Of course, I'd have to have a USB to serial at the proper voltage for hardware.

    But at that point, it would be something that the general user population could consider an affordable hack. Further exploration could eventually use the two 18-bit parallel ports for its own RAM.

    That would seem to be an appealing first expansion to rapidly accumulate data that could then be downloaded for further analysis.

    What do you think?
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-02-01 05:07
    Being curious, I went ahead and ordered a GA144 kit from Schmartboards and a 1.8 volt power supply board.

    Total $45 USD plus shipping.

    I suspect I will learn quite a bit by just getting this to boot properly. But one really has to go back and forth with reading and exploring to learn any chip fully.

    Apparently the impedance of the device is higher than an oscilloscope probe, so it is optimal for observation of digital data in an actual circuit. (think test equipment)
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-02-03 21:13
    So here is a GA144 report.
    Nothing seems to get things into focus so much as buyer's remorse. I went ahead and ordered the Schmartboard GA144 kit and a power supply, and I am awaiting delivery. In the meantime, I have been really reading and trying to find out how to make this work.

    First observation is that this is a Forth machine and always will be a Forth machine. I doubt that programming in C will evolve soon as it is a DIY project. Green Arrays provides two alternatives -

    ArrayForth that works as an IDE and then loads the GA144 to an actual, active configuration. This is the one that is most likely to work with the Schmartboard GA144 kit as no SRAM is required, just a serial interface (seems to prefer a USB to RS232 at 300k baud).

    eForth that actually has a binary image available and two nodes dedicated to its use via custom ROM in the F18A processor. This requires a 16-bit wide, 55 nanosecond, 1Mb SRAM attached to boot the GA144 and makes the whole device into an interactive Forth machine via a USB to RS232 link. I'd love to do this, but it is now obvious that to do so will require the design and construction of a custom board ... maybe a 4-layer board. And I don't have software to create artwork for a 4-layer board at this time. Plus the SRAM is a 48-ball grid array and I have never soldered one of these.

    Furthermore, the GA144 may have 144 processors on board, but it really has only 15 GPIO i/o pins. For a device that seems to offer a heck of a lot of processing, 46 of the i/o pins are consumed by providing an SRAM and many others are dedicated to various i/o protocols. So there are some real issues of how to fully exploit the i/o that are central to anything you want to do. 22 of the processors are dedicated to i/o and have specialized ROMs and special wiring to reach the outside world. You can either use what they offer or ignore them. I don't think that they will adapt to GPIO.

    Finally, everything is 1.8 volts, with over 1.1 V being high and under 0.7 V being low. If you want to reach the rest of the world at 3.3 volts or 5.0 volts, an ample supply of level shifting devices should be included.
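
As a quick illustration of those thresholds, here is a minimal sketch that classifies a pin voltage. The two limits are the ones quoted in the post; the function name, the constant names, and treating the band between 0.7 V and 1.1 V as "undefined" are assumptions made for this example, not something taken from the GreenArrays documentation.

```python
# Thresholds as quoted in the post (volts). Names are made up for this sketch.
V_HIGH_MIN = 1.1   # anything above this reads as logic high
V_LOW_MAX = 0.7    # anything below this reads as logic low


def logic_level(volts):
    """Classify a pin voltage as 'high', 'low', or 'undefined'.

    The 'undefined' band between the two thresholds is an assumption:
    the post only says what counts as high and what counts as low.
    """
    if volts > V_HIGH_MIN:
        return "high"
    if volts < V_LOW_MAX:
        return "low"
    return "undefined"


print(logic_level(1.8))   # high
print(logic_level(0.2))   # low
print(logic_level(0.9))   # undefined
```

Note that the function only says how a voltage would be read. It says nothing about whether a 3.3 V or 5 V signal is safe to apply, which is why the post calls for level shifters.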

    Goals for now?

    First,
    Try to get the ArrayForth loaded and working with a GA144 chip that will be operating in a rather barebones configuration -- One serial port to the PC that is running ArrayForth, and just a few i/o GPIO pins attached to a driver chip to output to real world with higher voltage.

    Second,
    Try to attach an SPI interfaced ROM that the ArrayForth may load with a boot program image and have the GA144 independently boot itself.

    Third,
    Try to interface to the SDcard and install an adapted eForth image (may require hacking into the binary without any provided source code) so that eForth would boot and have an area provided to expand its dictionary.

    Fourth,
    Attempt to build a custom board for SRAM and to install a provided eForth binary.

    After all that is said and done, one should pretty well be on their way to using the GA144 for anything one wants to do. Is it worth it? That depends on what your ambitions are, whether you need the high impedances of the GA144 chip and other electrical features, and what you really hope to learn.

    This is not rapid deployment in any sense --- more of an R&D project.

    This is a lot more effort than any of the versions of Forth on the Propeller and there is a nagging feeling that 144 processors may just be a bit of overkill for most users.
  • HumanoidoHumanoido Posts: 5,770
    edited 2013-02-04 08:04
    Being curious, I went ahead and ordered a GA144 kit from Schmartboards and a 1.8 volt power supply board. Total $45 USD plus shipping. I suspect I will learn quite a bit by just getting this to boot properly. But one really has to go back and forth with reading and exploring to learn any chip fully. Apparently the impedance of the device is higher than an oscilloscope probe, so it is optimal for observation of digital data in an actual circuit. (think test equipment)
    It sounds like you'll be very busy with a great learning experience. Just curious, what documentation did you order with this? Are these processors wired just one way without the flexibility of mixing HW interfacing? Multiple props can wire up any which way so they can also run at VLP. It's a good price at 32 cents for each processor if you can program each one to talk to each other in parallel. You'll need to let us know how this pans out. Also add to your list of things to do - interface a Propeller chip to the collective. Do they cite running languages other than Forth? The Prop now has over 246 languages and variations (recently updated), many of which are ideal for retro experiments and capping earlier programs.

    http://humanoidolabs.blogspot.tw/2012/03/ultimate-list-of-big-brain-languages.html
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-02-04 23:58
    @ Humanoido
    It is a bit difficult to reply to your questions as you seem quite unaware of what GreenArrays provides, what Schmartboard provides, or what is the Forth context of the GA144 chip.

    A. All documentation and support software is available through GreenArrays with the exception of the GA144 eForth binaries that have to be downloaded elsewhere. In other words, I did NOT order any documentation; I have downloaded everything I can find for free.

    B. Schmartboard provides a breakout board and one GA144 chip. The rest of the wiring is up to you. I have purposely avoided the rather expensive (about $450USD) demonstration board to see what a DIYer can do on a minimal budget. So far, excluding shipping (which was $41USD to Taiwan) the parts have cost a total of $35 USD.

    But it does seem that I will need at least one FTDI 232R, which Sparkfun sells for $18 in a breakout board and which will configure to a 1.8V setup... probably 2 for a total of $36. Alternatively, I am considering a MOSFET RS232 interface and may use slower baud rates to start.

    And, to get to a really good development platform, I need to add 1M x 16b of parallel SRAM, which costs $18USD apiece and may require ordering a custom board to get working.

    So, it is all creeping up towards $100USD in components and with a board it may reach as high as $200.

    Is it worth it? I dunno, but I enjoy trailblazing.

    My overall impression so far is that The Propeller is a much saner Forth machine with either PropForth or Tachyon. And if the Propeller2 is supported in Forth, that will be even better.

    David Hein feels Forth is really not as good as C and he has spent months with his version of Forth, called pfth, on the Propeller. I am inclined to agree with him in most situations.

    The GA144 may have 144 processors, but not as much available i/o as the Propeller and the i/o that is provided cannot be reconfigured... so many pins may sit idle.

    Multiple GA144 schemes are possible via an extremely high speed serial link called SERDES, but one is more than enough for me at this point.
  • HumanoidoHumanoido Posts: 5,770
    edited 2013-02-05 18:29
    Of course we're aware of what GA etc. has provided over the years and have reviewed the current assay of technical information, and this is what raises the question, which is more of a point than otherwise. For current projects here, the Propeller in multiples appears to have more interesting options for flexible chip-to-chip and processor-to-processor interfacing and for no-cost SW/HW solutions. However, it could take years before someone programs GA chip processors for AI work to find other solutions. It's currently unknown if 100 of these chips (14,400 cores) with 2K threads per core (28,800,000 threads) could push a neural machine density of nearly 29 million.
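
The figures in that last sentence can be checked with a few lines of arithmetic. This is only a sketch of the multiplication: "2K" is read here as 2,000 threads per core, which is the reading that reproduces the 28,800,000 total, and whether an F18A core could actually host that many threads is not something this calculation addresses.

```python
CORES_PER_CHIP = 144       # one GA144
chips = 100                # number of chips proposed in the post
threads_per_core = 2000    # "2K" read as 2,000 (assumption)

cores = chips * CORES_PER_CHIP
threads = cores * threads_per_core

print(cores)     # 14400
print(threads)   # 28800000
```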
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-02-05 23:59
    @Heater
    I have one and only one neural network that I deal with and it is lodged between my ears. The rest are beyond me.

    On the other hand, Forth is a contrasting computer paradigm and that seems to provide a lot of insight into what does what.

    Historically, the GA144 seems to be Charles Moore's legacy creation and that may be all it ever will be. I tend to believe that Forth stack machines are best in software to allow one to explore new processors.

    So far, I have no idea of the merits of an 18 x 8 array. I am still waiting for my GA144 to arrive. If you or anyone else wants to play with one, the ArrayForth can be loaded via DOS or Windows and emulate the GA144. No need to spend any $$$ or to wire up a device.
  • Heater.Heater. Posts: 21,230
    edited 2013-02-06 00:14
    Loopy
    @Heater
    What? I didn't say anything.

    I don't think Forth is any kind of "contrasting computer paradigm", that kind of talk makes it sound very grand but it's just another language and run time system. Albeit with some possibly unique properties in simplicity and ease of self "bootstrapping".
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-02-06 00:48
    @Heater
    Oops.. I meant @ Humanoido.

    Regarding the 'contrasting computer paradigm', a conventional CPU works with RAM-based storage that can hold anything, often with FIFO buffers, but Forth works with at least two stacks that exploit a LIFO arrangement for local variables along with a lot of resources dedicated to a threaded dictionary.
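
To make the two-stack idea concrete, here is a toy sketch in Python of a Forth-style evaluator. It is not ArrayForth or eForth and does not model the F18A; the `run` function, the handful of primitives, and the dictionary layout (a plain mapping from word name to token list) are all simplifications invented for this example. The data stack holds operands, and the return stack remembers where to resume when a dictionary word finishes.

```python
def run(source, dictionary=None):
    """Evaluate whitespace-separated tokens in a Forth-like way."""
    data = []                        # data stack: operands and results (LIFO)
    ret = []                         # return stack: where to resume after a word
    words = dict(dictionary or {})   # toy dictionary: name -> list of tokens
    prims = {
        "+": lambda: data.append(data.pop() + data.pop()),
        "*": lambda: data.append(data.pop() * data.pop()),
        "dup": lambda: data.append(data[-1]),
        "swap": lambda: data.extend([data.pop(), data.pop()]),
    }
    tokens, ip = source.split(), 0
    while True:
        if ip == len(tokens):
            if not ret:              # nothing left to resume: finished
                return data
            tokens, ip = ret.pop()   # return from a dictionary word
            continue
        tok = tokens[ip]
        ip += 1
        if tok in prims:
            prims[tok]()
        elif tok in words:
            ret.append((tokens, ip))         # remember the caller's position
            tokens, ip = words[tok], 0       # jump into the word's body
        else:
            data.append(int(tok))            # anything else is a number


defs = {"square": ["dup", "*"], "cube": ["dup", "square", "*"]}
print(run("3 square 4 square +", defs))   # [25]
print(run("2 cube", defs))                # [8]
```

Note that nothing on the data stack has a name, which is the point Heater picks up on in the next post.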

    Plus, there is a tendency toward a division between a predominantly compiled language versus a predominantly interpretive language.
  • Heater.Heater. Posts: 21,230
    edited 2013-02-06 01:34
    Loopy,
    Forth works with at least two stacks that exploit a LIFO arrangement for local variables along with a lot of resources dedicated to a threaded dictionary.
    Exactly, so Forth is a very run of the mill computing model. Nothing paradigm shifting there.

    As far as I'm concerned, from a language syntax and semantics point of view, Forth has no local variables. It only has anonymous spaces on the stack which are used for parameters/local work space. Hardly worthy of the name "variable". To me a variable at least needs a name.

    If you look hard enough Forth also has named variables.

    Compared to regular programming languages, ALGOL, FORTRAN, PL/M, C, Pascal, Modula, Spin, C++, Java etc etc etc I would see a "contrasting paradigm" as something like Lisp or my current favorite toy JavaScript. In JS we have weird stuff like first class functions and closures that gives you a whole different way to think about programming.
  • prof_brainoprof_braino Posts: 4,313
    edited 2013-02-06 06:13
    User Name wrote: »
    There are all sorts of real-world problems that can exploit massively parallel architecture...

    This popped up at the top of the screen, and got me thinking. Just out of curiosity, can anybody rattle off a couple of real world problems that can exploit a massively parallel architecture? In the context of microcontrollers? Of course we can just cite applications that are done on CRAY or other giant supercomputers, but if a problem needs a CRAY, it probably won't work so well on a micro controller. What would be appropriate for this part?

    It seems that since form follows function, and there is no function defined, it's difficult to see how to apply this part. Defining an application might help clarify how to proceed.
  • Heater.Heater. Posts: 21,230
    edited 2013-02-06 07:30
    Braino,

    Good question.

    Years back I was involved in a project to scramble speech over the radio. Specifically the scrambler had to be very small and low power and fit some free space in existing handheld radios. What it did was to convert the analog speech signal to digital, take a Fourier transform of that, scramble all the samples of the transformed speech according to some cypher key, take the reverse Fourier transform of that frequency shifted mess, convert it back to analog and then send back to the radio.

    All of which would have worked much better if we had some small, low power, "massively parallel" chip in there to do the signal processing.

    Given that speech recognition is desirable for some, perhaps a "massively parallel" chip could help with that task on your Prop powered robots.
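
The scrambler pipeline described above (transform, shuffle the frequency bins by key, inverse transform) can be sketched as follows. This is an illustration only, not the original project's code: the naive `dft`, the `bin_order` helper, and the use of Python's `random` module to derive the permutation from the key are all assumptions made for this example, and a seeded shuffle is not a real cipher.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def bin_order(n, key):
    """Key-dependent permutation of the positive-frequency bins 1 .. n/2 - 1.

    Stand-in for the real cipher: a seeded shuffle, NOT cryptographically secure.
    """
    order = list(range(1, n // 2))
    random.Random(key).shuffle(order)
    return order


def scramble(samples, key, undo=False):
    """Permute frequency bins of a real signal; undo=True reverses it."""
    n = len(samples)
    spec = dft(samples)
    out = list(spec)                   # DC (and Nyquist) bins stay in place
    for dst, src in zip(range(1, n // 2), bin_order(n, key)):
        if undo:
            dst, src = src, dst
        out[dst] = spec[src]
        out[n - dst] = spec[n - src]   # move the mirror bin too, so output stays real
    return [v.real for v in dft(out, inverse=True)]


speech = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.3, -0.6]
sent = scramble(speech, key=42)
received = scramble(sent, key=42, undo=True)
print(all(abs(a - b) < 1e-9 for a, b in zip(speech, received)))   # True
```

Every output bin of the transform is an independent sum over the input samples, which is why this kind of job looks attractive for a chip with many small cores.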
  • TorTor Posts: 2,010
    edited 2013-02-06 07:58
    This popped up at the top of the screen, and got me thinking. Just out of curiosity, can anybody rattle off a couple of real world problems that can exploit a massively parallel architecture? In the context of microcontrollers?
    To me it looks like the problem is that those types of processing problems which could exploit a massively parallel architecture also need to access equally massive amounts of data, in parallel, to be effective. E.g. weather forecasts, and other problems involving cells of data. The GA144 is, as I understand it, very limited in its access to external data. Heater mentioned another application with less data, but (mind, I'm no expert on that), I thought Digital Signal Processors (DSPs) could handle those kind of things years back.
  • Heater.Heater. Posts: 21,230
    edited 2013-02-06 08:19
    Yes, DSP chips could handle that years back. Worked for us but more would have been better:)
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-02-06 08:48
    Tor wrote: »
    To me it looks like the problem is that those types of processing problems which could exploit a massively parallel architecture also need to access equally massive amounts of data, in parallel, to be effective. E.g. weather forecasts, and other problems involving cells of data. The GA144 is, as I understand it, very limited in its access to external data. Heater mentioned another application with less data, but (mind, I'm no expert on that), I thought Digital Signal Processors (DSPs) could handle those kind of things years back.

    Yes, I find that nearly all my micro-controller projects are lean on data --- so all this processing power is rather useless to me.
  • Dave HeinDave Hein Posts: 6,347
    edited 2013-02-06 09:06
    You could implement a JPEG decoder, where each processor works on an 8x8 chunk of the image. However, I don't think the F18A processor core is very well suited for that task, and it would be a pain to try to program a JPEG decoder in FORTH for that chip.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-02-06 13:01
    David,

    from my cursory read of the F18A docs, I don't believe there would be enough code/data storage available in each core to handle decoding an 8x8 block.
  • Dave HeinDave Hein Posts: 6,347
    edited 2013-02-06 13:11
    David,

    from my cursory read of the F18A docs, I don't believe there would be enough code/data storage available in each core to handle decoding an 8x8 block.
    Agreed. I can't really think of anything useful to do with the F18A, or even 144 of them. It might be more interesting to play around with the J1 Forth CPU. It's a Forth processor implemented in an FPGA.
  • Heater.Heater. Posts: 21,230
    edited 2013-02-06 13:40
    It's not all about huge amounts of data.
    Some things just take a long while to compute on whatever data set you have, like audio spectral analysis, the DFT.
    Perhaps you have an algorithm to speed that up a lot, like the FFT.
    Perhaps you don't. Then divide and conquer might help. Lots of processors.
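
A rough back-of-envelope count shows the trade-off between a better algorithm and more processors. The sketch below counts only complex multiplications, using the textbook figures of N² for a direct DFT and (N/2)·log2(N) for a radix-2 FFT. The block size of 1024 is an arbitrary choice for illustration, and the "perfect split" across 144 cores ignores communication and memory limits entirely.

```python
def dft_multiplies(n):
    """Complex multiplications for a direct N-point DFT."""
    return n * n


def fft_multiplies(n):
    """Complex multiplications for a radix-2 FFT (n must be a power of two)."""
    stages = n.bit_length() - 1      # log2(n) for a power of two
    return (n // 2) * stages


n = 1024            # illustrative block size (assumption)
processors = 144    # one GA144

# Ceiling division: the busiest core's share of a perfectly split DFT.
per_core = -(-dft_multiplies(n) // processors)

print(dft_multiplies(n))   # 1048576
print(per_core)            # 7282
print(fft_multiplies(n))   # 5120
```

Under these assumptions the FFT on a single processor still needs fewer multiplications than each of 144 cores sharing the direct DFT, so spreading the work out pays off mainly when no such shortcut exists.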
  • HumanoidoHumanoido Posts: 5,770
    edited 2013-02-06 17:33
    This popped up at the top of the screen, and got me thinking. Just out of curiosity, can anybody rattle off a couple of real world problems that can exploit a massively parallel architecture? In the context of microcontrollers? Of course we can just cite applications that are done on CRAY or other giant supercomputers, but if a problem needs a CRAY, it probably won't work so well on a micro controller. What would be appropriate for this part? It seems that since form follows function, and there is no function defined, it's difficult to see how to apply this part. Defining an application might help clarify how to proceed.
    Let's say you have a high number of massively parallel processors all running at the same time. The app is a mathematical expression ME that you want to solve quickly. ME can come from a particular app. An algorithm is chosen or developed to break down the expression to be processed by each processor. When all processors have completed their assignments, they report in their data, which is assembled into an answer. These "thinking" machines are ideal for real time analysis, vision systems, mapping, forecasting, neural nets and additional ways to develop machine intelligence.
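
The break-down, compute, and assemble pattern described above can be sketched with Python's standard `concurrent.futures` module. The "mathematical expression" here is just a sum of squares, chosen because it splits cleanly, and the function names and worker count are invented for the example. Python threads will not actually speed up this kind of arithmetic, so the sketch shows the structure of the approach rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Work assigned to one processor: sum k*k over a slice of the range."""
    lo, hi = bounds
    return sum(k * k for k in range(lo, hi))


def parallel_sum_of_squares(n, workers=4):
    """Split 0..n-1 into chunks, compute each separately, combine the results."""
    step = max(1, -(-n // workers))                 # ceiling division, at least 1
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))   # "report in" and assemble


print(parallel_sum_of_squares(1000) == sum(k * k for k in range(1000)))   # True
```

The pattern works here because each chunk needs nothing from the others. Problems where the pieces must exchange data as they go are where the limited links between cores start to matter.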
  • HumanoidoHumanoido Posts: 5,770
    edited 2013-02-06 17:42
    Large amounts of online data storage may not be an issue when portions of data can be dumped during an analytical process and handled in a customary fashion with a data logger. In other designed algorithms, the "on the fly" data result can be processed and deleted instead of saved. For example, in a mathematical sequence, the "next numerical expression" is calculated from previous data, and after processing is complete, the "use data" is deleted.
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-02-07 05:05
    Heater. wrote: »
    It's not all about huge amounts of data.
    Some things just take a long while to compute on whatever data set you have, like audio spectral analysis, the DFT.
    Perhaps you have an algorithm to speed that up a lot, like the FFT.
    Perhaps you don't. Then divide and conquer might help. Lots of processors.

    We kind of went through a big hurt about FFT in Forth a few months back. There were issues with the need for longer stacks for floating point. Be happy, the Propeller does FFT nicely in C and you can parallel more and more processors.

    ~~~~~~~~~~~~~~~

    The J1 architecture seems interesting. Soon I will be the proud owner of a GA144. Even if it is a dinosaur in the making, I will have a better grasp of what a Forth machine should be.

    ~~~~~~~~~~~~~~~
    The GA144 can support SDcards for data storage, so that is not an issue.

    But I am more interested in massive parallel processing of lean data tasks. Lean data tasks just seem more achievable for a hobbyist. And if I wanted to do the other, I might use a PC and have a Nvidia card's parallel processor hacked to do more, faster than the GA144.

    So far, it seems that the GA144 is just appropriate to build a good interactive Forth from a resident eForth if it has the SRAM installed on its parallel ports. But who needs another Forth demo machine that doesn't connect to any real world control?
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-02-07 05:16
    David,

    from my cursory read of the F18A docs, I don't believe there would be enough code/data storage available in each core to handle decoding an 8x8 block.

    Just curious... Do you think two processors could handle the 8x8 block in a good manner? That would knock down the available power by 50%, but it still might be useful.

    So far I am suspecting that clusters of 2, 3, or 4 processors might be required for quite a few things. eForth has an engine in the GA that has a dedicated pair of processors and appears to actually occupy a cluster of 5.
  • Heater.Heater. Posts: 21,230
    edited 2013-02-07 05:17
    Loopy,

    I haven't given up the Forth FFT yet. It won't be using floating point for sure.

    I am very happy that heater_fft in C works nicely on the Propeller.
    Even happier that the propgcc fcache mechanism compiles most of it to native COG code that is loaded to COG and runs nearly as fast as my hand made PASM version!

    There is now also a parallel heater_fft which can use 1, 2, or 4 cogs for a bit of a boost. It uses OpenMP in C so propgcc takes care of starting up the cogs and managing the parallelism.
    For some reason it's not quite working correctly yet. Not sure yet if that is a bug in my FFT or OpenMP under propgcc.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-02-07 07:27
    Thanks for the link Dave - the J1 looks very interesting!

    I even have one of the GameDuino boards to try it on...
    Dave Hein wrote: »
    Agreed. I can't really think of anything useful to do with the F18A, or even 144 of them. It might be more interesting to play around with the J1 Forth CPU. It's a Forth processor implemented in an FPGA.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-02-07 07:29
    I don't think two F18A cores would be enough.

    An 8x8 block with 24 bits per pixel needs 192 bytes just for the output.

    I think four to eight cores would be needed.
    Just curious... Do you think two processors could handle the 8x8 block in a good manner? That would knock down the available power by 50%, but it still might be useful.

    So far I am suspecting that clusters of 2, 3, or 4 processors might be required for quite a few things. eForth has an engine in the GA that has a dedicated pair of processors and appears to actually occupy a cluster of 5.
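
Bill's 192-byte figure, and what it means in F18A terms, can be worked through in a few lines. The 18-bit word size comes from the thread; the figure of 64 words of RAM per core is an assumption taken from memory of the F18A documentation and should be checked against it. The count ignores that code and working data must also fit in the same RAM.

```python
WORD_BITS = 18             # F18A word size
RAM_WORDS_PER_CORE = 64    # assumed RAM per core; verify against the F18A docs

block_bits = 8 * 8 * 24    # 8x8 pixels at 24 bits per pixel
block_bytes = block_bits // 8

# Ceiling divisions: whole words needed, then whole cores needed.
words_needed = -(-block_bits // WORD_BITS)
cores_for_output = -(-words_needed // RAM_WORDS_PER_CORE)

print(block_bytes)         # 192
print(words_needed)        # 86
print(cores_for_output)    # 2
```

Under these assumptions the decoded pixels alone overflow one core's RAM, before any code or coefficient storage is counted, which is consistent with the estimate of four to eight cores per block.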
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2013-02-07 08:09
    @Bill Henning
    Thanks, that really helps give me some idea of what clusters of processors might do.

    I downloaded the document that explains the J1 and am finding it very worthwhile reading. The GA144 may actually have a problem with 'grid-lock' as once you get 4 deep into the grid, it takes 3 or more processors to reach an i/o node for output.

    The Propeller avoids this issue by having the cog scheme and every processor can reach the outside in the same way.
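
The 'grid-lock' concern can be illustrated by counting hops from each node to the nearest edge of the array. This sketch assumes an 8-row by 18-column grid (the "18 x 8 array" mentioned earlier in the thread), that a node can only talk to its immediate neighbours, and that i/o is reachable only from edge nodes. The last point is a simplification, since not every edge node on the real chip has pins.

```python
ROWS, COLS = 8, 18   # array shape as described in the thread


def hops_to_edge(row, col):
    """Neighbour-to-neighbour hops from (row, col) to the nearest edge node."""
    return min(row, ROWS - 1 - row, col, COLS - 1 - col)


worst = max(hops_to_edge(r, c) for r in range(ROWS) for c in range(COLS))

print(hops_to_edge(0, 5))   # 0  (already on the edge)
print(hops_to_edge(3, 9))   # 3  (fourth row in, middle of the chip)
print(worst)                # 3
```

Under these assumptions, a node in the fourth row needs three hops, so three other processors are involved in relaying its data before it reaches an edge. That matches the "4 deep, 3 or more processors" estimate above.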
  • prof_brainoprof_braino Posts: 4,313
    edited 2013-02-07 08:43
    We kind of went through a big hurt about FFT in Forth a few months back.

    Big hurt? You mean by not simply using nglordi's FHT? Fixed point math out of the box, or floating point using the coprocessor. As I understand it, it's simple, concise, and pretty quick for what it's doing. Not that I know a lot about it, but I thought FHT was more suitable for the prop, but close enough to get the idea for FFT.

    The only "big hurt" was getting up to speed on using one language instead of another.