FPGA Forth machines

Heater. · 2015-07-24 18:31

Loopy,
Did you miss the part where I wrote "...but that is another story."? I.e. I really don't want to discuss it.
I was only trying to help your Forth researches by suggesting okular. Sadly that did not help. Ah well.

LoopyByteloose · 2015-07-25 12:09

While awaiting the arrival of my BeMicro FPGA goodies from Arrow Electronics (sometime next week, I suppose), I have been sifting through the Forth documents listed at Wulfden.

I just noticed that the "eForth and Zen" document therein is only the first chapter. So buying the complete document from C. H. Ting still seems justified.

There are other items of interest at Wulfden, including a discussion of what a 3rd stack might do to enhance Forth. Tachyon has a third stack. So some might want to review that.

So far... my vision of what to do remains sketchy. I will likely try to get the Propeller 1 Verilog (P1V) code to load and run, just to see what I don't understand about hardware and the software build and load cycle.

But the rest is up in the air.

Peter Jakacki · 2015-07-25 12:41

Tachyon in fact has four stacks!

1. Data stack - with top 4 in fixed locations in cog ram and overflow in hub. TOS is a fixed address as are 2nd 3rd and 4th items.

2. Return stack in cog ram which is exclusively for return addresses - not so easy to crash it by leaving loop junk on it.

3. Loop stack in cog ram which holds the loop index and limits so that these are directly accessible by other words outside the loop.

4. Branch stack in cog ram with the address to branch back to in a DO LOOP or FOR NEXT. No need to read another branch address and calculate, it's already there! Also DO and FOR do not have runtime versions such as (DO) and (FOR), they are simple opcodes that simply push the loop parameters and the branch address to loop back to. Likewise LOOP and NEXT are simply single byte opcodes compiled directly without special runtime versions.
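A minimal sketch, in Python rather than PASM, of how a separate loop stack and branch stack might let DO and LOOP be plain opcodes. The opcode names, the list-based stacks and the `run` function are assumptions made for illustration and are not Tachyon's actual code.

```python
# Hypothetical model of a Forth with separate loop and branch stacks.
def run(program):
    data, loop, branch = [], [], []   # a return stack is omitted; no calls here
    out = []
    pc = 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op == "DO":                    # ( limit start -- )
            start = data.pop()
            limit = data.pop()
            loop.append([start, limit])   # index and limit go on the loop stack
            branch.append(pc)             # remember where the loop body begins
        elif op == "LOOP":
            loop[-1][0] += 1
            if loop[-1][0] < loop[-1][1]:
                pc = branch[-1]           # target is already on the branch stack
            else:
                loop.pop()                # loop finished: discard its parameters
                branch.pop()
        elif op == "I":
            data.append(loop[-1][0])      # index is readable without touching returns
        elif op == ".":
            out.append(data.pop())
        else:
            data.append(op)               # anything else is a literal
    return out

print(run([3, 0, "DO", "I", ".", "LOOP"]))
```

Because the loop parameters never sit on the return stack, a word called from inside the loop could still read `I`, which is the point made in item 3.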

Now that will give you something to think about Loopy!

LoopyByteloose · 2015-07-25 17:12

The article at Wulfden mentions the third stack is for 'Local Variables'. So it is very unlike your scheme.

To date, I have done my best to stick with 'plain vanilla' two stack Forth. But the addition of other stacks appears to allow one avenue of optimization of Forth.

I just don't understand enough of Forth to fully embrace the nuances of optimizing performance.

eForth as an ideal goal may very well NOT be the optimal performer. It simply allows the smallest instruction set in machine code, about 32 or fewer.

I am looking at FPGA Forth examples and so far the J1 FPGA Forth exceeds the 32 at something like 39. And another FPGA Forth example, the FC16 suggested above (not B16 mentioned in the J1 intro) seems to have 64 machine codes.

Is it imperative that I hang onto eForth's minimum? Not at all. I suspect that if the FC16 offers more fundamental Forth commands executed in one clock, it will be more worthwhile to adapt the eForth dictionary to a basic group of more assembly language instructions.
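To illustrate the trade-off between a small primitive set and a larger one, here is a minimal Python sketch of a dictionary in which a few primitives stand in for machine instructions and everything else is defined in terms of them. The word names `DOUBLE`, `SQUARE` and `QUAD` and the `interpret` function are hypothetical; this is not eForth or FC16 code.

```python
# Primitives: stand-ins for the handful of words a Forth CPU executes directly.
PRIMITIVES = {
    "DUP":  lambda s: s.append(s[-1]),
    "DROP": lambda s: s.pop(),
    "SWAP": lambda s: s.extend([s.pop(), s.pop()]),
    "+":    lambda s: s.append(s.pop() + s.pop()),
    "*":    lambda s: s.append(s.pop() * s.pop()),
}

# Colon definitions: built only from words that already exist.
COLON = {
    "DOUBLE": ["DUP", "+"],
    "SQUARE": ["DUP", "*"],
    "QUAD":   ["DOUBLE", "DOUBLE"],
}

def execute(word, stack):
    if word in PRIMITIVES:
        PRIMITIVES[word](stack)          # one "machine instruction"
    elif word in COLON:
        for inner in COLON[word]:        # several steps through the dictionary
            execute(inner, stack)
    else:
        stack.append(int(word))          # anything else is treated as a number

def interpret(text):
    stack = []
    for word in text.split():
        execute(word, stack)
    return stack

print(interpret("3 SQUARE QUAD"))
```

Moving a word such as `DOUBLE` from `COLON` into `PRIMITIVES` would not change the result, only the number of steps needed to get there, which is roughly the choice between eForth's minimum and a richer instruction set.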

There are other issues of optimization. One can't just hang one's hat on the high performance of one feature and get the fastest Forth machine possible. Tachyon has an array of enhancements that squeeze as much speed as possible. It will always be a challenge to get something better.

For now, it is baby steps. If I can just get an FC16 Forth working with a near complete ANSI Forth dictionary in an FPGA, I can then move on to changing the 16 bit cell to 32 bit cells and attempt to integrate 8 Cog-like CPUs with limited Cog RAM to a hub, hub RAM and i/o sharing.

This is 'the vision' for now. I need to download Quartus II V15.0 to my Vista machine and see if it will work. Hopefully that will go smoothly... I haven't used Vista in ages. I have a nagging feeling that I should try to migrate to Windows 7 and also shift the old Chinese Vista 32 bit to English Windows 7 64 bit, as the machine is a Quad 64 bit.

Grumble.... I really thought that I'd never buy another MS product when I went over to Linux. But the FPGA devices are pushing me back into Windows.

LoopyByteloose · 2015-07-25 17:31

What a relief... Quartus II V15.00 appears to be available for Linux..... not just Windows.

mindrobots · 2015-07-25 17:33

Yes, so far, it has worked well on Linux (Fedora 22) for me.

Make sure you grab the latest update for A9 support.

LoopyByteloose · 2015-07-25 18:06

Okay, so I located the download of Quartus II for Linux at Altera, and had to create log in rights.

Typical of these companies, they demand a business name to proceed. What some may not know is that operating a business under one's own name is perfectly legal. So just put your name as you would on legal documents (like your driver's license) in the business name slot.

If you create another name, you can get tangled up in a fine for failure to register a 'fictitious business name' in some states, counties, and cities. But nobody can fine you for doing business as you.

It is 2 am here. So I guess I will install Quartus II tomorrow, then download the updates and install those.

Heater. · 2015-07-25 18:14

Loopy,
"What some may not know is that operating a business under one's own name is perfectly legal." Unless your name happens to be McDonald: https://en.wikipedia.org/wiki/McDonald's_legal_cases#MacDonald.27s_.28UK_-_Cayman_Islands.29
Whoever puts real info into those registration boxes?

LoopyByteloose · 2015-07-25 19:10

"Whoever puts real info into those registration boxes?"

At it again... digressing. I put my name into those boxes, plain and simple.

In the States many local governments require registration of any name used in business other than one's own just to open a bank account. If you want a company name, it may be wise to register one.

Heater. · 2015-07-25 19:56

Not digressing at all. Just responding to a comment on a forum. I thought that is what forums were for.
No doubt there are all kinds of laws about running a business all over the planet. However, as far as I know you are not using this download in business and have no business relation with the supplier, apart from the free download, which I can't see being regarded as a business transaction.

LoopyByteloose · 2015-07-26 01:03

Heater,
FPGA Forth machines......
Acquiring Quartus II from Altera

Simply put, one doesn't have to lie to get a download. But I am well aware in the scheme of things that some people feel they must or that is the wisest thing to do.

Some vendors do desire only business-to-business relationships, and actually don't want to be bothered with hobbyists or students and their small sales potential. Handle the problem as you prefer. You are entitled to your interpretation of the situation, and I am entitled to mine.

I just mentioned what I did to get past the required Business Name field. Actually I just appended 'Company' to my last name. I did the same as a General Contractor in the USA to avoid local business licensing harassment and it worked quite well. And in the scheme of things, I can always recall the company name I gave if required to do so later as 'my company' is involved in any lawful enterprise.

LoopyByteloose · 2015-07-26 05:39

I have located Verilog code for a dedicated UART module that will avoid tasking Forth Cogs with the serial i/o services.

The first is a Github repository
opencores.org/project,osdvu

The second explains their code in detail
http://www.fpga4fun.com/SerialInterface5.html

And here is a TX only solution.
http://ece301.com/fpga-projects/52-uart-txd.html

My first objective will be to get the FC16 Verilog code to use a Verilog UART in a single CPU. Later I will investigate how many UARTs might be required and which enhancements might be best.
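For reference, a minimal Python model of the bit sequence such a TX module shifts out, assuming the common 8N1 framing (one start bit, eight data bits sent least significant bit first, one stop bit). The linked Verilog modules may use different settings; the function names here are hypothetical.

```python
# Hypothetical software model of 8N1 UART framing, not taken from the linked cores.
def uart_frame(byte):
    """Line levels for one byte: start bit, 8 data bits LSB first, stop bit."""
    bits = [0]                                    # start bit pulls the idle-high line low
    bits += [(byte >> i) & 1 for i in range(8)]   # data bits, least significant first
    bits.append(1)                                # stop bit returns the line high
    return bits

def uart_decode(bits):
    """Inverse of uart_frame: check the framing and rebuild the byte."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))

frame = uart_frame(ord("A"))
print(frame)
print(chr(uart_decode(frame)))
```

A model like this can serve as a reference when checking a Verilog UART in simulation, before a Forth interpreter is attached to it.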

++++++++++++
In theory, the FPGA may allow the Hub to share Forth CPU codes with a UART. To date we have three very different schemes of asynchronous serial i/o in Forth on a Propeller.

Tachyon has provided extreme speed via division of the Tx and Rx tasks into separate cogs.
Pfth has minimally deployed one Cog with Serial i/o, and though other Cogs have the UART code it remains dormant and a bit of a challenge to reach.
PropForth has a very integrated scheme that allows for each Cog to be reached from a terminal and/or have the UART in each Cog run independently.

So this raises the question of whether only one UART is appropriate, or whether there are instances where one per Cog might be optimal.

Initially, just having one to allow communication and testing of the Forth interpreter will be adequate.

Peter Jakacki · 2015-07-26 06:04

Serial I/O in terms of silicon is neither here nor there in my opinion, and a UART takes up very little logic. It's funny though that the P1 has video support for each cog when you really only need support for a single video output, and that output has to be fed from the hub anyway. Whereas multiple serial I/O is used all the time, yet there is no hardware support for it on the P1. So a Forth P1 could perhaps be modified to have this serial I/O support on each cog and video support through the hub.

BTW, although Tachyon has a dedicated cog for receive, it doesn't dedicate any for transmit; it just runs from the same cog that Tachyon is running from. This is also a reason why I favor higher speeds for serial, so that the Tachyon cog doesn't spend too much time bit-bashing. So 2M baud is much to be preferred over 115.2k, for instance, where the cog spends too much time sitting idle between bits. Sadly, people seem to use crippled terminals that don't support faster speeds.
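To put rough numbers on the bit-bashing point, here is a minimal Python sketch. It assumes an 80 MHz system clock (a common Propeller 1 setting, not stated above) and a 10-bit frame; both are assumptions, as are the function names.

```python
# Assumption: 80 MHz system clock, a common Propeller 1 configuration.
CLOCK_HZ = 80_000_000

def clocks_per_bit(baud, clock_hz=CLOCK_HZ):
    """System clocks that elapse during one serial bit time."""
    return clock_hz / baud

def frame_time_us(baud, bits_per_frame=10):
    """Microseconds a cog is tied up sending one character (assumed 10-bit frame)."""
    return bits_per_frame * 1_000_000 / baud

for baud in (115_200, 2_000_000):
    print(baud, round(clocks_per_bit(baud), 1), round(frame_time_us(baud), 2))
```

Under these assumptions a character occupies the transmitting cog for about 87 microseconds at 115.2k baud and 5 microseconds at 2M baud.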

LoopyByteloose · 2015-07-26 06:10

Oops,
Here is the RX to mate with the TX module mentioned above. This may inspire a higher speed full duplex scheme similar to what Tachyon has created.

http://ece301.com/fpga-projects/57-uart-rxd.html

I can appreciate that 2Mbaud gets the job done without a lot of wait states, but there are a lot of legacy devices that just need to go much slower. For instance, my HC-06 Bluetooth module won't work right beyond 115,200baud.

jmg · 2015-07-26 21:02

"I can appreciate that 2Mbaud gets the job done without a lot of wait states, but there are a lot of legacy devices that just need to go much slower. For instance, my HC-06 Bluetooth module won't work right beyond 115,200baud."

Yup, that's why Serial solutions, be they SW or HW, should have the widest dynamic range. Also good to have is Baud granularity, which allows higher baud speeds without constraining the Clock.
Most USB UARTS have Virtual Baud Clocks of 12MHz or 24MHz (Baud = VBC/N)
Avoid the ancient /16 limit of Baud rate.
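A minimal Python sketch of the Baud = VBC/N arithmetic. It assumes the "/16 limit" refers to the classic UART arrangement in which the divided clock must be 16 times the baud rate; the `prescale` parameter and the function name are hypothetical.

```python
def nearest_divisor(vbc_hz, target_baud, prescale=1):
    """Pick the integer N closest to the ideal divisor and report the error.

    prescale=1 models Baud = VBC / N; prescale=16 models the assumed
    older scheme, Baud = clock / (16 * N).
    """
    n = max(1, round(vbc_hz / (prescale * target_baud)))
    actual = vbc_hz / (prescale * n)
    error_pct = 100.0 * (actual - target_baud) / target_baud
    return n, actual, error_pct

for target in (9_600, 115_200, 921_600, 2_000_000):
    n, actual, err = nearest_divisor(12_000_000, target)
    print(f"{target:>9} baud: N={n:<5} actual={actual:>11.1f} error={err:+.2f}%")

# Same 12 MHz clock, but with a fixed divide-by-16 in front of N.
print(nearest_divisor(12_000_000, 115_200, prescale=16))
```

With a 12 MHz virtual baud clock, 115,200 baud lands within about 0.2 percent, while the same clock with a fixed divide-by-16 misses it by roughly 7 percent.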

LoopyByteloose · 2015-07-27 10:22

Ugh... Quartus II V15.0 for Linux downloaded. But it is only a 64 bit version.

I confess... my 64bit machine has been using 32bit for ages as it seemed that Linux drivers were not quite ready for 64 bit way back then.

So it seems like I have a few systems administration tasks to get Quartus II running. Not much to discuss while I do so.

+++++++++++
My own personal bias has been toward tradition in Linux as it offers new users the most available literature on the web. That mainly means GForth and two stack architecture.

But that doesn't mean that I entirely ignore Tachyon. I spent the past few days reviewing much of what it does and Tachyon is a very feature rich Forth. In many ways, it might be a logical next step for a BasicStamp 2 user that really needs something more.

Originally, I was quite excited about C on the Propeller. But it seems obvious that the C++ and GCC deployments have gotten into convoluted code that is difficult to port to another architecture. It is almost as if the industry wanted to break the portability of C with OOP and other exotic features... just to protect their code. Even the HAL (Hardware Abstraction Layer) seems to be abused to me.

So, I really favor Forth for a next step beyond Basic programming. It allows one to easily explore the Propeller architecture in an interactive fashion. Which Forth is up to the user's comfort.

Heater. · 2015-07-27 10:48

The non portability of C/C++ has nothing to do with OOP or other exotic features, whatever that may mean. GCC and Clang and other compilers are very standards compliant whilst running on and generating code for a wide range of processor architectures and platforms.
It's quite amazing really. For example I can compile the same program source for my PC or the Pi or the Propeller very easily. I was amazed when a program using OpenMP that could parallelize threads and distribute them over multiple processors automatically just worked on the Propeller.
The non-portability comes from the fact that micro-controller systems have severe memory limitations, think Propeller, Arduino and many others. That limits the possible standards compliance of the standard library functions and other features. Especially for C++. They also have very different hardware interfaces that have to be catered for.
No need to speculate any industry conspiracy theory here. It's actually good for them to have standards compliant tools.
Now, I always thought C on the Propeller was more of a curio than a real tool. For the following reasons:
1) To compile code to run at full native speed (20MIPS) one is limited to the 512 instruction space of the COG.
2) To compile bigger programs into HUB memory requires that code to be executed by a VM. The LMM loop. That brings performance down massively.
3) The 32 bit size of all Propeller instructions means that not much code can be fit into HUB RAM.
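The three limits above come down to simple arithmetic. A minimal Python sketch, assuming the Propeller 1's 32 KB of hub RAM and ignoring the cog registers and any data that must share the same space:

```python
# Figures for the Propeller 1; the 32 KB hub size is not stated in the list above.
COG_LONGS = 512             # 32-bit longs per cog, shared by code and registers
HUB_BYTES = 32 * 1024       # hub RAM
BYTES_PER_INSTRUCTION = 4   # every Propeller instruction is 32 bits

cog_bytes = COG_LONGS * BYTES_PER_INSTRUCTION
hub_instructions = HUB_BYTES // BYTES_PER_INSTRUCTION

print(cog_bytes)          # bytes available to native code in one cog
print(hub_instructions)   # ceiling on instructions if hub RAM held only code
```

So native code tops out at 512 instructions per cog, and even an LMM program that filled the whole hub could hold no more than 8192 instructions, before any room is set aside for data or stack.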
All in all it's better to use the speed and small size of PASM in COG together with the small code size of Spin byte codes. As was the design intention of the Prop.
I love to play with C on the Prop, I just have doubts about its usefulness as a real tool.
I always imagined the exercise of getting GCC generating code for the Propeller was just a prelude, a stepping stone, to providing a C compiler for the P II. With the much larger memory space of the PII and its direct execution of code from HUB, not to mention much higher speed anyway, C should be a lot more practical.

LoopyByteloose · 2015-07-27 17:08

I suppose Heater is correct, the Propeller 1 is too small for what C++ and GCC are today.

I am more uncertain that it was always that way.

As far as OOP, it does work in SPIN and should be a separate issue.

My main preference for Forth is that it allows one to interactively explore the Propeller 1 architecture. I also happen to have enough hub RAM for Forth, which seems more comfortable with 32 KB of hub RAM.

++++++++++
Have gotten my Quad machine migrated over to Debian 8.1 64 bit, so I can get started with installing Quartus II. Others have only mentioned installation in Fedora. So I will explore.

Heater. · 2015-07-27 17:29

Well, except the Arduino is similarly memory constrained. It is exclusively programmed in C++ though.
You are right, C originated in a world of very small memory space machines. It handles it well. As does C++.
Spin is not an OOP language. Not in the modern sense of languages like Simula, C++, Eiffel, C#, Java etc. Spin has no inheritance or polymorphism and so on.
It is perhaps more like OOP as originally conceived but lacks the message passing concepts of the original OOP idea. As do the C++, Eiffel, C# and Java we have as examples of OOP today.
Forth does not exist. All implementations of "Forth" are different and incompatible. Do correct me if I am wrong. The same as BASIC does not exist today. No standards to adhere to.
Good to hear you are up to speed with Debian. Quartus runs on my Debian machines very well.

LoopyByteloose · 2015-07-28 07:58

Okay, Heater says Forth and Basic do not exist.

And so, my goal in this thread appears to be purposeless.

And Arduino manages to use C++ on memory constraints similar to Propeller, but the Propeller is too small for C while the Arduino is not. (Heater may have overlooked that the Cog has 512 longs or 2Kbyte. I don't think the Arduinos go down to RAM as low as 2Kbyte.)

+++++++++++

In any event, I am moving on with trying to create something on an FPGA related to the goals in my first entry in this thread.

jmg · 2015-07-28 08:26

"And Arduino manages to use C++ on memory constraints similar to Propeller, but the Propeller is too small for C while the Arduino is not. (Heater may have overlooked that the Cog has 512 longs or 2Kbyte. I don't think the Arduinos go down to RAM as low as 2Kbyte.)"

You seem to have overlooked that the Code Size of Prop, is 512 opcodes, and uses that COG RAM, whilst the code size of an average AVR is 16384 opcodes, and does not run in RAM, but in FLASH.

LoopyByteloose · 2015-07-28 08:36

In all honesty, I have very little to do with the Arduino, but your point is well taken.

At this point, I just want to try to create a Forth device on FPGA and then see what Propeller-like attributes I might add to it. I am uncertain what else I might discover.

In doing so, I hope to learn more about Forth, and more about the Propeller architecture. At least for me, this is all about learning via trying to create something -- not debating what is a true language and so on.

I still feel that Forth in the Propeller 1 can be helpful to those that don't understand Spin and PASM. Studying Forth that attempts to comply with the 1983 standard ( that is what Dave Hein did) and how it is created in PASM can be a rewarding entry point to SPIN and PASM if one has difficulty with directly jumping from BASIC to SPIN and PASM.

This project is an extension of that.

My approach at this point is
[a] to try to load and run the Propeller 1 Verilog (P1V) code in the BeMicroCV in order to understand and verify that I can clone a Propeller 1 on FPGA, and
[b] to also try to create, load, and run an FPGA Forth model (likely the FC16) on a BeMicroCV or BeMicroCVA9 with a complete eForth dictionary.

After I get the FC16 running, I am going to have to figure out what I need to do to migrate from a 16 bit Forth to a 32 bit Forth, with an eye towards how that will integrate with the modules already available in the Propeller 1 Verilog FPGA code.

jmg · 2015-07-28 09:10

"I still feel that Forth in the Propeller 1 can be helpful to those that don't understand Spin and PASM. Studying Forth that attempts to comply with the 1983 standard (that is what Dave Hein did) and how it is created in PASM can be a rewarding entry point to SPIN and PASM if one has difficulty with directly jumping from BASIC to SPIN and PASM."

PropBASIC seemed to me to be an ideal way to jump from BASIC to PASM.
Forth is a natural fit on the Prop, as it has such constrained Code space, yet can fetch byte-codes from a larger, but still limited memory. The main issue with Forth comes from its compactness -> it can be almost impenetrable to a new user. That makes me wonder if there is a better pathway, one that compiles a 'human language' to Forth?
That study would include Forth byte codes, and also the byte codes being used by Prop C.

Heater. · 2015-07-28 09:50

Loopy,
The Arduino compiles down to 8 bit instructions whereas every instruction on the Prop is 32 bits. It also executes its code from FLASH, leaving all the RAM free for data. Similar story with PICs and other micro controllers. I have seen people using C to create programs for little PICs that only have 256 bytes of RAM.
When I say "Forth" does not exist I only mean that all Forths are different and, as far as I can tell, code is not usable between different versions. I am aware there is actually a Forth standard (ANSI is it) but it seems to be getting very old and is largely ignored.
I think a Forth engine is probably a great idea if one wants to get into Verilog or VHDL. Should be a nice simple processor to tackle as a learning exercise. And no doubt useful when you have it done.

LoopyByteloose · 2015-07-28 09:54

PropBASIC may seem the ideal entry point, but the hazard is in 'ideal'.

Learners that are migrating toward PASM might see Forth as an informative and useful way to break their dependency on BASIC.

Forth may not be the only pathway for learning, but it offers one to those that are willing to learn it. Being interactive helped me confirm a lot about the Propeller very quickly.

The end goal is the same -- to master SPIN and PASM in combination.

Heater. · 2015-07-28 10:19

If I were setting out learning VHDL or Verilog with the intention of learning enough about CPU design to implement my own CPU I would probably want to create one with an instruction set for which there was a C compiler available. After all, creating a compiler is a lot of work and it seems pointless to expend effort on it for a one off CPU design.
Given that this design would have to be simple, that rather limits the architecture options. Perhaps an 8 bit 8080 for which BDS C is available (CP/M on an FPGA!). Perhaps the 32 bit ZPU for which there is a GCC target. The ZPU has a very small number of simple instructions.
But breaking free of C, moving to a CPU designed to run Forth primitives directly would mean that there need not be a pre-existing C compiler. The Forth environment can be built on top of those primitives.
Or at least that seems to be the possibility. As far as I know Forth is magical in that it is the simplest way to bootstrap such a system.
On the other hand I would be tempted to construct the smallest possible CPU, a single instruction machine. For example the Subleq, which only has the instruction "Subtract and Branch if Less or Equal". There is a compiler for a C like language for the Subleq available. Given that such a single instruction computer only has one opcode, there is no need to put the opcode into the instructions. In that way there is no need for instruction fetch and decode logic. Neat.
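A minimal Python sketch of such a machine. Each instruction is just three addresses a, b, c: subtract mem[a] from mem[b], then jump to c if the result is less than or equal to zero. Halting on a negative jump target and the `max_steps` guard are conventions assumed here for illustration.

```python
def subleq(mem, pc=0, max_steps=10_000):
    """Run a Subleq program in place; a negative jump target halts (assumed convention)."""
    steps = 0
    while pc >= 0 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]                       # the one and only operation
        pc = c if mem[b] <= 0 else pc + 3      # branch if less than or equal to zero
        steps += 1
    return mem

# Y = Y + X in three instructions, with Z as a scratch cell holding 0.
# Cells 0..8 hold the program; Z is cell 9, X is cell 10, Y is cell 11.
program = [
    10, 9, 3,     # Z = Z - X   (Z becomes -X)
    9, 11, 6,     # Y = Y - Z   (Y becomes Y + X)
    9, 9, -1,     # Z = Z - Z   (result 0, so jump to -1 and halt)
    0, 7, 5,      # Z, X, Y
]
print(subleq(program)[11])
```

Note that there is no opcode field anywhere in `program`, only addresses, which is the property described above.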

David Betz · 2015-07-28 11:54

"I always imagined the exercise of getting GCC generating code for the Propeller was just a prelude, a stepping stone, to providing a C compiler for the P II. With the much larger memory space of the PII and its direct execution of code from HUB, not to mention much higher speed anyway, C should be a lot more practical."

The PropGCC project had exactly that goal: start with the P1 and then move quickly to the P2. I think the P2 was always considered the real target. Unfortunately, the P2 never came...

David Betz · 2015-07-28 11:58

"Forth is a natural fit on the Prop, as it has such constrained Code space, yet can fetch byte-codes from a larger, but still limited memory. The main issue with Forth comes from its compactness -> it can be almost impenetrable to a new user. That makes me wonder if there is a better pathway, one that compiles a 'human language' to Forth?
That study would include Forth byte codes, and also the byte codes being used by Prop C."

The hope is that PropGCC on P2 will not need to use byte codes because of the much larger hub RAM and the ability to execute instructions directly from the hub. However, we still have to contend with the large size of Propeller instructions, every one is 32 bits. Actually, hub execution makes that worse by introducing instruction prefixes to provide 32 bit immediate values. It's too bad that it wasn't possible to include some sort of compressed instruction set in P2 similar to the ARM Thumb instruction set.

Heater. · 2015-07-28 12:18

David,
I have not been following the details of the PII development for a long time. So I'm wondering how HUB execution works out nowadays. I presume the PII will run code from HUB with no LMM loop. Does that mean that you can point a COG at code in HUB and it runs with nothing loaded into COG space at all? Presumably prop-gcc will be keeping its registers in COG though. That implies to me that there is the possibility of having a large register set for prop-gcc and that 400 odd constants could be loaded to COG on starting HUB execution. That would remove the need for so many in-line immediates.
Or am I totally muddled and this is not possible with the code generator anyway?

David Betz · 2015-07-28 12:49

"That implies to me that there is the possibility of having a large register set for prop-gcc and that 400 odd constants could be loaded to COG on starting HUB execution. That would remove the need for so many in-line immediates.
Or am I totally muddled and this is not possible with the code generator anyway?"

Yes, it would be possible to put constants in COG memory. I'm not sure how hard it would be to get the GCC tool chain to do that though.
