Tachyon Forth for P2 -FAT32+WIZnet- Now Smartpins - wOOt!

Peter Jakacki · 2015-11-18 14:36

FILESYSTEM UPGRADES

Just updated the filesystem to address files as virtual memory up to 4GB each, which also means that directories located outside the first 4GB can now be accessed correctly. Previously only the first 4GB was mapped as virtual memory, and that didn't play well with files beyond it, although they were always accessible by their sector numbers.

This new scheme works well up to 128GB because it encodes a directory address into 32 bits: directory entries are aligned to 32-byte boundaries, so the low 5 address bits are redundant. A compilation constant can also be changed to increase the number of file handles from the default of 4 open files at once.
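The 128GB figure follows from the arithmetic: 32 bits of handle times 32 bytes per entry gives 2^37 bytes. Here is a minimal Python sketch of that idea, not the actual Tachyon code; the function names are made up for illustration.

```python
DIR_ENTRY_SIZE = 32   # a FAT32 directory entry is 32 bytes
SHIFT = 5             # log2(32): the low 5 bits of an aligned address are always zero

def encode_dir_address(byte_address: int) -> int:
    """Pack a 32-byte-aligned byte address into a 32-bit value (hypothetical helper)."""
    if byte_address % DIR_ENTRY_SIZE:
        raise ValueError("directory entries must be 32-byte aligned")
    handle = byte_address >> SHIFT
    if handle >= 1 << 32:
        raise ValueError("address is beyond the reach of a 32-bit handle")
    return handle

def decode_dir_address(handle: int) -> int:
    """Recover the full byte address from the 32-bit value."""
    return handle << SHIFT

# Largest span a 32-bit handle can cover: 2**32 entries of 32 bytes each
MAX_REACH = (1 << 32) * DIR_ENTRY_SIZE   # 2**37 bytes = 128 GiB
```

A directory entry sitting just past the 5GB mark still fits, because only the top 32 of its 37 address bits need to be stored.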

Sadly though I also just updated to the latest FPGA image and for some reason it doesn't work anymore so I will have to track that one down and once I do I will update the files in the P2 folder.

From here I will test the W5500 Ethernet and networking layers although everything seems to compile correctly, just haven't had time to play with it. When I do I will try to leave it running for online HTTP/FTP and telnet access.

mindrobots · 2015-11-18 15:37

Um....WOW!

Quite the little Forth engine you got cooking! Let's see, on an A9 board with a few additions, that will be a 16COG, 1024KB HUB, 1xx MHz, WiFi connected self contained Forth Development system? And when that is moved to a real P2, it just loses some HUB...pretty sweet!!

Peter Jakacki · 2015-11-21 17:19

Finally fixed a problem with the W5500 chip which turned out to be a hardware/finger problem. I was using the secondary MISO output from my IoT5500 module, which has a series resistor, and this introduced about 50ns of extra delay. That could be fixed either by adding a nop in the read loop or by just connecting to the direct output, which I did.

So the network appears to be working well in conjunction with the FAT32 SD filesystem, all running out of one cog at present although I do have other cogs for serial receive and background timers etc. I will do some more testing tomorrow and then put my P2 online if anyone wants a go, including talking to the P2 in Forth on the Telnet serial port 10001. Telneters can even use the inline infix assembler to test code! Maybe I will have a go at grafting in Rayman's VGA code as well.

D.P · 2015-11-23 02:16

Peter Jakacki wrote: »

Finally fixed a problem with the W5500 chip which turned out to be a hardware/finger problem,
.
.
.

Could this problem also affect the P1 kernel?
Remember mjb could not get his <?fth> web engine to work with current P1 EASYNET build on the 5500 hardware.

Peter Jakacki · 2015-11-23 03:13

It was only because I was using the secondary miso output which is coupled through a resistor back to the same side as all the other SPI pins. The original 820io module had this signal on the other side of the module (why?) and just in case the IoT5500 was plugged in in place of a WIZ820io the worst that would happen would be that miso would be pulled up as on the 820 this pin was Vdd IIRC. But resistance does introduce a slight RC delay which at P2 speeds was causing a problem. A nop's delay would do but a direct connection fixed it.

Since I have a lot more memory to play with I will be testing various parts of the engine and eventually retrofitting many of these changes back to P1.

D.P · 2015-11-23 23:19

Any DE0 sightings TF2> , just trying to keep up now that the SD boot discussion is .HOT

MJB · 2015-11-23 23:33

D.P wrote: »

Could this problem also affect the P1 kernel?
Remember mjb could not get his <?fth> web engine to work with current P1 EASYNET build on the 5500 hardware.

I had the scriptable dynamic webserver running on Tachyon 2.3 on Spinneret, but then something in the EASYNET ?? or the W5x00 drivers changed and I did not get it to run on later versions.
I have not tried on 5200 or 5500 HW since I don't have the HW.
ESP8266 with NodeMCU/Lua was very easy to get running (very simple application) so I did that.
The dynamic Tachyon scriptable webserver would be much better of course.

Peter Jakacki · 2015-11-23 23:35

I eventually found my DE0 the other week and sorted out loading it up with the latest image but failed to get boo out of it. However I will give it another go but failing that I can just take a guess and mod the kernel to run on the DE0 if you could kindly test it. Give me a couple of hours and I will put a DE0 specific kernel across to the Tachyon/P2 Dropbox.

Peter Jakacki · 2015-11-23 23:40

MJB wrote: »

I have not tried on 5200 or 5500 HW since I don't have the HW.

Sorry, I will send you a few things including the IoT5500 module via regular airmail when I go to the post office today

(sometimes you just have to remind me!)

D.P · 2015-11-24 00:42

Peter Jakacki wrote: »

I eventually found my DE0 the other week and sorted out loading it up with the latest image but failed to get boo out of it. However I will give it another go but failing that I can just take a guess and mod the kernel to run on the DE0 if you could kindly test it. Give me a couple of hours and I will put a DE0 specific kernel across to the Tachyon/P2 Dropbox.

Acknowledged, I'll keep a look out, no rush.

mindrobots · 2015-11-24 01:25

Peter, if it's any help, on the de0, shrinking the Rcv buffer caused no issues, just put a wait in at the terminal like on P1. Where I got stuck was replacing the MUL/div/etc code for the single P2 instructions. I found your old multiply routine but couldn't find the div routines. Extend.fth gets in trouble because it starts multiplying and dividing things.

Then something shiny ran across my path and I got distracted!

D.P · 2015-11-25 06:23

mindrobots wrote: »

Peter, if it's any help, on the de0, shrinking the Rcv buffer caused no issues, just put a wait in at the terminal like on P1. Where I got stuck was replacing the MUL/div/etc code for the single P2 instructions. I found your old multiply routine but couldn't find the div routines. Extend.fth gets in trouble because it starts multiplying and dividing things.

Then something shiny ran across my path and I got distracted!

I've been working with the current kernel on the DE0. What sizes for:
rxsize = $E000
rxbuffers = $1_0000

did you test with on the DE0?

Peter Jakacki · 2015-11-25 06:44

The rxsize has to be a lot smaller and the rxbuffers positioned somewhere up the top of 32k memory. But it is also the cordic functions that I need to replace with the original versions I tested when I first ported. I've just been busy going over my SD card and EASYFILE software making improvements etc and general cleaning up and testing so that I can also port back to the P1 plus have a clean version to work with for a bootloader, unless Chip beats me to it (I hope). There were a lot of little things I never got around to tidying up and I've had to refresh my memory about the SD card protocol and FAT32 (yuck). I still have not seen any files that are fragmented in all the cards I have played with over the years, and believe me I would know if they were.

Peter Jakacki · 2015-11-25 10:40

BTW, since I have been going over the SD card and FAT routines here are some screenshots of reading data from the SD card. What gets me is how much time the card needs after it receives a read command until it has the data ready. You might think you could speed it up with faster clocks but it is not dependent on clock cycles, just time.

nglordi · 2016-05-03 22:48

Does the current P2 Tachyon version run on the 123-A9?

Nick

jmg · 2016-05-04 00:16

Peter Jakacki wrote: »

What gets me is how much time the card needs after it receives a read command until it has the data ready. You might think you could speed it up with faster clocks but it is not dependent on clock cycles, just time.

How many us is that ? Is it card-independent ?

I guess in normal SD use, that is tolerable, but it means they would be less than ideal if trying XIP design use

Peter Jakacki · 2016-05-04 00:59

I haven't tried it on the A9 yet. I don't have a 123-A9 board but I do have the CVA9, which Chip will do an image for once I (or Cluso99) get around to specifying a pinout. However the P2 version of Tachyon didn't use anything special so it should work fine as far as I know. Once I have my A9 up I will be testing smartpins and putting them to work.

@jmg - These are SanDisk but obviously SD cards are much faster if using multi-block transfers as the sectors can be accessed back to back without delay. Tachyon doesn't bother with any XIP methods as there is no need, even on P1 it manages to run all code from RAM without reliance on any kind of memory expansion or XMM techniques etc.

nglordi · 2016-05-04 21:05

@Peter:

I have attempted to run Tachyon TF2-pioneer on the 123-A9 board, running the latest P2 version, with an attached W25Q80 chip and prop plug connected to TeraTerm. It loaded without incident. I had set the clock to 80 MHz and baud to 232400. The terminal displayed the opening description, but did not respond to keyboard entries. Apparently, the rxcog was not functioning as expected.

Nick

Peter Jakacki · 2016-05-05 05:46

nglordi wrote: »

@Peter:

I have attempted to run Tachyon TF2-pioneer on the 123-A9 board, running the latest P2 version, with an attached W25Q80 chip and prop plug connected to TeraTerm. It loaded without incident. I had set the clock to 80 MHz and baud to 232400. The terminal displayed the opening description, but did not respond to keyboard entries. Apparently, the rxcog was not functioning as expected.

Nick

Sounds like it might be a change in one of the wait instructions but I will have a look at it later today. I will have to see if I can get Chip to do an image for the CVA9 as otherwise I've only got a DE2-115 (which used to be way more than good enough).

Peter Jakacki · 2016-06-17 08:05

The BeMicroCV-A9 is up and running and features 16 cogs at 80MHz with 1M RAM! I haven't done anything fancy with the kernel yet, I just want it to run, but here are some timings in cycles and what it would be at 160MHz, straight from the console:

TF2# ( fibonacci - iterative method - but skip test for n = 0 )  ok
TF2# pub fibo ( n -- f )
     1- 0 1              ( Setup initial calculations "0 1" )
     ROT                 ( for n times )
     FOR
       +FIB              ( next iteration -> result prev = OVER + SWAP )
     NEXT
     NIP                 ( discard the prev result, leave the current result )
     ;  ok
TF2#   ok
TF2# ( fibonacci test - just a Q&D one liner )  ok
TF2# 47 6 DO CR ." fibo(" I . ." ) = " I  LAP fibo  LAP .LAP ."  result =" . 10 +LOOP
fibo(6) = 753 cycles or @160MHZ 4.706us result =8
fibo(16) = 1393 cycles or @160MHZ 8.706us result =987
fibo(26) = 2033 cycles or @160MHZ 12.706us result =121393
fibo(36) = 2673 cycles or @160MHZ 16.706us result =14930352
fibo(46) = 3313 cycles or @160MHZ 20.706us result =1836311903 ok
TF2# .S  0000 [ 0000.002C  0000.002C  0000.002C  0000.01A5 ]TOP  ok
TF2# 1,000,000 LAP FOR NEXT LAP .LAP 32000098 cycles or @160MHZ 200.0ms  ok
TF2# LAP LAP .LAP 0 cycles or @160MHZ 0ns  ok
TF2# LAP NOP LAP .LAP 33 cycles or @160MHZ 206ns  ok
TF2# LAP 2* LAP .LAP 33 cycles or @160MHZ 206ns  ok
TF2# 1234 DUP LAP * LAP .LAP 113 cycles or @160MHZ 706ns  ok
TF2# 1234 dup * . 1522756 ok
TF2# 12345678 DUP UM* <D> . 152415765279684 ok
TF2#
TF2# 1,234,567,890 DUP UM* <D> 0 PRINTDEC 1,524,157,875,019,052,100 ok
TF2# pub DPRINT <D> 0 PRINTDEC ;  ok
TF2# 1,234,567,890 DUP UM* DPRINT 1,524,157,875,019,052,100 ok
TF2#

Interestingly for a big 64-bit result the unsigned multiply only takes half a microsecond.

TF2# 1,234,567,890 DUP LAP UM* LAP .LAP 81 cycles or @160MHZ 506ns  ok
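As a cross-check of the fibo results in the console log above, here is a small Python mirror of the iterative word. It is only a sketch: it assumes FOR runs its body n-1 times after the `1-`, and the cycle counts are copied from the log.

```python
def fibo(n: int) -> int:
    """Iterative Fibonacci: start from the pair (0, 1) and step n-1 times."""
    prev, cur = 0, 1
    for _ in range(n - 1):
        prev, cur = cur, prev + cur   # same effect as the +FIB step
    return cur                        # NIP: keep the current result only

# Cycle counts copied from the console log, keyed by n
logged_cycles = {6: 753, 16: 1393, 26: 2033, 36: 2673, 46: 3313}

def cycles_per_iteration(cycles: dict) -> float:
    """Slope of the cycle count between the smallest and largest n."""
    ns = sorted(cycles)
    return (cycles[ns[-1]] - cycles[ns[0]]) / (ns[-1] - ns[0])

for n in sorted(logged_cycles):
    print(f"fibo({n}) = {fibo(n)}")
print(cycles_per_iteration(logged_cycles), "cycles per loop iteration")
```

The logged counts rise by 640 cycles for every 10 extra iterations, so each pass through the loop costs 64 cycles, or 0.4us at 160MHz.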

EDIT: some idea of how long a DIR (ls) takes, although bear in mind that the same cog is bit-bashing the serial and the SD card at present.

TF2# LAP lscnt C~ ' (ls) (DIR) LAP .LAP
NO NAME
WARPEACE.TXT   EXTEND  .FTH   EEWORDS .FTH   VGA     .FTH   CLOCK   .FTH
EASYFILE.FTH   BREAKOUT.FTH   SDCARD  .FTH   W5500   .FTH   LIFE    .FTH
EASYNET .FTH   IOT5500 .HTM   WELCOME .TEL   POPCORN .MP3   HOME    .HTM
P8CPU   .JPG   IOTPINS .JPG   LOVE    .WAV   HELP    .TXT   P8      .H
IOT5500H.JPG   DRAGON  .JPG   IOT5500 .JPG   128K    .BIN   256K    .BIN
W5200   .FTH   PREVIOUS.ROM   DEBUG   .ROM   POPCORN .WAV   P8X32A  .PDF
IMAGE3         FRED    .PNG   FSRSCH  .PNG   FSRPCB  .PNG   IMAGE
HTTP404 .HTM   IMAGE2         IMAGE1         LOGON   .HTM   TACHYON .HTM
WELCOME .FTP   SITE0004.LOG   SITE0003.LOG   SITE0002.LOG   SITE0001.LOG
FAVICON .ICO   PARALLAX.PNG   HOME1   .HTM   SYSLOG  .TXT   HCB4208 .JPG
CE1372  .JPG   CE1372  .PDF   CHARLCD .JPG   ECOLCD  .PDF   LOVE    .MP3
FIRMWARE.ROM   8329235 cycles or @160MHZ 52.57ms  ok

Remove the bit-bashing serial code and it already drops 10ms

TF2#  NULLOUT LAP lscnt C~ ' (ls) (DIR) LAP CON .LAP 6769955 cycles or @160MHZ 42.312ms  ok

Then the core of the directory list which scans the directories looking for valid entries and passing it to the list method which is NULL in this instance

TF2# LAP ' NULL (SLIST) LAP .LAP 6237490 cycles or @160MHZ 38.984ms  ok
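For anyone wanting to redo the cycle-to-time conversions by hand, here is a short Python sketch using the three directory timings above. The helper name is made up; the 160MHz clock is the one quoted by .LAP.

```python
CLOCK_HZ = 160_000_000   # the clock rate .LAP reports against

def cycles_to_ms(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a cycle count to milliseconds at the given clock."""
    return cycles / clock_hz * 1_000

full_listing = 8_329_235   # DIR with serial output
null_output  = 6_769_955   # DIR with output sent to NULLOUT
scan_only    = 6_237_490   # (SLIST) with a NULL list method

print(f"full listing : {cycles_to_ms(full_listing):.3f} ms")
print(f"null output  : {cycles_to_ms(null_output):.3f} ms")
print(f"scan only    : {cycles_to_ms(scan_only):.3f} ms")
print(f"serial cost  : {cycles_to_ms(full_listing - null_output):.3f} ms")
print(f"listing cost : {cycles_to_ms(null_output - scan_only):.3f} ms")
```

The second and third figures match the console. The first recomputes to about 52.06ms rather than the printed 52.57ms, so the printout may have dropped a leading zero after the decimal point; that is an inference from the arithmetic only. Either way the serial output accounts for roughly 10ms.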

D.P · 2016-06-17 16:53

Peter Jakacki wrote: »

The BeMicroCV-A9 is up and running and features 16 cogs at 80MHz with 1M RAM! I haven't done anything fancy with the kernel yet, I just want it to run, but here are some timings in cycles and what it would be at 160MHz, straight from the console:

.
.
.

Really neat stuff, I'll assume the source/obj posted in the P2 dropbox 2 hours ago is what we should download to join in on the TachP2 fun.

Peter Jakacki · 2016-06-17 23:54

D.P wrote: »

Peter Jakacki wrote: »

The BeMicroCV-A9 is up and running and features 16 cogs at 80MHz with 1M RAM! I haven't done anything fancy with the kernel yet, I just want it to run, but here are some timings in cycles and what it would be at 160MHz, straight from the console:

.
.
.

Really neat stuff, I'll assume the source/obj posted in the P2 dropbox 2 hours ago is what we should download to join in on the TachP2 fun.

Yes, there's the P2 folder in the Tachyon folder with all the files you need. Just check the baud rate in the kernel as I normally have it set for 460800 but FTDI chips run up to 3M baud and I've been using that as well. You don't need any delays but it does help to set it to 2 stop bits and the compilation just flies.

There is no real way to back anything up at present because if you reboot it will need to have the kernel reloaded from PNut anyway. I'm almost tempted to hook up a little PIC chip with an EEPROM just so I can have it load up the Prop automatically on reset. The P2 would be able to back-up to the EEPROM directly over I2C and even though I2C is "slow" it is convenient and only uses 2 I/O.

Once I experiment with some general-purpose Flash or SD boot-code I will post it up as a suggestion to be included in the P2 ROM.

D.P · 2016-06-18 03:36

Yes, success here with the BeMicroCV-A9, current .jlc file and Tachyon P2. I have to say seeing 16 cogs, 1meg of memory and 48 smart pins is just awesome. But as you say, persistence first.

You must be using an off board SDCard socket et al, I thought I read you had issues with the CV's onboard card?

Thanks for the effort Peter!

Peter Jakacki · 2016-06-18 03:45

D.P wrote: »

Yes, success here with the BeMicroCV-A9, current .jlc file and Tachyon P2. I have to say seeing 16 cogs, 1meg of memory and 48 smart pins is just awesome. But as you say, persistence first.

You must be using an off board SDCard socket et al, I thought I read you had issues with the CV's onboard card?

Thanks for the effort Peter!

I wasn't using Chip's latest pinout so I didn't have the internal one mapped properly. After patching that and fixing a porta/portb problem with FLOAT, which the CARD? detect was using, it sprang to life. So go for the onboard one. You can just load EXTEND and then, if you like, the SDFAT file: I have combined SDCARD and EASYFILE into it for testing, which saves having to load two files, and I may just end up leaving it combined.

Since I am running this from my VirtualBox WINXP for PNut's sake I am using ConTEXT editor and TeraTerm which is set to 460800, 2 stop bits, zero delays.

D.P · 2016-06-18 05:31

Okay, nice. I'm running Win7 via parallels on OSX for pnut's sake. Using TeraTerm at same baud but 8N1 and 23 ms delay. Will adapt to 8N2 and 0 delay. Gonna grab SDFAT.fth. The LAP .LAP times are really impressive as you know!

Peter Jakacki · 2016-06-19 19:53

In an effort to come to grips with the smartpins I have added the basic smartpin instructions which I will further develop.

They are:
WRPIN ( modes pin -- ) where we can set the mode of the pin.
WXPIN ( val pin -- )
WYPIN ( val pin -- )
RDPIN ( pin -- val )

So to set a pin to PWM mode it is now so easy. Here is a simple word:

pub PWM ( duty frame div pin -- )	DUP LOW $50 OVER WRPIN SWAP ROT 16 << + OVER WXPIN WYPIN ;

To set pin 7 to a 50kHz PWM with a 1/8 duty cycle I can type:
100 800 1 7 PWM
To change it to 25kHz we could increase the divider:
100 800 2 7 PWM
or double up the duty and frame size:
200 1600 1 7 PWM

To make that sweep here's a quick one to try after you set it up:
400 0 DO I 7 WYPIN 5 ms LOOP
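To make the stack juggling in the PWM word easier to follow, here is a Python sketch of the values it ends up writing: X gets the frame in the upper 16 bits plus the divider, and Y gets the duty. The frequency helper simply assumes the output rate is inversely proportional to frame times divider and scales from the 50kHz figure quoted above; it does not model the smartpin hardware itself.

```python
def pwm_registers(duty: int, frame: int, div: int) -> tuple:
    """Values the PWM word writes: X = (frame << 16) + div via WXPIN, Y = duty via WYPIN."""
    x = (frame << 16) + div
    y = duty
    return x, y

def duty_ratio(duty: int, frame: int) -> float:
    """Fraction of the frame that the output is on."""
    return duty / frame

def scaled_frequency(base_hz: float, base_frame: int, base_div: int,
                     frame: int, div: int) -> float:
    """Scale a known frequency, assuming f is proportional to 1 / (frame * div)."""
    return base_hz * (base_frame * base_div) / (frame * div)

x, y = pwm_registers(100, 800, 1)
print(hex(x), y)                                   # X and Y for 100 800 1 7 PWM
print(duty_ratio(100, 800))                        # 1/8 duty
print(scaled_frequency(50_000, 800, 1, 800, 2))    # divider doubled
print(scaled_frequency(50_000, 800, 1, 1600, 1))   # frame doubled
```

Both variations land on 25kHz, and 200 1600 keeps the same 1/8 duty as 100 800.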

That was so easy I will try some other modes a little later in the day seeing it is now breakfast time and I need to sleep

BTW, some of the documentation doesn't match up with what I am finding.

D.P · 2016-06-19 21:08

TachP2 is getting shiny already and makes using P2 fun for me. P2 is a feature beast, thanks for breaking it down. Hooking up my o-scope to follow along and learn about P2 et al.

Peter Jakacki · 2016-06-20 04:05

SMART PIN MODES and change of syntax

Today is smartpin cram day for me, all the stuff I didn't really take in at the time I need to now. Haven't been able to get the DACs to work yet though. Just added PWM of course and the frequency output words using HZ KHZ MHZ etc and now playing with asynch.

I'm changing the syntax a little so that rather than passing the pin number to all these words all the time we just preselect the pin once until changed, so 7 PIN will select the current pin to operate on. Each Tachyon cog has its own task register for this pin so it's not global.

To set pin 7 to output a frequency of 1 MHz: 7 PIN 1 MHZ
to change it now to 1,843,200: 1,843,200 HZ
to stop it: MUTE

To setup PWM for 100 on in 800 clocks with clock div = 1: 7 PIN 100 800 1 PWM
I may make some simpler words for that too.

To setup a serial transmit at 8N1 on a pin at 115200 baud: 7 PIN 115,200 TXD BAUD (added modes for RXD and TXD which must precede BAUD)
to change the data len: 7 DATALEN which if not set will revert to 8 bits
Some other words will be added to change the default settings.
to send a CR character on current pin: $0D TX
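For reference while scoping the pin, here is a Python sketch of what a standard asynchronous frame for $0D looks like on the wire, plus the bit period in system clocks. This is generic UART framing, not a description of how the smartpin is configured, and the 80MHz clock is an assumption taken from the FPGA image mentioned earlier in the thread.

```python
def uart_frame(byte: int, data_bits: int = 8, stop_bits: int = 1) -> list:
    """Line states in transmit order: start bit (0), data LSB first, stop bit(s) (1)."""
    data = [(byte >> i) & 1 for i in range(data_bits)]
    return [0] + data + [1] * stop_bits

def clocks_per_bit(clock_hz: int, baud: int) -> float:
    """System clocks in one bit period."""
    return clock_hz / baud

print(uart_frame(0x0D))                    # CR at 8N1
print(clocks_per_bit(80_000_000, 115_200)) # assumed 80MHz system clock
```

At 8N1 the frame is 10 bit times long, and each bit lasts a little over 694 clocks at the assumed 80MHz.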

Expect a lot more real soon as I dig deeper into these modes and also test and iron out any quirks I deliberately put in there! I will even include some LA and scope pics along with the commands.

jmg · 2016-06-20 04:21

Peter Jakacki wrote: »

To set pin 7 to output a frequency of 1 MHz: 7 PIN 1 MHZ
to change it now to 1,843,200: 1,843,200 HZ
to stop it: MUTE

To setup PWM for 100 on in 800 clocks with clock div = 1: 7 PIN 100 800 1 PWM
I may make some simpler words for that too.

To setup a serial transmit at 8N1 on a pin at 115200 baud: 7 PIN 115,200 BAUD
Some other words will be added to change the default settings.
to send a CR character on current pin: $0D TX

Looks nifty.

It could be nice to verify Chip's Reciprocal Counter, which IIRC in the latest iteration you feed a suggested min gate time (eg 100ms) , and it returns the next-highest whole cycles dTime and dCycles values for the attached pin.
I'm unclear if this needed 2 pin cells, or 1 cell, in the end.

Using the sideways attach you should be able to generate a frequency, and then measure it and give
fR = k*(dCycles/dTime)

eg 80M*(1.8432M*100m)/(80M*100m) = 1843200

and next-step is 80M*(1.8432M*100m)/((80M*100m)-1) = 1843200.23040 or 0.125ppm / 100ms

If you have a GPS with 1pps, you can check the 80MHz value to a 12.5ppb LSB
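The numbers above can be reproduced with exact arithmetic. This Python sketch assumes k is the 80MHz system clock and that dTime is counted in system clocks; the function name is only for illustration.

```python
from fractions import Fraction

def reciprocal_frequency(sysclk_hz: int, d_cycles: int, d_time_clocks: int) -> Fraction:
    """fR = k * (dCycles / dTime), with dTime measured in system clocks."""
    return Fraction(sysclk_hz) * d_cycles / d_time_clocks

SYSCLK = 80_000_000
GATE_CLOCKS = SYSCLK // 10   # 100ms gate = 8,000,000 clocks
D_CYCLES = 184_320           # whole cycles of 1.8432MHz in 100ms

f0 = reciprocal_frequency(SYSCLK, D_CYCLES, GATE_CLOCKS)
f1 = reciprocal_frequency(SYSCLK, D_CYCLES, GATE_CLOCKS - 1)
step_ppm = float((f1 - f0) / f0 * 1_000_000)

print(float(f0))           # reading with the nominal gate
print(float(f1))           # reading with one clock less in dTime
print(step_ppm)            # size of one LSB in ppm for a 100ms gate
print(1 / SYSCLK * 1e9)    # LSB in ppb for a 1 second gate (GPS 1pps)
```

One clock of dTime out of 8,000,000 is where the 0.125ppm step comes from, and stretching the gate to 1 second gives the 12.5ppb figure.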

Addit : Digging thru Chip's posts around this mode...
Chip : "Jmg, these special counter modes work great! I realized that there WAS sufficient logic to set a minimal sample timer that repeats automatically, like in other modes. You just tell it how long to look and it will count up whole periods until the timer runs down to 1. Then, it just waits for the last trailing edge to wrap up the measurement. And it can even handle coincident trailing and leading edges, making cycle counting contiguous. Really nice! "
... about clearing Z on capture...
It does, but the increment is still added in, setting it to one. This is needed, though, since when we detect end-of-cycle, the current value of the Z register needs to be sent for the update, not Z from the next cycle, which would have been incremented (and also cleared, leaving $00000001). This all works out and gives perfect results, as output of $00000000 is never possible. For the measurement to end, some measurement had to be taken, making it always >= 1.

I think it needs 2 Smart pins for Reciprocal Freq, and 3 for both Reciprocal Freq and Duty Cycle ?

10101 *	for periods in X+ clocks, count time	OUT	A,B		time-report-loop
10110 *	for periods in X+ clocks, count states	OUT	A,B		time-report-loop
10111 *	for periods in X+ clocks, count periods	OUT	A,B		time-report-loop

X+ means (>= X SysClks) ie (8M or 0x007A1200 for >= 100ms at 80MHz)
Time & Periods for Freq, and I think states is Pin=CE = Duty Cycle, but whole cycle captured.

Setting up a PWM pin, should allow Freq and Duty readings to be verified ?

Tubular · 2016-06-20 05:02

This is neat, Peter

Regarding the DACs, their signals are broken out separately to the pins. On the P123-A9 they go to THS8136 DAC chips. I think for the CVA9 it would make sense to go via the 80 pin connector, where there are spare pins.
