A retromachine Basic interpreter [beta] - Page 2 — Parallax Forums



Comments

  • pik33 Posts: 2,366
    edited 2023-07-31 22:04

    First priority to do list:

    • make cmd1 : cmd2 : cmd3 compile (now it works only in immediate mode)
    • make goto work even if the target line does not exist yet when the goto is entered
    • add if, for and gosub

    Meanwhile, for fun and entertainment, I will add several keywords to do fun stuff. Adding a command that takes integer parameters needs a token definition added in 4 places and then a do_something sub written without parameters, that's all. All kinds of beeps and all the graphics are already in the driver, so I only need to call them.

    We can have a mouse pointer, so maybe "mouse 1" (pointer on) and "mouse 0" (pointer off). Then mousex,mousey, mousek to get position in the program.
    We can have a touch screen (at least 5" Waveshare one)

    Then there will be hard stuff to do:

    • make functions work so there can be a!=sin(b!)
    • make dim work so an array can be declared. I think arrays should be PSRAM based as they may be big

    User definable functions... later

    A fun thing: goto is planned to work with expressions. That version will be slow... but it is simple to implement. I now have 3 tokens for goto, and 1 of them, called fast_goto, is implemented. It is fast: it reloads the program pointer, that's all.
    The slow_goto will evaluate an expression and find where to go, so things like

    10 goto 100*a

    will execute too.
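
    The difference between the two goto flavours described above can be sketched in Python. This is only an illustration: the dictionary layout, the function names and the use of Python's eval as a stand-in for the interpreter's expression evaluator are all assumptions, not the actual FlexBasic source.

```python
# Hypothetical sketch: names and data layout are assumptions, not the interpreter's code.
program = {10: "a=a+1", 30: "print a", 200: "end"}   # line number -> source text
line_numbers = sorted(program)                       # execution order

def fast_goto(target_index):
    """Target was resolved when the line was entered: just reload the pointer."""
    return target_index

def slow_goto(expression, variables):
    """Evaluate the expression at run time, then search for the target line."""
    # eval stands in for the interpreter's own expression evaluator
    target = eval(expression, {"__builtins__": {}}, dict(variables))
    if target not in program:
        raise KeyError(f"no line {target}")
    return line_numbers.index(target)

pc_fast = fast_goto(1)                      # precomputed index of line 30
pc_slow = slow_goto("100*a", {"a": 2})      # goto 100*a with a=2 -> line 200
```

    The slow variant pays for an expression evaluation and a line lookup on every jump, which is why a precomputed target is worth having for the common case.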


    Another fun stuff is : I use (flex)Basic to write (retro)Basic :)

  • @pik33 said:
    Another fun stuff is : I use (flex)Basic to write (retro)Basic :)

    Well naturally :)

    I am imagining a common syntax both interpreted and compiled on the same device...magic! :)

    Craig

  • pik33 Posts: 2,366

    An interactive interpreter has to have a different syntax, and at least at the start the syntax will be much more limited.

  • pik33 Posts: 2,366

    Multiple commands in one line in the programming mode now seem to work. Waitvbl allows drawing on the screen without tearing.

  • pik33 Posts: 2,366

    I now have working comparison operators. They are required to implement if. Maybe tomorrow. If + a working goto = loops possible even without for and do.

  • Given you're developing a flavor of BASIC, you could use the PBASIC instructions HIGH, LOW, and PAUSE.

    Neat stuff.

  • pik33 Posts: 2,366

    Implemented the brun command (for now only for 4-bit PSRAM, as I don't have an EC32 here. It may run or not, I have to test it with an EC32).

    brun ("binary run") loads a binary file from SD and runs it, so I can now flash the Basic instead of MicroDOS and run anything I have on the SD card (a player, a MegaYume) using brun.

    This needed some hacking of PSRAM driver files to move PSRAM mailbox up, to $7FF00. This allows to load the binary to PSRAM first, and then move it to the HUB without overwriting the PSRAM driver mailbox.

    @JonnyMac said:
    Given you're developing a flavor of BASIC, you could use the PBASIC instructions HIGH, LOW, and PAUSE.

    Neat stuff.

    I used pinwrite and waitms but I can add these too. They are one parameter integer functions so they are easy to add

  • Bean Posts: 8,129

    Really cool. Keep up the good work.

    I know you are probably not to the point of working with files on the SD card, but a lot of retro computers do not have proper file handling commands. Like being able to read a text file line-by-line. Or read a small section of a large binary file. This prevents them from being used for any kind of serious purpose.

    Something like "READ file, startbyte, numbytes, bytearrayvar" would be fantastic.

    Bean

  • pik33 Posts: 2,366

    @Bean said:
    Really cool. Keep up the good work.

    I know you are probably not to the point of working with files on the SD card, but a lot of retro computers do not have proper file handling commands. Like being able to read a text file line-by-line. Or read a small section of a large binary file. This prevents them from being used for any kind of serious purpose.

    Something like "READ file, startbyte, numbytes, bytearrayvar" would be fantastic.

    Bean

    That should be possible, as FlexBasic, in which I write this, has file operations included and I always can use FlexC. As it is now, I already use SD for load, save and brun. Get/put will be implemented later.

  • pik33 Posts: 2,366

    I encountered a strange "PSRAM too slow" bug

    If I plot pixels 0,1,2 on a line with color 15 at the start of the line, the first 256 pixels of the line blink randomly with this color, displaying either a proper picture between blinks, or moving this line randomly, 1 pixel left or 1 pixel right.

    Color 15 at 8 bpp means there is $0F0F0F at the start of the line in the PSRAM memory. The bug also hits when the color is 240 ($F0). Only the first 3..4 pixels of the line cause this problem.

    Nothing I tried helped (driver's delay, setting the line PSRAM start address at an odd value, reducing the burst size to 128 bytes). When the burst is short, only these first burst read bytes cause the problem. Then setting bytes #129,129,130 or anything else doesn't trigger the bug.

    What helped was reducing the speed from 336 to 330 MHz. The bug disappeared, so the "RAM was too slow", but why does it only hit at the start of the transfer (and not when continuing) and only on the 0F0F... pattern?

  • Sounds a bit weird. Maybe something is marginal at 336MHz and some writes are being affected and it's data dependent on the bus pattern. Or there is insufficient time for the write to finish and the first video scan line read takes a hit. Needs some thought and a good way to isolate it.

  • pik33 Posts: 2,366

    On the P2-EC32 there are no such errors. I have another 4-bit "backpack" for an Eval, so I can check this on another chip. 330 MHz is still way out of specification, so it may fail.
    While experimenting I discovered that not all Eval ports are equal. On P48..55 the PSRAM works at delay=11 and still works, with several signs of errors, at delay=10. On P40..47 delay=11 is too small and errors are visible.


    I flashed the Basic to my EC32 instead of MicroDOS and tested brun command on EC32. Works:

    Also, a simple one-line if now works. Even this, combined with goto, can simulate all kinds of loops:

    The to-do now is to finish goto and add else. To reach the alpha stage as a simple but functional toy/tool, for..next and input are needed. Also, there is already pinwrite, so there should be pinread, wrpin, rdpin, wypin and wxpin.

  • For the pin operations, you probably want to prevent it from messing with the pins used by the system. Flexspin has a system for registering used pins through _usepins(plo, phi) and _freepins(plo,phi) (where the arguments are pin masks to reserve/free). _usepins will also return false if any pin is already used. The VFS storage drivers use this system to prevent conflicting mounts. The mask of used pins is stored in __pinsused_lo and __pinsused_hi, but I'm not sure if that's visible in the user namespace. Just try it out.
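
    The reserve/free behaviour described above can be modelled with two 32-bit masks. The Python class below is a hypothetical stand-in written for illustration; it mimics the described return value of _usepins but is not flexspin's implementation, and the names PinRegistry, use_pins, free_pins and pin_mask are invented here. The P48..55 range is taken from the PSRAM port mentioned earlier in the thread.

```python
class PinRegistry:
    """Hypothetical model of a used-pin mask for 64 pins, split into two 32-bit words."""

    def __init__(self):
        self.used_lo = 0   # pins 0..31
        self.used_hi = 0   # pins 32..63

    def use_pins(self, lo_mask, hi_mask):
        """Reserve pins; return False and change nothing if any pin is already taken."""
        if (self.used_lo & lo_mask) or (self.used_hi & hi_mask):
            return False
        self.used_lo |= lo_mask
        self.used_hi |= hi_mask
        return True

    def free_pins(self, lo_mask, hi_mask):
        """Release pins so that they can be reserved again."""
        self.used_lo &= ~lo_mask & 0xFFFFFFFF
        self.used_hi &= ~hi_mask & 0xFFFFFFFF


def pin_mask(first, last):
    """Build (lo, hi) masks covering pins first..last inclusive."""
    bits = ((1 << (last - first + 1)) - 1) << first
    return bits & 0xFFFFFFFF, bits >> 32


registry = PinRegistry()
psram_mask = pin_mask(48, 55)                          # PSRAM on P48..55
system_ok = registry.use_pins(*psram_mask)             # system reserves its pins
user_conflict = registry.use_pins(*pin_mask(50, 50))   # pinwrite on P50 is refused
user_ok = registry.use_pins(*pin_mask(8, 8))           # P8 is free
```

    An interpreter could run such a check before pinwrite or wrpin touches a pin and report an error instead of disturbing the PSRAM or video pins.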

  • pik33 Posts: 2,366
    edited 2023-08-02 19:48

    For fun: added a simple beep freq,time. Generates a square wave of frequency in Hz for time in ms.
    For convenience: there was no way to delete a line from the program. Now, as in all retro Basics, entering an empty line number deletes this line.
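
    The line-deletion rule can be sketched as follows. This is a minimal Python model that assumes the program is kept as a mapping from line number to text; the real interpreter stores tokenized lines in PSRAM, and enter_line is a made-up name.

```python
def enter_line(program, text):
    """Store, replace or delete a numbered line in a dict keyed by line number."""
    number, _, body = text.strip().partition(" ")
    line_no = int(number)
    if body.strip():
        program[line_no] = body.strip()   # new or replaced line
    else:
        program.pop(line_no, None)        # bare line number: delete if present
    return program


program = {}
enter_line(program, "10 a=0")
enter_line(program, "20 print a")
enter_line(program, "10")                 # deletes line 10
listing = sorted(program.items())
```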


    Edit: and added simple "dir" to list the files in the main directory of sd

  • pik33 Posts: 2,366
    edited 2023-08-03 08:14

    Else added. This means one-line if-then-else works.

    The syntax is slightly different than in FlexBasic: you have to write : before else

    As in FlexBasic, no more than one else is allowed. If there is more than one else, what is written after the second else will never execute.
    However, there can be more than one if. If any of these ifs fails, then else is executed, if it exists.

    An example:

    20 if a=1 then if b=2 then c=0 : d=1 : else c=5 : d=6

    If a<>1 or b<>2 then else will execute and c will be 5 and d will be 6
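
    The rule that any failing if selects the single else can be modelled in a few lines of Python. This is a sketch of the semantics only, with the conditions and assignments of line 20 written out by hand; run_if_line is a hypothetical helper, not part of the interpreter.

```python
def run_if_line(conditions, then_assignments, else_assignments, variables):
    """All conditions must hold for the then-part; any failure runs the else-part."""
    env = dict(variables)
    if all(cond(env) for cond in conditions):
        env.update(then_assignments)
    else:
        env.update(else_assignments)
    return env


# 20 if a=1 then if b=2 then c=0 : d=1 : else c=5 : d=6
conditions = [lambda v: v["a"] == 1, lambda v: v["b"] == 2]
then_part = {"c": 0, "d": 1}
else_part = {"c": 5, "d": 6}

both_true = run_if_line(conditions, then_part, else_part, {"a": 1, "b": 2})
second_false = run_if_line(conditions, then_part, else_part, {"a": 1, "b": 3})
first_false = run_if_line(conditions, then_part, else_part, {"a": 0, "b": 2})
```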

  • @pik33
    May I suggest making a version that is content without PSRAM? After all, those BASICs were used back then because they did not need much RAM. I even had a ZX80, that came with 1kByte. The RAM was shared between video buffer, variables and program. - Well, you had to buy additional RAM, if you wanted to do anything. But 16k was plenty.
    To have the code in HUB RAM will be much faster, than in PSRAM, at least if more than one cog is running in parallel?

    If your BASIC will work with several cogs in parallel, I think it might be the tool to fill the gap, which is given, because (as far as I know) P2 Micropython cannot be used to program more than one cog. Probably it will be much faster than Micropython too.

    Did you abandon the idea to use the Taqoz wordcode interpreter? The nice thing about this idea seemed, that you have got a complete assembler too in Taqoz....
    Christof

  • pik33 Posts: 2,366

    The framebuffer used is 576 kB. I have a HUB based HDMI driver, but the resolution and color depth have to be heavily limited to fit everything.

    Taqoz... maybe in the future. Making the interpreter itself is complex enough.

  • pik33 Posts: 2,366
    edited 2023-08-03 12:50

    Added a short documentation/ instruction. Attached bin is for EC32. The source is as always on Gitlab. Maybe this Basic needs a separate repository... To be created.

  • Ariba Posts: 2,690

    @"Christof Eb." said:
    @pik33
    May I suggest making a version that is content without PSRAM? After all, those BASICs were used back then because they did not need much RAM. I even had a ZX80, that came with 1kByte. The RAM was shared between video buffer, variables and program. - Well, you had to buy additional RAM, if you wanted to do anything. But 16k was plenty.
    To have the code in HUB RAM will be much faster, than in PSRAM, at least if more than one cog is running in parallel?

    If your BASIC will work with several cogs in parallel, I think it might be the tool to fill the gap, which is given, because (as far as I know) P2 Micropython cannot be used to program more than one cog. Probably it will be much faster than Micropython too.

    Did you abandon the idea to use the Taqoz wordcode interpreter? The nice thing about this idea seemed, that you have got a complete assembler too in Taqoz....
    Christof

    There is already a BASIC interpreter for the P2 that works with Hub RAM:
    FemtoBasic from Mike Green

    Andy

  • @Ariba

    Yeah but we're getting excited about @pik33 because he's demonstrating interest in serious performance. Piotr is already familiar with MMBasic which is pretty dang awesome. I see him aiming for better.
    Re: @"Christof Eb." comment. I am one who doesn't care about video (I have the far superior Android HMI) so it would be extra awesome if this new interpreter could serve simply as an interactive command processor for compiled code in other cogs.

    I wanna drop the Pico and be 100% Prop :smiley:

    Craig

  • pik33 Posts: 2,366

    interactive command processor for compiled code in other cogs.

    These circles used as an example are drawn by a video driver that is compiled.

    @rogloh said:

    Sounds a bit weird. Maybe something is marginal at 336MHz and some writes are being affected and it's data dependent on the bus pattern. Or there is insufficient time for the write to finish and the first video scan line read takes a hit. Needs some thought and a good way to isolate it.

    Tested on another chip, which works OK. The first chip was simply slightly slower than needed.

  • rogloh Posts: 5,786
    edited 2023-08-04 08:08

    Ok, that's good it works for you on at least one setup. If it's overclocked we are lucky to get what we get.

  • pik33 Posts: 2,366

    I have several chips, so I can select and use what works. They are not expensive.

    The second chip is much faster: it works up to at least 347 MHz, at delay=11. Over 347 MHz the circle drawing test program starts to draw circles in incorrect places and at incorrect sizes, which means it makes mistakes while reading the precompiled code from the PSRAM. Over 350..351 MHz it starts to do exactly the same things as the previous chip, making blinking 256-pixel long strips on the screen. Or shorter if I limit the burst length to less than 256 bytes.

    It seems these chips don't like F0F0... patterns.

    This pattern involves toggling all 4 outputs at full speed while reading. The memory can do it 1 or 2 times, but if the pattern repeats 3 or 4 times, it seems to lock up at $F for the rest of the transfer, as the color drawn on the screen is $FF (and not the $0F or $F0 that starts the problem). The maximum burst length seems to be set at 256 bytes somewhere in the driver, or the memory restarts after these 256 bytes.
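
    To see why this pattern is the worst case for a 4-bit bus, the number of data lines that change between consecutive transfer steps can be counted. The Python sketch below assumes one nibble per step with the high nibble sent first; the actual nibble order and the name line_toggles are assumptions, and the count is the same for either order with this pattern.

```python
def line_toggles(data, high_nibble_first=True):
    """Count, per transfer step, how many of the 4 data lines change state."""
    nibbles = []
    for byte in data:
        hi, lo = byte >> 4, byte & 0x0F
        nibbles += [hi, lo] if high_nibble_first else [lo, hi]
    # XOR of neighbouring nibbles has one set bit per line that toggles
    return [bin(a ^ b).count("1") for a, b in zip(nibbles, nibbles[1:])]


worst = line_toggles(bytes([0x0F, 0x0F, 0x0F]))   # color 15 in three pixels
flat = line_toggles(bytes([0xFF, 0xFF, 0xFF]))    # no line ever changes
mixed = line_toggles(bytes([0x12, 0x34, 0x56]))   # ordinary picture data
```

    With $0F0F0F every step flips all four lines, while typical data flips only some of them, which fits the observation that only this pattern (and $F0) triggers the fault near the speed limit.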

    I have now to try and torture P2-EC32 with $FFFF0000 and see if it makes artifacts.

  • pik33 Posts: 2,366
    edited 2023-08-08 21:01

    Slower progress now as I have other things to do.


    The for instruction now compiles (but it doesn't execute yet: that's not simple). What is not very hard to do is this:

    20 for i=1 to 100 step 2
    (some code here)
    40 next i
    

    However this syntax

    20 (some code before for)  : for i=1 to 10 : (some code after for) 
    30 next i
    

    is much harder to implement. The runtime interpreter has to load the line with for, then ignore what is before it and jump over the for itself. Still to do.
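
    One possible way to handle a for in the middle of a line is to remember both the line and the position of the statement that follows the for, so that next can jump back past it. The Python sketch below illustrates that idea under simplifying assumptions (statements are tuples, step is always 1, "mark" is a dummy statement); it is not how the interpreter is actually written.

```python
def run(program):
    """program: list of lines, each line a list of statements (tuples)."""
    variables, loops, trace = {}, [], []
    line, stmt = 0, 0
    while line < len(program):
        if stmt >= len(program[line]):          # end of line: go to the next one
            line, stmt = line + 1, 0
            continue
        op = program[line][stmt]
        stmt += 1
        if op[0] == "for":                      # ("for", var, start, end)
            _, var, start, end = op
            variables[var] = start
            # remember where the loop body begins, possibly mid-line
            loops.append((var, end, line, stmt))
        elif op[0] == "next":                   # ("next",)
            var, end, body_line, body_stmt = loops[-1]
            variables[var] += 1
            if variables[var] <= end:
                line, stmt = body_line, body_stmt   # skips code before and the for itself
            else:
                loops.pop()
        else:                                   # ("mark", label): record execution
            trace.append(op[1])
    return trace


program = [
    [("mark", "before"), ("for", "i", 1, 3), ("mark", "body")],
    [("next",)],
    [("mark", "after")],
]
trace = run(program)
```

    The code before the for runs once, while the statement after it runs on every iteration, which is the behaviour the harder syntax requires.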


    I tested the performance of the interpreter with a simple program that puts pixels on the screen. It is way too slow: about 35x slower than the same loop compiled directly in FlexBasic, 140 microseconds per pixel.

    So I added a time measurement everywhere I can to check what is slow. The PSRAM based code execution costs time, but not that much. However, the line color (i+j) mod 256 took 6686 clocks. Way too many. Removing it and condensing the rest into 1 line makes the program 2x faster.

    The runtime procedures simply do too much. They do what they should not do. Instead, this should be done at interpreting time. They check and convert variables when it is not necessary and waste time adding/subtracting constants.

    My first thought was to do a major rewrite while it is not yet too complex.

    Then I got an idea to add an optimizer after the first precompiling pass. It will detect and remove all unnecessary conversions and arithmetic on constants. Also, it will replace "general purpose" A/L functions full of ifs with "specialized" ones based on which variable types are actually used. This should make it several times faster. A runtime procedure also does 2 reads from PSRAM, and that's unnecessary: 1 read should be enough. I can also allocate more space in the hub for a cache and read more code with one PSRAM transfer.
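
    The "arithmetic on constants" part of such an optimizer is commonly done by constant folding. The Python sketch below shows the technique on a postfix (RPN) token list; the token format and the name fold_constants are assumptions made for illustration and say nothing about how the interpreter's precompiled code is really laid out.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(rpn):
    """Fold operators whose two operands are both integer literals (RPN input)."""
    out = []
    for token in rpn:
        foldable = (
            token in OPS
            and len(out) >= 2
            and all(isinstance(t, int) for t in out[-2:])
        )
        if foldable:
            right = out.pop()
            left = out.pop()
            out.append(OPS[token](left, right))   # computed once, at compile time
        else:
            out.append(token)
    return out


# a = b + 2 * 3   ->   RPN: b 2 3 * +
folded = fold_constants(["b", 2, 3, "*", "+"])
# (4 - 1) * 5 contains no variables, so it collapses to one literal
literal = fold_constants([4, 1, "-", 5, "*"])
```

    After folding, the runtime executes one addition instead of a multiplication and an addition, and expressions without variables cost nothing at run time.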

    All of this is on the TODO list after I implement for loops and the rest of the goto types

  • Still looks like good progress B):+1:

    I will be interested to see how long this takes:

    for i = 1 to 1_000_000
        a=+1
    next
    

    Craig

  • pik33 Posts: 2,366
    edited 2023-08-09 10:37

    I will check it when I get for running.

    This, however

    10 a=0
    20 print "start"
    30 a=a+1
    40 if a<1000000 then goto 30
    45 beep 1000,100
    50 print "stop"
    

    took about 36 seconds

  • Already very respectable :+1:

    On the STM32F407VET6 (Armmite F4) @ 168MHz it takes 23 seconds which tells me that the very popular MM+ on the PIC32MX170 would take > 50secs

    You're on the right track :)

    Craig

  • Oops....Silly oversight. I had used a for-next

    Just replicated your Goto and the Armmite (168MHz) now takes >60 seconds B):+1:

    Craig

  • I know nothing about games but for your ref:

    Here's a few games that they run on the Armmite

    Craig

  • pik33 Posts: 2,366
    edited 2023-08-09 20:39

    The for loop now works.

    Multi line for:

    10 for i=1 to 1000000
    20 a=a+1
    30 next i
    

    runs in 34 seconds.

    One line version

    10 for i=1 to 1000000 : a=a+1 : next i

    is faster (22 seconds) while still not optimal (every loop preloads the same line from PSRAM)

    Several bugs hit while testing the loop :):( to find and remove, then bump the version# up to 0.16


    Edit: the one-line for test time is now reduced to about 16 seconds by avoiding rereading the line from PSRAM at every iteration

    Edit 2: rewritten plot test using 2 nested for loops in one line: time reduced from 70 to 24 seconds. Better. :) Still 10x slower than compiled, so there is a lot of room to optimize :)

    10 for y=0 to 575 : for x=0 to 1023 : color (x+y) mod 256 : plot x,y : next x: next y
    20 cls: goto 10
    

    .. and modulo is unnecessary here : the video driver truncates the color number itself. Another 4 seconds less.
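
    The timings quoted in this thread can be converted to per-iteration costs with simple arithmetic. The Python lines below only restate the figures given above (36, 34, 16 and 24 seconds, a 1024x576 screen at 8 bpp); the results are averages that include any fixed overhead such as print, and per_item_us is a helper name made up here.

```python
def per_item_us(total_seconds, items):
    """Average time per item in microseconds."""
    return total_seconds * 1_000_000 / items


goto_loop_us = per_item_us(36, 1_000_000)       # if + goto loop
for_multiline_us = per_item_us(34, 1_000_000)   # for/next on three lines
for_oneline_us = per_item_us(16, 1_000_000)     # one-line for, line kept in hub

pixels = 1024 * 576                             # plot test covers the whole screen
framebuffer_kib = pixels // 1024                # 1 byte per pixel at 8 bpp
plot_us = per_item_us(24, pixels)               # nested one-line for loops
```

    This gives 36, 34 and 16 microseconds per loop iteration and roughly 41 microseconds per plotted pixel, and the pixel count matches the 576 kB framebuffer mentioned earlier.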
