flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

evanh · 2022-04-16 03:38

Cool.

A couple of points I should make:
- Both build paths use pasm2. I'm not sure if this source file is required to work with the Prop1 or not.
- I've discovered that, above about 320 MHz sysclock (40 MHz SPI clock), the bit-bashed path isn't quite as rugged as the smartpins path has become. I had the advantage of easy fine tuning the clock high-low ratio with smartpins to give it more leeway.

I'm thinking about retiring the optimised bit-bashed path anyway. Maybe revert to the old non-optimised path if Prop1 compatibility is desired.

EDIT: On a tangent, there is no verification of valid data within the driver code. Getting verification might be a decent reason to eventually discontinue the use of SPI. I know CRC can be enabled with SPI but how much processing is that? It might be just as fast processing the full SD protocol.

pik33 · 2022-04-16 14:52

I created a topic in Basic section - something is wrong, as changing the order of variable declaration changes the behavior of the code. Then I wanted to copy the output from the Flexprop's internal serial terminal and paste it to the forum topic editor.

Then I discovered that I cannot do it (or I don't know how) - ctrl-c doesn't work on the internal terminal. To make a post I had to switch to the external terminal. Xubuntu 20.04

pik33 · 2022-04-21 17:32

I compiled the new flexprop and spin2cpp

The player no longer works as expected - it cannot open files.

Something changed in the filesystem and the first impresion is the LFN support is now lost. If the file name is short, it opens. If the name is long, I cannot load the file.

iseries · 2022-04-21 19:16

@pik33 ,

You have to turn that on. It's not on by default.

In this folder: \flexprop\include\filesys\fatfs find file ffconf.h.

Look for this line:

 #define FF_USE_LFN     0

Change the value from 0 to 1 and save it. Your good to go.

Mike

pik33 · 2022-04-21 20:05

The previous version uses -DFF_USE_LFN as a compiler line option and that was enough.

But I tried to change the value to 1.

It now cannot compile the player at all/ The result while compiling:

ff.c:805: error: unknown identifier ff_oem2uni used in function call
ff.c:805: error: Unknown symbol ff_oem2uni
ff.c:887: error: unknown identifier ff_uni2oem used in function call
ff.c:887: error: Unknown symbol ff_uni2oem
ff.c:1886: error: unknown identifier ff_wtoupper used in function call
ff.c:1886: error: Unknown symbol ff_wtoupper
ff.c:2909: error: unknown identifier ff_uni2oem used in function call
ff.c:2909: error: Unknown symbol ff_uni2oem

Also I found cp932 is defined there. Japanese.

Reverting now to the older version.

iseries · 2022-04-22 11:24

Right, did run into that error, it has been a while.

This is easy to fix. As I said 99% of the FatFs was used unchanged.

Need to add a from file to this line to ff.h:

WCHAR ff_oem2uni (WCHAR oem, WORD cp) __fromfile("filesys/fatfsx/ffunicode.c"); /* OEM code to Unicode conversion */

Mike

ersmith · 2022-04-22 11:30

@pik33 : Sorry, I forgot that I had to add a #ifndef FF_USE_LFN in ffconf.h when I updated FatFs. The long file name support should be working again now.

evanh · 2022-04-22 12:32

Eric,
I don't have any test hardware with me but this is a small streamline to the second half of rcvr_mmc() function. I think it's bug-free:

// 8-bit rx (WFBYTE)
rx_bytes
        and bc, #3  wz      // up to 3 bytes
    if_z    ret

        wypin   #8, PIN_CLK     // begin SPI clocks
        wxpin   #7 | 32, PIN_DO     // 8 bits, sample after rising clock
rx_loop1
        sub bc, #1  wz      // check if last byte
rx_wait1
        testp   PIN_DO  wc      // wait for received byte
    if_nc   jmp #rx_wait1

    if_nz   wypin   #8, PIN_CLK     // begin SPI clocks for next byte
        rdpin   r, PIN_DO       // byte from rx buffer
        rev r
//      wfbyte  r           // store byte
        wrbyte  r, buff         // store byte
        add buff, #1
    if_nz   jmp #rx_loop1       // loop if more bytes

pik33 · 2022-04-22 12:56

In Basic, 1 shl 32 gives 1. Should be 0, as this 1 goes out of the register.

ersmith · 2022-04-22 15:28

@pik33 said:
In Basic, 1 shl 32 gives 1. Should be 0, as this 1 goes out of the register.

This happens in all languages; only the bottom 5 bits of the shift amount are used, and so 1 shl 32 is identical to 1 shl 0. That's the way the hardware works, and it would be expensive to try to work around this. I think you'll find something similar happens in FreeBASIC on x86, except that on a 64 bit machine it's 1 shl 64 that equals 1.

pik33 · 2022-04-22 17:22

The code I am porting, that didn't work because of this, was written for a PC. They used int variable in Java and shl it from 1 to 32. Maybe this worked because PC CPUs are 64bit based, so they accept 0..63 as the shift amount.
My ported procedure didn't work because of this and I didn't know why for a while.
Now it is worked around by isolating the special case (n=32) using if

evanh · 2022-05-03 01:43

Eric,
In disk_write() function in sdmm.c, I had earlier found a bug with order of execution in the case of single block writes. My fix was to replace this:

        if ((send_cmd(CMD24, sect) == 0)    /* WRITE_BLOCK */
            && xmit_datablock(buff, 0xFE))
            count = 0;

with this:

        if( send_cmd(CMD24, sect) == 0 )    /* WRITE_BLOCK */
            if( xmit_datablock(buff, 0xFE) )
                count = 0;

ersmith · 2022-05-03 13:49

@evanh said:
Eric,
In disk_write() function in sdmm.c, I had earlier found a bug with order of execution in the case of single block writes. My fix was to replace this:
      if ((send_cmd(CMD24, sect) == 0)    /* WRITE_BLOCK */
          && xmit_datablock(buff, 0xFE))
          count = 0;
with this:
      if( send_cmd(CMD24, sect) == 0 )    /* WRITE_BLOCK */
          if( xmit_datablock(buff, 0xFE) )
              count = 0;

Aren't those two pieces of code identical? Internally I think if (A && B) F() is implemented as if (A) { if (B) F(); }.

evanh · 2022-05-03 13:58

The first one executes in reverse order - Which is a problem when the command needs to come before the data block.

EDIT: I only found it accidentally during my optimisation testing. I've forgotten the exact circumstances though. There was some error that wasn't explained by my changes and narrowing down the error lead me to that.

ersmith · 2022-05-03 14:23

@evanh said:
The first one executes in reverse order - Which is a problem when the command needs to come before the data block.

Not as it was written in the comment. Did you mean to put the xmit_datablock() first? Because I'm sure that if (send_cmd && xmit_datablock) will be the same as if (send_cmd) if (xmit_datablock) ....

evanh · 2022-05-03 14:32

Maybe it's an optimiser thing. I've found the posting now - https://forums.parallax.com/discussion/comment/1537502/#Comment_1537502

The original code wasn't working. There was an error return code but I didn't know which one was returning the error. I split the two parts up to collect the return codes and it started working properly. I stopped looking any further at that point. So, I'm only guessing the problem is due to order of execution within the if() statement.

evanh · 2022-05-03 14:45

I'm guessing C doesn't rigidly define the execution order. ie: If the optimiser finds an advantage in reverse order then it's free to do so.

Wuerfel_21 · 2022-05-03 16:39

@evanh said:
I'm guessing C doesn't rigidly define the execution order. ie: If the optimiser finds an advantage in reverse order then it's free to do so.

It most certainly does define evaluation order for && and ||. Otherwise code like if (ptr && ptr->member) wouldn't be valid.

ersmith · 2022-05-03 17:14

I don't see any significant difference in the generated code for the two versions; the new version (with nested if's) does save one instruction, but I find it hard to believe that the timing is that sensitive that it would matter. But perhaps it does? The diff between the original (with &&) and new (with nested if's) is shown below. The difference is the same for both -O1 and -O2:

--- orig-O2.p2asm   2022-05-03 14:06:34.436157276 -0300
+++ new-O2.p2asm    2022-05-03 14:06:12.731017413 -0300
@@ -2772,20 +2772,19 @@
    add ptr__fatfs_cc_dat__, #381
    rdbyte  local06, ptr__fatfs_cc_dat__
    sub ptr__fatfs_cc_dat__, #381
-   test    local06, #8 wz
+   and local06, #8 wz
  if_e  shl local05, #9
    cmp local04, #1 wz
  if_ne jmp #LR__0131
-   mov arg01, #24
    mov arg02, local05
+   mov arg01, #24
    call    #_fatfs_cc_send_cmd_0740
-   mov local06, result1
-   zerox   local06, #7 wz
+   zerox   result1, #7 wz
  if_ne jmp #LR__0134
    mov arg01, local02
    mov arg02, #254
    call    #_fatfs_cc_xmit_datablock_0736
-   cmps    result1, #0 wz
+   cmp result1, #0 wz
  if_ne mov local04, #0
    jmp #LR__0134

It does look like in the optimizer has trouble figuring out that local06 is no longer live after the whole expression, and hence generates slightly different code in the two cases. But both sets of code look correct to me.

evanh · 2022-05-03 22:10

@Wuerfel_21 said:
It most certainly does define evaluation order for && and ||. Otherwise code like if (ptr && ptr->member) wouldn't be valid.

I'd be concerned about such a use. Funnily, without memory protection, even if your code was executed in reverse, that example wouldn't cause any problem. Whereas Eric's certainly does.

Wuerfel_21 · 2022-05-03 22:12

@evanh said:
I'd be concerned about such a use.

Look at any C program (that deals with nullable struct pointers, anyways) and you will see that pattern everywhere.

evanh · 2022-05-03 22:15

@ersmith said:
I don't see any significant difference in the generated code for the two versions; ... The difference is the same for both -O1 and -O2:

Well, I couldn't think of another reason at the time. If Ada is correct then maybe the optimiser isn't always following the standard.

ersmith · 2022-05-03 22:33

@evanh said:

@ersmith said:
I don't see any significant difference in the generated code for the two versions; ... The difference is the same for both -O1 and -O2:

Well, I couldn't think of another reason at the time. If Ada is correct then maybe the optimiser isn't always following the standard.

Comparing the generated code shows though that the optimizer is following the standard, at least in this case. The timings are slightly different; could that be the source of the difference? Or, could there have been some other bug that you fixed around the same time?

evanh · 2022-05-03 22:41

@ersmith said:
... at least in this case.

I'll have a shot at reproducing it tonight ...

No timing issues. Only thing currently timing sensitive in my code is buffer underflow/overflow [of smartpins because the SPI clock is non-stop for whole block]. The unused bit-bashed version had tight timing constraints but those were limited to within the volatile cogexec section.

EDIT: So, it's not interrupt friendly at this stage. I'd probably rewrite the low-level code again using streamer+FIFO for supporting interrupts.

evanh · 2022-05-03 22:52

Hmm, no luck. I've tried three editions of the driver: The latest edition, the pre-bug edition as posted, and the post-bug edition as posted, both bit-bashed and smartpin code paths where available. None exhibit the behaviour I'd observed. That's as far I want to go retracing it.

I'll put it down to a freak combination/bug at the time and guessing incorrectly what was wrong. There was a lot of corrupted SD cards back then. And I often used them long after they should have been reformatted.

Lol, I still feel nervous about the logical evaluation order.

ersmith · 2022-05-04 13:18

@evanh said:

>

Lol, I still feel nervous about the logical evaluation order.

I understand . But the C standard is pretty explicit that in A && B and A || B the A has to be evaluated first, and B is only evaluated if it needs to be.

Thanks for digging into this and trying to reproduce it. I was afraid there was some subtle bug in flexspin (and there probably are still some, but apparently not in this particular case).

Wuerfel_21 · 2022-05-07 13:06

I think the updated FS code has a bug somewhere in it. Commit b46478b975db78c63173e0f01b2d1995c280de58 breaks loading some ROMs in MegaYume (Red screen of death -> SEGA boot code fails to verify game checksum). I think it has something to do with running into EOF during fread, because the only ROM that causes the issue isn't a multiple of the 16k read block size I use:

PRI rom_read_block(dest) | i,j,tmp1,tmp2
'' Read a 16K block of ROM data
if tmp1 := $4000 - c.fread(dest,1,$4000,romfile)
  bytefill(dest+$4000-tmp1,0,tmp1)

Will look into this a bit deeper....

EDIT: Verified the assumption: padding the file to a 16K multiple makes it work with the new FS code.

Wuerfel_21 · 2022-05-07 14:26

Issue resolved: new code runs too fast (?) and the loop that calls rom_read_block was written in a very stupid way that just happened to work if rom_read_block is slow enough to never catch up to the PSRAM write. Odd that that never manifested with the "old FS code + faster SPI" setup.

evanh · 2022-05-07 23:15

I like those ones! New code so much faster it breaks the calling program.

evanh · 2022-05-18 08:24

Eric,
Struck what must be a regression, when testing Ahle's simpleSound - https://forums.parallax.com/discussion/172894/simplesound-a-simple-to-use-sound-engine-for-the-p2/p1

Compiling "main.spin2" produces the following error message without any reference to a source file:

Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
Version 5.9.11-beta-v5.9.10-14-g28045294 Compiled on: May 15 2022
main.spin2
|-simpleSound.spin2
|-|-reSound.spin2
|-|-trackerPlayer.spin2
warning: Internal warning, found two GETQX during constant propagate??
warning: Internal warning, found two GETQX during constant propagate??
main.p2asm
Done.

Behaviour is the binary runs but only a low audio buzzing noise is produced from speakers. Maybe with signs of the sound module still sequencing in the noise.

I'm pretty certain this wasn't a problem last time I compiled this code ... which was back in March. Certainly Pnut.exe has no issues.

flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

Comments