Can't Wait for PropGCC on the P2? - Page 7 — Parallax Forums

# Can't Wait for PropGCC on the P2?

• Posts: 3,366
...If you want to know a little bit about SD card internals and what makes them tick then this teardown is an interesting read.

Nice find, so the next TACHYON version does not need a Propeller anymore but will run directly on an SD card?

Enjoy

Mike
• Posts: 1,452
Dave Hein wrote: »
dgately, please try changing the timeout value from 50000000 to 160000000 in the checktime routine in sdspi.c. That will change it from 0.625 seconds to 2 seconds. Rebuild the stdio library by running buildstdiolib in the lib directory. Then try running fsrwtest.c to see if it works. BTW, I don't think filetest.c will run correctly because a stack at 30K doesn't give it enough memory.
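The two figures in the quote above are consistent with an 80 MHz system clock: 50,000,000 ticks / 80,000,000 Hz = 0.625 s, and 160,000,000 / 80,000,000 Hz = 2 s. A minimal sketch of that conversion follows. The 80 MHz value is inferred from those numbers and is not stated in the thread, and `ASSUMED_CLKFREQ_HZ`, `ticks_to_ms` and `ms_to_ticks` are hypothetical names that do not come from sdspi.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 80 MHz system clock, inferred from 50000000 ticks == 0.625 s. */
#define ASSUMED_CLKFREQ_HZ 80000000u

/* Convert a tick count to whole milliseconds (rounds down). */
uint32_t ticks_to_ms(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000u) / ASSUMED_CLKFREQ_HZ);
}

/* Convert milliseconds to a tick count usable as a timeout value. */
uint32_t ms_to_ticks(uint32_t ms)
{
    uint64_t ticks = ((uint64_t)ms * ASSUMED_CLKFREQ_HZ) / 1000u;
    assert(ticks <= UINT32_MAX);  /* a 32-bit tick counter cannot hold more */
    return (uint32_t)ticks;
}
```

Under that assumption, `ms_to_ticks(2000)` gives 160000000, the value suggested for the checktime routine, and an 8 second timeout (640000000 ticks) still fits in 32 bits.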

OK, just back from a mission trip to Mexico... Will give this a try today.

dgately (a.k.a. Dennis)

• Posts: 1,452
dgately wrote: »
Dave Hein wrote: »
dgately, please try changing the timeout value from 50000000 to 160000000 in the checktime routine in sdspi.c. That will change it from 0.625 seconds to 2 seconds. Rebuild the stdio library by running buildstdiolib in the lib directory. Then try running fsrwtest.c to see if it works. BTW, I don't think filetest.c will run correctly because a stack at 30K doesn't give it enough memory.

OK, just back from a mission trip to Mexico... Will give this a try today.

Still not able to get either fsrwtest.c or filetest.c to get further... (Since the BeMicro A2 has 128k of RAM, I increased the stack to 104K, to run filetest.c).

I tried the 2GB and an 8GB SD card, both re-formatted as FAT32 on Windows 10 (just in case my macOS Disk Utility was not formatting the SDs in the same way as WIN).

I then increased the timeout value several times & retested to no avail (up to 8 seconds of timeout).

dgately
• Posts: 6,299
Thanks. I was hoping it would work, but since it didn't complete within 0.6 seconds I didn't think more time was going to help. I think my next step is to convert the initialization code in FSRW 2.6 from Spin to C using spin2cpp.
• Posts: 1,452
Dave Hein wrote: »
Thanks. I was hoping it would work, but since it didn't complete within 0.6 seconds I didn't think more time was going to help. I think my next step is to convert the initialization code in FSRW 2.6 from Spin to C using spin2cpp.
The fsrw 2.6 source tree from OBEX: obex.parallax.com/object/15 includes fsrw.c in the "csourse" directory.

dgately
• Posts: 6,299
Dennis, could you please try this version of sdspi.c? I hand converted the SD initialization code in safe_spi.spin to C, and integrated it with the rest of my SD SPI driver code. It works OK for me, but I want to make sure it works on your Mac.

You can test it by copying it into the top level directory of p2gcc, and then typing "p2gcc -r -t fsrwtest.c sdspi.c".
sdspi.c 14.3K
• Posts: 1,452
Dave Hein wrote: »
Dennis, could you please try this version of sdspi.c? I hand converted the SD initialization code in safe_spi.spin to C, and integrated it with the rest of my SD SPI driver code. It works OK for me, but I want to make sure it works on your Mac.

You can test it by copying it into the top level directory of p2gcc, and then typing "p2gcc -r -t fsrwtest.c sdspi.c".

Quickly getting error exit: -1... I actually have to use a p2load that I've modified to use macOS's "/dev/cu.usbserial-xxxxxxxx" style of USB devices...
$ p2gcc -v -k -o a.out fsrwtest.c sdspi.c
propeller-elf-gcc -mcog -Os -m32bit-doubles -S fsrwtest.c
s2pasm -g -p/opt/parallax/lib/prefix.spin2 fsrwtest
p2asm -c -o fsrwtest.spin2
propeller-elf-gcc -mcog -Os -m32bit-doubles -S sdspi.c
s2pasm -g -p/opt/parallax/lib/prefix.spin2 sdspi
p2asm -c -o sdspi.spin2
p2link /opt/parallax/lib/prefix.o -v -o a.out fsrwtest.o sdspi.o /opt/parallax/lib/stdio.a /opt/parallax/lib/stdlib.a /opt/parallax/lib/string.a
Found offset of 12 for symbol ___files of type W at location 1f24
$ p2load -p /dev/cu.usbserial-AE00BU4H -b 115200 -t -v a.out
[ Entering terminal mode.  Press ESC to exit. ]
errorexit: -1


BTW: I used buildlibs to rebuild the new sdspi.c into new *.a libs and copied them into /opt/parallax/lib/... So, I really didn't need to build with "p2gcc -r -t fsrwtest.c sdspi.c", and could have just done "p2gcc -r -t fsrwtest.c".

dgately
• Posts: 6,299
I really don't know what to try next. The code works fine for me on a DE2-115. I'll probably need to get my hands on a DE0-Nano or BeMicro-A2 to be able to debug this.
• Posts: 17,777
Dave,
I just got caught with the JMP/CALL #label instructions not doing what I required. Just maybe you could check this.

Unconfirmed...
#label does a relative jump/call if possible, else direct cog/lut/hub, but it's anyone's guess as to the mechanics of what is used where.
#@label does a direct jump/call. It seems to always use a hub address.
#\label does a direct jump/call. It seems to use a hub address if there is an intervening orgh, otherwise a cog/lut address. But it doesn't seem consistent.
• Posts: 2,763
From "instructions_v32.txt"
A symbol declared under ORGH will return its hub address when referenced.

A symbol declared under ORG will return its cog address when referenced,
but can return its hub address, instead, if preceded by '@':

COGINIT #0,#@newcode

For immediate-branch and LOC address operands, "#" is used before the
address. In cases where there is an option between absolute and relative
addressing, the assembler will choose absolute addressing when the branch
crosses between cog and hub domains, or relative addressing when the
branch stays in the same domain. Absolute addressing can be forced by
following "#" with "\".

CALLPA/CALLPB/DJZ..JNXRL/JNATN/JNQMT   - rel_imm9/ind_reg20
JMP/CALL/CALLA/CALLB/CALLD             - abs_imm20/rel_imm20/ind_reg20
LOC                                    - abs_imm20/rel_imm20


• Posts: 17,777
ozpropdev wrote: »
From "instructions_v32.txt"
A symbol declared under ORGH will return its hub address when referenced.

A symbol declared under ORG will return its cog address when referenced,
but can return its hub address, instead, if preceded by '@':

COGINIT #0,#@newcode

For immediate-branch and LOC address operands, "#" is used before the
address. In cases where there is an option between absolute and relative
addressing, the assembler will choose absolute addressing when the branch
crosses between cog and hub domains, or relative addressing when the
branch stays in the same domain. Absolute addressing can be forced by
following "#" with "\".

CALLPA/CALLPB/DJZ..JNXRL/JNATN/JNQMT   - rel_imm9/ind_reg20
JMP/CALL/CALLA/CALLB/CALLD             - abs_imm20/rel_imm20/ind_reg20
LOC                                    - abs_imm20/rel_imm20


It's not working as described
• Posts: 4,920
Cluso99 wrote: »
It's not working as described

In PNut or in p2asm? What problem exactly are you seeing?

Personally I really wish that branch relative/absolute was explicit all the time (by having different mnemonics for relative and absolute branches). I think it would save some headaches.

• Posts: 17,777
I keep getting told how it works, but it doesn't and it's not consistent.
Yes, for final compilers, we will require a way to specifically use Hub addresses. There are some inconsistencies, and some things that don't work now.
But we need to get the ROM compiling correctly and silicon in production first.
• Posts: 17,777
edited 2018-05-10 13:51
I should add, that all my testing for SD and the Monitor/Debugger used JMP/call #label. I had no issues with code in cog calling code in hub and returning. When I asked, the answers I received were flawed.
But when I added this to the BootROM, what was working failed when put into the FPGA v133a. It turned out (by looking at the compiled hex code) that the JMP #label now failed. I should have used JMP/call #@label but it's not always possible if you define the address as a constant. Cannot recall precise problem atm.

We also found that now we were testing pull-ups and they were incorrect. Turned out that the SD hardware on the BeMicroCV-A9 has pull-ups on all SD pins except CSn which has a very high pulldown. I then desoldered them, but there is still something pulling up some pins and is unresolved.
The next problem, discovered today, is that I believe the ROM code is incomplete in the FPGA. Everything above $FC550 is blank. I was trying to patch and jump into the SD and Monitor code (and TAQOZ) but it's not there. Then I found that the OBJ file output by pnut is incorrect too. All these are solvable with workarounds, so it's not a problem. Just wasted time that is frustrating.
• Posts: 14,699
ersmith wrote: »
Personally I really wish that branch relative/absolute was explicit all the time (by having different mnemonics for relative and absolute branches). I think it would save some headaches.

Agreed. That is much clearer and easier to read, and generate.
Other MCUs do this at the mnemonic level, not at the crypto-suffix level. eg RJMP, AJMP, LCALL etc
• Posts: 6,299
Dennis, it occurred to me that the problem you're encountering with the SD code may be due to the version of the compiler you're using. In earlier posts I commented that some of the instructions are executed in a different order when comparing your binaries with mine. Could you try running the fsrwtest.bin file that's contained in the attached zipfile?
• Posts: 1,452
Dave Hein wrote: »
Dennis, it occurred to me that the problem you're encountering with the SD code may be due to the version of the compiler you're using. In earlier posts I commented that some of the instructions are executed in a different order when comparing your binaries with mine. Could you try running the fsrwtest.bin file that's contained in the attached zipfile?

Ah, yes! Now the test is working... So, I just need to build a gcc that matches the version you are using!

Thanks,
dgately
• Posts: 6,299
What? That's great news. Now I just need to figure out what your version of PropGCC is doing. It must be messing up the order of when the data and clock lines are changed.
• Posts: 1,452
I found another gcc version (inside a copy of SimpleIDE) that has a version much closer to the version you use, but it still builds the incorrect binary.

VERSION OF ALTERNATIVE GCC:
$ propeller-elf-gcc -v
Using built-in specs.
COLLECT_GCC=propeller-elf-gcc
COLLECT_LTO_WRAPPER=/opt/parallax/libexec/gcc/propeller-elf/4.6.1/lto-wrapper
Target: propeller-elf
Configured with: ../../propgcc/gcc/configure --target=propeller-elf --prefix=/opt/parallax --disable-nls --disable-libssp --disable-lto --disable-shared --with-pkgversion=propellergcc_v1_0_0_2431 --with-bugurl=http://code.google.com/p/propgcc/issues
gcc version 4.6.1 (propellergcc_v1_0_0_2431)

Note the binary file size differences, as well...
-rw-r--r--    1 user  staff      20560 May 10 21:20 fsrw2.bin  <== my original gcc compiler build (propellergcc-alpha_v1_9_0_)
-rw-r--r--    1 user  staff      20560 May 10 21:16 fsrw2.bin    <== my recent build (propellergcc_v1_0_0_2431)
-rw-r--r--@   1 user  staff      20276 May 10 21:17 fsrwtest.bin  <== from your zip file


dgately
• Posts: 1,452
Dave, I edited p2gcc to remove the propeller-elf-gcc '-Os' optimization option and got fsrwtest.c & filetest.c to build and run linked with the new sdspi.c that you posted. Of course, this creates large binaries that may only fit the larger RAM cache of the BeMicro A2 (128K) and not the DE0-Nano (32K). But, it works! And, works with the 2GB and 8GB SD cards that I tested.

For fsrwtest.c, a 29,464 byte binary is created. For filetest.c, a 40,628 byte binary is created.

Would removing that optimization from p2gcc in your sources be a problem?
In the p2gcc shell executable file, replace:
ccstr="propeller-elf-gcc -mcog -Os -m32bit-doubles -S"
with
ccstr="propeller-elf-gcc -mcog -m32bit-doubles -S"

$ ./p2gcc -v -k -o a.bin fsrwtest.c sdspi.c
propeller-elf-gcc -mcog -m32bit-doubles -S fsrwtest.c
s2pasm -g -p/opt/parallax/lib/prefix.spin2 fsrwtest
p2asm -c -o fsrwtest.spin2
propeller-elf-gcc -mcog -m32bit-doubles -S sdspi.c
s2pasm -g -p/opt/parallax/lib/prefix.spin2 sdspi
p2asm -c -o sdspi.spin2
p2link /opt/parallax/lib/prefix.o -v -o a.bin fsrwtest.o sdspi.o /opt/parallax/lib/stdio.a /opt/parallax/lib/stdlib.a /opt/parallax/lib/string.a
Found offset of 12 for symbol ___files of type W at location 34d0
$ ls -al | grep a.bin
-rw-r--r--    1 user  staff      29464 May 11 11:15 a.bin


dgately
• Posts: 6,299
It makes sense that turning off optimizations fixes the problem. That ensures that all of the instructions are executed in the same order that they're written. However, turning off optimizations for everything is probably not the best solution. It increases the code size, and also slows down execution. With GCC it's possible to use pragmas to change the optimization in local areas. I'll look into that further.

I would like to understand exactly what is failing when using your version of the compiler. Maybe there is a more elegant way of fixing it. When GCC optimizes code it will maintain the order when it's required to get the right result, or when an external function is called, or when a volatile variable is changed. However, if it's just changing values in registers the order doesn't necessarily matter. The I/O registers OUTA, OUTB, DIRA, etc. may be treated like the general purpose registers in this case, which may be causing the problem.
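The two routes mentioned here, volatile accesses and locally changed optimization, can be sketched on a host machine as follows. This is an illustration under stated assumptions, not code from sdspi.c: `fake_outb`, `CLK_MASK`, `DATA_MASK` and both function names are hypothetical, and the thread does not confirm how the PropGCC headers declare OUTB. The `#pragma GCC optimize` and `push_options`/`pop_options` pragmas are standard GCC features; whether they behave as hoped with `-mcog` is untested here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for an output register such as OUTB.
   volatile obliges the compiler to perform every access, in program
   order relative to other volatile accesses. */
static volatile uint32_t fake_outb;

#define CLK_MASK  (1u << 0)  /* hypothetical clock pin */
#define DATA_MASK (1u << 1)  /* hypothetical data-out pin */

/* Route 1: rely on volatile alone. Data is set up first, then the clock
   is raised and lowered; these writes may not be dropped or reordered. */
void shift_out_bit_volatile(int bit)
{
    assert(bit == 0 || bit == 1);
    if (bit)
        fake_outb |= DATA_MASK;
    else
        fake_outb &= ~DATA_MASK;
    fake_outb |= CLK_MASK;   /* rising edge */
    fake_outb &= ~CLK_MASK;  /* falling edge */
}

/* Route 2: lower the optimization level for this one function only.
   The guard lets non-GCC compilers skip the pragmas. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O0")
#endif
void shift_out_bit_unoptimized(int bit)
{
    assert(bit == 0 || bit == 1);
    if (bit)
        fake_outb |= DATA_MASK;
    else
        fake_outb &= ~DATA_MASK;
    fake_outb |= CLK_MASK;   /* rising edge */
    fake_outb &= ~CLK_MASK;  /* falling edge */
}
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Both routes only constrain the order of the accesses. A compiler is still free to implement each `|=` or `&=` as a single read-modify-write instruction, and neither route says anything about how many clock cycles pass between two accesses.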

Could you please run "p2gcc -k sdspi.c" on the attached sdspi.c file, and send me the resulting a.out and all the sdspi.* files. I will then compare your results with mine to see where the optimizer decided to change the order of the instructions.
• Posts: 1,452
Dave Hein wrote: »
Could you please run "p2gcc -k sdspi.c" on the attached sdspi.c file, and send me the resulting a.out and all the sdspi.* files. I will then compare your results with mine to see where the optimizer decided to change the order of the instructions.

Attached...

dgately

• Posts: 14,699
Dave Hein wrote: »
...

I would like to understand exactly what is failing when using your version of the compiler. Maybe there is a more elegant way of fixing it. When GCC optimizes code it will maintain the order when it's required to get the right result, or when an external function is called, or when a volatile variable is changed. However, if it's just changing values in registers the order doesn't necessarily matter. The I/O registers OUTA, OUTB, DIRA, etc. may be treated like the general purpose registers in this case, which may be causing the problem.....

Hmm, situations where compiler versions can break code do not sound like a great place to be...
The good news is the issue has been exposed.

• Posts: 6,299
I diffed the assembly files and the later version generates more efficient code when changing the I/O registers. A sample of the diff is as follows:
227,233c208,210
< .L29
< 	mov	r6, OUTB
< 	and	r6, r3
< 	mov	OUTB, r6
< 	mov	r6, OUTB
< 	or	r6, r5
< 	mov	OUTB, r6
---
> .L28
> 	and	OUTB,r3
> 	or	OUTB,r5

The older version uses 3 instructions to modify OUTB, and the newer version uses only 1 instruction. The INB register is read immediately after this, which makes me think there is a problem with pipeline delays. However, in both cases the rising edge of the clock happens immediately before INB is read. So if there is a problem with pipeline delays it seems like it should affect both versions the same. Also, during some of my debugging tests I did try putting delays after each change of the clock bit in OUTB, and that didn't make any difference.

I didn't see any re-ordering issues in sdspi.s, so that doesn't seem to be the problem. I'll keep looking at it.
• Posts: 17,777
IIRC INB is read about 6+ clocks prior to the instruction due to pipeline delays.
• Posts: 6,299
Dennis, I think I found a fix for the problem. I used your sdspi.s, and I was able to reproduce the problem. I added a delay between when the clock is changed and the INB register is read. A "waitx #1" still had the problem, but a "waitx #2" fixed it. So the problem is indeed caused by the pipeline delays on the I/O registers. In the C code I added a call to a delay function. Eventually, I'll probably use assembly drivers for reading and writing.
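The effect described here can be reproduced with a toy host-side model in which a read of the input pin returns the level it had a fixed number of ticks earlier. Everything in the sketch is an assumption made for illustration: `SIM_LATENCY`, the one-tick cost of a clock write, and all `sim_*` names are hypothetical. The latency of 2 was picked so that the model mirrors the "delay of 1 fails, delay of 2 works" observation; it is not a measured P2 figure and the ticks do not correspond to `waitx` cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: SIM_LATENCY is a hypothetical number of ticks before a
   pin change becomes visible to a read. Not a measured P2 figure. */
#define SIM_LATENCY 2u
#define HIST_LEN    8u   /* ring buffer of recent MISO levels, one per tick */

static uint8_t  sim_byte;            /* byte the simulated card shifts out */
static unsigned sim_bits_sent;       /* rising edges seen so far */
static int      sim_clk;             /* current clock level */
static int      sim_miso[HIST_LEN];  /* MISO level at each recent tick */
static unsigned sim_tick;            /* current tick */

void sim_reset(uint8_t byte)
{
    unsigned i;
    for (i = 0; i < HIST_LEN; i++)
        sim_miso[i] = 0;             /* line idles low in this model */
    sim_byte = byte;
    sim_bits_sent = 0;
    sim_clk = 0;
    sim_tick = HIST_LEN;             /* start away from zero for the lookback */
}

/* Let n ticks pass while MISO holds its level. Stands in for a delay call. */
void sim_advance(unsigned n)
{
    while (n--) {
        int level = sim_miso[sim_tick % HIST_LEN];
        sim_tick++;
        sim_miso[sim_tick % HIST_LEN] = level;
    }
}

/* Drive the clock; the write itself costs one tick. On a rising edge the
   simulated card presents its next bit, MSB first. */
void sim_set_clk(int level)
{
    assert(level == 0 || level == 1);
    sim_advance(1);
    if (level && !sim_clk) {
        sim_miso[sim_tick % HIST_LEN] =
            (sim_byte >> (7u - (sim_bits_sent & 7u))) & 1;
        sim_bits_sent++;
    }
    sim_clk = level;
}

/* Read MISO as it was SIM_LATENCY ticks ago. */
int sim_read_miso(void)
{
    return sim_miso[(sim_tick - SIM_LATENCY) % HIST_LEN];
}

/* Bit-banged read with a configurable delay between the rising clock
   edge and the sample. */
uint8_t spi_read_byte(unsigned delay_ticks)
{
    uint8_t value = 0;
    int i;
    for (i = 0; i < 8; i++) {
        sim_set_clk(1);
        sim_advance(delay_ticks);
        value = (uint8_t)((value << 1) | sim_read_miso());
        sim_set_clk(0);
    }
    return value;
}
```

With the simulated card loaded with 0xA5, `spi_read_byte(2)` returns 0xA5, while delays of 0 or 1 return 0x52: every bit arrives one clock late and the first sample sees the idle line.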

Could you please try the attached sdspi.c file to see if it works for you?
• Posts: 1,452
Dave Hein wrote: »
Could you please try the attached sdspi.c file to see if it works for you?

I got a good result with this version of sdspi.c.

Thanks for digging into this!

dgately

• Posts: 6,299
Thanks for testing the code I sent you. I'll check the fix into GitHub, and I'll post a new release soon with all the fixes.
• Posts: 11,069
Dave,
Cluso is correct, there is significant lag between the I/O and the Cogs. See this code - https://forums.parallax.com/discussion/comment/1426180/#Comment_1426180

There, a clock pulse is sent out then a data bit is read in. However, the data bit read is from the previous clock pulse. Note Peter says it is the low going clock edge that does the clocking, and Chip says the TESTP samples from before the coded clock pulse.

So, that particular clock pulse is to bring in a possible subsequent data bit if the reading loop repeats. If the loop exits, then that clock pulse is extraneous.

• Posts: 6,299
evanh, thanks for the information and the link. That confirms what I am seeing.