@Rayman said:
Thanks @ersmith
Can you pls remind me how to increase hub cache size?
There are some defines fairly early in riscvtrace_p2.spin2; you should define one of TOTAL_SIZE or CACHE_SIZE. As it stands now the cache cannot go above 64K, and probably needs to be smaller than that (I'm not sure, I haven't tested for a little while).
@Wuerfel_21 said:
Oh, loadp2 is able to load arbitrary stuff directly into flash now?
Yes. Now there are 3 ways loadp2 can write to flash. I need to consolidate them and just use the latest one. It's nice, because it uses another COG and writes one buffer to flash while another buffer is coming in over serial, and lets you write almost anywhere in flash (as long as it starts on a 4K boundary).
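The overlap described here (one buffer being written to flash while the next one arrives over serial) is a ping-pong buffer arrangement. Below is a rough host-side sketch of the idea in Python. It is not loadp2's actual code: `flash_stream`, `check_flash_address`, the `write_block` callback and the 4096-byte block size are all names and values assumed for illustration.

```python
import queue
import threading

BLOCK = 4096  # assumed alignment unit, matching the 4K boundary mentioned above

def check_flash_address(addr):
    """Reject start addresses that are not on a 4K boundary."""
    if addr % BLOCK != 0:
        raise ValueError("flash address 0x%x is not 4K aligned" % addr)
    return addr

def flash_stream(start_addr, chunks, write_block):
    """Write chunks to flash while the producer keeps receiving.

    chunks:      iterable of bytes objects (stand-in for the serial input)
    write_block: hypothetical callable (addr, data) that programs the flash
    """
    check_flash_address(start_addr)
    pending = queue.Queue(maxsize=1)  # one buffer queued while another is written

    def writer():
        addr = start_addr
        while True:
            data = pending.get()
            if data is None:          # sentinel: no more data
                return
            write_block(addr, data)
            addr += len(data)

    worker = threading.Thread(target=writer)
    worker.start()
    for chunk in chunks:
        pending.put(chunk)            # blocks only when the writer falls behind
    pending.put(None)
    worker.join()
```

The queue depth of one is the design point: the receiver can fill the next buffer while the previous one is being programmed, but it never runs further ahead than that.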
@ersmith said:
Yes. Now there are 3 ways loadp2 can write to flash. I need to consolidate them and just use the latest one. It's nice, because it uses another COG and writes one buffer to flash while another buffer is coming in over serial, and lets you write almost anywhere in flash (as long as it starts on a 4K boundary).
Are you sure it works though? When I pass -HIMEM=flash, the loaded program appears unable to access the flash at all. Forgot to clear some pin state?
It works for me on my P2 Eval board -- I can copy data to the flash & read it back, and I can run code using the riscvp2_flash variant. Are you using a different board? What address are you flashing to? I've only tested flashing to 64K boundaries, although 4K boundaries should work in principle. And the flash reading code I'm using has pretty much all been based on the simple SpiFlash example, so maybe there's a difference there?
EDIT: I see that the flash code I was using wasn't shutting down the smart pins. That didn't seem to matter to the SpiFlash code, but maybe it conflicts with other flash code? I've changed it to disable the smart pins.
@Rayman said:
On Eval board guess that means one can’t access the usd if running code from flash right?
Probably not? I guess if we force the usd access code into HUB memory and code it very carefully we might be able to, but it'll be tricky if it's even possible at all.
@ersmith Trying it with micropython (baby steps?) and get this error:
d: build/shared/libc/string0.o: in function `memset':
string0.c:(.text.memset+0x0): multiple definition of `memset'; /opt/riscv/bin/../lib/gcc/riscv-none-elf/13.2.0/../../../../riscv-none-elf/lib/rvp2_flash.o:(.text+0xa0): first defined here
collect2: error: ld returned 1 exit status
make: *** [Makefile:122: build/firmware.elf] Error 1
Ok, for some reason beyond me, they redefine memset in string0.c
Commented that out and now it compiles.
That doesn't feel like the correct solution though...
@Rayman said:
@ersmith Trying it with micropython (baby steps?) and get this error:
```
d: build/shared/libc/string0.o: in function `memset':
string0.c:(.text.memset+0x0): multiple definition of `memset'; /opt/riscv/bin/../lib/gcc/riscv-none-elf/13.2.0/../../../../riscv-none-elf/lib/rvp2_flash.o:(.text+0xa0): first defined here
collect2: error: ld returned 1 exit status
```
I made a native version of memset because I got annoyed tracing through the compiled version of it used in the startup code while I was debugging. We can probably pull it out, although it may be a little faster as direct native code than the compiled RISC-V version. I guess micropython is defining their own memset, which seems like kind of a bad idea but maybe they're trying to avoid standard libraries as much as possible.
@Rayman said:
Ok, so the firmware.bin file is 2 GB, just a hair too much I think
Yes, I think I mentioned that for flash programs you do not want to convert the .elf to .bin, since the flash code has the high address bit set. The .elf to .bin conversion is probably obsolete anyway, dating back to before loadp2 understood .elf.
Ok, seems to work now... In flexprop, added that to serial run command: "%D/bin/loadp2" -HIMEM=flash -k %P -b%r "%B" %9
Loaded the .elf and looks normal.
But, seems to be very slow.
Did "import pye" and seemed slower.
Did "pye.pye("text.txt")" and seems to be locked up...
I think there's a define called USER_MEMORY in ports/riscv-p2/main.c; at least there used to be. That sets the size of the micropython heap (which is just a static array). You'll have to change that in order to get more user memory. Obviously making it too big could be really bad -- I'm not sure if you'll get an error from anything if you do set it too high and overflow HUB ram (theoretically the linker should warn you, but I don't know if that's set up properly).
@Rayman said:
Ok, seems to work now... In flexprop, added that to serial run command: "%D/bin/loadp2" -HIMEM=flash -k %P -b%r "%B" %9
Loaded the .elf and looks normal.
But, seems to be very slow.
Did "import pye" and seemed slower.
Did "pye.pye("text.txt")" and seems to be locked up...
You might be able to find out how much slower if you run a performance script we typically used when Eric and I were developing both versions of MP. It should be in the zip file I posted here in the Micropython thread.
It just loops and prints the iterations reached after 10s runs of different routines. We had extensive results of that in that thread, so it will be useful as a reference, though that was with earlier versions of MP. To keep things fair you should compare using v1.23 of MP with flash and with HUB. Eventually maybe PSRAM. You will need the pyb.millis function working, however.
I've updated the JIT compiler a bit to have improved flash read and to slightly increase the JIT instruction cache (to a full 32K). In theory it should go up to 64K, according to the comments, but that didn't work for me -- the comments may have pre-dated support for compressed instructions. I'll have to look into this, it would be nice to get all 64K.
Not sure what code you are using for flash reads @ersmith, but I think there is some high speed dual SPI stuff floating around which might have been from @evanh IIRC. In time it would be good to optimize for reads given the transfer rate of SPI flash is relatively slow vs the other memories we have. Caching helps mitigate this of course.
@ersmith said:
I've updated the JIT compiler a bit to have improved flash read and to slightly increase the JIT instruction cache (to a full 32K). In theory it should go up to 64K, according to the comments, but that didn't work for me -- the comments may have pre-dated support for compressed instructions. I'll have to look into this, it would be nice to get all 64K.
Found the bug, it was checking for cache overflow before rounding addresses. The 64K cache works now, and I've checked it in to default to that when compiled for flash.
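For readers wondering why the order of those two steps matters, here is a small illustration in Python. It is only a sketch of this class of bug, not the code in the JIT: the 4-byte alignment, the function names and the example pointer are assumptions.

```python
CACHE_SIZE = 64 * 1024
ALIGN = 4  # assumed alignment for emitted code

def align_up(addr, align=ALIGN):
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)

def fits_check_then_round(ptr, nbytes):
    """Buggy order: test the unrounded pointer, then round it."""
    ok = ptr + nbytes <= CACHE_SIZE
    return ok, align_up(ptr)

def fits_round_then_check(ptr, nbytes):
    """Fixed order: round first, then test the address actually used."""
    ptr = align_up(ptr)
    return ptr + nbytes <= CACHE_SIZE, ptr
```

With a pointer two bytes short of the end of the cache, the first version reports that a 2-byte write fits and then hands back an address at the very end of the cache, while the second correctly reports that it does not fit.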
@rogloh said:
Not sure what code you are using for flash reads @ersmith, but I think there is some high speed dual SPI stuff floating around which might have been from @evanh IIRC. In time it would be good to optimize for reads given the transfer rate of SPI flash is relatively slow vs the other memories we have. Caching helps mitigate this of course.
Yes, there's definitely room for optimizing the reads. I'm trying to keep things as simple as possible for now, with an eye to possibly adding support for reading from the eval board SD card as well, which will be tricky in itself.
pye.pye still crashes micropython with himem enabled.
Attempted to send .elf to flash to see if any different and got this:
Could not allocate -2146800580 bytes
@Rayman said:
pye.pye still crashes micropython with himem enabled.
Crashes, or hangs? I'm guessing you're doing this on your own board with SD on different pins? Are you able to see if there's any activity on the flash pins? It could be that something is thrashing the SPI flash cache.
Attempted to send .elf to flash to see if any different and got this:
Could not allocate -2146800580 bytes
This is the same problem as trying to make binaries -- the flash creation code doesn't know how to deal with addresses with the high bit set. It's on my TODO list for loadp2, but might be a while.
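The negative byte count is what an address with the high bit set looks like when it is read as a signed 32-bit number. A quick Python check of the reported value, assuming (this is a guess at the cause, not something confirmed from the loadp2 source) that the size was derived from such an address:

```python
def as_unsigned32(n):
    """Reinterpret a signed 32-bit value as unsigned."""
    return n & 0xFFFFFFFF

def as_signed32(n):
    """Reinterpret the low 32 bits of n as a signed value."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n

reported = -2146800580           # value from the error message
raw = as_unsigned32(reported)
print(hex(raw))                  # 0x800a6c3c: high bit set
print(raw & 0x7FFFFFFF)          # 683068: what remains once the high bit is masked off
```

The same high bit would also explain the 2 GB firmware.bin, if the flat binary is filled out from low memory up to the 0x80000000 region.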
@Rayman said:
I2C code to test 9DOF accelerometer works though...
So, is all the code going to flash? Does this mean the I2C stuff is working out of flash?
Guess would mean that uSD code is working from flash too...
Yes, at the moment all of the code runs from flash. We can edit the linker script to create a special hubtext section that will run code from hub instead, if we find we need it.
Also, any way to know how big USER_MEMORY can be?
Maybe I just need to keep this DEC value below 512kB when not using flash for code?
text data bss dec hex filename
230884 16572 250156 497612 797cc build/firmware.elf
I think you have to leave some headroom too. Keeping it below 500000 should be safe (the absolute limit is 507904 but that leaves no room for stack). And of course when compiling for flash you don't have to include the text segment in the total (so keep data+bss below 500000).
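The arithmetic above is easy to script. A small Python helper, using the numbers from the size output in this thread; the function names are made up for illustration, and the two limits are simply the figures quoted above:

```python
HUB_LIMIT = 507904      # absolute limit quoted above (no room for stack)
SAFE_LIMIT = 500000     # suggested ceiling that leaves some headroom

def hub_usage(text, data, bss, code_in_flash=False):
    """Bytes of HUB RAM used; text is not counted when code runs from flash."""
    return data + bss if code_in_flash else text + data + bss

def headroom(text, data, bss, code_in_flash=False, limit=SAFE_LIMIT):
    """Bytes left under the limit; negative means the image is too big."""
    return limit - hub_usage(text, data, bss, code_in_flash)

# Values from the build/firmware.elf size output above
text, data, bss = 230884, 16572, 250156
print(headroom(text, data, bss))                      # 2388 bytes left when everything is in HUB
print(headroom(text, data, bss, code_in_flash=True))  # 233272 bytes left when text is in flash
```

So for this build, USER_MEMORY has almost no room to grow when everything lives in HUB, and roughly 230 KB of room when the text segment is in flash.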
@ersmith, I would say crashes because the vga output goes all black. Still being output though, so guess means buffer written with all zeros. USB led stops flashing too.
timeout while sending data to device
( Entering terminal mode. Press Ctrl-] or Ctrl-Z to exit. )
Never mind, see I need the flash switch set to on...
Ok, so this is a pain... Seems I have to flip the flash switch off after boot to access uSD and then look up how to mount with os.
Is that even going to work? I think not...
@Rayman debugging this pye hang is going to be really really hard, since there are so many layers of software involved (python bytecode compiler, bytecode interpreter, RISC-V JIT). It might be a timing problem, or something I missed in the JIT compiler. One thing you might try doing is turning off the RISC-V compressed code option (use -march=rv32_im_zicsr); there's some fiddly code in the 16 bit instruction decode, and perhaps I missed a case where we have to read from EEPROM.
Maybe we'd be better off letting the himem stuff in riscvp2 mature a bit, and/or trying with some code that isn't itself an interpreter.
Seems this is so close to working. So much works, but not this probably most complex test...
It would be very awesome to have this fully working though. I'm willing to do more tests...
Also, I can send you one of these SimpleP2 boards with flash on different pins than uSD if you want...
Comments
Oh, loadp2 is able to load arbitrary stuff directly into flash now?
Hmm, now if I'd have a PFS image generator...
There are some defines fairly early in riscvtrace_p2.spin2; you should define one of TOTAL_SIZE or CACHE_SIZE. As it stands now the cache cannot go above 64K, and probably needs to be smaller than that (I'm not sure, I haven't tested for a little while).
Yes. Now there are 3 ways loadp2 can write to flash. I need to consolidate them and just use the latest one. It's nice, because it uses another COG and writes one buffer to flash while another buffer is coming in over serial, and lets you write almost anywhere in flash (as long as it starts on a 4K boundary).
Are you sure it works though? When I pass -HIMEM=flash, the loaded program appears unable to access the flash at all. Forgot to clear some pin state?
It works for me on my P2 Eval board -- I can copy data to the flash & read it back, and I can run code using the riscvp2_flash variant. Are you using a different board? What address are you flashing to? I've only tested flashing to 64K boundaries, although 4K boundaries should work in principle. And the flash reading code I'm using has pretty much all been based on the simple SpiFlash example, so maybe there's a difference there?
EDIT: I see that the flash code I was using wasn't shutting down the smart pins. That didn't seem to matter to the SpiFlash code, but maybe it conflicts with other flash code? I've changed it to disable the smart pins.
On Eval board guess that means one can’t access the usd if running code from flash right?
Probably not? I guess if we force the usd access code into HUB memory and code it very carefully we might be able to, but it'll be tricky if it's even possible at all.
@ersmith Trying it with micropython (baby steps?) and get this error:
Any idea what to make of that?
Ok, for some reason beyond me, they redefine memset in string0.c
Commented that out and now it compiles.
That doesn't feel like the correct solution though...
Ok, so the firmware.bin file is 2 GB, just a hair too much I think
Regular .ld file appears to work though
I made a native version of memset because I got annoyed tracing through the compiled version of it used in the startup code while I was debugging. We can probably pull it out, although it may be a little faster as direct native code than the compiled RISC-V version. I guess micropython is defining their own memset, which seems like kind of a bad idea but maybe they're trying to avoid standard libraries as much as possible.
Yes, I think I mentioned that for flash programs you do not want to convert the .elf to .bin, since the flash code has the high address bit set. The .elf to .bin conversion is probably obsolete anyway, dating back to before loadp2 understood .elf.
Ok, seems to work now... In flexprop, added that to serial run command: "%D/bin/loadp2" -HIMEM=flash -k %P -b%r "%B" %9
Loaded the .elf and looks normal.
But, seems to be very slow.
Did "import pye" and seemed slower.
Did "pye.pye("text.txt")" and seems to be locked up...
reloaded and did this:
That seems to be about how it was before...
I think there's a define called USER_MEMORY in ports/riscv-p2/main.c; at least there used to be. That sets the size of the micropython heap (which is just a static array). You'll have to change that in order to get more user memory. Obviously making it too big could be really bad -- I'm not sure if you'll get an error from anything if you do set it too high and overflow HUB ram (theoretically the linker should warn you, but I don't know if that's set up properly).
You might be able to find out how much slower if you run a performance script we typically used when Eric and I were developing both versions of MP. It should be in the zip file I posted here in the Micropython thread.
https://forums.parallax.com/discussion/comment/1473808/#Comment_1473808
It just loops and prints the iterations reached after 10s runs of different routines. We had extensive results of that in that thread, so it will be useful as a reference, though that was with earlier versions of MP. To keep things fair you should compare using v1.23 of MP with flash and with HUB. Eventually maybe PSRAM. You will need the pyb.millis function working, however.
I've updated the JIT compiler a bit to have improved flash read and to slightly increase the JIT instruction cache (to a full 32K). In theory it should go up to 64K, according to the comments, but that didn't work for me -- the comments may have pre-dated support for compressed instructions. I'll have to look into this, it would be nice to get all 64K.
Not sure what code you are using for flash reads @ersmith, but I think there is some high speed dual SPI stuff floating around which might have been from @evanh IIRC. In time it would be good to optimize for reads given the transfer rate of SPI flash is relatively slow vs the other memories we have. Caching helps mitigate this of course.
Found the bug, it was checking for cache overflow before rounding addresses. The 64K cache works now, and I've checked it in to default to that when compiled for flash.
Yes, there's definitely room for optimizing the reads. I'm trying to keep things as simple as possible for now, with an eye to possibly adding support for reading from the eval board SD card as well, which will be tricky in itself.
pye.pye still crashes micropython with himem enabled.
Attempted to send .elf to flash to see if any different and got this:
Could not allocate -2146800580 bytes
I2C code to test 9DOF accelerometer works though...
So, is all the code going to flash? Does this mean the I2C stuff is working out of flash?
Guess would mean that uSD code is working from flash too...
Also, any way to know how big USER_MEMORY can be?
I'm guessing I should have reduced it after adding things to the micropython binary, right?
Maybe I just need to keep this DEC value below 512kB when not using flash for code?
Crashes, or hangs? I'm guessing you're doing this on your own board with SD on different pins? Are you able to see if there's any activity on the flash pins? It could be that something is thrashing the SPI flash cache.
This is the same problem as trying to make binaries -- the flash creation code doesn't know how to deal with addresses with the high bit set. It's on my TODO list for loadp2, but might be a while.
Yes, at the moment all of the code runs from flash. We can edit the linker script to create a special hubtext section that will run code from hub instead, if we find we need it.
I think you have to leave some headroom too. Keeping it below 500000 should be safe (the absolute limit is 507904 but that leaves no room for stack). And of course when compiling for flash you don't have to include the text segment in the total (so keep data+bss below 500000).
@ersmith, I would say crashes because the vga output goes all black. Still being output though, so guess means buffer written with all zeros. USB led stops flashing too.
I'll try on P2 eval...
Huh, on eval getting this:
Never mind, see I need the flash switch set to on...
Ok, so this is a pain... Seems I have to flip the flash switch off after boot to access uSD and then look up how to mount with os.
Is that even going to work? I think not...
This is the line in the main pye() routine where it chokes:
key = slot[index].edit_loop() ## edit buffer
It gets through a lot of stuff before getting here though...
BTW: It'd be convenient if the FlexProp GUI could include ".elf" in the default files when doing "Run Binary"...
Digging further, it's this line in edit_loop(self) where it goes dark:
self.redraw(self.message == "")
Looks like this is the line where it loses it:
Editor.scrbuf = [(False,"\x00")] * Editor.height ## force delete
Looks like multiplying the array by anything goes off the rails...
Actually, could be just declaring a large array breaks it.
If I make the screen tiny and also don't use * to define the array, it doesn't crash the VGA/USB, but it doesn't work either: it doesn't respond to input.
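One way to separate the two suspicions (list repetition with * versus simply allocating a large list) is to build the same buffer both ways at increasing sizes and see where it first goes wrong. A small probe script, written as plain Python so it can be pasted at the REPL; the function names and the list of heights are arbitrary choices, not anything taken from pye:

```python
def build_by_repeat(height):
    """Same construction as the Editor.scrbuf line: list repetition."""
    return [(False, "\x00")] * height

def build_by_append(height):
    """Equivalent contents built one element at a time, without *."""
    buf = []
    for _ in range(height):
        buf.append((False, "\x00"))
    return buf

def probe(heights=(1, 8, 32, 128)):
    """Print each size as it is tried, so the last line seen marks the failure."""
    for h in heights:
        print("trying", h)
        a = build_by_repeat(h)
        b = build_by_append(h)
        print("ok", h, len(a), a == b)
```

If `probe()` dies at the same height with either builder, the trigger is more likely the size of the allocation than the * operator itself.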
@Rayman debugging this pye hang is going to be really really hard, since there are so many layers of software involved (python bytecode compiler, bytecode interpreter, RISC-V JIT). It might be a timing problem, or something I missed in the JIT compiler. One thing you might try doing is turning off the RISC-V compressed code option (use -march=rv32_im_zicsr); there's some fiddly code in the 16 bit instruction decode, and perhaps I missed a case where we have to read from EEPROM.
Maybe we'd be better off letting the himem stuff in riscvp2 mature a bit, and/or trying with some code that isn't itself an interpreter.
@ersmith tried your -march and got same result.
Seems this is so close to working. So much works, but not this probably most complex test...
It would be very awesome to have this fully working though. I'm willing to do more tests...
Also, I can send you one of these SimpleP2 boards with flash on different pins than uSD if you want...
I'm guessing it has to be something to do with allocating a lot of memory, but who knows.
Very strange that so much works, but not pye...