Basic Q's about the Spin2 Interpreter Size, Content and Loading

JRetSapDoog · 2021-07-05 05:09

I'm using the Propeller Tool ver. 2.5.3. When I press F8 to compile/view info, the popup window reports 4232 bytes being used for the Spin2 interpreter, and those bytes are displayed in the hex viewer on the right and colored pink. About this, I have a few questions (please pardon, but point out, any misconceptions in my phrasing of such questions):

[1] Will that always be the same total no matter what my program is? I looked at a handful of programs and it appears to be fixed. Perhaps this total could change for a future version of Spin2 (such as with built-in floating point), but my meaning is just for any particular version used for various programs. I would guess that the interpreter "engine" for processing byte codes would have a fixed size (and it seems to have, as mentioned), but I just wanted to check.

[2] How fixed is the content of those 4232 bytes? I assume that most of it is fixed. But I imagine certain values have to change, such as for the clock speed perhaps. I wonder if there is a certain range of that space that is totally fixed (seems likely). And I wonder if there is any sort of guide with insights into the interpreter, kind of like Cliff Biffle's humorous early Propeller Binary Format PDF guide for the P1 (Yes, I think that the interpreter code is available, but I mean a high-level description or overview for mere mortals).

[3] When the Prop2 boots (or otherwise loads in a program), will this exact same data load into the hub byte-for-byte starting at hub location $0 of the hub (without any gaps) and "march" upwards, not only for the interpreter, but also for the code/data section next and then the var section? Or do things get rearranged (or some things omitted/added)? That is, is the image that I see in the hex viewer on the right after pressing F8 in the PT the exact same image that will load into the hub? And if not, what are the differences, if I may ask?

[4] And what portion of this displayed interpreter code gets loaded into cog 0's registers (starting from where in the hub to where in the hub)? I've forgotten (or never fully appreciated) the details, but I think that the Spin2 interpreter utilizes substantial portions of a cog's RAM and LUT. But that's at most 2x512x4 = 4096 bytes (ignoring the fixed registers at the end of RAM), but the above 4232 is bigger than that by 136 bytes. So that means that some of the data just stays in the hub. Also, I assume that such loading is handled by the fixed firmware of the P2 (that is, there's no code in the interpreter (that gets loaded in) that is handling the cog loading). Please correct me if I'm wrong or have over simplified that (though starting byte code address(es) might be an exception, as below).

[5] And similar to question 2, for other cogs that get loaded (not just cog 0 on booting) with Spin2 interpreters, where are the addresses stored that tell those cogs where to start pulling their byte codes from? And are such addresses fixed upon compiling, or does a portion of the address need to be calculated at run time? Hmm, on re-reading this before posting, it occurs to me that there is probably just one copy of the interpreter code in a complete program's image (no matter how many interpreters it launches). Sorry, I didn't test that, but that seems likely to me to reduce code size. And given that one could tell a cog to fire up different methods, then I guess that the starting addresses for the byte codes must be dynamically calculated (or at least set) at runtime. I guess there's probably just a location in each Spin2 cog's interpreter instance that has this starting address. Of course, for the current chip, at max, only 8 interpreters can run at one time, so there could be one table somewhere for all the starting addresses, but that seems unlikely to me.

[6] Lastly, and this question diverges a bit from the above theme, are there any special values in the hub image that are acted upon by anything other than a cog? That is, is there anything in either the P2's firmware or even its hardware that directly accesses value in the hub? Or does everything go through a cog? For example, when the clock speed is set (other than the default value) in the P2's hardware, that's done through the "supervision" of a running cog, is it not? But, for example, is there anything in the P2 hardware (that's hardwired in, whether globally or on a per cog basis) that "reaches" into the hub for values (or that changes values in the hub) on its own without explicit direction from running code in a cog? I'm sure that the cogs do almost everything, but do they do absolutely everything? And feel free to mention any exceptional behavior by the firmware during bootup, if any.

I know that these are basic questions for most of you long-term followers, but I've never really given such concepts much thought. And on thinking about them today, I realized that there is a lot that I don't know. Ignorance may be bliss, but it ain't that bliss. Plus, in commenting (should anyone be gracious enough to wade through this), feel free to add anything that's related or that I haven't even thought to ask about. Thanks. --Jim

Cluso99 · 2021-07-05 05:33

@JRetSapDoog said:
I'm using the Propeller Tool ver. 2.5.3. When I press F8 to compile/view info, the popup window reports 4232 bytes being used for the Spin2 interpreter, and those bytes are displayed in the hex viewer on the right and colored pink. About this, I have a few questions (please pardon, but point out, any misconceptions in my phrasing of such questions):

[1] Will that always be the same total no matter what my program is? I looked at a handful of programs and it appears to be fixed. Perhaps this total could change for a future version of Spin2 (such as with built-in floating point), but my meaning is just for any particular version used for various programs. I would guess that the interpreter "engine" for processing byte codes would have a fixed size (and it seems to have, as mentioned), but I just wanted to check.

Chip keeps adding features, so it will change. You cannot rely on it being fixed or in any fixed location.

[2] How fixed is the content of those 4232 bytes? I assume that most of it is fixed. But I imagine certain values have to change, such as for the clock speed perhaps. I wonder if there is a certain range of that space that is totally fixed (seems likely). And I wonder if there is any sort of guide with insights into the interpreter, kind of like Cliff Biffle's humorous early Propeller Binary Format PDF guide for the P1 (Yes, I think that the interpreter code is available, but I mean a high-level description or overview for mere mortals).

As for [1]. The P1 interpreter is available but with few comments. If you want to understand it better, take a look at my faster version, as I commented it and there is documentation about the bytecodes used.

[3] When the Prop2 boots (or otherwise loads in a program), will this exact same data load into the hub byte-for-byte starting at hub location $0 of the hub (without any gaps) and "march" upwards, not only for the interpreter, but also for the code/data section next and then the var section? Or do things get rearranged (or some things omitted/added)? That is, is the image that I see in the hex viewer on the right after pressing F8 in the PT the exact same image that will load into the hub? And if not, what are the differences, if I may ask?

When P2 boots, it loads the ROM into various sections of the RAM. The listing is published, but there may be differences depending on what paths were taken in the boot. eg Flash, SD, etc. There is no Interpreter loaded unless code is booted from Flash, SD or downloaded.

[4] And what portion of this displayed interpreter code gets loaded into cog 0's registers (starting from where in the hub to where in the hub)? I've forgotten (or never fully appreciated) the details, but I think that the Spin2 interpreter utilizes substantial portions of a cog's RAM and LUT. But that's at most 2x512x4 = 4096 bytes (ignoring the fixed registers at the end of RAM), but the above 4232 is bigger than that by 136 bytes. So that means that some of the data just stays in the hub. Also, I assume that such loading is handled by the fixed firmware of the P2 (that is, there's no code in the interpreter (that gets loaded in) that is handling the cog loading). Please correct me if I'm wrong or have over simplified that (though starting byte code address(es) might be an exception, as below).

No Interpreter code gets loaded into Cog on a raw boot, only the ROM boot code. A listing is available.

[5] And similar to question 2, for other cogs that get loaded (not just cog 0 on booting) with Spin2 interpreters, where are the addresses stored that tell those cogs where to start pulling their byte codes from? And are such addresses fixed upon compiling, or does a portion of the address need to be calculated at run time? Hmm, on re-reading this before posting, it occurs to me that there is probably just one copy of the interpreter code in a complete program's image (no matter how many interpreters it launches). Sorry, I didn't test that, but that seems likely to me to reduce code size. And given that one could tell a cog to fire up different methods, then I guess that the starting addresses for the byte codes must be dynamically calculated (or at least set) at runtime. I guess there's probably just a location in each Spin2 cog's interpreter instance that has this starting address. Of course, for the current chip, at max, only 8 interpreters can run at one time, so there could be one table somewhere for all the starting addresses, but that seems unlikely to me.

The ROM only loads Cog 0. As I said, no interpreter on raw boot.

[6] Lastly, and this question diverges a bit from the above theme, are there any special values in the hub image that are acted upon by anything other than a cog? That is, is there anything in either the P2's firmware or even its hardware that directly accesses value in the hub? Or does everything go through a cog? For example, when the clock speed is set (other than the default value) in the P2's hardware, that's done through the "supervision" of a running cog, is it not? But, for example, is there anything in the P2 hardware (that's hardwired in, whether globally or on a per cog basis) that "reaches" into the hub for values (or that changes values in the hub) on its own without explicit direction from running code in a cog? I'm sure that the cogs do almost everything, but do they do absolutely everything? And feel free to mention any exceptional behavior by the firmware during bootup, if any.

Basic answer, no. Hardware doesn't modify the hub ram, except for copying the top 16KB from the serial ROM at raw boot.


JRetSapDoog · 2021-07-05 05:57

Thanks for those responses, Cluso99. Much appreciated! That gives me something to think about.

Also, in partial answer to Q4, I just found this from page 4 of the interpreter thread (though the exact details may have changed since then): "Cog registers $000..$162 are free for user programs."
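
The size arithmetic from Q4, together with the register range quoted above, can be checked with a few lines of Python. This is only a sketch: the 4232-byte figure is the one reported by the Propeller Tool in the first post, the $000..$162 range is the one quoted from the interpreter thread (and may have changed since), and the variable names are made up for illustration.

```python
LONG_BYTES = 4
COG_RAM_LONGS = 512          # cog register RAM, in longs
LUT_LONGS = 512              # cog lookup RAM, in longs
INTERPRETER_BYTES = 4232     # as reported by Propeller Tool 2.5.3 (first post)

# Upper bound on what could live in one cog (RAM + LUT), as in Q4.
cog_plus_lut_bytes = (COG_RAM_LONGS + LUT_LONGS) * LONG_BYTES
excess_bytes = INTERPRETER_BYTES - cog_plus_lut_bytes

# Registers $000..$162 inclusive are quoted as free for user programs.
user_free_longs = 0x162 - 0x000 + 1
user_free_bytes = user_free_longs * LONG_BYTES

print(cog_plus_lut_bytes)                 # 4096
print(excess_bytes)                       # 136
print(user_free_longs, user_free_bytes)   # 355 1420
```

So even before counting the special registers at the top of cog RAM, a sizeable part of the cog is not occupied by interpreter code, which suggests that more than 136 bytes of the 4232 stay in the hub.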

evanh · 2021-07-05 06:00

Answer[6]: Aside from a hard reset, hub-ops are only initiated by cogs. The HUBSET instruction manipulates some special hardware registers that are truly in the hub. In the case of the system clock, there are 25 flip-flops forming the mode bits of the clock mode register. There is only one of each hub register and they share a common bus between all cogs, which is why each cog's hub-ops are enforced into interleaving time slots.

There are other parts of the Prop2 that do have their own processing:

The FIFOs, timing wise, are relatively independent of the cogs and do modify hubRAM. They depend on their respective cog for direction but access hubRAM at their own pace and are bound to slot timing like the hub-ops are.
Streamers, which heavily use the FIFOs, are part of the cogs but also run in parallel and pace themselves once set going.
The 64 smartpins each do their own thing independent of all else, although they do nothing by default and never interact with hubRAM.
The cordic is a hub resource. It can accept a new command on every clock tick but this is time sliced between all cogs. So all the instructions for cordic-ops, except maybe GETQX/Y, have the slot timing effect. The cordic runs in parallel to the cogs while it has any command in its pipeline.

JRetSapDoog · 2021-07-05 06:11

Thanks, evanh. More to think about. Now as for those three "other parts" bullets, at least we know that such automatic or semi-automatic behavior is instigated by the cogs (so known to a high-level programmer).

evanh · 2021-07-05 06:15

Added a fourth.

JRetSapDoog · 2021-07-05 06:17

Back to Cluso99 saying:

"There is no Interpreter loaded unless code is booted from Flash, SD or downloaded. ... No Interpreter code gets loaded into Cog on a raw boot, only the ROM boot code. ... The ROM only loads Cog 0. As I said, no interpreter on raw boot."

Thanks for setting me straight on that. And I need to be careful how I use the word "boot." Also, to other developers, sorry if I'm being too Spin2-centric; I realize that PASM code for other environments can load in rather than a Spin2 interpreter (including those that can really compile Spin2).

evanh · 2021-07-05 06:19

Yeah, different from Prop1 where the Spin runtime environment was always there first, as part of the ROM boot code.

JRetSapDoog · 2021-07-05 06:57

Thanks, folks.

By the way, the reason that I got to thinking about this is kind of weird and tricky to explain (and the "Why?" is mostly just as a learning step, not as something that is needed, per se). I wanted to try a little experiment. I wanted to see if I could have a cog with a Spin2 interpreter (let's assume that it's a cog other than cog 0) load an image of a program (say something short and simple that just toggles an LED) back into the hub and later cog 0, the image being stored as data in the hub but as a part of the object for the program being run by that cog.

That is, I wondered if I could save the binary for such a (short) program and then include that data in an object to be launched into a cog. Yes, obviously, that's all possible. But then I'd have that cog (not cog 0) move up that image from wherever it was in the hub to the start of the hub (I assume that I can use bytemove to do that as long as I don't overwrite anything before it has been moved). So that would get that image into position just like the original program didn't exist (though remnants would be left higher up in the hub). And having done that, I then wondered if I could load the interpreter portion of that image into cog 0 and kick it off and then have the cog stop itself.

So that's why I wanted to know if the data (hex) image we see in the Prop Tool loads into the hub verbatim starting from $0. I'm not sure that that was answered above, but I think that that is the case. But when it comes to loading a Spin2 interpreter into cog 0, I now see that it is not as straightforward as just copying some portion of that hub image into a cog (since, for example, the first portion of a Spin2 interpreter cog is available to the user--for inline assembly or whatever, I guess). So what portion of the hub image needs to be moved into a cog to have a working interpreter and where to move it (and whether in cog RAM or LUT), I know not. Maybe it's simply moving one chunk here or perhaps one chunk here and another there, but I don't know. Plus, it's likely that some of the RAM and LUT that are not loaded need to be cleared (assuming that everything is not overwritten when moving in the interpreter code).

Cluso99 · 2021-07-05 07:48

The problem with the P2 Interpreter is that once it is compiled, the software does not really know where anything else is, so that has to be determined at runtime. So a PASM program does not know where it will be loaded, and there are problems accessing fixed locations/buffers/variables without getting a runtime pointer to them. This may or may not impact what you are thinking about.

JRetSapDoog · 2021-07-05 11:31

Thanks, Cluso99. I guess that in this scenario, everything will be okay because it is the Prop Tool compiler that created the binary that I want to "load" or copy to $0, so it should take care of everything to let things resolve at runtime, if there is anything to resolve (as it's just the one cog). I'm not sure about more complex programs that utilize multiple cogs concurrently (haven't thought about that yet).

Update: Well, that's assuming that I kill off all the other cogs before doing a coginit on Cog 0, and then rapidly kill off the so-called loader. Also, I'm not sure about any clock speed differences (or other settings) between the loader and the binary being loaded to run. Plus, I'll bet there are a lot of other things that I'm not considering.

JRetSapDoog · 2021-07-05 12:06

Okay, what I did since posting the poorly-worded questions above was write the simple attached program. It starts a cog (cog 0) that starts another cog (likely cog 1) that copies a binary "image" of the Hello Blinky program from a Dat section into the Hub at $00000, that binary image being compiled by the Prop Tool.

With Cog 0, it slowly blinks the right LED (P57) three times and then launches a second cog (I'll call it Cog 1) and then it dies a natural death. After Cog 1 begins executing, it more quickly blinks the LED on P57 three times, then copies the binary from the Dat section to the Hub at $0. It then pauses for a second, before blinking quickly three more times.

It then does a coginit(0, $00000, 0) to try to load Cog 0 with the binary that was copied to Hub $0, before stopping itself. The compiled Hello Blinky binary (image) does appear to work, as P56 blinks forever, which was what it was coded to do. As such, perhaps copying a compiled binary image to the Hub at $0 and calling coginit is enough to load a program??? I don't know that for sure. Would this only work reliably if the clock frequency between the "loader" program and the binary are the same? If so, could I get the clock info from $40 of the hub (I'm assuming a Prop Tool compiled Spin2 program here) after copying the binary from the Dat section? Or maybe the compiler will take care of setting the clock to whatever just fine, especially if I quickly kill the "loader" cog.

A couple of notes:
[1] The state of the LEDs reverts to their default state (lit) upon copying the binary to the Hub at $0. That's before coginit has been called.

[2] I tried having Cog 1 blink the P57 LED perpetually, and that worked. But I believe that only worked because the Hello Blinky binary (A) didn't kill off Cog 1, and (B) the binary image footprint was small and didn't overwrite the bytecodes for the "loader" program in Cog 1 that blink the LED on P57. Does that sound right to you folks?
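
The reasoning in note [2] (the copy is harmless as long as the image footprint does not reach the loader's own bytecodes) can be modelled off-target. The Python below is a minimal sketch, not P2 code: the hub is a plain bytearray, and the image and loader addresses and sizes are invented placeholders, not values taken from the attached program.

```python
HUB_BYTES = 512 * 1024  # P2 hub RAM size

def ranges_overlap(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a+len_a) and [start_b, start_b+len_b) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a

def copy_image_to_zero(hub, src, length):
    """Model of moving the image from its Dat location down to hub $0."""
    # Taking a bytes() snapshot first keeps the copy correct even if
    # the source and destination ranges overlap.
    hub[0:length] = bytes(hub[src:src + length])

# Hypothetical layout, for illustration only.
IMAGE_SRC, IMAGE_LEN = 0x3000, 4400      # image stored in a Dat section
LOADER_CODE, LOADER_LEN = 0x1200, 0x200  # loader bytecodes still being executed

hub = bytearray(HUB_BYTES)
hub[IMAGE_SRC:IMAGE_SRC + IMAGE_LEN] = bytes([0xA5]) * IMAGE_LEN
hub[LOADER_CODE:LOADER_CODE + LOADER_LEN] = bytes([0x5A]) * LOADER_LEN

# The copy is only safe if the destination [0, IMAGE_LEN) misses the loader.
safe = not ranges_overlap(0, IMAGE_LEN, LOADER_CODE, LOADER_LEN)
if safe:
    copy_image_to_zero(hub, IMAGE_SRC, IMAGE_LEN)

loader_intact = hub[LOADER_CODE:LOADER_CODE + LOADER_LEN] == bytes([0x5A]) * LOADER_LEN
print(safe, loader_intact)
```

With a larger image the same check returns an overlap, which is the case where the loader would overwrite the code it is still running.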

So this was just a test/experiment to see if loading a binary into the Hub from a Cog would work. I now see from the manual that something similar can be done externally with hex text over serial. So this won't be news to you guys (for that reason and others), but it was news to me (and I never did anything like this on the P1).

But the reason that I really did this test and asked the question above is that I'm wondering if a similar method could be used to start a new program (binary image) by reading it off of an SD card (rather than hardcoding its binary image data into a Dat section). If this can work, then I don't think that I would have to worry about whether the file on the SD card was fragmented because I think/hope that the SD card driver (FAT32 and SPI) would take care of all of that for me. Does that seem right? I'm not sure about whether I'd have to worry about the clock, and I could have the "loader" (such as it is) kill off all other cogs before loading Cog 0 and then kill off itself (assuming that the copying process doesn't kill it off first (i.e., it shoots itself in the foot)). If it matters, in the foregoing scenario, the binaries (programs) that I would be interested in loading and running from the SD card would be over half video buffer areas (as opposed to actual code). I guess those would be variables (Var arrays), and hence they would be allocated at (or towards) the end of the binary image. Anyway, I need to think about all of that more, and about potential caveats to this method.

I guess I should have said what I intended to do right from the start rather than asking the six questions in the first post. Sorry about that.

JRetSapDoog · 2021-07-05 12:36

And even if this method will work, I wonder what other things need to be considered (or what the potential caveats are). Also, is there some kind of checksum byte/word/long in the binary image that can be used to check the loaded data against? Though if there is, it doesn't appear that it's mandatory to check it (based on the above experiment), at least not by the "loader" doing the loading.
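
Whether the compiled image itself carries a checksum is left open here. Independently of that, a loader can apply a check of its own, for example by comparing a sum computed over the loaded bytes against a value stored next to the file. The Python sketch below shows one such scheme (a 32-bit wrapping sum of little-endian longs); the scheme and the function name are assumptions chosen for illustration and say nothing about the actual Spin2 image format.

```python
import struct

def sum_of_longs(image: bytes) -> int:
    """32-bit wrapping sum of the image, read as little-endian longs.

    The image is zero-padded to a multiple of 4 bytes first.
    """
    padded = image + b"\x00" * (-len(image) % 4)
    total = 0
    for (value,) in struct.iter_unpack("<I", padded):
        total = (total + value) & 0xFFFFFFFF
    return total

# Usage: compute once on the PC, store the value, and recompute after loading.
image = b"\x01\x00\x00\x00\x02\x00\x00\x00"
expected = sum_of_longs(image)
loaded_ok = sum_of_longs(image) == expected
print(expected, loaded_ok)  # 3 True
```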

JRetSapDoog · 2021-07-05 16:47

I didn't want to hijack Wuerfel_21's "Actually Functional Spin2 SD Driver" thread any more than I already had, so I started this thread. First, what I'm trying to do, Wuerfel_21 calls "file booting." That is, it's not power-on or hard reset booting. And Kye's method to do that on the P1 is called "bootPartition()," which is described as "Loads the propeller chip's RAM from the specified file." It loads it and runs it. That's the goal. BTW, although the method is still present in Wuerfel_21's P2 version and does compile, I assume that it doesn't work based on Wuerfel_21's comment that says "File booting is not yet implemented (but should be easy to add)." But maybe I should try it, anyway, and see what happens, ha-ha. No, I'll assume that it doesn't work. And in that it's "easy to add," I guess it was left to the reader to do as an exercise. So here we are. Anyway, here are three other relevant comments from that thread:

I. From Wuerfel_21: Booting mostly has to do with the block driver. It basically has to stop all the other cogs, then load the file into RAM, init cog 0 and then stop itself.
On P2 there's the added issue that the file being booted may be fragmented (this is mostly irrelevant to P1 since the cluster size on SD cards is usually 32k)

II. From deets: So within this context one could write a firmware flat into the SD-Card from a known start Block (probably 0) and just read it sequentially. Or use FAT32 and then load the firmware from a file that can be spread all over the place.

III. From msrobots: The way Kye's driver boots on the P1 is to read the FAT to get the sector number chain into the block driver cog. This COG then kills all other ones, loads the file into HUB, sets the clock to RCFAST and starts itself with the SPIN interpreter. ... On the P2 things are a bit more complex because you need the existing clock settings to switch the PLL off, ...

Regarding I, although I'm only simulating reading from the SD card using a Dat section that has a compiled binary in it, it sounds like I'm basically doing that. Hopefully, I will get around to trying this from an SD card in the next day or so. But I don't really understand the "fragmentation" part of the comment because I thought that the FAT32 object took care of that for us. Currently, in another program, I read in ~700 byte chunks of a 1.5MB text file seemingly without a hitch. However, the file isn't likely fragmented. Still, even if it was, wouldn't the FAT32 object take care of traversing the file (such that I wouldn't see the fragmentation at a higher level)?

Regarding II, I'm not sure what a "firmware flat" refers to or whether it's a typo, but hopefully I don't need to deal with SD card blocks. So I'll ignore that part. But then that comment says "Or use FAT32 and then load the firmware from a file that can be spread all over the place." Yep, that's the goal. And again, I'm currently assuming/hoping that the FAT32 object takes care of following the chain of sectors (or whatever they're called) to move through the file.

Regarding III and the part about getting "the sector number chain into the block driver cog," I'm hoping that Wuerfel_21's version of Kye's driver still does that for regular file access tasks, such that I don't need to do that. At least, that's going to be my working assumption for now, since I know that I can move through a big text file without dealing with that in my code. As for the part that says "This COG then kills all other ones [and] loads the file into HUB," that's kind of what I've simulated using a compiled binary in a Dat section (though I didn't actually kill off all cogs because only two were ever launched).

However, I did not set the clock to RCFAST in my little test (and I'm sure that coginit() wouldn't do that when just loading a single cog). Also, I didn't do any kind of chip reset (hubset or whatever it's called (sorry, haven't directly used it yet)), at least not yet, anyway. So in that I didn't set the clock to RCFAST, did I just get lucky? Or is it that it's unnecessary if one's old (original) program and new (loaded compiled binary) program have the same clock/PLL settings? Mine did. Maybe that's why I got away with it. Did I basically just inherit the existing clock (from the original Cog 0 interpreter) since there was no real boot or chip reset, even though the compiled binary does specify the clock frequency, and it just happened to match, so no problem? I suppose that's likely. So it sounds like I might have to deal with the clock (unless all the programs that I might file boot use the same clock frequency, which is almost for sure not something I'd want to depend on).

And then msrobots also says that I'll need to shut the PLL off (after getting the clock setting). I'll need to read the manual to understand more about the clock, because I'm not sure if, or how much, this part about shutting the PLL off overlaps with switching to RCFAST. And if the existing clock settings are needed to do this, as msrobots said, then I guess I can get those from Hub $40 and/or $44 (in the case of Chip's Spin2 interpreter as it's currently compiled). So, that's something for me to look into.

However, I think what I'll do first is just ignore the clock, like I did in my little simulation above using a compiled binary from a Dat section, and see if I can get that to work from an SD card. Hopefully, that will be straightforward (and similar to, or the same as, reading a text file to the end). That is, my old (original) and new (file to "boot" from) programs will use the same clock/PLL settings, such that, hopefully, RCFAST mode and/or shutting off the PLL aren't necessary. Then, if that works, and hopefully it will, I'll try to add code to deal with the clock settings (including adding a 10ms or so pause between switchover that I saw somewhere to allow things to stabilize). But I like to take things one step at a time, hence the little experiment in the attached file above, then file booting from SD at the same clock frequency/settings first. So that's the plan. And as for adding checksum code, as asked about above, I'll just ignore that for now (since reading a text file from an SD card has been so reliable), even though, long term, that presents a risk of some kind of P2 damage (though I think that the risk is fairly low in my case and this is not for any mission critical usage).

Okay, sorry if I've provided too much detail and asked too many questions. But maybe this will be useful to someone else in understanding things better. In researching things to get this far, I came across a thread created by Jon (sorry, I don't have a link at the moment) asking, I believe, if Spin2 could add a function to do just this sort of thing (file booting, though not perhaps by that name). And the three pages of comments that ensued about low-level matters, while interesting, didn't ever seem to directly address his original suggestion (and if I recall correctly, Jon didn't have any more posts in that thread (perhaps he threw up his hands)). My point is that this file booting thing, though apparently trivial to add for those who are in-the-know about the chip (that wouldn't be me, as I'm just a light, high-level user), will be useful to many of us. And yes, I realize that I could have probably already accomplished the SD card file booting phase in the time that it has taken me to compose this update. Oh well, that's me (step-by-step).

Wuerfel_21 · 2021-07-05 18:32

@JRetSapDoog said:
I. From Wuerfel_21: Booting mostly has to do with the block driver. It basically has to stop all the other cogs, then load the file into RAM, init cog 0 and then stop itself.
On P2 there's the added issue that the file being booted may be fragmented (this is mostly irrelevant to P1 since the cluster size on SD cards is usually 32k)

II. From deets: So within this context one could write a firmware flat into the SD-Card from a known start Block (probably 0) and just read it sequentially. Or use FAT32 and then load the firmware from a file that can be spread all over the place.

III. From msrobots: The way Kye's driver boots on the P1 is to read the FAT to get the sector number chain into the block driver cog. This COG then kills all other ones, loads the file into HUB, sets the clock to RCFAST and starts itself with the SPIN interpreter. ... On the P2 things are a bit more complex because you need the existing clock settings to switch the PLL off, ...

Regarding I, although I'm only simulating reading from the SD card using a Dat section that has a compiled binary in it, it sounds like I'm basically doing that. Hopefully, I will get around to trying this from an SD card in the next day or so. But I don't really understand the "fragmentation" part of the comment because I thought that the FAT32 object took care of that for us. Currently, in another program, I read in ~700 byte chunks of a 1.5MB text file seemingly without a hitch. However, the file isn't likely fragmented. Still, even if it was, wouldn't the FAT32 object take care of traversing the file (such that I wouldn't see the fragmentation at a higher level)?

Yes, the FAT driver handles fragmentation internally by walking the cluster chain as the file is being read. But that doesn't work for booting because the FAT code may get overwritten before the entire program has been loaded. So the cluster chain has to be checked before the load commences, either to give to the block driver to use during load or to check it isn't fragmented and just pass the sector address and size. (AFAICT simply copying a file onto a card from a PC will never result in fragmentation unless the card is nearly full and/or badly fragmented already).

If you want to see how to follow the cluster chain manually, this is what Spin Hexagon does to check that its audio files are defragmented and get the first SD sector (the file size can afterwards be queried in the usual way):

PRI contigFile(name) : result | err,erc,clust
  ' Returns the first SD sector of the file; halts if it is fragmented
  send(10,"Checking ")
  send(strsend(name))
  send("... ")
  err := \fat.openFile(name,"r")
  if fat.partitionError
    send("openFile error: ",strsend(err))
    repeat                                ' halt on error
  clust := fat.getCurrentFileCluster
  result := fat.firstSectorOfCluster(clust) + fat.getHiddenSectors

  'go looking for fragmentation
  repeat
    erc := fat.followClusterChain(clust)  ' next cluster in the chain
    if fat.isClusterEndOfClusterChain(erc)
      quit
    if erc <> clust+1                     ' next cluster is not adjacent
      send(" oh no, fragmented!",10)
      repeat                              ' halt
    clust := erc

  send("OK!")

  waitms(100)

BTW, if you only need to load code from SD on initial bootup, just call the file _P2_BOOT.BIX, set the right pullups and it will happen automagically. Relatedly, .BIX is the correct(tm) file extension for P2 binaries.

JRetSapDoog · 2021-07-05 19:38

@Wuerfel_21 said:
Yes, the FAT driver handles fragmentation internally by walking the cluster chain as the file is being read. But that doesn't work for booting because the FAT code may get overwritten before the entire program has been loaded.

Thanks, wuerfel_21! Yikes! That's right: The FAT32 code, which is in bytecode form in the hub, could (and often would) get clobbered during the loading process. That's a shame. I had briefly worried about my Cog 1 "loader" getting clobbered, but I hadn't thought far enough ahead about the big FAT32 program (which I didn't include for my little test) getting clobbered. That does throw a monkey wrench into things, though with the code that you have provided (thanks!!!), there is a basis for a way around that. So I take it that the SPI PASM portion of the code isn't affected by this clobbering since it resides in a cog instead of as bytecodes in the hub (however, I haven't checked to what extent, if any, it might use external buffers in the hub, though hopefully that's minimal).

Hmm. actually, in my case, since a video buffer will always (I think, anyway) occupy over half the hub and would start out with zeros, perhaps I could temporarily load the data for the interpreter, code and other data up there, ignoring the buffer area (that's assuming that such a buffer always ends up towards the high end of the hub memory map). And then once that's loaded into that buffer area, then make a copy of it down lower in memory starting at $0. And then clear everything above that (where the video buffer needs to be). Yeah, I know, it's not a good general solution, but just musing (I might try it, though, since it's simple, just to see if it works).

As for booting on powerup/reset from an SD card, no, I'm not looking to do that. However, when you say that .BIX is the correct name for P2 binaries, I wonder if you only mean those that will boot from powerup/reset, as opposed to binaries like in my case where one program needs to replace itself with another off of an SD card. For the latter case, I figured that I'd just call them .BIN, as I did with the P1 (but if there's a new convention.... ). For really booting from SD with Cluso99's code, I can see how it might matter, but in my case, I guess a rose by any other name would smell as sweet. Anyway, THANKS SO MUCH for the guidance and code. --Jim

Wuerfel_21 · 2021-07-05 19:53

@JRetSapDoog said:

@Wuerfel_21 said:
Yes, the FAT driver handles fragmentation internally by walking the cluster chain as the file is being read. But that doesn't work for booting because the FAT code may get overwritten before the entire program has been loaded.

Thanks, wuerfel_21! Yikes! That's right: The FAT32 code, which is in bytecode form in the hub, could (and often would) get clobbered during the loading process. That's a shame. I had briefly worried about my Cog 1 "loader" getting clobbered, but I hadn't thought far enough ahead about the big FAT32 program (which I didn't include for my little test) getting clobbered. That does throw a monkey wrench into things, though with the code that you have provided (thanks!!!), there is a basis for a way around that. So I take it that the SPI PASM portion of the code isn't affected by this clobbering since it resides in a cog instead of as bytecodes in the hub (however, I haven't checked to what extent, if any, it might use external buffers in the hub, though hopefully that's minimal).

The only hub memory used is the audio buffers and the command mailbox. The former just has to not be enabled, and the latter is only cleared when a command completes (which a boot command never technically would).

Hmm. actually, in my case, since a video buffer will always (I think, anyway) occupy over half the hub and would start out with zeros, perhaps I could temporarily load the data for the interpreter, code and other data up there, ignoring the buffer area (that's assuming that such a buffer always ends up towards the high end of the hub memory map). And then once that's loaded into that buffer area, then make a copy of it down lower in memory starting at $0. And then clear everything above that (where the video buffer needs to be). Yeah, I know, it's not a good general solution, but just musing (I might try it, though, since it's simple, just to see if it works).

That'd work, yeah. I don't think Spin2 has the _FREE thing yet though, so you can't reliably place a buffer at the end of RAM.

As for booting on powerup/reset from an SD card, no, I'm not looking to do that. However, when you say that .BIX is the correct name for P2 binaries, I wonder if you only mean those that will boot from powerup/reset, as opposed to binaries like in my case where one program needs to replace itself with another off of an SD card. For the latter case, I figured that I'd just call them .BIN, as I did with the P1 (but if there's a new convention.... ). For really booting from SD with Cluso99's code, I can see how it might matter, but in my case, I guess a rose by any other name would smell as sweet. Anyway, THANKS SO MUCH for the guidance and code. --Jim

It's because .BIN is a really dumb extension to use for executables that happened to become the de-facto standard.

msrobots · 2021-07-05 21:40

Jim,

Kye's driver for the P1 solved the problem nicely and I do not see any reason why it would not work the same on the P2.

I am busy right now building a 'man-cave' so I could not take the time to play with the P2 but I am eager to try @Wuerfel_21's version of Kye's driver.

On the P1 you can change the clock frequency without knowing what frequency it is running at: set it to RCFAST (so the PLL gets switched off), wait a moment, then switch to the desired new frequency (turning the PLL on again), wait a moment more (to let the PLL settle), and you are fine.

The same was planned for the P2, but someone (@evanh?) found out that there is sometimes a glitch and this fails. So one needs the old clock setting: clear its last two bits, set that as the clock setting, wait a moment, then set RCFAST, wait a moment, and then set the new frequency.
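For reference, a variant of that P2 sequence might look roughly like this in Spin2. This is only a sketch: `switchClock` and `demo` are made-up names, the 10 ms settle time follows the usual P2 guidance rather than anything measured in this thread, and because it uses raw `hubset`, the `clkmode`/`clkfreq` system variables are not updated. In a normal Spin2 program the built-in `clkset(newMode, newFreq)` is the intended way to change clocks.

```spin2
CON
  _clkfreq = 200_000_000                 ' example PLL clock, only so the sketch compiles

PUB demo()
  switchClock(clkmode, clkmode)          ' hypothetical use: re-apply the current mode by hand

PRI switchClock(oldMode, newMode)
  hubset(oldMode & !%11)                 ' keep old crystal/PLL bits, clear low two bits: run from RCFAST
  hubset(newMode & !%11)                 ' set up new crystal/PLL bits while still on RCFAST
  waitct(getct() + 20_000_000 / 100)     ' about 10 ms at RCFAST (20 MHz+) for crystal/PLL to settle
  hubset(newMode)                        ' finally select the new clock source
```

The point of the first `hubset` is that the old PLL keeps running while the clock source changes, which is why the old clock setting has to be known.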

For that reason, all the compiler programmers agreed on a fixed HUB location for that needed value. But there is this small Gallic town (remember Asterix and Obelix?) called Red Bluff where Chip is living. And he tends to do things differently. So, as you already found out, he is storing the value in a different location.

On the next zoom meeting I will ask him about this.

And then there are programs written in PASM without Spin (just DAT section, no Spin methods) and they do not need to store the value there at all, but could and should.

Even though a lot of people claim that files on SD cards do not get fragmented, FAT32 chains its clusters of sectors together, so fragmentation is supported.

So Kye's driver (in SPIN) makes a list of the sectors that need to be loaded and gives that list to the SPI block driver running in some COG.

This driver now can shut down all other COGs so nobody is using any HUB memory. Now the block driver loads all needed sectors from SD starting at HUB address 0, unmounts the SD card, switches to RCFAST, sets all PINs to input, starts a COG at HUB 0 and terminates itself.

The same should work for the P2.

My guess is that @Wuerfel_21 did not implement it because of the address mismatches for the clock settings.

My second guess is if we all say 'pretty please' often enough she might consider adding this.

Enjoy!

Mike

Cluso99 · 2021-07-05 23:52

My raw SD driver can locate a file, and the load/run hooks are there, but I don't check for fragmented files. However, I've not seen fragmented files on an SD card. My driver is totally cog resident and IIRC doesn't use LUT.

Please use .BIX, as .BIN is used on the P1. I use the same SD card for both P1 and P2, and that's why I chose .BIX.

The ROM SD Driver can also load/run files but again frag is not handled.

JRetSapDoog · 2021-07-06 21:05

@wuerfel_21: Thanks for the breakdown of how hub memory is used for sdspi_with_audio. And thanks for the confirmation about my little interim strategy to boot a file. I'll need to think more about the reason(s) why a buffer can't be reliably placed at the end of hub, though (other than in the case of using the debugger, which write protects the last 16KB, I believe). In my case, I'm just using such a buffer area prior to replacing a running program. About using .BIX instead of .BIN, I'm all in based on what you, msrobots and Cluso99 have said. I'm thinking of .BIX as standing for Bytecode Interpreted eXecutable. By the way, in a few days, I hope to look at the code you provided and see if I can grok any of it (but that'll be a challenge for me, I'm sure).

JRetSapDoog · 2021-07-06 21:08

@Cluso99: Thanks for your comments. I will use the file extension .BIX from now on. As for fragmented files, I still have some confusion on that matter and can't really comment meaningfully at this time. But if they are rarely, if ever, copied to an SD card, I'm thinking that maybe they could be ignored (just assume that the chain is sequential), at least in my non-mission critical usage scenario. I really appreciate your comments. You folks are a lot of help!

JRetSapDoog · 2021-07-06 21:19

@msrobots: Thank you so much for the explanation about the clock settings and the history behind it. I'm still a ways out from trying to deal with it (as I'm ignoring the clock settings for now), but your comments should come in handy once I confront the matter. And thanks for the additional comments about the process for file booting.

As for nicely asking wuerfel_21 to add such functionality with a "pretty please," I likely wouldn't dare to do so unless I was offering to pay her. Besides, she's already been soooo helpful already in providing her version of Kye's driver, and also responding to comments about it (including those from inexperienced users like me). I want to stay on her good side, or not get on her bad side, or get off of her bad side if I'm already there, whatever the case may be. But keep a good thought, there, with that indirect plea.

As for Chip using different locations for the clock settings, I read through that thread and you folks nearly persuaded him. In fact, you had him on the hook but you just didn't reel him in. I know that it's a bit frustrating for the toolmakers out there, but if Chip's way helps him to keep his mind free, it may help him come up with something cool again, even if that cool thing isn't directly related to that low address range topic. Also, like he said, if he decides that he wants to change it, it's a rather simple change in the compiler. But I do hear what you're saying, and feel free to keep banging that drum if you feel that you should.

I was 100% behind you on your comments on Zoom about how great it would be to have 16 cogs. Chip was reluctant to agree (despite that being a design goal once), as he knows how to cram multiple, disparate functions into one running cog (such as on the P2 Arc8de project). But he's the chip designer, and we can't expect most users to be so good at doing that. Anyway, what amazed me was how you were able to make such a good argument for having 16 cogs right off the top of your head and on the fly like that. I could so use a couple of extra cogs right now. And even in those cases where multiple functionality can be crammed into a single cog, that doesn't mean that it's ideal (as you touched upon). But I'm mostly just saying that I appreciated your comments; I know that we have the design in silicon now (with possible variants some day). And what a great design it is!

And about that man-cave, that sounds really cool. But wait: I kind of thought that your whole house was a man-cave. Did you go "traditional" with the rest of it now as a concession to meet society's expectations? That's rhetorical. By the way, you've got a solar installation there, too, I do believe. Where do you find the time! Speaking of man-caves, I was watching a video on the Huygens Optics YouTube channel today, and he has a man-cave, but it's a separate dwelling/structure, so that he can get away from the loved ones and concentrate on his optical bench. I wonder if yours is in a converted bedroom or in the garage/car-port. And what will you put in it? Will it be filled with audio and video stuff and pinball machines, or will it be dedicated to the P2 and other homebrew tech? Hmm, perhaps a lot of both. Thanks again for your assistance and comments.

JRetSapDoog · 2021-07-06 21:30

I went ahead and wrote some code to see if I could file boot off of an SD card using the method I proposed earlier (first copying things to the upper half of hub), and it appears to work for a small file. Update: I needed to use pasm to do the final move from the upper hub to the lower hub, otherwise a larger program will clobber the bytecodes of the Spin2 cog doing the move. So I updated the method listing below.

This is partly just for learning. I'm sure that wuerfel_21's proposed way is the best and most general (as my way is limited to files that are only half the size of the hub, not counting buffers). But it's a start. And if I don't have any other way, it should still allow me to do what I want to do. Anyhow, for what it's worth (maybe nothing), here's the method that I used:

pub bootFileOnSDCard() | i, fs, noc, cs, fbp, cid
'fs=file size, noc=num of chunks of file, cs=chunk size, fbp=file buff ptr, cid=cog id
'replace currently running P2 program with another one from a .BIX file on SD card
'Two steps: [1] copy .BIX file to upper half of hub, and [2] copy upper half to lower half (using pasm)
'This method avoids possible corruption of the running FAT32 driver that uses bytecodes in the hub
'Warning 1: Assumes that bytecodes and any non-zero data are within 2**18 = 262144 bytes of .BIX file
'Warning 2: Ignores clock settings; the new program "inherits" the old clock; may cause instability

  sdErrMsg := \sdc.openFile(string("KEYPAD1A.BIX"), "R") 'open P2 eXecutable (.BIX) file
  'debug(zstr(sdErrMsg), dly(2000))

  cs := 512     'set the chunk size (cs) to read in from the file each time
  fbp := 262144 'starting point for file buffer pointer (upper half of hub) = 2**18 = 512*512 = 512**2
  fs := sdc.fileSize() <# 262144  'get file size = fs, but limit maximum to 262144 (half the hub size)

  noc := fs / cs     'calc no. of chunks to read; ignore file after byte no. 262144
  if fs // cs > 0    'check for partially full last chunk from file
    noc++   'add one more chunk for a partially full last chunk (which is  quite likely)
  'debug(udec(noc), dly(2000))

  repeat noc 'copy the file chunk-by-chunk into the upper half of hub ram
    sdErrMsg := \sdc.readData(@Passage, cs)      'get chunk from file (Passage buffer is defined globally)
    bytemove(fbp, @Passage, cs)                  'copy chunk of chunk size cs to second half of hub ram
    fbp += cs                                    'advance file buffer pointer for next chunk

  sdc.closeFile()                                'close the .BIX file
  sdc.unmountPartition()                         'unmount the SD card (partition)
  sdc.FATEngineStop()                            'done with FAT32 (optional, can comment out)

  cid := cogID()                                 'get the cog ID of this loader cog
  repeat i from 0 to 7                           'stop all cogs, except this loader cog
    if i <> cid
      cogstop(i)

  pinclear(0 addpins 31)                         'float pins and clear smartpins
  pinclear(32 addpins 31)

  'D= %0010_xxxx_xxxx_xxLW_DDDD_DDDD_DDDD_DDDD       'W=write-protect last 16KB of RAM; D=debug interrupt enables for cogs
  hubset(%0010_0000_0000_0000_0000_0000_0000_0000)   'L=0=don't lock W&D; W=0=no write-protect; D=0=disable debug interrupts

  'clkset(clkmode, clkfreq)                      'safely utilize new clkmode and clkfreq (glitch free)

  'use pasm in cog (not hub bytecodes) to copy upper half of hub to lower half, then clear upper half
  coginit(7, @moveProgram, 0)                    'launch the pasm mover in cog 7; it starts the .BIX image in cog 0

  'let this cog fall off the edge and die; if it is cog 0, the mover's coginit will replace it anyway

And here's the pasm

dat
moveProgram
        org
        'copy upper half of hub to lower half of hub
                                  'note: hub is addressed in bytes, not longs
        mov     ltm,  ##65536     'load number of longs to transfer
        mov     ptra, ##262_144   'load beginning address of second half of hub into ptra
        mov     ptrb, #0          'load beginning address of first half of hub into ptrb
copy
        rdlong  tmp,  ptra++      'read four bytes from upper half of hub into tmp
        wrlong  tmp,  ptrb++      'copy four bytes from tmp into lower half of hub
        djnz    ltm,  #copy       'when ltm goes to 0, transfer is complete

        'clear the upper half of hub (now that it has been copied to lower half)
        mov     ltm,  ##65536     'load number of longs to transfer
        mov     ptra, ##262_144   'load beginning address of second half of hub into ptra
clear
        wrlong  #0,   ptra++      'clear four bytes of the upper hub
        djnz    ltm,  #clear      'when ltm goes to 0, clear operation is complete

        'TBDone: possibly stop other cogs here

        'TBDone: read the clock settings (at $40 & $44) and safely set the clock

        'load program from beginning of hub into cog 0 and start it
        coginit #0, #0            'start new program in cog 0

        'kill the current cog
        cogid   tmp               'get the current cog's ID
        cogstop tmp               'stop the current cog


ltm     long 0         'number of longs-to-move = 65,536
tmp     long 0         'temporary to hold four bytes to transfer

fit     496

evanh · 2021-07-06 21:51

@msrobots said:
The same was planned for the P2, but someone (@evanh?) found out that there is sometimes a glitch and this fails. So one needs the old clock setting: clear its last two bits, set that as the clock setting, wait a moment, then set RCFAST, wait a moment, and then set the new frequency.

For that reason, all the compiler programmers agreed on a fixed HUB location for that needed value. But there is this small Gallic town (remember Asterix and Obelix?) called Red Bluff where Chip is living. And he tends to do things differently. So, as you already found out, he is storing the value in a different location.

Luckily this isn't generally a concern for compiled program startup, because the compiler authors have wisely gone with the RCFAST handover solution by default. That means the sysclock is put back to RCFAST before the loader, if any, does COGINIT to the loaded program. The loaded program then sets itself to CON _CLKFREQ, coded on the assumption that it is always starting from RCFAST. Combined, that avoids any glitch potential.
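A loader written in Spin2 could do that handover with something like the following just before its final COGINIT. It is a sketch under assumptions: `rcfastHandover` and `main` are made-up names, a mode value of 0 is taken to mean plain RCFAST with the crystal and PLL off, and `clkfreq` is stale afterwards, so timing calls such as `waitms` should not be trusted past this point.

```spin2
CON
  _clkfreq = 160_000_000        ' example PLL clock for the running loader (assumption)

PUB main()
  rcfastHandover()              ' a loader would call this just before its final COGINIT

PRI rcfastHandover()
  hubset(clkmode & !%11)        ' same crystal/PLL bits, but clock source switched to RCFAST
  hubset(0)                     ' now on RCFAST, so the crystal and PLL can be turned off
```

The loaded program can then bring up its own clock exactly as it would after a reset.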

In Chip's case, the loader never leaves RCFAST. It handles the whole program loading while staying in the power up default of RCFAST.

And if you are dynamically changing your clock frequency as part of the program sequences then the compiler builds in the correct steps for avoiding the potential glitch there.

That is unless you bypass the libraries and use inline assembly to set the frequency, or build a pure assembly program. Only then is it your problem to do it right.

Eric also provides a second solution that allows the loadp2 loader to preset the sysclock frequency for the target program but I'm not sure if it gets used at all now.

evanh · 2021-07-06 22:03

End result is: The mailboxes are not used to make setting of sysclock reliable.

msrobots · 2021-07-06 22:20

Except when you want to load binaries created by different tools from your own code on the P2, like the FAT32 driver needs to do. It could be compiled with PNut or with fastspin, and it needs to handle the clock settings because this is not a boot after reset but a load after, say, @rogloh's driver changed the clock frequency, or something along that line.

So it would be nice if all tools adhered to the same location, as all but PNut and Proptool already do.

just my thinking,

Mike

evanh · 2021-07-07 00:22

RCFAST handover solves the reliability issue right now without needing mailboxes.

I don't see an OS ever taking hold to be honest. The propeller architecture is intended to be used low level. Taking that away is a negative.

PS; The idea of self-hosting also clashes with this IMHO.

AJL · 2021-07-07 03:49

@JRetSapDoog said:
About using .BIX instead of .BIN, I'm all in based on what you, msrobots and Cluso99 have said. I'm thinking of .BIX as standing for Bytecode Interpreted eXecutable.

The problem with thinking of .BIX as standing for Bytecode Interpreted eXecutable is that the file may not include a bytecode engine, an interpreter of any kind, or any bytecode.
While _P2_BOOT.BIX can contain a boot image, so too can _P2_BOOT.BIY.

JRetSapDoog · 2021-07-07 04:41

Yeah, the code that executes wouldn't have to be bytecodes (such as Spin2 bytecodes). It could be anything that runs on the P2 (P2 PASM, such as compiled C/Basic/Spin2). Then just BInary eXecutable, or Binary Image eXecutable. Perhaps .P2X would have been nice (though that can also be something related to Excel), but I think that the .BIX file extension is in ROM, so we'll live with that, and it's good enough.

evanh · 2021-07-07 07:08

I didn't mean to be harsh about self-hosting. It has a warm place in my heart too. But community methods of code development really have changed. The ease of access to very large displays/desktops, PDFs and websites like this forum means a night and day difference from pre-web days.

Back then everything was printed and written down on paper. Producing magazines and books were thriving business markets.

The RPi could be a decent host now. It'll have the requisite browser, PDF viewer and large desktop.

Cluso99 · 2021-07-07 07:49

@JRetSapDoog said:
@wuerfel_21: Thanks for the breakdown of how hub memory is used for sdspi_with_audio. And thanks for the confirmation about my little interim strategy to boot a file. I'll need to think more about the reason(s) why a buffer can't be reliably placed at the end of hub, though (other than in the case of using the debugger, which write protects the last 16KB, I believe). In my case, I'm just using such a buffer area prior to replacing a running program. About using .BIX instead of .BIN, I'm all in based on what you, msrobots and Cluso99 have said. I'm thinking of .BIX as standing for Bytecode Interpreted eXecutable. By the way, in a few days, I hope to look at the code you provided and see if I can grok any of it (but that'll be a challenge for me, I'm sure).

BIX is a P2 binary file which may or may not contain Spin bytecode, or for that matter anything else. It's a P2 executable.
