Basic Q's about the Spin2 Interpreter Size, Content and Loading

AJL · 2021-07-10 00:29

@JRetSapDoog said:

```
'Note: copying upper hub to lower hub code and killing cog code omitted for brevity
              'Stop all other cogs (whether extant and active or not)
              cogid   kid               'get cog (kog) id
              mov     tmp,  #15         'highest possible cog number (on a future 16-cog P2 variant)
.kill         cmp     tmp,  kid   wz    'don't stop the current cog
    if_nz     cogstop tmp               'stop all but the current cog
              djnf    tmp,  #.kill      'loop down through cog 0
    if_z      coginit #0, #0            'start new program in cog 0  <--remove if_z after testing

              'kill the current cog
              cogid   tmp               'get the current cog's ID
              cogstop tmp               'stop the current cog
```

By eliminating the intervening code I hope that you can see what I’m seeing. You go through and systematically stop all cogs except 0, then change the clock and initialize cog 0. At that point there’s nothing left running this code, so no need for a final cogstop.

JRetSapDoog · 2021-07-10 06:46

evanh, you didn't give your prediction for the result of your suggested test. I thought it might work since it clears the last two bits (the SS clock source bits). But when it failed three times in a row at the start, I thought that it was going to totally fail. I was wrong about that, too. In a first run of 13 trials, it succeeded 6 times and failed 7. I ended up doing another run of 13 trials and it succeeded 7 times and failed 6. So the combined total was 13:13. It seems it's a coin's toss. [FWIW (likely nothing), the exact sequence of the runs is given in the comments after the code.] Guess I'll pass on doing it this way. Update: See comment about the debugger in the next post.

                  'hubset ##%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS 'set clock mode
                  hubset  #$F0              'set 20 MHz+ (RCFAST) mode; PPPP=1111; CCSS=0000
                  'hubset  ##clkmode_ & !3   'zero SS clock source bits (inhibits PLL/forces RCFAST mode)
                  hubset  #0                  
                  waitx   ##20_000_000/100   'wait ~10ms (50ns x 200k = 0.01 sec)
                  'Run A: F,F,F,S,F,S,F,S,S,S,F,F,S  =>  6S:7F = Success:Failure to file boot
                  'Run B: S,S,S,F,S,F,S,F,S,S,F,F,F  =>  7S:6F 
                  'Total: 13 successes : 13 Failures

evanh · 2021-07-10 06:55

Huh, it was supposed to be 100% success. I might need to see the startup code of program after the COGINIT.

JRetSapDoog · 2021-07-10 06:58

Hmm, this may be interesting. The above test was conducted with the debugger in the Propeller Tool enabled. But when I turned that off and ran the test, I got 26 failures to file boot in a row. And when I turned the debugger back on, the chance of success was back to being a coin's toss. I might have to redo my prior tests, too. But remember that I'm using the upper half of the hub to temporarily store the image to boot, and then I clear the upper half of hub after copying it to the lower half. And the debugger uses the last 16KB of the hub, doesn't it? So it would seem that I'm stomping on the debugger (but I'm new to using the debugger, so I'd better not speculate).

JRetSapDoog · 2021-07-10 07:13

Thanks for your comments, Ariba.

@Ariba said:
I also have made a similar bootfile method for FSRW. The program that wants to launch another bin (bix) file has to pass a hub address to the method, where the new file can be loaded (free hub behind the current program or a screenbuffer or something else). ... I load the new code there, then copy it to hubaddr 0 in an inline PASM2 loop and start a cog with it.

Well, yeah, if you want to do things the smart, more-likely-to-work way.

I stop other running cogs before, but I don't change pin states or the clock mode. When you launch the new program the clock is anyway set new. And sometimes it's better to not disable pins, they can have pullups set, or are in a smartpin mode, that is necessary. I think the application that starts the new program should decide which pins can be set to input, before it starts the new code.

Yeah, that might be best. In my case, the hardware is going to be fixed. I don't think that anyone will be adding anything to it. There might be a few pins that could be repurposed, though. Anyway, in the general case of launching a game, it's likely that the hardware will be reused from game to game in the same way. However, it's possible that one might only use, for example, two out of the four screens in a game for two players, or if the programmer was out of resources (cogs or ram). Anyhow, that's something to think about. Thanks.

Inline PASM between ORG ... END gets copied to cog ram and then executed in the cog, so you don't need to start a cog just for the PASM part. This works also for FlexSpin.

That's one thing that I wanted to know about (or needed to check the documentation on) and maybe mentioned earlier (or intended to). So yes, I could do this from the same cog running the application, then. Thanks! Also, I might end up trying to move this code into the FAT32 object (for organization). But I'm a little unsure what symbols will be available there (away from the top-level object). I'm already confused on symbols as it is. Anyway, that's down the road (if at all). Thanks again for your input.

evanh · 2021-07-10 07:20

Debugger can protect itself: From docs:

Once initialized with debug ISR code, this upper hub RAM can be write-protected, in which case it is mapped only to $FC000..$FFFFF and it is only writable from within debug ISR's.

JRetSapDoog · 2021-07-10 07:25

@AJL said:
By eliminating the intervening code I hope that you can see what I’m seeing. You go through and systematically stop all cogs except 0, then change the clock and initialize cog 0. At that point there’s nothing left running this code, so no need for a final cogstop.

Thanks for your comment. I thought that I could just let the pasm cog fall off the edge and die. But I wasn't sure if doing that would "kill" the cog as completely as using cogstop. I like to beat a dead horse. BTW, actually, in this case, I launched this pasm code into cog 7 (the last cog) just for testing purposes. So cog 0 (the top-object Spin2 cog) ends up getting killed off before this cog kills itself (Perhaps this cog killing itself is like someone who's going to die naturally in a minute hastening his/her death by shooting him/herself just to be in control to the very end). But this cog arrangement could change later, and perhaps I'll use in-line pasm.

JRetSapDoog · 2021-07-10 07:39

@evanh said:
Huh, it was supposed to be 100% success. I might need to see the startup code of program after the COGINIT.

Trust me, you don't want to see my code. Not yet, anyway. But did you have anything specific in mind?

Debugger can protect itself:

Ah, that's right. I do recall reading that (now that you've reminded me). But then I wonder why the file booting sometimes works (50:50) when the debugger is running. Need to think about that and do more testing. Unfortunately, I have to go out pretty soon for several hours.

But anyway, I haven't finished digesting your comments from the post where you suggested that quick test. I'm still confused about how I should access the clkmode (and whether you think that the code that I tried was even using the real clkmode). But the test that you suggested was something that I (figured that I) could test quickly, so I haven't digested or responded to your other comments, not that you are at all obligated to help me resolve my confusion. I'll try to take a look at that now.

Oh, by "the startup code after the COGINIT," do you just mean in pasm? There's nothing but the two likely extraneous cogstop lines for the current cog, followed by 5 res lines and a fit 496 line. Or by "startup," do you mean in the program being booted after all?

JRetSapDoog · 2021-07-10 07:52

Hmm, this is weird, I just commented out

                  'kill the current cog
                  cogid   tmp               'get the current cog's ID
                  cogstop tmp               'stop the current cog

while keeping the debugger enabled and I got 26 file boot failures in a row. But once I uncommented those two lines, things were back to 50:50 (as reported above).

Hmm, maybe I should try adding a repeat infinite loop after the Spin2 call to launch the pasm into cog 7. UPDATE: Tried that, no change.

JRetSapDoog · 2021-07-10 08:04

@evanh said:
The CLKMODE variable doesn't exist in pure Pasm builds. So no symbol is generated for it either. On the up side you can just make up any location for your temporary copy of the clock mode. It doesn't have to even be in hubRAM because when it comes to handover you've set the Prop2 back to RCFAST after all.
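A minimal, untested sketch of that idea, assuming the launcher has a Spin2 top-level object (so the runtime `clkmode` variable exists). The method name `ToRcfast` and the local `mode` are hypothetical; `mode` is a method local, which in-line pasm can address as a cog register:

```
PUB ToRcfast() | mode
  mode := clkmode & !%11                    'runtime clock mode with the SS bits zeroed (selects RCFAST)
  org
                  hubset  mode              'switch the clock source to RCFAST, other mode bits unchanged
  end
```

After this runs, Spin2's `clkfreq` no longer matches the real clock, so the sketch only makes sense immediately before the handover.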

As for using "pure Pasm," I haven't done any testing yet. But I assume that if I did put my code in a program with no Spin2 object, the compiler would complain about my clkmode references (in their various incarnations).

When the silicon (or Spin2) manual talks about "running under Spin," it just means with a top-level (or at least higher-level) Spin object, right? That is, it's not limited to in-line pasm (or to pasm run from Spin2 without using another cog, if there's another way to call pasm from Spin2, which I seem to recall that there is, similar to TSRs), right? In other words, it means pasm code in the same file as Spin code, wherein a Spin2 interpreter has been invoked at runtime.

JRetSapDoog · 2021-07-10 08:08

@evanh said:
On the up side you can just make up any location for your temporary copy of the clock mode. It doesn't have to even be in hubRAM because when it comes to handover you've set the Prop2 back to RCFAST after all.

Need to think about this more. But I need to go out pretty soon and will be gone for at least 8 or 9 hours.

If anyone has anything that they want me to test, I'll be glad to do so once I get back (or tomorrow). I appreciate all the comments/help. --Jim

JRetSapDoog · 2021-07-10 08:20

But before needing to get ready to head out, I tested the original code again:

                  hubset  #$F0              'set 20 MHz+ (RCFAST) mode; PPPP=1111; CCSS=0000
                  hubset  ##clkmode_ & !3   'zero SS clock source bits (inhibits PLL/forces RCFAST mode)
                  'hubset  #0
                  waitx   ##20_000_000/100   'wait ~10ms (50ns x 200k = 0.01 sec)

[1] It worked 13 times straight with the debugger enabled AND with the two cog stop lines after it still in place
[2] It failed 13 times straight with the debugger disabled AND the two cogstop lines after it still in place
[3] It failed 13 times straight with the debugger disabled AND the two cogstop lines commented out

In other words, currently, I need the two cogstop lines and the debugger running. Too hurried right now to think more about that, but just wanted to test that before leaving.

JRetSapDoog · 2021-07-10 08:33

The two cogstop lines being right after the coginit line.

                  'load program from beginning of hub into cog 0 and start it
                  coginit #0, #0            'start new program in cog 0  <--remove if_z after testing

                  'kill the current cog
                  cogid   tmp               'get the current cog's ID
                  cogstop tmp               'stop the current cog

Yes, if I comment them out, it doesn't work; if the debugger is disabled, it doesn't work; and if they are commented out and the debugger is disabled, it doesn't work. That is, I presently need both the debugger enabled and those two cogstop lines (though I could use cogstop #7) in order for file booting to work. Maybe I've got something wrong elsewhere, but I'll have to look later. Thanks again for all the help. By the way, I certainly don't want to make running with the debugger enabled a requirement, lol.

JRetSapDoog · 2021-07-10 08:50

One last comment before leaving, when I said "failed to file boot," my counts were based on not seeing anything on the VGA screens. But when I come back, I'll try just file booting a simple program that blinks an LED on the edge and see if that works. Hmm...maybe I can squeeze in a quick test before leaving........

JRetSapDoog · 2021-07-10 08:59

Hmm, sorry folks. I must have something wrong elsewhere, because now even my Hello Blinky program to blink an LED on the Edge doesn't work, and that's WITH the debugger enabled AND the cogstop lines (which was working with a much, much larger program that used vga). So that's inconsistent behavior and I am probably missing something. I'm not good at thinking under time pressure, and I do need to leave now. Hopefully, I can resolve this later. Sorry if I wasted anyone's time.

msrobots · 2021-07-10 19:13

The way the debugger works is that chip adds a 'loader' before the normally compiled binary, containing the debugger code.

This code copies most of itself into the top HUB ram and then copies the rest of the binary back to HUB address 0 and then restarts COG 0 with debugging enabled and top HUB write protected.

So if the binary you saved to SD was created with debugging enabled, it will enable debugging after loading, if saved without debugging enabled that loader stub does not get added to the file you save onto the SD.

A binary created with debugging is slightly larger and contains in itself another routine doing the same as you do and moving the HUB content down, then starting a new COG (itself, COG 0) at address 0.

I think the best way for you to handle this is to load your binary (in Spin) from FAT into HUB, then switch (in SPIN) to RCFAST, then start your PASM COG to kill all other ones, copy the HUB content down and let coginit that COG itself with HUB 0.

That way SPIN is changing the clock not your PASM and you do not need to care about the details.

It also takes care of the different HUB location since setting the clock in SPIN works with PropTool and FlexProp, no worries there.

And - hmm - @evanh is right (again) that you even do not need a COG but should be able to use inline PASM.

Mike

JRetSapDoog · 2021-07-10 20:14

Yes, that's exactly what I've been testing with 4 different programs over the last hour: a small blinky program compiled with and without the debugger, and run with and without it; and a large program that uses vga screens compiled with and without the debugger and run with and without the debugger. I got predictable results for the larger program, but not with the smaller one, so I held off on posting. Still checking some things.

JRetSapDoog · 2021-07-10 20:34

I'm still not 100% sure for the small program, but for the large one, I got these success : failure results:

[1] Compiled w/o the debugger AND ran w/o the debugger: 13 : 0 <--- Works
[2] Compiled w/o the debugger AND ran with the debugger: 0 : 13, with garbage in the debug window
[3] Compiled with the debugger AND ran with the debugger: 13 : 0 <--- Works
[4] Compiled with the debugger AND ran w/o the debugger: 0 : 13

By the way, this was with the hubset #$F0; hubset ##clkmode_ & !3; waitx ##200_000 pasm sequence. Perhaps I need to go back and do the testing that evanh suggested. But one thing at a time.

JRetSapDoog · 2021-07-10 20:45

Mike, thanks so much for your explanation of how the loading and all works. That helps me to understand my test results better.

About using Spin2 to get into RCFAST, I may try that. That will slow down the assembly code when it runs though, won't it, but I'll probably hardly notice.

And I'll probably try (or switch to) using in-line pasm from whatever Spin2 object that starts the ball rolling (I think it might have been Ariba that answered a question about that).

Thanks again, Mike. Your post came just at the right time. I got back home a couple hours ago and lay down to think about this and two ideas occurred to me to test to try to shed light on the problem [1] make the Spin2 program MUCH bigger by declaring a big wasted array, and [2] saving two versions of the binary and running them with and without the debugger. Fortunately, I tested the latter idea first. I probably don't need to do the other test, but at the time I considered it, I had some doubts about whether this technique of loading the binary to the upper hub and then copying down to the lower hub was viable. It's now looking much better, but something unforeseen could still crop up. Anyway, your comment is GREATLY appreciated.

JRetSapDoog · 2021-07-10 21:21

For the pasm sequence that evanh asked me to test (hubset #$F0; hubset #0 ; waitx ##200_000), I got the following results:

[1] Compiled w/o debugger AND ran w/o debugger: 13 successes in a row
[2] Compiled w/o debugger AND ran with debugger: Not tested
[3] Compiled with debugger AND ran with debugger: 8 successes and 5 failures
[4] Compiled with the debugger AND ran w/o the debugger: Not tested

I didn't test the other two combinations as done above since those are not expected to work. However, the test run compiled for the debugger and run with the debugger [3] seems a little concerning. I wonder what would happen if I stored the program compiled for the debugger in flash and ran it w/o the PC connected. I may try that next.

Update: Just tried that and got 6 successes and 7 failures. That's with the first program in flash and calling the second program compiled for the debugger. The Prop Plug was attached to my board with a USB cable, but the cable was detached from the PC. I'm not really sure what the sense in that test was, but thought I'd give it a go, and the result appeared to be the same as when I was downloading the first program into the P2's ram and connected to the PC (with the debugger enabled). Anyhow, my preliminary conclusion (not that one really needs to be made) is that using the debugger probably changes the timing of things a bit to make things more likely to fail with my current code and waitx delay. I'll move on to other stuff now, perhaps in-line pasm (I've never used it).

Oh, by the way, for that last test, the first program was flashed while the debugger was enabled in the Prop Tool. I flashed the first program again with the debugger disabled and I got 13 failures in a row when trying to file boot to the second program. It's confusing, because there are two different programs involved: the first program and the file booted second program, and both can be created with and without the debugger. Anyway, time to move on I think.

AJL · 2021-07-11 00:39

@JRetSapDoog said:

@AJL said:
By eliminating the intervening code I hope that you can see what I’m seeing. You go through and systematically stop all cogs except 0, then change the clock and initialize cog 0. At that point there’s nothing left running this code, so no need for a final cogstop.

Thanks for your comment. I thought that I could just let the pasm cog fall off the edge and die. But I wasn't sure if doing that would "kill" the cog as completely as using cogstop. I like to beat a dead horse. BTW, actually, in this case, I launched this pasm code into cog 7 (the last cog) just for testing purposes. So cog 0 (the top-object Spin2 cog) ends up getting killed off before this cog kills itself (Perhaps this cog killing itself is like someone who's going to die naturally in a minute hastening his/her death by shooting him/herself just to be in control to the very end). But this cog arrangement could change later, and perhaps I'll use in-line pasm.

I misread the conditional test that prevented self termination of the current cog. So my comment only applies if you ran this code in cog 0.

JRetSapDoog · 2021-07-11 14:43

@AJL, thanks for that clarification. That was nice of you to follow-up. I didn't completely understand what you were saying was wrong in your original comment, so I kind of danced around it in my response and tried to cover all the bases. But now everything makes sense. By the way, in my latest version of things, I've switched to using in-line pasm, and I've just assumed that the spin and in-line pasm is running in cog 0, at least for now, for simplicity. That will have to change, of course, if I move this functionality to an object running in another cog (which does seem likely, I'll admit). But that's easily changed later. Thanks again for your clarification. And in the new version, I don't even use cogstop on the spin/pasm (cog 0) cog since the coginit call (from in-line pasm) will launch code from the hub (from the new program to be run) on top of itself (in cog 0), just like you were saying. So we've come full circle.

JRetSapDoog · 2021-07-11 14:56

For reference, below is the in-line pasm that I'm using now, with one spin statement before it to hopefully get the clock into RCFAST mode using the safe (glitch free) clkset() method. Regarding doing that, I didn't see a constant for RCFAST that works in spin (w/o declaring one myself), so I just used $F0 for new_clkmode. And as for new_clkfreq, it's a required parameter for clkset(), so I just gave it 20_000_000. Anyway, here's the code (which, as mentioned in the post before this one, just assumes that it is running in cog 0 for now; I can change it back later).

  clkset($F0, 20_000_000)  'safely switch to RCFAST mode at 20MHz before launching program from SD card

  'At this point, the .BIX file to be switched to is already in the upper half of the hub
  'Use in-line pasm to copy upper half of hub to lower half (avoids clobbering the bytecodes)
  'Then clear the upper half (just in case) and launch the .BIX program (on top of this cog = cog 0)
  org
                 'copy upper half of hub (262,144 bytes = 65536 longs) to lower half of hub
                  mov     ptra, ##262_144   'load beginning address of second half of hub into ptra
                  mov     ptrb, #0          'load beginning address of first half of hub into ptrb
                  rep     #2,   ##65536     'execute the read-and-write instruction pair 65536 times
                  rdlong  tmp,  ptra++         'read four bytes from upper half of hub into tmp
                  wrlong  tmp,  ptrb++         'copy four bytes from tmp into lower half of hub

                  'clear the upper half of hub (now that it has been copied to lower half)
                  setq    ##65_536-1        'use setq to do a "fast block" clear for 65536 longs
                  wrlong  #0,   ##262_144   'zero out the upper half of the hub a long at a time

                 'load program from beginning of hub into cog 0 and start it
                  coginit #0, #0            'start new program in cog 0 (the cog for the current method)

    tmp    res    1    'temporary to hold four bytes to transfer at a time
  end

Update: I reverted back to the 2nd version after the error with the 3rd version was pointed out.

I don't know if the clkset() line is right or sufficient, but it seems to work. I've file booted a couple of other programs fine several times, one at 250MHz and one at 160MHz, with the program doing the launching having a 250MHz clock in both cases. By the way, for those tests, I didn't use the debugger in either the launcher or the launchie. So I think I've ended up doing what Mike suggested, and learned some stuff along the way, thanks to you folks and all of your responses. --Jim

PS.: If you think that the new program will need the old program's clkmode, let me know, and I'll change the $F0 value to include that, keeping the SS=00 part. But so far, I don't know why the new program would need that (as I would think that the new program can do whatever it needs to do at 20MHz just fine, just like in a normal boot situation, though maybe I'm wrong).

PPS: I guess I could use the default org of $000 and just let my value to be transferred clobber the instruction to load ptra, but it wouldn't save any memory and isn't reader friendly. Anyway, I originally had a tmp register reserved for this at the end, but it looked kind of ugly in terms of the formatting, so I got rid of it (though I wouldn't worry about that if there were other "variables.")

evanh · 2021-07-11 18:47

Oh, don't put the data location at $0 like that! Whatever is at address zero gets executed first. A null is a NOP but you can't be sure you'll always get a null. Much better to have the data after the COGINIT instead.

HUBSET #$F0 should not have any effect on stability in any position. If it does, you've got other problems still. All cases of it should be removed from production code.

evanh · 2021-07-11 18:49

@JRetSapDoog said:
PPS: I guess I could use the default org of $000 and just let my value to be transferred clobber the instruction to load ptra, but it wouldn't save any memory and isn't reader friendly. Anyway, I originally had a tmp register reserved for this at the end, but it looked kind of ugly in terms of the formatting, so I got rid of it (though I wouldn't worry about that if there were other "variables.")

Yes, go back to the tmp register on the end.

One problem I have in giving advice is that I don't know what the sequence of testing is, or even how a fail is determined. I may not have read everything you've posted either.

msrobots · 2021-07-11 21:13

Since you are setting RCFAST in Spin you can delete all the clocksetting in pasm

NOT needed

                  'Use RCFAST before doing "handover" to new (file booted) program via a coginit
                  'hubset ##%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS 'set clock mode
                  hubset  #$F0              'set 20 MHz+ (RCFAST) mode; PPPP=1111; CCSS=0000
                  hubset  ##clkmode_ & !3   'zero SS clock source bits (inhibits PLL/forces RCFAST mode)
                  waitx   ##20_000_000/100  'wait ~10ms (50ns x 200k = 0.01 sec)

Mike

msrobots · 2021-07-11 21:23

And you really need to

-before copying

stall interrupts
kill all other COGs except yourself.

-after copying

check if you are COG 0
if yes just launch yourself with coginit
else coginit COG0 and kill yourself
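
The steps above might look roughly like the following in-line pasm. This is an untested sketch, not a verified implementation: the register names `me` and `n` are made up for illustration, the kill loop is adapted from the one JRetSapDoog posted earlier (counting down from 7 for the current 8-cog P2), and the hub copy/clear is left out because it is the same as in his in-line pasm above.

```
  org
                  stalli                    'hold off interrupts in this cog while copying
                  cogid   me                'get this cog's ID
                  mov     n,    #7          'highest cog number on the 8-cog P2
.kill             cmp     n,    me    wz    'is n the current cog?
        if_nz     cogstop n                 'no: stop it
                  djnf    n,    #.kill      'count down through cog 0

                  'copy the image down to hub address 0 and clear the upper half here

                  coginit #0,   #0          'load cog 0 from hub address 0 and start it
                  cogstop me                'intended to be reached only if this cog is not cog 0

me                res     1                 'this cog's ID
n                 res     1                 'loop counter
  end
```

The unconditional `cogstop me` relies on the assumption that a cog which restarts itself with coginit never gets to execute the following instruction, so no explicit "am I cog 0" test is written.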

Mike

Wuerfel_21 · 2021-07-11 21:34

@msrobots said:
check if you are COG 0
if yes just launch yourself with coginit
else coginit COG0 and kill yourself

Mike

Check is unnecessary, since a cog coginiting itself will inherently prevent it from cogstopping itself afterwards.

JRetSapDoog · 2021-07-12 00:39

Thanks, evanh. I'll go back to using a variable at the end (which is more readable anyway). When I used $F0, my thinking was that the instruction in that register would only be run once, so I could reuse it after ptra was initialized. As for org 1, in addition to loading the pasm starting at that location, I wrongly assumed, apparently, that it would cause the in-line pasm to start execution at 1 (skipping over reg 0).

Thanks, mike. I totally thought that I'd removed those three lines already when I deleted some others. Thanks for pointing that out. I definitely don't want those in there with the clkset() call. Sorry about that. As for stalling interrupts, I'll look into that; that hadn't occurred to me.

Anyway, this seems to be working fine. I'll do a couple more tests just to be sure before moving on.

msrobots · 2021-07-12 01:30

@Wuerfel_21 said:

@msrobots said:
check if you are COG 0
if yes just launch yourself with coginit
else coginit COG0 and kill yourself

Mike

Check is unnecessary, since a cog coginiting itself will inherently prevent it from cogstopping itself afterwards.

Yes,

but a coginit takes a moment, not sure if a coginit #0 followed by cogstop #0 will make it in COG 0 - pipelining?

And the code might not even be running in COG 0.

Anyways it is a shortcut, basically the Block-Drive of Fat32 should do this like the P1 one does, following a (spin) given sector-array loaded into the COG before execution of loading the HUB with those given Sector numbers.

But if loading and starting from upper HUB works, it will fit into the SPI block driver cog too.

Enjoy!

Mike
