where is (PASM) coginit executed?

ags · 2013-07-16 13:29

Back to grasping at straws debugging again. I have an unreliable bootloader (which was once reliable). It functions by stopping all (other) cogs, loading RAM from an EEPROM, then calling coginit (launching the SPIN interpreter in cog0) and finally terminating the cog that does all the bootloading. So the sequence is:

'stop all cogs other than "me"
'load program from EEPROM into hub RAM
'do stuff to setup clock correctly, letting the PLL and Oscillator settle (20ms delay @20MHz RCFAST clock mode)

coginit    interpreter        'interpreter is $0001 << 18 | $3C01 << 4
cogid      tmp
cogstop  tmp

Documentation shows that cogid, cogstop, and coginit all take 8..23 clocks - typical for a hub instruction. I can see how cogid and cogstop will complete in 8..23 clocks, but not coginit. 512 longs need to be loaded into a cog, at least. So that leads me to the question:

Where is all the work being done to complete the coginit instruction? In the code snippet above, you see that as soon as coginit returns, I stop the cog in which the instruction was executed. By this time I've already stopped every other cog. If the "target" cog (the one being launched by the coginit instruction, not the one in which the coginit instruction was executed) is responsible for doing all the work to load itself, then I'm OK. If it's the cog in which coginit was executed, then I'm in trouble, because I stop that cog immediately after executing the coginit instruction. That doesn't seem like it would ever work, so I doubt it.

I'm wondering if ROM contains the SPIN interpreter and *two* bootloaders: one to load RAM from EEPROM or a host, and another to load all 512 cog longs and then begin executing at $0. It could load the SPIN interpreter from ROM and execute it just as well as loading PASM code from RAM and executing it.

But the fundamental question remains (now somewhat esoteric, and not something I need to fully understand to get the job done, but interesting nonetheless): how is cog memory loaded from hub RAM/ROM if no cog is running yet?

BTW, thanks to Mike Green for sharing how to load the SPIN interpreter - years ago.

Dave Hein · 2013-07-16 14:02

When you execute coginit it will stop the cog if it is running, and then load the cog memory from hub memory using a hardware loader. If your program was running in the same cog it will stop when the coginit is executed, and the cogid and cogstop instructions will not be executed. Once the cog memory is loaded it will begin execution at location zero.

ags · 2013-07-16 14:21

Dave - yes, you are correct, and I went through that thought exercise and even have a comment to remind myself of it. That's a bit different from what I'm asking, though. Let's assume the "me" cog is not the cog that will be initialized with the coginit instruction. I've stopped all cogs other than "me". If the target cog isn't running yet, what hardware actually loads the cog RAM from the specified hub address? If it's the target cog itself, how were the instructions it executes to accomplish that loaded? Is there a special block somewhere that exists to do just that, before the cog is able to run? (Is this what you call the "hardware loader"? If so, is there just one, or one per cog? Is it the same as, or different from, the mechanism at startup that searches for a host or EEPROM to determine what to load?)

The complexity came when I wondered if the "me" cog had any part in the process of loading the target cog. If so, killing it before the process is complete would be a problem. But the more I think of that, it just doesn't make sense. One cog can't do more than one thing at a time. When the coginit instruction is complete, it's done. Yet I'm still curious about the "hardware loader".

Mike Green · 2013-07-16 14:26

I assume that there's a state machine in the cog that's set up by COGINIT using an internal cog register and the program counter. Each hub access window, it does the equivalent of a RDLONG, then increments the internal register and the program counter. When the program counter wraps around to zero, the COGINIT logic clears the cog's registers and starts executing normally from cog RAM at location 0. I'm sure there are some details I've gotten wrong, but there's no detailed description of this process. The cog registers might be cleared at a different time and the memory accesses probably don't involve an actual byte address, but a long address with the least significant 2 bits dropped.
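A toy model of the state machine Mike describes may make the idea concrete. It is a sketch under assumptions, not documented hardware behavior: one long is copied per hub access window (assumed every 16 clocks), only cog addresses $000-$1EF are loaded, and PAR at $1F0 receives the PAR address from the coginit operand while the other special registers read as zero. The function name `coginit_load` is hypothetical.

```python
HUB_WINDOW_CLOCKS = 16  # assumed: each cog gets one hub access window every 16 clocks
LOADED_LONGS = 0x1F0    # assumed: cog RAM $000-$1EF is loaded (496 longs)
PAR_REG = 0x1F0         # PAR special register


def coginit_load(hub_longs, code_long_addr, par_long_addr):
    """Toy model of the per-cog loader state machine (not documented hardware).

    hub_longs      -- hub memory viewed as a list of 32-bit longs
    code_long_addr -- 14-bit long address of the code (from the coginit operand)
    par_long_addr  -- 14-bit long address for PAR (from the coginit operand)
    Returns (cog_ram, clocks_spent).
    """
    cog_ram = [0] * 512              # special registers other than PAR read as zero here
    src = code_long_addr             # internal address register, counted in longs
    clocks = 0
    for pc in range(LOADED_LONGS):   # pc plays the role of the program counter
        clocks += HUB_WINDOW_CLOCKS  # wait for this cog's hub window
        cog_ram[pc] = hub_longs[src] # the equivalent of one RDLONG
        src += 1
    # Mike guessed the counter wraps to zero; this model simply stops at $1F0.
    cog_ram[PAR_REG] = par_long_addr << 2  # PAR holds the long-aligned byte address
    # At this point the real cog would start executing at cog address $000.
    return cog_ram, clocks
```

With the Spin interpreter operand from the first post (PAR $0001, code $3C01), the model copies 496 longs starting at hub byte address $F004 and spends 7936 clocks doing it, none of which involve the cog that issued the coginit.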

ags · 2013-07-16 14:45

Now *that* makes sense. I hope it's true...
That also implies there is separate machinery that, at initial boot time (after coming out of reset), is responsible for polling to find a host or EEPROM. After some checks/handshakes, it will either do nothing or load RAM and then initiate the cog process you outline above to load a SPIN interpreter instance.

Ariba · 2013-07-16 15:30

ags wrote: »

---
That also implies there is separate machinery that at initial boot time (after coming out of reset) that is responsible for polling to find a host or EEPROM. After some checks/handshakes, that will either do nothing or loading RAM then initiating the cog process you outline above to load a SPIN interpreter instance.

This is handled in the bootloader software. After a Reset the bootloader is loaded to cog 0 from ROM. That is all that the hardware does.
All further tests for serial and EEPROM are done in software. The booter code is encrypted in ROM, but Chip later released it in Spin source form.

Andy

Mike Green · 2013-07-16 15:33

When the chip comes out of reset, there's a program in ROM that starts up (I think with an implicit COGINIT), checks for a PC polling for a Propeller, then checks for an I2C EEPROM. If either is present, there's an appropriate loader included in the ROM. If there's a PC, it can send commands to write the RAM to EEPROM after downloading and/or run the Spin program in RAM when downloading is done. This program has been posted (I think it's called PNUT) for your perusal.
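Mike's description of the ROM booter's decision flow can be written out as a simplified model. The step strings and the `HostCommand` names are hypothetical labels for the behavior described in this thread, not the booter's actual command codes; Chip's released booter source is the authority on the real protocol.

```python
from enum import Enum


class HostCommand(Enum):
    # Hypothetical labels for what the host can ask the booter to do.
    RUN = "run"
    PROGRAM_EEPROM = "program"
    PROGRAM_EEPROM_AND_RUN = "program+run"


def boot_steps(host_present, eeprom_present, command=None):
    """Simplified model of the ROM booter flow described above; returns the steps taken."""
    steps = ["hardware loads booter from ROM into cog 0"]
    if host_present:                     # a PC answered the serial handshake
        steps.append("download program from host into hub RAM")
        if command in (HostCommand.PROGRAM_EEPROM, HostCommand.PROGRAM_EEPROM_AND_RUN):
            steps.append("write hub RAM to EEPROM")
        if command in (HostCommand.RUN, HostCommand.PROGRAM_EEPROM_AND_RUN):
            steps.append("coginit Spin interpreter into cog 0")
    elif eeprom_present:                 # no host, but an I2C EEPROM responded
        steps.append("copy EEPROM into hub RAM")
        steps.append("coginit Spin interpreter into cog 0")
    else:
        steps.append("nothing to load: stop")
    return steps
```

Only the first step is pure hardware; everything after it is code running in cog 0.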

ags · 2013-07-16 17:34

OK, now this makes sense. Summarizing (probably not 100% correct, but hitting the major points): the only thing the hardware can do without any code is the same thing COGINIT does. That is, given an address in hub RAM, load the 512 longs starting at that location into cog RAM, then begin execution at cog RAM location $00. It may be an implicit COGINIT after reset that loads the bootloader from ROM, which is then used to poll for a host or EEPROM. This same mechanism is also used to load the SPIN interpreter from ROM.

In any case, this isn't where my defect lies... but it was good to learn how it works. Thanks.

tonyp12 · 2013-07-16 18:42

How does the very first coginit get started?
Is it that after a reset, a long value is put at address 0 in cog 0,
and this long value represents cognew(@bootcode, 0)?
Or does the hub have a simple state machine that moves 512 longs, so that no cog has to run for this to happen?

Alexander (Sandy) Hapgood · 2013-07-16 21:06

The manual states "... and then loads and runs the built-in Boot Loader program in the first processor (Cog 0)". That would suggest some sort of block move functionality in the hub. I know it's possible to load PASM code into all eight cogs using the coginit procedure. You use coginit to load cogs 1 through 7 first and then load cog 0 last. That gives you eight cogs running PASM code.

Which came first? The chicken or the egg? The hub or the cog? I think there's a little more to the hub than just memory.

Sandy

Only 496 longs get transferred, not 512. The last 16 locations are cleared to zero, at least that's what the manual states. If that's the case, how does the PAR register at $1F0 get set?

ericball · 2013-07-17 09:12

Two items of note: first, for the cog invoking coginit, the instruction only takes one hub cycle (i.e. 8 - 23 clocks) before the next instruction is executed. Second, loading the target cog takes ~8192 cycles, but during this time the invoking cog can be doing anything it wants. Therefore care must be taken if the invoking cog changes the hub RAM containing the code being loaded into the target before 8192 cycles have elapsed.
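ericball's figure follows from simple arithmetic if the loader moves one long per hub rotation. This sketch assumes 16 clocks per rotation and an 80 MHz system clock, both chosen for illustration; it computes the 496-long count from the manual as well as the 512-long count that gives ~8192.

```python
HUB_ROTATION_CLOCKS = 16  # assumed: each cog gets one hub access slot every 16 clocks


def cog_load_time(clkfreq_hz, longs=496):
    """Return (clocks, microseconds) to load `longs` longs at one long per hub slot."""
    clocks = longs * HUB_ROTATION_CLOCKS
    return clocks, clocks * 1_000_000 / clkfreq_hz


for label, longs in (("496 longs", 496), ("512 longs", 512)):
    clocks, us = cog_load_time(80_000_000, longs)
    print(f"{label}: {clocks} clocks = {us:.1f} us at 80 MHz")
```

Either way the window is roughly 100 us at 80 MHz, which is the period during which the source hub RAM must stay untouched.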

ags · 2013-07-17 09:12

The PAR value which is loaded into cog RAM address $1F0 is an argument to coginit, that's where the value is specified. From what I learned from Mike Green a while ago, when the SPIN interpreter is loaded, the PAR value is specified as $01 (not $00 - not sure what is there, perhaps a magic number the SPIN interpreter checks) and the hub memory address to begin loading at cog RAM location $00 is $3C01.

ags · 2013-07-17 09:22

ericball wrote: »

Therefore care must be taken if the invoking cog changes the HUB RAM containing the code loaded into the target before 8192 cycles have elapsed.

Agreed. And that is precisely how I ended up on this subject. My previous bootloader implementation started a cog running PASM code; that cog would load hub RAM with the contents of an EEPROM and, when finished, launch cog 0 as is done during a normal startup sequence (that is, load the SPIN interpreter and run the code in hub RAM). Back in the "initiating cog" (the one running the SPIN code that launched the PASM bootloader cog), once the bootloader cog was started, it would stop all cogs other than itself and the bootloader cog, and then stop its own cog. I realized that, although unlikely, if the SPIN code that stopped all the cogs modified any hub RAM that the bootloader cog had already loaded (in parallel), I would end up with problems. So I moved the code that stops all other cogs from the initiating SPIN code into the bootloader PASM code. Now the first thing the bootloader does is stop all other cogs; then it loads hub RAM, and finally it launches cog 0 to run the SPIN interpreter and execute the program in hub RAM.

Unfortunately, that wasn't the cause of my problem - but it is a more robust implementation.

Dave Hein · 2013-07-17 09:29

The $01 and $3C01 are 14-bit versions of the PAR and code addresses. The values used with coginit are long addresses with the 2 LSBs missing, since these are always zero. So the actual addresses are $04 and $F004. $F004 is the location of the beginning of the Spin interpreter that's stored in ROM.
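Dave's explanation of the 14-bit fields can be checked by packing and unpacking the operand. This sketch assumes the P1 COGINIT destination layout as I understand it from the manual: bits 31..18 hold the PAR long address, bits 17..4 the code long address, bit 3 requests the next free cog (COGNEW), and bits 2..0 select the cog. `coginit_operand` and `decode_coginit_operand` are hypothetical helper names.

```python
def coginit_operand(par_byte_addr, code_byte_addr, cog_id=None):
    """Pack a COGINIT destination value; cog_id=None requests the next free cog."""
    for addr in (par_byte_addr, code_byte_addr):
        if addr % 4 or not 0 <= addr <= 0xFFFC:
            raise ValueError(f"{addr:#x} is not a long-aligned hub address")
    # Drop the two always-zero LSBs to get 14-bit long addresses.
    operand = (par_byte_addr >> 2) << 18 | (code_byte_addr >> 2) << 4
    if cog_id is None:
        return operand | 0b1000          # bit 3 set: start the next free cog
    if not 0 <= cog_id <= 7:
        raise ValueError("cog ID must be 0..7")
    return operand | cog_id


def decode_coginit_operand(operand):
    """Return (PAR byte address, code byte address, new-cog flag, cog ID)."""
    par = (operand >> 18 & 0x3FFF) << 2  # 14-bit long address -> byte address
    code = (operand >> 4 & 0x3FFF) << 2
    return par, code, bool(operand & 0b1000), operand & 0b111
```

`coginit_operand(0x0004, 0xF004, 0)` reproduces the `$0001 << 18 | $3C01 << 4` value from the first post, and decoding it gives back byte addresses $04 and $F004.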

The first thing the interpreter does is to copy the values of PBASE, VBASE, DBASE, PCURR and DCURR from the header in hub RAM to registers in cog RAM. The PBASE value is stored at location $06 in hub RAM, but the copy loop in the interpreter adds a value of 2 to the hub address before it reads it. This is why PAR is set to $04 instead of $06.
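Here is a sketch of the 16-byte header the interpreter reads, plus the PAR + 2 arithmetic Dave describes. CLKFREQ at $00, PBASE at $06 and the PBASE, VBASE, DBASE, PCURR, DCURR order come from this thread; the CLKMODE byte at $04 and the checksum byte at $05 are my assumptions about the P1 image format and worth verifying against a real .binary file. The function names are hypothetical.

```python
import struct

# Field order of the 16-byte header at hub $0000 (CLKMODE/checksum bytes assumed).
HEADER_FORMAT = "<IBBHHHHH"   # little-endian: long, byte, byte, five words
HEADER_FIELDS = ("clkfreq", "clkmode", "checksum",
                 "pbase", "vbase", "dbase", "pcurr", "dcurr")


def parse_spin_header(image: bytes) -> dict:
    """Decode the header at the start of a Propeller 1 Spin image."""
    return dict(zip(HEADER_FIELDS, struct.unpack_from(HEADER_FORMAT, image, 0)))


def interpreter_word_addresses(par_byte_addr=0x0004, count=5):
    """Hub addresses of the words the interpreter copies, starting at PAR + 2."""
    return [par_byte_addr + 2 + 2 * i for i in range(count)]
```

With PAR = $04, `interpreter_word_addresses()` yields $06, $08, $0A, $0C and $0E, exactly the five header words from PBASE through DCURR, which is why PAR is not simply $06.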

Dave Hein · 2013-07-17 09:46

ags, you are loading a binary Spin file from an SD card and starting it in cog 0, correct? I also do this in spinix, but I use a Spin program instead of PASM. The Spin loader program runs in the last 512 bytes at the high end of hub RAM. It shuts down all of the cogs except for its own cog and the cog running the SPI driver that talks to the SD card. It also does a LOCKCLR and LOCKRET on all the locks. Are you doing that?

The spinix loader then reads the binary file from the SD card into hub RAM starting at location 0. It assumes that all of the sectors are contiguous in the file, which is valid for cluster size of 32K, but could be a problem if the SD card uses 16K clusters. So spinix requires a cluster size of 32K to ensure correct operation. Is it possible that your loader has the same restriction?
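The cluster-size restriction comes down to how many clusters a file can span. A FAT file starts at the beginning of a cluster, so a binary no larger than one cluster is contiguous; once it spans two or more clusters, the FAT chain does not guarantee they are adjacent. A minimal check, with hypothetical function names:

```python
def clusters_spanned(file_size, cluster_size):
    """Number of clusters a file occupies (ceiling division; an empty file counts as one)."""
    return max(1, -(-file_size // cluster_size))


def sectors_guaranteed_contiguous(file_size, cluster_size):
    # One cluster is always contiguous; with two or more clusters,
    # the FAT chain may place them anywhere on the card.
    return clusters_spanned(file_size, cluster_size) == 1
```

A Spin binary of up to 32K fits in a single 32K cluster, but the same binary can need two 16K clusters, which is where a loader that reads sectors sequentially can go wrong.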

After the spinix loader reads the binary file into memory it issues a stop command to the SD card, and then stops the SPI driver cog. BTW, you need to make sure that your SPI mailbox is at the high end of memory so it doesn't get overwritten by the binary file. Maybe that's your problem.

ags · 2013-07-17 11:26

Dave Hein wrote: »

The $01 and $3C01 are 14-bit versions of the PAR and code addresses. The values used with coginit are long addresses with the 2 LSBs missing, since these are always zero. So the actual addresses are $04 and $F004. $F004 is the location of the beginning of the Spin interpreter that's stored in ROM.

The first thing the interpreter does is to copy the values of PBASE, VBASE, DBASE, PCURR and DCURR from the header in hub RAM to registers in cog RAM. The PBASE value is stored at location $06 in hub RAM, but the copy loop in the interpreter adds a value of 2 to the hub address before it reads it. This is why PAR is set to $04 instead of $06.

Yes, I recall that (14-bit addresses for PAR and the code) and thinking this through before. I should have made that clear (and I made an outright mistake in stating that the PAR address was $01). With PAR being hub RAM location $04, I think of that as the "base mailbox address", and adding an offset to it to reach other parameters is not unusual. The question I had was why not use hub RAM location $00 as the base? I presume something else (important) is stored at that location.

Dave Hein · 2013-07-17 11:29

CLKFREQ is stored at location 0.

ags · 2013-07-17 11:30

Dave Hein wrote: »

ags, you are loading a binary Spin file from an SD card and starting it in cog 0, correct? I also do this in spinix, but I use a Spin program instead of PASM. The Spin loader program runs in the last 512 bytes at the high end of hub RAM. It shuts down all of the cogs except for its own cog and the cog running the SPI driver that talks to the SD card. It also does a LOCKCLR and LOCKRET on all the locks. Are you doing that?

The spinix loader then reads the binary file from the SD card into hub RAM starting at location 0. It assumes that all of the sectors are contiguous in the file, which is valid for cluster size of 32K, but could be a problem if the SD card uses 16K clusters. So spinix requires a cluster size of 32K to ensure correct operation. Is it possible that your loader has the same restriction?

After the spinix loader reads the binary file into memory it issues a stop command to the SD card, and then stops the SPI driver cog. BTW, you need to make sure that your SPI mailbox is at the high end of memory so it doesn't get overwritten by the binary file. Maybe that's your problem.

Dave Hein is a genius. You have pointed me at what I am (almost) certain is the problem. I don't release any locks after stopping all the cogs. Now that you mention it, that's obviously something that must be done. (I checked the manual for any indication that locks are cleared when a cog stops, but they can't be: like hub RAM, locks are a shared resource, so stopping any one cog doesn't modify them. Only a reset clears them.) This explains why I am able to run my bootloader four times before it fails: I'm running out of locks and then looping while waiting for one to become available, which will never happen. Thanks Dave!

Now I have to figure out why I'm consuming 2x the number of locks I expect. I should be hanging on the 8th bootloader call, not the 4th.
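The symptom fits a simple model of the eight hub locks. This sketch assumes, as observed above, that each boot of the loaded program checks out two locks with LOCKNEW and never returns them; `HubLocks`, `boot_program` and `release_all_locks` are hypothetical names, and a LOCKNEW failure is modeled as -1, which is what Spin's LOCKNEW returns when no lock is free.

```python
class HubLocks:
    """Toy model of the P1's eight hub locks; only a reset would rebuild this object."""

    def __init__(self):
        self.checked_out = [False] * 8   # LOCKNEW/LOCKRET allocation state
        self.is_set = [False] * 8        # LOCKSET/LOCKCLR state

    def locknew(self):
        for i, used in enumerate(self.checked_out):
            if not used:
                self.checked_out[i] = True
                return i
        return -1                        # no free lock

    def lockret(self, i):
        self.checked_out[i] = False

    def lockclr(self, i):
        self.is_set[i] = False


def boot_program(locks, locks_needed=2):
    # Hypothetical loaded program: grabs two locks and never returns them.
    # Stopping its cog later leaves the locks checked out.
    return [locks.locknew() for _ in range(locks_needed)]


def release_all_locks(locks):
    # What the fixed loader does before launching the interpreter: LOCKCLR and LOCKRET all 8.
    for i in range(8):
        locks.lockclr(i)
        locks.lockret(i)
```

Without cleanup, the first four boots get lock pairs [0, 1] through [6, 7] and the fifth gets [-1, -1], so a program that waits for LOCKNEW to succeed hangs there. Calling `release_all_locks` before each boot keeps every run at [0, 1].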

As to the spinix bootloader, what's the advantage of having it in SPIN instead of PASM?

Dave Hein · 2013-07-17 11:50

The only advantage to Spin over PASM is that it's easier for me to program. One drawback is that the binary file has to be less than 31.5K in size, but that's not a problem in practice.

One thing that I forgot to mention is that you need to clear the VAR memory. Some programs may depend on their VAR variables being initialized to zero.

ags · 2013-07-17 12:42

Yup - I clear all VAR memory and add the stack marker as well. I'm pretty sure the problem is not returning/clearing locks when I stop all the cogs.
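Dave's advice and this reply can be sketched as the fix-up a loader performs on hub RAM that still holds the previous program. Reading VBASE at $08 and DBASE at $0A follows the header field order given earlier in the thread; placing the stack marker in the 8 bytes just below DBASE, and the marker value $FFF9FFFF, are assumptions to check against a Propeller Tool .eeprom image. `load_into_hub` is a hypothetical name.

```python
import struct

STACK_MARKER = 0xFFF9FFFF  # assumed marker value; check against a Propeller Tool .eeprom image


def load_into_hub(hub: bytearray, binary: bytes) -> None:
    """Copy a Spin .binary into (stale) hub RAM and prepare VAR space and the stack marker."""
    if len(binary) > len(hub):
        raise ValueError("binary does not fit in hub RAM")
    hub[:len(binary)] = binary
    vbase, dbase = struct.unpack_from("<HH", hub, 0x0008)   # VBASE at $08, DBASE at $0A
    # Hub RAM still holds the previous program, so VAR space must be zeroed explicitly.
    hub[vbase:dbase - 8] = bytes(dbase - 8 - vbase)
    # Assumed layout: two marker longs occupy the 8 bytes just below DBASE.
    struct.pack_into("<II", hub, dbase - 8, STACK_MARKER, STACK_MARKER)
```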

ags · 2013-07-17 22:01

As suspected, the problem with my bootloader was that I failed to return (LOCKRET) and clear all locks before launching the SPIN interpreter. I could have ignored this as a "nuisance issue", but experience has shown me that it would have been even more difficult to diagnose and repair months later, when it became a critical problem. Thanks to Dave Hein for the clue that led me to the problem.

Lesson: stopping all cogs does not result in the same state as resetting the Prop.
