What's so special about 180MHz?

RossH · 2022-04-27 04:29

I have some Catalina software that appears to run reliably on the P2 at any clock speed other than 180MHz.

I have tried 90, 100, 160, 175, 200, 220, 240, 270 & 300 MHz. All work fine.

I have tried 179 and 181 MHz (as closely as these values can be achieved with the clock generator). Both work fine.

But the code fails quite reliably when the P2 is run at 180MHz. There are different clock configurations that can be used to achieve 180MHz, and I believe I have tried them all. They all fail.

The software does not (knowingly) use or rely on any specific timing. It does not refer to or use the value 180000000 (or its binary equivalent) anywhere. It does not use any external devices other than the SD card to load the program and data, and the serial interface to output results. I have been able to prove that the program loads and begins execution correctly - the failure occurs some time later, once all the cogs are loaded and executing. And the program reports the failure using the (still working) serial interface.

The process of demonstrating the failure is complex, so I cannot yet post a simple example. Everything I do to simplify it also makes the problem go away. But it does not solve it - it always comes back again sometime later.

There is no reason I can think of that the clock speed would make any difference to the program execution. Catalina uses 180MHz as the default for the P2 because that is the recommended maximum clock speed - but I have begun wondering if there is something "special" about this clock speed? Something built into either the P2 itself or into the P2 boot code?

Anyone have any ideas? As a last resort, I will simply make the default clock speed something else in future Catalina releases, and warn users about using 180MHz. I am thinking of making the default 200MHz, but this is beyond the recommended maximum, so I am not sure this would be a good choice. I am open to suggestions.

Ross.

jmg · 2022-04-27 05:01

@RossH said: ... It does not use any external devices other than the SD card to load the program and data, and the serial interface to output results. I have been able to prove that the program loads and begins execution correctly - the failure occurs some time later, once all the cogs are loaded and executing. And the program reports the failure using the (still working) serial interface.

So you are talking about partial failure, and then only after some time ? (enough time to warm up ?)
If you actively cool it, does the problem go away ?

If the COM still works, it's unlikely the PLL has changed, but you could configure a smart pin to toggle a pin from each cog, and maybe slowly change the duty cycle from SW in each cog as a simple watchdog, to see how far the failure propagates ?

The P2 does not 'know' what exact MHz it runs at, but there are some pin pathways that need multiple sysclks. Maybe you have fluked on a pathway that bumps at 180MHz.
You can probably get to within ~500kHz or so of 180MHz for more tests, as I think the PLL will lock below a PFD of 1MHz, but with increasing jitter.

If it is a pathway-delay issue, then I'd expect 179.5MHz or 180.5MHz to have similar fail modes, just at slightly different temperatures.

evanh · 2022-04-27 05:12

I'd be looking at SD reliability. Or more specifically, the prop2 driver code to access SD cards. SD cards in SPI mode are only really good for [up to] 25 MHz SPI clock. And, separately, the Prop2's internal I/O stages can make it tricky to manage fast SPI in a robust manner.

RossH · 2022-04-27 06:16

@jmg said:

@RossH said: ... It does not use any external devices other than the SD card to load the program and data, and the serial interface to output results. I have been able to prove that the program loads and begins execution correctly - the failure occurs some time later, once all the cogs are loaded and executing. And the program reports the failure using the (still working) serial interface.

So you are talking about partial failure, and then only after some time ? (enough time to warm up ?)
If you actively cool it, does the problem go away ?

If the COM still works, it's unlikely the PLL has changed, but you could config smart pin toggle of pin from each cog, and maybe actively/slowly change DUTY cycle from SW in each COG as a simple watchdog, and see how far the failure propagates ?

The P2 does not 'know' what exact MHz it runs at, but there are some pin pathways that need multiple sysclks, maybe you have fluked on a pathway that bumps at 180MHz,
you can probably get to within ~500kHz or so of 180MHz, for more tests, as I think the PLL will lock below PFD of 1MHz but with increasing jitter.

If it is a pathway-delay issue, then I'd expect 179.5MHz or 180.5MHz to have similar fail modes, just at slightly different temperatures.

It is not temperature related. But you raised a good point, and it turns out that it does not need to be exactly 180MHz to fail. Values close to 180MHz fail as well.

For example:

XDIV 19, MULT 170, DIVP 1 - i.e. 178,947,270 Hz - works

XDIV 19, MULT 172, DIVP 1 - i.e. 181,052,532 Hz - works

But XDIV 19, MULT 171, DIVP 1 - i.e. 179,999,901 Hz - fails!

I would need to do more experiments to find precisely the range of frequencies around 180MHz that fails, but now that we know that 180MHz is not some "magic number" (e.g. a number buried in the boot software somewhere) I'm not sure that would tell us anything more.
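
The arithmetic behind the three settings above can be sketched as follows. This is illustrative Python, not Catalina code. It assumes a 20 MHz crystal (not stated in the post) and treats XDIV, MULT and DIVP as plain divider and multiplier values rather than register encodings. The function names are hypothetical. The quoted Hz figures happen to match what truncating integer division would produce; whether the tool that printed them works that way is only a guess.

```python
from fractions import Fraction

XTAL_HZ = 20_000_000  # assumed crystal frequency, not stated in the post


def pll_exact(xdiv, mult, divp, xtal_hz=XTAL_HZ):
    """Exact output frequency: xtal / xdiv * mult / divp, as a fraction."""
    return Fraction(xtal_hz * mult, xdiv * divp)


def pll_truncated(xdiv, mult, divp, xtal_hz=XTAL_HZ):
    """Same formula, but truncating to an integer after each division."""
    return (xtal_hz // xdiv) * mult // divp


for mult in (170, 171, 172):
    exact = float(pll_exact(19, mult, 1))
    print(f"MULT {mult}: exact {exact:,.1f} Hz, truncated {pll_truncated(19, mult, 1):,} Hz")
```

Under these assumptions the exact result for XDIV 19, MULT 171 is 180,000,000 Hz, so the failing setting would be nominally 180MHz rather than 99 Hz short of it.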

pik33 · 2022-04-27 06:16

There are a lot of ways to achieve 180 MHz in a P2, as it is 9*20 MHz, so 1/9, 2/18, 3/27, 4/36... not including a third value that can be set to something which is not 1

Maybe try to manually do hubset with other than default settings and check if it still fails. Also, check 360 MHz.
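
A rough way to list the candidate settings mentioned above. This sketch assumes a 20 MHz crystal, a divider range of 1..64 and a multiplier range of 1..1024; those ranges are assumptions to check against the P2 documentation, and `pll_settings` is a hypothetical helper. It ignores any limits on the VCO or PFD frequency, so not every listed setting is necessarily usable.

```python
XTAL_HZ = 20_000_000      # assumed crystal frequency
TARGET_HZ = 180_000_000   # the clock speed under investigation


def pll_settings(target_hz=TARGET_HZ, xtal_hz=XTAL_HZ, divps=(1,)):
    """Return (xdiv, mult, divp) tuples that hit target_hz exactly.

    Solves target = xtal * mult / (xdiv * divp) for an integer mult.
    The default divps=(1,) leaves the third (post-divider) value at 1.
    """
    settings = []
    for xdiv in range(1, 65):              # assumed divider range
        for divp in divps:
            numerator = target_hz * xdiv * divp
            if numerator % xtal_hz:
                continue                   # no integer multiplier exists
            mult = numerator // xtal_hz
            if 1 <= mult <= 1024:          # assumed multiplier range
                settings.append((xdiv, mult, divp))
    return settings


settings = pll_settings()
print(len(settings), "settings, starting with", settings[:4])
```

Passing a longer `divps` tuple would include post-divider values other than 1.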

RossH · 2022-04-27 06:19

@evanh said:
I'd be looking at SD reliability. Or more specifically, the prop2 driver code to access SD cards. SD cards in SPI mode are only really good for [up to] 25 MHz SPI clock. And, separately, the Prop2's internal I/O stages can make it tricky to manage fast SPI in a robust manner.

I thought this too, given how many problems the P2 has with SD cards. But I have verified that the code and data are being loaded from the SD card ok. Something goes wrong after that.

RossH · 2022-04-27 07:22

@pik33 said:
There are a lot of ways to achieve 180 MHz in a P2, as it is 9*20 MHz, so 1/9, 2/18, 3/27, 4/36... not including a third value that can be set to something which is not 1

Maybe try to manually do hubset with other than default settings and check if it still fails. Also, check 360 MHz.

I have tried all the ways I can think of to make 180MHz. They all fail.

I tried 360MHz and it also failed, but with a different failure - clearly this clock speed is just beyond my particular Propeller.

jmg · 2022-04-27 08:22

@RossH said:

I thought this too, given how many problems the P2 has with SD cards. But I have verified that the code and data are being loaded from the SD card ok. Something goes wrong after that

How long is 'after', and can you capture and report that time to fail, per COG running ?
Does heating or cooling P2 change the time to fail ?

evanh · 2022-04-27 08:56

If the code is multitasking then maybe you're looking at a race condition or something of that ilk. A pointer being corrupted at a particular beat, maybe. Does adding [inline] debug prints impact it?

RossH · 2022-04-27 09:14

@jmg said:

@RossH said:

I thought this too, given how many problems the P2 has with SD cards. But I have verified that the code and data are being loaded from the SD card ok. Something goes wrong after that

How long is 'after', and can you capture and report that time to fail, per COG running ?
Does heating or cooling P2 change the time to fail ?

Time to run, and/or temperature, have no effect. But the failure, once it occurs in a program, is very repeatable.

The only thing that seems to make a difference is the address in Hub RAM of individual instructions being executed, which is why it is so hard to simplify the failing program, or add debug code. Moving even a single instruction by as little as a single long is generally enough to make the program work. I do this by inserting a NOP in the code, then moving it before or after individual blocks of code, or in some cases single instructions. Needless to say, this process can take many hours, and in some cases just inserting a NOP anywhere is enough to make the code work!

I have verified that the Hub RAM is not being corrupted. I am tempted to think that it is an issue with cog access to Hub RAM at certain clock frequencies, but I cannot prove that, and it is (of course) much more likely to be a software bug of mine - I just can't track it down, and I keep running out of ideas for what to try next.

Ross.

RossH · 2022-04-27 09:17

@evanh said:
If the code is multitasking then maybe you're looking at a race condition or ilk. A pointer being corrupted at a particular beat maybe. Does adding [inline] debug prints impact it?

This code is not multitasking. It is single-threaded, apart from device drivers (e.g. SD card and serial plugins) running in various cogs.

ManAtWork · 2022-04-27 09:24

@RossH said:
The software does not (knowingly) use or rely on any specific timing. It does not refer to or use the value 180000000 (or its binary equivalent) anywhere. It does not use any external devices other than the SD card to load the program and data, and the serial interface to output results. I have been able to prove that the program loads and begins execution correctly - the failure occurs some time later, once all the cogs are loaded and executing. And the program reports the failure using the (still working) serial interface.

If the program does not use any external devices or rely on any external input signals it should run completely synchronously even if more than one cog is running. If the frequency value is not used as a number anywhere in the code (which I don't believe) then the only variable input that can change the behaviour is the timing of the SD card. Starting out of a hard reset, all counter values should read the same independent of the frequency.

But you have to use the clock frequency number to calculate the serial baud rate.

So how can you tell it fails? Does it compute false results? Does one or more cogs stop executing? Does it crash and corrupt memory?

evanh · 2022-04-27 09:29

@RossH said:
... or add debug code. Moving even a single instruction by as much as a single long is generally enough to make the program work (I do this by inserting a NOP in the code, then moving it before or after individual blocks of code - even in some cases single instructions.

I've had similar but don't think it was frequency dependent. Rather it was hub phase dependent, thus changing the jump addresses (code shifted in recompile) affected it. One case was to do with use of smartpin paced I/O, particularly notable when using a pulse smartpin for generating SPI-like clock pulses in sync with streamer data, i.e. Hyperbus testing.

RossH · 2022-04-27 10:01

@ManAtWork said:

@RossH said:
The software does not (knowingly) use or rely on any specific timing. It does not refer to or use the value 180000000 (or its binary equivalent) anywhere. It does not use any external devices other than the SD card to load the program and data, and the serial interface to output results. I have been able to prove that the program loads and begins execution correctly - the failure occurs some time later, once all the cogs are loaded and executing. And the program reports the failure using the (still working) serial interface.

If the program does not use any external devices or rely on any external input signals it should run completely synchronous even if more than one cog is running. If the frequency value is not used as number anywhere in the code (which I don't believe) then the only variable input that can change the behaviour is the timing of the SD card. Starting out of a hard reset all counter values should read the same independent of the frequency.

But you have to use the clock frequency number to calculate the serial baud rate.

So how can you tell it fails? Does it compute false results? Does one or more cogs stop executing? Does it crash and corrupt memory?

The program itself tells me it fails. It fails its own data integrity checks. But only at this specific clock frequency, and the failure is so obscure that any attempt to dig deeper into the failure usually results in the program actually working.

It is very frustrating

evanh · 2022-04-27 11:21

@evanh said:
I've had similar but don't think it was frequency dependant. Rather it was hub phase dependant, thus changing the jump addresses (code shifted in recompile) affected it. One case was to do with use of smartpin paced I/O - Particularly notable when using a pulse smartpin for generating SPI like clock pulses in sync with streamer data, ie: Hyperbus testing.

BTW: The fix was to always reset, using a DIRL/DIRH combo, the smartpin before starting the next series of pulses. The streamer has XINIT for the same effect.

ManAtWork · 2022-04-27 11:30

What happens if you fool the P2 by connecting a different crystal? I'd test two cases:
1. Let the software think it runs at 180MHz but connect a, say, 16MHz instead of a 20MHz crystal so it actually runs at 144MHz.
2. Do the opposite and set the PLL to 45/4 so that the P2 runs at 180MHz but the software thinks it's 225MHz.

With that test you can find out whether it is the timing or the numbers that cause the failure. I suspect it is some sort of invalid pointer problem: you are reading memory areas of your code, or the lower address space where the sysclock frequency is stored, when you think you are accessing data.
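
The numbers in the two test cases can be checked with a few lines. This sketch assumes the software's notion of the clock is simply crystal times PLL ratio, that the normal 180MHz setting is a 9/1 ratio on a 20 MHz crystal, and that the 16 MHz crystal stays fitted in case 2 (the only reading under which the quoted 180 and 225 both work out). `sysclk_hz` is a hypothetical helper.

```python
from fractions import Fraction


def sysclk_hz(xtal_hz, mult, div):
    """System clock for a given crystal and PLL ratio mult/div."""
    return Fraction(xtal_hz * mult, div)


ASSUMED_XTAL_HZ = 20_000_000  # what the software believes is fitted
FITTED_XTAL_HZ = 16_000_000   # the swapped-in crystal from the post

# Case 1: PLL left at the usual 180MHz ratio (assumed here to be 9/1).
case1_believed = sysclk_hz(ASSUMED_XTAL_HZ, 9, 1)
case1_actual = sysclk_hz(FITTED_XTAL_HZ, 9, 1)

# Case 2: PLL ratio changed to 45/4.
case2_believed = sysclk_hz(ASSUMED_XTAL_HZ, 45, 4)
case2_actual = sysclk_hz(FITTED_XTAL_HZ, 45, 4)

for name, believed, actual in (
    ("case 1", case1_believed, case1_actual),
    ("case 2", case2_believed, case2_actual),
):
    print(f"{name}: software thinks {int(believed):,} Hz, chip runs at {int(actual):,} Hz")
```

If the failure follows the believed value, the number is the culprit; if it follows the actual value, the timing is.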

ke4pjw · 2022-04-27 14:02

Something to be aware of while hammering HubRAM is, an adjacent cog can simultaneously access an adjacent hub address, or any multiple of eight offset from the adjacent address.

If you are using a HubRAM scheme based on the assumption that only one COG can access the HUB at a time, like the P1, that assumption is wrong. P2 uses the egg beater. Every cog gets a byte at the HUB on every clock.

ManAtWork · 2022-04-27 16:46

Yes that's true. But it doesn't cause any conflicts. A simultaneous write cannot corrupt data that is read from the same address because 8 accesses can happen simultaneously but not to the same address at the same time.
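
A toy model of the rotating hub access may make this easier to picture. It is a deliberate simplification: it assumes 8 cogs, 8 hub slices selected by the low bits of the long address, and a window that advances by one slice per clock. The rotation direction, the function names and the absence of pipeline delays are all assumptions, not a description of the real silicon.

```python
N_COGS = 8
N_SLICES = 8


def slice_of(byte_addr):
    """Hub slice holding a byte address (assumed: long address modulo 8)."""
    return (byte_addr >> 2) % N_SLICES


def slice_seen_by(cog, clock):
    """Slice a cog can reach on a given clock (assumed rotation direction)."""
    return (cog + clock) % N_SLICES


def clocks_until_access(cog, clock, byte_addr):
    """Clocks a cog waits before its window reaches the slice of byte_addr."""
    return (slice_of(byte_addr) - slice_seen_by(cog, clock)) % N_SLICES


# On every clock the eight cogs see eight different slices, so no two cogs
# can touch the same long at the same time.
for clock in range(16):
    seen = {slice_seen_by(cog, clock) for cog in range(N_COGS)}
    assert len(seen) == N_SLICES

# Moving something by one long moves it to the next slice, which changes
# how long a cog waits for it, but not whether the access is conflict-free.
print(clocks_until_access(0, 0, 0x1000), clocks_until_access(0, 0, 0x1004))
```

In this model, address alignment changes timing only, which fits the point that the access scheme itself should not corrupt data.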

Again, any program that does not wait for or depend on external signals, no matter how large and no matter how many cogs it runs on, should execute in a perfectly deterministic way. It should produce the exact same results each time it is run independent of the frequency, of course assuming that the clock frequency is below the max reliable speed.

Waiting for the SD card can disturb the perfect determinism. But waiting for a specific CT counter value after booting and before executing the rest of the program should restore it.

rogloh · 2022-04-27 18:40

@RossH said:
The program itself tells me it fails. It fails its own data integrity checks. But only at this specific clock frequency, and the failure is so obscure that any attempt to dig deeper into the failure usually result in the program actually working.

It is very frustrating

Sounds nasty. If you have another P2 it might be worth running your test on that too. If that one does work at 180MHz it could indicate a HW vs SW bug, or at least some sort of timing variation amongst different P2s or boards.

jmg · 2022-04-27 21:45

@RossH said:

Time to run, and/or temperature, have no effect. But the failure, once it occurs in a program, is very repeatable.
The only thing that seems to make a difference is the address in Hub RAM of individual instructions being executed, which is why it is so hard to simplify the failing program, or add debug code.

Maybe I read too much into your 'Something goes wrong after that' ? - I presumed that meant it ran ok for a while.
Are you saying the failure is immediate from reset-exit, for that particular build ?
Do failures favour any COG in particular ?

Temperature and MHz are very closely linked on any MCU, but maybe it is the temperature during SD load that matters, not run time temperature ?
Are you able to load the same image from Flash, and improve the load integrity check to be bulletproof ? (more than a single checksum)

SD certainly has question marks, but MHz-code-align? issues are new, so maybe entirely remove SD from the equation for a while ?
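
One way to make a load check stronger than a single checksum is to compare per-block digests, which also shows where a mismatch sits. The sketch below is host-side Python for illustration only; the 512-byte block size, the choice of CRC-32 plus SHA-256 and the helper names are assumptions, and a check running on the P2 itself would need its own implementation.

```python
import hashlib
import zlib

BLOCK_SIZE = 512  # assumed block size, chosen to match a typical SD sector


def block_digests(image, block_size=BLOCK_SIZE):
    """Return a (crc32, sha256) pair for each block of the image."""
    digests = []
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        digests.append((zlib.crc32(block), hashlib.sha256(block).hexdigest()))
    return digests


def first_bad_block(expected, actual):
    """Index of the first differing block, or None if everything matches."""
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index
    if len(expected) != len(actual):
        return min(len(expected), len(actual))  # one image is longer
    return None


reference = bytes(range(256)) * 8      # stand-in for the known-good image
corrupted = bytearray(reference)
corrupted[1030] ^= 0x01                # flip a single bit in the third block

print(first_bad_block(block_digests(reference), block_digests(bytes(corrupted))))
```

Comparing the digests of the image as loaded from SD against those of the same image loaded another way would localise any difference to a block.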

RossH · 2022-04-27 23:51

Some good suggestions here. Some of them I cannot easily do because I don't have the hardware, and others would require too many changes to the code, which usually results in the problem disappearing, but I think I can try these ones just by patching a few instructions during the startup, after the program has been loaded but before it actually begins executing ...

@ManAtWork said:
Waiting for the SD card can disturb the perfect determinism. But waiting for a specific CT counter value after booting and before executing the rest of the program should restore it.

@jmg said:
Do failures favour any COG in particular ?

If the issue is related to a specific timing of cog/hub interactions, either of these might either make the problem vanish, or else occur in other instances that currently appear to work fine.

jmg · 2022-04-28 00:34

@jmg said:
Are you able to load the same image from Flash, and improve the load integrity check to be bulletproof ? (more than a single checksum)

Another test idea would be to use Serial load, in 2 modes - one that does a full code load and run, and another minimal version that is smallest legal Prop_Txt string, that becomes a RST-run-from-already-loaded-RAM command.
Shifting the temperature across those, could indicate if memory was being corrupted at the critical MHz ?

RossH · 2022-04-28 01:03

@jmg said:

@jmg said:
Are you able to load the same image from Flash, and improve the load integrity check to be bulletproof ? (more than a single checksum)

Another test idea would be to use Serial load, in 2 modes - one that does a full code load and run, and another minimal version that is smallest legal Prop_Txt string, that becomes a RST-run-from-already-loaded-RAM command.
Shifting the temperature across those, could indicate if memory was being corrupted at the critical MHz ?

The problem with suggestions like these - and I know because I have tried similarly invasive things previously - is that you do a massive amount of work, and then find that your new program does not exhibit the same problem as the old one.

The only way I have found to tackle it so far is to find a program that fails, then "tweak" it as minimally as possible until it no longer fails. For example, in the program I am currently working with, I have reduced this down to moving the location of a single instruction by a single long, along the lines of:

FAILS:

  nop
  mov a,b

WORKS:

  mov a,b
  nop

And yet when you examine the hub memory either before or after the failure, the memory at any of the locations concerned is not corrupted. And it only fails at all at specific clock speeds. At all other speeds (which changes only one long in the startup code) it works fine.

In all the cases I have found so far, I can always make the program work by some such "tweak". But I cannot explain why this works, or prevent it from failing again later.

evanh · 2022-04-28 01:30

So there's more than one situation of this issue?

jmg · 2022-04-28 01:50

@RossH said:

The problem with suggestions like these - and I know because I have tried similarly invasive things previously - is that you do a massive amount of work, and then find that your new program does not exhibit the same problem as the old one.

Understood, which is why my suggestion does not change the code at all, it just provides some different pathways from P2 reset to run.
It removes SD from the test situation, and will allow easier temperature sweep testing, because I'm sure MHz and Temp will interact under the right test conditions.

@RossH said:
The only way I have found to tackle it so far is to find a program that fails, then "tweak" it as minimally as possible until it no longer fails. For example, in the program I am currently working with, I have reduced this down to moving the location of a single instruction by a single long, along the lines of:

FAILS:
  nop
  mov a,b
WORKS:
   mov a,b
   nop

I've seen code alignment change code operation on other MCUs, but on those it affects opcode timing and never causes a crash.
The assembler I use makes use of an ALIGN directive, to help make things 100% deterministic.

I guess it is possible interactions of that timing with specific HW flags might cause different code operation ?
The MCUs this shows up on, are the EFM8 series from Silabs, which fetch 32b from flash, and then consume 1/2/3 bytes of that as an opcode.

The P2 does not have that cache detail, but the P2 will vary the memory slot with code align. Doesn't P2 HUBEXEC involve some sort of cache read ?
Of course, that should not be 'MHz paranoid', which your tests indicate.
Does your code depend on COG to COG handshake based on memory to memory flag handling ?

evanh · 2022-04-28 02:08

@jmg said:
The P2 does not have that cache detail, but the P2 will vary the memory slot with code align. Doesn't P2 HUBEXEC involve some sort of cache read ?

Buffered incremental prefetching might be the best description of the FIFO, but it lacks any branch prediction from the cog. Upon a branch, hubexec will stall until the FIFO has reloaded at the new address. And it always reloads the FIFO, with a hidden RDFAST, for every hubexec branch.

jmg · 2022-04-28 02:20

@evanh said:

@jmg said:
The P2 does not have that cache detail, but the P2 will vary the memory slot with code align. Doesn't P2 HUBEXEC involve some sort of cache read ?

Buffered incremental prefetching, might be best description of the FIFO. But lacking any branch prediction from the cog. Upon a branch, hubexec will stall until the FIFO has reloaded at new address. And it always reloads the FIFO, with a hidden RDFAST, for every hubexec branch.

Thanks, so that could wobble about the finer details of timing based on align, but it should never be MHz paranoid.
If RossH can fit code into COG and see the same effect, that excludes any FIFO - HUB effects ?

RossH · 2022-04-28 02:48

@evanh said:
So there's more than one situation of this issue?

Yes. I have seen it before, made some code changes and it went away. For instance, you find an uninitialized pointer or some other bug, fix it and the problem disappears. Which makes you think you have found it. But it has returned often enough that I finally realized something deeper is going on. Also, this is the first time I have thought to do extensive testing with different clock speeds, which I think is a major new clue.

Unfortunately, I did not keep any of the previous examples, and this current example is in code that does not belong to me, so I cannot share it. If it happens again in code that I can share, I will do so.

I no longer think it has to do with 180MHz specifically - that was always something of a long shot! What is causing the problem to appear or disappear is probably just anything that changes the program timing slightly. Although having said that, I would have expected it to also fail at some of the other clock frequencies I tried. I would feel more confident I was on the right track if I could understand that.

RossH · 2022-04-28 03:15

@jmg said:

Understood, which is why my suggestion does not change the code at all, it just provides some different pathways from P2 reset to run.
It removes SD from the test situation, and will allow easier temperature sweep testing, because I'm sure MHz and Temp will interact under the right test conditions.

The program is written to be loaded and also read its data from a file system, so removing the SD card altogether is not trivial. To produce an image that was in effect "pre-loaded" and did not actually require the SD card to be present would be possible, but difficult. Since I have evidence that the program and data are being loaded correctly, I am not inclined to try this until I have eliminated all other possibilities. An easier option might be to use the flash as a file system. I seem to recall that someone posted some code to do that. I will investigate that option. That would be a useful piece of code in any case.

@jmg said:
The assembler I use makes use of an ALIGN directive, to help make things 100% deterministic.

Code alignment is always one of the first things I check. I have been bitten by that one before!

@jmg said:
Does your code depend on COG to COG handshake based on memory to memory flag handling ?

As well as the SD card and serial drivers, the code uses a floating point co-processor that runs in a separate cog. So it uses 4 cogs in total. The interaction with all these is via Hub RAM. This may be affected by which specific cogs are used, but it should not (AFAIK) be affected by clock speed.

pik33 · 2022-04-28 06:31

These SD and serial drivers compute their constants according to clkfreq, so this may make a difference. I don't know what you use to compile the program, but one test would be to make the program think it runs at, for example, 270 MHz and then hubset it to 180. The SD should survive this; the serial baud rate has to be adjusted at the PC side.
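
How far the PC-side baud rate has to move can be estimated with a simple ratio. This sketch assumes the serial driver derives its bit period purely from the clock frequency it believes in, and uses 115200 as a hypothetical nominal baud rate; `host_baud` is an illustrative helper, not part of any driver.

```python
from fractions import Fraction


def host_baud(nominal_baud, assumed_hz, actual_hz):
    """Baud rate seen on the wire when the chip runs at actual_hz but the
    driver computed its bit timing for assumed_hz."""
    return Fraction(nominal_baud * actual_hz, assumed_hz)


# Program built for 270 MHz, chip actually set to 180 MHz.
wire_baud = host_baud(115_200, 270_000_000, 180_000_000)
print(f"set the terminal to about {float(wire_baud):,.0f} baud")
```

The same ratio applies to any other timing constant the drivers derive from clkfreq, such as an SD clock divider.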

evanh · 2022-04-28 06:58

@RossH said:
The program itself tells me it fails. It fails its own data integrity checks. ...

Is the data check directly on SD loaded data? Or is there intermediate processing of it first? I'm still thinking the SD read/write routines could be a candidate.
