Prop 1 - system clock cycles elapsed before user's code starts running

Wossname · 2017-06-27 18:02

At the exact moment where Cog 0 begins to run the interpreted Spin code, is it possible to make any particular statements about the current state of the System and Hub clocks?

I think we can, but I'm not 100% certain, could someone sanity check this for me...

The last thing the "booter" does is to execute:

coginit interpreter             'reboot cog with interpreter

Since coginit is a Hub instruction (and Cog 0 is the only one running), Cog 0 waits for the Hub to swing around to allow access. Then the Interpreter is loaded into Cog 0 (which is a clock-deterministic operation, as it is a fixed length of memory to unscramble and copy). A question occurs: how many system clocks does it take to unscramble and copy 512 longs from Hub to Cog? It must be less than 8, right? (There's some dark magic going on there, I think.)

At the last System clock after the Interpreter is loaded AND before the next clock cycle when the Interpreter (in Cog 0) begins to run the user's code, is the state of the Hub clock guaranteed to be knowable by deduction? I think the answer is yes and I think the System clock will be advanced by a multiple of 8 clock cycles since the CogInit was invoked. I think this unscramble and copy must take a while because of the complexity of the task, but is transparent because all this is happening invisibly inside the chip, before our code runs.

The booter's earlier activities, serial port host detection and eeprom handling are both non-deterministic because they involve timing measurements and timeouts of indeterminate length. I think these are resolved by the ultimately final CogInit.

To put it another way -- the Propeller becomes cycle-predictable only at the moment the Cog 0 Interpreter begins execution at its default entry point.

Is my thinking correct?

jmg · 2017-06-27 20:40

Wossname wrote: »

At the last System clock after the Interpreter is loaded AND before the next clock cycle when the Interpreter (in Cog 0) begins to run the user's code, is the state of the Hub clock guaranteed to be knowable by deduction? I think the answer is yes and I think the System clock will be advanced by a multiple of 8 clock cycles since the CogInit was invoked. I think this unscramble and copy must take a while because of the complexity of the task, but is transparent because all this is happening invisibly inside the chip, before our code runs.

What do you mean by "the state of the Hub clock" ?
See below - from reset release, there will be some fixed and known number of X * RCSLOW + Y * RCFAST SysCLKs, plus some startup times
( Addit: X ~ 1000 from data, portion of Y that is {LoaderROM -> COG} ~ 640u/(1/12M) = 7680 SysCLKs )

Of course, those RCSLOW/RCFAST clocks have variances, but the actual cycle counts should be defined.

Wossname wrote: »

The booter's earlier activities, serial port host detection and eeprom handling are both non-deterministic because they involve timing measurements and timeouts of indeterminate length. I think these are resolved by the ultimately final CogInit.

To put it another way -- the Propeller becomes cycle-predictable only at the moment the Cog 0 Interpreter begins execution at its default entry point.

Is my thinking correct?

Once BOD is released and Oscillators start, all decisions will be made based on RCSLOW/RCFAST.

In the case of EEPROM boot, the timeout window for Serial RXD is also defined

In the case of serial boot, there will be a time from reset release to possible first valid Char, and then a time from last char stop bit, to user code.

tUr is ~RCSLOW(50ms)+RCFAST(640us+40us)
tUp is ~RCFAST(240ms)
tUd is ~tUr + tUp

Vcc      ___/=============================
POR      ___////==========================
RXD      ==============F================== F=0, Fast EE boot example
UsrCode  ______________________________/=====
            |tP|<-tUr->|<----tEE ----->|   UART timeout skipped

Vcc      ====================================
RSTn     ==\_/=============================== fast RST edge
RXD      =============F====================== F=1, 240ms RXD wait
UsrCode  __________________________________/=======
             |<- tUr->|<--tUp-->|<--tEE ->|   UART timeout added
i2c_CLK(peak) = RCFAST/k


Vcc      ==============================================
RSTn     ==\_/=========================================
RXD      ==============\=\=\=\=====\===================
UsrCode  ______________________________________/=======
             |<-tUr-->|< PC link  >|<--tRe --->|

tP  = POR reset release time
tUp = Uart detection timeout
tEE = Time to boot from EEPROM, no Serial
tUr = Rxd Ready, time from reset release, to first pin sample
tRe = RXD exit, time from last byte stop bit sample, to UsrCode

tP may have BOD delays
tUr,tUp,tEE,tRe should be a fixed/deterministic number of RCSLOW & RCFAST Clocks

& yes, it would be good if those values were given in the data sheet.
Also of interest would be fastest possible UART Serial boot, which needs some means to ping the Loader (to find tRr), and some upper BAUD speed.
There have been posts around upper Baud speed.

Average effective SCL from the 1.3s EE load window, looks to be ~ 227 kHz

With small MCU prices dropping, there are more choices around local and PC boot - an 18kB MCU is now similar in price to a 24C256, and that MCU can be an i2c or UART loader.

Booter Source Code is here

Mike Green · 2017-06-27 20:58

Once the COGINIT synchronizes with the hub, longs are transferred once per hub cycle until 496 longs are transferred. I believe (without checking the P1 FPGA sources) that the remaining 16 locations are also copied while their corresponding system registers are cleared. There's a little additional time for the cog to actually start execution. You'll have to trace through the interpreter and the initialization code if you want to see how long from the COGINIT to the first interpreted Spin bytecode, then further to when the first bytecode in the user's code is executed. The execution times of Spin bytecodes are not absolutely predictable without tracing through the interpreter although you can re-synchronize the cog with the hub using WAITCNT, then reproducibly perform an operation some Spin operations later (if there are no variable length code paths ... IF ELSE)

PASM is different. By using conditional execution, you can have deterministic conditional code paths. Have a look at some of the video drivers for examples. The I2C driver for FemtoBasic also uses this.

jmg · 2017-06-27 22:24

There is also this thread :

http://forums.parallax.com/discussion/132417/faster-eeprom-booting

Mentions that tUd above, can be skipped by RXD=L

The various time delays are also visible in the current profiles in P8X32A-Propeller-Datasheet-v1.4.0_0.pdf, section 8.6, "Current Profile at Various Startup Conditions".

Cluso99 · 2017-06-27 23:11

For reference, the hub cycles at 16 clocks, not 8 clocks.
It's been a while since I looked at the booter code. As Mike said, after the 496 cog registers are copied, the last 16 special registers are cleared. Then the Interpreter is run from cog $000.
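As a rough cross-check of the figures in this thread, here is a minimal Python sketch of the COGINIT load time. It assumes one long per 16-clock hub cycle for all 512 cog locations (Mike's description, not verified against the FPGA sources) and a nominal 12 MHz RCFAST; the names are illustrative only.

```python
# Sketch: clocks needed for COGINIT to fill a cog, under stated assumptions.
HUB_CYCLE_CLOCKS = 16            # hub period, per the post above
COG_LONGS = 512                  # 496 copied longs + 16 special registers
RCFAST_NOMINAL_HZ = 12_000_000   # assumption: nominal RCFAST frequency

def coginit_load_clocks(longs=COG_LONGS, hub_cycle=HUB_CYCLE_CLOCKS):
    """Clocks to load a cog if one location is handled per hub cycle."""
    return longs * hub_cycle

load_clocks = coginit_load_clocks()
load_us = load_clocks / RCFAST_NOMINAL_HZ * 1e6

print(load_clocks)          # clocks for the whole load
print(round(load_us, 1))    # microseconds at nominal RCFAST
```

Under these assumptions the load comes to 8192 clocks, about 683 us at 12 MHz, which is close to the RCFAST(640us+40us) figure quoted earlier. Hub synchronization and cog start-up add a few clocks that this sketch ignores.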

IIRC, the oscillator is not changed until the Interpreter runs. So up to this point, everything is executing at the internal RC Oscillator speed. This oscillator has a wide frequency range, which makes it quite difficult to calculate a time-to-start range.

Dave Hein · 2017-06-28 00:00

I tried the following test program.

con
  _clkmode = pll16x+xtal1
  _clkfreq = 80_000_000

obj
  fds : "FullDuplexSerial"

pub main | temp1, temp2
  temp1 := CNT
  temp2 := CNT
  waitcnt(CNT+_clkfreq*3)
  fds.start(31, 30, 0, 115200)
  fds.hex(temp1,8)
  fds.tx(13)
  fds.hex(temp2 - temp1,4)
  fds.tx(13)

I loaded the program in EEPROM and did a few runs by hitting the reset button. The value of temp1 increased after each reset until it wrapped around, and then increased again. In other words, a reset did not clear the system counter. It just continued to increment as long as the clock was running. The value of temp2 - temp1 was always $170.

I then tried power-on resets. I used a QuickStart card, so power on/off was done by unplugging the USB connector, and then reconnecting it. The value of temp1 ranged from $8E460286 to $92521B9E. This implies that the system counter doesn't start at zero on power up. This value would be consistent with the counter starting at $80000000, and then taking a little over 3 seconds to boot up at 12 MHz.

So I think the conclusion is that the value of the system counter on start up is not predictable. In fact, it might make a good starting point for a random number generator.
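For anyone repeating the test, the temp2 - temp1 difference can be worked out off-chip as follows. This is only a sketch: it assumes the 32-bit counter wraps modulo 2^32 and that both reads happen at the 80 MHz set by _clkfreq; the helper names are made up for illustration.

```python
# Sketch: wrap-safe difference between two 32-bit CNT readings.
MASK32 = 0xFFFFFFFF
CLKFREQ_HZ = 80_000_000   # from _clkfreq in the test program above

def cnt_delta(later, earlier):
    """Clocks elapsed between two CNT readings, tolerant of one wraparound."""
    return (later - earlier) & MASK32

def clocks_to_us(clocks, clk_hz=CLKFREQ_HZ):
    """Convert a clock count to microseconds at the given clock frequency."""
    return clocks / clk_hz * 1e6

delta = 0x170                                  # reported temp2 - temp1
print(delta, clocks_to_us(delta))              # clocks and microseconds
print(hex(cnt_delta(0x00000010, 0xFFFFFFF0)))  # readings that straddle a wrap
```

So $170 is 368 clocks, roughly 4.6 us of interpreter time between the two CNT reads at 80 MHz.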

jmg · 2017-06-28 00:32

Dave Hein wrote: »

...
I then tried power-on resets. I used a QuickStart card, so power on/off was done by unplugging the USB connector, and then reconnecting it. The value of temp1 ranged from $8E460286 to $92521B9E. This implies that the system counter doesn't start at zero on power up. This value would be consistent with the counter starting at $80000000, and then taking a little over 3 seconds to boot up at 12 MHz.

I get these indicated times from 0x80000000, so it seems to not start always the same ?
(0x8E460286-0x80000000)*(1/12M) = 19.95576583s
(0x92521B9E-0x80000000)*(1/12M) = 25.6142425s
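The two conversions above can be reproduced with a short Python sketch. It assumes, as the calculation does, a start value of $80000000 and a constant 12 MHz clock for the whole interval; both are guesses from this thread rather than datasheet facts.

```python
# Sketch: convert a CNT reading to elapsed seconds from a guessed start value.
MASK32 = 0xFFFFFFFF
ASSUMED_START = 0x80000000       # guessed power-up value, not confirmed
RCFAST_NOMINAL_HZ = 12_000_000   # assumption: nominal RCFAST frequency

def elapsed_seconds(cnt, start=ASSUMED_START, clk_hz=RCFAST_NOMINAL_HZ):
    """Seconds implied by a CNT reading if counting began at 'start'."""
    return ((cnt - start) & MASK32) / clk_hz

low = elapsed_seconds(0x8E460286)
high = elapsed_seconds(0x92521B9E)

print(round(low, 4), round(high, 4))   # implied boot times in seconds
print(round(high - low, 4))            # spread between the two readings
```

The spread of more than five seconds between readings is what rules out a fixed start value combined with a fixed boot time.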

Mike Green · 2017-06-28 01:10

Indeed, the system counter is not cleared on a reset nor is it set to any particular value on power up. Depending on the physical and electrical conditions in and around the counter on the particular chip, the counter may have an initial value that is the same or different each time the chip is powered up. It is unlikely that the value is random or anything like it.

The time between any two readings of the system counter is as accurate as the system clock itself. When running off a crystal or external clock source, that can be very accurate. When running off one of the RC clocks, it's dependent on temperature and operating voltage. When switching clock sources, there's a brief period where the clock may be unstable.

If you're interested, you can use a cog counter's PLL to generate true random numbers (dependent on a random physical process). Checkout this in the Propeller Object Exchange.

Dave Hein · 2017-06-28 02:44

Mike Green wrote: »

... It is unlikely that the value is random or anything like it.

The time between any two readings of the system counter is as accurate as the system clock itself. When running off a crystal or external clock source, that can be very accurate. When running off one of the RC clocks, it's dependent on temperature and operating voltage. When switching clock sources, there's a brief period where the clock may be unstable.

Yet I get random looking values when I reset and/or power up. If the temperature and voltage remained exactly the same each time you would get repeatable values, but as you said there is a short period of time where the clock may be unstable when switching from the internal 12 MHz clock to the Xtal clock. In practice it would be very difficult to get a repeatable start-up value for the system counter.

@jmg, yes my calculation was in error. I was just guessing that the counter might start up at $80000000, but it appears to be starting up at different values.

Mike Green · 2017-06-28 03:47

The system counter is just an incrementing counter driven by the system clock. The counter has flip-flops which are unstable as the supply voltage is increasing through the switching threshold of the devices used (and the threshold will vary slightly from device to device). Each may settle into a low or high state depending on what's connected to it including the process variations locally on the chip. It's not really random although it may seem that way at first. I think you'll find that the values cluster. Specific bits may favor one state or another. Groups of bits may favor one state or another depending on local conditions on the chip. Temperature and operating voltage will skew everything.

Dave Hein · 2017-06-28 12:36

About the time I went to bed I realized that the number of cycles during bootup should be fairly constant. I was thinking that since it ran off the internal RC clock the number of cycles would vary since the frequency would vary. However, everything happens in lockstep with the system clock, and this should always take a fixed number of cycles as long as the serial port isn't used. So the only randomness occurs in the initial value of the counter and the period where the frequency changes. I agree that this isn't sufficient to seed a random number generator. I wasn't really serious about using it for that purpose, but just casually commented that it could be used.

Wossname · 2017-06-28 16:01

Dave Hein wrote: »

...should always take a fixed number of cycles as long as the serial port isn't used

The EEPROM does have a wait cycle in it too (see "ee_wait" in the booter source).

Thank you everyone for your replies, I believe my questions have been answered.

Wossname · 2017-06-28 16:22

To summarise...

In a nutshell, anything that happens before user-code begins to be interpreted by Cog 0 will take an effectively unpredictable number of system clock cycles (note: the actual oscillator frequencies involved are immaterial). This variation is due to 2 external factors: the responsiveness of the serial port on the host PC (if there is one) and/or the responsiveness of the EEPROM chip on the I2C bus.

When user-code begins to execute, the System and Hub clock states are known. I think this is true because initially Cog 0 is the only cog running and has just finished loading the interpreter, which is a Hub operation (or at least a long sequence of them). If the last thing that happens is the last of the Hub -> Cog memory transfers, then it stands to reason that the very next Hub clock edge will give Hub access to Cog 1. Cog 1 won't yet be running though, of course, since Cog 0 hasn't had a chance to execute the first Spin instruction yet.

(I think. I will attempt to verify this on a real propeller momentarily, feel free to try it for yourself.)

Wossname · 2017-06-28 18:31

Empirical measurement may prove tricky.

A logic analyser attached to pins P0 and P7 (Cog 0 and Cog 7 respectively) sampling at 16MHz...

Consider this test code...

CON
  _clkmode = RCSLOW

PUB EngineTemplateMain
  coginit(7, @ENTRY_POINT_COG_7, 0)
  coginit(0, @ENTRY_POINT_COG_0, 0)

DAT
                        org     0
ENTRY_POINT_COG_0       mov     DIRA, #$ff
                        or      OUTA, #1 
                        clkset  DIRA 'hack to reset the chip                        
                        
c0_cogid                res     1
                        fit

                        org     0
ENTRY_POINT_COG_7       mov     DIRA, #128
                        mov     PHSA, #0
                        mov     FRQA, HALF_SYSTEM_SPEED
                        mov     CTRA, CTRA_MODE
:derp                   jmp     #:derp
                        
HALF_SYSTEM_SPEED       long    |< 31
CTRA_MODE               long    %0_00100_111_00000000_000000_000_000111        
                        fit

The SPIN code tells Cog 7 to run some PASM that just outputs half the system clock (may be out of phase by 180 degrees but that doesn't matter, either rising or falling edge is synchronous with a FULL CYCLE of the absolute system clock).
Then it tells Cog 0 to run, set P0 as output, set it HIGH and then reset the Propeller ASAP.
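To see why FRQA = |< 31 gives half the system clock, here is a small off-chip Python model of the counter in NCO mode. It is a simplified sketch: it assumes PHSA accumulates FRQA once per system clock and that the pin follows PHSA bit 31, and it assumes a nominal 20 kHz RCSLOW for the sample-rate arithmetic.

```python
# Sketch: simplified model of a cog counter in NCO single-ended mode.
MASK32 = 0xFFFFFFFF

def nco_output(frqa, clocks, phsa=0):
    """Pin level after each system clock: PHSA += FRQA, pin = PHSA bit 31."""
    levels = []
    for _ in range(clocks):
        phsa = (phsa + frqa) & MASK32
        levels.append(phsa >> 31)
    return levels

wave = nco_output(1 << 31, 8)   # FRQA = |< 31, as in HALF_SYSTEM_SPEED
print(wave)                     # pin toggles on every system clock

ANALYSER_HZ = 16_000_000        # sample rate used in the test above
RCSLOW_NOMINAL_HZ = 20_000      # assumption: nominal RCSLOW frequency
samples_per_clock = ANALYSER_HZ // RCSLOW_NOMINAL_HZ
print(samples_per_clock)        # analyser samples per system clock
```

In this model the pin changes state on every system clock, so each edge on P7 marks one full clock, and at 16 MHz sampling there are several hundred samples per RCSLOW clock to work with.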

RCSLOW is the chosen clock speed because my crappy logic analyser can't handle RCFAST without Mr Nyquist getting all smug at me.

I'm at the limits of my poor test gear (my 8 channel Saleae tops out at 24MHz), but even sampling RCSLOW at 16MHz it does seem that this sequence of user-code is deterministic in this simple test. However this is NOT remotely scientific.

Every sequence I look at in my logic analyser software shows extremely consistent behaviour. Not good enough though, correlation != causation as they say.

What can I draw from this? Well the sample rate is MUCH higher than the RCSLOW rate, so that's a good thing. If I can see that Cog7's CTRA outputs the same number of clock cycles each time (without any phase drift) before the Prop resets, then I think this means that Cog 7 and Cog 0 are lock-stepped.

Does anyone have some grown-up test gear? A multi-channel logic analyser with 100+MHz sample rate would be a good tool to have, because we can go up to RCFAST and this means we can avoid switching clock sources and thus don't have to worry about oscillator settling (in the interests of removing all sources of variation).

jmg · 2017-06-28 21:12

Wossname wrote: »

In a nutshell, anything that happens before user-code begins to be interpreted by Cog 0 will take an effectively unpredictable number of system clock cycles (note: the actual oscillator frequencies involved are immaterial).
This variation is due to 2 external factors: the responsiveness of the serial port on the host PC (if there is one) and/or the responsiveness of the EEPROM chip on the I2C bus.
... The EEPROM does have a wait cycle in it too (see "ee_wait" in the booter source).

That's not the wording I would use.
Booter Source is here
Points:
a) yes, ee_wait is in booter source, but that applies only to WRITE cycles. EE read is entirely a predictable number of system RCFAST clock cycles
b) The RST exit delay is also a very predictable number of system RCSLOW clock cycles
c) Oscillator frequencies are very material, if you are interested in time. Those frequencies are actually the main source of variance.

Oscillator frequencies can be measured. (see below)

Wossname wrote: »

When user-code begins to execute, the System and Hub clock states are known.

I'm not sure what you are trying to say here ?
The COG executes using RCFAST, and all COGS use the same RCFAST.
coginit launches the COG(s), using RCFAST, and that will be deterministic every time.

I'm less sure about what you call 'Hub Clock state', but the hub slots do vary by COG, and any HUB access will re-align the opcode fetch to that hub slot.

That means two COGINITs can start in sync, but you do need care with HUB ops to resync again, if that is critical.
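The re-alignment effect can be illustrated with a toy Python model. It assumes a 16-clock hub rotation with each cog's window two clocks after the previous cog's, and it ignores the fixed execution cost of the hub instruction itself, so treat it as a sketch of the idea rather than a cycle-exact model.

```python
# Sketch: toy model of hub-slot alignment (not cycle-exact).
HUB_PERIOD = 16    # clocks per full hub rotation
SLOT_SPACING = 2   # assumption: clocks between consecutive cogs' windows

def clocks_until_slot(cog, t):
    """Clocks a cog waits at time t before its next hub window opens."""
    return (cog * SLOT_SPACING - t) % HUB_PERIOD

# Two cogs that are in step at t = 5 both issue a hub instruction.
t = 5
wait0 = clocks_until_slot(0, t)
wait7 = clocks_until_slot(7, t)
resume0 = t + wait0
resume7 = t + wait7

print(wait0, wait7)        # waits differ by cog
print(resume0 - resume7)   # skew between the cogs after the hub access
```

In this model the two cogs leave the hub access two clocks apart even though they entered it together, which is why code that must stay in step has to account for every hub operation.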

Wossname wrote: »

(I think. I will attempt to verify this on a real propeller momentarily, feel free to try it for yourself )

From the Booter source, you should also be able to measure the RCSLOW & RCFAST times.
Reset the Prop with RXD=0, and you skip UART timeout, and go directly to EE_Read
SCL clock will start ~RCSLOW(50ms)+RCFAST(640us+40us)
Width of this read-burst of SCL (294948 SCL clks?) shows the time to read 32k, for t ~ RCFAST(32kRead)
Loading and running booter code using a crystal, or simulator, should allow an exact OpcCLK count for this to be measured.
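The 294948 SCL figure can be reconstructed with a short Python sketch. The breakdown is an assumption that happens to reproduce the number: 9 SCL clocks per byte (8 data bits plus ACK), 32 KB of data, and a 4-byte header (device address for write, two address bytes, device address for read). The 1.3 s window is the EE load time mentioned earlier in the thread.

```python
# Sketch: SCL clock count for a 32 KB sequential EEPROM read.
EEPROM_BYTES = 32 * 1024
CLOCKS_PER_BYTE = 9     # 8 data bits + 1 ACK/NAK bit
HEADER_BYTES = 4        # assumption: dev addr (W), addr hi, addr lo, dev addr (R)
EE_LOAD_SECONDS = 1.3   # EE load window quoted earlier in this thread

scl_clocks = (EEPROM_BYTES + HEADER_BYTES) * CLOCKS_PER_BYTE
avg_scl_hz = scl_clocks / EE_LOAD_SECONDS

print(scl_clocks)                # total SCL clocks in the read burst
print(round(avg_scl_hz / 1000))  # average SCL rate in kHz
```

That gives 294948 SCL clocks and an average rate near 227 kHz. Measuring the real burst width then gives RCFAST once the booter's system clocks per SCL clock are known.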

Cluso99 · 2017-06-29 06:21

As jmg said, it is the frequency of the internal RC oscillator(s) that are the unknowns. The rest can be calculated. But it does not help to know the number of clocks if you do not know the clock frequency.

jmg · 2017-06-29 10:53

Cluso99 wrote: »

As jmg said, it is the frequency of the internal RC oscillator(s) that are the unknowns. The rest can be calculated. But it does not help to know the number of clocks if you do not know the clock frequency.

Yup, but knowing the clock cycles is still useful, as it gives a precise leverage point.
You can then apply that to the Freq limits, to get proper design limits.
Or, you can measure the Frequencies, and improve those limits, and so speed up download speeds.
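As a sketch of applying cycle counts to frequency limits, the Python below turns the approximate tUr and tUp figures from earlier in the thread into min/max times. The cycle counts are derived from the nominal times (50 ms at 20 kHz, 680 us and 240 ms at 12 MHz), and the oscillator limits (RCSLOW 13 to 33 kHz, RCFAST 8 to 20 MHz) are assumed here and should be checked against the datasheet.

```python
# Sketch: design limits from fixed cycle counts and assumed oscillator ranges.
RCSLOW_HZ = (13_000, 33_000)         # assumed min/max, check the datasheet
RCFAST_HZ = (8_000_000, 20_000_000)  # assumed min/max, check the datasheet

# Approximate cycle counts implied by the nominal times in this thread.
TUR_RCSLOW_CLOCKS = 1000       # ~50 ms at a nominal 20 kHz
TUR_RCFAST_CLOCKS = 8160       # ~680 us at a nominal 12 MHz
TUP_RCFAST_CLOCKS = 2_880_000  # ~240 ms at a nominal 12 MHz

def time_range(slow_clocks, fast_clocks):
    """(min, max) seconds for a mix of RCSLOW and RCFAST clock counts."""
    t_min = slow_clocks / RCSLOW_HZ[1] + fast_clocks / RCFAST_HZ[1]
    t_max = slow_clocks / RCSLOW_HZ[0] + fast_clocks / RCFAST_HZ[0]
    return t_min, t_max

tur_min, tur_max = time_range(TUR_RCSLOW_CLOCKS, TUR_RCFAST_CLOCKS)
tup_min, tup_max = time_range(0, TUP_RCFAST_CLOCKS)

print(round(tur_min * 1e3, 1), round(tur_max * 1e3, 1))  # tUr range in ms
print(round(tup_min * 1e3, 1), round(tup_max * 1e3, 1))  # tUp range in ms
```

With these assumed limits tUr spans roughly 31 to 78 ms and tUp roughly 144 to 360 ms. Measuring the oscillators on a given chip would narrow both ranges.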

Wossname · 2017-06-29 15:56

jmg wrote: »

a) yes, ee_wait is in booter source, but that applies only to WRITE cycles. EE read is entirely a predictable number of system RCFAST clock cycles

It's used in reads too (ee_read calls ee_write which calls ee_wait). The wait loop is there to detect the presence or absence of the I2C ACK signal which is always part of I2C transactions.

jmg wrote: »

c) Oscillator frequencies are very material, if you are interested in time.

At the moment I'm only concerned with clock cycles. My project doesn't have any frequencies in it. I'm playing with simulation of a Propeller on a PC, not actual physical Propeller hardware. In general though you're right, actual frequencies are of utmost importance in a design, but specifically (for this thread), they don't have any effect on the question.

jmg · 2017-06-29 20:15

Wossname wrote: »

At the moment I'm only concerned with clock cycles. My project doesn't have any frequencies in it. I'm playing with simulation of a Propeller on a PC, not actual physical Propeller hardware. In general though you're right, actual frequencies are of utmost importance in a design, but specifically (for this thread), they don't have any effect on the question.

Did you see the mention of Verilog derived simulation in another thread ?

Approach is to have Verilog -> exe, which then simulates the core, using the exact verilog used to create it ?
