Prop 1 - system clock cycles elapsed before user's code starts running
Wossname
Posts: 174
At the exact moment where Cog 0 begins to run the interpreted Spin code, is it possible to make any particular statements about the current state of the System and Hub clocks?
I think we can, but I'm not 100% certain, could someone sanity check this for me...
The last thing the "booter" does is to execute:
coginit interpreter 'reboot cog with interpreter
Since coginit is a Hub instruction (and Cog 0 is the only one running), Cog 0 waits for the Hub to swing around to allow access. Then the Interpreter is loaded into Cog 0 (which is a clock-deterministic operation, as it is a fixed length of memory to unscramble and copy). A question occurs: how many system clocks does it take to unscramble and copy 512 longs from Hub to Cog? It must be less than 8, right? (There's some dark magic going on there, I think.)
At the last System clock after the Interpreter is loaded AND before the next clock cycle when the Interpreter (in Cog 0) begins to run the user's code, is the state of the Hub clock guaranteed to be knowable by deduction? I think the answer is yes and I think the System clock will be advanced by a multiple of 8 clock cycles since the CogInit was invoked. I think this unscramble and copy must take a while because of the complexity of the task, but is transparent because all this is happening invisibly inside the chip, before our code runs.
The booter's earlier activities, serial port host detection and eeprom handling are both non-deterministic because they involve timing measurements and timeouts of indeterminate length. I think these are resolved by the ultimately final CogInit.
To put it another way -- the Propeller becomes cycle-predictable only at the moment the Cog 0 Interpreter begins execution at its default entry point.
Is my thinking correct?
Comments
See below - from reset release, there will be some fixed and known number of X * RCSLOW + Y * RCFAST SysCLKs, plus some startup times
( Addit: X ~ 1000 from data, portion of Y that is {LoaderROM -> COG} ~ 640u/(1/12M) = 7680 SysCLKs )
Of course, those RCSLOW/RCFAST clocks have variances, but the actual cycle counts should be defined.
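To illustrate the point that the cycle count is fixed while the wall-clock time is not, here is a rough Python sketch. The 12 MHz nominal figure is the one used above; the 8 MHz and 20 MHz limits are assumed datasheet-style bounds for RCFAST, not measurements, and the function names are made up for this example.

```python
# Assumed RCFAST figures in Hz: typ is the 12M used in the thread,
# min/max are assumed limits and should be checked against the datasheet.
RCFAST_HZ = {"min": 8e6, "typ": 12e6, "max": 20e6}

def cycles_for(duration_s, freq_hz):
    """Number of clock cycles that fit in duration_s at freq_hz."""
    return round(duration_s * freq_hz)

def duration_range(n_cycles, freq):
    """(shortest, longest) wall-clock time for a fixed cycle count."""
    return n_cycles / freq["max"], n_cycles / freq["min"]

# LoaderROM -> COG portion: ~640 us at the nominal 12 MHz.
loader_cycles = cycles_for(640e-6, RCFAST_HZ["typ"])
fastest_s, slowest_s = duration_range(loader_cycles, RCFAST_HZ)

print(loader_cycles)         # fixed cycle count
print(fastest_s, slowest_s)  # time spread caused only by oscillator variance
```

The same cycle count therefore spans a fairly wide time window, which is why the counts and the frequencies are worth keeping separate.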
Once BOD is released and Oscillators start, all decisions will be made based on RCSLOW/RCFAST.
In the case of EEPROM boot, the timeout window for Serial RXD is also defined
In the case of serial boot, there will be a time from reset release to possible first valid Char, and then a time from last char stop bit, to user code.
& yes, it would be good if those values were given in the data sheet.
Also of interest would be fastest possible UART Serial boot, which needs some means to ping the Loader (to find tRr), and some upper BAUD speed.
There have been posts around upper Baud speed.
Average effective SCL from the 1.3s EE load window, looks to be ~ 227 kHz
tUr is ~RCSLOW(50ms)+RCFAST(640us+40us)
tUp is ~RCFAST(240ms)
tUd is ~tUr + tUp
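Taking those three figures at face value, the sums can be sketched as below. This assumes the notation RCSLOW(50ms) means "an interval timed by RCSLOW that lasts about 50 ms at nominal frequency", and it assumes nominal frequencies of 20 kHz (RCSLOW) and 12 MHz (RCFAST); both are assumptions for illustration.

```python
# Assumed nominal oscillator frequencies in Hz.
RCSLOW_HZ = 20_000
RCFAST_HZ = 12_000_000

# Durations as quoted in the post, in seconds.
t_ur = 50e-3 + 640e-6 + 40e-6   # RCSLOW(50ms) + RCFAST(640us + 40us)
t_up = 240e-3                   # RCFAST(240ms)
t_ud = t_ur + t_up

# The same intervals expressed as clock counts at the assumed nominal rates.
rcslow_clocks = round(50e-3 * RCSLOW_HZ)
rcfast_clocks = round((640e-6 + 40e-6 + 240e-3) * RCFAST_HZ)

print(f"tUr ~ {t_ur * 1e3:.2f} ms, tUp ~ {t_up * 1e3:.0f} ms, tUd ~ {t_ud * 1e3:.2f} ms")
print(rcslow_clocks, rcfast_clocks)
```

The RCSLOW count comes out at the X ~ 1000 mentioned earlier, which is a useful cross-check on the assumed 20 kHz.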
With small MCU prices dropping, there are more choices around local and PC boot - an 18kB MCU is now similar in price to a 24C256, and that MCU can be an i2c or UART loader.
Booter Source Code is here
PASM is different. By using conditional execution, you can have deterministic conditional code paths. Have a look at some of the video drivers for examples. The I2C driver for FemtoBasic also uses this.
http://forums.parallax.com/discussion/132417/faster-eeprom-booting
It mentions that tUd above can be skipped by RXD=L. The various time delays are also visible in the current profiles of P8X32A-Propeller-Datasheet-v1.4.0_0.pdf, section 8.6, "Current Profile at Various Startup Conditions".
It's been a while since I looked at the booter code. As Mike said, after the 496 cog registers are copied, the last 16 special registers are cleared. Then the Interpreter is run from cog $000.
IIRC, the oscillator is not changed until the Interpreter runs. So up to this point, everything is executing at the internal RC Oscillator speed. This oscillator has a wide frequency range, which makes the time to start quite difficult to calculate.
I then tried power-on resets. I used a QuickStart card, so power on/off was done by unplugging the USB connector, and then reconnecting it. The value of temp1 ranged from $8E460286 to $92521B9E. This implies that the system counter doesn't start at zero on power up. This value would be consistent with the counter starting at $80000000, and then taking a little over 3 seconds to boot up at 12 MHz.
So I think the conclusion is that the value of the system counter on start up is not predictable. In fact, it might make a good starting point for a random number generator.
I get these indicated times from 0x80000000, so it seems the counter does not always start at the same value?
(0x8E460286-0x80000000)*(1/12M) = 19.95576583s
(0x92521B9E-0x80000000)*(1/12M) = 25.6142425s
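For anyone wanting to repeat that arithmetic, a small Python sketch follows. The helper name and its defaults are made up for this example, and the 0x80000000 start value is only the guess being tested, not a known property of the chip.

```python
def cnt_to_seconds(cnt, start=0x8000_0000, clk_hz=12_000_000):
    """Elapsed time implied by a 32-bit system counter reading.

    start is a hypothesised initial counter value; clk_hz is the assumed
    nominal RCFAST rate. The mask handles 32-bit wrap-around.
    """
    return ((cnt - start) & 0xFFFF_FFFF) / clk_hz

for reading in (0x8E460286, 0x92521B9E):
    print(f"{reading:#010x}: {cnt_to_seconds(reading):.8f} s")
```

Both results are far longer than any plausible boot time, which is the argument against a fixed 0x80000000 starting value.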
The time between any two readings of the system counter is as accurate as the system clock itself. When running off a crystal or external clock source, that can be very accurate. When running off one of the RC clocks, it's dependent on temperature and operating voltage. When switching clock sources, there's a brief period where the clock may be unstable.
If you're interested, you can use a cog counter's PLL to generate true random numbers (dependent on a random physical process). Check out this in the Propeller Object Exchange.
@jmg, yes my calculation was in error. I was just guessing that the counter might start up at $80000000, but it appears to be starting up at different values.
The EEPROM does have a wait cycle in it too (see "ee_wait" in the booter source).
Thank you everyone for your replies, I believe my questions have been answered.
In a nutshell, anything that happens before user-code begins to be interpreted by Cog 0 will take an effectively unpredictable number of system clock cycles (note: the actual oscillator frequencies involved are immaterial). This variation is due to 2 external factors: the responsiveness of the serial port on the host PC (if there is one) and/or the responsiveness of the EEPROM chip on the I2C bus.
When user-code begins to execute, the System and Hub clock states are known. I think this is true because initially the Interpreter in Cog 0 is the only cog running and has just finished loading the interpreter, which is a Hub operation (or at least a long sequence of them). If the last thing that happens is the last of the Hub-to-Cog memory transfers, then it stands to reason that the very next Hub clock edge will give Hub access to Cog 1. Cog 1 won't yet be running though, of course, since Cog 0 hasn't had a chance to execute the first Spin instruction yet.
(I think. I will attempt to verify this on a real propeller momentarily, feel free to try it for yourself )
A logic analyser attached to pins P0 and P7 (Cog 0 and Cog 7 respectively) sampling at 16MHz...
Consider this test code...
The SPIN code tells Cog 7 to run some PASM that just outputs half the system clock (may be out of phase by 180 degrees but that doesn't matter, either rising or falling edge is synchronous with a FULL CYCLE of the absolute system clock).
Then it tells Cog 0 to run, set P0 as output, set it HIGH and then reset the Propeller ASAP.
RCSLOW is the chosen clock speed because my crappy logic analyser can't handle RCFAST without Mr Nyquist getting all smug at me.
I'm at the limits of my poor test gear (my 8 channel Saleae tops out at 24MHz), but even sampling RCSLOW at 16MHz it does seem that this sequence of user-code is deterministic in this simple test. However this is NOT remotely scientific.
Every sequence I look at in my logic analyser software shows extremely consistent behaviour. Not good enough though, correlation != causation as they say.
What can I draw from this? Well the sample rate is MUCH higher than the RCSLOW rate, so that's a good thing. If I can see that Cog7's CTRA outputs the same number of clock cycles each time (without any phase drift) before the Prop resets, then I think this means that Cog 7 and Cog 0 are lock-stepped.
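The comparison described above could be automated along these lines. This is only a sketch: it assumes each capture has been exported as two equal-length lists of 0/1 samples (one for P7, one for P0), which is a hypothetical format rather than anything the analyser software is known to produce, and the function names are invented for the example.

```python
def edges_before_marker(p7, p0):
    """Count rising edges on p7 that occur before p0 first goes high."""
    stop = p0.index(1) if 1 in p0 else len(p0)
    return sum(1 for i in range(1, stop) if p7[i - 1] == 0 and p7[i] == 1)

def looks_lock_stepped(captures):
    """True if every (p7, p0) capture shows the same edge count.

    Equal counts are consistent with lock-step but do not prove it.
    """
    counts = {edges_before_marker(p7, p0) for p7, p0 in captures}
    return len(counts) == 1

# Tiny made-up captures, purely to show the call shape.
run_a = ([0, 1, 0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0, 1, 1])
run_b = ([0, 1, 0, 1, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 1, 1])
print(edges_before_marker(*run_a), looks_lock_stepped([run_a, run_b]))
```

Counting whole edges only catches differences of a full CTRA cycle; phase drift smaller than one sample period would need a faster analyser.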
Does anyone have some grown-up test gear? A multi-channel logic analyser with 100+MHz sample rate would be a good tool to have, because we can go up to RCFAST and this means we can avoid switching clock sources and thus don't have to worry about oscillator settling (in the interests of removing all sources of variation).
That's not the wording I would use.
Booter Source is here
Points:
a) yes, ee_wait is in booter source, but that applies only to WRITE cycles. EE read is entirely a predictable number of system RCFAST clock cycles
b) The RST exit delay is also a very predictable number of system RCSLOW clock cycles
c) Oscillator frequencies are very material, if you are interested in time. Those frequencies are actually the main source of variance.
Oscillator frequencies can be measured. (see below)
I'm not sure what you are trying to say here ?
The COG executes using RCFAST, and all COGS use the same RCFAST.
coginit launches the COG(s), using RCFAST, and that will be deterministic every time.
I'm less sure about what you call 'Hub Clock state', but the hub slots do vary by COG, and any HUB access will re-align the opcode fetch to that hub slot.
That means two COGINITs can start in sync, but you do need care with HUB ops to resync again, if that is critical.
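A toy model of that re-alignment, in Python, might look like the following. It assumes the usual Prop 1 round-robin (8 cogs, one hub window per cog every 16 system clocks, so windows are 2 clocks apart) and, arbitrarily, that Cog 0's window opens at system clock 0. The phase choice and the function name are assumptions for illustration only.

```python
NUM_COGS = 8
CLOCKS_PER_SLOT = 2                       # hub runs at half the system clock
HUB_PERIOD = NUM_COGS * CLOCKS_PER_SLOT   # 16 system clocks per full rotation

def clocks_until_hub_window(sys_clk, cog_id):
    """System clocks a cog waits until its next hub window opens (0 = now)."""
    return (cog_id * CLOCKS_PER_SLOT - sys_clk) % HUB_PERIOD

# Two cogs that reach a hub instruction on the same clock wait different times,
# so they leave the hub access offset from each other by their slot spacing.
for cog in (0, 1, 7):
    print(cog, clocks_until_hub_window(5, cog))
```

In this model, two cogs started in sync stay in sync only until one of them touches the hub; after that their relative phase is set by their slot numbers.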
From the Booter source, you should also be able to measure the RCSLOW & RCFAST times.
Reset the Prop with RXD=0, and you skip UART timeout, and go directly to EE_Read
SCL clock will start ~RCSLOW(50ms)+RCFAST(640us+40us)
Width of this read-burst of SCL (294948 SCL clks?) shows the time to read 32k, for t ~ RCFAST(32kRead)
Loading and running booter code using a crystal, or simulator, should allow an exact OpcCLK count for this to be measured.
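As a cross-check on the numbers quoted in this thread, the SCL figures can be tied together with a few lines of Python. The 1.3 s window and the 294948 SCL clock count are the thread's own (approximate) values; reading the count as 9 SCL clocks per byte is an assumption about how it was derived.

```python
SCL_CLOCKS = 294_948    # SCL clocks in the read burst, as quoted (with a "?")
LOAD_WINDOW_S = 1.3     # approximate EE load window, as quoted

def effective_scl_hz(scl_clocks, window_s):
    """Average SCL rate over the whole read burst."""
    return scl_clocks / window_s

scl_hz = effective_scl_hz(SCL_CLOCKS, LOAD_WINDOW_S)

# Assumption: 8 data bits + 1 ACK bit per byte on the bus.
bytes_on_bus = SCL_CLOCKS / 9

print(f"{scl_hz / 1e3:.1f} kHz average SCL, {bytes_on_bus:.0f} bytes on the bus")
```

Under the 9-clocks-per-byte assumption the count works out to 32768 data bytes plus 4 more, which would fit a 32k read with a short addressing header. Measuring the burst width on real hardware and dividing the same way would then give that chip's actual RCFAST-derived SCL rate.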
Yup, but knowing the clock cycles is still useful, as it gives a precise leverage point.
You can then apply that to the Freq limits, to get proper design limits.
Or, you can measure the Frequencies, and improve those limits, and so speed up download speeds.
It's used in reads too (ee_read calls ee_write which calls ee_wait). The wait loop is there to detect the presence or absence of the I2C ACK signal which is always part of I2C transactions.
At the moment I'm only concerned with clock cycles. My project doesn't have any frequencies in it. I'm playing with simulation of a Propeller on a PC, not actual physical Propeller hardware. In general though you're right, actual frequencies are of utmost importance in a design, but specifically (for this thread), they don't have any effect on the question.
Did you see the mention of Verilog derived simulation in another thread ?
Approach is to have Verilog -> exe, which then simulates the core, using the exact verilog used to create it ?