Improving pages 7 and 9 of the data sheet, especially about Hub access

dpeschel · 2008-11-16 05:50

Overall, the data sheet and manual are quite well-written. I've been having fun "watching" the instruction overlap happening, by putting several copies of figure 4 on page 9 into a spreadsheet.

A minor suggestion: The right-hand column of page 9 uses the phrase "next instruction to be executed", then talks about instruction N modifying instruction N+1, then talks about prediction. The key phrase here is "to be executed", as opposed to "in address order". After closely reading the paragraph about prediction (or for anyone who understands modern CPU design) it becomes clear. But what about expanding on "to be executed" where it's used? Or having "At this point in time (Stage 4)", then a paragraph about prediction, then "Finally at clock cycle M+5", then a paragraph about self-modifying code? The point about self-modifying code is important, but its current placement may reinforce the simple and wrong meaning of "next".

Using N-1 and N+1 in figures 4 and 5 is an oversimplification too—because of possible jumps to and from N—but I don't have an answer. Jump and non-jump versions of both, maybe. Or a few versions of figure 5 showing a shaded "next" location and one version of figure 4.

I'm much more confused about Hub access timing. I understand the cog clock, hub clock, and access window. The 7- and 9-cycle lengths are the confusing part. Figures 2 and 3, figure 4, and the text don't have enough details in common. A variation of figure 4 showing a Hub instruction and the 7-cycle length, or four copies of figure 4 showing Hub in sync, non-Hub, non-Hub, Hub would give all the details nicely. Relating figure 4 to rising and falling clock edges would be a bonus.

However, I do have a guess. I put the Hub in sync, non-Hub, non-Hub, Hub sequence in my spreadsheet. Then I tried stretching both Hub instructions' stage 4 to whatever length would make the second Hub instruction execute 16 cycles (= 8 Hub cycles = 1 access "rotation") after the first. I assumed the fetch-opcode stage stretches too. A length of five cycles works perfectly. That gives (in a monospaced font):

        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  
instr1  ex  res
hub2    op      src dst ex  ex  ex  ex  ex  res                                                                                                                                                                                                                     
instr3                  op  op  op  op  op      src dst ex  res                                                                                                                                                                                                                 
instr4                                                  op      src dst ex  res                                                                                                                                                                         
hub5                                                                    op      src dst ex  ex  ex  ex  ex  res                 
instr6                                                                                  op  op  op  op  op      src dst ex  res

hub5's "ex" cycles start at cycle 20, 16 cog cycles after hub2's at 4. The 7 cycles of execution time are accounted for if I include the fetch-source and fetch-destination cycles. So in that sense hub2 executes on cycles 2-8 and hub5 on cycles 18-24. And the delay (cycles 9-17) between those two execution periods is 9 cycles.
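
The counting in that paragraph can be checked mechanically. The sketch below copies the cycle numbers for the hub2 and hub5 rows from the table above; the dictionary layout and the helper name are made up for illustration, and it only checks the arithmetic, not whether the guess matches the real chip.

```python
# Cycles occupied by each stage, copied from the hub2 and hub5 rows of the table.
hub2 = {"op": [0], "src": [2], "dst": [3], "ex": [4, 5, 6, 7, 8], "res": [9]}
hub5 = {"op": [16], "src": [18], "dst": [19], "ex": [20, 21, 22, 23, 24], "res": [25]}


def execution_span(row):
    """First and last cycle, counting from fetch-source through the last 'ex'."""
    return row["src"][0], row["ex"][-1]


first2, last2 = execution_span(hub2)
first5, last5 = execution_span(hub5)

ex_offset = hub5["ex"][0] - hub2["ex"][0]   # distance between the two "ex" starts
len2 = last2 - first2 + 1                   # length of hub2's execution period
len5 = last5 - first5 + 1                   # length of hub5's execution period
gap = first5 - last2 - 1                    # cycles strictly between the two periods

print("ex offset:", ex_offset)
print("hub2 cycles", first2, "to", last2, "length", len2)
print("hub5 cycles", first5, "to", last5, "length", len5)
print("gap:", gap)
```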

dpeschel · 2008-12-01 19:27

Aargh! I was so happy about my explanation of the 7-cycle execution time that I stopped thinking. Eventually I realized I needed to explain
the 4-cycle time of ordinary instructions too. You can't stretch a one-cycle phase into a five-cycle phase (adding four cycles) and also stretch
a 4-cycle instruction into a 7-cycle instruction (adding three cycles). But the summary chart in the data sheet has "4" and "7...22" as the
execution times, and presumably all times are measured in a consistent way.

Back to experimenting with my spreadsheet... But if someone knowledgeable replied, we could settle this issue.

Jeff Martin · 2008-12-08 22:42

Hi dpeschel,

I've been trying to figure out how to respond to your post. Just so you know, I'll take your suggestions seriously and try to clear up the explanations in the datasheet.

I can truly understand why you're confused, but the solution for clearing it up isn't totally "clear" to me. As it turns out, we hesitated many times to put those details in the manual/datasheet for the very reason that we knew it would cause undue confusion in some cases. It's kind of like telling a "driving school" student (sitting at a stoplight that just turned green) that he needs to:

pump so many milliliters of fuel into the engine's cylinders, which will get mixed with a specific amount of outside air on the way in (as determined by the car's engine computer) so it can be compressed in the chamber to a critical density in order to ignite, via a carefully timed electric spark, and cause a small contained explosion resulting in expansion pressure that will actuate a mechanical linkage to the drive train and push the car forward a predictable distance.

when, instead, he should be told simply to:

press on the throttle gently to make the car go.

In truth, all the seemingly complex details about the inner workings of the Propeller's instruction execution can be boiled down into very simple terms that should be the focus of most every user:

Every instruction takes 4 clock cycles, except the "wait" instructions (5+) and hub instructions (7..22).
- Wait instructions take 5 "plus" the number of extra cycles it takes for the condition to satisfy their execution.
- Hub instructions take a minimum of 7 cycles, and a maximum of 22, depending on when the instruction is processed in relation to the cog's hub-access window.
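
The three rules above can be written out in a few lines of Python as a rough model. This is only a sketch of the rules as stated in this post: the function name, the kind labels, and the `extra` and `phase` parameters are invented for illustration, and the 16-clock spacing between one cog's hub-access windows is an assumption taken from earlier in the thread.

```python
HUB_PERIOD = 16  # assumed: clocks between one cog's hub-access windows
HUB_MIN = 7      # fastest hub instruction, per the rules above


def instruction_cycles(kind, extra=0, phase=0):
    """Return the clock cycles one instruction takes under the rules above.

    kind  -- "normal", "wait", or "hub" (labels invented for this sketch)
    extra -- for "wait": extra clocks until the condition is satisfied
    phase -- for "hub": clocks since the cog's window opened when the
             instruction starts (0 means exactly in sync)
    """
    if kind == "normal":
        return 4
    if kind == "wait":
        return 5 + extra
    if kind == "hub":
        # Wait for the next window (0 clocks if in sync), then 7 to execute.
        return HUB_MIN + (-phase) % HUB_PERIOD
    raise ValueError("unknown instruction kind: " + kind)


for phase in (0, 1, 8, 15):
    print(phase, instruction_cycles("hub", phase=phase))
```

In this model a hub instruction that starts exactly in sync takes 7 clocks, and one that starts a single clock late takes 22, which matches the 7..22 range quoted above.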

The rest are details that, in practice, are totally irrelevant to the Propeller user in all but the most extreme cases (like what happens when the very first and the very last instructions are executed).

My point is, we were trying to provide the detail for those who requested it, but please understand that we did not mean to confuse you in the process, and we apologize for that. Keeping the three rules above in mind will help you succeed with the Propeller more quickly.

You're right, the N-1, N, and N+1 notation is a bit of an over-simplification, but it was done with the idea that readers will understand we mean the previous, current, and next instruction to be processed. To that end, we'll revise the text to specify that better next time around.

Yes, perhaps expansion of figure 4 to include rising/falling clock edges, and showing a version in-sync and out-of-sync with the hub would be of benefit. We'll consider this more while trying to simplify the whole explanation better.

I feel that Figures 2 and 3 show it best (regarding hub instructions and the access window). In figure 2, a hub instruction (labeled "HI") is begun in Cog 0 on falling clock edge 0, which just happens to be the start of Cog 0's hub-access window. That instruction takes 7 clock cycles (because that's the fastest a hub instruction can be fully processed) and afterwards there are 9 clock cycles left over before the next hub-access window (i.e., the cog can execute two other non-hub instructions before the next hub-access window, so that it isn't waiting around but still remains synced to the access window should it wish to execute another hub instruction).

Figure 3 shows that same hub instruction just barely missed the cog's hub-access window and now has to wait 15 clock cycles for the next hub-access window, then it can execute (in its 7 required cycles) for a total instruction execution time of 22 cycles.
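
As a rough check of the two figures as described here, the sketch below steps through a list of instructions and records when each one starts and how long it takes. It is a simplified model, not a statement of how the silicon works: it assumes a hub-access window every 16 clocks with one opening at clock 0, counts every non-hub instruction as 4 clocks, and ignores pipeline overlap. The `run` helper and the "hub"/"op" labels are invented for this example.

```python
HUB_PERIOD = 16  # assumed: clocks between one cog's hub-access windows
HUB_MIN = 7      # fastest hub instruction
OP_CYCLES = 4    # ordinary (non-hub, non-wait) instruction


def run(sequence, start=0):
    """Return (kind, start_clock, cycles) for each instruction in order."""
    clock = start
    trace = []
    for kind in sequence:
        if kind == "hub":
            # Wait until the next window opens, then take the 7-clock minimum.
            cycles = HUB_MIN + (-clock) % HUB_PERIOD
        else:
            cycles = OP_CYCLES
        trace.append((kind, clock, cycles))
        clock += cycles
    return trace


# Figure 2 as described: the hub instruction starts in sync with the window.
in_sync = run(["hub", "op", "op", "hub"])
# Figure 3 as described: the hub instruction starts one clock too late.
just_missed = run(["hub"], start=1)

for row in in_sync + just_missed:
    print(row)
```

In this model the two non-hub instructions use 8 of the 9 left-over clocks, so the second hub instruction waits one clock for its window and the hub, non-hub, non-hub pattern repeats every 16 clocks.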

Does that clear up the confusion? I really want to make sure you understand this part, and whatever I glean from the confusion you had will help me explain it more clearly to others.

Thank you very much for the suggestions.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Jeff Martin

Sr. Software Engineer
Parallax, Inc.

Andrew E Mileski · 2008-12-09 00:27

If it didn't matter, people wouldn't keep asking questions about timing.

There is some info on the forum, but it is scattered and incomplete.

My favorite post is probably one from Parallax that presented a theory on how something probably works, though even they were unsure :-o Admittedly, that same post had a wonderful stage-by-stage treatment of the timing issue, which one would hope would be applied to other questions.

All these pearls need to be collected, and should appear in the datasheet.

Basically, there seems to be a need for more about:
- hub timing (like mentioning that the pipeline stalls and the execution stage is extended).
- wait* timing (like mentioning that the pipeline stalls, when the pins are sampled, and details about jitter possibly extending the cycles).
- counter to pin timing (writing PHSx, or does an accumulate happen on the same cycle before or after a write to a running counter)
- explaining something as mundane as "mov dest, cnt" and just what cycle the count gets captured (not as mysterious as the above, but should appear in the same section).
- etc., etc.

n45w73 · 2008-12-10 07:37

Hi :-)

As a noob, I also have some questions about hub access.

If I understand correctly, the hub is a synchronous bus shared by all cogs.
I speculate that hub access is done as in the following picture...

The lines are hub instructions, and each cog has its window where its instruction does
something to the main memory.

My first question is:

What is the order of access? Is it as I drew in the picture:
C0, C1, C2, C3, C4, C5, C6, C7 ... C0, C1, etc.?

Why do I ask? Well, I'll try to synchronise cogs to do some calculations in sync,
with cog n passing its result to cog n+1.

If this is the correct cog access pattern, I will not have a sync delay between cogs.

What do you think? Does anyone have an answer?

Thanks

Andrew E Mileski · 2008-12-10 16:24

n45w73 said...
My first question is:

What is the order of access? Is it as I drew in the picture:
C0, C1, C2, C3, C4, C5, C6, C7 ... C0, C1, etc.?

The datasheet states in section 4.4:

"The Hub controls access to mutually-exclusive
resources by giving each cog a turn in a “round robin”
fashion from Cog 0 through Cog 7 and back to Cog 0
again."

Post Edited (Andrew E Mileski) : 12/10/2008 4:32:33 PM GMT
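
A small sketch of that round-robin order, for illustration. The quoted section gives only the order; the 2-clock spacing between neighbouring cogs' turns is an assumption derived from the figure of 16 cog clocks per full rotation mentioned earlier in the thread, clock 0 is assumed to be the start of Cog 0's turn, and the function names are hypothetical.

```python
NUM_COGS = 8
CLOCKS_PER_TURN = 2  # assumed: 16 cog clocks per rotation / 8 cogs


def cog_with_turn(clock):
    """Cog whose turn covers the given clock (Cog 0's turn starts at clock 0)."""
    return (clock // CLOCKS_PER_TURN) % NUM_COGS


def clocks_between_turns(from_cog, to_cog):
    """Clocks from the start of from_cog's turn to the nearest start of
    to_cog's turn at or after it (0 when both are the same cog)."""
    return ((to_cog - from_cog) % NUM_COGS) * CLOCKS_PER_TURN


order = [cog_with_turn(c) for c in range(0, 18, 2)]
print(order)
print(clocks_between_turns(0, 1), clocks_between_turns(7, 0), clocks_between_turns(1, 0))
```

Under these assumptions the turns of cog n and cog n+1 are adjacent, while going from cog n+1 back to cog n means waiting for most of a rotation. The sketch says nothing about how long the hub instructions themselves take.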

n45w73 · 2008-12-10 23:58

Oops, I just missed that...
I thought the access order wasn't specified...

I'll go read the manual now!

godzich · 2008-12-11 10:47

Hi,

I fully agree with Andrew. Since the Propeller IS a device where you more or less program your own "hardware" (in other
words, build PASM variants of hardware that normally exists in other processors, like UARTs, CAN interfaces, etc.),
you must be provided (in the data sheet) with precise timing information. Right now many essentials are totally missing!!!

As an example I can mention the discussion in some threads about the precise time when a pin is sampled when
executing waitpx instructions. It cannot be right (and it is certainly not very professional) to have to hunt for such basic
information in the forums. Such info is well documented in all the other vendors' microprocessor data sheets I have read, and there
are many of them. Adding such important data to the data sheet would make the support people's lives a lot easier.

As an analogy: I buy a new TV with a remote control that has lots of nice functions and control possibilities.
The seller gives me a manual that only tells me the basics: how to turn the set on and off, select channels, and control
the volume. The user manual is otherwise blank, and the other remote buttons lack symbols, but something happens
if I press them. I have bought a piece of sophisticated equipment that I cannot use to its full potential. I have exactly
this feeling with the Propeller.

The Propeller data sheet has many such shortcomings, in addition to the ones mentioned here. The video section, too,
is more like a complete black box, at least when it comes to the precise timing and the finer details. Kiss goodbye to using
it for other purposes. My time is too precious to have to find things out just by experimenting...

It's as if the Propeller guys underestimate us, as if we would go wicked or insane if we got too much technical data. Or
is it that Parallax wants to keep the lid on these things, to make it as difficult as possible for people to copy
the design or use some of the intellectual property?

PLEASE, PLEASE add the missing timing diagrams and some better block diagrams to the next revision of the data sheet,
to bring it to the level it deserves. Going from experimental to professional is only a small step away...

Sorry if I sound a little harsh. It is just to wake you up from your "sleeping beauty" dreams... :) since the Propeller is a little
like a sleeping beauty... This criticism is VERY constructive, at least from my perspective ;)

Sincerely

Christian

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The future does not exist - we must invent it!

Post Edited (godzich) : 12/11/2008 10:56:14 AM GMT

dpeschel · 2008-12-28 00:24

Jeff, thanks for spending the time to read my comments.

Andrew and godzich, thanks for making the same basic point I'm trying to make. I'd put it this way:
A data sheet is supposed to deal with extreme cases. It's supposed to completely describe everything
programmers might need to know that the manufacturer does not consider proprietary. And personally,
I'd rather see a description of underlying principles than a list of individual cases.

If people are asking questions about Prop timing, then they obviously do need to know about it. The Prop
data sheet has some answers and some principles, but not enough. It lists the case of a Hub, then a non-Hub,
then a Hub instruction. It describes the principle of stages of execution. But the principle isn't detailed enough
(to me) to explain the case, or different parts of the data sheet are using different terminology, or something
is confusing me. That's what I meant by "don't have enough details in common". I was hoping to see the Hub/
non-Hub/Hub case correctly described in cycles in this thread, but that hasn't happened either.

I'm focusing on this one case because it's already mentioned in the data sheet and because I'm curious. Andrew
and godzich mention some other areas they feel need better coverage.

My other suggestions are about little improvements rather than big issues. I'm glad you looked at them but I'm
afraid they drowned out the point about the big issues.

Cluso99 · 2008-12-28 00:49

Here is some timing code I did 6 months or so ago. It uses the spin font which has timing diagrams, so see the attached file for full representation.

'Logged by DataLogger 2008-05-23. Instruction timing.
      IdSDE...ER                      waitcnt   :tdelay,0        'synchronised with the DataLogger Cogs
                IdSDER                or        outa, :pinout    'make pin 7 =1  (to show we are synchronised!)
                    IdSDER            waitpeq   :pinout,:pinout  'wait for pin 7 =1 (we just set it this way!)                                          
                          IdSDER      xor       outa, :pinout    'toggle pin 7 =0
                              IdSDE......................ER                             waitpeq   :pin10,:pin10  'wait for pin 10 =1 (wait here for pin)                                                
                                                           IdSDER                       xor       outa, :pinout  'toggle pin 7 =1
                                                               IdSDER                   nop                     
                                                                   IdSDER         :loop xor       outa, :pinout  'toggle pin 3 =0
                                                                       IdSDER           jmp       #:loop        
                                                                           IdSDER :loop xor       outa, :pinout  'toggle pin 3 =1

 
 
      IdSDE...ER                      waitcnt   :tdelay,0        'synchronised with the DataLogger Cogs
                IdSDER          :loop mov       outa, :counter   'make pin8 =1
                    IdSDER            add       :counter, #$100  'inc counter (counts from pin8...)
                        IdSDER        jmp       #:loop
                            IdSDER          :loop mov       outa, :counter   'make pin8 =1
                                IdSDER            add       :counter, #$100  'inc counter (counts from pin8...)
                                    IdSDER        jmp       #:loop
                                        IdSDER          :loop mov       outa, :counter   'make pin8 =1
                                            IdSDER            add       :counter, #$100  'inc counter (counts from pin8...)
                                                IdSDER        jmp       #:loop
                                                    IdSDER          :loop mov       outa, :counter   'make pin8 =1
                                                        IdSDER            add       :counter, #$100  'inc counter (counts from pin8...)
                                                            IdSDER        jmp       #:loop

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Prop Tools under Development or Completed (Index)
http://forums.parallax.com/showthread.php?p=753439

dpeschel · 2008-12-29 22:08

Cluso99, what are you using to time the code and generate the diagrams? Can you time any piece of code?

Cluso99 · 2008-12-30 03:22

I wrote a datalogger program which records all 32 pins using 4 interleaved cogs.

Using the code I published (link below) you can time any code. You can also use my Pasm/Spin Debugger which counts the instructions executed.

Data Logger Object (samples pins @ 50nS or 12.5nS) - like Dscope
http://forums.parallax.com/showthread.php?p=726950

Andrew E Mileski · 2008-12-30 09:59

I started collecting links to a number of forum messages dealing with timing.
