Cog Launching — Parallax Forums

Cog Launching

Humanoido Posts: 5,770
edited 2010-08-28 10:58 in Propeller 1
How to load this one program into all 8 cogs?

Comments

  • kuroneko Posts: 3,623
    edited 2010-08-26 23:45
  • Humanoido Posts: 5,770
    edited 2010-08-27 00:09
    I didn't forget. It's the function syntax that has me stuck. How do I turn the entire posted program into a function? I'm interested in running the whole program in each cog, not just part of it.

    cognew(fn, @stack0)
    cognew(fn, @stack1)
    cognew(fn, @stack2)
    cognew(fn, @stack3)
    cognew(fn, @stack4)
    cognew(fn, @stack5)
    cognew(fn, @stack6)
    coginit(cogid, fn, @stack7)
  • kuroneko Posts: 3,623
    edited 2010-08-27 00:15
    What exactly do you need? The example program will exit after having flashed the LEDs 4 times each. Not exactly worth throwing cogs at :) Or does that not matter?
  • kuroneko Posts: 3,623
    edited 2010-08-27 00:34
    Try this (remove the stack space in the "example" object file):
    CON
      length = 20
      
    OBJ
      client[8]: "example"
    
    VAR
      long  stack[8 * length]
      
    PUB null : ID | start
    
      start := cogid - 7
      repeat 8
        ID := start++ & 7
        coginit(ID, launch(ID), @stack[ID * length])
    
    PUB secondary : ID         ' NOTE: only equivalent to method null when
                               '       cogid is the only running cog
      repeat 7
        cognew(launch(ID), @stack[ID * length])
        ID++
    
      coginit(cogid, launch(ID), @stack[ID * length])
      ' shortcut: call method in current context
      ' launch(ID)
    
    PRI launch(ID)
    
      client[ID].start
    
    ' example.spin
    
    VAR
      ' long  stack[20]   (removed: stack space now lives in the top object)
    
    PUB start
    
      dira[cogid] := outa[cogid] := 1
      waitpne(0, 0, 0)
    
  • Humanoido Posts: 5,770
    edited 2010-08-27 01:05
    Thanks, I'll give it a test run.
  • Humanoido Posts: 5,770
    edited 2010-08-27 02:43
    I'm still not sure about the syntax, and the code is missing something in the top object. What I want is for the top object to launch the example object in each cog, with each cog blinking one LED on the pin equal to its cog number. All cogs except one are switched off by commenting out (rem) their lines. That way I can work through the remaining cogs step by step, testing each one to verify it's working.
  • kuroneko Posts: 3,623
    edited 2010-08-27 02:50
    Humanoido wrote: »
    I'm still not sure about syntax and the code is missing something in the top object.

    Your stack space is just one long and you use the same (invalid) stack for all cogs. How is that supposed to work?
  • kuroneko Posts: 3,623
    edited 2010-08-27 03:01
    kuroneko wrote: »
    Try this (remove the stack space in the object file):

    Just to clarify, object file refers to the included example.spin, not the top level object.
  • Humanoido Posts: 5,770
    edited 2010-08-27 03:08
    Changed to this
    VAR
      long  stack[20]
    

    and led 0 is flashing but the other pins 1 - 7 remain on.
  • kuroneko Posts: 3,623
    edited 2010-08-27 03:12
    Humanoido wrote: »
    Changed to this
    VAR
      long  stack[20]
    

    and led 0 is flashing but the other pins 1 - 7 remain on.

    You need separate stack space for each cog. That's why I used long stack[8 * 20] and the address offsets @stack[ID * 20]. You are allowed to copy my code. It's tested :)

    e.g. cog 0 gets stack at @stack[0], cog 6 at @stack[120].
  • Humanoido Posts: 5,770
    edited 2010-08-27 03:30
    Ok, I'll try the original code.

    Would something like this be more simple? - attached
  • Humanoido Posts: 5,770
    edited 2010-08-27 03:36
    Now that makes sense, plus it's working now. Let me know your thoughts on the new attachment. Thanks.
  • kuroneko Posts: 3,623
    edited 2010-08-27 03:56
    Humanoido wrote: »
    Would something like this be more simple? - attached

    In example2.spin you're still using the same stack for all coginit calls. You CAN'T do that (besides stacks grow upwards in SPIN so you'd have to use @stack[0] anyway). Otherwise it's fine (you didn't use cog 0 though).
  • Humanoido Posts: 5,770
    edited 2010-08-27 05:43
    kuroneko wrote: »
    In example2.spin you're still using the same stack for all coginit calls. You CAN'T do that (besides stacks grow upwards in SPIN so you'd have to use @stack[0] anyway). Otherwise it's fine (you didn't use cog 0 though).

    Kuroneko, but the code automatically loads into cog 0 and if cog 0 is initialized then the program won't work, right?

    I read the book on page 78 for StackPointer and do not know how to set it, because there are several examples, one at 6, one at 10, one at 20, so I don't know how to set the stack.

    So for cogs 0 through 7, what are the respective stack numbers and how do you determine it? If 0 is used, then no stack space is set aside, right?

    Thanks.
  • Humanoido Posts: 5,770
    edited 2010-08-27 05:53
    Maybe this example is closer to working?
  • kuroneko Posts: 3,623
    edited 2010-08-27 05:59
    Humanoido wrote: »
    Kuroneko, but the code automatically loads into cog 0 and if cog 0 is initialized then the program won't work, right?
    That's why cog 0 (or rather cogid) is initialised last.
      start := cogid - 7
      repeat 8
        ID := start++ & 7
        ...
    

    The first line makes sure we deal with the other 7 cogs first. Assuming we run in cog 0, start is -7, which gives us the sequence 1 (== -7 & 7), 2, 3, 4, 5, 6, 7 and finally 0. You wanted the program to run in all 8 cogs, so you have to recycle the current cog somehow.
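
    The sequence described above can be checked off-chip. The following is a minimal Python sketch of the arithmetic only, not Spin code; the function name launch_order is made up for illustration, and it relies on Python's & treating negative integers as two's complement, which matches the 32-bit behaviour of the Spin expression.

```python
def launch_order(current_cog, cogs=8):
    """Model of `start := cogid - 7` followed by `ID := start++ & 7`.

    Returns the order in which cog IDs are (re)initialised. The cog that
    runs the loop always comes out last, so it survives long enough to
    start all the others before it recycles itself.
    """
    start = current_cog - (cogs - 1)
    order = []
    for _ in range(cogs):
        order.append(start & (cogs - 1))  # wrap into 0..7, like `& 7`
        start += 1
    return order


if __name__ == "__main__":
    for cog in (0, 3):
        print("running in cog", cog, "->", launch_order(cog))
```

    Whatever cog the loop runs in, every ID from 0 to 7 appears exactly once and the current cog is always the final entry.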
    Humanoido wrote: »
    So for cogs 0 through 7, what are the respective stack numbers and how do you determine it? If 0 is used, then no stack space is set aside, right?

    cognew/coginit take an address which points to a number of longs (e.g. 20). You can declare 8 stack arrays by name (e.g. stack0[20], stack1[20] etc) or use one array and point at addresses in that array (e.g. @stack[0], @stack[20] etc).
    coginit(ID, launch(ID), @stack[ID * length])
    
    Cog N is assigned stack starting at long N*20 + 0, ending at N*20 + 19.
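
    To make the layout concrete, here is a small Python sketch (again only a model of the index arithmetic, not Spin) that lists the slice of the shared stack array each cog would receive. LENGTH and COGS mirror length = 20 and the 8 cogs from the posted code; the helper name stack_range is hypothetical.

```python
LENGTH = 20  # longs per cog, mirrors `length = 20` in the CON block
COGS = 8     # the Propeller 1 has eight cogs


def stack_range(cog, length=LENGTH):
    """Return (first, last) long index of the stack slice for one cog,
    i.e. the region that starts at @stack[cog * length]."""
    first = cog * length
    return first, first + length - 1


if __name__ == "__main__":
    previous_last = -1
    for cog in range(COGS):
        first, last = stack_range(cog)
        # Each slice starts right after the previous one: no overlap, no gap.
        assert first == previous_last + 1
        previous_last = last
        print(f"cog {cog}: stack[{first}] .. stack[{last}]")
    print("total longs needed:", previous_last + 1)
```

    Passing the same address to every cognew/coginit call would collapse all eight slices onto one, which is the failure seen earlier in the thread.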
  • kuroneko Posts: 3,623
    edited 2010-08-27 06:03
    Humanoido wrote: »
    Maybe this example is closer to working?

    You should re-init cog 0 last (cogid in fact). Do you actually test your code before posting? :)
  • K2 Posts: 693
    edited 2010-08-27 06:55
    Kuroneko, your post #5 is exactly what I needed. In my quest for eight cogs running the same app, I had not considered all that you have, nor was my implementation as organized.

    BTW, for my particular app, the Propeller is a bargain. Compared with an ARM Cortex M3 running highly optimized C code, a fully-deployed Propeller running the same algorithm in PASM is 5.7 times faster.

    Meanwhile, the respective chips are within $1 of each other, the ARM being slightly more. So, if you can tolerate PASM and no JTAG, the Propeller conquers!
  • Heater. Posts: 21,230
    edited 2010-08-27 07:00
    K2,

    That makes me very curious as to what your mystery app is, or even just a clue.

    Is it purely PASM within the COGs?

    Is it Spin?

    If the latter I'm amazed how well it compares with ARM.

    How big is it?
  • K2 Posts: 693
    edited 2010-08-27 07:23
    Heater,

    The core algorithm is entirely PASM and makes good use of Prop's instruction set - it's a good fit. And it's small enough to fit into a COG with room to spare for a couple of background tasks that keep everything clicking away.

    I really like the Cortex M3 core. I was stunned from the outset by how fast it could execute compiled C code, and how well the compiler optimizations worked. I'm told that the Cortex M3 was specifically designed to run C. It shows.

    So for a Prop to be, in aggregate, 5.7 times faster than a 100 MHz, 125 MIPS Cortex M3 is a real head spinner. It just shows that for the right app, the Prop is amazing.

    As a final note, I still have as an exercise an attempt to unfold the C code further in order to avoid the relatively stiff penalty incurred anytime branching exceeds a certain distance. I commend Chip for optimizing conditional jumps on the Prop in favor of taking the jump. That's a real boon. I also like the NR effect, reminiscent of the PIC. That saves time, too.

    Edit: Your questions need better answers. Two obvious ways in which the Cortex M3 trounces the Prop are in memory size and single-cycle integer multiplication. The particular algorithm I used in this comparison does not place much demand on memory and does not employ multiplication. That's why I say, "...for the right app..." I'm fortunate to have a useful task to occupy the combined throughput of several Propellers. And fortunate to have ARM chips for certain other tasks.
  • Humanoido Posts: 5,770
    edited 2010-08-27 14:51
    Kuroneko, thank you for the explanation - the code is working after making the changes outlined. Very nice! Yes, I always test code before posting - it was blinking the first LED and the others remained on. Just curious, how do you handle debugging your code?

    Humanoido
  • kuroneko Posts: 3,623
    edited 2010-08-27 16:53
    Humanoido wrote: »
    Just curious, how do you handle debugging your code?

    Pencil and paper for timing issues, some LEDs (demoboard, "I got this far" style) or running the stuff in my head :)
  • kuroneko Posts: 3,623
    edited 2010-08-27 16:54
    K2 wrote: »
    The core algorithm is entirely PASM and makes good use of Prop's instruction set - it's a good fit. And it's small enough to fit into a COG with room to spare for a couple of background tasks that keep everything clicking away.

    Could you show it to us (if permitted) so we can take it apart?
  • K2 Posts: 693
    edited 2010-08-27 18:45
    @kuroneko: Well, I've been sort of secretive about what I'm doing simply because its unknown nature is part of its value. But I can say that most of the things I work on were originally inspired by Donald Knuth's "The Art of Computer Programming."

    Seems that the more I delve into numbers, the more interesting they become. Computers make all kinds of things possible. With the current MIPS/$ ratio in the stratosphere, it's a great time to be tinkering.
  • Humanoido Posts: 5,770
    edited 2010-08-27 19:32
    kuroneko wrote: »
    Pencil and paper for timing issues, some LEDs (demoboard, "I got this far" style) or running the stuff in my head :)
    Kuroneko, see, see, see, I know you are a computer! This proves it! :)

    BTW, what is your advice to a person just starting with the prop chip and spin programming, who wants to become a master like you (and Heater)?

    Humanoido


    EDIT: (and Cluso..)
  • Cluso99 Posts: 18,069
    edited 2010-08-27 20:46
    @K2: I am not sure if you fully understand the conditional jumps (and calls). If the jmp is conditional, there is no penalty for the jump not being taken, because the condition is tested early in the pipeline. If the condition is not met, the instruction is 'converted' to a nop and the next instruction is fetched correctly, i.e. there is no miss and therefore no penalty, so it executes in 4 cycles and not 8. See the latest Prop manual (with PropTool) for a better explanation.

    Of course all the conditionals can be used on any instruction.

    So the only jumps that can stall the pipeline are djnz, tjnz and tjz, which take 8 clocks when the jump is not taken.
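
    As a rough illustration of what that timing means for a counted loop, here is a Python sketch of the clock arithmetic. It assumes every body instruction is an ordinary 4-clock instruction (no hub accesses or wait instructions) and that the loop is closed by a djnz costing 4 clocks when taken and 8 when it falls through; the function name is made up for the example.

```python
def djnz_loop_clocks(iterations, body_instructions):
    """Estimate clocks for a PASM loop closed by djnz.

    Assumptions (not measured on hardware): each body instruction takes
    4 clocks; djnz takes 4 clocks when it jumps back and 8 clocks on the
    final pass, when the counter reaches zero and the jump is not taken.
    """
    if iterations < 1:
        raise ValueError("a djnz loop body runs at least once")
    body = 4 * body_instructions * iterations
    taken = 4 * (iterations - 1)   # every pass except the last jumps back
    fall_through = 8               # the single not-taken djnz
    return body + taken + fall_through


if __name__ == "__main__":
    # Example: a loop with one body instruction, run 10 times.
    print(djnz_loop_clocks(10, 1))
```

    Under these assumptions the 8-clock case is paid once per loop rather than once per pass, so its share of the total shrinks as the iteration count grows.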

    As an aside, I have just seen that the jmp wz,wc will clear zero and set carry.
  • kuroneko Posts: 3,623
    edited 2010-08-28 00:43
    Cluso99 wrote: »
    As an aside, I have just seen that the jmp wz,wc will clear zero and set carry.

    Well, that irked me enough to dig into it. The zero flag will always be cleared (note A). Carry is the unsigned borrow from the comparison between the value pointed to by the destination slot (normally register 0) and the target address (9-bit immediate or 32-bit register value).

    So if I set a register target to $207 and register 0 contains $42 then jmp target wc will jump to location $7 == $207 & $1FF and set carry ($42 < $207). Doing the same with $7 will produce the same jump but doesn't set carry ($42 > $7).
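
    The rule as observed here can be written down as a tiny Python model. This only encodes the behaviour described in this post (it is not taken from the manual and has not been checked against silicon); the name jmp_wc and its arguments are hypothetical.

```python
ADDR_MASK = 0x1FF        # jump targets are truncated to 9 bits
WORD_MASK = 0xFFFFFFFF   # registers are 32 bits wide


def jmp_wc(dest_value, target):
    """Model of `jmp target wc` as described in the post.

    dest_value: content of the register named by the destination slot
                (normally register 0).
    target:     9-bit immediate or full 32-bit register value.
    Returns (next_pc, carry), where carry is the unsigned borrow of
    dest_value - target, i.e. set when dest_value < target.
    """
    next_pc = target & ADDR_MASK
    carry = (dest_value & WORD_MASK) < (target & WORD_MASK)
    return next_pc, carry


if __name__ == "__main__":
    print(jmp_wc(0x42, 0x207))  # register target $207
    print(jmp_wc(0x42, 0x007))  # same destination address, target $7
```

    Both calls land on location $7, but only the first sets carry, because $42 is below $207 and above $7.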

    I wonder who came up (for what reason) with the explanation in the manual?

    FWIW, the same carry rule applies to mov[dis] (movd, movi, movs).

    Note A: So far I haven't seen any evidence that it can be set.
  • K2 Posts: 693
    edited 2010-08-28 10:58
    Cluso,

    I'm not 100% sure what I said, but what I meant to say is that I like the fact that djnz is optimized for looping. Normally (i.e., on most processors) looping incurs an additional penalty. That's one reason loops are so often unrolled for speed. A bigger reason nowadays has to do with the prefetch queue. I like the freedom the Prop provides from such concerns.

    There are lots of details that need to be cleared up in my head. Your and kuroneko's elucidations are very helpful! I'm going over them carefully.

    BTW, eight cogs operating at full-throttle create a bit of heat, both for the Propeller and the linear regulator. The time has come to employ a switching device, and perhaps a fan, especially since the quantity of Props will shortly be multiplying (no pun intended).