4 Cog overclocked P1V — Parallax Forums

rjo__ Posts: 2,114
edited 2014-09-01 13:34 in Propeller 1
Before my first attempt at a 4-cog P1V, help me Jesus.

The goal is to improve bandwidth by increasing the main clock and by shortening the hub cycle.

The question here isn't if I am missing something, but what I am missing:)

I have found just two places that need to be modified. In dig.v, in "generate" I need to change
"for (i=0; i<8; i++)" to "for (i=0; i<4; i++)"

Below that at line 129, I need to change
"wire [7:0] cog_ena;" to "wire [3:0] cog_ena;"

I know how to change the clock. My question is how best to limit the hub cycle to 4 cogs.



Obviously, it can't be this simple:)

Thanks,

Rich

Comments

  • cgracey Posts: 14,155
    edited 2014-08-22 16:23
    That might work, but to tighten up hub timing so that only 4 cogs are in the loop, you'd need to change all the stuff that deals with 8 cogs and make it handle 4, only.
  • rjo__ Posts: 2,114
    edited 2014-08-22 16:40
    I'm looking at it...:)
  • cgracey Posts: 14,155
    edited 2014-08-22 16:50
    All that stuff is in the dig.v and hub.v files.
  • rjo__ Posts: 2,114
    edited 2014-08-22 17:02
    You might have ESP... or a profound capacity for dealing with ignorant people:)
  • nutson Posts: 242
    edited 2014-08-22 18:19
    reg [7:0] bus_sel;
    always @(posedge clk_cog or negedge nres)
    if (!nres)
    	bus_sel <= 8'b0;
    else if (ena_bus)
    	bus_sel <= {bus_sel[6:0], ~|bus_sel[6:0]};
    

    These lines in dig.v implement a one-hot shift register that selects one of the eight cogs to connect to the hub.

    In a 4-cog design you can reduce the length of the shift sequence, for example to 6 positions, so the hub rotates faster.
    reg [7:0] bus_sel;
    always @(posedge clk_cog or negedge nres)
    if (!nres)
    	bus_sel <= 8'b0;
    else if (ena_bus)
    	bus_sel <= {2'b00,bus_sel[4:0], ~|bus_sel[4:0]};
    

    I found that further reducing the run length gives strange results. Reducing it to 4 positions results in only 2 cogs running, but that is OK for a 2-cog design.
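
For readers following along, the rotation described above can be illustrated with a small Python model. This is only a sketch, not part of the P1V sources: the name bus_sel_cycle and its field_bits parameter are made up here, and it assumes the update has the form {zeros, bus_sel[n-1:0], ~|bus_sel[n-1:0]} assigned to an 8-bit register that resets to 0.

```python
def bus_sel_cycle(field_bits, width=8, max_steps=32):
    """Model bus_sel <= {0..., bus_sel[field_bits-1:0], ~|bus_sel[field_bits-1:0]}.

    Returns the list of distinct states visited after reset, in order.
    """
    mask = (1 << field_bits) - 1          # selects bus_sel[field_bits-1:0]
    state = 0                             # value after reset (8'b0)
    seen = []
    for _ in range(max_steps):
        low = state & mask
        feedback = 1 if low == 0 else 0   # ~| is a reduction NOR
        state = ((low << 1) | feedback) & ((1 << width) - 1)
        if state in seen:                 # pattern has started to repeat
            break
        seen.append(state)
    return seen


for bits in (7, 5):                       # bus_sel[6:0] and bus_sel[4:0]
    states = bus_sel_cycle(bits)
    print(bits, len(states), [format(s, "08b") for s in states])
```

In this model the original bus_sel[6:0] line steps through 8 positions and the bus_sel[4:0] version through 6, which matches the description above.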
  • jazzed Posts: 11,803
    edited 2014-08-22 18:28
    Just some ideas ...

    This may not be a practical idea longer term, but it may be worth trying to make a 1 COG P1V first just to understand the code.

    Then try making a 4 COG P1V.

    Then try making a 4 COG P1V that could share hub in some alternative way with the 1 COG P1V. ...

    For example, the one COG P1V might use the even 4 of 8 HUB slots, and the 4 COG P1V could only use the odd 4 HUB slots.

    Wishing I had more time for this stuff .... Maybe soon.
  • cgracey Posts: 14,155
    edited 2014-08-22 22:20
    nutson wrote: »
    reg [7:0] bus_sel;
    always @(posedge clk_cog or negedge nres)
    if (!nres)
    	bus_sel <= 8'b0;
    else if (ena_bus)
    	bus_sel <= {bus_sel[6:0], ~|bus_sel[6:0]};
    

    These lines in dig.v implement a one-hot shift register that selects one of the eight cogs to connect to the hub.

    In a 4-cog design you can reduce the length of the shift sequence, for example to 6 positions, so the hub rotates faster.
    reg [7:0] bus_sel;
    always @(posedge clk_cog or negedge nres)
    if (!nres)
    	bus_sel <= 8'b0;
    else if (ena_bus)
    	bus_sel <= {2'b00,bus_sel[4:0], ~|bus_sel[4:0]};
    

    I found that further reducing the run length gives strange results. Reducing it to 4 positions results in only 2 cogs running, but that is OK for a 2-cog design.


    That's right. The other thing to address is the COGID instruction (in hub.v), as it will return a wrong 3-bit cog# most of the time.
  • rjo__ Posts: 2,114
    edited 2014-08-24 07:23
    OK...
    changes mentioned in my first post:
    In dig.v, in "generate" change
    "for (i=0; i<8; i++)" to "for (i=0; i<4; i++)"

    and following the leads above
    in dig.v
    reg [7:0] bus_sel;
    
    always @(posedge clk_cog or negedge nres)
    if (!nres)
    	bus_sel <= 8'b0;
    else if (ena_bus)
    	bus_sel <= {4'b0,bus_sel[3:0], ~|bus_sel[3:0]}; 
    

    and in hub.v
    // output
    
    reg [2:0] sys_q;
    reg sys_c;
    
    always @(posedge clk_cog)
    if (ena_bus && sys)
    	//sys_q <= ac[2:0] == 3'b001	? {	bus_sel[7] || bus_sel[6] || bus_sel[5] || bus_sel[0],		// cogid
    									//bus_sel[7] || bus_sel[4] || bus_sel[3] || bus_sel[0],
    									//bus_sel[6] || bus_sel[4] || bus_sel[2] || bus_sel[0] }
    								//: num;															// others
       
        sys_q <=ac[2:0]==3'b001    ? {1'b0,bus_sel[3] || bus_sel[0],
                              bus_sel[2] || bus_sel[0]}
                              :num;
    


    test code run in PropellerIDE
    CON 
    _clkmode = xtal1+pll16x
    _clkfreq = 80_000_000
    OBJ ser : "FullDuplexSerial"
    VAR
       long i,x[1000],time1,time2,elapsed
    pub main
      ser.Start(31,30,0,115200)
      i:=0
      time1 :=50
      waitcnt(clkfreq/4 +cnt)
      ser.str(string("Watch LEDs"))
      ser.tx(13)
      waitcnt(clkfreq*4 + cnt)
      repeat 1000
         x[i]:=i
         i++
      coginit(3,@timeit,@x)
      repeat until x[0] > 0
      ser.str(string("Results:",13))
      i:=0
      ser.dec(x[0])
      ser.str(string("  should be = 100",13))
      ser.dec(x[999])
      ser.str(string("  should be = 1099",13))
      elapsed:=time2-time1
      ser.dec(elapsed)
      ser.str(string("_clocks  "))
      waitcnt(clkfreq*2+cnt)
      cogstop(3)
      waitcnt(clkfreq*2+cnt)
      cogstop(1)
      waitcnt(clkfreq*2+cnt)
    Dat
                                 org 0
    timeit                       mov loops,reps
                                 mov a1,par
                                 mov t1,cnt
    myloop                       rdlong xval,a1
                                 add xval,#100
                                 wrlong xval,a1
                                 add a1,#4
                                 nop
                                 djnz loops,#myloop
                                 mov t2, cnt
                                 wrlong t1,a1
                                 add a1,#4
                                 wrlong t2,a1
    nothing
                                 jmp #nothing
    
    reps long 1000
    t1    res  1
    t2    res  1
    t3    res  1
    loops  res  1
    a1 res 1
    xval res 1
    

    Nothing breaks but I get exactly the same timing on both the nano_P1V and a P1.
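
One way to sanity-check what the PASM loop above ought to report is a back-of-the-envelope model in Python. It is only a sketch under stated assumptions, not a cycle-accurate P1 simulation: ordinary instructions are taken as 4 clocks, a hub instruction (rdlong/wrlong) is assumed to wait for the cog's next hub window and then take 8 clocks, and the window is assumed to recur every 2 clocks per slot in the rotation. The name loop_period is made up for this illustration.

```python
def loop_period(hub_period, iterations=50):
    """Estimate clocks per pass of: rdlong, add, wrlong, add, nop, djnz."""
    program = ["hub", "reg", "hub", "reg", "reg", "reg"]
    t = 0
    starts = []
    for _ in range(iterations):
        starts.append(t)
        for kind in program:
            if kind == "hub":
                t += (-t) % hub_period    # assumed: wait for the next hub window
                t += 8                    # assumed: 8 clocks once the window arrives
            else:
                t += 4                    # assumed: 4 clocks per ordinary instruction
    return starts[-1] - starts[-2]        # steady-state clocks per pass


for slots in (8, 6, 5, 4):
    print(slots, "slots:", loop_period(2 * slots), "clocks per pass")
```

Under those assumptions the model gives 48 clocks per pass with 8 slots and 40 with 4, so a shorter rotation should show up in the measured count. Identical numbers on both boards would then suggest that the hub rotation itself has not changed.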

    "hubslots" ... where are those hubslots?

    Thanks

    Rich
  • nutson Posts: 242
    edited 2014-08-24 09:53
    bus_sel <= {4'b0,bus_sel[3:0], ~|bus_sel[3:0]};
    

    This code generates a 9-bit value in which a single "1" bit cycles over 5 positions (5 hub timeslots, 10 CPU clocks). Using bus_sel[2:0] instead would result in 4 hub timeslots.
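
The bit counting above can be double-checked with a few lines of Python. This is just an arithmetic sketch: concat_width and the two lists are names invented here, and the "positions" figure assumes the shifted field and the NOR feedback cover the same bits, so the hot bit visits the feedback position plus one position per shifted bit.

```python
def concat_width(parts):
    """Sum the bit widths of the pieces of a Verilog concatenation."""
    return sum(width for _, width in parts)


# {4'b0, bus_sel[3:0], ~|bus_sel[3:0]}
too_wide = [("4'b0", 4), ("bus_sel[3:0]", 4), ("~|bus_sel[3:0]", 1)]
# {4'b0, bus_sel[2:0], ~|bus_sel[2:0]}
four_slots = [("4'b0", 4), ("bus_sel[2:0]", 3), ("~|bus_sel[2:0]", 1)]

for parts in (too_wide, four_slots):
    shifted_bits = parts[1][1]            # width of the shifted bus_sel field
    print(concat_width(parts), "bits,", shifted_bits + 1, "positions")
```

The first concatenation comes out at 9 bits and 5 positions, the second at 8 bits and 4 positions. Because the extra bit in the 9-bit case is a leading zero, dropping it on assignment to the 8-bit register does not change the rotation.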

    I have done some experiments with fewer hub timeslots; look in this thread: http://forums.parallax.com/showthread.php/156955-Small-V-Prop-2-Cog-s-4-KB-ROM-4KB-Hub-RAM In the last post I posted two oscilloscope screens that show the speedup with 4 slots / 2 cogs for a series of sequential RDLONGs compared to 8 slots / 8 cogs.

    Warning: more Verilog changes are probably necessary to change the number of timeslots without breaking some logic. With 6 timeslots I can have only 4 cogs running, with 4 timeslots only 2 cogs.

    So you were lucky with your 5 timeslots. I guess that only 3 cogs can be running with that (I did not try).
  • rjo__ Posts: 2,114
    edited 2014-08-24 11:09
    Thank you... that is nine bits:)
  • rjo__ Posts: 2,114
    edited 2014-08-24 11:41
    I changed that line, recompiled, ... had a cup of coffee, went to get lunch, and when I got back, it ran perfectly.
    So, if anyone wants half a P1v... it seems to be available here:)

    BUT I am seeing absolutely no differences in the timing.

    Cluso99 is working on documentation. That should help a lot.
  • rjo__ Posts: 2,114
    edited 2014-08-24 12:23
    IDK... If you look at the above Spin program, you will see that I use cog 3 to run my PASM code.

    I went back to PropellerIDE and used cog 4, and it worked... then I switched to cog 5 (which shouldn't exist). The code ran fine. The timing was unaffected and is still the same as for a normal P1. The correct LED for each assigned cog lit up on my Nano.

    I know that I recompiled and reprogrammed correctly... the time stamps prove it.

    I am thinking that when I ask for a cog that doesn't exist, it uses the 2 LSBs and chooses a cog that does exist... and the LED is simply an artifact.
    But I'm not sure about this.
  • nutson Posts: 242
    edited 2014-08-24 13:33
    The LEDs don't mean a thing. I am running a 2-cog Prop at the moment, but when I load a program that fires up all cogs (with the same Spin code, toggling a pin), all 8 LEDs light up, but on my logic analyzer I see only 2 pins toggling.

    The Verilog code for COGINIT/COGNEW and other hub functions is way beyond me; it may be that this code knows about "active" cogs.
  • jazzed Posts: 11,803
    edited 2014-08-24 14:17
    Rich,

    I've seen the same LED and performance behaviour with code I've tried. I'm building with your changes now.

    This is the Spin code I used to test performance.
    
    CON
    
    
      _clkmode = XTAL1 + PLL16X
      _clkfreq = 80_000_000
      
    OBJ
      ser : "MySimpleSerial"
    
    
    PUB start | addr, t0, t1
    
    
      ser.init(31,30,19200)
      waitcnt(CLKFREQ/2+CNT)  '' Wait for start up
      'ser.str(string($d,"Hello.",$d))
    
    
      t0 := CNT
      addr := $7f00  
      t1 := CNT
      ser.Str(string("Diff Time "))
      ser.Dec(t1-t0)
      ser.tx($d)
    

    MySimpleSerial.spin
    ''*******************************************************************
    ''*  Simple Asynchronous Serial Driver v1.3                         *
    ''*  Authors: Chip Gracey, Phil Pilgrim, Jon Williams, Jeff Martin  *
    ''*  Copyright (c) 2006 Parallax, Inc.                              *
    ''*  See end of file for terms of use.                              *
    ''*******************************************************************
    ''
    '' Performs asynchronous serial input/output at low baud rates (~19.2K or lower) using high-level code
    '' in a blocking fashion (ie: single-cog (serial-process) rather than multi-cog (parallel-process)).
    ''
    '' To perform asynchronous serial communication as a parallel process, use the FullDuplexSerial object instead.
    '' 
    ''
    '' v1.3 - May 7, 2009    - Updated by Jeff Martin to fix rx method bug, noted by Mike Green and others, where uninitialized
    ''                         variable would mangle received byte.
    '' v1.2 - March 26, 2008 - Updated by Jeff Martin to conform to Propeller object initialization standards and compress by 11 longs.
    '' v1.1 - April 29, 2006 - Updated by Jon Williams for consistency.
    ''
    ''
    '' The init method MUST be called before the first use of this object.
    '' Optionally call finalize after final use to release transmit pin.
    ''
    '' Tested to 19.2 kbaud with clkfreq of 80 MHz (5 MHz crystal, 16x PLL)
    
    
    
    
    VAR
    
    
      long  sin, sout, inverted, bitTime, rxOkay, txOkay   
    
    
    
    
    PUB init(rxPin, txPin, baud): Okay
    {{Call this method before first use of object to initialize pins and baud rate.
    
    
      • For true mode (start bit = 0), use positive baud value.     Ex: serial.init(0, 1, 9600)
        For inverted mode (start bit = 1), use negative baud value. Ex: serial.init(0, 1, -9600) 
      • Specify -1 for "unused" rxPin or txPin if only one-way communication desired.
      • Specify same value for rxPin and txPin for bi-directional communication on that pin and connect a pull-up/pull-down resistor
        to that pin (depending on true/inverted mode) since pin will set it to hi-z (input) at the end of transmission to avoid
        electrical conflicts.  See "Same-Pin (Bi-Directional)" examples, below.
    
    
      EXAMPLES:
      
        Standard Two-Pin Bi-Directional True/Inverted Modes
            Ex: serial.init(0, 1, ±9600)
             ┌────────────┐               ┌──────────┐
             │Propeller P0├<─────────────<┤I/O Device│
             │          P1├>─────────────>┤          │
             └────────────┘               └──────────┘

        Standard One-Pin Uni-Directional True/Inverted Mode
            Ex: serial.init(0, -1, ±9600)  -or-  serial.init(-1, 0, ±9600)
             ┌────────────┐               ┌──────────┐
             │Propeller P0├───────────────┤I/O Device│
             └────────────┘               └──────────┘

        Same-Pin (Bi-Directional) True Mode
            Ex: serial.init(0, 0, 9600)
                                 Vdd
                                  │
                                 [R] 4.7 kΩ
             ┌────────────┐       │       ┌──────────┐
             │Propeller P0├<>─────┻─────<>┤I/O Device│
             └────────────┘               └──────────┘

        Same-Pin (Bi-Directional) Inverted Mode
            Ex: serial.init(0, 0, -9600)
             ┌────────────┐               ┌──────────┐
             │Propeller P0├<>─────┳─────<>┤I/O Device│
             └────────────┘       │       └──────────┘
                                 [R] 4.7 kΩ
                                  │
                                 GND
    }}                                                                  
    
    
      finalize                                              ' clean-up if restart
      
      rxOkay := rxPin > -1                                  ' receiving?
      txOkay := txPin > -1                                  ' transmitting?
    
    
      sin := rxPin & $1F                                    ' set rx pin
      sout := txPin & $1F                                   ' set tx pin
    
    
      inverted := baud < 0                                  ' set inverted flag
      bitTime := clkfreq / ||baud                           ' calculate serial bit time  
      
      return rxOkay | TxOkay
      
    
    
    PUB finalize
    {{Call this method after final use of object to release transmit pin.}}
     
      if txOkay                                             ' if tx enabled
        dira[sout]~                                         '   float tx pin
      rxOkay := txOkay := false
    
    
    
    
    PUB rx: rxByte | t
    {{ Receive a byte; blocks caller until byte received. }}
    
    
      if rxOkay
        dira[sin]~                                          ' make rx pin an input
        waitpeq(inverted & |< sin, |< sin, 0)               ' wait for start bit
        t := cnt + bitTime >> 1                             ' sync + 1/2 bit
        repeat 8
          waitcnt(t += bitTime)                             ' wait for middle of bit
          rxByte := ina[sin] << 7 | rxByte >> 1             ' sample bit 
        waitcnt(t + bitTime)                                ' allow for stop bit 
    
    
        rxByte := (rxByte ^ inverted) & $FF                 ' adjust for mode and strip off high bits
    
    
    
    
    PUB tx(txByte) | t
    {{ Transmit a byte; blocks caller until byte transmitted. }}
    
    
      if txOkay
        outa[sout] := !inverted                             ' set idle state
        dira[sout]~~                                        ' make tx pin an output        
        txByte := ((txByte | $100) << 2) ^ inverted         ' add stop bit, set mode 
        t := cnt                                            ' sync
        repeat 10                                           ' start + eight data bits + stop
          waitcnt(t += bitTime)                             ' wait bit time
          outa[sout] := (txByte >>= 1) & 1                  ' output bit (true mode)  
        
        if sout == sin
          dira[sout]~                                       ' release to pull-up/pull-down
    
    
        
    PUB str(strAddr)
    {{ Transmit z-string at strAddr; blocks caller until string transmitted. }}
    
    
      if txOkay
        repeat strsize(strAddr)                             ' for each character in string
          tx(byte[strAddr++])                               '   write the character
    
    
      
    PUB dec(value) | i, z
    
    
    '' Print a signed decimal number
    
    
      if value < 0
        -value
        tx("-")
    
    
      i := 1_000_000_000
      z~
    
    
      repeat 10
        if value => i
          tx(value / i + "0")
          value //= i
          z~~
        elseif z or i == 1
          tx("0")
        i /= 10
    
    
    
    
    PUB hex(value, digits)
    
    
    '' Print a hexadecimal number
    
    
      value <<= (8 - digits) << 2
      repeat digits
        tx(lookupz((value <-= 4) & $F : "0".."9", "A".."F"))
    
    
    
    
    PUB bin(value, digits)
    
    
    '' Print a binary number
    
    
      value <<= 32 - digits
      repeat digits
        tx((value <-= 1) & 1 + "0")
        
    {{
    
    
    
    
    TERMS OF USE: MIT License

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation
    files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy,
    modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software
    is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
    WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
    COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
    ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
    }}       
    
  • jazzed Posts: 11,803
    edited 2014-08-24 15:13
    Hi Rich,

    I'm getting the same "Diff Time 704" as before. So it seems either the Spin is being optimized away (very unlikely) or more Verilog digging is in order.
  • rjo__ Posts: 2,114
    edited 2014-08-24 15:26
    There are a couple of things I want to try, but before I break anything, I am going to play with the clock.

    For anyone looking in, who hasn't been here for a while, I need to add that I am a pure hobbyist, who ordinarily leaves the room when serious conversations start.

    I am such a feckless programmer that I normally test my code after entering each line. That's a little tedious with FPGAs.

    So far I haven't found anything I want to do with a Propeller that I can't eventually do... and the Verilog code makes about as much sense to me as PASM did when I first looked at it... I'm guessing the end results will be similar.

    It could take a while:)
  • rjo__ Posts: 2,114
    edited 2014-08-24 20:02
    I used pik33's thread on over-clocking http://forums.parallax.com/showthread.php/156851-Some-overclocking-)
    Thank you pik33.

    On my Nano P1V*1/2, I got largely the same results as pik33. At 150 MHz, the loading was unreliable. At 141.666 MHz and 133.333 MHz, the loading seemed reliable. I added a blinking LED on P0 and the program seemed to run just fine. The LED seemed to blink at the right rate, and the cog LEDs, which are hooked up in the Verilog, behaved correctly.
    However, I could not get reliable serial communications, despite using FullDuplexSerial and MySimpleSerial (above in jazzed's post) at a variety of baud rates. I even hard-coded the clock into the serial drivers, just in case something in the declaration wasn't quite kosher. No luck. I am out of time for the next couple of days. I used both bst and PropellerIDE with identical results.

    If you are following along, the next step is to put out a frequency on one of the pins and measure it with a real Prop... if you have the time, give it a whirl. And if that works, then we will have to figure out what is going on with the serial stuff.

    Tah Tah

    Rich
  • rjo__ Posts: 2,114
    edited 2014-08-28 20:10
    In my previous posts I made a series of errors in the dig.v file.

    Originally, I had the following:
    reg [7:0] bus_sel;
    
    always @(posedge clk_cog or negedge nres)
    if (!nres)
    	bus_sel <= 8'b0;
    else if (ena_bus)
    	bus_sel <= {4'b0,bus_sel[3:0], ~|bus_sel[3:0]};
    

    which nutson correctly informed me had too many bits. I then made another error in the zip files above... right number of bits, wrong number of cogs:)

    Which brings me to this:
    reg [7:0] bus_sel;
    
    always @(posedge clk_cog or negedge nres)
    if (!nres)
    	bus_sel <= 8'b0;
    else if (ena_bus)
    	bus_sel <= {3'b0,bus_sel[3:0], ~|bus_sel[3:0]};
    
    which I am fairly certain is correct.

    The issue is that even though I think I am properly restricting bus selection to 4 cogs... the timing for my test file remains unchanged.

    at line 246 of hub.v
    I also made this change:
    // output
    
    reg [2:0] sys_q;
    reg sys_c;
    
    always @(posedge clk_cog)
    if (ena_bus && sys)
       
        sys_q <=ac[2:0]==3'b001    ? {1'b0,bus_sel[3] || bus_sel[0],
                              bus_sel[2] || bus_sel[0]}
                              :num;
    
    The concatenation works as in the original code (except for 4 cogs), but I think I need to change to something like sys_q <= {1'b0, ac[1:0]... but... but...
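
A quick Python check of the two cogid encodings may help here. The bit expressions are transcribed from the hub.v snippets in this thread; the function names cogid_original and cogid_four are invented for this sketch, and bus_sel is assumed to be one-hot.

```python
def bits(value, width=8):
    """Return the bits of value as a list, index 0 = least significant."""
    return [(value >> i) & 1 for i in range(width)]


def cogid_original(bus_sel):
    """3-bit cogid from the original 8-cog expression."""
    b = bits(bus_sel)
    hi = b[7] | b[6] | b[5] | b[0]
    mid = b[7] | b[4] | b[3] | b[0]
    lo = b[6] | b[4] | b[2] | b[0]
    return (hi << 2) | (mid << 1) | lo


def cogid_four(bus_sel):
    """cogid from the 4-cog expression {1'b0, b3 || b0, b2 || b0}."""
    b = bits(bus_sel)
    return ((b[3] | b[0]) << 1) | (b[2] | b[0])


for pos in range(5):                      # the five positions 00000001..00010000
    sel = 1 << pos
    print(format(sel, "08b"), cogid_original(sel), cogid_four(sel))
```

With the five-position rotation produced by the dig.v line above, both bus_sel[1] and bus_sel[4] map to 0 in the 4-cog expression. In this model the encoding is therefore only unambiguous when the rotation really has four positions.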
  • rjo__ Posts: 2,114
    edited 2014-08-28 20:17
    Of course the proper way to do it would be to limit sys_q to 2 bits... but if I do that then I am going to have to deal with endless errors... which would nicely lead me through important segments of code.
    And right now that looks like it could take forever:)
  • Ramon Posts: 484
    edited 2014-08-30 00:46
    rjo__ wrote: »
    Which brings me to this:
    	bus_sel <= {3'b0,bus_sel[3:0], ~|bus_sel[3:0]};
    
    which I am fairly certain is correct.

    I think that may be wrong.
    D:\Verilog\samples>vvp p1bus_tb
    At time                    0, bus_sel = xxxxxxxx, ena_bus = xx0
    At time                   16, bus_sel = 00000000, ena_bus = 000
    At time                   18, bus_sel = 00000000, ena_bus = 001
    At time                   25, bus_sel = 00000001, ena_bus = 011
    At time                   35, bus_sel = 00000010, ena_bus = 021
    At time                   45, bus_sel = 00000100, ena_bus = 041
    At time                   55, bus_sel = 00001000, ena_bus = 081
    At time                   65, bus_sel = 00010000, ena_bus = 101
    At time                   75, bus_sel = 00000001, ena_bus = 011
    At time                   85, bus_sel = 00000010, ena_bus = 021
    At time                   95, bus_sel = 00000100, ena_bus = 041
    At time                  105, bus_sel = 00001000, ena_bus = 081
    At time                  115, bus_sel = 00010000, ena_bus = 101
    At time                  118, bus_sel = 00000000, ena_bus = 001
    At time                  125, bus_sel = 00000001, ena_bus = 011
    

    Try this one instead:
    bus_sel <= {4'b0,bus_sel[2:0], ~|bus_sel[3:0]};
    
    D:\Verilog\samples>vvp p1bus_tb
    At time                    0, bus_sel = xxxxxxxx, ena_bus = xx0
    At time                   16, bus_sel = 00000000, ena_bus = 000
    At time                   18, bus_sel = 00000000, ena_bus = 001
    At time                   25, bus_sel = 00000001, ena_bus = 011
    At time                   35, bus_sel = 00000010, ena_bus = 021
    At time                   45, bus_sel = 00000100, ena_bus = 041
    At time                   55, bus_sel = 00001000, ena_bus = 081
    At time                   65, bus_sel = 00000000, ena_bus = 001
    At time                   75, bus_sel = 00000001, ena_bus = 011
    At time                   85, bus_sel = 00000010, ena_bus = 021
    At time                   95, bus_sel = 00000100, ena_bus = 041
    At time                  105, bus_sel = 00001000, ena_bus = 081
    At time                  115, bus_sel = 00000000, ena_bus = 001
    At time                  125, bus_sel = 00000001, ena_bus = 011
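
The two traces above can be reproduced without a simulator using a short Python model. It is a sketch only: trace, shift_hi and fb_hi are names made up here, and it assumes ena_bus stays high and the register starts at 0 after reset.

```python
def trace(shift_hi, fb_hi, steps=10, width=8):
    """Model bus_sel <= {0..., bus_sel[shift_hi:0], ~|bus_sel[fb_hi:0]}."""
    shift_mask = (1 << (shift_hi + 1)) - 1    # bits that get shifted left
    fb_mask = (1 << (fb_hi + 1)) - 1          # bits feeding the reduction NOR
    state, out = 0, []
    for _ in range(steps):
        feedback = 1 if (state & fb_mask) == 0 else 0
        state = (((state & shift_mask) << 1) | feedback) & ((1 << width) - 1)
        out.append(state)
    return out


for shift_hi, fb_hi in ((3, 3), (2, 3), (2, 2)):
    print((shift_hi, fb_hi), [format(s, "08b") for s in trace(shift_hi, fb_hi, 6)])
```

In this model both lines shown above repeat every five enabled clocks; the second one just spends its fifth step with bus_sel all zeros, as the 00000000 row in its trace also shows. Taking the feedback from bus_sel[2:0] as well, as nutson suggested earlier, gives a four-step rotation here.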
    


    HOWTO testbench:

    p1bus.v
    module p1bus (bus_sel, nres, ena_bus, clk_cog);
    
    output  [7:0] bus_sel;
    input         nres, clk_cog, ena_bus;
    reg     [7:0] bus_sel;
    
    always @(posedge clk_cog or negedge nres)
    if (!nres)
    	bus_sel <= 8'b0;
    else if (ena_bus)
    	bus_sel <= {4'b0,bus_sel[2:0], ~|bus_sel[3:0]};
    
    endmodule
    

    p1bus_tb.v
    module test;
      reg nres = 1;  
      reg ena_bus = 0;
      initial begin
         # 16  nres = 0;
         #  1  nres = 1;
         #  1  ena_bus = 1;
         # 100 nres = 0;
         #  1  nres = 1;
         # 100 $finish;
      end
      reg clk_cog = 0;
      always #5 clk_cog = !clk_cog;
      wire [7:0] bus_sel;
      p1bus p1 (bus_sel, nres, ena_bus, clk_cog);
      initial 
         $monitor("At time %t, bus_sel = %b, ena_bus = %h", $time, bus_sel, bus_sel, ena_bus);
    endmodule // test
    

    Execute with icarus verilog:
    iverilog -o p1bus p1bus_tb.v p1bus.v
    vvp p1bus
    

    (code taken from http://iverilog.wikia.com/wiki/Getting_Started
    ... do not know why ena_bus has 3 bits)
  • Todd Marshall Posts: 89
    edited 2014-08-30 05:19
    ... do not know why ena_bus has 3 bits)

    Because there are 8 cogs?
  • rjo__ Posts: 2,114
    edited 2014-08-30 06:28
    Ramon,

    Big thank you. Away from my massive Xp machine. I need a itty bitty 64 bit laptop:)
  • Ramon Posts: 484
    edited 2014-08-30 08:11
    Because there are 8 cogs?

    No. Actually It had 9 bits. I have found the typo. A duplicated variable in the monitor line:

    BAD> $monitor("At time %t, bus_sel = %b, ena_bus = %h", $time, bus_sel, bus_sel, ena_bus);
    OK > $monitor("At time %t, bus_sel = %b, ena_bus = %b", $time, bus_sel, ena_bus);

    It didn't warn me that I used three format specifiers (%t, %b, %h) with four variables; it just joined the last two variables, 8 bits + 1 bit (the 2nd bus_sel and ena_bus).
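    The stray digit in the earlier log can be reproduced on paper. This is a minimal Python sketch (a model, not Verilog), assuming the simulator prints the argument that has no format specifier left in plain decimal, directly after the %h field. The helper name monitor_field is made up for illustration.

```python
# Model of the buggy $monitor call: the duplicated bus_sel is consumed
# by %h, and ena_bus (no specifier left) is ASSUMED to print as decimal.
def monitor_field(bus_sel: int, ena_bus: int) -> str:
    """Return the text shown after 'ena_bus = ' in the buggy log."""
    return f"{bus_sel:02x}" + str(ena_bus)

# bus_sel values taken from the log lines above, with ena_bus = 1.
for bus_sel in (0b00000010, 0b00000100, 0b00001000, 0b00000000, 0b00000001):
    print(f"bus_sel = {bus_sel:08b}, ena_bus = {monitor_field(bus_sel, 1)}")
```

    Under that assumption the model gives 021, 041, 081, 001 and 011, which is what the log shows: two hex digits of bus_sel followed by the single ena_bus bit.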
  • RamonRamon Posts: 484
    edited 2014-08-30 09:20
    rjo__ wrote: »
    Of course the proper way to do it would be to limit sys_q to 2 bits... but if I do that then I am going to have to deal with endless errors... which would nicely lead me through important segments of code.

    Substitute this:
    sys_q <= ac[2:0] == 3'b001 ? { bus_sel[7] || bus_sel[6] || bus_sel[5] || bus_sel[0],   // cogid
                                   bus_sel[7] || bus_sel[4] || bus_sel[3] || bus_sel[0],
                                   bus_sel[6] || bus_sel[4] || bus_sel[2] || bus_sel[0] }
                                 : num;                                                    // others
    
    // 76543210  {OR(7,6,5,0),OR(7,4,3,0),OR(6,4,2,0)} 
    // ========
    // 00000001  {         1 ,         1,          1}  =  111b (7)
    // 00000010  {         0 ,         0,          0}  =  000b (0) 
    // 00000100  {         0 ,         0,          1}  =  001b (1) 
    // 00001000  {         0 ,         1,          0}  =  010b (2)
    // 00010000  {         0 ,         1,          1}  =  011b (3) 
    // 00100000  {         1 ,         0,          0}  =  100b (4) 
    // 01000000  {         1 ,         0,          1}  =  101b (5) 
    // 10000000  {         1 ,         1,          0}  =  110b (6) 
    
    

    with this:
    sys_q <= ac[2:0] == 3'b001 ? { 1'b0,                                                   // cogid
                                   bus_sel[7] || bus_sel[4] || bus_sel[3] || bus_sel[0],
                                   bus_sel[6] || bus_sel[4] || bus_sel[2] || bus_sel[0] }
                                 : num;                                                    // others
    
    // 76543210  {       1'b0,OR(7,4,3,0),OR(6,4,2,0)} 
    // ========
    // 00000001  {         0 ,         1,          1}  =  011b (3)
    // 00000010  {         0 ,         0,          0}  =  000b (0) 
    // 00000100  {         0 ,         0,          1}  =  001b (1) 
    // 00001000  {         0 ,         1,          0}  =  010b (2)
    // 00010000  {         0 ,         1,          1}  =  011b (3) 
    // 00100000  {         0 ,         0,          0}  =  000b (0) 
    // 01000000  {         0 ,         0,          1}  =  001b (1) 
    // 10000000  {         0 ,         1,          0}  =  010b (2)
    

    Beware! Not tested.
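    The two truth tables can be cross-checked without a simulator. This is a minimal Python sketch (a model of the expressions, not the Verilog itself); the function names cogid_8cog and cogid_4cog are made up for illustration, and bus_sel is treated as an 8-bit one-hot integer.

```python
# Python model of the cogid encoding in the two sys_q expressions above.
def cogid_8cog(bus_sel: int) -> int:
    bit = lambda n: (bus_sel >> n) & 1          # mirrors bus_sel[n]
    b2 = bit(7) | bit(6) | bit(5) | bit(0)
    b1 = bit(7) | bit(4) | bit(3) | bit(0)
    b0 = bit(6) | bit(4) | bit(2) | bit(0)
    return (b2 << 2) | (b1 << 1) | b0

def cogid_4cog(bus_sel: int) -> int:
    # Same two low bits; the top bit is forced to 0 (the 1'b0 in the edit).
    return cogid_8cog(bus_sel) & 0b011

for n in range(8):
    sel = 1 << n
    print(f"{sel:08b}  8-cog id: {cogid_8cog(sel)}  4-cog id: {cogid_4cog(sel)}")
```

    In this model the 8-cog expression yields ids 7, 0, 1, 2, 3, 4, 5, 6 for bus_sel bits 0 through 7, and the modified one yields 3, 0, 1, 2, 3, 0, 1, 2. If only bus_sel[3:0] is ever set, the upper four rows are never reached.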
  • rjo__rjo__ Posts: 2,114
    edited 2014-08-30 18:11
    Ramon,

    Thanks again.

    We are moving our house... and it isn't going well:)

    I had just enough time tonight to go to my "lab" and test your changes... I first tested post#21... it worked but did not change the number of clocks (same as measured on my "p1v*1/2" and on a regular p1. I started and stopped all 4 cogs... they worked as expected)

    I then added the change from the above (and if you hadn't shown me the truth table, I wouldn't have believed it). Again, everything works, but the timing in my test code remains the same as on a regular p1 with the spin file that I posted.

    One issue I have about post#25... (and I suspect that it is just me) is this: you show results for all bus_sel options... but to my mind, bus_sel[7..4] should always be 0... the idea is to never have these selected.
    It doesn't seem to make a difference to my final result.... so, I don't see a reason to change it back.

    I'm at something of a loss. There must be some other source of bus arbitration that I am missing... I would have expected the measured clocks to drop... maybe not by half, but substantially. They are exactly the same. Kind of amazing.

    We are taking a hack saw to a Propeller... and it doesn't seem to care:)

    On the bright side... we do have a smaller footprint and much quicker compile times in Quartus... but that is not exactly what I want:)(:~~~~^^^^^
  • rjo__rjo__ Posts: 2,114
    edited 2014-08-31 06:46
    OK... now that I am loading the correct .jic file... with both fixes applied, coginit / cognew appear to fail.

    BUT using spin to measure elapsed times... I get a decrease of 16 clocks(496->480) when the code is run on a Project Board vs. p1v*1/2
    time1 := cnt
    time2 := cnt
    elapsed := time2 - time1
    


    Note... in PropellerIDE, use a baud rate of 115200.
  • rjo__rjo__ Posts: 2,114
    edited 2014-08-31 07:13
    For what it's worth, when I change
    bus_sel <= {4'b0,bus_sel[2:0], ~|bus_sel[3:0]};
    

    to
    bus_sel <= {4'b0,bus_sel[2:0], ~|bus_sel[2:0]};
    

    In Spin, the measured clock count drops to 448.
    As before, the LEDs light up appropriately, so the Prop1v*1/2 seems to think the cog is being used.

    Cogstop does work but the pasm routine never writes to hub ram.
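    The difference between the two feedback terms can be seen by stepping the shift register by hand. This is a minimal Python sketch (a model, not the Verilog), with made-up helper names next_bus_sel and loop_states; only the low four bits of bus_sel are modelled, and ena_bus is assumed to be high on every step.

```python
# feedback_mask picks the bits fed to the reduction NOR:
# 0b1111 models ~|bus_sel[3:0], 0b0111 models ~|bus_sel[2:0].
def next_bus_sel(bus_sel: int, feedback_mask: int) -> int:
    shifted = (bus_sel & 0b0111) << 1                 # bus_sel[2:0] moves up one place
    feedback = 0 if (bus_sel & feedback_mask) else 1  # NOR of the selected bits
    return shifted | feedback                         # upper four bits stay 0

def loop_states(feedback_mask: int) -> list:
    """Start from reset (bus_sel = 0) and return only the repeating part."""
    seen, state = [], 0
    while state not in seen:
        seen.append(state)
        state = next_bus_sel(state, feedback_mask)
    return seen[seen.index(state):]

for mask in (0b1111, 0b0111):
    print(f"NOR over {mask:04b}:", [f"{s:04b}" for s in loop_states(mask)])
```

    In this model the [3:0] version loops through five states, one of which is 0000 (no cog selected), while the [2:0] version loops through exactly four one-hot states.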
  • rjo__rjo__ Posts: 2,114
    edited 2014-08-31 07:26
    That is progress... we now have 1/2 of p1v, with faster Spin... but no PASM:)

    Now that Ramon, Cluso99 and Ozpropdev have me heading in the right direction (thank you, guys:), I'm going back to overclocking to see what I screwed up there:)
  • rjo__rjo__ Posts: 2,114
    edited 2014-08-31 15:16
    After experimentation, we have only 2 cogs... but what is even more interesting... if the pasm code segment of read->modify->write is optimized for the P1, the P1 pasm code outperforms the 2 cog P1v running the same code.

    With unoptimized pasm code, there is about a 15 percent improvement in PASM timing of the P1v*1/4 over a standard P1 and a similar increase (though smaller) for Spin.

    The ultimate goal here is to make the 4Cog P1v... perform all hub related tasks about as fast as optimized PASM, with no regard to code optimization.

    It bothers me (but I don't know what to do about it) that cog_ena doesn't reflect anything that we have done so far.
    I have tried to follow the uses and assignments through multi-file searching, but it seems very much like 32-bit sudoku:)
  • RamonRamon Posts: 484
    edited 2014-09-01 07:18
    Good progress !

    Yes, my code was wrong. It introduced an "all_zero" state in bus_sel. Good to know that you solved it. I have found that this one may also be OK: "bus_sel <= {bus_sel[2:0], ~|bus_sel[2:0]};"

    Look at the following code; I think there is an assign that may need to be changed:
    P1V     -> assign bus_ack = ed ? {      bus_sel[1:0], bus_sel[7:2]} : 8'b0;
    P1V_1/2 -> assign bus_ack = ed ? {4'b0, bus_sel[1:0], bus_sel[3:2]} : 8'b0;
    
    bus_sel    P1v_cog[n-2]  P1V_1/2[n-2]
    =========  ============  ===========           
    0000 0001     0100 0000    0000 0100  
    0000 0010     1000 0000    0000 1000
    0000 0100     0000 0001    0000 0001
    0000 1000     0000 0010    0000 0010
    0001 0000     0000 0100
    0010 0000     0000 1000
    0100 0000     0001 0000
    1000 0000     0010 0000
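    The table above can be regenerated from the two assign lines. This is a minimal Python sketch (a model of the concatenations, not the Verilog), assuming ed = 1; the function names bus_ack_8cog, bus_ack_4cog and fmt are made up for illustration.

```python
def bus_ack_8cog(bus_sel: int) -> int:
    # {bus_sel[1:0], bus_sel[7:2]}: rotate the 8-bit value right by two.
    return ((bus_sel & 0b11) << 6) | ((bus_sel >> 2) & 0b111111)

def bus_ack_4cog(bus_sel: int) -> int:
    # {4'b0, bus_sel[1:0], bus_sel[3:2]}: rotate the low nibble right by two.
    return ((bus_sel & 0b11) << 2) | ((bus_sel >> 2) & 0b11)

def fmt(value: int) -> str:
    """Format an 8-bit value as two nibbles, e.g. '0100 0000'."""
    return f"{value >> 4:04b} {value & 0xF:04b}"

for n in range(8):
    sel = 1 << n
    four = fmt(bus_ack_4cog(sel)) if n < 4 else ""   # 4-cog column only uses bits 3..0
    print(fmt(sel), "   ", fmt(bus_ack_8cog(sel)), "  ", four)
```

    Both versions acknowledge the cog that was selected two slots earlier; the 4-cog variant simply wraps around within the low nibble instead of the full byte.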
    
    