4 Cog overclocked P1V

rjo__ · 2014-08-22 16:16

Before my first attempt (help me Jesus) at a 4-cog P1V.

The goal is to improve bandwidth by increasing the main clock and by reducing the hub cycle.

The question here isn't if I am missing something, but what I am missing:)

I have found just two places that need to be modified. In dig.v, in the "generate" block, I need to change
"for (i=0; i<8; i++)" to "for (i=0; i<4; i++)"

Below that at line 129, I need to change
"wire [7:0] cog_ena;" to "wire [3:0] cog_ena;"

I know how to change the clock; my question is how best to limit the hub cycle to 4 cogs.

Obviously, it can't be this simple:)

Thanks,

Rich

cgracey · 2014-08-22 16:23

That might work, but to tighten up hub timing so that only 4 cogs are in the loop, you'd need to change all the stuff that deals with 8 cogs and make it handle 4, only.

rjo__ · 2014-08-22 16:40

I'm looking at it...:)

cgracey · 2014-08-22 16:50

All that stuff is in the dig.v and hub.v files.

rjo__ · 2014-08-22 17:02

You might have ESP... or a profound capacity for dealing with ignorant people:)

nutson · 2014-08-22 18:19

reg [7:0] bus_sel;
always @(posedge clk_cog or negedge nres)
if (!nres)
	bus_sel <= 8'b0;
else if (ena_bus)
	bus_sel <= {bus_sel[6:0], ~|bus_sel[6:0]};

These lines in dig.v implement a one-hot shift register that selects one of the eight cogs to connect to the hub.

In a 4-cog design you can reduce the length of the shift sequence, for example to 6 positions, so the hub rotates faster.

reg [7:0] bus_sel;
always @(posedge clk_cog or negedge nres)
if (!nres)
	bus_sel <= 8'b0;
else if (ena_bus)
	bus_sel <= {2'b00,bus_sel[4:0], ~|bus_sel[4:0]};

I found that further reducing the run length gives strange results. Reducing it to 4 positions results in only 2 cogs running, but is OK for a 2-cog design.
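As a rough illustration of the rotation described above, here is a small Python model of the one-hot ring (a sketch only; it is not Verilog and not part of the P1V source). The function names and the `keep` parameter are made up for this example: `keep` is the number of low bits fed back, so `keep=7` stands for `bus_sel[6:0]` and `keep=5` for `bus_sel[4:0]`.

```python
def next_bus_sel(bus_sel, keep):
    """One enabled clock of {zeros, bus_sel[keep-1:0], ~|bus_sel[keep-1:0]}."""
    low = bus_sel & ((1 << keep) - 1)      # bus_sel[keep-1:0]
    inject = 1 if low == 0 else 0          # ~|bus_sel[keep-1:0]
    return ((low << 1) | inject) & 0xFF    # result is stored in an 8-bit register


def ring(keep):
    """Return the repeating sequence of bus_sel values, starting from reset (0)."""
    state = next_bus_sel(0, keep)          # first value after reset
    seen = []
    while state not in seen:
        seen.append(state)
        state = next_bus_sel(state, keep)
    return seen


if __name__ == "__main__":
    for keep in (7, 5, 3):
        states = ring(keep)
        print(keep, len(states), [format(s, "08b") for s in states])
```

In this model the number of slots is always `keep + 1` (8, 6 and 4 for the three cases printed). It only describes the rotation itself and says nothing about why fewer cogs run with the shorter rings.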

jazzed · 2014-08-22 18:28

Just some ideas ...

This may not be a practical idea longer term, but it may be worth trying to make a 1 COG P1V first just to understand the code.

Then try making a 4 COG P1V.

Then try making a 4 COG P1V that could share hub in some alternative way with the 1 COG P1V. ...

For example, the one COG P1V might use the even 4 of 8 HUB slots, and the 4 COG P1V could only use the odd 4 HUB slots.

Wishing I had more time for this stuff .... Maybe soon.

cgracey · 2014-08-22 22:20

nutson wrote: »
reg [7:0] bus_sel;
always @(posedge clk_cog or negedge nres)
if (!nres)
	bus_sel <= 8'b0;
else if (ena_bus)
	bus_sel <= {bus_sel[6:0], ~|bus_sel[6:0]};
These lines in dig.v implement a one-hot shift register that selects one of the eight cogs to connect to the hub.

In a 4-cog design you can reduce the length of the shift sequence, for example to 6 positions, so the hub rotates faster.
reg [7:0] bus_sel;
always @(posedge clk_cog or negedge nres)
if (!nres)
	bus_sel <= 8'b0;
else if (ena_bus)
	bus_sel <= {2'b00,bus_sel[4:0], ~|bus_sel[4:0]};
I found that further reducing the run length gives strange results. Reducing it to 4 positions results in only 2 cogs running, but is OK for a 2-cog design.

That's right. The other thing to address is the COGID instruction (in hub.v), as it will return a wrong 3-bit cog# most of the time.

rjo__ · 2014-08-24 07:23

OK...
changes mentioned in my first post:
In dig.v, in "generate" change
"for (i=0; i<8; i++)" to "for (i=0; i<4; i++)"

and following the leads above
in dig.v

reg [7:0] bus_sel;

always @(posedge clk_cog or negedge nres)
if (!nres)
	bus_sel <= 8'b0;
else if (ena_bus)
	bus_sel <= {4'b0,bus_sel[3:0], ~|bus_sel[3:0]};

and in hub.v

// output

reg [2:0] sys_q;
reg sys_c;

always @(posedge clk_cog)
if (ena_bus && sys)
	//sys_q <= ac[2:0] == 3'b001	? {	bus_sel[7] || bus_sel[6] || bus_sel[5] || bus_sel[0],		// cogid
									//bus_sel[7] || bus_sel[4] || bus_sel[3] || bus_sel[0],
									//bus_sel[6] || bus_sel[4] || bus_sel[2] || bus_sel[0] }
								//: num;															// others
   
    sys_q <=ac[2:0]==3'b001    ? {1'b0,bus_sel[3] || bus_sel[0],
                          bus_sel[2] || bus_sel[0]}
                          :num;

test code run in PropellerIDE

CON 
_clkmode = xtal1+pll16x
_clkfreq = 80_000_000
OBJ ser : "FullDuplexSerial"
VAR
   long i,x[1000],time1,time2,elapsed
pub main
  ser.Start(31,30,0,115200)
  i:=0
  time1 :=50
  waitcnt(clkfreq/4 +cnt)
  ser.str(string("Watch LEDs"))
  ser.tx(13)
  waitcnt(clkfreq*4 + cnt)
  repeat 1000
     x[i]:=i
     i++
  coginit(3,@timeit,@x)
  repeat until x[0] > 0
  ser.str(string("Results:",13))
  i:=0
  ser.dec(x[0])
  ser.str(string("  should be = 100",13))
  ser.dec(x[999])
  ser.str(string("  should be = 1099",13))
  elapsed:=time2-time1
  ser.dec(elapsed)
  ser.str(string("_clocks  "))
  waitcnt(clkfreq*2+cnt)
  cogstop(3)
  waitcnt(clkfreq*2+cnt)
  cogstop(1)
  waitcnt(clkfreq*2+cnt)
Dat
                             org 0
timeit                       mov loops,reps
                             mov a1,par
                             mov t1,cnt
myloop                       rdlong xval,a1
                             add xval,#100
                             wrlong xval,a1
                             add a1,#4
                             nop
                             djnz loops,#myloop
                             mov t2, cnt
                             wrlong t1,a1
                             add a1,#4
                             wrlong t2,a1
nothing
                             jmp #nothing

reps long 1000
t1    res  1
t2    res  1
t3    res  1
loops  res  1
a1 res 1
xval res 1

Nothing breaks but I get exactly the same timing on both the nano_P1V and a P1.

"hubslots" ... where are those hubslots?

Thanks

Rich

nutson · 2014-08-24 09:53

bus_sel <= {4'b0,bus_sel[3:0], ~|bus_sel[3:0]};

This code generates a 9-bit value in which a single "1" bit cycles over 5 positions (= 5 hub timeslots, 10 CPU clocks). Using bus_sel[2:0] instead would result in 4 hub timeslots.

I have done some experiments with fewer hub timeslots; look in this thread: http://forums.parallax.com/showthread.php/156955-Small-V-Prop-2-Cog-s-4-KB-ROM-4KB-Hub-RAM In the last post I posted two oscilloscope screens that show the speedup with 4 slots / 2 cogs for a series of sequential RDLONGs, compared to 8 slots / 8 cogs.

Warning: more Verilog changes are probably necessary to change the number of timeslots without breaking some logic. With 6 timeslots I can have only 4 cogs running, with 4 timeslots only 2 cogs.

So you were lucky with your 5 timeslots, I guess that only 3 Cogs can be running with that (did not try)
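The width problem can be checked with a short Python sketch. The helper `concat` and the function `step_9bit` are hypothetical names used only here; the model assumes the usual Verilog behaviour that a 9-bit concatenation assigned to an 8-bit register loses its top bit.

```python
def concat(*fields):
    """Verilog-style {a, b, c}: fields are (width, value) pairs, MSB first."""
    total_width, value = 0, 0
    for width, field in fields:
        value = (value << width) | (field & ((1 << width) - 1))
        total_width += width
    return total_width, value


def step_9bit(bus_sel):
    """{4'b0, bus_sel[3:0], ~|bus_sel[3:0]} assigned to an 8-bit register."""
    low = bus_sel & 0xF
    width, value = concat((4, 0), (4, low), (1, int(low == 0)))
    return width, value & 0xFF             # the 8-bit register drops the top bit


if __name__ == "__main__":
    state, visited = 0, []
    for _ in range(10):
        width, state = step_9bit(state)
        visited.append(format(state, "08b"))
    print(width)        # width of the concatenation before truncation
    print(visited)      # the single "1" walks through bits 0..4
```

The concatenation is 4 + 4 + 1 = 9 bits wide, and the "1" reaches bit 4 before the ring restarts, which gives the 5 positions described above.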

rjo__ · 2014-08-24 11:09

Thank you... that is nine bits:)

rjo__ · 2014-08-24 11:41

I changed that line, recompiled, ... had a cup of coffee, went to get lunch, and when I got back, it ran perfectly.
So, if anyone wants half a P1v... it seems to be available here:)

BUT I am seeing absolutely no differences in the timing.

Cluso99 is working on documentation. That should help a lot.

rjo__ · 2014-08-24 12:23

IDK... IF you look at the above spin program, you see that I use cog 3 to run my pasm code.

I went back to PropellerIDE and used cog 4 and it worked... then I switched to cog 5 (which shouldn't exist). The code ran fine. The timing was unaffected and is still the same as for a normal P1. The correct LED for the different cogs assigned lit up on my Nano.

I know that I recompiled and reprogrammed correctly... the time stamps prove it.

I am thinking that when I am asking for a cog that doesn't exist, it uses the 2 lsb and chooses a cog that does exist... and the LED is simply an artifact.
But I'm not sure about this.

nutson · 2014-08-24 13:33

The LEDs don't mean a thing. I am running a 2-cog Prop at the moment, but when I load a program that fires up all cogs (with the same Spin code, toggling a pin), all 8 LEDs light up, but on my logic analyzer I see only 2 pins toggling.

The Verilog code for COGINIT/COGNEW and other hub functions is way beyond me; it may be that this code knows about "active" cogs.

jazzed · 2014-08-24 14:17

Rich,

I've seen the same LED and performance behaviour with code I've tried. I'm building with your changes now.

This is the Spin code I used to test performance.


CON
  _clkmode = XTAL1 + PLL16X
  _clkfreq = 80_000_000

OBJ
  ser : "MySimpleSerial"

PUB start | addr, t0, t1

  ser.init(31,30,19200)
  waitcnt(CLKFREQ/2+CNT)  '' Wait for start up
  'ser.str(string($d,"Hello.",$d))

  t0 := CNT
  addr := $7f00
  t1 := CNT
  ser.Str(string("Diff Time "))
  ser.Dec(t1-t0)
  ser.tx($d)

MySimpleSerial.spin

''*******************************************************************
''*  Simple Asynchronous Serial Driver v1.3                         *
''*  Authors: Chip Gracey, Phil Pilgrim, Jon Williams, Jeff Martin  *
''*  Copyright (c) 2006 Parallax, Inc.                              *
''*  See end of file for terms of use.                              *
''*******************************************************************
''
'' Performs asynchronous serial input/output at low baud rates (~19.2K or lower) using high-level code
'' in a blocking fashion (ie: single-cog (serial-process) rather than multi-cog (parallel-process)).
''
'' To perform asynchronous serial communication as a parallel process, use the FullDuplexSerial object instead.
'' 
''
'' v1.3 - May 7, 2009    - Updated by Jeff Martin to fix rx method bug, noted by Mike Green and others, where uninitialized
''                         variable would mangle received byte.
'' v1.2 - March 26, 2008 - Updated by Jeff Martin to conform to Propeller object initialization standards and compress by 11 longs.
'' v1.1 - April 29, 2006 - Updated by Jon Williams for consistency.
''
''
'' The init method MUST be called before the first use of this object.
'' Optionally call finalize after final use to release transmit pin.
''
'' Tested to 19.2 kbaud with clkfreq of 80 MHz (5 MHz crystal, 16x PLL)




VAR


  long  sin, sout, inverted, bitTime, rxOkay, txOkay   




PUB init(rxPin, txPin, baud): Okay
{{Call this method before first use of object to initialize pins and baud rate.


   For true mode (start bit = 0), use positive baud value.     Ex: serial.init(0, 1, 9600)
    For inverted mode (start bit = 1), use negative baud value. Ex: serial.init(0, 1, -9600) 
   Specify -1 for "unused" rxPin or txPin if only one-way communication desired.
   Specify same value for rxPin and txPin for bi-directional communication on that pin and connect a pull-up/pull-down resistor
    to that pin (depending on true/inverted mode) since pin will set it to hi-z (input) at the end of transmission to avoid
    electrical conflicts.  See "Same-Pin (Bi-Directional)" examples, below.


  EXAMPLES:
  
    Standard Two-Pin Bi-Directional True/Inverted Modes                Standard One-Pin Uni-Directional True/Inverted Mode
                Ex: serial.init(0, 1, ±9600)                      Ex: serial.init(0, -1, ±9600)  -or-  serial.init(-1, 0, ±9600)
         ┌────────────┐               ┌──────────┐                          ┌────────────┐               ┌──────────┐
         │Propeller P0├←─────────────←┤I/O Device│                          │Propeller P0├───────────────┤I/O Device│
         │          P1├→─────────────→┤          │                          └────────────┘               └──────────┘
         └────────────┘               └──────────┘


            Same-Pin (Bi-Directional) True Mode                              Same-Pin (Bi-Directional) Inverted Mode
                Ex: serial.init(0, 0, 9600)                                       Ex: serial.init(0, 0, -9600)
                             Vdd                                            ┌────────────┐               ┌──────────┐
                              │                                             │Propeller P0├←→─────┳─────←→┤I/O Device│
                              R 4.7 kΩ                                      └────────────┘       │       └──────────┘
         ┌────────────┐       │       ┌──────────┐                                               R 4.7 kΩ
         │Propeller P0├←→─────┻─────←→┤I/O Device│                                               │
         └────────────┘               └──────────┘                                              GND
}}                                                                  


  finalize                                              ' clean-up if restart
  
  rxOkay := rxPin > -1                                  ' receiving?
  txOkay := txPin > -1                                  ' transmitting?


  sin := rxPin & $1F                                    ' set rx pin
  sout := txPin & $1F                                   ' set tx pin


  inverted := baud < 0                                  ' set inverted flag
  bitTime := clkfreq / ||baud                           ' calculate serial bit time  
  
  return rxOkay | TxOkay
  


PUB finalize
{{Call this method after final use of object to release transmit pin.}}
 
  if txOkay                                             ' if tx enabled
    dira[sout]~                                         '   float tx pin
  rxOkay := txOkay := false




PUB rx: rxByte | t
{{ Receive a byte; blocks caller until byte received. }}


  if rxOkay
    dira[sin]~                                          ' make rx pin an input
    waitpeq(inverted & |< sin, |< sin, 0)               ' wait for start bit
    t := cnt + bitTime >> 1                             ' sync + 1/2 bit
    repeat 8
      waitcnt(t += bitTime)                             ' wait for middle of bit
      rxByte := ina[sin] << 7 | rxByte >> 1             ' sample bit 
    waitcnt(t + bitTime)                                ' allow for stop bit 


    rxByte := (rxByte ^ inverted) & $FF                 ' adjust for mode and strip off high bits




PUB tx(txByte) | t
{{ Transmit a byte; blocks caller until byte transmitted. }}


  if txOkay
    outa[sout] := !inverted                             ' set idle state
    dira[sout]~~                                        ' make tx pin an output        
    txByte := ((txByte | $100) << 2) ^ inverted         ' add stop bit, set mode 
    t := cnt                                            ' sync
    repeat 10                                           ' start + eight data bits + stop
      waitcnt(t += bitTime)                             ' wait bit time
      outa[sout] := (txByte >>= 1) & 1                  ' output bit (true mode)  
    
    if sout == sin
      dira[sout]~                                       ' release to pull-up/pull-down


    
PUB str(strAddr)
{{ Transmit z-string at strAddr; blocks caller until string transmitted. }}


  if txOkay
    repeat strsize(strAddr)                             ' for each character in string
      tx(byte[strAddr++])                               '   write the character


  
PUB dec(value) | i, z


'' Print a signed decimal number


  if value < 0
    -value
    tx("-")


  i := 1_000_000_000
  z~


  repeat 10
    if value => i
      tx(value / i + "0")
      value //= i
      z~~
    elseif z or i == 1
      tx("0")
    i /= 10




PUB hex(value, digits)


'' Print a hexadecimal number


  value <<= (8 - digits) << 2
  repeat digits
    tx(lookupz((value <-= 4) & $F : "0".."9", "A".."F"))




PUB bin(value, digits)


'' Print a binary number


  value <<= 32 - digits
  repeat digits
    tx((value <-= 1) & 1 + "0")
    
{{

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                   TERMS OF USE: MIT License                                                  │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation    │
│files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy,    │
│modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software│
│is furnished to do so, subject to the following conditions:                                                                   │
│                                                                                                                              │
│The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.│
│                                                                                                                              │
│THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE          │
│WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR         │
│COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,   │
│ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.                         │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
}}

jazzed · 2014-08-24 15:13

Hi Rich,

I'm getting the same "Diff Time 704" as before. So it seems either the Spin is being optimized away (very unlikely) or more Verilog digging is in order.

rjo__ · 2014-08-24 15:26

There are a couple of things I want to try, but before I break anything, I am going to play with the clock.

For anyone looking in, who hasn't been here for a while, I need to add that I am a pure hobbyist, who ordinarily leaves the room when serious conversations start.

I am such a feckless programmer that I normally test my code after entering each line. That's a little tedious with FPGA's.

So far I haven't found anything I want to do with a propeller that I can't eventually do... and the verilog code makes about as much sense to me as PASM did when I first looked at it... I'm guessing the end results will be similar.

It could take a while:)

rjo__ · 2014-08-24 20:02

I used pik33's thread on over-clocking http://forums.parallax.com/showthread.php/156851-Some-overclocking-)
Thank you pik33.

On my Nano P1V*1/2, I got largely the same results as pik33. At 150 MHz, loading was unreliable. At 141.666 MHz and 133.333 MHz, loading seemed reliable. I added a blinking LED on P0 and the program seemed to run just fine. The LED seemed to blink at the right rate, and the cog LEDs, which are hooked up in the Verilog, behaved correctly.
However, I could not get reliable serial communications, despite using FullDuplexSerial and MySimpleSerial (above in jazzed's post) at a variety of baud rates. I even hard-coded the clock into the serial drivers, just in case something in the declaration wasn't quite kosher. No luck. I am out of time for the next couple of days. I used both bst and PropellerIDE with identical results.

If you are following along, the next step is to put out a frequency on one of the pins and measure it with a real Prop. If you have the time, give it a whirl. And if that works, then we will have to figure out what is going on with the serial stuff.

Tah Tah

Rich

rjo__ · 2014-08-28 20:10

In my previous posts I made successive errors in the dig.v file.

Originally, I had the following:

reg [7:0] bus_sel;

always @(posedge clk_cog or negedge nres)
if (!nres)
	bus_sel <= 8'b0;
else if (ena_bus)
	bus_sel <= {4'b0,bus_sel[3:0], ~|bus_sel[3:0]};

which nutson correctly informed me had too many bits. I then made another error in the zip files above... right number of bits, wrong number of cogs:)

Which brings me to this:

reg [7:0] bus_sel;

always @(posedge clk_cog or negedge nres)
if (!nres)
	bus_sel <= 8'b0;
else if (ena_bus)
	bus_sel <= {3'b0,bus_sel[3:0], ~|bus_sel[3:0]};

which I am fairly certain is correct.

The issue is that even though I think I am properly restricting bus selection to 4 cogs... the timing for my test file remains unchanged.

at line 246 of hub.v
I also made this change:

// output

reg [2:0] sys_q;
reg sys_c;

always @(posedge clk_cog)
if (ena_bus && sys)
   
    sys_q <=ac[2:0]==3'b001    ? {1'b0,bus_sel[3] || bus_sel[0],
                          bus_sel[2] || bus_sel[0]}
                          :num;

The concatenation works as in the original code (except for 4 cogs), but I think I need to change to something like sys_q <= {1'b0, ac[1:0]... but .... but...

rjo__ · 2014-08-28 20:17

Of course the proper way to do it would be to limit sys_q to 2 bits... but if I do that then I am going to have to deal with endless errors... which would nicely lead me through important segments of code.
And right now that looks like it could take forever:)

Ramon · 2014-08-30 00:46

rjo__ wrote: »
Which brings me to this:
	bus_sel <= {3'b0,bus_sel[3:0], ~|bus_sel[3:0]};
which I am fairly certain is correct.

I think it can be wrong.

D:\Verilog\samples>vvp p1bus_tb
At time                    0, bus_sel = xxxxxxxx, ena_bus = xx0
At time                   16, bus_sel = 00000000, ena_bus = 000
At time                   18, bus_sel = 00000000, ena_bus = 001
At time                   25, bus_sel = 00000001, ena_bus = 011
At time                   35, bus_sel = 00000010, ena_bus = 021
At time                   45, bus_sel = 00000100, ena_bus = 041
At time                   55, bus_sel = 00001000, ena_bus = 081
At time                   65, bus_sel = 00010000, ena_bus = 101
At time                   75, bus_sel = 00000001, ena_bus = 011
At time                   85, bus_sel = 00000010, ena_bus = 021
At time                   95, bus_sel = 00000100, ena_bus = 041
At time                  105, bus_sel = 00001000, ena_bus = 081
At time                  115, bus_sel = 00010000, ena_bus = 101
At time                  118, bus_sel = 00000000, ena_bus = 001
At time                  125, bus_sel = 00000001, ena_bus = 011

Try this one instead:

bus_sel <= {4'b0,bus_sel[2:0], ~|bus_sel[3:0]};

D:\Verilog\samples>vvp p1bus_tb
At time                    0, bus_sel = xxxxxxxx, ena_bus = xx0
At time                   16, bus_sel = 00000000, ena_bus = 000
At time                   18, bus_sel = 00000000, ena_bus = 001
At time                   25, bus_sel = 00000001, ena_bus = 011
At time                   35, bus_sel = 00000010, ena_bus = 021
At time                   45, bus_sel = 00000100, ena_bus = 041
At time                   55, bus_sel = 00001000, ena_bus = 081
At time                   65, bus_sel = 00000000, ena_bus = 001
At time                   75, bus_sel = 00000001, ena_bus = 011
At time                   85, bus_sel = 00000010, ena_bus = 021
At time                   95, bus_sel = 00000100, ena_bus = 041
At time                  105, bus_sel = 00001000, ena_bus = 081
At time                  115, bus_sel = 00000000, ena_bus = 001
At time                  125, bus_sel = 00000001, ena_bus = 011

HOWTO testbench:

p1bus.v

module p1bus (bus_sel, nres, ena_bus, clk_cog);

output  [7:0] bus_sel;
input         nres, clk_cog, ena_bus;
reg     [7:0] bus_sel;

always @(posedge clk_cog or negedge nres)
if (!nres)
	bus_sel <= 8'b0;
else if (ena_bus)
	bus_sel <= {4'b0,bus_sel[2:0], ~|bus_sel[3:0]};

endmodule

p1bus_tb.v

module test;
  reg nres = 1;  
  reg ena_bus = 0;
  initial begin
     # 16  nres = 0;
     #  1  nres = 1;
     #  1  ena_bus = 1;
     # 100 nres = 0;
     #  1  nres = 1;
     # 100 $finish;
  end
  reg clk_cog = 0;
  always #5 clk_cog = !clk_cog;
  wire [7:0] bus_sel;
  p1bus p1 (bus_sel, nres, ena_bus, clk_cog);
  initial 
     $monitor("At time %t, bus_sel = %b, ena_bus = %h", $time, bus_sel, bus_sel, ena_bus);
endmodule // test

Execute with icarus verilog:

iverilog -o p1bus p1bus_tb.v p1bus.v
vvp p1bus

(code stolen from http://iverilog.wikia.com/wiki/Getting_Started
... do not know why ena_bus has 3 bits)

Todd Marshall · 2014-08-30 05:19

... do not know why ena_bus has 3 bits)

Because there are 8 cogs?

rjo__ · 2014-08-30 06:28

Ramon,

Big thank you. Away from my massive XP machine. I need an itty bitty 64-bit laptop:)

Ramon · 2014-08-30 08:11

Todd Marshall wrote: »

Because there are 8 cogs?

No. Actually it had 9 bits. I have found the typo: a duplicated variable in the monitor line:

BAD> $monitor("At time %t, bus_sel = %b, ena_bus = %h", $time, bus_sel, bus_sel, ena_bus);
OK > $monitor("At time %t, bus_sel = %b, ena_bus = %b", $time, bus_sel, ena_bus);

It didn't warn that I used three format specifiers (%t, %b, %h) with four arguments; the simulator just ran the last two values together, so the "ena_bus" field showed the second bus_sel (8 bits, in hex) followed by ena_bus (1 bit).
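The effect of the duplicated argument can be reproduced with a small Python sketch. This only imitates the printed fields (the time column is left out), and it rests on one assumption taken from the logs above: the extra, unformatted argument is printed directly after the last formatted field. The function names are made up for this example.

```python
def monitor_bad(bus_sel, ena_bus):
    """Mimic the mistyped call: %b <- bus_sel, %h <- bus_sel again,
    then ena_bus printed with no format specifier of its own."""
    as_bits = format(bus_sel, "08b")
    as_hex = format(bus_sel, "02x")
    return "bus_sel = %s, ena_bus = %s%d" % (as_bits, as_hex, ena_bus)


def monitor_ok(bus_sel, ena_bus):
    """The corrected call: one argument per format specifier."""
    return "bus_sel = %s, ena_bus = %d" % (format(bus_sel, "08b"), ena_bus)


if __name__ == "__main__":
    for sel in (0b00000001, 0b00001000, 0b00010000):
        print(monitor_bad(sel, 1))
        print(monitor_ok(sel, 1))
```

For bus_sel = 00010000 the mistyped version yields "ena_bus = 101", the same three digits seen in the first simulation log.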

Ramon · 2014-08-30 09:20

rjo__ wrote: »

Of course the proper way to do it would be to limit sys_q to 2 bits... but if I do that then I am going to have to deal with endless errors... which would nicely lead me through important segments of code.

Substitute this:

sys_q <= ac[2:0] == 3'b001 ? { bus_sel[7] || bus_sel[6] || bus_sel[5] || bus_sel[0],   // cogid
                               bus_sel[7] || bus_sel[4] || bus_sel[3] || bus_sel[0],
                               bus_sel[6] || bus_sel[4] || bus_sel[2] || bus_sel[0] }
                             : num;                                                    // others

// 76543210  {OR(7,6,5,0),OR(7,4,3,0),OR(6,4,2,0)}
// ========
// 00000001  {         1 ,         1,          1}  =  111b (7)
// 00000010  {         0 ,         0,          0}  =  000b (0) 
// 00000100  {         0 ,         0,          1}  =  001b (1) 
// 00001000  {         0 ,         1,          0}  =  010b (2)
// 00010000  {         0 ,         1,          1}  =  011b (3) 
// 00100000  {         1 ,         0,          0}  =  100b (4) 
// 01000000  {         1 ,         0,          1}  =  101b (5) 
// 10000000  {         1 ,         1,          0}  =  110b (6)

with this:

sys_q <= ac[2:0] == 3'b001 ? { 1'b0,                                                   // cogid
                               bus_sel[7] || bus_sel[4] || bus_sel[3] || bus_sel[0],
                               bus_sel[6] || bus_sel[4] || bus_sel[2] || bus_sel[0] }
                             : num;                                                    // others

// 76543210  {       1'b0,OR(7,4,3,0),OR(6,4,2,0)}
// ========
// 00000001  {         0 ,         1,          1}  =  011b (3)
// 00000010  {         0 ,         0,          0}  =  000b (0) 
// 00000100  {         0 ,         0,          1}  =  001b (1) 
// 00001000  {         0 ,         1,          0}  =  010b (2)
// 00010000  {         0 ,         1,          1}  =  011b (3) 
// 00100000  {         0 ,         0,          0}  =  000b (0) 
// 01000000  {         0 ,         0,          1}  =  001b (1) 
// 10000000  {         0 ,         1,          0}  =  010b (2)

Beware! Not tested.
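The two truth tables can be cross-checked with a short Python model of the expressions. This is only a sketch of the combinational logic quoted above, with hypothetical function names; it does not model `ac`, `num` or the clocked register.

```python
def cogid_original(bus_sel):
    """3-bit COGID value from the original hub.v expression quoted above."""
    b = [(bus_sel >> i) & 1 for i in range(8)]        # b[i] models bus_sel[i]
    return (((b[7] | b[6] | b[5] | b[0]) << 2) |
            ((b[7] | b[4] | b[3] | b[0]) << 1) |
            (b[6] | b[4] | b[2] | b[0]))


def cogid_4cog(bus_sel):
    """Same expression with the top bit replaced by 1'b0, as proposed above."""
    return cogid_original(bus_sel) & 0b011


if __name__ == "__main__":
    for n in range(8):
        sel = 1 << n
        print(format(sel, "08b"), cogid_original(sel), cogid_4cog(sel))
```

With a ring that only ever sets bus_sel[3:0], the four reachable rows give 3, 0, 1, 2, so each of the four cog IDs appears exactly once in this model.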

rjo__ · 2014-08-30 18:11

Ramon,

Thanks again.

We are moving our house... and it isn't going well:)

I had just enough time tonight to go to my "lab" and test your changes. I first tested post #21. It worked but did not change the number of clocks (same as measured on my "p1v*1/2" and on a regular P1). I started and stopped all 4 cogs, and they worked as expected.

I then added the change from the post above (and if you hadn't shown me the truth table, I wouldn't have believed it). Again, everything works, but the timing in my test code remains the same as measured on a regular P1 with the Spin file that I posted.

One issue I have about post#25... (and I suspect that it is just me) is this: you show results for all bus_sel options... but to my mind, bus_sel[7..4] should always be 0... the idea is to never have these selected.
It doesn't seem to make a difference to my final result.... so, I don't see a reason to change it back.

I'm at something of a loss. There must be some other source of bus arbitration that I am missing... I would have expected the measured clocks to drop... maybe not in half, but substantially. They are exactly the same. Kind of amazing.

We are taking a hack saw to a Propeller... and it doesn't seem to care:)

On the bright side... we do have a smaller footprint and much quicker compile times in Quartus... but that is not exactly what I want:)(:~~~~^^^^^

rjo__ · 2014-08-31 06:46

OK... now that I am loading the correct .jic file... with both fixes applied, coginit / cognew appear to fail.

BUT using spin to measure elapsed times... I get a decrease of 16 clocks(496->480) when the code is run on a Project Board vs. p1v*1/2

 time1 :=cnt
 time2:=cnt
 elapsed:=time2-time1

Note... in PropellerIDE, use a baud rate of 115200.

rjo__ · 2014-08-31 07:13

For what it's worth, when I change

bus_sel <= {4'b0,bus_sel[2:0], ~|bus_sel[3:0]};

to

bus_sel <= {4'b0,bus_sel[2:0], ~|bus_sel[2:0]};

In Spin, the measured clocks drop to 448.
As before, the LEDs light up appropriately, so the Prop1v*1/2 seems to think the cog is being used.

Cogstop does work but the pasm routine never writes to hub ram.
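The difference between the two lines can be seen in a small Python model of the ring. It is a sketch under the assumption that only the NOR term changes; `inject_bits` is a made-up parameter giving how many low bits feed that NOR (4 for `~|bus_sel[3:0]`, 3 for `~|bus_sel[2:0]`).

```python
def step(bus_sel, inject_bits):
    """{4'b0, bus_sel[2:0], ~|bus_sel[inject_bits-1:0]} in an 8-bit register."""
    shifted = (bus_sel & 0b111) << 1                   # bus_sel[2:0] moved up one place
    mask = (1 << inject_bits) - 1
    inject = int((bus_sel & mask) == 0)                # NOR of the low inject_bits bits
    return shifted | inject


def cycle(inject_bits):
    """Repeating sequence of bus_sel values after reset."""
    state, seen = step(0, inject_bits), []
    while state not in seen:
        seen.append(state)
        state = step(state, inject_bits)
    return seen


if __name__ == "__main__":
    print([format(s, "08b") for s in cycle(4)])   # includes an all-zero slot
    print([format(s, "08b") for s in cycle(3)])   # four slots, no idle slot
```

In this model the `[3:0]` version spends one slot in five with no cog selected, while the `[2:0]` version has exactly four slots. That may be related to the drop in measured clocks, but the model does not explain why the PASM routine stops writing to hub RAM.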

rjo__ · 2014-08-31 07:26

That is progress... we now have 1/2 of p1v, with faster Spin... but no PASM:)

Now that Ramon, Cluso99 and Ozpropdev have me heading in the right direction (thank you guys:)

I'm going back to overclocking to see what I screwed up there:)

rjo__ · 2014-08-31 15:16

After experimentation, we have only 2 cogs... but what is even more interesting: if the PASM read->modify->write code segment is optimized for the P1, the P1 outperforms the 2-cog P1V running the same code.

With unoptimized pasm code, there is about a 15 percent improvement in PASM timing of the P1v*1/4 over a standard P1 and a similar increase (though smaller) for Spin.

The ultimate goal here is to make the 4Cog P1v... perform all hub related tasks about as fast as optimized PASM, with no regard to code optimization.

It bothers me (but I don't know what to do about it) that cog_ena doesn't reflect anything that we have done so far.
I have tried to follow the uses and assignments through multi-file searching, but it seems very much like 32-bit sudoku:)

Ramon · 2014-09-01 07:18

Good progress !

Yes, my code was wrong. It introduced an "all_zero" state in bus_sel. Good to know that you solved it. I have found that this one may also be OK: "bus_sel <= {bus_sel[2:0], ~|bus_sel[2:0]};"

Look at the following code; I think there is an assign that may need to be changed:

P1V     -> assign bus_ack = ed ? {      bus_sel[1:0], bus_sel[7:2]} : 8'b0;
P1V_1/2 -> assign bus_ack = ed ? {4'b0, bus_sel[1:0], bus_sel[3:2]} : 8'b0;

bus_sel    P1v_cog[n-2]  P1v_1/2[n-2]
=========  ============  ===========           
0000 0001     0100 0000    0000 0100  
0000 0010     1000 0000    0000 1000
0000 0100     0000 0001    0000 0001
0000 1000     0000 0010    0000 0010
0001 0000     0000 0100
0010 0000     0000 1000
0100 0000     0001 0000
1000 0000     0010 0000
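The table can be reproduced with a short Python model of the two assigns. This is a sketch with hypothetical function names, and it assumes `ed` is 1 (when `ed` is 0 both assigns simply return 0).

```python
def bus_ack_8(bus_sel):
    """{bus_sel[1:0], bus_sel[7:2]}: the 8-bit one-hot value rotated right by two."""
    return ((bus_sel & 0b11) << 6) | (bus_sel >> 2)


def bus_ack_4(bus_sel):
    """{4'b0, bus_sel[1:0], bus_sel[3:2]}: the same rotation within the low four bits."""
    low = bus_sel & 0xF
    return ((low & 0b11) << 2) | (low >> 2)


if __name__ == "__main__":
    for n in range(8):
        sel = 1 << n
        ack4 = format(bus_ack_4(sel), "08b") if n < 4 else ""
        print(format(sel, "08b"), format(bus_ack_8(sel), "08b"), ack4)
```

In both variants the acknowledge lands two positions behind the selected slot, wrapping around at 8 positions in the original and at 4 in the reduced version.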
