Shop OBEX P1 Docs P2 Docs Learn Events
Did I find a bug in cog_ctr.v? — Parallax Forums

Did I find a bug in cog_ctr.v?

andrewsiandrewsi Posts: 59
edited 2014-09-12 16:08 in Propeller 1
So in cog_ctr.v, there's a simulated PLL at the end using this code:
always @(posedge clk_pll)
if (~&ctr[30:28] && |ctr[27:26])
     pll_fake <= pll_fake + {4'b0, frq};

This is checking the 5 CTRMODE bits in the CTRA/CTRB registers, and I think what it's trying to do is ensure that the 3 MSBs are all zeroes, and that at least one of the 2 LSBs is one. In other words, the whole expression is true (and thus the PLL is enabled) when the mode is set to 00001, 00010, or 00011, but not in any other case. This would make sense since these are the three PLL-enabled counter modes.

However, I think the "~&ctr[30:28]" should actually be "~|ctr[30:28]" instead, shouldn't it? The present form is effectively a NAND of those 3 bits, so it's true if any of them are zero, which would enable the fake PLL in all sorts of counter modes where it really shouldn't be. I think it should be the second form (NOR) instead, which would only allow the fake PLL to be enabled when all three bits are zero, but none of the other cases.

Have I made a fundamental error here (which would not surprise me in the slightest), or have I actually found a bug?

Comments

  • thoththoth Posts: 75
    edited 2014-09-12 13:14
    andrewsi wrote: »
    So in cog_ctr.v, there's a simulated PLL at the end using this code:
    always @(posedge clk_pll)
    if (~&ctr[30:28] && |ctr[27:26])
         pll_fake <= pll_fake + {4'b0, frq};
    

    This is checking the 5 CTRMODE bits in the CTRA/CTRB registers, and I think what it's trying to do is ensure that the 3 MSBs are all zeroes, and that at least one of the 2 LSBs is one. In other words, the whole expression is true (and thus the PLL is enabled) when the mode is set to 00001, 00010, or 00011, but not in any other case. This would make sense since these are the three PLL-enabled counter modes.

    However, I think the "~&ctr[30:28]" should actually be "~|ctr[30:28]" instead, shouldn't it? The present form is effectively a NAND of those 3 bits, so it's true if any of them are zero, which would enable the fake PLL in all sorts of counter modes where it really shouldn't be. I think it should be the second form (NOR) instead, which would only allow the fake PLL to be enabled when all three bits are zero, but none of the other cases.

    Have I made a fundamental error here (which would not surprise me in the slightest), or have I actually found a bug?

    I think you're right. Here's some equivalent code in dig.v
    bus_sel <= {bus_sel[6:0], ~|bus_sel[6:0]};  // Obscure code.  Bits 6 thru 0 with the NOT OR of those bits appended.
                                                    // 00000001 becomes 00000010.  Resets when '1' reaches bit 7 position since only the NOT of all 0s is 1.
    

    This is a shift register and it only appends a 1 bit when the previous 1 bit has shifted out the top leaving all 0 bits. That's the TRUE case and what you've found is an expression that is SUPPOSED to return 1 when all 0s are found.

    Looks like a bug to me.
  • andrewsiandrewsi Posts: 59
    edited 2014-09-12 13:26
    Thanks for the confirmation, hopefully a Gracey will see this and make the fix in the official release drop.
  • thoththoth Posts: 75
    edited 2014-09-12 14:46
    andrewsi wrote: »
    Thanks for the confirmation, hopefully a Gracey will see this and make the fix in the official release drop.

    Out of pure curiousity, how did you find this? It's not something that exactly leaps off the page!
  • andrewsiandrewsi Posts: 59
    edited 2014-09-12 15:07
    I contributed a version that works for a Digilent Nexys 4 (based on Xilinx Artix-7) to the Github project. I have been playing around to produce a version that compiles in the Vivado tools (which beat ISE for friendliness by a mile) and also produce accurate constraints, and make better use of the clocking resources, since the generic code uses lots of combinational clock generation which isn't really ideal on an FPGA, and the Xilinx tools whine a lot about that. I had worked my way down to the last few bits of unconstrained paths, which are the fake PLL taps which drive the video clock, and in deciphering the fake PLL code implementation and mapping it back to what goes on on the real chip, I ran into this line and realized it didn't make sense.
  • thoththoth Posts: 75
    edited 2014-09-12 15:29
    andrewsi wrote: »
    I contributed a version that works for a Digilent Nexys 4 (based on Xilinx Artix-7) to the Github project. I have been playing around to produce a version that compiles in the Vivado tools (which beat ISE for friendliness by a mile) and also produce accurate constraints, and make better use of the clocking resources, since the generic code uses lots of combinational clock generation which isn't really ideal on an FPGA, and the Xilinx tools whine a lot about that. I had worked my way down to the last few bits of unconstrained paths, which are the fake PLL taps which drive the video clock, and in deciphering the fake PLL code implementation and mapping it back to what goes on on the real chip, I ran into this line and realized it didn't make sense.

    Whew! OK. Glad I asked.

    Do you know how much of an Artix-7 50 the P1v uses? And, more to my interest, how long does it take Vivado to compile? I'm too old to do desk checking anymore and the Cyclone V is a real pig - no amount of deodorant is gonna change that. Avnet is selling the eval board for 2 and change. 50k something or others sounds like a reasonable breadboard. Also, what's the link to the github project? I'm not following along this project all that closely.
  • jac_goudsmitjac_goudsmit Posts: 418
    edited 2014-09-12 15:37
    Good find, Andy! I looked at the code and I'm pretty sure that yes, this is a bug.

    I fixed the bug in my Master and Altera branches (the two main branches I'm working in), and updated my pull-request to Mindrobots that he still hasn't merged at this time.

    I had to fix it in 7 locations in 2 branches, if I counted correctly... I really need to get busy merging the Altera and Xilinx trees together; this one-character change was easy to merge but if we find anything more complicated, or if we add more targets, the amount of work to synchronize source code quickly goes out of hand.

    ===Jac
  • andrewsiandrewsi Posts: 59
    edited 2014-09-12 16:08
    Well, the P1V I currently have is using around only about 15% of both the block RAM and LUTs on the Artix-7 100, so depending on what the relative slice count is on a 50, I imagine the percentage would grow roughly proportionally. However, because the Propeller has so many different clocks flying around (two PLLs per cog, plus the offboard clock source and the main 80 and 160Mhz clocks in the design), that adds up to a lot of BUFGs pretty quickly. That's 11 of the 32 available (34.4%) on the 100, and possibly a much higher percentage. If the 50 has less BUFGs available and you try to get another 8 cogs in there you might run out and have to resort back to some logic-driven clocks, which is of course bad news from a skew and timing standpoint.

    The Artix chips do have some additional local clock routes that can be used in more limited circumstances (i.e. not fully global but limited to particular clock regions) but I haven't made any attempt to do floorplanning to minimize consumption of the global clock buffers.

    The github project Jac is running is at http://www.github.com/jacgoudsmit/P8X32A_Emulation. It contains only my ISE version with some basic constraints but it seems to run fine. My Vivado version is in a different repository at andrewsil1/P8X32A_Vivado. What's there builds and runs OK, but depending on what I check in it may become temporarily unstable as I play with the constraints more.
Sign In or Register to comment.