P1v on the Upduino

Heater. · 2018-01-16 00:52

That is possibly the most horrendous two lines of code I have read since the last time I tried to read any Forth!

Surely there is some way to disentangle that into a few lines that are actually more self-evident.

Ariba · 2018-01-16 04:40

David Betz wrote: »
I now see that the code that SaucySoliton posted has it this way:
wire [255:0] ri     = { 32'b0,              // rev
                        {32{d[31]}},        // sar
                        {32{ci}},           // rcl
                        {32{ci}},           // rcr
                        32'b0,              // shl
                        32'b0,              // shr
                        dr[31:0],           // rol
                        d[31:0] };          // ror

wire [63:0] rot     = {ri[i[2:0]*32 +: 32], i[0] ? dr : d} >> s[4:0];
It looks like he widened each entry to 32 bits so he could use the "*32" syntax to simulate indexing into a two-dimensional array with a one-dimensional array. Does the "*32" term instantiate a multiplier, or just a shift, which I guess can be implemented simply by connecting the wires at an offset? If so, why couldn't he have continued to use 31-bit entries?

The *32 does not instantiate anything; it is only evaluated at compile time (better: synthesis time) to build an intermediate 64-bit-wide bit array. In Chip's original it is a 63-bit-wide array. From this array the needed 32-bit portion is extracted into rotr[31:0]. It does not matter if there is a 64th bit on the left side: it will never be used, because the shift is at most 31, so synthesis will optimize it away.
I don't know why Magnus used a 32-bit size instead of 31; maybe just to clarify how it works, or because 2^n sizes are handled better by the synthesis tool.

Andy
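For readers who find the indexed part-select hard to follow, here is a minimal sketch of the same selection written as an explicit 8-way multiplexer. The module name `rot_mux_sketch` and the port list are hypothetical; the signal names d, dr, ci, i and s are taken from the snippet above, and the case order assumes that the last entry of the concatenation (ror) sits at index 0. It is meant as a reading aid, not as the code used in the P1v sources.

```verilog
// Hypothetical module for illustration only; not part of the P1v sources.
// Signal names follow the quoted snippet; widths are assumed from it.
module rot_mux_sketch (
    input  wire [31:0] d,    // operand
    input  wire [31:0] dr,   // second operand form used by the odd opcodes
    input  wire        ci,   // carry in
    input  wire [2:0]  i,    // operation select
    input  wire [4:0]  s,    // shift amount, 0..31
    output wire [63:0] rot
);
    // Same choice that ri[i*32 +: 32] makes: the last item in the
    // concatenation occupies bits [31:0], so it corresponds to i == 0.
    reg [31:0] fill;
    always @* begin
        case (i)
            3'd0:    fill = d;            // ror
            3'd1:    fill = dr;           // rol
            3'd2:    fill = 32'b0;        // shr
            3'd3:    fill = 32'b0;        // shl
            3'd4:    fill = {32{ci}};     // rcr
            3'd5:    fill = {32{ci}};     // rcl
            3'd6:    fill = {32{d[31]}};  // sar
            default: fill = 32'b0;        // rev (i == 7)
        endcase
    end

    // The selected fill bits sit above the operand and are shifted in
    // from the left as the 64-bit value moves right by s.
    assign rot = {fill, i[0] ? dr : d} >> s;
endmodule
```

Only the low 32 bits of rot are of interest afterwards. With a shift of at most 31, bit 63 of the concatenation can never reach them.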

SaucySoliton · 2018-01-16 06:25

Ariba wrote: »

The *32 does not instantiate anything; it is only evaluated at compile time (better: synthesis time) to build an intermediate 64-bit-wide bit array. In Chip's original it is a 63-bit-wide array. From this array the needed 32-bit portion is extracted into rotr[31:0]. It does not matter if there is a 64th bit on the left side: it will never be used, because the shift is at most 31, so synthesis will optimize it away.
I don't know why Magnus used a 32-bit size instead of 31; maybe just to clarify how it works, or because 2^n sizes are handled better by the synthesis tool.

Andy

Thanks for the explanation! I was just wondering if passing d[31] was causing problems.

The original SystemVerilog version makes a good reference for understanding the modified version. I guess they added some features to make it easier to describe multiplexers.

Maybe someone should try a *31 or a *3 just to see what happens?

Ariba · 2018-01-16 07:18

Hmm - if I think further about it: The i[2:0] is not constant, so there must be some multiply or shifting in the produced logic. Then it makes sense to use 32 instead of 31.

Yes, somebody should compare the results with 31 and 32 bit size.
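To make the "multiply or shifting" point concrete, here is a small sketch of what the two index offsets look like if they are built literally. The module `index_offset_sketch` and its output names are hypothetical and exist only for illustration. A synthesis tool is free to fold either form into the multiplexer tree, so this says nothing definite about the final LUT count.

```verilog
// Hypothetical helper module, for illustration only.
module index_offset_sketch (
    input  wire [2:0] i,
    output wire [7:0] off32,   // i * 32, at most 224
    output wire [7:0] off31    // i * 31, at most 217
);
    // Multiplying by 32 is a left shift by 5: pure wiring, no logic.
    assign off32 = {i, 5'b00000};

    // Multiplying by 31 is (i << 5) - i, which needs a small subtractor
    // when it is implemented as written.
    assign off31 = {i, 5'b00000} - {5'b00000, i};
endmodule
```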

Edit:
Just tried it in my iCEcube2 project:
With a size of 32 it takes 159 LUTs more than with 31.
This produces a complicated barrel shifter anyway, and the multiply by 31 is just inherent in the produced shifts and multiplexers. It seems that the Synplify optimizer does not remove the unused bit.

wire [247:0] ri     = { 31'b0,              // rev
                        {31{d[31]}},        // sar
                        {31{ci}},           // rcl
                        {31{ci}},           // rcr
                        31'b0,              // shl
                        31'b0,              // shr
                        dr[30:0],           // rol
                        d[30:0] };          // ror

wire [62:0] rot     = {ri[i[2:0]*31 +: 31], i[0] ? dr : d} >> s[4:0];

Andy
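Anyone who wants to check that the 31-bit and 32-bit variants agree before swapping them could use a simulation-only testbench along these lines. It is a sketch under stated assumptions: the module name `rot_width_tb` is hypothetical, dr is assumed to be the bit-reversed copy of d, and only the low 32 bits of each result are compared, since those are the bits that get used. If the argument about the unused top bit holds, the mismatch count stays at zero.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench, simulation only; not part of the P1v sources.
module rot_width_tb;
    reg  [31:0] d;
    reg         ci;
    reg  [2:0]  i;
    reg  [4:0]  s;

    // Assumption: dr is d with its bits in reverse order.
    wire [31:0] dr;
    genvar g;
    generate
        for (g = 0; g < 32; g = g + 1) begin : bit_reverse
            assign dr[g] = d[31 - g];
        end
    endgenerate

    // 32-bit-wide entries, as in the version posted earlier in the thread.
    wire [255:0] ri32 = { 32'b0, {32{d[31]}}, {32{ci}}, {32{ci}},
                          32'b0, 32'b0, dr[31:0], d[31:0] };
    wire [63:0]  rot32 = {ri32[i*32 +: 32], i[0] ? dr : d} >> s;

    // 31-bit-wide entries, as in the version above.
    wire [247:0] ri31 = { 31'b0, {31{d[31]}}, {31{ci}}, {31{ci}},
                          31'b0, 31'b0, dr[30:0], d[30:0] };
    wire [62:0]  rot31 = {ri31[i*31 +: 31], i[0] ? dr : d} >> s;

    integer n;
    integer errors;
    initial begin
        errors = 0;
        for (n = 0; n < 10000; n = n + 1) begin
            // $random returns 32 bits; narrower targets keep the low bits.
            d  = $random;
            ci = $random;
            i  = $random;
            s  = $random;
            #1;  // let the continuous assignments settle
            if (rot32[31:0] !== rot31[31:0])
                errors = errors + 1;
        end
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

Random stimulus is a quick sanity check, not a proof. The input space is small enough per opcode that a formal equivalence check would also be practical.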

David Betz · 2018-01-16 13:39

Ariba wrote: »
Hmm - if I think further about it: The i[2:0] is not constant, so there must be some multiply or shifting in the produced logic. Then it makes sense to use 32 instead of 31.

Yes, somebody should compare the results with 31 and 32 bit size.

Edit:
Just tried it in my iCEcube2 project:
With a size of 32 it takes 159 LUTs more than with 31.
This produces a complicated barrel shifter anyway, and the multiply by 31 is just inherent in the produced shifts and multiplexers. It seems that the Synplify optimizer does not remove the unused bit.
wire [247:0] ri     = { 31'b0,              // rev
                        {31{d[31]}},        // sar
                        {31{ci}},           // rcl
                        {31{ci}},           // rcr
                        31'b0,              // shl
                        31'b0,              // shr
                        dr[30:0],           // rol
                        d[30:0] };          // ror

wire [62:0] rot     = {ri[i[2:0]*31 +: 31], i[0] ? dr : d} >> s[4:0];
Andy

Thanks for trying that. I'll use that version when I try building for the Upduino.
