It looks like he widened each entry to 32 bits so he could use the "*32" syntax to simulate indexing into a two-dimensional array with a one-dimensional array. Does the "*32" term instantiate a multiplier, or just a shift, which I guess can be implemented simply by connecting the wires at an offset? If it is just a shift, why couldn't he have continued to use 31-bit entries?
The *32 does not instantiate anything; it is only evaluated at compile time (better: synthesis time) to build an intermediate 64-bit-wide bit array. In Chip's original it is a 63-bit-wide array. From this array the needed 32-bit portion is extracted into rotr[31:0]. It does not matter if there is a 64th bit on the left side: it will never be used, because the shift is at most 31, so synthesis will optimize it away.
I don't know why Magnus used a 32-bit size instead of 31; maybe just to clarify how it works, or maybe 2^n sizes are handled better by the synthesis tool.
Andy
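To make the point about the unused 64th bit concrete, here is a minimal Python model of the bit arithmetic. It is a sketch only, not the Verilog from the project: the function names are hypothetical, and it assumes the structure described above, namely a table entry concatenated above a 32-bit value d, shifted right by at most 31, with the low 32 bits kept.

```python
MASK32 = 0xFFFFFFFF

def low32_after_shift(entry: int, d: int, s: int) -> int:
    """Model ({entry, d} >> s)[31:0] for a shift amount s in 0..31."""
    if not 0 <= s <= 31:
        raise ValueError("shift amount must fit in 5 bits")
    concatenated = (entry << 32) | (d & MASK32)  # entry sits above bit 31
    return (concatenated >> s) & MASK32

def top_bit_is_unused(entry31: int, d: int) -> bool:
    """Check that setting bit 31 of the entry (bit 63 overall) never changes the result."""
    entry31 &= (1 << 31) - 1          # keep the 31-bit entry
    widened = entry31 | (1 << 31)     # same entry with the extra top bit set
    return all(
        low32_after_shift(entry31, d, s) == low32_after_shift(widened, d, s)
        for s in range(32)
    )
```

With a shift of at most 31, the highest bit that can reach the 32-bit result is bit 62, so `top_bit_is_unused` holds for any inputs. This only shows that the extra bit is functionally irrelevant; it says nothing about what a particular synthesis tool does with it.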
Thanks for the explanation! I was just wondering if passing d[31] was causing problems.
The original SystemVerilog version makes a good reference for understanding the modified version. I guess they added some features to make it easier to describe multiplexers.
Maybe someone should try a *31 or a *3 just to see what happens?
Hmm, thinking further about it: i[2:0] is not constant, so there must be some multiplying or shifting in the generated logic. In that case it makes sense to use 32 instead of 31.
Yes, somebody should compare the results with 31-bit and 32-bit sizes.
Edit:
I just tried it in my iCEcube2 project:
with a size of 32 it takes 159 more LUTs than with 31.
Either way this produces a complicated barrel shifter, and the multiply by 31 is simply absorbed into the generated shifts and multiplexers. It seems that the Synplify optimizer does not remove the unused bit.
Andy
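As a rough illustration of why the multiply can disappear into multiplexers, the Python sketch below compares a variable-index slice of a flat bit vector with an explicit 8-way selection among fixed slices. The function names and the 8-entry table size are assumptions made for the example; it models the arithmetic only, not what any synthesis tool actually generates.

```python
def part_select(table: int, index: int, width: int) -> int:
    """Model a variable-index slice: 'width' bits starting at bit index*width."""
    return (table >> (index * width)) & ((1 << width) - 1)

def mux_select(table: int, index: int, width: int, entries: int = 8) -> int:
    """Same result via a multiplexer: pick one of 'entries' constant-offset slices."""
    mask = (1 << width) - 1
    slices = [(table >> (k * width)) & mask for k in range(entries)]  # fixed wiring
    return slices[index]  # the only index-dependent step is the selection

def selections_agree(table: int, width: int, entries: int = 8) -> bool:
    """True if both models return the same slice for every index."""
    return all(
        part_select(table, i, width) == mux_select(table, i, width, entries)
        for i in range(entries)
    )
```

The two functions agree for entry widths of both 31 and 32, which is consistent with the multiply being folded into the selection logic. The LUT counts measured above show that the resulting hardware can still differ between the two widths.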
Surely there is some way to disentangle that into a few lines that are actually more self evident.