Random/LFSR on P2

Comments

  • TonyB_ wrote: »
    Thanks a lot, Chip. How about Q2?
    TonyB_ wrote: »
    2. Is there enough time to do the 64-bit addition in a single clock?

    If the answer is yes, I'm wondering whether two 16-bit additions could be done in one clock, with second using result of the first, perhaps with bespoke carry logic.

    Apparently, there IS time to do a 64-bit addition at 160MHz, since OnSemi didn't complain about it when we gave them the Verilog that, among other things, implemented xoroshiro128+. It was actually the 16x16 multipliers in the cogs' ALUs that were the critical paths in the design. The FPGA hid it well, having hard multiplier blocks in 28nm.

    I think that two 16-bit additions could be done, one after the other. I know that at 160MHz, there is NOT time for two 32-bit additions, one after the other, as this was a problem in the CORDIC solver that I had to work around by adding additional pipeline stages.
  • Ah, so the Prop2 will overclock more if one is not using MUL then. :D
    The Prisoner's Dilemma, in english - "Selfishness beats altruism within groups. Altruistic groups beat selfish groups." - Quoted part from 2007, D.S Wilson/E.O Wilson.
  • TonyB_ Posts: 248
    edited November 18
    scro,

    Is the FPF test in PractRand important for something like xoroshiro32+?
    http://pracrand.sourceforge.net/Tests_engines.txt

    The scores for all 16 bits appear to be worse than when testing fewer bits, e.g. 15.

    EDIT:
    This question has been answered already here:
    http://forums.parallax.com/discussion/comment/1424224/#Comment_1424224

    Formerly known as TonyB
  • evanh Posts: 4,428
    edited November 17
    Thanks Tony. I hadn't found that description.
    FPF - "floating point frequency" test; contrary to the name
    it's purely integer math. Can be slow on some settings.
    This checks for very short range correlations, even shorter
    than DC6, especially those correlations involving lots of 0
    bits.
    Roughly speaking, this test does a frequency test applied to
    the binary format of floating point numbers storing the
    integer values of overlapping windows of the original data
    stream.

    Hmm, reading that, I'm thinking there could be some sort of pattern forming when sampling is aligned with LSbit. I'll reorder the Word1/2 variants to be LSbit aligned rather than the current MSbit ... Nope, no quality change.
  • And another description - https://sourceforge.net/p/pracrand/discussion/366935/thread/55980eac/#1b63/56bf
    FPF: A frequency test. At the parameterization used in core test set it's almost a non-overlapping frequency test. And not on hamming weights this time! Every 16 bits in the datastream it does a sample. The sample value consists of the number of consecutive zero bits starting from the lowest bit of the sample (up to a maximum of like ~50 or so, but that never happens, realistically it's almost always a small number), and the first 14 bits above the lowest 1 bit in the sample. Which is basically equivalent to a bit-backwards unsigned integer to floating-point value conversion.
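
The sampling step described above can be sketched as follows. This is only a reading of that description, not PractRand's actual code: the function names, the bit-0-first stream layout, and the cap of 50 zero bits (the post only says "~50 or so") are assumptions made for illustration.

```python
from collections import Counter

def fpf_sample(stream, pos, sig_bits=14, max_zeros=50):
    """Return (zero_run, significand) for the window starting at bit `pos`.

    `stream` is a Python int holding the bit stream, bit 0 first (assumed layout).
    """
    window = stream >> pos
    zeros = 0
    # Count consecutive zero bits from the lowest bit of the window, up to the cap.
    while zeros < max_zeros and not (window >> zeros) & 1:
        zeros += 1
    # Take the `sig_bits` bits just above the lowest 1 bit
    # (or just above the cap position if the cap was reached).
    sig = (window >> (zeros + 1)) & ((1 << sig_bits) - 1)
    return zeros, sig

def fpf_counts(stream, nbits, stride=16, sig_bits=14, max_zeros=50):
    """Tally samples taken every `stride` bits; a frequency test would then
    compare these counts against their expected probabilities."""
    span = max_zeros + 1 + sig_bits  # widest window a single sample can read
    counts = Counter()
    for pos in range(0, nbits - span + 1, stride):
        counts[fpf_sample(stream, pos, sig_bits, max_zeros)] += 1
    return counts
```

Because the zero-run count plays the role of an exponent and the 14 bits above it play the role of a mantissa, each sample looks like a small floating-point value read from the low end of the window, which matches the "bit-backwards" conversion mentioned in the description.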
    The Prisoner's Dilemma, in english - "Selfishness beats altruism within groups. Altruistic groups beat selfish groups." - Quoted part from 2007, D.S Wilson/E.O Wilson.
  • TonyB_ wrote: »
    scro wrote: »
    evanh wrote: »
    scro wrote: »
    evanh wrote: »
    I'll do some more configurations tomorrow if Scro doesn't tell me I'm wasting my time.

    I don't know what you'd be doing to waste your time or not waste your time.
    The main attempts right now are shuffling the summing output bit order. This is the question I was trying to articulate earlier - Is there any real quality advantage to a fixed post-summing shuffle? Or are we just tricking PractRand?
    Shuffling bit order in the final output? Yeah, that sounds pretty suspicious. Exactly how useless that is depends upon what test failures you're managing to avoid that way, but in general it sounds like a bad idea.

    scro,

    Is the FPF test in PractRand important for something like xoroshiro32+?
    That particular question was asked back when DC6 tests were the failure point.
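
On the fixed post-summing shuffle raised in the quoted exchange: a fixed reordering of output bits is a bijection on 16-bit words, so it cannot add or remove information from the sum; it only changes which bit positions a test samples together. A minimal sketch, using a hypothetical bit-reversal permutation chosen purely for illustration (the thread does not list the orderings that were actually tried):

```python
def permute_bits(x, perm):
    """Build a word whose bit i is bit perm[i] of x."""
    out = 0
    for i, src in enumerate(perm):
        out |= ((x >> src) & 1) << i
    return out

def invert(perm):
    """Return the permutation that undoes `perm`."""
    inv = [0] * len(perm)
    for i, src in enumerate(perm):
        inv[src] = i
    return inv

# Hypothetical example permutation: reverse the 16 output bits.
REVERSE16 = list(range(15, -1, -1))

def shuffle_is_lossless(perm):
    """Check that every 16-bit value survives a shuffle/unshuffle round trip."""
    inv = invert(perm)
    return all(permute_bits(permute_bits(x, perm), inv) == x
               for x in range(1 << 16))
```

Since the round trip is exact for every value, any statistical weakness in the summed word is still present after the shuffle, just at different bit positions.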

    The Prisoner's Dilemma, in english - "Selfishness beats altruism within groups. Altruistic groups beat selfish groups." - Quoted part from 2007, D.S Wilson/E.O Wilson.
  • cgracey wrote: »
    TonyB_ wrote: »
    Thanks a lot, Chip. How about Q2?
    TonyB_ wrote: »
    2. Is there enough time to do the 64-bit addition in a single clock?

    If the answer is yes, I'm wondering whether two 16-bit additions could be done in one clock, with second using result of the first, perhaps with bespoke carry logic.

    Apparently, there IS time to do a 64-bit addition at 160MHz, since OnSemi didn't complain about it when we gave them the Verilog that, among other things, implemented xoroshiro128+. It was actually the 16x16 multipliers in the cogs' ALUs that were the critical paths in the design. The FPGA hid it well, having hard multiplier blocks in 28nm.

    I think that two 16-bit additions could be done, one after the other. I know that at 160MHz, there is NOT time for two 32-bit additions, one after the other, as this was a problem in the CORDIC solver that I had to work around by adding additional pipeline stages.

    Chip, do you think there would be time in XORO32 for a second addition that uses the result of the first, for each of the two iterations? No need to calculate parity (no longer required) and the second sums would be the outputs.
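
For context, a rough sketch of where the existing 16-bit addition sits in a xoroshiro32+ step, and how two iterations can be combined into one 32-bit result. The step follows the published xoroshiro+ structure; the constants (14, 2, 7), the packing of the two 16-bit state words into one 32-bit value, and the choice of which iteration supplies the high half are assumptions for illustration, not details taken from this page or from the P2 Verilog.

```python
MASK16 = 0xFFFF

def rotl16(x, k):
    """Rotate a 16-bit value left by k (0 < k < 16)."""
    return ((x << k) | (x >> (16 - k))) & MASK16

def xoroshiro32plus_step(s0, s1, a=14, b=2, c=7):
    """One iteration: returns (output, new_s0, new_s1).

    a, b, c are placeholder constants (assumed, not from the thread).
    """
    out = (s0 + s1) & MASK16  # the single 16-bit addition forming the output
    s1 ^= s0
    s0 = rotl16(s0, a) ^ s1 ^ ((s1 << b) & MASK16)
    s1 = rotl16(s1, c)
    return out, s0, s1

def xoro32_sketch(state):
    """Run two iterations on a packed 32-bit state (s1 in the high half, assumed).

    Returns (32-bit result, new packed state); the first iteration is placed
    in the low 16 bits of the result (also an assumption).
    """
    s0, s1 = state & MASK16, (state >> 16) & MASK16
    lo, s0, s1 = xoroshiro32plus_step(s0, s1)
    hi, s0, s1 = xoroshiro32plus_step(s0, s1)
    return (hi << 16) | lo, (s1 << 16) | s0
```

A second addition that consumes `out` would sit after the first sum in each iteration, so the question is whether two dependent 16-bit adds fit in the time budget, twice over, since both iterations happen in one XORO32.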



    Formerly known as TonyB
  • TonyB_ wrote: »
    Chip, do you think there would be time in XORO32 for a second addition that uses the result of the first, for each of the two iterations? No need to calculate parity (no longer required) and the second sums would be the outputs.



    I think so. What are you thinking about?
  • TonyB_ Posts: 248
    edited November 21
    cgracey wrote: »
    TonyB_ wrote: »
    Chip, do you think there would be time in XORO32 for a second addition that uses the result of the first, for each of the two iterations? No need to calculate parity (no longer required) and the second sums would be the outputs.

    I think so. What are you thinking about?

    Just curious. :)
  • TonyB_ wrote: »
    cgracey wrote: »
    TonyB_ wrote: »
    Chip, do you think there would be time in XORO32 for a second addition that uses the result of the first, for each of the two iterations? No need to calculate parity (no longer required) and the second sums would be the outputs.

    I think so. What are you thinking about?

    Chip, there is a simple way to improve the xoroshiro+ output considerably, and all LFSR artifacts can be eliminated. Sebastiano Vigna told me the way; however, I am sworn to secrecy until it is announced officially. Evan knows and has been doing tests, and I am allowed to tell you but nobody else.

    Cool! When is he going to announce it?
  • how other people seed random generators...



    Enjoy!

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • That's gorgeous but computers, predictable?

    Might have been in the old days. Nowadays I can never tell what my Win 10 machine is going to do next.

    They could get much better randomness by replacing all those Lava Lamps with Surface Pro 4s.

    :)

  • Random is what comes to mind when I think about the computers at work these days. We've recently had IT infrastructure moved overseas so now the M$ terminal server desktops show up via some laggy and congested shared data pipe. Even something as simple as selecting some text takes a long time to get right. I end up using the cursor keys a whole lot now because then I can precisely predict what it will do without having to wait to see what happens.

    At least the local guy was able to help things by setting the mouse pointer to be locally hosted. It was pure agony until he did that.

    The Prisoner's Dilemma, in english - "Selfishness beats altruism within groups. Altruistic groups beat selfish groups." - Quoted part from 2007, D.S Wilson/E.O Wilson.
  • @evanh,

    and the real fun begins when you access another server via remote desktop from inside your remote desktop session; then you can watch how often programs redraw their buttons and borders!

    interesting,

    Mike
  • evanh Posts: 4,428
    edited December 2
    Well, the marathon effort to decide on the best Xoroshiro32 shifter combinations, aka triplets, aka candidates, has finally come to an end. It was engrossing and enjoyable. Many thanks to everyone posting here and especially to:

    - Johannes, aka Ahle2, for starting the ball rolling.

    - Chip's openness to ideas and proving of implementation.

    - Tony's enthusiasm, engagement and taking the lead to find even better.

    - Sebastiano Vigna, aka Seba, and David Blackman for the Xoroshiro algorithm.

    - Chris Doty-Humphrey, aka Scro, for the wonderful PractRand!!!! and helpful comments. I would never have gone this far without PractRand.

    - And on that note of just how much work was done with PractRand, I'll have to credit the coincidental timing of AMD launching the Ryzen CPU when they did. Stepping up from an ageing dual-core to an octa-core was perfect for me. It provided excellent flexibility for testing harebrained ideas and just throwing brute MIPS at the job. At one moment of unrestrained task launching I even hit 29 GB of RAM usage ... and it came out fine. Something that would have needed a hard reset on the old system, I suspect.
  • Thanks for all your work on this Evanh, TonyB_, Seba, and Scro.

    I just need to understand what to do exactly, and I'll do it.
  • cgracey Posts: 8,343
    edited December 5
    evanh wrote: »
    Random is what comes to mind when I think about the computers at work these days. We've recently had IT infrastructure moved overseas so now the M$ terminal server desktops show up via some laggy and congested shared data pipe. Even something as simple as selecting some text takes a long time to get right. I end up using the cursor keys a whole lot now because then I can precisely predict what it will do without having to wait to see what happens.

    At least the local guy was able to help things by setting the mouse pointer to be locally hosted. It was pure agony until he did that.

    Sounds like a major boon for productivity. Nothing could be smarter than moving your desktop to the other side of the world.

    Okay, I'm ready to implement the improved XORO32, but I have some questions. I emailed you and TonyB_. I'm not clear on how this new scheme works when it comes to the adding.

    Once this is implemented, I'll have a new release ready in a short time.