Random/LFSR on P2


Comments

  • evanh Posts: 7,726
    edited 2019-07-22 - 12:55:49
    Hmm, the dude doing this review hasn't really tried to lower the voltage - https://www.guru3d.com/articles-pages/amd-ryzen-5-3600-review,26.html

    Here's a QUOTE: "Remember that we're going for an all-core overclock and that means a lower clock frequency than the highest Turbo bin offers. What you need to do:
    Enable and start at 4200 MHz (42 Multiplier)
    Apply 1.40V to the CPU (or simply leave it at auto)
    Work your way upwards from there until the system becomes unstable and then back down at least 100 MHz."


    His first wrong assumption is that a multiplier beyond the max boost bin isn't achievable.
    Second is that he started from 1.40 volts!
    And ultimately, the cooler likely wasn't up to the job. There's no mention of changing it, which presumably means he used the AMD-supplied cooler ("we use the Wraith stock AMD cooler") and its thermal transfer paste.

    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • xoroshironot Posts: 103
    edited 2019-07-23 - 21:43:01
    evanh wrote: »
    Hmm, the dude doing this review hasn't really tried to lower the voltage...
    I agree... bottom-up on voltage/clock (with a better-than-stock cooler) would be best practice for these, it seems, based on the results of the extreme underclocking here.
    Apparently clock-stretching is the wild card that complicates under/over-clocking.
  • Huh, intriguing. I wonder if that is new to the 3k parts, or whether I've also been benefiting from it. I've not tried comparing my own benchmarks at different voltages. In fact, I've not tried lowering the voltage much below default. Of course, the default for the 3k series will be lower than the default voltage for my 1700X, so I don't know how much lower the 1.00 volts they talk about really is ...

    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • evanh Posts: 7,726
    edited 2019-07-25 - 07:12:36
    I'm not seeing any speed variations here, although Cinebench is far from consistent in its scoring; I have to run it a number of times to decide whether a score is reliable. The power/current does follow the expected gradient: while running Cinebench R15 at 4 GHz, system power went from a high of 265 W at 1.5 volts down to 186 W at 1.287 volts.

    Under load, the 1700X fails with a reset or lockup when the voltage is too low. At my set 4 GHz clock, below 1.300 core volts I had to lower the fan's max-RPM temperature threshold to keep it stable. I was able to go down to 1.287 volts. All this would seem to support the idea that clock-stretching (or possibly clock-skipping) is a new feature of the 3k series.

    PS: System power being 1700X CPU, B350 mobo, GTX960 GPU, 1x SSD, 1x HDD, 1x ODD, gold rated power supply, mouse, keyboard, USB extender, card reader, and some cooling fans.

    PPS: I've worked out that it isn't Cinebench's fault for the inconsistent scores. I just have to wait longer after each reboot for the OS to settle down and stop running its little background jobs. Conveniently, I can see the transition on the power meter with idle system power always at 55 watts.
    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • I've just done some more experimenting with the minimum voltage, using Cinebench to load test and verify constant processing speed. Just by dropping the multiplier to 38 (3800 MHz) I could go down to 1.175 volts on the CPU core. And remember this is a 14 nm first-gen part that defaults to 1.35 volts for 3400 MHz. Not that I would rely on it that close to the edge.

    I had another go at 4100 MHz, which had proved too unstable historically, and found that it is indeed creating too much heat for my S40 cooler. With all cores going flat out, the temperature rises to the point of needing more voltage for stability, which in turn generates more heat. There's no sweet spot.
    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • xoroshironot Posts: 103
    edited 2019-07-26 - 03:53:47
    Now I understand more about some of the conflicting information I have read... if the write-up here is accurate.
    Excerpt: "... if you want to overclock a single core inside a CCX, the second core must run at a 1 GHz difference, meaning that if one core is OC'd to 4.5 GHz, the second core must run at 3.5 GHz. Such design is to be blamed on CPU's internal clock divider..."
    No worries... Shamino to the rescue with a new version of Work Tool.
    EDIT: Required reading (and some scary stuff in one of the links before using Work Tool).
  • evanhevanh Posts: 7,726
    edited 2019-07-26 - 02:56:53
    Um, there are four cores, not two, per CCX - https://www.techpowerup.com/review/amd-ryzen-9-3900x/images/arch9.jpg

    On another note, that "data fabric" in the I/O die is huge. So far I've not seen a single official comment on what is in there. I've seen one off-hand comment by a journo speculating it could be DRAM.

    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • I see Seba has formally posted code links for larger state xoroshiro++ and xoshiro++ on his PRNG Shootout page.
    I have already tested many of the xoroshiro++ constants at 128-bit, but wasn't perfectly satisfied with the ones I looked at via meta-analysis.
    It is just like the search for the 32-bit state constants, but without any easy way to test exhaustively.

    I am in the process of re-fitting my BigCrush meta-analysis engine with mathematical improvements (e.g. I noticed I had been using the sample average as part of the StdDev calculation, where the known population average of 0.5 seems more appropriate and is reasonably mathematically scalable for n > 2, and almost perfectly so for n > ~30) to better explore this.
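
    To illustrate that correction, here is a minimal C sketch (the helper name is hypothetical, not my actual engine code): with the population mean known to be 0.5, it replaces the sample mean in the deviation sum, and no n-1 correction is needed because nothing is being estimated from the sample.

        /* Standard deviation of BigCrush p-values measured against the
         * known population mean of 0.5 rather than the sample mean. */
        #include <math.h>
        #include <stddef.h>

        double stddev_known_mean(const double *p, size_t n)
        {
            double ss = 0.0;
            for (size_t i = 0; i < n; i++) {
                double d = p[i] - 0.5;   /* known population mean */
                ss += d * d;
            }
            return sqrt(ss / n);         /* divide by n, not n-1, since
                                            the mean was not estimated */
        }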
  • When did they decide to put ++ in there?
    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • xoroshironot Posts: 103
    edited 2019-08-07 - 03:46:13
    evanh wrote: »
    When did they decide to put ++ in there?
    I don't know 'when' about the decision itself, but the announcement was August 1st.
    On the whole I think it is overdue, as I believe large state ++ (with correct choice of constants) has the potential to obsolete most other PRNGs for most common tasks (that do not require a CSPRNG).

    The only real failing (in very specific use-case scenarios) of a good ++ is sparse-state recovery (i.e., lag in propagating a single state bit change to other bits), which my +*+ w/mask idea addresses fairly well.
    Some might argue that lack of multidimensional equidistribution is a 'failing' with + (or ++), but I don't think that argument carries much weight for the vast majority of uses, and even less so at larger state sizes.
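
    For concreteness, a minimal C sketch of the large-state ++ construction being discussed, following the structure of Seba's published xoroshiro128++ (constants as published on the Shootout page, to the best of my reading; this is a sketch, not the reference code):

        /* xoroshiro128++ style step: the xoroshiro linear engine plus
         * the ++ scrambler rotl(s0 + s1, r) + s0. */
        #include <stdint.h>

        static uint64_t s[2] = { 1, 0 };   /* state must never be all zero */

        static inline uint64_t rotl64(uint64_t x, int k)
        {
            return (x << k) | (x >> (64 - k));
        }

        uint64_t xoroshiro128pp(void)
        {
            const uint64_t s0 = s[0];
            uint64_t s1 = s[1];
            const uint64_t result = rotl64(s0 + s1, 17) + s0;  /* ++ scrambler, r = 17 */

            s1 ^= s0;
            s[0] = rotl64(s0, 49) ^ s1 ^ (s1 << 21);           /* a = 49, b = 21 */
            s[1] = rotl64(s1, 28);                             /* c = 28 */
            return result;
        }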
  • Ah, cool, thanks, very recent then.
    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • Evan,
    I gather you're trying to overclock your PC to get actual results from the PRNG. If so, could you use the free GPU/TPU from Google? There is a version that runs Python and you can get about 10-hour time blocks at a time, although you can get bumped off in favour of paying customers. Just a thought.
  • The PRNG testing hasn't been touched for months. I've been doing streamer and smartpin testing of late.

    I guess the reason I bring up the Ryzen in this topic is because I've done all my significant PRNG testing using the original 8-core product from early 2017 and have mentioned more than once how much of an upgrade it was from the dual-core. That and Chris has shown interest in adding to his extensive collection of PCs.
    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • xoroshironot wrote: »
    I see Seba has formally posted code links for larger state xoroshiro++ and xoshiro++ on his PRNG Shootout page.
    evanh wrote: »
    When did they decide to put ++ in there?
    xoroshironot wrote: »
    I don't know 'when' about the decision itself, but the announcement was August 1st.
    On the whole I think it is overdue, as I believe large state ++ (with correct choice of constants) has the potential to obsolete most other PRNGs for most common tasks (that do not require a CSPRNG).

    The new version of the paper has some differences from the original and is worth downloading (however I suggest keeping a copy of the old one). The ++ scrambler section is now section 10.6, not 10.7. Our constant d is called r in the paper, but we started using [a,b,c,d] long before the paper was published.

    Seba and David now suggest d aka r values of 5 and 11 for w = 16 (32-bit state, 16-bit output, as used by XORO32). What's quite amusing is that Seba knows we've changed from [14,2,7,5] to [13,5,10,9], and the former is mentioned in the first paper whereas the latter is not in the second, presumably because it conflicts with their new advice! As mentioned, though, test results are what really matter. Also, the double-iteration in XORO32 is a unique feature of the P2; others would use a 64-bit or larger state to get a 32-bit PRN.
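
    For anyone following along, here is a minimal C sketch of a single w = 16 xoroshiro++ iteration with our [a,b,c,d] = [13,5,10,9] constants (a sketch only; XORO32 itself iterates this twice per instruction to deliver a 32-bit result):

        /* One xoroshiro32++ step with [a,b,c,d] = [13,5,10,9].
         * The 32-bit state (s0,s1) must never be all zero. */
        #include <stdint.h>

        static uint16_t s0 = 1, s1 = 0;

        static inline uint16_t rotl16(uint16_t x, int k)
        {
            return (uint16_t)((x << k) | (x >> (16 - k)));
        }

        uint16_t xoroshiro32pp(void)
        {
            uint16_t result = (uint16_t)(rotl16((uint16_t)(s0 + s1), 9) + s0); /* ++ scrambler, d = 9 */
            s1 ^= s0;
            s0 = (uint16_t)(rotl16(s0, 13) ^ s1 ^ (uint16_t)(s1 << 5));        /* a = 13, b = 5 */
            s1 = rotl16(s1, 10);                                               /* c = 10 */
            return result;
        }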

    I think the amended paper still gives the (misleading) impression that there is not a lot to choose between + and ++ on quality. Perhaps it's hard to tell the difference with states > 32-bit, but our tests show that ++ is much better. + is faster and easier to analyse theoretically, but if there is time to do ++ I can't see any reason to do + instead.

    Regarding the PRNG shootout, I don't understand how a footprint of 1068 bits arises when the text says it is always padded to a multiple of 64.
    Formerly known as TonyB
  • xoroshironot Posts: 103
    edited 2019-08-09 - 02:35:15
    evanh wrote: »
    That and Chris has shown interest in adding to his extensive collection of PCs.
    Interest is all I have at the moment... choosing between paying to send a kid through college and buying one of the new EPYC processors announced yesterday is easy for me. But others don't share my appreciation for awe-inspiring processors.
    TonyB_ wrote: »
    Seba and David now suggest d aka r values of 5 and 11 for w = 16...
    My test results at various bit-widths suggest a gravitation toward certain D (aka R) constants in general, but indeed the specific ABC choice decides whether any given D will create excess bit correlations.
    TonyB_ wrote: »
    Perhaps it's hard to tell the difference with states > 32-bit.
    I believe I can tell the difference with my meta-analysis (and it is a given that I would need to intentionally ignore the obvious low-bit binary matrix and linear complexity issues with +), but it is difficult to wrap up all the details to publish credibly while demonstrating proof.

    Part of the proof issue has to do with the statistics of the most important values, which are located on the outside of my cloud model... those points are non-normally distributed, thus the uninitiated would attempt to apply their knowledge of normally distributed statistics to my results. To avoid this conundrum, it takes at least 1000 BigCrush results to come close enough to the Central Limit Theorem expectation that the normal and non-normal merge well (in this case) and actual biases in 'so-called' good PRNGs begin to emerge. At that point we begin to see a correlation between the most biased statistical tests and the bit positions they are testing, or other patterns emerge (which are not present in AES or other simulated TRNG data).
    TonyB_ wrote: »
    Regarding the PRNG shootout, I don't understand how a footprint of 1068 bits arises when the text says it is always padded to a multiple of 64.
    I don't understand either. Perhaps they meant 1088, except that the index into the 1024-bit state is an 'int', which would suggest 1056 as the real answer (also in conflict with the 'multiple of 64').
  • If anyone is interested, I found a paper that discusses the idea of what I am trying to accomplish with my TestU01 meta-analysis: Normal Cloud Distribution Probability Fast
    My implementation will be different, but hopefully comparable.