Besides, I have been in critical layout mode for a few years and didn't want to mess with trying to make sure a 'new' system would be stable in the middle of a project.
If I take some vacation time to come out there, build the new system, and make sure everything you need is working/stable before you move to it, can I sleep on your couch? :-D
Or, if you throw the parts together and give me SSH access, I'll be happy to set it up for you over a weekend.
I appreciate the offer, but I'm afraid of what might be found in our couch. :-) Let me think on this and see what Parallax says. SSH access might be a more sensible approach, but first the computer upgrade needs to be considered.
Using an SSD for swap was a good idea; now the next easy performance boost would be to max out the RAM capability of your machine - if you can run 6GB, you can run 8GB. Then add a second SSD for /tmp.
Surely /tmp isn't heavily thrashed in this software? Sounds more like it just gobbles RAM instead.
As for the 6GB vs 8GB, I'd guess it's a triple channel setup, but I don't know what was actually possible on socket 775 systems back then.
PS: I'd certainly be eyeing up that socket 2011, 64 GB setup for future.
/tmp usage would depend on the software using it for temp files; I am not familiar with the software, I was guessing it would use it some.
If you know the motherboard model, we can google the maximum memory capacity. Socket 775 only supported dual channel, but some OEMs put two DIMM sockets on one channel and four on the other. If you have six DIMM sockets, you might still be able to use 2GB sticks and go to 12GB.
Socket 775 motherboards tended to be mostly SATA and SATA2; to get the full benefit of your SSD, you may need to put a SATA3 controller into a PCIe slot.
About sleeplessness... I had irradiation to my brain stem as a side effect of treating metastatic squamous cell carcinoma in my neck... I mention this because after the radiation I had a form of narcolepsy.
I simply couldn't wake up. Something in my brainstem had been damaged. I was placed on Provigil... which worked like a charm. Except that when I was traveling in Amsterdam, I suddenly went sleepless.
My brain stem suddenly recovered to the point that I no longer needed the Provigil. I understood this, but as an experiment, I decided to continue the Provigil anyway.
On the 11th day, I was absolutely convinced that I should walk through Amsterdam with my eyes shut, because at a certain location I would be knighted by the Queen of England... it was all very real. Sleep is essential. Sleeplessness is not acceptable.
Rich
Yep. Here's a little something about Provigil: http://www.33rdsquare.com/2013/04/is-modafinil-limitless-brain-enhancing.html
I have little opinion on its use as a nootropic - it seems to be relatively harmless on the scale of mind-altering substances. If you haven't seen the movie "Limitless", you may want to if you're a hard-driving, get-it-done type of worker. My "little opinion" is that it seems physically harmless, but since human physiology is so complex, it's impossible to predict its effect when taken for the purpose of improving productivity - I know Rich was not taking it for that reason. Rich had a somewhat typical experience with the drug after brain repair and remodeling occurred - I know of a couple of cases very similar to his. I also know of a few previous narcoleptics whose narcolepsy has been completely alleviated by Provigil, and they are very successful people who have relatively normal sleep/wake cycles. Sorry for the off topic, just wanted to share my observations.
"Surely /tmp isn't heavily thrashed in this software? Sounds more like it just gobbles RAM instead." - The latter is correct: the software gobbles RAM because the LVS/DRC are performed on a flattened database rather than a hierarchical structure. If the tool we are using performed hierarchical LVS and DRC, it would use far less memory, because it could "black box" substructures that are known to be correct from the DRC/LVS run. ...But instead, the software looks at the entire structure and tests every single occurrence of a block or cell that might get repeated over and over. I made several attempts to push the software vendor for hierarchical LVS and DRC, without any luck. It translates into $$$ for a tool that does hierarchical versus one that does not. Laker has recently been acquired by Synopsys, so perhaps there will be some improvements in the way the software traverses the database.
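The flat-versus-hierarchical point can be illustrated with toy numbers (all invented for illustration - nothing here reflects the actual design or tool): a flat run re-checks every placed instance of a cell, while a hierarchical run could verify each unique cell once and black-box the repeats.

```python
# Toy illustration of why hierarchical DRC/LVS scales better than flat.
# A flat run re-checks every placed instance; a hierarchical run checks
# each unique cell once and "black boxes" the known-good repeats.
# All cell names and counts below are invented.

instances_per_cell = {"bitcell": 1_000_000, "sense_amp": 4_096, "decoder": 512}
checks_per_cell = {"bitcell": 40, "sense_amp": 300, "decoder": 900}

# Flat: every instance of every cell is checked individually.
flat = sum(instances_per_cell[c] * checks_per_cell[c] for c in instances_per_cell)

# Hierarchical: each unique cell is verified once, regardless of placement count.
hier = sum(checks_per_cell.values())

print(flat, hier)  # 41689600 vs 1240
```

Even with made-up numbers, the gap explains both the RAM appetite of a flattened database and why vendors charge $$$ for hierarchical engines.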
With that old chipset, I'd check whether the SATA controller is in Legacy/IDE mode or running in AHCI.
Switching it to AHCI can mean a 10-20% improvement on old-fashioned spinning platters, and some report it helps with SSDs as well.
If you get a SATA3 controller, get a decent (RAID?) controller with battery-backed cache RAM. It doesn't matter if you only ever run it with one HDD or SSD; the write cache speeds up the system something silly.
I assume the LVS and DRC don't max out the CPU? Can Linux put the swap file on a software RAID-0? If so, 64GB SSDs are getting quite cheap lately. Might be worth tossing 4x of them at the swap file? (Or just get one of the PCI-E SSDs that can sustain over 1GB/s.)
Marty
That's 830MB/s of swap speed. SATA1 is limited to 150MB/s, SATA2 is 300MB/s and SATA3 is limited to ~600MB/s.
That add-in card will trump any SSD you hang off the SATA bus. There are even cards that do 2GB/s and are several hundred GB in size, but they are decidedly bank busting.
Hmm, I'm not a Linux head, but I think the scale of the project merits a closer look at the computing, for sure.
Since we're looking at, what, $500K for the past year and the upcoming spin?
Regardless of whether this is a CPU or I/O issue, I think one could make a smart business case for setting up a dual-socket Xeon or AMD 8-12 core server
with 64/96GB of memory to cut down on disk/SSD I/O, and a non-OCZ SSD storage solution with backup.
It depends on whether the software likes extra cores or pure GHz, and whether AMD is fully supported by Fedora and the apps.
$5-6K/ea to get much faster results from the Primary designers seems like a no-brainer.
Not only will their normal work go faster, but those ideas that pop into your head to try (the ones currently pushed off because of the time required to crank out a result) would be more likely to get followed up on.
And there's got to be someone in the Bay Area who's a Fedora server guy who could hit Sactown and do a build-out and get it up and running in parallel so as not to impact on-going activity.
The software does not support CPU farming, but I have since upgraded. I have a quad-core processor (Q6600 @ 2.4GHz) with 6 Gigs of main memory and a 120 Gig solid-state drive configured as a swap drive ... all on a Linux platform (which makes a difference when talking swap drives). During an LVS or DRC I have dipped into the swap by about 50 Gigs.
An LVS takes almost exactly 7 hours to run, and a DRC takes about the same. .... <--- before, this process took about 3 days
Wow. This is an ancient machine.
If you're doing serious work, get a serious machine. Going into swap just wastes everyone's time.
My primary workstation has dual 8-core Xeons and 256GB of RAM; with hyperthreading that's 32 logical cores.
Even my "baby" workstation for running Windows has 64GB of RAM.
Memory is cheap; engineer time is expensive. Toss the Q6600 box and get yourself a nice Xeon workstation with maximum RAM.
Putting in an SSD for *swap* is like adding wings to make your VW bug faster - you're solving the wrong problem.
Your workload needs RAM. Give it RAM. And ECC RAM at that; I'd hate for there to be a bug in the Prop II because you got a bitflip during a long, large-memory run.
(I decommissioned my old Q6600 machine with 8GB of RAM because it was no longer worth the power it was consuming; modern SB or IB machines are *so* much faster than old Core2 boxes.)
I appreciate all of the suggestions, and will take them into consideration, however changing too many variables right now on a project such as this is NOT a wise decision.
Agreed, but then again, there never will be a right time.
I cannot tell you how many times I have done precisely what you are doing - but hindsight is wonderful, and the time I wasted by not upgrading showed it was the wrong decision.
Get someone to set up a fast machine with lots!!! of ECC RAM. Since you will still have the old PC, you can run both for the first verification, to ensure the new PC is working as it should, if you feel so inclined.
Beau is absolutely right here.
Beau has a slow but known-working system with a sacrificial SSD. Moving to a faster system requires an OS refresh, which will likely trigger a tools and tool-chain refresh. All it takes is introducing one new unknown into the equation and we're back to square one again.
Beau is the best judge of risk versus reward.
The HP dc7900 SFF (Small Form Factor) 'Business PC' I use at the office has the same CPU, but 8GB RAM, and frankly...
*Snore*
That thing is SLOW!
The only reason I haven't upgraded it yet (next year I have to upgrade; it's already out of warranty) is that it has a 3.5" diskette drive, and I don't think it's possible to fit one of those in the DELL 9090 desktop systems we're using these days.
Even the HP dc8000 is on our list of 'used computers not to pass on to other users even if they only run light programs'.
(The dc7900 uses DDR2 RAM, the dc8000 uses DDR3.)
Even if your computer, against all reason, uses DDR3, it's an early system, and well... Early DDR3 is slower than the fastest DDR2 RAM.
Checked the GeekBench scores over at primatelabs.com and yeah...
The only reason that Q6600 scores better than the 2.16GHz Core 2 Duo I stuffed into my 1.66GHz Mac Mini is that it has twice the number of cores...
(Of course, the Mini can only fit 2GB RAM and has a really sh!tty graphics card. But I did stuff in a nice SSD for the system, and user files are on an external FireWire 800 HDD.)
This is all ANCIENT TECH, though, and really...
I collect and use (and sometimes modify) old computers at home as a hobby.
What's your excuse?
Gadgetman, you can get an external USB 3.5" diskette drive. I got one for my Dad so he could transfer all his old DOS and Windows 3.1 stuff to the Win7 box he's using most of the time now.
Wooo, I might have a really good feature request ...
What details on counters A and B exist for the Prop2? I've just decided to stop being so lazy and start trying to understand those AD7401 A/D chips. In the process I realised the dual counter setup in the Prop1 is only one mux, or one adder for third order, away from being able to perform the toughest part of a second order "decimation" filter!
For anyone that's interested in a faster-settling A/D, read pages 16 and 17 of the AD7401 datasheet - http://www.analog.com/en/analog-to-digital-converters/ad-converters/ad7401a/products/product.html. And I'm guessing, although not shown in that datasheet, this goes both ways, as in D/A, as well.
They've given a third order example, but two integrator stages is enough for good performance gains. The fastest part of the whole deal (it runs on every clock) is the multistage integrator. The Prop1 counters are already built as a single-stage integrator. The Prop2 would only need, say, PHSA summed in place of FRQB. Voila! The second stage is born. After that, one can do the rest in software with ease. And with the WAIT instructions one can be very exacting on the sample timing, even without the typical capture circuit.
This should improve performance for the internal A/Ds, but I'm sure we'll be able to make even more use of it than that. And presumably (a big guess on my part) the opposite works for D/A, as in also having one counter differentiate the other (and using the high bit of the result for bitstream output?).
Evan
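A rough software model of the proposed cascade may make it concrete. PHSA and FRQB are real Prop1 counter registers; the function name and test values below are just for illustration:

```python
# Sketch of the proposed two-stage integrator (a second-order CIC front end).
# Stage 1 is what a Prop1 counter already does: PHSA accumulates the input bit
# each clock. Stage 2 is the proposed mod: sum PHSA in place of FRQB, so a
# second accumulator (here "phsb") integrates the first one every clock.

def two_stage_integrate(bitstream):
    """Run a delta-sigma bitstream through two cascaded integrators."""
    phsa = 0  # first-stage accumulator (exists in Prop1 hardware today)
    phsb = 0  # second-stage accumulator (the proposed addition)
    for bit in bitstream:
        phsa += bit   # stage 1: plain up-counter on the input bit
        phsb += phsa  # stage 2: integrate the first accumulator
    return phsa, phsb
```

The slow part (the differentiator/decimation stages) then runs in software at the decimated rate, which is why only the per-clock integrator needs hardware help.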
"However changing too many variables right now on a project such as this is NOT a wise decision."
That's why I offered to build/set up the computer for you. That way, you don't lose productivity on the old system while the new one is being set up, and someone else can make sure the system is running up to your spec before you move over to it.
Seriously, I can't believe a Q6600 is an upgrade. As many others here have said, a Q6600 is barely better than the machine I just got rid of at work. Our antenna simulation machine has an i7-3960X and 64GB of DDR3-1833, but the person who specced it out didn't really know what they were doing. Our build server is a dual-socket Xeon machine with 96GB of RAM because, once again, machines are a lot cheaper than engineer time. Any time you spend waiting for your computer to finish an operation is a waste. At the same time, any time you dedicate to upgrading is also a waste. Make someone else do it so you don't lose time on it.
I understand the desire not to mess with anything, but a Q6600 is, well, ancient. I upgraded my Q9660 about 9 months ago to a third-gen i7 with an SSD and it is SO much faster. It runs Win 8 better, too.
The attached piccy puts some labels in place. In particular, it says which is the sinc3 part and which is the decimation part. Both are new to me. And Mod_Clk presumably means modulator clock.
EDIT: I've added the Prop1 A/B counters diagram for reference. BTW: Why was this diagram never in the Prop's datasheet?
For those, like myself, trying to grasp the value of multi-order filtering, here's one example: a second order filter achieves 16 bit resolution in the same number of clocks as an 8 bit first order filter - i.e. 256 clocks instead of the 65536 clocks needed for a 16 bit first order filter. And third order achieves 24 bits in the same time.
The higher orders may not be fully accurate to their respective resolutions, but it doesn't take much to see they are still a big improvement.
PS: This example of (8 bits) 256 clocks per order is dependent on the sampling rate at the decimator. Any number of clocks per sample is selectable, I presume.
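Those numbers follow a common rule of thumb for sinc/CIC filters: nominal resolution is roughly order × log2(clocks), since each integrator stage multiplies the DC gain by the sample count. A quick sanity check (the function name is mine):

```python
from math import log2

# Nominal resolution (in bits) an order-k sinc filter extracts from N clocks:
# roughly k * log2(N). Each extra integrator stage doubles the bit yield for
# the same number of clocks, which is the whole attraction of higher orders.

def sinc_bits(order, clocks):
    return order * log2(clocks)

assert sinc_bits(1, 65536) == 16  # first order: 65536 clocks for 16 bits
assert sinc_bits(2, 256) == 16    # second order: only 256 clocks for 16 bits
assert sinc_bits(3, 256) == 24    # third order: 24 bits in those same 256 clocks
```

As noted above, these are nominal figures; settling time and converter noise mean the real-world effective resolution is somewhat lower.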
All you guys pooh-poohing the Q6600: it's only 4 times slower than the fastest 4-core Core i7 chip that Intel sells right now. It's not like he's down 20x from current offerings.
Yes, but in a workstation application where an engineer is blocked, and costing money, on a CPU-bound operation, being 4x behind is huge. Really though, a multi-socket workstation is a good investment at this level of work. You'll notice the CPU I recommended is a 10-core part, and two of them in one motherboard would be significantly faster than any i7.
I'm aware of your CPU recommendations, but it's clear from past posts that LVS and DRC aren't regular operations that Beau has to perform. These are done once some milestone is reached. He said it's down to around 7 hours for the common case, and I expect it will be a little faster in a couple of weeks.
Your recommendation assumes the software architecture is robust enough to take advantage of 20 cores and 256GB of RAM. I'm not convinced the software would see a linear improvement, and it would be just for one specific use case.
evanh: Perhaps you might like to take both P1 counter circuits and mark up the appropriate additions/mods in red (use Paint or equiv) so we can actually see what you are proposing. It sounds interesting but I for one would like to see the actual resulting diagram.
"For those, like myself, trying to grasp the value of multi-order filtering, here's one example: A second order filter achieves 16 bit resolution in the same number of clocks as an 8 bit first order filter - ie 256 clocks instead of the 65536 clocks needed for a 16 bit first order filter."
It is not quite a free lunch: after 256 clocks you do not really have 2^16 decimation of any input. There is also a settling-time spec on such filters, they take multiple readings to converge on a final value, and they need care with external multiplexing.
The latest Analog Devices M4s include such a sinc filter, designed for the AD74xx isolated converters:
http://www.analog.com/static/imported-files/images/functional_block_diagrams/ADSP-CM40x_fbl.png
I think we are still waiting on the final Counter specs from Chip?
I'd also like to see Atomic control/capture from two timers, to support reciprocal frequency counting.
Simultaneous capture of multiple Cog counters could be done now by utilising multiple Cogs: each Cog is set to WAIT on the same count of the global counter, then immediately captures its respective internal counter.
Cluso: Good idea. I'll get right on it when I finish work in an hour.
"Or, if you throw the parts together and give me SSH access, I'll be happy to set it up for you over a weekend."
You just want to play with expensive computers!