First internal SSD failure - Page 2 — Parallax Forums

First internal SSD failure

Comments

  • evanh Posts: 15,214
    edited 2016-04-06 12:47
    Ray,
    Now that we know it's the old SSD830: when you ran GSmartControl, you did actually check that every entry in the "Failed" column of the attributes tab was showing NEVER, right?

    And the error log should be completely blank.
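A minimal command-line sketch of the same check, assuming smartmontools is installed and the drive is /dev/sda (both are assumptions, not stated in the thread). `smartctl -A` prints a WHEN_FAILED column in which "-" corresponds to GSmartControl's "never". The helper name `failed_attrs` is hypothetical; it prints only the attributes that show anything else.

```shell
#!/bin/sh
# failed_attrs: read `smartctl -A` output on stdin and print the name and
# WHEN_FAILED value of every attribute that has ever failed.
# Attribute rows start with a numeric ID; WHEN_FAILED is the 9th column.
failed_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" { print $2, $9 }'
}

# Usage (the device path is an assumption, adjust to your system):
#   sudo smartctl -A /dev/sda | failed_attrs
#   sudo smartctl -l error /dev/sda    # the drive's error log
```

No output from `failed_attrs` means every attribute reads as never failed.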
  • I just checked it: NEVER and blank. As for the memory, I used a standalone program to check it yesterday, before the re-install, and it found no problems. Well, at least I got the SSD thing settled; not sure if I am getting any closer to finding the original problem though.

    Ray
  • Could this have been caused by a trim issue?
  • evanh Posts: 15,214
    I doubt it. There is apparently a documented corner-case RAID + TRIM issue with the kernel, but I presume Ray wasn't RAIDing. And Samsung drives in general need NCQ disabled when performing TRIM ops, but this is dealt with already.
  • Well, so far I am not seeing any problems with my system, but then I have not really loaded or installed all of the programs that I had pre-crash. I am now leaning towards a desktop (OS) malfunction, which probably has nothing to do with the underlying kernel. I will work the system through next week, and then see where I am at with this.

    Ray
  • Heater. Posts: 21,230
    Having read through this thread I see no sign that this was an SSD failure.

    I can well believe it was a corruption at the file system level, for whatever reason. Or at least some file somewhere got mangled and prevented boot up.

    I have not seen such an issue on a Linux system for so many years I don't know what to make of it.

    Anyway, my attitude now is that any computer can self-destruct at any moment and I don't care. It's disposable. Everything of importance is backed up and replicated in many other places.

  • Heater. wrote: »
    Having read through this thread I see no sign that this was an SSD failure.

    I can well believe it was a corruption at the file system level, for whatever reason. Or at least some file somewhere got mangled and prevented boot up.

    I have not seen such an issue on a Linux system for so many years I don't know what to make of it.

    Anyway, my attitude now is that any computer can self-destruct at any moment and I don't care. It's disposable. Everything of importance is backed up and replicated in many other places.

    +1 there, the only problem I've ever had with a Linux drive is when I've plugged a 2TB USB ext4 into the NAS system which decided automatically and instantly that this drive should be formatted!!!$#%^&*!**@!

    Same with attitude, I have backups and if I don't then they couldn't have been important, right? Of course if every scattered backup were trashed somehow then I would have NOTHING to worry about.

  • From this experience, I think I need some standalone tools to check the hardware. I have a standalone program that I can check the memory with, but a standalone program to check the HDD/SSD, and other things, would be nice. Anybody know of such a suite, free of course?

    Ray
  • Heater. Posts: 21,230
    smartctl, to get the error report from the SSD.

    fsck, to check the file system.

    dd, Write to the entire drive or a big file that will take up most of your free space. Get the md5sum of the data written and check that it verifies on reading it back. Not really recommended, that's just wearing out the FLASH for not much benefit.
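The write-and-verify idea can be sketched as follows, assuming GNU coreutils (`dd`, `tee`, `md5sum`). The function name `verify_write` and its arguments are hypothetical. It writes random data to a file, checksums the stream as it is written, then checksums the file on reading it back. As the post says, large runs wear the flash, so keep the size modest.

```shell
#!/bin/sh
# verify_write FILE SIZE_MB
# Write SIZE_MB MiB of random data to FILE, then read it back and compare
# MD5 sums. Prints OK or MISMATCH.
verify_write() {
    testfile=$1
    size_mb=$2
    # Checksum the data stream while tee writes a copy of it to the file.
    written=$(dd if=/dev/urandom bs=1M count="$size_mb" 2>/dev/null \
        | tee "$testfile" | md5sum | cut -d' ' -f1)
    sync
    # Note: this read may be served from the page cache rather than the
    # drive. Dropping caches first (as root) makes it a real read-back.
    readback=$(md5sum < "$testfile" | cut -d' ' -f1)
    if [ "$written" = "$readback" ]; then
        echo "OK"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Usage (path and size are examples only):
#   verify_write /tmp/ddtest.bin 512 && rm /tmp/ddtest.bin
```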

  • By standalone, I meant programs on a bootable flash drive. That way, when your system goes down, you plug in your flash drive, boot it up, and then choose the program(s) you need to run, preferably from a menu. My standalone program that checks memory is on an optical disc; those things will probably start to disappear pretty soon now.

    Ray
  • Heater. Posts: 21,230
    You can put a "live" install of Debian on a CD/DVD or USB stick and boot that. From there you can run the commands I gave above to check out your disk and other hardware. https://www.debian.org/CD/live/ You can also repartition disks, format partitions, etc etc. Sorry no menu system, hardly worth building one given how infrequently one does these things.

    There are many other such "live" systems. Arch Linux live seems to be popular today: https://www.archlinux.org/download/

  • Well, I think my "big" desktop computer has joined a union. It has been over a week now since I upgraded to Ubuntu 16.04 LTS, and yesterday, while not doing anything extraordinary, it decided to do a spontaneous reboot.

    Hmm, so I waited for the reboot to finish, signed in again, and continued with the interrupted job. This time it took a more aggressive action: the computer froze. Nothing was responding, no keyboard, no mouse, nothing. Maybe it just needs to rest for a while, so I just hit the power button and turned it off for the night.

    This morning, on the first job, it went right to freezing the computer. Now I am not sure what to do.

    Ray
  • Heater. Posts: 21,230
    edited 2016-05-05 12:55
    You can put a "live" install of Debian on a CD/DVD or USB stick and boot that. (Did I say that already?)

    But this time start testing things. The disk as I mentioned above.

    The memory with memtester:
    $ sudo apt-get install memtester
    $ memtester 600m 10
    Where 600m is the amount of memory to test (600 megabytes) and 10 is the number of test passes. Change to suit your machine.

    There are probably other tests we can dig up.

    I very rarely see a Linux box freeze as you describe. The last time it happened it was weird, I got a green screen of death. Turned out the CPU had overheated and shut down. Cleaning up the fan and putting some thermal compound under the heat sink fixed that.
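One way to pick the memtester size rather than guess it is to take a fraction of the memory the kernel reports as available. This is a rough sketch only: the helper name `memtest_size_mb` and the 80% default are assumptions, and it relies on the `MemAvailable` line in /proc/meminfo, which older kernels do not provide.

```shell
#!/bin/sh
# memtest_size_mb [MEMINFO_FILE] [PERCENT]
# Print PERCENT (default 80) of MemAvailable, in whole megabytes.
# /proc/meminfo reports MemAvailable in kB.
memtest_size_mb() {
    meminfo=${1:-/proc/meminfo}
    pct=${2:-80}
    awk -v pct="$pct" \
        '/^MemAvailable:/ { printf "%d\n", $2 * pct / 100 / 1024 }' "$meminfo"
}

# Usage: test that much memory for 10 passes.
#   sudo memtester "$(memtest_size_mb)m" 10
```

Leaving some headroom matters because memtester tries to lock the memory it tests, and asking for everything can push the rest of the system into swap.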
  • evanh Posts: 15,214
    Yeah, hardware is failing: CPU, motherboard, power supply, maybe video card.

    If the fans are spinning and clear of dust, make sure the CPU heatsink is secured! The next easiest thing is to try another/new power supply; it's the one sure thing that can be reused. If you can easily eliminate the GPU then do that next.

    If those don't help then your options suddenly get costly. Pulling a CPU gets a bit messy; I'd seriously consider throwing it out and getting a new upgrade kit: CPU/mobo/GPU and probably RAM too.
  • evanh Posts: 15,214
    Oh, try resocketing the DIMMs.
  • I tried installing Windows 10, and that came up with an error, and would not install. So, yes my box has a hardware problem.

    Putting that debugging session off for another day, I dug out my old Gateway tower box and installed Windows 10; so far no problems whatsoever.

    Now I am starting to have this feeling that what I really need is a Surface Pro 4, the one with i5 and 8GB of mem. I think I will start saving my pennies, and get one as soon as possible. Done messing around with this stuff.

    Ray
  • Heater. Posts: 21,230
    A very good choice Ray, at least you can put Linux on a Surface Pro when Win 10 gets too much :)

  • evanh Posts: 15,214
    Resocketing the DIMMs at least is easy to do and totally free.
  • I will probably do the debugging of the box, maybe next week. I will probably pull everything off the motherboard, then reseat everything and see if that makes a difference, but I will not trust that the box is fully functional.

    So far I have installed Windows 10 Pro on two machines now, and they are working as expected; not sure why everybody else here is having problems. I do have one machine that has Windows 7 Home on it, because I use WMC a lot. Windows 7 stays until I can find a substitute that will work on a Windows 10 machine. After that occurs, then so long Windows 7.

    As for Linux, the only one I will be using is what comes with the Raspberry Pi, I think I have had it, for the time being, with the desktop versions.

    Ray
  • evanh Posts: 15,214
    Resocketing the CPU is not advised just for the sake of it. The thermal paste is considered as single use, you should reapply a fresh coating if lifting the heatsink.

    But do make sure the clips are holding the heatsink firmly in place.
  • Heater. wrote: »
    There are probably other tests we can dig up.

    I favor 'stressapptest' (https://github.com/stressapptest/stressapptest , but the Debian package will be a tad quicker to install) ever since I was able to determine a stable configuration on a Tyan server board. Memtest was of no help (some weird memory controller / bus / timing issue I suppose).