The machines, now inaccessible, are arguably more secure than before.

  • Mike Knell@blat.at
    4 months ago

    @sailor_sega_saturn And given enough time and enough scale even the most improbably weird things will eventually happen. Update file corrupted by a storage controller that flips a couple of bits at random after every 720 hours of uptime but only if it’s 23.682 seconds after the hour? Weirder shit has happened.

    • YourNetworkIsHaunted@awful.systems
      4 months ago

      I once helped one of my company’s customers troubleshoot an issue that had seen the same ridiculous edge case error happen three times over the course of a few years. At one point the actual sustaining developer we worked with was able to narrow down a specific bit that was getting flipped somehow, and pitched that cosmic radiation was a plausible explanation given how rarely this kind of thing impacted other customers.

      It was at this point that we remembered that the customer was either a university with a nuclear physics lab or a hospital with a nuclear medicine program (can’t remember now, ironically enough) that the server rack lived adjacent to.

    • flere-imsaho@awful.systems
      4 months ago

      some twenty-four years ago i managed, amongst others, a company’s samba and print server (that was at the time when all the company’s servers were beige boxes with less memory and disk than the laptop i’m using to type this – and still they served a few hundred employees).

      the machine developed a strange custom of hard-resetting itself, which we initially tracked to specific files being sent for printing; the behaviour was fully reproducible.

      as it happened, it was a hardware fault somewhere between the mainboard and the integrated SCSI card; installing a separate SCSI card and reconnecting the disks and backup tape device fixed the problem. (i did not have the budget for a new server, no.)

      establishing the actual cause took me fucking weeks.