Thread: Network problems at GNAX datacenter

  1. #11
    andychev is offline Master Glow Jedi
    Join Date
    Apr 2005
    Location
    Chester, UK
    Posts
    150

    Quote Originally Posted by jmarcv View Post
    I am not sure that it is even a GNAX problem. Which server was down 5 hours?
    Sorry, I should have made that clearer. I don't think any GlowHost servers were down for that period of time; however, other servers in GNAX were reported to be down for 8 hours.

  2. #12
    bdominick is offline I am a Glowru!
    Join Date
    Dec 2007
    Posts
    76

    Quote Originally Posted by jmarcv View Post
    Apparently, it was a battery backup unit that went bad is what I last heard.
    How does a battery backup problem cause an outage? That has to be a secondary problem.

    Quote Originally Posted by jmarcv View Post
    I am not sure that it is even a GNAX problem.
    Here's what GNAX says about their power backup system:

    multiple power generators on a redundant grid with two separate feeds into the building.
    How on earth does that kind of system lose power? A Navy SEAL team should have trouble taking that out! So what if a backup battery goes bad -- they're supposed to have redundant AC feeds into the building and multiple generators...
    Brian Dominick
    | WebRoot Solutions
    | Server Administrator, Software Developer
    http://mywebroot.com

  3. #13
    andychev is offline Master Glow Jedi
    Join Date
    Apr 2005
    Location
    Chester, UK
    Posts
    150

    Quote Originally Posted by bdominick View Post
    How does a battery backup problem cause an outage? That has to be a secondary problem.

    Here's what GNAX says about their power backup system:

    How on earth does that kind of system lose power? A Navy SEAL team should have trouble taking that out! So what if a backup battery goes bad -- they're supposed to have redundant AC feeds into the building and multiple generators...

    GNAX have released an official explanation for this; I'm sure GlowHost will update shortly. To cut a very, very long story short, it was a series of freak events. A main utility company supplying the datacentre failed, causing a complete outage; the backup generators kicked in and most things rebooted. However, both battery systems (the main and the redundant) failed, causing the complete failure. Once things were running again, multiple hardware failures and router problems caused a catalogue of problems which had to be fixed.

    That is a very, very short version of what happened!

  4. #14
    rlhanson is offline Master Glow Jedi
    Join Date
    Aug 2007
    Location
    Chapman, Kansas
    Posts
    531

    bdominick -
    Just wanted to add my two cents about dealing with your clients - especially as a fairly new provider:

    Be confident that you are offering a quality service.

    Be of the opinion that mechanical devices can and will fail, regardless of the precautions we have in place.

    If your clients go elsewhere, chances are they will return - especially if you have been forthright about the issues they originally experienced.

    I've gone as far as "helping" my clients find another qualified service provider and have yet to lose a client when I've sent them the information. I just keep doing what I do - I'm confident that I am providing an awesome service.
    Thank you,
    Lynne Hanson
    RL Hanson-Online

  5. #15
    bdominick is offline I am a Glowru!
    Join Date
    Dec 2007
    Posts
    76

    Thanks a lot, Lynne. That's really helpful advice!
    Brian Dominick
    | WebRoot Solutions
    | Server Administrator, Software Developer
    http://mywebroot.com

  6. #16
    bdominick is offline I am a Glowru!
    Join Date
    Dec 2007
    Posts
    76

    Quote Originally Posted by andychev View Post
    GNAX have released an official explanation for this; I'm sure GlowHost will update shortly.
    Hey Andy, where do you find notices from GNAX? I'd like to see the long version of this explanation.

    From the short version, it sounds like GNAX has been deceiving its customers, which is very disconcerting. If you have 2 backups and they both fail, you actually had zero backups. You can't promote backup systems that aren't working. And how did the two grid connections both fail in the first place? Probably because there was only one...
    Brian Dominick
    | WebRoot Solutions
    | Server Administrator, Software Developer
    http://mywebroot.com

  7. #17
    jmarcv is offline Cranky Coder
    Join Date
    Jan 2005
    Posts
    354

    Quote Originally Posted by bdominick View Post
    You can't promote backup systems that aren't working. And how did the two grid connections both fail in the first place? Probably because there was only one...
    GNAX has a forum.

    I visited the datacenter, and it is pretty damn impressive! The guys I met are all real people.
    However, if you read the long version, you will see the long comedy of errors.

    I can relate to their issue; it was a great embarrassment for them. When I visited, they were so proud that they had it all figured out.

    I have been working on a failover system for a client with 17 servers. I can relate. The only way to test it is to flip the switch, which is hard to do when you are live. So you wait and pray. Then judgement day comes, and you find all the loopholes. And then you patch them and lick your wounds.

    However, when both battery units failed, they fired up the backup generator and much of the power was back online.
    But... when a computer goes down unexpectedly, bad things can happen to the file system, as I said, delaying the boot. When it is bad enough, your OS may get hosed.
    When the power fluctuates on and off from load spikes, routers can lose configuration settings. Suddenly you have power but data is not getting routed.

    I am a moderator and a GlowHost client. I've done my share of ragging on GNAX. But I think we will see a much more stable DC after this. They had every employee called in to deal with this. From my experience with hardware, this was probably pretty ugly for them.
