
Author Topic: LuLa Server down for almost 24hrs!  (Read 10466 times)

Rob C

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 24074
Re: LuLa Server down for almost 24hrs!
« Reply #20 on: December 03, 2010, 05:22:49 pm »

It's sun spots.

My own website needs me to go to Weebly to access it in case I want to make alterations or check traffic: I couldn't get in.

Worse, with this one down, there will probably have been no traffic!

;-)   or, alternatively, ;-(

Nonetheless, thanks to you guys in Mission Control for getting us out of the warp.

Rob C

Mark D Segal

  • Contributor
  • Sr. Member
  • *
  • Offline Offline
  • Posts: 12512
    • http://www.markdsegal.com
Re: LuLa Server down for almost 24hrs!
« Reply #21 on: December 03, 2010, 05:25:45 pm »

It's sun spots.


Aw shucks, ya mean there's a scientific explanation? That spoils all the fun.
Logged
Mark D Segal (formerly MarkDS)
Author: "Scanning Workflows with SilverFast 8....."

bobtowery

  • Antarctica 2016
  • Full Member
  • *
  • Offline Offline
  • Posts: 244
    • http://bobtowery.typepad.com
Re: LuLa Server down for almost 24hrs!
« Reply #22 on: December 03, 2010, 06:25:26 pm »

Well, there was this picture in one of the WikiLeaks documents.  But really, I think it is just a case of mistaken identity?

Logged
Bob

michael

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 5084
Re: LuLa Server down for almost 24hrs!
« Reply #23 on: December 03, 2010, 07:03:48 pm »

The fact that a lot of other sites had problems yesterday may indeed be related. Our server is maintained at a large hosting farm in Texas (The Planet). They host thousands of servers and sites, and the problems may have manifested across a lot of machines.

Vincent and Mark are both catching up on their sleep, so I won't have a full post mortem until later in the weekend. If I learn anything relevant I'll post it here.

Michael
Logged

Slobodan Blagojevic

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 18090
  • When everyone thinks the same, nobody thinks
    • My website
Re: LuLa Server down for almost 24hrs!
« Reply #24 on: December 03, 2010, 07:26:10 pm »

All seems pretty much back to normal now. If readers find any problems, please let us know.

My avatar is missing and it seems impossible to attach a new one. Also noticed some other members' avatars missing too.

Eric Myrvaagnes

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 22814
  • http://myrvaagnes.com
    • http://myrvaagnes.com
Re: LuLa Server down for almost 24hrs!
« Reply #25 on: December 03, 2010, 09:20:37 pm »

My avatar is missing and it seems impossible to attach a new one. Also noticed some other members' avatars missing too.
So it was all a plot by the folks at Wikileaks to steal LuLa avatars!
If yours is missing, watch for it to appear soon in the NY Times.
Logged
-Eric Myrvaagnes (visit my website: http://myrvaagnes.com)

Justan

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 1928
    • Justan-Elk.com
Re: LuLa Server down for almost 24hrs!
« Reply #26 on: December 04, 2010, 10:19:52 am »

Time to start planning our new recovery strategy for the next time. With computers, there's always a next time.

Michael



If you were interested in installing fail-over capability, there are a number of ways to mirror SQL databases and any related software. The goal would be to have a 2nd site, managed by a different group. The on-line site would send regular or real-time updates to the 2nd site. In the event that the primary site goes off line, all that’s needed is to change your DNS records so that they point to the 2nd site, and you’re back up and running in a few minutes.

It’s not the most trivial of tasks to establish, but it offers many advantages and isn't all that expensive to implement or maintain. Do a Google search on “how to mirror sql servers.”
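
A rough sketch of the decision behind that DNS switch, not a full fail-over setup: probe the primary a few times and only name the second site when every probe fails. The URL, the host names, `choose_active_site`, `PROBE`, and `RETRY_DELAY` are all hypothetical, and the actual DNS change depends on your DNS provider, so it is left out.

```shell
#!/bin/sh
# Hypothetical fail-over check; every name and URL here is an assumption.
# PROBE is the command used to test the primary; curl -f fails on HTTP errors.
PROBE=${PROBE:-"curl -fsS --max-time 10 -o /dev/null"}
RETRY_DELAY=${RETRY_DELAY:-5}   # seconds to wait after a failed probe

# choose_active_site PRIMARY_URL PRIMARY_HOST SECONDARY_HOST
# Prints the host that DNS should point at.
choose_active_site() {
  tries=0
  while [ "$tries" -lt 3 ]; do
    if $PROBE "$1" >/dev/null 2>&1; then
      echo "$2"          # primary answered, keep DNS where it is
      return 0
    fi
    tries=$((tries + 1))
    sleep "$RETRY_DELAY"
  done
  echo "$3"              # three failed probes in a row, use the 2nd site
}

# Example (hypothetical hosts):
#   choose_active_site http://www.example.com/ www.example.com mirror.example.net
```

Keep the DNS TTL short if you plan to switch this way; cached records can keep pointing visitors at the old address until they expire.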

Christopher Sanderson

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 2694
    • photopxl.com
Re: LuLa Server down for almost 24hrs!
« Reply #27 on: December 04, 2010, 10:56:02 am »

Yes, this is already 'in the works' - but thanks for the suggestion!

mguertin

  • Guest
Re: LuLa Server down for almost 24hrs!
« Reply #28 on: December 04, 2010, 04:02:45 pm »

My avatar is missing and it seems impossible to attach a new one. Also noticed some other members' avatars missing too.

I'm not sure why some went missing but I will further investigate this.  They all appear to exist so it might be a permissions problem.  You should now be able to upload avatars again, there was a missing PHP module that is now installed.
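
For anyone chasing the same symptom, a minimal sketch of the two checks this suggests. It assumes the web server reads the avatar files through the "other" permission bits, and `list_unreadable` and the directory path are made-up names. The post does not say which PHP module was missing, so the module check is only shown generically.

```shell
#!/bin/sh
# list_unreadable DIR: print regular files under DIR without the world-read bit.
# Assumption: the web server is neither owner nor group of the files.
list_unreadable() {
  find "$1" -type f ! -perm -004
}

# Example (hypothetical path):
#   list_unreadable /var/www/forum/avatars
# Loaded PHP modules can be listed with:
#   php -m
```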
Logged

ErikKaffehr

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 11311
    • Echophoto
Re: LuLa Server down for almost 24hrs!
« Reply #29 on: December 04, 2010, 04:38:48 pm »

Congratulations on handling an unexpected problem in reasonable time!

Best regards
Erik

Ps. I have worked with a Mr. Merik Guertin of L3 Maps, no relative of yours?



I'm not sure why some went missing but I will further investigate this.  They all appear to exist so it might be a permissions problem.  You should now be able to upload avatars again, there was a missing PHP module that is now installed.
Logged
Erik Kaffehr
 

mguertin

  • Guest
Re: LuLa Server down for almost 24hrs!
« Reply #30 on: December 04, 2010, 04:41:35 pm »

Congratulations on handling an unexpected problem in reasonable time!

Best regards
Erik

Ps. I have worked with a Mr. Merik Guertin of L3 Maps, no relative of yours?


Thanks Erik.  Nope, no relation.

Mark
Logged

K.C.

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 671
Re: LuLa Server down for almost 24hrs!
« Reply #31 on: December 04, 2010, 06:15:30 pm »

As an IT professional for 25+ years I'm trying to understand why a site, with the level of demand this one has, is being run on a single box and maintained by a couple of guys. No matter how competent you may be that's an old school approach.

If you're using your own server in a colo then you really need to be running RAID and the colo should have another box ready to hot swap to. With all due respect, 24 hrs down time and the need for a manual rebuild is pretty amateur with the options you have available to you.

At the very least write a script and ftp it off site several times a day.

# Date stamp shared by the archive name, the email, and the upload
DATE=$(date +%Y_%m_%d)

# Dump SQL data
/usr/bin/mysqldump -uUSER -pPASS --all-databases --opt -l --result-file=/backup/mysql/mysqldump.sql

# Compress sql dump
tar zcf /backup/$DATE.tar.gz /backup/mysql

# EMAIL TO MAILBOX (done before the upload, because -DD removes the local file)
uuencode /backup/$DATE.tar.gz Some_Hosting_SQL_Dbases.$DATE.tar.gz | mail -s "Some Hosting SQL Database Backup" recipient@domain.com

# UPLOAD TO FTP (DD deletes on successful upload)
ncftpput -f ftplogin.cfg -DD /remote_path /backup/$DATE.tar.gz
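
One cheap addition to a script like this, sketched under assumptions: check that the archive is non-empty and readable before it goes off site, then let cron run the whole thing several times a day. `verify_archive` and the script path in the cron line are hypothetical names.

```shell
#!/bin/sh
# verify_archive FILE: succeed only if FILE is a non-empty gzip'd tar
# whose table of contents can be listed.
verify_archive() {
  [ -s "$1" ] && tar tzf "$1" >/dev/null 2>&1
}

# Example: only upload when the archive checks out.
#   verify_archive /backup/site.tar.gz && ncftpput -f ftplogin.cfg /remote_path /backup/site.tar.gz
#
# Hypothetical crontab entry, every six hours:
#   0 */6 * * * /usr/local/bin/offsite-backup.sh
```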


Logged

mguertin

  • Guest
Re: LuLa Server down for almost 24hrs!
« Reply #32 on: December 04, 2010, 07:34:36 pm »

As an IT professional for 25+ years I'm trying to understand why a site, with the level of demand this one has, is being run on a single box and maintained by a couple of guys. No matter how competent you may be that's an old school approach.

If you're using your own server in a colo then you really need to be running RAID and the colo should have another box ready to hot swap to. With all due respect, 24 hrs down time and the need for a manual rebuild is pretty amateur with the options you have available to you.

<snip>



K.C.:

The length of downtime had nothing to do with us not having backups -- in fact we had backups right down to the last minute we were online.  It had everything to do with hardware failure and response times in first diagnosing and then rectifying the problem at the DC end of the equation, and I can assure you that we are taking this up with our provider.  Also I'm not really sure where you get the idea that we performed a manual rebuild of the server or data.  As stated we had full and complete backups of anything remotely considered essential right up to the last minute the old server was online to work from and we restored from these backups. 

A very small portion of the actual downtime was required for the actual data restore.  We have offsite backups as well, but had we made the decision to take that route it's likely that it would have ultimately taken longer than it did to wait the (unacceptably long) time it took the DC to get its act together and get us back onto functional hardware.  RAID would not have helped us in this situation -- this was not a hard drive failure -- and in fact had we had RAID to deal with for the hardware changeover it likely would have slowed the process down yet again.  I'm also not really sure how you think that having more people maintaining the site and server would have sped up the process (at least on our end of things); you can't restore data if you have nothing to restore it to...

Lastly, I have to say that while emailing uuencoded data is an interesting backup approach, for a dataset the size we are talking about here it wouldn't even be remotely feasible.
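
To put a rough number on the size problem: classic uuencode turns every 45 input bytes into a 62-byte line (one length character, 60 encoded characters, one newline), so the mail body is about 38% bigger than the archive. The helper below, `uuencoded_size` (a made-up name), does that arithmetic and ignores the short `begin` and `end` lines.

```shell
#!/bin/sh
# uuencoded_size BYTES: bytes taken by the uuencoded data lines for an
# input of BYTES bytes (header and trailer lines not counted).
uuencoded_size() {
  full=$(( $1 / 45 ))     # complete 45-byte input chunks
  rest=$(( $1 % 45 ))     # bytes left over for a final short line
  size=$(( full * 62 ))
  if [ "$rest" -gt 0 ]; then
    # length char + newline, plus 4 output chars per started 3-byte group
    size=$(( size + 2 + (rest + 2) / 3 * 4 ))
  fi
  echo "$size"
}

# Example: uuencoded_size "$(wc -c < /backup/site.tar.gz)"
```

By that arithmetic a 1 GiB archive becomes roughly 1.38 GiB of mail, before any attachment limit on the receiving server comes into play.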

Rest assured there are plans underway that will make sure this type of a failure won't require this kind of turnaround time again.

Mark
Logged

K.C.

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 671
Re: LuLa Server down for almost 24hrs!
« Reply #33 on: December 04, 2010, 07:51:38 pm »

Mark, you describe a much different picture than the thread led me to believe was the case.

Sounds like a familiar scenario. You don't realize the competency, or lack thereof, of the people you're relying on until the worst case happens. Time for a new host/colo.

Emailing gigs of data unsecured is common. You dump tables in random order. Nobody sniffing it can get enough info at once for it to be useful.

Logged

mguertin

  • Guest
Re: LuLa Server down for almost 24hrs!
« Reply #34 on: December 04, 2010, 08:56:26 pm »

It's not the unsecured data part of the emailing that bothers me as much as the size of said emails ;)
Logged

Christoph C. Feldhaim

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 2509
  • There is no rule! No - wait ...
Re: LuLa Server down for almost 24hrs!
« Reply #35 on: December 05, 2010, 07:00:21 am »

I'd put such a server in a virtualized movable environment, like vmware or Xen.
Just my 0.02

Justan

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 1928
    • Justan-Elk.com
Re: LuLa Server down for almost 24hrs!
« Reply #36 on: December 05, 2010, 08:54:52 am »

my idea is: how to automatically and constantly backup files in a server with zero intervention (automatized) from a HD?

And I'd like an automatized backup that runs all that in case of crash. Is that possible? (I'm not tech as you can see)

Look into the osql utility program or its newer incarnation, the sqlcmd utility.

There are some FTP programs that will do as you wish and which have their own scheduler, or you can use the ftp command line and the system scheduler, at least on Windows boxes.
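
On a Unix-style box the same idea might look like the sketch below: generate the commands for the command-line ftp client and let the system scheduler (cron there, Task Scheduler on Windows) run the script. `ftp_batch`, the host, and the paths are hypothetical, and the sketch assumes the login is supplied separately, for example through `~/.netrc`.

```shell
#!/bin/sh
# ftp_batch LOCAL_FILE REMOTE_DIR: print the commands a command-line ftp
# client should read from standard input to upload LOCAL_FILE.
ftp_batch() {
  printf 'binary\ncd %s\nput %s %s\nbye\n' "$2" "$1" "$(basename "$1")"
}

# Example (hypothetical host and paths):
#   ftp_batch /backup/site.tar.gz /remote_path | ftp ftp.example.com
```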

Justan

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 1928
    • Justan-Elk.com
Re: LuLa Server down for almost 24hrs!
« Reply #37 on: December 05, 2010, 09:17:15 am »


The length of downtime had nothing to do with us not having backups -- in fact we had backups right down to the last minute we were online.  It had everything to do with hardware failure and response times in first diagnosing and then rectifying the problem at the DC end of the equation, and I can assure you that we are taking this up with our provider.  [snip]

...you can't restore data if you have nothing to restore it to...


Disaster recovery is a thorny topic and a troublesome thing to implement. Few will spend the time or $$ to implement a fail-over system due to cost and complexity. It takes this kind of problem to motivate and show the value of a fail-over solution.

This appears to be a classic case where it takes a series of failures to identify the shortcomings of the infrastructure (the data center). It sounds like the core issue is that the data center was not quick to identify or resolve their hardware problems ( :o ) and from what you wrote, didn't have a ready solution ( :o :o). And added to that, the site’s management did not plan for or expect the data center to let them down. ( :o )

The good news is that the backups worked ( HURRAY ;D ;D ;D) so little or nothing was lost but time, and gave the site’s management the opportunity to see where the recovery scheme could be improved.

Bravo on the diligence and getting the site up and running in short order!

Craig Arnold

  • Full Member
  • ***
  • Offline Offline
  • Posts: 219
    • Craig Arnold's Photography
Re: LuLa Server down for almost 24hrs!
« Reply #38 on: December 05, 2010, 09:17:56 am »

I'd put such a server in a virtualized movable environment, like vmware or Xen.
Just my 0.02

Yup, downtime would likely have been zero (if the hardware was failing with a detectable failure, just vMotion it automatically) or down to a few minutes at most if you needed to spin up a new instance.

Check out something like the Rackspace Cloud web hosting solutions (there are of course other providers too - but starting with Rackspace gives you an idea of what is possible). No single point of failure anywhere. Essentially infinitely scalable too.
Logged