PowerBlogs.Com Development

Maintenance (News)

There's going to be some minor downtime this weekend on all servers as we upgrade our webservers. The downtime should be minimal. I'll post more here as the times firm up.

Posted by Chris on 07.07.2006. (0 Comments)
Server down

One of the powerblogs servers got severely overloaded and I had to reboot it. The database isn't coming up for some reason; I'm fixing that now.

I'm very sorry for the problem, and I'll post updates here.

Update: Ok, the database is working again and things should be back to normal, finally. More in a bit.

Posted by Chris on 06.08.2006. (0 Comments)
One of the powerblogs servers is down (News)

We're very sorry for the interruption of service to those who are affected, and I'll be posting more as I know it here.

Update: We've got the server back up. Things are looking reasonably normal so far, but I'm going to do more investigation. We will probably need to bring the server offline for maintenance some time late tonight. I'll post more as I know it.

I'm very sorry for the trouble that this has caused people. We're investigating the cause to make sure that it doesn't happen again. There's no reason, so far, to suspect any data loss from the server, but would everyone please just take a quick look to make sure that everything appears to be intact? We've got all of the livebackups (plus the last-ditch mailing list backup), so if anything is missing we should be able to restore it promptly. Thank you, and once again we apologize for the trouble.

Posted by Chris on 06.01.2006. (0 Comments)
Server Downtime (News)

We may need to bring one of the powerblogs servers down for maintenance tonight. I'll update this post with when we start and when we finish.

Update: Ok, we're about to take the server down for maintenance. Hopefully, the downtime should only take about an hour.

Update: We're deferring the downtime until tomorrow, to give us some time to analyze what's going on further and minimize the downtime we need to perform.

Posted by Chris on 05.25.2006. (0 Comments)
Email Issues (News)

Just a quick note: at the end of last week we were having some problems with email. Everything should have been resolved and eventually received/sent, but if you emailed us or used the contact form and didn't receive a response, we would be very grateful if you would re-send your email or just drop us a note that you haven't heard back from us. Thanks!

Posted by Chris on 05.08.2006. (0 Comments)
Scheduled downtime (News)

We're going to need to bring one of the Powerblogs servers down tonight for a few minutes for some maintenance. This will probably happen at around 10pm tonight. The interruption should not last long, but we apologize for the inconvenience. I'll update this post when we start and finish.

Update: We're pushing the maintenance to Sunday. I'll update with more details tomorrow.

Update: I'll be bringing the server down for maintenance at 9pm tonight. Hopefully it shouldn't be down long. We apologize for the inconvenience.

Update: Ok, things look good now, and the server is back up. We apologize for the inconvenience.

Posted by Chris on 04.22.2006. (0 Comments)
Mailing Lists cleaned up (Bug Fixes)

Some of the mailing lists were improperly configured. I've cleaned all of that up, and they should all be working perfectly now.

Posted by Chris on 04.22.2006. (0 Comments)
Syndication Preferences fixed (Bug Fixes)

There was a bug in the syndication preferences where the email labels were switched — this is now fixed.

Posted by Chris on 04.22.2006. (0 Comments)
Server overloaded this morning

This morning one of the Powerblogs servers was overloaded and responding very slowly to web requests when it was responding at all. At first I suspected network trouble, but it turned out that Gaijin Biker got linked on Fark with the description, "Buddhist temple in Japan 'devoted entirely to the worship of women's breasts' (with pic.)" (here's that post, if you're curious). So, about 75,000 people came over to take a look at it, most of them in the morning and early afternoon. If you've looked at his site, you'll notice that it's got a lot of graphics, and of course there was the advertised picture, so those 75,000 people drove about 3.3 million HTTP requests on the web server.

To put this in perspective:

root@server2:/var/log/apache# ls -1sShr *.access.log | tail -5 
 25M themoderatevoice.access.log
 36M volokh.access.log
 39M deanesmay.access.log
458M ridingsun.access.log
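
For a sense of scale, the figures quoted above can be worked through in a few lines. This is only back-of-the-envelope arithmetic on the approximate numbers in this post (the visitor and request counts are rounded, and the log sizes are taken straight from the listing), not a measurement.

```python
# Approximate figures quoted in the post.
visitors = 75_000
http_requests = 3_300_000

# Average number of HTTP requests each visitor generated.
requests_per_visitor = http_requests / visitors

# Access log sizes from the listing above, in the units ls reported (M).
log_sizes = {
    "themoderatevoice": 25,
    "volokh": 36,
    "deanesmay": 39,
    "ridingsun": 458,
}

# Compare the busiest log with the next largest one.
busiest = max(log_sizes, key=log_sizes.get)
runner_up_size = sorted(log_sizes.values())[-2]
ratio = log_sizes[busiest] / runner_up_size

print(f"{requests_per_visitor:.0f} requests per visitor")
print(f"{busiest} log is {ratio:.1f}x the next largest")
```

In other words, each visitor pulled in dozens of files, and the one linked blog produced roughly an order of magnitude more log data than the next busiest blog on the server.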

To make a long story short, while the server was tuned to take instalanches well, this was way beyond an instalanche, and the server wasn't properly configured to handle it as well as it could. I re-tuned the webserver configs, and things were fine again. Now all of the powerblogs servers should be able to handle these sorts of loads without slowing down. (Unfortunately, webserver tuning is still something of an art: setting the parameters too far can cause your webserver to simply fail entirely and stop responding to requests when it gets heavily loaded, and what the parameters should be depends on your entire system, not just the web server.)

So, the servers should not get overloaded by traffic for the foreseeable future. However, if you hear that Congress is about to pass a law requiring all able-bodied adults to go view your pictures of breasts commentary at least 5 times a day, would you please email us with a heads-up?

Posted by Chris on 03.28.2006. (0 Comments)
Live Backups now active (New Features)

It's taken longer than I had hoped, but the live backing up of posts in their source form has now been pushed live. Now, every time a post gets written, a copy of the text gets sent to an offsite-backup immediately before the post is written locally. As long as the server can reach the internet, then, published posts will be backed up. The current Live Backups offsite-backup server is different from the offsite backup server which has been in place since the beginning of Powerblogs. Soon we'll get that server configured for Live Backups as well, and then we'll have two offsite backups of posts separated by a few hundred miles.

This has been tested and should not have a noticeable impact on speed, but it hasn't been tested live (obviously), so during our initial experiences with it going live, there may be some delays in responsiveness in posting (on the order of about 30 seconds at the utmost maximum). If there are, please bear with us and we'll have those delays resolved quickly (but do let us know).

Update: Of course, there were timing issues. Unfortunately, they ran into the minutes, not seconds, so I've temporarily disabled the Live Backup system. I'm going to modify it to run the backup in the background so it has no chance of slowing down posting, and push it live again. With luck, that will be tonight.

Update: This is attempt two. Now the backup process runs in parallel to the posting process, so there should be no way that the Live Backups can possibly interfere with posting. It works great in testing, so I've pushed it live again. If you have any issues posting, please let us know immediately. Thanks!

Update: Ok, attempt two didn't work either. It was fast enough, but apparently sometimes interfered with the front page being rewritten after posts are published. I disabled LiveBackups again. The next version uses a completely different approach where the user interface process writes to a local process which then turns around and makes the foreign connection. The local connection will always be instant, since the same computer that you're running on can't disappear out from under you (or get lagged because some construction worker took out an internet line with a backhoe and traffic is being routed around it) unlike foreign computers, so this one should be downright foolproof. I hope to have it up tonight.

Update: Ok, here goes attempt three. As I mentioned, this one has the user interface process dumping the data off to another local process which can wait on the remote backup server without stalling the user interface or the writing of the blog pages. I've tested it very thoroughly, which means that it might work. ;)
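
The hand-off idea in attempt three can be illustrated with a small sketch. This is not the Powerblogs code: it swaps in a local spool directory as the hand-off mechanism, and the function names and the `send` callable are hypothetical. The point it shows is that the posting side only ever does a fast local write, while a separate forwarder is the only thing that waits on the remote server.

```python
import json
import os
import uuid
from pathlib import Path


def enqueue_post(spool_dir: Path, post: dict) -> Path:
    """Called by the posting side: write the post locally and return at once."""
    spool_dir.mkdir(parents=True, exist_ok=True)
    final = spool_dir / f"{uuid.uuid4().hex}.json"
    tmp = final.with_suffix(".tmp")
    tmp.write_text(json.dumps(post))
    # Rename into place so the forwarder never sees a half-written file.
    os.replace(tmp, final)
    return final


def forward_pending(spool_dir: Path, send) -> int:
    """Called by the separate forwarder: upload spooled posts, in no particular order.

    `send` is a hypothetical callable that uploads one post to the remote
    backup server and raises OSError if the server cannot be reached.
    """
    sent = 0
    for path in sorted(spool_dir.glob("*.json")):
        post = json.loads(path.read_text())
        try:
            send(post)
        except OSError:
            # Remote unreachable: leave the remaining files for the next run.
            break
        path.unlink()
        sent += 1
    return sent
```

If the remote server is slow or unreachable, posts simply wait in the spool directory until the next forwarding run; posting itself is never blocked.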

Posted by Chris on 03.24.2006. (0 Comments)
Possible downtime tonight (News)

We might need to take one of the Powerblogs servers down temporarily for some emergency maintenance tonight at around 10pm EST. (It's the original Powerblogs server, not the newest server that experienced problems two months ago.) The downtime shouldn't last more than about an hour, and hopefully less than that. I'll update this post with more information as it becomes available.

Update: We're definitely going to do it tonight. We'll probably take it down between 10pm and 10:30pm. With luck, it won't be down for more than an hour. We apologize for the short notice, but we need to fix the problem which has just surfaced before it gets worse and forces us to deal with it. Thank you for your understanding, and we apologize for the inconvenience.

Update: Ok, we've finished with the down time for now. We'll be keeping an eye on the issue, and it may require additional downtime, but for the moment, things look good. Thank you for your patience, and again we apologize for the inconvenience.

Posted by Chris on 02.13.2006. (0 Comments)
Reports coming back online (News)

I've generated the reports again today, finally, and things are looking good. I've just put them on an automatic schedule for every 8 hours, so they should be generated regularly again from now on. I apologize to everyone for the time without them.

Posted by Chris on 01.17.2006. (0 Comments)
Improved backups (News)

Since there is still a lot of work to be done (as well as some theoretical issues to overcome) with redundant servers, there obviously needs to be significantly improved backups immediately. Here's what I've come up with so far:

  1. I've already set the off-site backups to take place every 4 hours. Unfortunately, the problem with the server that went down happened about 23 hours into the 24 hour backup cycle (i.e. at the worst possible time). This will immediately drastically cut down on the risk period. (This applies to all Powerblogs servers.)

  2. The new server is more powerful than the server which went down, and in particular has two identically sized hard drives (larger than on the previous server, too). I'm going to set up a program to mirror the entire server onto its second hard drive, so if something goes wrong with the primary hard drive, we can immediately reboot to the secondary hard drive, minimizing downtime. I'm thinking that to start we'll sync it every hour. Tom, the Powerblogs Idea Rat (his semi-official title), is doing research into whether we can use realtime filesystem change information to bring the syncing to something like every minute. (This will only apply to the new server.)

  3. I had forgotten that I purposely enabled the mailing list archives with their use as a last-ditch backup in mind. Unfortunately, we ended up having to use the last-ditch safety net (not something that should ever happen), so I want to improve this concept. I'm going to set up an off-site mail server and modify the powerblogs software to email the full post information in machine-readable form (suitable for use in automatic restoring), before it does anything else, to the off-site backup. This will make the off-site backups for the posts genuinely real-time, or at least very, very, very close to it. (This will apply to all servers.)

  4. When we get the old server back (and test it thoroughly), until it's a redundant server sibling with the newest server, I'll keep it around with a full install and configuration of the powerblogs software, ready to take on the role of another server if anything goes wrong.
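
As a rough illustration of item 3, here is what emailing a post in machine-readable form could look like. This is a sketch under assumptions, not the format Powerblogs uses: JSON as the encoding, the `blog` and `id` fields, the subject line, and the function names are all hypothetical.

```python
import json
from email.message import EmailMessage


def build_backup_email(post: dict, sender: str, recipient: str) -> EmailMessage:
    """Wrap a post in an email whose body is JSON, so a program can restore it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Hypothetical subject format: enough to find a post by blog and id.
    msg["Subject"] = f"post-backup {post['blog']} {post['id']}"
    msg.set_content(json.dumps(post, sort_keys=True))
    return msg


def restore_post(msg: EmailMessage) -> dict:
    """Recover the original post from a backup email."""
    return json.loads(msg.get_content())
```

Actually sending the message (for example with `smtplib`) is left out; the sketch only shows that the body round-trips, which is what makes automatic restoring possible.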

#1 is done already. #2 will not take long to set up, though the downside is that it will require scheduled downtime in order to test. #3 will take a little longer to implement, but it won't take very long. I'm guessing that it will take about a day or two to get the first version of the full-system syncing to the second hard drive. I should have the email code and email receptacle up in about a week. (Please note, since I screwed up with this before, my estimates are not guarantees, and please assume that anything that I talk about is not implemented until I explicitly say that it's finished and live.)
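
To make the effect of item 1 concrete, the exposure to data loss can be worked out from the backup interval alone. The sketch below assumes a failure is equally likely at any point in the cycle, which is a simplification; the function name is made up for illustration.

```python
def exposure_hours(backup_interval_hours: float) -> dict:
    """Hours of posts at risk if the server fails between periodic backups."""
    return {
        # Failure just before the next backup runs.
        "worst_case": backup_interval_hours,
        # Failure at a uniformly random point in the cycle.
        "average": backup_interval_hours / 2,
    }


old = exposure_hours(24)  # the previous daily cycle
new = exposure_hours(4)   # the new 4-hour cycle

print(old)
print(new)
```

The failure described above, about 23 hours into a 24-hour cycle, was very close to the old worst case; with 4-hour backups the worst case is 4 hours.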

The long-term plan is for redundant peer-to-peer servers which will truly have no single points of failure and can operate both together with load balancing and realtime syncing, and independently with resyncing. There are still a few practical problems with this that need addressing, and a few theoretical ones, but I think that it's doable and will only be a few months away.

Comments and suggestions about these backup plans would be appreciated. One of the problems that this has exposed is that Powerblogs operates on pretty thin margins (given the cost of bandwidth, the bandwidth that accounts come with, the cost of disk space, the size of the chunks that we have to buy the stuff in to get good prices, and the infrastructure to do development), which makes reliability enhancements like spare servers and RAID for primary storage somewhere between difficult and not doable. Now, Powerblogs has been especially unlucky (not counting the very brief outage a few weeks ago when an errant program filled up the hard drive; ironically, that would have taken out even redundant servers, since the 100GB file would have been replicated to them both), but bad luck can be overcome with money. I'd be especially interested to know how users would feel about increased prices in order to pay for higher-end hardware (with RAID to guard against disk failure, more RAM and CPU for better performance, etc), faster connections to the off-site backup, etc. For example, if Powerblogs doubled prices, we should be able to afford to rent a dual 3.2GHz Xeon with 4 GB RAM and 4 250GB SATA drives in RAID 0+1 for 500GB of usable storage. (RAID 0+1 means fast reads and the data is always on 2 drives at any time, so that a single drive failure won't impact the system's uptime at all.) It would be blazingly fast, handle high loads very well, and be quite reliable (there's also the effect in computers that the more the computer costs, typically the higher the quality of all of the parts in it).

I would greatly appreciate it if subscribers could leave comments on whether you'd want to pay more for a better system and higher reliability, or whether you'd prefer going the less expensive route and doing the best that we can with what we have. What do you guys want? Where do you think that we should go?

Update: I've made the initial copy of the data onto the second hard drive in the server. I'll be working on setting up the scheduled syncing to the second hard drive tomorrow. Within a few days, I hope to have the post-email-backups going, and within two weeks, we might have two off-site locations that will be getting the posts emailed to them.

Update: I've been working on the syncing, and unfortunately it's not as fast as I want it yet. I'm periodically syncing it manually, and before too long I should have it scheduled to do the syncing. I'm also working on some changes to the Powerblogs code that will let the syncs go faster. (The reason for the concern over speed is that the syncing places some stress on the server, and while reader page loads should still be pretty quick, the Powerblogs interface itself will be slowed down a bit. I don't want improved reliability to come at the expense of increased frustration.)

Posted by Chris on 12.29.2005. (7 Comments)
Server Problems (News)

Something is wrong with one of the powerblogs servers, and we're currently working on figuring out what and resolving it. I'll keep you up to date as I get more information.

Update: The problem appears to be with the server's hardware. While the people at EV1's datacenter are working on fixing that, I've rented another server and will be restoring it from backups (we have a backup current as of early this morning EST). If I'm done before the other server is fixed, it will become the new primary server and the other server will become its backup when it's fixed. If the current server is fixed first, this new server will become a backup for the old server. I've gotten a lot of progress done on having the software support the live backup system that I described earlier, so when I've got two servers, I'll push to get the rest done ASAP (the log reports were a major bottleneck, but the new off-server generation method will now only take a little bit of modification to adapt to that problem).

Update: So it appears to be a hard drive problem. I've put in to have an EV1 specialist look at the machine and replace the hard drive if necessary. In the meantime, we're working to get the new server up and loaded with the backups. My guess now is that I'll have that up before the older server is fixed. For some people who set their DNS statically (rather than using a CNAME), once the new server is up we'll have to modify your DNS. (Once the new server is up, youraccount.powerblogs.com will work immediately, as will anyone whose DNS is a CNAME.) I'll keep you up to date on the progress with the new server we've ordered and are currently configuring.

Update: We've got the new server, and have done the base config. I'm working on turning it into a Powerblogs server at the moment, which is coming along well. Soon, I'll be testing it out. Once I've tested it, I'll start restoring from this morning's backups.

Update: I believe that I've finished configuring the new server as a Powerblogs server. I've been testing it, and the tests are looking good now. I'm just about ready to start restoring from this morning's backups.

Update: I've done tests, and the server config definitely appears correct. I'm uploading the backup data to the server. As soon as it's up, I'll begin restoring from it.

Update: Reloading the backup data is going well. I've got the main database up and have done some initial integrity checks and it looks good. I'm still working on getting auxiliary data (stylesheets, uploaded files, etc) up and restored. Once I'm finished with that, I have to republish the blog pages, do a final check to make sure everything is ready, and modify the DNS settings, but once that's finished we'll be good to go.

Update: Oh, fyi, I still haven't heard back from the EV1 "systems support specialist" about the old server. They were going to begin investigating "shortly", and that was 6 hours ago.

Update: I've got all of the data uploaded and am currently regenerating the blogs. After this, I need to verify that everything's correct and change the DNS, and we'll be all set.

Update: The regeneration is moving along. I've been doing some testing and things are looking good so far. It shouldn't be too long after the regeneration that we go live. I've got a few more tests to do, and then I might start changing the DNS for blogs as they finish their regeneration individually.

Update: The regeneration is going pretty well. Only volokh and whiteperil are left. If your DNS isn't working, please send an email to support@powerblogs.com and I'll investigate. I think that most people on this server are set up to automatically pick up the DNS change.

Also, I've heard from EV1 and it looks like we might be able to recover the data from the hard drive. I'll let you know more on that front as soon as they tell me.

Update: The regeneration is very nearly complete, and nearly everything is back up again. Scheduled posting is not yet enabled, however, and probably won't be until tomorrow. For various technical reasons, the restore re-created posts which had been saved for later. I need to write a quick program to cull the posts which are saved for later but were already published, and I'm too tired at the moment to trust myself to do that correctly. I should have that up some time tomorrow. I'm going to bed now, but the DNS for the remaining blogs has been set and I've done a partial regeneration on them, so their front page works and they can be used. That means that as of now, all Powerblogs blogs can be used, except for scheduling posts (you can schedule them, they just won't be published until some time tomorrow).
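
The culling step described above amounts to a set difference. The sketch below is illustrative only: it assumes each post carries an `id` that is the same in its saved-for-later and published forms, which may not be how the real program matched them, and the function name is made up.

```python
def cull_resurrected(saved_posts: list, published_posts: list) -> list:
    """Return the saved-for-later posts that have not already been published."""
    published_ids = {post["id"] for post in published_posts}
    return [post for post in saved_posts if post["id"] not in published_ids]


saved = [{"id": 1, "title": "old draft"}, {"id": 2, "title": "still pending"}]
published = [{"id": 1, "title": "old draft"}]
remaining = cull_resurrected(saved, published)
print(remaining)
```

Only the posts left in `remaining` would be handed back to the scheduled publisher, so nothing already published goes out a second time.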

Update: I'm working on fixing the problem where saved posts that were published have been "resurrected". I hope to have the redundant saved posts removed within an hour or so.

EV1 has indicated that prospects for file recovery look hopeful, but that it will take some time. They said that they should have further results for me in the late afternoon (I believe that that's Texas time). I should mention, though, that even if all of the data from the drive can be recovered, it is possible that a post could have been caught at exactly the wrong time so that it was never written to disk in the first place.

I also just realized another layer of backup which we have — the mailing list archives. You can get to them at http://powerblogs.com/pipermail/yourhostname/ — any lost posts might be there, since that's hosted on a different machine from the one that went down.

Update: The mailing list archives are looking to be a real saver from this disaster. So, if you think you may have lost any posts that were made after the last daily backup was taken, check your mailing list archive. Its URL is http://powerblogs.com/pipermail/{yourhostname}/2005-December/date.html (e.g. the archive for dev.powerblogs.com is http://powerblogs.com/pipermail/dev/2005-December/date.html).
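
For anyone checking several blogs, the archive address can be built mechanically from the pattern above. This sketch assumes that the archive name is the first label of the powerblogs.com hostname (as in the dev example) and that other months follow the same Year-Month form as 2005-December; neither is confirmed beyond that one example, and the function name is made up.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def archive_url(account_host: str, year: int, month: int) -> str:
    """Build the by-date mailing list archive URL for one blog and month."""
    # Assumption: "dev.powerblogs.com" maps to the archive named "dev".
    name = account_host.split(".")[0]
    period = f"{year}-{MONTH_NAMES[month - 1]}"
    return f"http://powerblogs.com/pipermail/{name}/{period}/date.html"


print(archive_url("dev.powerblogs.com", 2005, 12))
```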

Update: I'm nearly done culling the saved posts, and will very shortly re-enable the scheduled poster.

Update: I'm done culling the saved posts, and have re-enabled the scheduled publisher. Scheduled posts should be good from here on out.

Update: For some blogs, the trackbacks to their posts didn't restore correctly. The trackback data is not lost, and I'm working on restoring that.

I'd also like to take this opportunity, now that things are calming down, to give my heartfelt apologies that it was not handled better and more swiftly, and that it caused so much grief, frustration, and worry. I'm very sorry for that, and we're working on ways to improve our reliability.

Update: For some people, not all of the comment pages were written properly. I've triggered a republishing of everyone's comment pages. This is slowing things down a little bit, but that should be over with in an hour or two. Also, I've been tuning the system to improve performance.

Posted by Chris on 12.28.2005. (64 Comments)
Downtime (News)

Some Powerblogs users experienced a few minutes of downtime recently. The short version is that it's over and won't happen again.

The longer version is that there was a very subtle bug in the new remote report software which somehow made one user's log report consume all available disk space (the bug was subtle; the effects certainly weren't). Servers require some disk space for temporary files and such, or they can't do things like serve pages or let people log in. I'm modifying the report software to ensure that it will never write large HTML files, so this problem will never happen again.
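
One way to guarantee that a report can never grow without bound is to put a hard cap on how much the report writer is allowed to write. The following is a minimal sketch of that idea, not the actual fix in the report software; the class names and the choice of raising an exception are assumptions.

```python
class ReportTooLarge(Exception):
    """Raised when a report would exceed its size budget."""


class CappedWriter:
    """Wrap a binary file object and refuse writes beyond max_bytes."""

    def __init__(self, fileobj, max_bytes: int):
        self._fileobj = fileobj
        self._max_bytes = max_bytes
        self.written = 0

    def write(self, text: str) -> None:
        data = text.encode("utf-8")
        # Check before writing, so the file never exceeds the cap.
        if self.written + len(data) > self._max_bytes:
            raise ReportTooLarge(
                f"report would exceed {self._max_bytes} bytes"
            )
        self._fileobj.write(data)
        self.written += len(data)
```

With a wrapper like this, a runaway report fails loudly for that one user instead of quietly filling the disk for everyone on the server.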

Posted by Chris on 12.20.2005. (0 Comments)
rel="nofollow" (Improvements)

Google has created a way to fight comment spam. It involves adding rel="nofollow" to all of the links in comments, trackbacks, readership reports, etc. I've updated the Powerblogs software to make use of this tag to help fight trackback spam.
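
As an illustration of what adding the attribute involves, here is a small sketch that tags links in a fragment of HTML. It is not the Powerblogs implementation: it uses a regular expression, which is fragile on messy markup, and it leaves any link that already has a rel attribute untouched, so a real version would need to be stricter.

```python
import re

# Matches an opening <a ...> tag; \b keeps it from matching <abbr> and the like.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
_HAS_REL = re.compile(r"\brel\s*=", re.IGNORECASE)


def add_nofollow(html: str) -> str:
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet."""

    def fix(match: re.Match) -> str:
        attrs = match.group(1)
        if _HAS_REL.search(attrs):
            return match.group(0)  # leave existing rel attributes alone
        return f'<a{attrs} rel="nofollow">'

    return _A_TAG.sub(fix, html)


print(add_nofollow('<a href="http://example.com/">a commenter link</a>'))
```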

Posted by Chris on 12.17.2005. (0 Comments)
Reports coming back on line (News)

I've finally got the offsite log generation system to the point where it's downloading the logs, generating the reports, and uploading them. It will still be a few days until the logs are generated regularly — for the next few days I'm going to manually run the report generation so that I can watch it and make sure that everything is going well. I will run the report script at least once a day for now. Once I'm confident in the new system, I'll have it back to generating the reports every four hours.

Posted by Chris on 12.14.2005. (0 Comments)
reports status (News)

Just a heads up on the reports: I'm almost ready to go live, now. I expect to have the reports live any day now, probably starting Saturday night. (Because log data is valuable, and the off-site solution involves moving it and deleting it on the webserver, extensive testing needs to be done to ensure that no readership data is ever lost. Testing and debugging always takes more time than programming does, unfortunately.)
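
The risk with moving logs is deleting the original before the copy is known to be good. A common safeguard, sketched below, is to copy, compare checksums, and only then delete. This is a local-file illustration with made-up function names; the actual off-site transfer mechanism is not described in this post.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large logs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(src: Path, dst: Path) -> None:
    """Copy src to dst and delete src only if the copy checks out."""
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        dst.unlink()
        raise OSError(f"copy of {src} did not verify; source kept")
    src.unlink()
```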

I am very grateful for the patience everyone has shown on this issue.

Posted by Chris on 12.08.2005. (0 Comments)
Reader statistics (News)

The new setup for readership statistics generation is going to be very robust, but it's not quite ready. It's the top priority project, though, and should be done very soon now. I'm hopeful that I can get it working before Wednesday starts.

Posted by Chris on 12.06.2005. (0 Comments)
Scheduled posts display order (Improvements)

The order that saved-for-later posts display in on the page for choosing them now groups the posts scheduled for publishing first, and orders them by date. This was suggested by the EclectEcon.
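
The ordering described above can be expressed as a two-part sort key. This sketch assumes each saved post has a `scheduled_for` field that is either a datetime or None, and that "by date" means earliest first; both are guesses for illustration, as is the function name.

```python
from datetime import datetime


def display_order(posts: list) -> list:
    """Scheduled posts first, by date; unscheduled posts keep their order."""
    return sorted(
        posts,
        key=lambda post: (
            post["scheduled_for"] is None,           # False sorts before True
            post["scheduled_for"] or datetime.min,   # same key for unscheduled
        ),
    )
```

Because Python's sort is stable, the unscheduled posts (which all share the same key) stay in whatever order they were in before.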

(The off-site report generation is coming along, and I hope to have it up and running by this Monday.)

Posted by Chris on 12.03.2005. (0 Comments)