Even from the command line, stagingvm is behaving abysmally. You can be in the middle of typing a command and it will stall.

A quick check with top shows a typical VM story: high I/O wait time, but the machine is actually doing nothing. This usually happens when the physical host node has more disk I/O than it can keep up with.
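
The "wa" figure in top is the share of CPU time spent idle while disk I/O is still outstanding, which is why a VM can show high wait while doing almost no work. As a minimal sketch (Python, Linux-only, not something used in this thread), the same percentages can be derived from two samples of the aggregate `cpu` line in `/proc/stat`; the function names are hypothetical.

```python
"""Sketch: derive top-style CPU shares (user, system, idle, iowait, steal)
from two samples of the aggregate "cpu" line in Linux /proc/stat."""
from __future__ import annotations

import time

# Order of the first 8 counters on the aggregate "cpu" line. The later
# guest/guest_nice fields are already counted in user/nice, so they are skipped.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse a 'cpu ...' line into a dict of cumulative clock ticks."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1 : 1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("kernel reports fewer counters than expected")
    return dict(zip(FIELDS, values))


def cpu_shares(before: str, after: str) -> dict[str, float]:
    """Return each counter's percentage of the ticks elapsed between samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {k: b[k] - a[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def sample(interval: float = 1.0) -> dict[str, float]:
    """Take two live samples of /proc/stat, `interval` seconds apart."""

    def read_cpu_line() -> str:
        # The first line of /proc/stat is the aggregate "cpu" line.
        with open("/proc/stat") as f:
            return f.readline()

    first = read_cpu_line()
    time.sleep(interval)
    return cpu_shares(first, read_cpu_line())
```

With made-up counters such as `before = "cpu  100 0 50 800 50 0 0 0 0 0"` and `after = "cpu  110 0 55 830 105 0 0 0 0 0"`, `cpu_shares(before, after)` reports 55% iowait and 30% idle with only 15% user plus system, the "high wait but doing nothing" pattern. On a guest, a high `steal` share would instead point at CPU contention on the host.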

I think we probably need to find a new home for stagingvm.

Attached is a top snapshot showing the situation.
Screenshot-rfay@stagingvm:-usr-local-sbin.png

Comments

rfay

Status: Active » Fixed

Calling this fixed for now after the disk back-end change yesterday. Will reopen if it looks problematic.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

rfay

Priority: Normal » Critical
Status: Closed (fixed) » Active

All of a sudden we have really poor performance again. Just starting a vi session takes seconds.

I've disabled svn updates and such because they were never completing and were stomping on top of each other.

rfay

The every-5-minute cron job that checks out and processes themes and such, which normally takes less than a minute to run, is often taking more than 5 minutes. I've changed it to run every 10 minutes instead of every 5, but it looks like something has gone haywire. And you can feel the slowness even on the command line.

calebgilbert

It sounds like the symptoms rfay is describing are on stagingvm. When I checked stagingvm, the load was a bit high for a moment: rsync was at 18% CPU and then php spiked to 100% for a second (I'm not sure what these are from; does anyone know?), but then it went back down.

The worst thing I saw was that loading a staging site takes forever, apparently because of stagingdb, which looks like it is tapped out on memory and swapping quite a bit: http://screencast.com/t/MDlhYTNiYWY
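
One way to check a host like stagingdb for memory pressure is `/proc/meminfo`, which reports total and free swap alongside available memory. The sketch below (Python, Linux-only, helper names hypothetical) parses that file's text and computes swap in use and the fraction of memory still available. Swap in use only shows that pages were pushed out at some point; ongoing swapping shows up as nonzero `si`/`so` columns in `vmstat`.

```python
"""Sketch: summarize memory and swap pressure from Linux /proc/meminfo text."""
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo key to its first numeric field (usually kB)."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info: dict[str, int]) -> int:
    """Swap currently in use, in kB: SwapTotal minus SwapFree."""
    return info["SwapTotal"] - info["SwapFree"]


def available_fraction(info: dict[str, int]) -> float:
    """MemAvailable / MemTotal (MemAvailable exists on Linux 3.14 and later)."""
    return info["MemAvailable"] / info["MemTotal"]
```

On a live host, pass in the file's contents, e.g. `info = parse_meminfo(open("/proc/meminfo").read())`, then compare `swap_used_kb(info)` and `available_fraction(info)` across samples.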

So there is definitely a problem on stagingdb, and possibly a problem on stagingvm that I wasn't able to replicate...

rfay

It did settle down. The rsync is from the every-5-minute update, which just rsyncs locally. That's the amazing thing: a local filesystem rsync could get so slow.

Once I spaced out the cron jobs so they no longer overlapped, we at least stopped getting hundreds of rsync jobs.
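
Spacing the schedule out reduces overlap but does not rule it out if a run gets slow enough. A common alternative, sketched here in Python assuming a Unix host, is to have each run take a non-blocking exclusive `flock` and simply skip that run while the previous one still holds the lock. The lock path and the `run_exclusively` name are hypothetical.

```python
"""Sketch: stop cron runs from piling up by skipping a run while the
previous one still holds an exclusive lock (Unix only, uses fcntl.flock)."""
from __future__ import annotations

import fcntl
import os
from typing import Callable

# Hypothetical lock path; any file writable by the cron user would work.
LOCK_PATH = "/tmp/staging-update.lock"


def run_exclusively(job: Callable[[], None], lock_path: str = LOCK_PATH) -> bool:
    """Run job() only if no other process holds the lock.

    Returns True if the job ran, False if a previous run was still active.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting in line.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        job()
        return True
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

The job passed in could, for example, invoke the local rsync through `subprocess.run`. The util-linux `flock` command gives the same skip-if-busy behavior directly in a crontab line via `flock -n LOCKFILE COMMAND`.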

I'd say it is a disk I/O issue.

obrienmd

I assume drupal.org has enough infrastructure for stuff like this, but my workplace might be able to donate a non-oversubscribed VM on one of our vmserver (KVM) clusters.

gerhard killesreiter

The OSL has had disk space problems on the machines hosting the VMs for a while. They are working on resolving this.

rfay

Status: Active » Fixed

Good enough.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Component: staging.drupal.org » Other