Server keeps timing out due to memory usage - How to fix this?

Sutra

Over the last few months my site has been sporadically going down. I spoke with support at Knownhost and they said the server's services were killed due to using too much memory.

They found a bunch of brute-force attacks, so I had them block the offending IPs. Then I added rate limiting and throttling via Wordfence.

That helped. The outages decreased, but they still happen sometimes. I spoke to support again, and this time they said the two biggest factors they see are:
  1. Googlebot crawling excessively
  2. Admin-ajax.php called every time someone accesses a new page. Regarding the admin-ajax they said: "...while probably negligible as far as memory resources are concerned, it seems to be being queried very often (like someone is currently logged into the page, editing a post or creating a new post, and haven't logged out)..."
How do I fix this?
 
How much RAM does your server have?
 
My server has 3G of RAM.

Support told me I am using about half of that, and have a little bit more than half of the total available to use.

              total        used        free      shared  buff/cache   available
Mem:           3.0G        1.3G        1.3G        115M        434M        1.6G
Swap:            0B          0B          0B
 
If your server isn't hitting SWAP then it's not a RAM problem. When your server/computer runs out of RAM it starts writing to the hard drive, which is WAYYY slower than RAM, and that is called SWAP.

Since you aren't hitting SWAP, Knownhost's support is at best guessing what the problem might be. You need someone to diagnose it for real by getting SSH access and looking at what's running throughout the day.

LOL at Googlebot crawling being the problem. The problem is you've got a shitty plugin (or badly coded theme) that's calling that admin-ajax.php script and causing you problems. There is a chance that you've been hacked/compromised and your server is being used to send mass emails or something; perhaps one of those brute-force attacks worked. But again, this is all guesswork.

My first action would be to turn off all plugins and see if the site gets back to normal. If things stabilize then you know it's a plugin or a badly coded theme.
 
According to G Webmaster Tools, I have a bunch of 404 errors (due to me removing pagination and redirecting everything a while back - obviously I didn't do it right, hah). Is it possible that the 404s/redirects are causing the overload? As in, googlebot is crawling and rapidly going in circles taking up a ton of resources?
 
Google could be getting caught up on those pages, but it's not likely to hammer your site. Do you have access to your server logs? Dump them into something like Screaming Frog's Log Analyzer and you can visualize what bots are getting caught up in and how frequently.

New Relic could also show you what pages and processes are using up the most resources and give you a time frame of when it happens.
 
@Sutra,

Every time your website is hit by a visitor, cron jobs as well as admin-ajax.php are run alongside the Heartbeat API to check a bunch of nonsense. This was built in so sites that get very little traffic can still trigger cron jobs, instead of just running them when the timer runs out.

... Every single page load checks and runs crap unnecessarily. This is native WordPress behavior that doesn't interfere often because most sites never get enough traffic for it to matter.

In your wp-config.php file at the root of your Wordpress installation, add this:

PHP:
/** Limit Cron Runs **/
define( 'WP_CRON_LOCK_TIMEOUT', 900 );

The number 900 is in seconds. 900 seconds is 15 minutes. You can set it lower than 15 minutes if you want; that's just where I set it because I don't have any crons that need to run more urgently than that. I could probably go higher.

What this will do is cut this crap down from running on every single page load to ONCE, and only once, every 15 minutes. That should reduce that load drastically, if not fix your problem entirely.

This won't stop the Heartbeat API, which you probably want running, but I'm pretty sure it limits its execution. It might goof up any real-time plugins you're using for stat gathering, but if you're running a normal content site it's fine.

If it turns out this makes a difference but not the full difference you need you can disable the Heartbeat API entirely or limit it like I showed with the WP_CRON_LOCK_TIMEOUT above. There's a plugin for it or you can find some functions.php code for it. Usually the Heartbeat API only fires from backend activity but it's available for plugin developers to abuse and fire from frontend users, which might be what's happening in your case.
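For reference, a minimal sketch of the functions.php route mentioned above, assuming a standard WordPress install. The `heartbeat_settings` filter and `wp_deregister_script()` are core WordPress APIs, but the 60-second interval and the choice to drop Heartbeat on the front end only are illustrative assumptions, not settings anyone in this thread has confirmed for the site.

```php
// Goes in the active theme's functions.php (ideally a child theme).
// Assumption: nothing on the front end relies on the Heartbeat API.

// Slow the Heartbeat pulse down. 60 is an illustrative value;
// WordPress itself limits the interval to a fixed range.
add_filter( 'heartbeat_settings', function ( $settings ) {
    $settings['interval'] = 60; // seconds between pulses
    return $settings;
} );

// Stop loading the Heartbeat script for front-end visitors,
// leaving it in place in wp-admin (autosave, post locking).
add_action( 'init', function () {
    if ( ! is_admin() ) {
        wp_deregister_script( 'heartbeat' );
    }
}, 1 );
```

If a front-end plugin does depend on Heartbeat, the second hook will break it, so it may be safer to try the interval change on its own first.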

Other things to consider if that doesn't fix it would be to make sure you're on PHP7 (if all your plugins are compatible), maybe run the P3 Plugin Profiler to see if it can spot any one or two plugins acting crazy, and you can try the Query Monitor plugin to see if you have any loops or queries going nuts and isolate that to a specific plugin.
 
That's great specific advice from @Ryuzaki and it sounds like it may be the root of your problem, but I'd still get a dev to check out the server logs. I bet there are some other clues in there that combined with what Ryuzaki said will get you fixed up.

It's a shame you're paying a lot of money to Knownhost for fully managed hosting and they can't offer you any real support.
 
Over the last few months my site has been sporadically going down. I spoke with support at Knownhost and they said the server's services were killed due to using too much memory.

My server has 3G of RAM.

Support told me I am using about half of that, and have a little bit more than half of the total available to use.

Those two data points (services killed for excessive memory, using only half the available memory) were obviously taken at different times. So there's no point in analyzing them further.

Ask support for memstat output at the time your services are being killed. Not when everything is running well.
 
@Ryuzaki Thank you mucho. I just added that to the php file. Will see how it goes.

@CCarter @builder Thank you for the detailed info. I've just requested the memstat info to investigate this further.
 
Sounds like an ajax script/plugin that's just not dying. Do you have any plugins that use ajax to retrieve data? Calling admin-ajax.php on every page refresh shouldn't be an issue by itself. More than likely the issue is resources stacking up from some rogue script that's using the backend of WP.

Did you already try installing debugbar in WP and checking which functions, hooks and actions are being run on the page(s) that are causing issues?
 
@Sutra, So how did this pan out?
 
@Ryuzaki Funny you ask, I was actually going to post about this again.

I added the WP_CRON_LOCK_TIMEOUT of 900 like you suggested. That seemed to help, but over the last couple of weeks the timeouts started happening again. Knownhost wasn't much help, basically saying the same things as before. However, this time they did recommend I use a plugin to change the login URL, which I did. Also, on their suggestion, I disabled cron jobs through WP entirely. Now the jobs are only submitted via the server.
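Disabling WP cron and running the jobs from the server usually comes down to one line in wp-config.php plus a system cron entry. A sketch, assuming the standard `DISABLE_WP_CRON` constant; the schedule and file path in the comment are placeholders, not the values Knownhost actually used.

```php
/** Stop WordPress from spawning wp-cron.php on page loads. **/
define( 'DISABLE_WP_CRON', true );

// With that set, scheduled tasks only run when something requests
// wp-cron.php directly, for example a server-side cron entry.
// Hypothetical crontab line (placeholder path and interval):
//
//   */15 * * * * php /path/to/site/wp-cron.php >/dev/null 2>&1
```

The define has to sit above the "stop editing" comment near the bottom of wp-config.php to take effect.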

The last thing they suggested is to reduce the MaxRequestWorkers setting. They say it's a bit high for current resources. I told them to hold off on that for now though. It sounded to me like it would affect users even further - but I could be wrong.

After all that, the outages still happened a few times. So a few days ago I removed a few plugins I thought might be the issue and I also installed Query Monitor and P3 Plugin Profiler.

Yesterday and today there haven't been outages, however, at times the site goes really slow, on both the front-end and the Wordpress backend. So it seems like something may still be up. I tried running the scan for the P3 Plugin Profiler and let it run for about 90 minutes. The progress bar only showed about 15% done so I stopped it.

I asked Knownhost for the memstat output as @builder suggested. Support replied, "The only outputs we have available are the OOM log entries, and limited output from sar -r."

So after all that I have some questions, hah:
  1. Should I reduce the MaxRequestWorkers setting as support suggested?
  2. When I ran the P3 Plugin Profiler I thought something was wrong because it was taking so long, thus I stopped it. But should I just let it run overnight?
  3. If P3 Plugin Profiler completes, what should I be looking for in the report?
  4. What exactly am I looking for within the Query Monitor info?
  5. Anything else I should check/do?
 
@Sutra, P3 Plugin Performance Profiler should be completing in minutes, not hours. I'm not sure how it does it, but it takes a measurement of the runtime for all of your plugins and displays them in a pie chart:

[Screenshot: P3 pie chart of runtime by plugin]

It tells you your overall plugin loading time and the percentage of your total page load time associated with your plugins (and also if they're running a zillion MySQL queries). The fact that you can't get it to run at all suggests you may have a runaway plugin.

Query Monitor can help. What it does is add a drop down in the Wordpress menu bar that lets you see how long it took your page to load and how much of that was associated with Wordpress queries.

[Screenshot: Query Monitor drop-down in the WordPress menu bar]

If there's anything very wrong it will let you know and identify which queries are the problem. With some searching you can identify whether a plugin is causing the issue or if there are some crazy queries in your theme that can't be cached. Anything like "most popular posts" that writes to the database on each page load and changes what is displayed on each page load can't really be cached. Stuff like that will pop out.
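If Query Monitor itself is too heavy to leave running, a rough manual equivalent is WordPress's own `SAVEQUERIES` constant, which makes `$wpdb->queries` record every query with its timing and caller. The sketch below is not how Query Monitor works internally; it is just a small hook, with an arbitrary 0.5-second cutoff chosen for illustration, that writes slow queries to the PHP error log.

```php
// In wp-config.php (temporarily; it adds overhead on every request):
//   define( 'SAVEQUERIES', true );

// In functions.php or a small must-use plugin:
add_action( 'shutdown', function () {
    global $wpdb;

    // Do nothing unless query logging is switched on.
    if ( ! defined( 'SAVEQUERIES' ) || ! SAVEQUERIES || empty( $wpdb->queries ) ) {
        return;
    }

    $threshold = 0.5; // seconds; arbitrary cutoff for "slow"

    foreach ( $wpdb->queries as $query ) {
        // Each entry holds: [0] SQL, [1] elapsed seconds, [2] calling functions.
        if ( $query[1] >= $threshold ) {
            error_log( sprintf(
                'Slow query (%.2fs) via %s: %s',
                $query[1],
                $query[2],
                $query[0]
            ) );
        }
    }
} );
```

The caller string in each log line normally names the plugin or theme function that issued the query, which is the part to search on.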

Which model of VPS are you using at Knownhost? Could it be that it's time to upgrade? Can you tell us what plugins you're using, or if not have you done any searches related to this issue with each plugin involved in the search?
 
You're just taking shots in the dark here and the performance of your site is directly tied to your traffic.

Why the resistance to hiring a developer to properly troubleshoot?
 
I asked Knownhost for the memstat output as @builder suggested. Support replied, "The only outputs we have available are the OOM log entries, and limited output from sar -r."

Can you ask for (1) sar output at the time of the event and (2) OOM logs?

If it is feasible, move the site to a different server and see if the problem still persists. Virtualization has brought with it a whole host of weird behaviors by VMs - sometimes as a result of incorrect configurations, sometimes as a result of bugs.
 
@Ryuzaki Thanks for the detailed info. I uninstalled all the plugins and reinstalled them one by one and monitored the times with Query Monitor. Found a couple of plugins that may be the problem. One was Simple History (logs every action and shows it within the WP backend). With that installed, pages were taking, on average, 2 extra seconds to load. Once in a while a page would take up to 15 seconds and would show a Slow Query message in Query Monitor. I uninstalled that.

The other is Thrive Clever Widgets. This one may actually be the root of the OOM problem - but I'm no expert so I could be wrong, hah. With this installed, the pages on average took another extra 2.5 seconds to load, sometimes as long as 25 seconds. And it gave both Slow Query message and about 80 Duplicate Query messages.

The issue with uninstalling that though, is that the site starts broad and covers many niches within the broad niche. Because of that, I use the Clever Widgets to display relevant stuff in the sidebar for each sub niche. Some of the sub niches are better suited to show ads, some are better to show an info product, some show the specific social media pages we have for that specific sub niche, some show blog posts relevant only to that sub niche, etc.

So not sure what to do, hah. I don't know for sure that it's causing the server OOM yet, but I do know for sure it makes the pages/posts take an extra 2-3 seconds on average to load. But if I remove it then it removes all those things that are specifically shown for the sub niche, including social media pages made just for that sub niche in the site. I assume if I remove those that will negatively affect rankings because Google won't know those social media pages are associated with the site. Unless maybe I create a separate page that lists all of our social media pages.

What do you suggest I do about this?

@Calamari No resistance to hiring a dev. I just thought this would be easy to figure out on my own, hah. After this, if I get the OOM error again I will definitely hire a dev.

@builder I've requested the info. Waiting to hear back.
 
@Sutra, you could create a child theme and register a bunch of sidebars in the functions.php. These would show up as spots you could drop widgets into in the Widgets screen. I can't predict how the sidebars are currently set up. Sometimes it's one master sidebar.php file and sometimes there's sidebar-blog.php and sidebar-page.php as examples, which get called in their respective templates like single.php and page.php.

The point being though, if it's set up like that you could create your own sidebar templates to show the specific new sidebars you registered. Alternatively, if it's all crammed into one sidebar.php with if statements, you could write a bunch of if statements in there like "if direct child of this parent, show this sidebar" and "if child or grandchild or any other level of descendant" and so forth. The conditions might get pretty complicated depending on what you have going on. It might be as easy as "if in this category."

It really depends on how dynamic your site is in terms of how often you roll out more categories and things like that. If it's pretty static in its build at this point this could be a good solution for you, in order to keep that functionality while dodging the plugin.
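A rough sketch of the simple "if in this category" version, assuming each sub niche is a post category. `register_sidebar()`, `is_active_sidebar()`, `dynamic_sidebar()` and `in_category()` are core WordPress functions; the `mysite_` function names, the category slugs and the sidebar IDs are made up for illustration and would need to match the real site.

```php
// functions.php of a child theme.
// The slugs and labels below are placeholders for the real sub niches.
function mysite_sub_niches() {
    return array(
        'sub-niche-a' => 'Sub Niche A',
        'sub-niche-b' => 'Sub Niche B',
    );
}

// Register one widget area per sub niche so each shows up on the Widgets screen.
add_action( 'widgets_init', function () {
    foreach ( mysite_sub_niches() as $slug => $label ) {
        register_sidebar( array(
            'name' => $label . ' Sidebar',
            'id'   => 'sidebar-' . $slug,
        ) );
    }
} );

// Call this from sidebar.php in place of the existing dynamic_sidebar() call.
// $fallback_id is the ID of the theme's default sidebar.
function mysite_render_sidebar( $fallback_id ) {
    if ( is_single() ) {
        foreach ( mysite_sub_niches() as $slug => $label ) {
            // Show the first matching sub-niche sidebar that has widgets in it.
            if ( in_category( $slug ) && is_active_sidebar( 'sidebar-' . $slug ) ) {
                dynamic_sidebar( 'sidebar-' . $slug );
                return;
            }
        }
    }

    // Anything that doesn't match falls back to the default sidebar.
    dynamic_sidebar( $fallback_id );
}
```

Because the choice is made with plain conditionals at render time, it runs no extra database queries of its own, which is the point of dodging the plugin.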
 
Oooookaaay. Out of Memory errors happened again yesterday. Time to hire someone to take a look.

@Calamari @Ryuzaki can you recommend any devs?
 
I'm sorry, I don't know any. My best advice is to look for a sysadmin who works with WordPress and will log into the server via SSH to properly diagnose it.
 
See if you can get @SmokeTree and/or @turbin3 to review this thread and see if they think they can diagnose the issue. They're likely among the most qualified for the task.
 