WordPress, nginx, W3TC and robots.txt

A quick note to try and save somebody else the hours of pain I just experienced…

Here’s the scenario: you’re being dead clever and ditching Apache in favour of Nginx to run your WordPress blog/site and pretty much have everything right. You’re NOT using a plugin to generate robots.txt for you – after all, WordPress does a good enough job through the Settings > Privacy page. You browse to http://domain.com/robots.txt and everything looks pretty sweet. Heck, you might even go and change the privacy settings and grab robots.txt again to make sure it’s all working the way you expect.
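
For reference, the dynamically generated file is tiny. It looks roughly like this, depending on which privacy option you've picked (the exact output varies a little between WordPress versions, so treat this as an illustration rather than a verbatim copy):

    # "Allow search engines" selected
    User-agent: *
    Disallow:

    # "Block search engines" selected
    User-agent: *
    Disallow: /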

Then… you drop the W3 Total Cache bomb. Now, W3TC is pretty well regarded, but it hasn’t had any love for several months. In fact, it hasn’t even been updated to say it’s compatible with WordPress 3.3.0+ (which it appears to be, AFAICT, although some people have had issues with Minify). What it does have, though, is Nginx support out of the box.

What does that mean? Well, if W3TC detects that it is running on Nginx, it will write out a snippet of Nginx configuration which deals with all the cleverness needed to get Nginx to serve W3TC page cache files statically off the disk without having to go through PHP. (This, my friends, is a large part of the secret sauce that makes an Nginx/PHP stack so much faster than Apache/PHP.) Theoretically, all you have to do is use the include directive to pull this snippet into your virtual host configuration file, and you’re good to go. (If you do this then don’t forget to nginx -s reload every time you tweak your W3TC settings.)
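
If you haven’t wired that up yet, it looks something like the sketch below. The domain, the web root and the snippet path are placeholders – by default W3TC writes its snippet into the WordPress root as nginx.conf, and the path is configurable in W3TC’s settings, so adjust to suit your setup:

    server {
        listen       80;
        server_name  domain.com;
        root         /var/www/domain.com;
        index        index.php;

        # Pull in the W3TC-generated page cache rules
        # (adjust the path to wherever W3TC writes its snippet)
        include /var/www/domain.com/nginx.conf;

        location / {
            try_files $uri $uri/ /index.php?$args;
        }
    }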

And then it hits you. robots.txt has stopped working.

Here’s my solution (in my virtual host file, if you care):

    location = /robots.txt {
        # Force robots.txt through PHP. This supersedes a match in the
        # generated W3TC rules which forced a static file lookup
        rewrite ^ /index.php;
    }

This is a pretty specific location (using = and not having a regexp), so it trumps anything in the W3TC generated config. Any request for robots.txt is rewritten to index.php which your regular Nginx rules should then hand off to PHP-FPM, which means WordPress will dynamically generate the content for you.
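
Those “regular Nginx rules” are just the standard WordPress-on-Nginx PHP handoff, roughly the following. This is a minimal sketch: the PHP-FPM socket path is an assumption for illustration, so swap in whatever your PHP-FPM pool actually listens on:

    location ~ \.php$ {
        # Hand PHP requests (including the rewritten /index.php) to PHP-FPM
        include        fastcgi_params;
        fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass   unix:/var/run/php-fpm.sock;
    }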

Wow. That took me, literally, 2-3 hours to figure out. Mostly because I didn’t notice it had stopped working when I added W3TC into the mix. Once I’d figured out W3TC (or rather the W3TC generated config) was the culprit, the actual fix was pretty quick.

I’ll be writing more about my Nginx config and its performance relative to Apache2 on an Amazon EC2 Micro instance soon. In the meantime, I hope I saved you some time!



4 Responses to WordPress, nginx, W3TC and robots.txt

  1. Chris says:

    Cheers for this mate. Just noticed my own robots.txt files were broken on my nginx/varnish setup with W3 and had no idea till I was looking in the server logs for something.

    This fixed it up perfectly

  2. Sutariya says:

    Thank you so much, man! I was trying to solve this error for hours!! Tried almost 15 different ways to fix robots.txt, but your tutorial finally fixed it!

    Thanks Again 🙂

  3. Dan says:

    This also works:
    rewrite ^/robots\.txt$ /index.php last;
