Technical SEO 101
It gets tough when you are a total beginner at the technical side of running your own blog or website. I remember it took me a while to figure out the difference between the .htaccess file and robots.txt.
Even though I have always loved anything related to IT and computers, the first time you encounter a new object or concept, your mind needs a bit of time to absorb how it works.
And .htaccess and robots.txt are the two files in the root folder of your website that you really can’t live without, so learning what they are and how they work is critical.
In this post, I will explain what they are, why they matter, and how you can use them to your advantage, with a few examples to make learning easier and faster.
The Difference Between the .HTACCESS and ROBOTS.txt Files
As a first, general definition: .htaccess is used mostly for internal access, whereas robots.txt manages external access.
“Internal” because .htaccess tells your Apache server how to handle page and file names, URLs, and the way users access those resources; it is how your site manages its own behavior.
Robots.txt, by contrast, regulates “external” access, because it tells search engines and other web tools what they may and may not read and index (a human user can still browse and read everything).
Some examples:
.htaccess for a WordPress installation with “nice” permalinks:
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
Robots.txt for a WordPress blog that blocks Googlebot entirely, while keeping every other crawler out of the /familypics folder only:
    User-agent: *
    Disallow: /familypics

    User-agent: Googlebot
    Disallow: /
In other words, .htaccess governs how people access your site, while robots.txt governs how machines do.
Why You Can’t Live Without .HTACCESS and ROBOTS.txt
Without .htaccess, your site will behave in its default way or may not work at all, depending on the software you use. For example, the standard installation of WordPress doesn’t include an .htaccess file: WordPress automatically creates the file when you configure your permalinks under Settings -> Permalinks. If you don’t set your permalinks, the default configuration for your URLs will stay the “unfriendly” http://example.com/?p=203, with “203” being the post ID in the MySQL database.
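For instance, once you choose a permalink structure under Settings -> Permalinks (the %postname% structure below is just one common choice, and the post slug is a hypothetical example), WordPress writes the rewrite rules shown earlier into .htaccess and your URLs change from the ID-based format to a readable one:

    # Permalink structure chosen under Settings -> Permalinks
    /%postname%/

    # Before: default, "unfriendly" URL (post ID in the query string)
    http://example.com/?p=203

    # After: "pretty" permalink served through the rewrite rules in .htaccess
    http://example.com/technical-seo-101/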
Without robots.txt, search engine and tool crawlers (and well-behaved scrapers) would crawl and copy your entire website without exception, including files you want to keep “private”. See the /familypics example earlier in this post.
The modern Web user doesn’t like “unfriendly” URLs, and search engines don’t want to find “junk” on your website when their spiders crawl it, so having your .htaccess and robots.txt files correctly configured only works to your advantage.
How You Can Benefit From These Two Files
The technical side of these two files is only a small part of what they bring to the table. There are also benefits for UX (user experience) and SEO.
With .htaccess you can (see the sketch after this list):
- Create user-friendly URLs
- Redirect users away from specific files and directories
- Manage 301 redirects (you can use a generator tool)
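As a quick, hedged sketch of the items above (the paths old-page.html, /old-blog/ and wp-config.php are placeholders, so adapt them to your own site):

    # Permanently (301) redirect a single page that has moved
    Redirect 301 /old-page.html http://example.com/new-page/

    # 301-redirect a whole folder to a new location with mod_rewrite
    RewriteEngine On
    RewriteRule ^old-blog/(.*)$ /blog/$1 [R=301,L]

    # Keep visitors away from a sensitive file (Apache 2.2 syntax;
    # on Apache 2.4 use "Require all denied" instead)
    <Files "wp-config.php">
    Order allow,deny
    Deny from all
    </Files>

Test your site after every change: a typo in a rewrite rule can take the whole site down with a 500 error.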
With robots.txt you can (example after the list):
- Tell search engines what (not) to index
- Disallow crawlers from other services (like Archive.org)
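A minimal robots.txt sketch covering both points; ia_archiver is the user agent Archive.org has historically crawled with, and the /wp-admin/ and sitemap paths are just common examples you should adjust:

    # Keep Archive.org's crawler out of the whole site
    User-agent: ia_archiver
    Disallow: /

    # Every other crawler may index everything except the admin area
    User-agent: *
    Disallow: /wp-admin/

    # Optionally point crawlers to your XML sitemap
    Sitemap: http://example.com/sitemap.xml

Keep in mind that robots.txt is a polite request, not a lock: well-behaved crawlers honor it, but it doesn’t technically block anyone from fetching those URLs.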
Takeaway:
Configure your .htaccess and robots.txt files as soon as you set up your website or blog. They are critical to the well-being of your site, both in the search index and from a user’s viewpoint.
More resources to check out:
- Introduction to URL Rewriting by Paul Tero at Smashing Magazine
- Robots.Txt: A Beginners Guide by Boris Demaria at Woorank
- Robots.txt Tutorial by Aaron Wall