INTRODUCTION TO URL REWRITING — SMASHING MAGAZINE


About The Author

Paul Tero is an experienced PHP programmer and server administrator. He developed the Stockashop ecommerce system in 2005 for Sensable Media. He now works …


Many Web companies spend hours and hours agonizing over the best domain names for their clients. They try to find a domain name that is relevant and appropriate, sounds professional yet is distinctive, is easy to spell and remember and read over the phone, looks good on business cards and is available as a dot-com.




Or else they spend thousands of dollars to purchase the one they really want, which just happened to be registered by a forward-thinking and hard-to-find squatter in 1998.


They go through all that trouble with the domain name but neglect the rest of the URL, the part after the domain name. It, too, should be relevant, appropriate, professional, memorable, easy to spell and readable. And for the same reasons: to attract customers and improve search rankings.


Fortunately, there is a technique called URL rewriting that can turn unsightly URLs into nice ones, with a lot less agony and expense than picking a good domain name. It enables you to fill out your URLs with friendly, readable keywords without affecting the underlying structure of your pages.

This article covers the following:

What is URL rewriting?
How can URL rewriting help your search rankings?
Examples of URL rewriting, including regular expressions, flags and conditionals;
URL rewriting in the wild, such as on Wikipedia, WordPress and shopping websites;
Creating friendly URLs;
Changing page names and URLs;
Checklist and troubleshooting.

What Is URL Rewriting?

If you were writing a letter to your bank, you would probably open your word processor and create a file named something like lettertobank.doc. The file might sit in your Documents directory, with a full path like C:\Windows\users\julie\Documents\lettertobank.doc. One file path = one document.

Similarly, if you were creating a banking website, you might create a page named page1.html, upload it, and then point your browser to its address. One URL = one resource. In this case, the resource is a physical Web page, but it could be a page or product drawn from a CMS.

URL rewriting changes all that. It allows you to completely separate the URL from the resource. With URL rewriting, you could have the same page reached at …/page1.html or at …/about-us/ or at …/about-this-website-and-me/ or at …/youll-never-find-out-about-me-hahaha-Xy2834/. Or at all of these. It’s a bit like shortcuts or symbolic links on your hard drive. One URL = one way to find a resource.

With URL rewriting, the URL and the resource that it leads to can be completely independent of each other. In practice, they’re usually not wholly independent: the URL usually contains some code or number or name that enables the CMS to look up the resource. But in theory, this is what URL rewriting provides: a complete separation.

How Does URL Rewriting Help?

Can you guess what the Web page at /diy/jsp/bq/nav.jsp?action=detail&fh_secondid=11577676 on diy.com sells?

B&Q went to all the trouble and expense of acquiring diy.com and implementing a stock-controlled e-commerce website, but left its URLs indecipherable. If you guessed “brown guttering,” you might want to consider playing the lottery.

Even when you search directly for “miniflow gutter brown” on Google UK, B&Q’s page comes up only seventh in the organic search results, below much smaller companies, such as a building supplier with a single outlet in Stirlingshire. B&Q has 300+ branches and so is probably much bigger in budget, size and exposure, so why is it not doing as well for this search term? Perhaps because the other search results have URLs like http://www.prof…co.uk/products/brown-miniflo-gutter-148/; that is, the URL itself contains the words in the search term.


Almost all of these results on Google have the search term in their URLs (highlighted in green). The one at the bottom does not.

Looking at the URL from B&Q, you would (probably correctly) assume that a file named nav.jsp within the directory /diy/jsp/bq/ is used to display products when given their ID number, 11577676 in this case. That is the resource intimately tied to this URL.

So, how would B&Q go about turning this into something more recognizable, like /products/miniflow-gutter-brown/11577676, without restructuring its whole website? The answer is URL rewriting.

Another way to look at URL rewriting is as a thin layer that sits on top of a website, translating human- and search-engine-friendly URLs into actual URLs. Doing it is easy because it requires hardly any changes to the website’s underlying structure: no moving files around or renaming things.

URL rewriting basically tells the Web server that /products/miniflow-gutter-brown/11577676 should show the Web page at /diy/jsp/bq/nav.jsp?action=detail&fh_secondid=11577676, without the customer or search engine knowing about it.

Many factors (or “signals”), of course, determine the search ranking for a particular term, over 200 of them according to Google. But friendly and readable URLs are consistently ranked as one of the most important of those factors. They also help humans to quickly figure out what a page is about.

The next section describes how this is done.

How To Rewrite URLs

Whether you can implement URL rewriting on a website depends on the Web server. Apache usually comes with the URL rewriting module, mod_rewrite, already installed. The set-up is very common and is the basis for all of the examples in this article. ISAPI Rewrite is a similar module for Windows IIS but requires payment (about $100 US) and installation.

The Simplest Case

The simplest case of URL rewriting is to rename a single static Web page, and this is far easier than the B&Q example above. To use Apache’s URL rewriting function, you will need to create or edit the .htaccess file in your website’s document root (or, less commonly, in a subdirectory).

For instance, if you have a Web page about horses named Xu8JuefAtua.htm, you could add these lines to .htaccess:

    RewriteEngine On
    RewriteRule horses.htm Xu8JuefAtua.htm

Now, if you visit horses.htm on your website, you’ll actually be shown the Web page Xu8JuefAtua.htm. Furthermore, your browser will remain at horses.htm, so visitors and search engines will never know that you originally gave the page such a cryptic name.

Introducing Regular Expressions

In URL rewriting, you need only match the path of the URL, not including the domain name or the first slash. The rule above essentially tells Apache that if the path contains horses.htm, then show the Web page Xu8JuefAtua.htm. This is slightly problematic, because you could also visit reallyfasthorses.html, and it would still work. So, what we really need is this:

    RewriteEngine On
    RewriteRule ^horses.htm$ Xu8JuefAtua.htm

The ^horses.htm$ is not just a search string, but a regular expression, in which special characters (such as ^ . + * ? ( ) [ ] { } and $) have extra significance. The ^ matches the beginning of the URL’s path, and the $ matches the end. This says that the path must begin and end with horses.htm. So, only horses.htm will work, and not reallyfasthorses.htm or horses.html. This is important for search engines like Google, which can penalize what they view as duplicate content: identical pages that can be reached via multiple URLs.
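One refinement is worth considering here. The dot is itself a special character in a regular expression, so in ^horses.htm$ it matches any single character, and a path such as horsesXhtm would also match. A minimal sketch of a stricter rule, assuming the same two file names, escapes the dot with a backslash:

```apache
RewriteEngine On
# \. matches a literal dot only, so "horsesXhtm" no longer matches
RewriteRule ^horses\.htm$ Xu8JuefAtua.htm
```

The file name on the right is a plain substitution string rather than a regular expression, so its dot needs no escaping.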

Without File Endings

You can make this even better by ditching the file ending altogether, so that you can visit either /horses or /horses/:

    RewriteEngine On
    RewriteRule ^horses/?$ Xu8JuefAtua.htm [NC]

The ? indicates that the preceding character is optional. So, in this case, the URL would work with or without the slash at the end. These would not be considered duplicate URLs by a search engine, but would help prevent confusion if people (or link checkers) accidentally added a slash. The stuff in brackets at the end of the rule gives Apache some further pointers. [NC] is a flag that means that the rule is case insensitive, so /Horses would also work.

Wikipedia Example

We can now look at a real-world example. Wikipedia appears to use URL rewriting, passing the title of the page to a PHP file. For instance:

http://en.wikipedia.org/wiki/Barack_obama

is rewritten to:

http://en.wikipedia.org/w/index.php?title=Barack_obama

This could well be implemented with an .htaccess file, like so:

    RewriteEngine On
    #Look for the word "wiki" followed by a slash, and then the article title
    RewriteRule ^wiki/(.+)$ w/index.php?title=$1 [L]

The previous rule had /?, which meant zero or one slashes. If it had said /+, it would have meant one or more slashes, so even horses//// would have worked. In this rule, the dot (.) matches any character, so .+ matches one or more of any character, that is, essentially anything. And the parentheses, ( ), ask Apache to remember what the .+ is. The rule above, then, tells Apache to look for wiki/ followed by one or more of any character and to remember what it is. This is remembered and then rewritten as $1. So, when the rewriting is finished, wiki/Barack_obama becomes w/index.php?title=Barack_obama.

Thus, the page w/index.php is called, passing Barack_obama as a parameter. The w/index.php is probably a PHP page that runs a database lookup, like SELECT * FROM articles WHERE title='Barack obama', and then outputs the HTML.
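One detail the rule leaves open is what happens to a query string on the friendly URL, such as ?action=history. When a substitution contains its own query string, mod_rewrite by default discards the one from the request. A hedged variant, assuming you want both kept, adds the QSA (query string append) flag. This is only a sketch, not Wikipedia’s actual configuration, and the action parameter is used purely for illustration:

```apache
RewriteEngine On
# QSA appends the request's query string to the rewritten one, so
# wiki/Barack_obama?action=history becomes
# w/index.php?title=Barack_obama&action=history
RewriteRule ^wiki/(.+)$ w/index.php?title=$1 [QSA,L]
```

Without QSA, the same request would reach index.php with only the title parameter.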


You can also view Wikipedia entries directly, without the URL rewriting.

Comments and Flags

The example above also introduced comments. Anything after a # is ignored by Apache, so it’s a good idea to explain your rewriting rules so that future generations can understand them. The [L] flag means that if this rule matches, Apache can stop now. Otherwise, Apache would continue applying subsequent rules, which is a powerful feature but unnecessary for all but the most complex rule sets.
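To make the difference concrete, here is a small sketch of two rules where chaining is intended. The ponies alias is a hypothetical addition for illustration; the horses rule is the one from earlier in this article.

```apache
RewriteEngine On
# No [L] here: the rewritten path "horses" is handed on to the next rule
RewriteRule ^ponies/?$ horses [NC]
# [L] here: once this rule matches, rule processing stops for this pass
RewriteRule ^horses/?$ Xu8JuefAtua.htm [NC,L]
```

A request for ponies is first rewritten to horses and then to Xu8JuefAtua.htm. Note that in .htaccess files Apache may run the rule set again on the rewritten path, so [L] ends the current pass rather than all rewriting.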

Implementing the B&Q Example

The recommendation for B&Q above could be implemented with an .htaccess file, like so:

    RewriteEngine On
    #Look for the word "products" followed by slash, product title, slash, id number
    RewriteRule ^products/.*/([0-9]+)$ diy/jsp/bq/nav.jsp?action=detail&fh_secondid=$1 [NC,L]

Here, the .* matches zero or more of any character, so nothing or anything. And the [0-9] matches a single numerical digit, so [0-9]+ matches one or more digits.

The next section covers a couple of more complex conditional examples. You can also read the Apache rewriting guide for much more information on all that URL rewriting has to offer.

Conditional Rewriting

URL rewriting can also include conditions and make use of environment variables. These two features make for an easy way to redirect requests from one domain alias to another. This is especially useful if a website changes its domain, from mywebsite.co.uk to mywebsite.com for example.

Domain Forwarding

Most domain registrars allow for domain forwarding, which redirects all requests from one domain to another domain, but which might send requests for www.mywebsite.co.uk/horses to the home page at www.mywebsite.com and not to www.mywebsite.com/horses. You can achieve this with URL rewriting instead:

    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www.mywebsite.com$ [NC]
    RewriteRule (.*) http://www.mywebsite.com/$1 [L,R=301]

The second line in this example is a RewriteCond, rather than a RewriteRule. It is used to compare an Apache environment variable on the left (such as the host name in this case) with a regular expression on the right. Only if this condition is true will the rule on the next line be considered.

In this case, %{HTTP_HOST} represents www.mywebsite.co.uk, the host (i.e. domain) that the browser is trying to visit. The ! means “not.” This tells Apache, if the host does not begin and end with www.mywebsite.com, then remember and rewrite zero or more of any character to www.mywebsite.com/$1. This converts www.mywebsite.co.uk/anything-at-all to www.mywebsite.com/anything-at-all. And it will work for all other aliases as well, like www.mywebsite.biz/anything-at-all and mywebsite.com/anything-at-all.


The [R=301] flag is very important. It tells Apache to do a 301 (i.e. permanent) redirect. Apache will send the new URL back to the browser or search engine, and the browser or search engine will have to request it again. Unlike all of the examples above, the new URL will now appear in the browser’s location bar. And search engines will take note of the new URL and update their databases. [R] by itself is the same as [R=302] and signifies a temporary redirect.
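The same kind of redirect can also be limited to one specific old domain, instead of every host other than the preferred one. The sketch below assumes the old domain is mywebsite.co.uk, with or without www, and escapes the dots so that they match literally:

```apache
RewriteEngine On
# Match mywebsite.co.uk or www.mywebsite.co.uk, in any letter case
RewriteCond %{HTTP_HOST} ^(www\.)?mywebsite\.co\.uk$ [NC]
# Send a permanent redirect to the same path on the new domain
RewriteRule (.*) http://www.mywebsite.com/$1 [L,R=301]
```

A positive condition like this leaves any other aliases untouched, which may or may not be what you want.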

File Existence and WordPress

WordPress enables the author to choose their own URL for an article.

WordPress’ .htaccess file looks like this:
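The block below is a sketch of the rules that WordPress typically writes when permalinks are enabled on a website installed at the document root. The exact contents can vary by version and install path, and the comments inside the block are added here for explanation, so treat it as illustrative:

```apache
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Leave direct requests for index.php alone
RewriteRule ^index\.php$ - [L]
# Only rewrite when the request is not an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and not an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else is handed to index.php
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

The -f test is true when the requested path is an existing file, and -d when it is an existing directory; the ! negates each test. Real images, stylesheets and scripts are therefore served as usual, and any other path ends up at index.php.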

Internally, index.php (probably) looks at the environment variable $_SERVER['REQUEST_URI'] and extracts the information that it needs to find out what it is looking for. This gives it even more flexibility than Apache’s rewrite rules and enables WordPress to mimic some very sophisticated URL rewriting rules. In fact, when administering a WordPress blog, you can go to Settings → Permalinks on the left side, and choose the type of URL rewriting that you would like to mimic.

WordPress’ permalink settings, letting you choose the type of URL rewriting that you would like to mimic.

Rewriting Query Strings

If you are hired to recreate an existing website from scratch, you might use URL rewriting to redirect the 20 most popular URLs on the old website to the locations on the new website. This could involve redirecting things like prod.php?id=20 to products/great-product/2342, which itself gets rewritten to the actual product page.

Apache’s RewriteRule applies only to the path in the URL, not to parameters like id=20. To do this type of rewriting, you will need to refer to the Apache environment variable %{QUERY_STRING}. This can be accomplished like so:

    RewriteEngine On
    RewriteCond %{QUERY_STRING} ^id=20$
    RewriteRule ^prod\.php$ products/great-product/2342? [L,R=301]
    RewriteRule ^products/(.*)/([0-9]+)$ productview.php?id=$2 [L]

In this example, the first RewriteRule triggers a permanent redirect from the old website’s URL to the new website’s URL. The trailing ? in its substitution drops the old id=20 query string from the redirect. The second rule rewrites the new URL to the actual PHP page that displays the product, passing the number captured by the second pair of parentheses as $2.
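A condition like this handles one hard-coded ID. When many old URLs follow the same pattern, the ID can be captured from the query string instead. In this sketch, the products/item/ path is a hypothetical placeholder for the friendly words, and %1 refers to the parenthesized group in the RewriteCond (whereas $1 would refer to a group in the RewriteRule pattern):

```apache
RewriteEngine On
# Capture the numeric id from a query string such as id=20
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# %1 is the captured id; the trailing ? drops the old query string
RewriteRule ^prod\.php$ /products/item/%1? [L,R=301]
```

This only helps if the new website uses the same ID numbers as the old one; otherwise each old URL needs its own condition and rule.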

Examples Of URL Rewriting On Shopping Websites

For complex content-managed websites, there is still the issue of how to map friendly URLs to underlying resources. The simple examples above did that mapping by hand, manually associating a URL like horses.htm with the file or resource Xu8JuefAtua.htm. Wikipedia looks up the resource based on the title, and WordPress applies some complex internal rule sets. But what if your data is more complex, with thousands of products in hundreds of categories? This section shows the approach that Amazon and many other shopping websites take.

If you’ve ever come across a URL like this on Amazon, http://www.amazon.co.uk/High-Voltage-AC-DC/dp/B00008AJL3, you might have assumed that Amazon’s website has a subdirectory named /High-Voltage-AC-DC/dp/ that contains a file named B00008AJL3.

This is very unlikely. You could try changing the name of the top-level “directory” and you would still arrive on the same page, http://www.amazon.co.uk/Test-Voltage-AC-DC/dp/B00008AJL3.

The bit at the end is what really matters. Looking down the page, you’ll see that B00008AJL3 is this AC/DC album’s ASIN (Amazon Standard Identification Number). If you change that, you’ll get a “Page not found” or an entirely different product: B003BEZ7HI.

The /dp/ also matters. Changing this leads to a “Page not found.” So, the B00008AJL3 probably tells Amazon what to display, and the dp tells the website how to display it. This is URL rewriting in action, with the original URL possibly ending up getting rewritten to something like: http://www.amazon.co.uk/displayproduct.php?asin=B00008AJL3.
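If a website with URLs of this shape ran on Apache, a rule along the following lines could do the job. This is purely a sketch: displayproduct.php is a hypothetical script name, and nothing here is known about Amazon’s real set-up.

```apache
RewriteEngine On
# Any words, then /dp/, then a 10-character ASIN of capitals and digits
RewriteRule ^[^/]+/dp/([A-Z0-9]{10})/?$ displayproduct.php?asin=$1 [L]
```

Because the words are matched by [^/]+ and never captured, changing them makes no difference to the result, which is the behaviour observed when the top-level “directory” is renamed.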

Features of an Amazon URL

This introduces some important features of Amazon’s URLs that can be applied to any website with a complex set of resources. It shows that the URL can be automatically generated and can include up to three parts:

The words: In this case, the words are based on the album and artist, and all non-alphanumeric characters are replaced. So, the slash in AC/DC becomes a hyphen. This is the bit that helps humans and search engines.
An ID number: Or something that tells the website what to look up, such as B00008AJL3.
An identifier: Or something that tells the website where to look for it and how to display it. If dp tells Amazon to look for a product, then somewhere along the line, it probably triggers a database statement such as SELECT * FROM products WHERE id="B00008AJL3".

Other Shopping Examples

Many other shopping websites have URLs like this. In the list below, the ID number and (suspected) identifier are in bold:

http://www.ebay.co.uk/**itm**/Ian-Rankin-Set-Darkness-Rebus-Novel-/**140604842997**
http://www.kelkoo.com/**c**-**138201**-lighting/brand/caravan
…**5266430**_**3**
http://www.gumtree.com/**p**/for-sale/boys-bmx-bronx-blaze/**97669042**
…/**c**/Televisions/LCD-Plasma-LED-TVs/**1844**

A significant benefit of this type of URL is that the actual words can be changed, as shown below. As long as the ID number stays the same, the URL will still work. So products can be renamed without breaking old links. More sophisticated websites (like Ciao above) will redirect the changed URL back to the real one and thus avoid creating the appearance of duplicate content (see below for more on this topic).

Websites that use URL rewriting are more flexible with their URLs — the words can change but the page will still be found.

Friendly URLs

Now you know how to map nice friendly URLs to their underlying Web pages, but how should you create those friendly URLs in the first place?

If we followed the current advice, we would separate words with hyphens rather than underscores and capitalize consistently. Lowercase might be preferable because most people search in lowercase. Punctuation such as dots and commas should also be turned into hyphens, otherwise they would get turned into things like %2C, which look ugly and might break the URL when copied and pasted. You might want to remove apostrophes and parentheses entirely for the same reason.

Whether to replace accented characters is debatable. URLs with accents (or any non-Roman characters) might look bad or break when rendered in a different character format. But replacing them with their non-accented equivalents might make the URLs harder for search engines to find (and even harder if replaced with hyphens). If your website is for a predominantly French audience, then perhaps leave the French accents in. But substitute them if the French words are few and far between on a mainly English website.

This PHP function succinctly handles all of the above suggestions:

    function GenerateUrl ($s) {
      //Convert accented characters, and remove parentheses, brackets and apostrophes
      $from = explode (",", "ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,e,i,ø,u,(,),[,],'");
      $to = explode (",", "c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,e,i,o,u,,,,,");
      //Do the replacements, and convert all other non-alphanumeric characters to hyphens
      $s = preg_replace ('~[^\w\d]+~', '-', str_replace ($from, $to, trim ($s)));
      //Remove a - at the beginning or end and make lowercase
      return strtolower (preg_replace ('/^-/', '', preg_replace ('/-$/', '', $s)));
    }

This would generate URLs like this:

    echo GenerateUrl ("Pâtisserie (Always FRESH!)"); //returns "patisserie-always-fresh"

Or, if you wanted a link to a $product variable to be pulled from a database:

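    $product = array ("title"=>"Great product", "id"=>100);
    //Build a link whose URL is the friendly title followed by the ID
    echo '<a href="/' . GenerateUrl ($product["title"]) . '/' . $product["id"] . '">';
    echo htmlspecialchars ($product["title"]) . '</a>';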

Changing Page Names

Search engines generally ignore duplicate content (i.e. multiple pages with the same information). But if they think they are being manipulated, search engines will actively penalize the website, so avoid this where possible. Google recommends using 301 redirects to send users from old pages to new ones.

When a URL-rewritten page is renamed, the old URL and new URL should both still work. Furthermore, to avoid any risk of duplication, the old URL should automatically redirect to the new one, as WordPress does.

Doing this in PHP is relatively easy. The following function looks at the current URL, and if it’s not the same as the desired URL, it redirects the user:

    function CheckUrl ($s) {
      // Get the current URL without the query string, with the initial slash
      $myurl = preg_replace ('/\?.*$/', '', $_SERVER['REQUEST_URI']);
      //If it is not the same as the desired URL, then redirect
      if ($myurl != "/$s") {
        header ("Location: /$s", true, 301);
        exit;
      }
    }

This would be used like so:

    $producturl = GenerateUrl ($product["title"]) . "/" . $product["id"];
    CheckUrl ($producturl); //redirects the user if they are at the wrong place

If you would like to use this function, be sure to test it in your environment first and with your rewrite rules, to make sure that it does not cause any infinite redirects. This is what that would look like:

This is what happens when Google Chrome visits a page that redirects to itself.

Checklist And Troubleshooting

Use the following checklist to implement URL rewriting.

1. Check That It’s Supported

Not all Web servers support URL rewriting. If you put up your .htaccess file on one that doesn’t, it will be ignored or will throw up a “500 Internal Server Error.”
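On Apache, one way to guard against the 500 error is to wrap the rules in an IfModule block, so that they are simply skipped when mod_rewrite is not loaded. A minimal sketch, reusing the horses rule from earlier in this article:

```apache
# The enclosed directives are only processed if mod_rewrite is available
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^horses/?$ Xu8JuefAtua.htm [NC,L]
</IfModule>
```

The trade-off is that a missing module then fails silently: the friendly URLs return “Page not found” instead of a server error, so it is still worth testing one rewritten URL by hand.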

2. Plan Your Approach

Figure out what will get mapped to what, and how the correct information will still get found. Perhaps you want to introduce new URLs, like my-great-product/p/123, to replace your current product URLs, like product.php?id=123, and to substitute new-category/c/12 for category.php?id=12.

3. Create Your Rewrite Rules

Create an .htaccess file for your new rules. You can initially do this in a /testing/ subdirectory and using the [R] flag, so that you can see where things go:

    RewriteEngine On
    RewriteRule ^.+/p/([0-9]+) product.php?id=$1 [NC,L,R]
    RewriteRule ^.+/c/([0-9]+) category.php?id=$1 [NC,L,R]

Now, if you visit www.mywebsite.com/testing/my-great-product/p/123, you should be sent to www.mywebsite.com/testing/product.php?id=123. You’ll get a “Page not found” because product.php is not in your /testing/ subdirectory, but at least you’ll know that your rules work. Once you’re satisfied, move the .htaccess file to your document root and remove the [R] flag. Now www.mywebsite.com/my-great-product/p/123 should work.

4. Check Your Pages

Test that your new URLs bring in all the correct images, CSS and JavaScript files. For example, the Web browser now believes that your Web page is named 123 in a directory named my-great-product/p/. If the HTML refers to a file named images/logo.jpg, then the Web browser would request the image from www.mywebsite.com/my-great-product/p/images/logo.jpg and would come up with a “File not found.”

You would need to also rewrite the image locations or make the references absolute (like /images/logo.jpg) or put a base href at the top of the head of the page (such as <base href="/">). But if you do that, you would need to fully specify any internal links that begin with # or ?, because they would now be resolved against the base URL rather than against the current page.
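As a sketch of the first option, a rule like the following could map image requests made relative to a product URL back to the real images directory. It assumes that the images live in an images directory next to the .htaccess file and that product URLs follow the my-great-product/p/123 pattern from step 2:

```apache
RewriteEngine On
# my-great-product/p/images/logo.jpg is served from images/logo.jpg
RewriteRule ^.+/p/images/(.+)$ images/$1 [NC,L]
# The product rule from step 3, unchanged apart from the removed [R]
RewriteRule ^.+/p/([0-9]+) product.php?id=$1 [NC,L]
```

Absolute references are usually the simpler fix; a rewrite like this mainly helps when the HTML cannot easily be changed.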

5. Change Your URLs

Now find all references to your old URLs, and replace them with your new URLs, using a function such as GenerateUrl to consistently create the new URLs. This is the only step that might require looking deep into the underlying code of your website.

6. Automatically Redirect Your Old URLs

Now that the URL rewriting is in place, you probably want Google to forget about your old URLs and start using the new ones. That is, when a search result brings up product.php?id=123, you’d want the user to be visibly redirected to my-great-product/p/123, which would then be internally rewritten back to product.php?id=123.

This is the reverse of what your URL rewriting already does. In fact, you could add another rule to .htaccess to achieve this, but if you get the rules in the wrong order, then the browser would go into a redirect loop.
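One common way to avoid the loop in .htaccess is to test %{THE_REQUEST}, which holds the request line as the browser originally sent it and is not changed by internal rewrites. The sketch below assumes the URL scheme from step 2. Since .htaccess cannot know the product’s title, it redirects to a placeholder word (product), which is an assumption made for illustration:

```apache
RewriteEngine On
# Only fires when the browser itself asked for product.php?id=N
RewriteCond %{THE_REQUEST} \s/product\.php\?id=([0-9]+)\s
RewriteRule ^product\.php$ /product/p/%1? [L,R=301]
# Internal rewrite back to the real script; THE_REQUEST stays the same
RewriteRule ^.+/p/([0-9]+)$ product.php?id=$1 [NC,L]
```

When the second rule rewrites the friendly URL back to product.php, the condition on the first rule no longer matches, because the browser’s original request was for the friendly URL, so no second redirect is sent.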

Another approach is to do the first redirect in PHP, using something like the CheckUrl function above. This has the added advantage that if you rename the product, the old URL will immediately become invalid and redirect to the newest one.

7. Update and Resubmit Your Site Map

Make sure to carry through your new URLs to your site maps, your product feeds and everywhere else they appear.

Conclusion

URL rewriting is a relatively quick and easy way to improve your website’s appeal to customers and search engines. We’ve tried to explain some real examples of URL rewriting and to provide the technical details for implementing it on your own website. Please leave any comments or suggestions below.