Search Engine Friendly URLs with mod_rewrite

This is an old post!

This post is over 2 years old. Solutions referenced in this article may no longer be valid. Please consider this when utilizing any information referenced here.

By now, I’m sure we all know about search engine friendly (SEF) URLs: URLs that a search spider can crawl easily. Spiders don’t like to see a bunch of stuff on the query string (file.html?blah=foo), but they do like standard URL patterns like /file/foo.html. Not to mention that the latter is a lot easier for people to read. But what happens when you need to do something more complicated, say, rewrite using different types of conditions with optional arguments?

Say, for instance, I have a script that takes arguments like this:

/file.php?id=1[&view=1]

And I want to rewrite it to look like this:

/file/(id).html[?view=1]

In this case, the view argument is optional and could serve any number of purposes, such as internal viewing or refcode tracking. Your first thought might be something like this:

RewriteCond %{REQUEST_URI} ^\/file\/\d+\.html [OR]
RewriteCond %{REQUEST_URI} ^\/file\/\d+\.html(.*)
RewriteRule ^\/file\/(\d+)\.html(.*) /file.php?id=$1&$2 [L]

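But it doesn’t work. The query string isn’t part of the URI that the rule matches against. Both %{REQUEST_URI} and the RewriteRule pattern see only the path (/file/1.html), never the ?view=1 part. But mod_rewrite, being the Swiss Army knife it is, lets you get around this by back-referencing the condition. $N refers to a parenthesized group in the RewriteRule pattern, while %N refers to a parenthesized group in the last matched RewriteCond. So you can capture the query string in a condition and reuse it in the rule, like so (the second rule set handles requests that have no query string):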

RewriteCond %{REQUEST_URI} ^\/file\/\d+\.html
RewriteCond %{QUERY_STRING} (.+)
RewriteRule ^\/file\/(\d+)\.html$ /file.php?id=$1&%1 [L]

RewriteCond %{REQUEST_URI} ^\/file\/\d+\.html
RewriteRule ^\/file\/(\d+)\.html$ /file.php?id=$1 [L]
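To make the difference between $N and %N concrete, here is a rough Python model of both attempts. It is only an illustration under simplifying assumptions. It imitates how the back-references get filled in, not how mod_rewrite is actually implemented, and the helper names (first_attempt, working_rules) are hypothetical. The key detail it models is that the query string is split off before the RewriteRule pattern is ever matched:

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical model of the two configurations above. It only imitates how
# $N and %N are filled in; it is not how mod_rewrite is implemented.
NAIVE_RULE = re.compile(r"^/file/(\d+)\.html(.*)")  # first attempt: $1 and $2
RULE = re.compile(r"^/file/(\d+)\.html$")           # working RewriteRule: $1
QUERY_COND = re.compile(r"(.+)")                    # RewriteCond %{QUERY_STRING}: %1


def first_attempt(target: str) -> Optional[str]:
    """First attempt: $2 can never contain the query string."""
    path = urlsplit(target).path  # the rule pattern sees the path only
    m = NAIVE_RULE.search(path)
    if m is None:
        return None
    # The substitution contains '?', so without the QSA flag Apache replaces
    # the original query string: view=1 is lost.
    return f"/file.php?id={m.group(1)}&{m.group(2)}"


def working_rules(target: str) -> Optional[str]:
    """Working rule sets: the query string is captured by a condition as %1."""
    parts = urlsplit(target)
    rule = RULE.search(parts.path)
    if rule is None:
        return None
    cond = QUERY_COND.search(parts.query)
    if cond is not None:
        # First rule set: $1 from the RewriteRule, %1 from the last matched RewriteCond.
        return f"/file.php?id={rule.group(1)}&{cond.group(1)}"
    # Second rule set: no query string, so only $1 is used.
    return f"/file.php?id={rule.group(1)}"


for url in ("/file/1.html", "/file/1.html?view=1"):
    print(f"{url}: {first_attempt(url)} vs {working_rules(url)}")
```

Under the first attempt, both requests come out as /file.php?id=1&, so the view argument silently disappears. The working rules produce /file.php?id=1 and /file.php?id=1&view=1 respectively.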

It’s described in the mod_rewrite documentation. I thought this was a pretty cool solution to a problem that had been vexing me.
