Finding another way to parse URLs

NOTE: This article varies slightly from the original. When I first wrote this, I did not use a ? in the URL. Later in the day, I realised my error. That led to a much better solution. :)

Central to how FreshPorts works is the parsing of the URL. Almost everything in FreshPorts is in a database, not in a static file somewhere. I need to improve the URL parsing. I wrote about Parsing the URL back in January, but I didn’t go into much detail about how and why I need to do this. I now realise that, with pagination, I need to embrace better techniques. For example, the existing FreshPorts code has no idea what to do with /UPDATING?page=1. It looks for that whole string in the database, instead of looking for the second page of commits against /ports/UPDATING.

Here are some PHP things I’ve found to help me parse URLs. In this example, we will use http://beta.freshports.org/UPDATING. Let’s break some of this stuff down:

$url = parse_url($_SERVER["SCRIPT_URI"]);
echo "parse_url output is: <pre>";
print_r($url);
echo "</pre>";

The output from the above will be:

parse_url output is:
Array
(
    [scheme] => http
    [host] => beta.freshports.org
    [path] => /UPDATING
)

This is good. It separates out the components I need. Now, let’s go more complex and try http://beta.freshports.org/UPDATING?page=2. The output for that is:

parse_url output is:
Array
(
    [scheme] => http
    [host] => beta.freshports.org
    [path] => /UPDATING
)

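Good, all good. Why? Because the query string is not lost: for a URL such as http://beta.freshports.org/UPDATING?page=1&size=89, $_SERVER['REDIRECT_QUERY_STRING'] will contain 'page=1&size=89', and I can use parse_str on that! See below.

Now, what can I use to break up the query string? Let’s add this to the solution: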

echo "and the query parts of the URL are:<pre>";
parse_str($_SERVER['REDIRECT_QUERY_STRING'], $query_parts);
var_dump($query_parts);
echo "</pre>";

This gives me:

array(2) {
  ["page"]=>
  string(1) "1"
  ["size"]=>
  string(2) "89"
}
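Note that every value in that array is a string, so a page number needs a little care before it is used for pagination. Here is a minimal sketch of one way to do that. The function name page_number and the default of 1 are my own assumptions for illustration, not existing FreshPorts code:

```php
<?php
// Hypothetical helper (not FreshPorts code): pull a positive page number
// out of the array produced by parse_str(), falling back to a default.
function page_number(array $query_parts, int $default = 1): int
{
    // parse_str() can produce nested arrays (e.g. from page[]=1),
    // so only accept a plain string value here.
    if (!isset($query_parts['page']) || !is_string($query_parts['page'])) {
        return $default;
    }

    // FILTER_VALIDATE_INT returns an int on success and false otherwise;
    // min_range rejects zero and negative page numbers.
    $page = filter_var(
        $query_parts['page'],
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );

    return $page === false ? $default : $page;
}

parse_str('page=1&size=89', $query_parts);
var_dump(page_number($query_parts));
```

Anything that is not a clean positive integer (a missing page, 'abc', '0') falls back to the default rather than reaching the database.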

OK, that’s looking pretty good. parse_url gives us the path, which is our file name (in FreshPorts-speak), and parse_str gives us our parameters. How does it go with this: http://beta.freshports.org/sysutils/bacula-server/?page=2&files=yes

parse_url output is: 
Array
(
    [scheme] => http
    [host] => beta.freshports.org
    [path] => /sysutils/bacula-server/
)
and the query parts of the URL are:
array(2) {
  ["page"]=>
  string(1) "2"
  ["files"]=>
  string(3) "yes"
}

This approach looks very promising. Between them, the two arrays contain what I need to work on. The path element of the first array is the path name, and I can use that to find an entry in the element table (a self-referential table with an entry for each file in the CVS tree). The second array holds just the parameters. Looking good!
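To try the same idea outside of Apache, where $_SERVER["SCRIPT_URI"] and $_SERVER['REDIRECT_QUERY_STRING'] are not available, the two steps can be rolled into one function that works on a plain URL string. This is only a sketch: the name split_request_url and the shape of the returned array are assumptions of mine, and it relies on parse_url returning a query element when the string itself contains a ?.

```php
<?php
// Hypothetical helper (not FreshPorts code): split a full URL string into
// the path (used to look up the element) and the query parameters.
function split_request_url(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        // parse_url() returns false for seriously malformed URLs
        return ['path' => null, 'params' => []];
    }

    $params = [];
    // parse_str() fills $params from a string such as "page=2&files=yes";
    // an empty string simply leaves it as an empty array.
    parse_str($parts['query'] ?? '', $params);

    return [
        'path'   => $parts['path'] ?? '/',
        'params' => $params,
    ];
}

$request = split_request_url(
    'http://beta.freshports.org/sysutils/bacula-server/?page=2&files=yes'
);
print_r($request);
```

The path goes to the database lookup and the parameters stay separate, which is the split the existing code was missing.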

Have you done similar URL parsing? How have you done it? Have you seen attempts similar to what I’m doing? Any suggestions?
