May 28, 2006
 

Last week I took the car in for an oil change. I usually wait for the work to be done, and while waiting, I head over the road to a Perkins for some breakfast. While there, I usually do a bit of coding. It is actually a good environment for this: the place isn’t full at that time of day, and it’s reasonably quiet. During this latest coding-over-breakfast episode, I wrote some code to help speed up FreshPorts: caching to disk.

FreshPorts is very database intensive. Most of the information is stored in a database and retrieved for display. This information can be static for long periods of time. In general, the information about a port does not change until a new commit is made against that port. This makes a port ideal for caching.

When a new commit comes in, remove the existing cache entry and create a new one.

Tonight it was hot out, and I didn’t sleep well. I got up around midnight and did some more coding and testing until about 3 AM, when I started writing this blog entry. Right now, it’s fairly cool after a mild thunderstorm and a bit of rain. I am about ready to head off to bed, but wanted to post this entry and invite some feedback on the code.

There are three main functions:

  1. Add – create a new cache entry
  2. Retrieve – get a copy of something from the cache
  3. Remove – delete something from the cache

For this cache, items remain in the cache until they are explicitly removed.
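The three operations map naturally onto plain file operations. Here is a minimal sketch of a generic disk cache in PHP. The class name DiskCache, the method signatures, and the hashed file names are my own assumptions for illustration, not the contents of cache.php. Add() writes to a temporary file and then rename()s it into place, so a concurrent reader never sees a half-written entry.

```php
<?php
// Minimal sketch of a disk cache with Add/Retrieve/Remove.
// Class name, signatures and return values are illustrative assumptions.
class DiskCache
{
    private string $dir;

    public function __construct(string $dir)
    {
        $this->dir = rtrim($dir, '/');
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0700, true);
        }
    }

    // Map a key to a file name. Hashing means a key can never contain
    // path separators such as "../", so it cannot escape the cache dir.
    private function path(string $key): string
    {
        return $this->dir . '/' . sha1($key) . '.html';
    }

    // Add: write to a temporary file, then rename() it over the entry.
    public function Add(string $key, string $data): bool
    {
        $tmp = tempnam($this->dir, 'tmp');
        if ($tmp === false) {
            return false;
        }
        if (file_put_contents($tmp, $data) === false) {
            unlink($tmp);
            return false;
        }
        return rename($tmp, $this->path($key));
    }

    // Retrieve: the cached data, or null on a cache miss.
    public function Retrieve(string $key): ?string
    {
        $file = $this->path($key);
        if (!is_file($file)) {
            return null;
        }
        $data = file_get_contents($file);
        return $data === false ? null : $data;
    }

    // Remove: delete the entry; removing a missing entry is not an error.
    public function Remove(string $key): bool
    {
        $file = $this->path($key);
        return !is_file($file) || unlink($file);
    }
}
```

Because entries never expire, Remove() is the only thing that makes a stale entry go away.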

The basic code, meant to be generic, is available for viewing at http://beta.freshports.org/tmp/cache.phps.

NOTE:

  • I have done only some preliminary testing.
  • There should be some checks on the cache keys to prevent arbitrary read/write of files outside the cache location
  • Don’t depend on this code
  • Please provide feedback
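One way to do the key check mentioned above is a whitelist: accept only keys of the form category/port made from a conservative set of characters, and refuse everything else. The helper names below are hypothetical, and the allowed character set is an assumption about what port names look like.

```php
<?php
// Hypothetical key check. Each component must start with a letter,
// digit, '_', '+' or '-' (never '.'), so '..' and hidden files are
// impossible; exactly one '/' separates category and port.
function cache_key_is_safe(string $key): bool
{
    return preg_match(
        '#\A[A-Za-z0-9_+-][A-Za-z0-9._+-]*/[A-Za-z0-9_+-][A-Za-z0-9._+-]*\z#',
        $key
    ) === 1;
}

// Build the cache file path, or return null for a rejected key.
function cache_path(string $cacheDir, string $key): ?string
{
    if (!cache_key_is_safe($key)) {
        return null;
    }
    return rtrim($cacheDir, '/') . '/' . $key . '.html';
}
```

Keys such as ../etc/passwd, /etc/passwd or sysutils/.. are rejected before any file name is built.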

I took this cache class and created a second class specifically for ports: http://beta.freshports.org/tmp/cache-port.phps. There’s not much in there; its main purpose is to make the code in the main part of FreshPorts easier to write.
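A port-specific wrapper mostly needs to turn a category and port name into a cache location. The sketch below is hypothetical. It follows the calling convention of the usage code in this post, where Retrieve() fills $data and a zero result means a cache hit, but the file layout ({cache dir}/{category}/{port}.html), the name checks, and the constructor argument (the real CachePort() is called with none) are my assumptions.

```php
<?php
// Hypothetical port wrapper. Assumed convention: Retrieve() returns 0
// on a hit (filling $data) and 1 on a miss.
class CachePort
{
    private string $dir;

    public function __construct(string $dir)
    {
        $this->dir = rtrim($dir, '/');
    }

    // File for a port, or null if either name could escape the cache dir.
    private function path(string $category, string $port): ?string
    {
        foreach ([$category, $port] as $part) {
            if (!preg_match('#\A[A-Za-z0-9_+-][A-Za-z0-9._+-]*\z#', $part)) {
                return null;
            }
        }
        return "{$this->dir}/{$category}/{$port}.html";
    }

    public function Retrieve(string $category, string $port, ?string &$data): int
    {
        $data = null;
        $file = $this->path($category, $port);
        if ($file === null || !is_file($file)) {
            return 1; // miss
        }
        $contents = file_get_contents($file);
        if ($contents === false) {
            return 1; // unreadable entry: treat as a miss
        }
        $data = $contents;
        return 0; // hit
    }

    public function Add(string $category, string $port, string $html): bool
    {
        $file = $this->path($category, $port);
        if ($file === null) {
            return false;
        }
        $dir = dirname($file);
        if (!is_dir($dir) && !mkdir($dir, 0700, true)) {
            return false;
        }
        return file_put_contents($file, $html) !== false;
    }

    public function Remove(string $category, string $port): bool
    {
        $file = $this->path($category, $port);
        return $file !== null && (!is_file($file) || unlink($file));
    }
}
```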

Using the cache-port class as an example, I’ll create similar classes for other kinds of pages.

I have implemented the Retrieve() and Add() functionality on the BETA site. At present, I am only caching the following information for a given port:

  1. NOTES from /usr/ports/UPDATING
  2. Any relevant extracts from /usr/ports/MOVED
  3. The commit history

The above information was selected for caching because it is all contiguous in the output and can be saved as a single block of HTML. The port description, master sites, install instructions, etc. are generated by a function that is not so easy to use here. Eventually, I’ll also cache that information, but for now, it is dynamically generated on each page load.
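When a section's HTML comes from functions that echo their output instead of returning it, PHP's output buffering can collect that contiguous run into one string, which can then be cached as a single entry. This is a sketch of the general technique, not necessarily what FreshPorts does (freshports_PortCommits() in the usage example returns its HTML); render_updating_notes() and render_commit_history() are hypothetical stand-ins.

```php
<?php
// Hypothetical renderers that echo directly, like many page helpers.
function render_updating_notes(string $port): void
{
    echo "<h3>UPDATING notes for {$port}</h3>\n";
}

function render_commit_history(string $port): void
{
    echo "<h3>Commit history for {$port}</h3>\n";
}

// Run $render with output buffering on and return everything it printed.
// The finally block discards the buffer even if $render throws.
function capture_html(callable $render): string
{
    ob_start();
    try {
        $render();
        return ob_get_contents();
    } finally {
        ob_end_clean();
    }
}

$port = 'sysutils/bacula-server';
$html = capture_html(function () use ($port): void {
    render_updating_notes($port);
    render_commit_history($port);
});
```

The resulting $html can be echoed and handed to Add() just as $HTML is in the usage example.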

Rendering times have fallen to about 0.2 seconds for each page that is already in the cache. This compares to about 0.02 seconds for pages that are truly static HTML. The reduction in rendering time varies with the number of commits in the port's history. As a practical example, sysutils/bacula-server takes about 1.2 to 2.5 seconds to render. The same page on the beta site consistently takes less than 0.5 seconds now that caching is implemented. I expect this to drop a little further once I cache additional port information.
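Timings like these can be collected with microtime(true), which returns the current Unix time in seconds as a float. A minimal sketch; the usleep() call is only a placeholder for the real page-rendering work.

```php
<?php
// Return how long $work takes to run, in seconds.
function time_it(callable $work): float
{
    $start = microtime(true);
    $work();
    return microtime(true) - $start;
}

$elapsed = time_it(function (): void {
    usleep(50000); // placeholder: 50 ms of "rendering"
});
printf("rendered in %.3f seconds\n", $elapsed);
```

Timing the same page once with an empty cache and once with a warm one gives the before/after comparison directly.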

Here is how the code is used:

require_once($_SERVER['DOCUMENT_ROOT'] . '/../classes/cache.php');
require_once($_SERVER['DOCUMENT_ROOT'] . '/../classes/cache-port.php');

// start of caching
$HTML = '';

$Cache = new CachePort();
// Retrieve() fills $data with the cached HTML; a false/zero result means a cache hit
$result = $Cache->Retrieve($port->category, $port->port, $data);
if (!$result) {
    // cache hit: send the stored HTML as-is
    echo $data;
} else {
    // cache miss: build the HTML, send it, then store it for next time
    // ... lots of code here to accumulate HTML ...

    $HTML .= freshports_PortCommits($port);

    // end of caching

    echo $HTML;
    $Cache->Add($port->category, $port->port, $HTML);
}

The code is meant to be simple to use. So far, it is.

How you can help

I have questions. Do you have answers?

  1. Do you see any security issues in the caching code?
  2. Can you see any coding errors?
  3. Do you have any recommendations for improvement?

Left to do

The following still remains to be done:

  • Integrate the Remove() code into the processing of incoming commits so that the cache is updated the next time the page is requested.
  • Add the checks to prevent access to files outside the cache.
  • Implement caching for categories, etc.
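For the first item, the commit-processing code only needs to delete the cached file for each port the commit touched; the next request for that page then misses the cache and rebuilds it. The function below is a hypothetical sketch that assumes the commit's ports arrive as category/port strings and that entries live at {cache dir}/{category}/{port}.html.

```php
<?php
// Hypothetical hook for commit processing: delete the cached HTML for
// every port the commit touched. Returns how many entries were removed.
function invalidate_ports(string $cacheDir, array $touchedPorts): int
{
    $removed = 0;
    foreach ($touchedPorts as $name) {
        // Accept only "category/port" with safe characters, so a bad
        // name can never delete a file outside the cache directory.
        if (!preg_match('#\A[A-Za-z0-9_+-][A-Za-z0-9._+-]*/[A-Za-z0-9_+-][A-Za-z0-9._+-]*\z#', $name)) {
            continue;
        }
        $file = rtrim($cacheDir, '/') . '/' . $name . '.html';
        if (is_file($file) && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```

Ports that were never cached are simply skipped, so it is safe to call this for every commit.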

Time for bed

I see it’s now 3:30 AM. It’s pretty cool here by the window, and I hope it’s cool upstairs. I need to sleep.

  One Response to “Caching to disk”

  1. Hey thanks for creating the Freshports site. It is one of a kind.

    Anyway, I was interested in the code that you have but now the link is dead. Have you looked into using a ramdisk for storing the database data, if possible, and using a database in RAM for the transactions? That would be really kool. I can fork over some dollars for RAM….