Improving the search – pagination

I’ve spoken about pagination before. Use the search to see where. I’ve started adding pagination to the Search page. I’ll be using the Pager module from PEAR again.

In short, here is what Pager gives you, in the form I’m using it:

  • Gives you a list of hyperlinks to the various pages
  • Tells you the parameters to use in your SQL query to get the first and last item on the page

You can also give it your entire data set, but that’s pretty heavy duty. I use Pager as a counter, for lack of a better term. It tells me what page I’m on, gives me the hyperlinks to the other pages, and leaves me alone to paint the page. It’s a tool.

How have you been paging your output?

I’ll probably give you a better introduction to Pager in a future post. For now, I want to share with you what I found in the search page and how I recently improved the SQL.

Better code

While I was adding the new starting with and ending with search conditions, I wanted to avoid duplicating some code. So I create this little function to help build the SQL. Query building is something I like doing. Here is part of the code that builds the user supplied condition based upon a field containing a supplied value. In short, you are searching for the supplied value somewhere within a given field. The condition is sometimes case-insensitive. Here is how I cater for that.

switch ($method) {
  case 'match':
    if ($casesensitivity == 'casesensitive') {
      $Like = 'LIKE';
    } else {
      $Like = 'ILIKE';
    }
    switch ($stype) {
      case 'name':
        $sql .= "\n     and element.name $Like '%$query%'";
        break;

      case 'package':
        $sql .= "\n     and ports.package_name $Like '%$query%'";
        break;

      case 'latest_link':
        $sql .= "\n     and ports.latest_link $Like '%$query%'";
        break;

      case 'shortdescription':
        $sql .= "\n     and ports.short_description $Like '%$query%'";
        break;
      
      case 'longdescription':
        $sql .= "\n     and ports.long_description $Like '%$query%'";
        break;
      
      case 'depends_build':
        $sql .= "\n     and ports.depends_build $Like '%$query%'";
        break;
      
      case 'depends_lib':
        $sql .= "\n     and ports.depends_lib $Like '%$query%'";
        break;

      case 'depends_run':
        $sql .= "\n     and ports.depends_run $Like '%$query%'";
        break;

      case 'depends_all':
        $sql .= "\n     and (ports.depends_build $Like '%$query%' OR 
                             ports.depends_lib   $Like '%$query%' OR
                             ports.depends_run   $Like '%$query%')";
        break;

      case 'maintainer':
        $sql .= "\n     and ports.maintainer $Like '%$query%'";
        break;
    }
    break;

Now consider how to do that if you have two more very similar conditions: starts with, or ends with. I could duplicate the above case statement two more times, and remove the leading and trailing wild card symbols (% is a wild card in SQL). Or, I could do this instead:

switch ($method) {
   case 'prefix':
      $WildCardMatch = "$query%";
      if ($casesensitivity == 'casesensitive') {
         $Like = 'LIKE';
      } else {
         $Like = 'ILIKE';
      }
      $sqlUserSpecifiedCondition = WildCardQuery($stype, $Like, $WildCardMatch);
      break;

   case 'match':
      $WildCardMatch = "%$query%";
      if ($casesensitivity == 'casesensitive') {
         $Like = 'LIKE';
      } else {
         $Like = 'ILIKE';
      }
      $sqlUserSpecifiedCondition = WildCardQuery($stype, $Like, $WildCardMatch);
      break;

   case 'suffix':
      $WildCardMatch = "%$query";
      if ($casesensitivity == 'casesensitive') {
         $Like = 'LIKE';
      } else {
         $Like = 'ILIKE';
      }
      $sqlUserSpecifiedCondition = WildCardQuery($stype, $Like, $WildCardMatch);
      break;

The function WildCardQuery() looks like this:

function WildCardQuery($stype, $Like, $query) {
# return the clause for this particular type of query
   $sql = '';
   switch ($stype) {
      case 'name':
         $sql .= "\n     and element.name $Like '$query'";
         break;

      case 'package':
         $sql .= "\n     and ports.package_name $Like '$query'";
         break;

      case 'latest_link':
         $sql .= "\n     and ports.latest_link $Like '$query'";
         break;

      case 'shortdescription':
         $sql .= "\n     and ports.short_description $Like '$query'";
         break;
  
      case 'longdescription':
         $sql .= "\n     and ports.long_description $Like '$query'";
         break;
  
      case 'depends_build':
         $sql .= "\n     and ports.depends_build $Like '$query'";
         break;
  
      case 'depends_lib':
         $sql .= "\n     and ports.depends_lib $Like '$query'";
         break;

      case 'depends_run':
         $sql .= "\n     and ports.depends_run $Like '$query'";
         break;

      case 'depends_all':
         $sql .= "\n     and (ports.depends_build $Like '$query' OR
		              ports.depends_lib   $Like '$query' OR
		              ports.depends_run   $Like '$query')";
         break;

      case 'maintainer':
         $sql .= "\n     and ports.maintainer $Like '$query'";
         break;
   }

   return $sql;
}

That’s much easier to maintain.

Better query

I did that yesterday. Today I was completing the work to count the number of rows. Yes, I can get the row count after issuing the query. But the goal with pagination is to not pull back all the data. Pull back exactly what is required for this page. To do that, you need to know how many rows you have, and thus, how many pages. With that information, Pager can tell you the parameters for your SQL query (i.e. to be used in the LIMIT and OFFSET clauses of your SELECT statement.

I did not want to build the query twice. The I wanted one code stream and at the end of it, I would have both the query for the row count, and for the result set. Here is an example of how I did it:

$sql = "
  SELECT DISTINCT 
         ports.id, 
         element.name as port,
         categories.name as category, 
         categories.id as category_id, 
         ports.version as version, 
         ports.revision as revision, 
         ports.maintainer, 
         ports.short_description, 
         ports.package_exists, 
         ports.extract_suffix, 
         ports.homepage, 
         element.status, 
         ports.element_id, 
         ports.broken, 
         ports.deprecated, 
         ports.ignore, 
         ports_vulnerable.current as vulnerable_current,
         ports_vulnerable.past    as vulnerable_past,
         ports.forbidden,
         ports.master_port,
         ports.latest_link,
         ports.no_package,
         ports.package_name,
         ports.restricted,
         ports.no_cdrom,
         ports.expiration_date,
         ports.no_package  ";

   if ($User->id) {
      $sql .= ",
         onwatchlist";
   }

   $sql .= "
    from ports LEFT OUTER JOIN ports_vulnerable on ports_vulnerable.port_id = ports.id , 
             categories, commit_log, commit_log_ports_elements, element  ";

   if ($User->id) {
         $sql .= "
      LEFT OUTER JOIN
 (SELECT element_id as wle_element_id, COUNT(watch_list_id) as onwatchlist
    FROM watch_list JOIN watch_list_element
        ON watch_list.id      = watch_list_element.watch_list_id
       AND watch_list.user_id = $User->id
       AND watch_list.in_service
  GROUP BY wle_element_id) AS TEMP
       ON TEMP.wle_element_id = element.id";
   }

   $sql .= '
   WHERE ports.category_id  = categories.id
      and ports.element_id   = element.id 
      and commit_log.id      = commit_log_ports_elements.commit_log_id
      and ports.element_id   = commit_log_ports_elements.element_id ' ;



/* snip out code already seen at the start of this article */

#
# include/exclude deleted ports
#
switch ($deleted) {
  case 'includedeleted':
    # do nothing
    break;

  default:
    $deleted = 'excludedeleted';
    # do not break here...

  case 'excludedeleted':
    $sql .= " and element.status = 'A' ";
}

switch ($orderby) {
  case ORDERBYCATEGORY:
    switch ($orderbyupdown) {
      case ORDERBYDESCENDING:
      default:
        $sql .= "\n order by categories.name desc, element.name";
        break;

      case ORDERBYASCENDING:
        $sql .= "\n order by categories.name, element.name";
        break;
    }
    break;

  case ORDERBYPORT:
  default:
    switch ($orderbyupdown) {
      case ORDERBYDESCENDING:
      default:
        $sql .= "\n order by element.name desc, categories.name";
        break;

      case ORDERBYASCENDING:
        $sql .= "\n order by element.name, categories.name";
        break;
    }
    break;
}

if ($start > 1) {
  $sql .= "\n OFFSET " . ($start - 1);
}

Now, here is what the code looks like now:

$sqlSelectFields = "
  select distinct 
         ports.id, 
         element.name as port,
         categories.name as category, 
         categories.id as category_id, 
         ports.version as version, 
         ports.revision as revision, 
         ports.maintainer, 
         ports.short_description, 
         ports.package_exists, 
         ports.extract_suffix, 
         ports.homepage, 
         element.status, 
         ports.element_id, 
         ports.broken, 
         ports.deprecated, 
         ports.ignore, 
         ports_vulnerable.current as vulnerable_current,
         ports_vulnerable.past    as vulnerable_past,
         ports.forbidden,
         ports.master_port,
         ports.latest_link,
         ports.no_package,
         ports.package_name,
         ports.restricted,
         ports.no_cdrom,
         ports.expiration_date,
         ports.no_package  ";
         
$sqlSelectCount = "
  SELECT count(*)";
  
$sqlWatchListFields = '';

   if ($User->id) {
      $sqlWatchListFields .= ",
         onwatchlist";
   }

   $sqlFrom = "
    from ports LEFT OUTER JOIN ports_vulnerable on ports_vulnerable.port_id = ports.id , categories, element  ";

$sqlWatchListFrom = '';
   if ($User->id) {
         $sqlWatchListFrom .= "
      LEFT OUTER JOIN
 (SELECT element_id as wle_element_id, COUNT(watch_list_id) as onwatchlist
    FROM watch_list JOIN watch_list_element
        ON watch_list.id      = watch_list_element.watch_list_id
       AND watch_list.user_id = $User->id
       AND watch_list.in_service
  GROUP BY wle_element_id) AS TEMP
       ON TEMP.wle_element_id = element.id";
   }

   $sqlWhere .= '
   WHERE ports.category_id  = categories.id
      and ports.element_id   = element.id ' ;


if ($method == 'soundex') {
   switch ($stype) {
      case 'name':
      case 'package':
      case 'latest_link':
      case 'maintainer':
         break;

      default:
         $method = 'match';
         echo "NOTE: Instead of using 'sounding like' as instructed, the system used 'containing'.  " . 
                          "See the notes below for why this is done.
"; break; } } /* snip code already shown above */ # # include/exclude deleted ports # switch ($deleted) { case 'includedeleted': # do nothing break; default: $deleted = 'excludedeleted'; # do not break here... case 'excludedeleted': $sqlUserSpecifiedCondition .= " and element.status = 'A' "; } switch ($orderby) { case ORDERBYCATEGORY: switch ($orderbyupdown) { case ORDERBYDESCENDING: default: $sqlOrderBy = "\n order by categories.name desc, element.name"; break; case ORDERBYASCENDING: $sqlOrderBy = "\n order by categories.name, element.name"; break; } break; case ORDERBYPORT: default: switch ($orderbyupdown) { case ORDERBYDESCENDING: default: $sqlOrderBy = "\n order by element.name desc, categories.name"; break; case ORDERBYASCENDING: $sqlOrderBy = "\n order by element.name, categories.name"; break; } break; } if ($start > 1) { $sqlOffsetLimit = "\n OFFSET " . ($start - 1); } else { $sqlOffsetLimit = ''; }

With those query clauses, this is what I use to count rows:

$sql = $sqlSelectCount . $sqlFrom .  $sqlWhere . $sqlUserSpecifiedCondition;

And this is the full query for pulling back the result set:

$sql = $sqlSelectFields . $sqlWatchListFields . $sqlFrom . $sqlWatchListFrom . 
        $sqlWhere . $sqlUserSpecifiedCondition . $sqlOrderBy . $sqlOffsetLimit;

That’s how I build my queries. How do you build yours?

Oh, and I wound up making the query much slimmer while doing this. For some reason, I’d been joining on commit_log_ports and commit_log when searching. That is why I needed that DISTINCT clause you see in the old code. Perhaps there was a reason for it in the past, but if so, it is lost.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top