I noticed just now that the database has not been keeping up with the rapid changes to the incoming XML. In general, anything in the XML should eventually get into the database. That is just not happening. Case in port: repo name.
This is now becoming an issue when we link to the SVN repository. Look at this commit and you’ll see that the URL to the repo is:
/base/releng/9.1/sys/amd64/amd64/initcpu.c => http://svnweb.freebsd.org/ports/head//base/releng/9.1/sys/amd64/amd64/initcpu.c?view=log#rev240240
That’s wrong. This is a src commit. It should link to http://svnweb.freebsd.org/base/releng/9.1/sys/amd64/amd64/initcpu.c?view=log#rev240240
Hmm, looking at this more closely, I see the URL is nearly right. It just contains the wrong prefix. Remove /ports/head// and it’s fine.
What about port commits? devel/p4ftpd http://svnweb.freebsd.org/ports/head/head/devel/p4ftpd/Makefile?view=log#rev303919
Yeah. That’s wrong too. It should be http://svnweb.freebsd.org/ports/head/devel/p4ftpd/Makefile?view=log#rev303919 (i.e. remove /head/)
Really, the repo prefix for svnweb is wrong in both cases.
As for doc commits, see index.sgml which links to http://svnweb.freebsd.org/ports/head//doc/head/en_US.ISO8859-1/htdocs/releng/index.sgml?view=log#rev39514 but really needs to have /ports/head// removed from the URL.
I think this calls for a repo table, which points to the correct repo URLs.
CREATE TABLE repo(
id serial not null,
name text not null,
description text not null,
svn_hostname text not null,
path_to_repo text not null
);
create unique index repo_id on repo(id);
create unique index repo_name on repo(name);
create unique index repo_description on repo(description);
create unique index repo_path_to_repo on repo(path_to_repo);
insert into repo (name, description, svn_hostname, path_to_repo) values ('ports', 'The FreeBSD Ports tree', 'svnweb.freebsd.org', '/ports');
insert into repo (name, description, svn_hostname, path_to_repo) values ('doc', 'The FreeBSD doc tree', 'svnweb.freebsd.org', '/doc');
insert into repo (name, description, svn_hostname, path_to_repo) values ('src', 'The FreeBSD src tree', 'svnweb.freebsd.org', '/base');
Which gives us:
freshports.org=# select * from repo; id | name | description | svn_hostname | path_to_repo ----+-------+------------------------+--------------------+-------------- 1 | ports | The FreeBSD Ports tree | svnweb.freebsd.org | /ports 2 | doc | The FreeBSD doc tree | svnweb.freebsd.org | /doc 3 | src | The FreeBSD src tree | svnweb.freebsd.org | /base (3 rows) freshports.org=#
Now, let’s link the commit_log table to the repo table:
ALTER TABLE commit_log add repo_id integer;
alter table commit_log
add foreign key (repo_id)
references repo(id) on update cascade on delete restrict;
Next step: populate that field… This should be interesting. Let’s start with the doc tree:
UPDATE commit_log CL SET repo_id = (SELECT id FROM repo WHERE name = 'doc') WHERE CL.id in ( SELECT distinct CL.id FROM commit_log_elements CLE, commit_log CL, element_pathname EP WHERE CLE.element_id = EP.element_id AND CLE.commit_log_id = CL.id AND EP.pathname like '/doc/%' );
That updated 14,849 records. Let’s see about the ports tree next. Hmm, what if a port affects more than one repo…? That should not happen, but let’s just check:
SELECT * FROM ( SELECT distinct CL.id, CL.repo_id FROM commit_log_elements CLE, commit_log CL, element_pathname EP WHERE CLE.element_id = EP.element_id AND CLE.commit_log_id = CL.id AND EP.pathname like '/ports/%' ) AS TMP WHERE TMP.repo_id is not null; id | repo_id -------+--------- 45075 | 2 (1 row)
This finds all commits against the ports tree for which a repo is already assigned. In this case, it must be the doc repo because that is the only assignment we have done so far. Let’s look at that commit.
freshports.org=# select message_id from commit_log where id = 45075;
message_id
--------------------------------------------------
200208072021.g77KLCRq064264@freefall.freebsd.org
(1 row)
OMG, are you kidding me? There’s three different repos on that commit! Oh well, I’ll just let it be converted to a port commit because I don’t want to special case the following SQL statement, which sets the repo for all the port commmits:
UPDATE commit_log CL SET repo_id = (SELECT id FROM repo WHERE name = 'ports') WHERE CL.id in ( SELECT distinct CL.id FROM commit_log_elements CLE, commit_log CL, element_pathname EP WHERE CLE.element_id = EP.element_id AND CLE.commit_log_id = CL.id AND EP.pathname like '/ports/%' );
That updated 262,478 records.
Now let’s see about the src commits. But first, we’ll check for commits to src, which affected another repo. We did a similar check for ports, after setting the docs commits.
freshports.org=# SELECT * FROM freshports.org-# ( freshports.org(# SELECT distinct CL.id, CL.repo_id freshports.org(# FROM commit_log_elements CLE, commit_log CL, element_pathname EP freshports.org(# WHERE CLE.element_id = EP.element_id freshports.org(# AND CLE.commit_log_id = CL.id freshports.org(# AND EP.pathname like '/base/%' freshports.org(# ) AS TMP freshports.org-# WHERE TMP.repo_id is not null; id | repo_id -------+--------- 27009 | 2 28918 | 2 30685 | 2 32472 | 2 36499 | 2 39683 | 2 (6 rows) freshports.org=# select E.*, element_pathname(id) from element E where id in (select element_id from commit_log_elements where commit_log_id in (27009,28918,30685,32472,36499,39683)); id | name | parent_id | directory_file_flag | status | element_pathname -------+----------------------------+-----------+---------------------+--------+--------------------------------------------------------------- 77514 | param.h | 76996 | F | A | /base/head/sys/sys/param.h 79703 | authors.ent | 77843 | F | A | /doc/head/en_US.ISO8859-1/share/sgml/authors.ent 78787 | article.sgml | 78786 | F | A | /doc/head/en_US.ISO8859-1/articles/contributors/article.sgml 81174 | netchild.key | 77796 | F | D | /doc/head/en_US.ISO8859-1/books/handbook/pgpkeys/netchild.key 77339 | news.xml | 77009 | F | D | /www/en/news/news.xml 77797 | chapter.sgml | 77796 | F | A | /doc/head/en_US.ISO8859-1/books/handbook/pgpkeys/chapter.sgml 78449 | calendar.freebsd | 77807 | F | A | /base/head/usr.bin/calendar/calendars/calendar.freebsd 77799 | pgpkeys.ent | 77796 | F | D | /doc/head/en_US.ISO8859-1/books/handbook/pgpkeys/pgpkeys.ent 83011 | book.sgml | 83010 | F | A | /doc/head/en_US.ISO8859-1/books/porters-handbook/book.sgml 8730 | freebsd.committers.markers | 8729 | F | A | /ports/head/astro/xearth/files/freebsd.committers.markers (10 rows) freshports.org=#
OK, there are all doc. Given that we will be setting src here, I think it’s OK to override those six old commits.
- 200206250815.g5P8FZn39049@freefall.freebsd.org
- 200204031503.g33F3eX91734@freefall.freebsd.org
- 200203211640.g2LGeYH71997@freefall.freebsd.org
- 200205292104.g4TL4QC94947@freefall.freebsd.org
- 200204302354.g3UNs4004251@freefall.freebsd.org
- 200204161217.g3GCHsm65354@freefall.freebsd.org
Now let’s do the src records:
UPDATE commit_log CL SET repo_id = (SELECT id FROM repo WHERE name = 'src') WHERE CL.id in ( SELECT distinct CL.id FROM commit_log_elements CLE, commit_log CL, element_pathname EP WHERE CLE.element_id = EP.element_id AND CLE.commit_log_id = CL.id AND EP.pathname like '/base/%' );
That affected 125,040 records.
How many commits are not assigned to a repo?
freshports.org=# SELECT count(*) FROM commit_log WHERE repo_id IS NULL; count ------- 43077 (1 row) freshports.org=#
Hmmm, where are those files?
freshports.org=# select E.*, element_pathname(id)
from element E
where id in (select element_id
from commit_log_elements
where commit_log_id in (select id
from commit_log
where repo_id IS NULL
LIMIT 100));
id | name | parent_id | directory_file_flag | status | element_pathname
-------+-----------------------+-----------+---------------------+--------+------------------------------------------------
90424 | Makefile | 90423 | F | A | /www/en/donations/Makefile
90425 | index.sgml | 90423 | F | A | /www/en/donations/index.sgml
90426 | index.xsl | 77008 | F | A | /www/en/index.xsl
77005 | modules | 77004 | F | A | /CVSROOT/modules
77273 | schedule.sgml | 77272 | F | A | /www/en/releases/4.6R/schedule.sgml
77313 | index.sgml | 77312 | F | A | /www/en/smp/index.sgml
77359 | news.xml | 77358 | F | A | /www/en/gnome/news.xml
78052 | index.xsl | 78003 | F | A | /www/ja/index.xsl
78588 | bsdcon-devsummit.sgml | 78586 | F | A | /www/en/events/2002/bsdcon-devsummit.sgml
78593 | Makefile | 77008 | F | A | /www/en/Makefile
78734 | submitting.sgml | 78730 | F | A | /www/en/docproj/submitting.sgml
78735 | translations.sgml | 78730 | F | A | /www/en/docproj/translations.sgml
78761 | bylaws.sgml | 78742 | F | A | /www/en/internal/bylaws.sgml
79000 | access | 77004 | F | A | /CVSROOT/access
84072 | avail | 77004 | F | A | /CVSROOT/avail
84200 | exclude | 77004 | F | A | /CVSROOT/exclude
90073 | cvsweb.conf-netbsd | 84490 | F | A | /projects/cvsweb/cvsweb.conf-netbsd
90074 | cvsweb.conf-openbsd | 84490 | F | A | /projects/cvsweb/cvsweb.conf-openbsd
84456 | myths.sgml | 84452 | F | A | /www/en/advocacy/myths.sgml
84492 | ChangeLog | 84490 | F | A | /projects/cvsweb/ChangeLog
85014 | cvsweb.cgi | 84490 | F | A | /projects/cvsweb/cvsweb.cgi
85045 | cvsweb.cgi | 79749 | F | A | /www/en/cgi/cvsweb.cgi
85294 | index.sgml | 85293 | F | A | /www/en/projects/c99/index.sgml
86233 | missing_handler.cgi | 79749 | F | A | /www/en/cgi/missing_handler.cgi
86234 | pds.cgi | 79749 | F | A | /www/en/cgi/pds.cgi
86362 | ports.cgi | 79749 | F | A | /www/en/cgi/ports.cgi
89779 | README | 84489 | F | A | /projects/README
89819 | url.cgi | 79749 | F | A | /www/en/cgi/url.cgi
89635 | relnotes.sgml | 77008 | F | A | /www/en/relnotes.sgml
89744 | Makefile | 89743 | F | D | /projects/trustedbsd/www/Makefile
89745 | beastie.page | 89743 | F | D | /projects/trustedbsd/www/beastie.page
89746 | bib-stylesheet.xsl | 89743 | F | D | /projects/trustedbsd/www/bib-stylesheet.xsl
89831 | components.gif | 89743 | F | D | /projects/trustedbsd/www/components.gif
89747 | components.page | 89743 | F | D | /projects/trustedbsd/www/components.page
89832 | componentsgray.gif | 89743 | F | D | /projects/trustedbsd/www/componentsgray.gif
89748 | dev-stylesheet.xsl | 89743 | F | D | /projects/trustedbsd/www/dev-stylesheet.xsl
89749 | developers.dev | 89743 | F | D | /projects/trustedbsd/www/developers.dev
89750 | developers.gif | 89743 | F | D | /projects/trustedbsd/www/developers.gif
89751 | developersgray.gif | 89743 | F | D | /projects/trustedbsd/www/developersgray.gif
89752 | docs.bib | 89743 | F | D | /projects/trustedbsd/www/docs.bib
89753 | docs.gif | 89743 | F | D | /projects/trustedbsd/www/docs.gif
89754 | docsgray.gif | 89743 | F | D | /projects/trustedbsd/www/docsgray.gif
89755 | heading.xsl | 89743 | F | D | /projects/trustedbsd/www/heading.xsl
89756 | home.gif | 89743 | F | D | /projects/trustedbsd/www/home.gif
89757 | home.page | 89743 | F | D | /projects/trustedbsd/www/home.page
89758 | homegray.gif | 89743 | F | D | /projects/trustedbsd/www/homegray.gif
89759 | horizontalbar1.gif | 89743 | F | D | /projects/trustedbsd/www/horizontalbar1.gif
89760 | includes.xsl | 89743 | F | D | /projects/trustedbsd/www/includes.xsl
89761 | legal.page | 89743 | F | D | /projects/trustedbsd/www/legal.page
89762 | mailinglists.gif | 89743 | F | D | /projects/trustedbsd/www/mailinglists.gif
89763 | mailinglists.page | 89743 | F | D | /projects/trustedbsd/www/mailinglists.page
89764 | mailinglistsgray.gif | 89743 | F | D | /projects/trustedbsd/www/mailinglistsgray.gif
89765 | minipemaquidlight.jpg | 89743 | F | D | /projects/trustedbsd/www/minipemaquidlight.jpg
89766 | news.page | 89743 | F | D | /projects/trustedbsd/www/news.page
89767 | page.css | 89743 | F | D | /projects/trustedbsd/www/page.css
89768 | page.xsl | 89743 | F | D | /projects/trustedbsd/www/page.xsl
89769 | pemaquidlight.jpg | 89743 | F | D | /projects/trustedbsd/www/pemaquidlight.jpg
89770 | powerlogo.gif | 89743 | F | D | /projects/trustedbsd/www/powerlogo.gif
89771 | resources.gif | 89743 | F | D | /projects/trustedbsd/www/resources.gif
89772 | sidebar.xml | 89743 | F | D | /projects/trustedbsd/www/sidebar.xml
89773 | trusted.web.mk | 89743 | F | D | /projects/trustedbsd/www/trusted.web.mk
89774 | trustydibmed2.jpg | 89743 | F | D | /projects/trustedbsd/www/trustydibmed2.jpg
89775 | trustyshadow.gif | 89743 | F | D | /projects/trustedbsd/www/trustyshadow.gif
89776 | trustyteenie.gif | 89743 | F | D | /projects/trustedbsd/www/trustyteenie.gif
89777 | verticalbar1.gif | 89743 | F | D | /projects/trustedbsd/www/verticalbar1.gif
89778 | wwwheader.gif | 89743 | F | D | /projects/trustedbsd/www/wwwheader.gif
77339 | news.xml | 77009 | F | D | /www/en/news/news.xml
78769 | statistic.sgml | 78742 | F | A | /www/en/internal/statistic.sgml
(68 rows)
freshports.org=#
Oh. Now there’s some stuff I forgot we were recording… I’ll look into those and start dealing with them later, but here is a list of the distinct root directory of the files in all those commits:
freshports.org=# select distinct split_part(element_pathname(id), '/', 2)
from element E
where id in (select element_id
from commit_log_elements
where commit_log_id in (select id
from commit_log
where repo_id IS NULL
));
split_part
-----------------------------------------------
releng
svnadmin
vendor-crypto
vendor-sys
COPYRIGHT
COPYRIGHT (contents, props changed)
COPYRIGHT (props changed)
CVSROOT
CVSROOT-doc
CVSROOT-ports
CVSROOT-src
Directory Properties:
LOCKS (props changed)
MAINTAINERS
MAINTAINERS (contents, props changed)
MAINTAINERS (props changed)
Makefile
Makefile (contents, props changed)
Makefile (props changed)
Makefile.inc1
Makefile.inc1 (contents, props changed)
Makefile.inc1 (props changed)
Makefile.mips
NetBSD
ObsoleteFiles.inc
ObsoleteFiles.inc (contents, props changed)
ObsoleteFiles.inc (props changed)
README (props changed)
Replaced:
UPDATING
UPDATING (contents, props changed)
UPDATING (props changed)
bin
bind9
byacc
cddl
clang
com_err
compiler-rt
contrib
crypto
de_DE.ISO8859-1
distrib
dtracetoolkit
en_US.ISO8859-1
es_ES.ISO8859-1
etc
file
flex
fr_FR.ISO8859-1
games
gcc
gnu
gperf
groff
illumos
include
it_IT.ISO8859-15
ja_JP.eucJP
kerberos5
ldns
less
lib
libarchive
libc++
libcxxrt
libexec
libpcap
libz
llvm
mn_MN.UTF-8
netcat
nl_NL.ISO8859-1
one-true-awk
openpam
openresolv
pciids
ports
projects
release
rescue
ru_RU.KOI8-R
sbin
secure
share
sys
tcpdump
tcsh
tools
tzdata
unbound
usr.bin
usr.sbin
www
xz-embedded
zh_CN.GB2312
(96 rows)
freshports.org=#
Now, how about getting repo_id set when the data is coming into the database?
Here is the perl for saving a new commit as it comes in:
$sql = "insert into commit_log (id, message_id, message_date, message_subject, date_added, commit_date,
committer, description, system_id, svn_revision, repo_id, encoding_losses) values ( \
$this->{id},
" . $dbh->quote($this->{message_id}) . ",
" . $dbh->quote($this->{message_date}) . ",
" . $dbh->quote($this->{message_subject}) . ",
" . $dbh->quote($this->{date_added}) . ",
" . $dbh->quote($this->{commit_date}) . ",
" . $dbh->quote($this->{committer}) . ",
" . $dbh->quote($this->{description}) . ",
$this->{system_id},
" . $dbh->quote($this->{revision}) . ",
(SELECT id FROM repo WHERE name = " . $dbh->quote($this->{repo}) . "),
$this->{encoding_losses}::boolean)";
And the change to the XML processing code was simple:
$commit_log->{repo} = $Updates{repository};
It’s getting late. I’ll work on the original problem (links to svnweb) later.











