A recent post on the FreeBSD Ports mailing list asked:
Hi,
I noticed a big difference between the number of ports on
freebsd.org/ports/ and on freshports.org. Currently, it’s 33348 vs.
41346.The freebsd.org’s number equals roughly the number of lines of a current
INDEX, but how does FreshPorts count?Best,
Moritz
In short, they are both wrong.
The FreeBSD value is based on INDEX, which includes flavors. The counts on the webpages under https://www.freebsd.org/ports/ will list some ports multiple times. See below for examples.
The FreshPorts total is wrong because it is including ports on branches.
The real number of ports is in the 28,800 range.
It is debatable whether py27-atspi and py37-atspi should be listed as separate ports. There are separate packages, yes, but they are both generated from one port: accessibility/py-atspi.
The rest of this post has background on how I reached these values.
Where is this FreshPorts count?
In the Statistics box on the right hand side of FreshPorts, you will see:
Let’s see where this value comes from.
FreshPorts count
Everything in the Statistics box is generated by the backend via a periodic job. Let’s grep the code and find out where:
[dan@dev-ingress01:~/scripts] $ grep -r 'Calculated hourly' * hourly_stats.pl: print FILE '<BR>Calculated hourly:<BR>';
If I look in there, I find: select Stats_PortCount()
Going to the sp.txt file, I find this stored procedure:
CREATE OR REPLACE FUNCTION Stats_PortCount() returns int8 AS $$ DECLARE PortCount int8; BEGIN SELECT count(*) INTO PortCount FROM ports, element WHERE element.status = 'A' AND ports.element_id = element.id; return PortCount; END $$ LANGUAGE 'plpgsql';
Let’s run that query:
freshports.org=# select * from Stats_PortCount(); stats_portcount ----------------- 41418 (1 row) freshports.org=#
FreshPorts count with branches
I know why this values is so far from the FreeBSD count. Branches. Let’s look at this output where I start pulling back the port names:
freshports.org=# SELECT EP.pathname freshports.org-# FROM ports P , element E, element_pathname EP freshports.org-# WHERE E.status = 'A' freshports.org-# AND P.element_id = E.id freshports.org-# AND E.id = EP.element_id freshports.org-# ORDER BY EP.pathname LIMIT 10; pathname -------------------------------------------------- /ports/branches/2016Q2/math/blitz++ /ports/branches/2016Q4/archivers/file-roller /ports/branches/2016Q4/archivers/p7zip /ports/branches/2016Q4/archivers/p7zip-codec-rar /ports/branches/2016Q4/archivers/php56-bz2 /ports/branches/2016Q4/archivers/php56-phar /ports/branches/2016Q4/archivers/php56-zip /ports/branches/2016Q4/archivers/php56-zlib /ports/branches/2016Q4/archivers/php70-bz2 /ports/branches/2016Q4/archivers/php70-phar (10 rows) freshports.org=#
FreshPorts count without branches
Let’s try the query and ignore branches.
freshports.org=# SELECT EP.pathname freshports.org-# FROM ports P , element E, element_pathname EP freshports.org-# WHERE E.status = 'A' freshports.org-# AND P.element_id = E.id freshports.org-# AND E.id = EP.element_id freshports.org-# AND EP.pathname NOT LIKE '/ports/branches/%' freshports.org-# ORDER BY EP.pathname desc freshports.org-# LIMIT 10; pathname ----------------------------- /ports/head/x11/zenity /ports/head/x11/yelp /ports/head/x11/yeahconsole /ports/head/x11/yalias /ports/head/x11/yakuake /ports/head/x11/yad /ports/head/x11/xzoom /ports/head/x11/xxkb /ports/head/x11/xwud /ports/head/x11/xwit (10 rows) freshports.org=#
That looks better.
Let’s get a count now.
freshports.org=# SELECT count(*) freshports.org-# FROM ports P , element E, element_pathname EP freshports.org-# WHERE E.status = 'A' freshports.org-# AND P.element_id = E.id freshports.org-# AND E.id = EP.element_id freshports.org-# AND EP.pathname NOT LIKE '/ports/branches/%'; count ------- 28759 (1 row) freshports.org=#
Well, that’s not great either.
That can’t be right
Let’s suspect the element_pathname table and remove it from the query. Instead, I will create the pathname based on a function:
freshports.org=# freshports.org=# SELECT count(*) FROM ( freshports.org(# SELECT element_pathname(E.id) as pathname freshports.org(# FROM ports P , element E freshports.org(# WHERE E.status = 'A' freshports.org(# AND P.element_id = E.id) AS tmp freshports.org-# WHERE pathname NOT LIKE '/ports/branches/%'; count ------- 28759 (1 row) freshports.org=#
That matches the count via the element_pathname table.
So it’s not that table skewing the results. What is it then?
Looking at category counts
Let’s compare https://www.freebsd.org/ports/categories-alpha.html with FreshPorts.
Let’s start with this query on the port_active table, which is actually a view of non-deleted ports.
FROM ports_active PA, element_pathname EP WHERE PA.element_id = EP.element_id AND EP.pathname NOT LIKE '/ports/branches/%' limit 10; category | name | pathname ------------+----------------------------+-------------------------------------------- textproc | rubygem-raabro | /ports/head/textproc/rubygem-raabro biology | pyfasta | /ports/head/biology/pyfasta math | symmetrica | /ports/head/math/symmetrica java | sigar | /ports/head/java/sigar databases | phpmyadmin5 | /ports/head/databases/phpmyadmin5 devel | rubygem-rbtrace | /ports/head/devel/rubygem-rbtrace x11 | xfce4-screenshooter-plugin | /ports/head/x11/xfce4-screenshooter-plugin science | hdf5-18 | /ports/head/science/hdf5-18 lang | nhc98 | /ports/head/lang/nhc98 multimedia | xanim | /ports/head/multimedia/xanim (10 rows)
Now, it’s get count by category.
freshports.dev=# SELECT PA.category, count(PA.name) FROM ports_active PA, element_pathname EP WHERE PA.element_id = EP.element_id AND EP.pathname NOT LIKE '/ports/branches/%' GROUP BY PA.category ORDER BY PA.category; category | count ---------------+------- accessibility | 26 arabic | 8 archivers | 258 astro | 124 audio | 877 base | 1 benchmarks | 100 biology | 176 cad | 126 chinese | 106 comms | 213 converters | 178 databases | 1033 deskutils | 261 devel | 6875 dns | 238 editors | 263 emulators | 177 finance | 113 french | 14 ftp | 96 games | 1133 german | 21 graphics | 1128 hebrew | 7 hungarian | 7 irc | 114 japanese | 280 java | 122 korean | 39 lang | 364 mail | 709 math | 970 misc | 533 multimedia | 457 net | 1563 net-im | 176 net-mgmt | 404 net-p2p | 94 news | 67 polish | 14 ports-mgmt | 67 portuguese | 9 print | 256 russian | 32 science | 340 security | 1313 shells | 56 sysutils | 1538 textproc | 1896 ukrainian | 9 vietnamese | 16 www | 2358 x11 | 534 x11-clocks | 42 x11-drivers | 44 x11-fm | 30 x11-fonts | 250 x11-servers | 10 x11-themes | 145 x11-toolkits | 240 x11-wm | 119 (62 rows) freshports.dev=#
Primary categories vs secondary categories
Remember that some categories are virtual, and do not appear on disk. The above count are only for primary categories, those which do appear on disk. For example, afterstep is not listed above, but you’ll find it in the FreeBSD list. The above SQL is for primary categories only. Virtual categories are covered in FreshPorts, but it’s not relevant to our search.
Also, a port exists on disk only within its primary category. There may be secondary categories, but the port should not be counted there as well. A port should only be counted once.
Picking on Hungarian
Let’s pick Hungarian, which has a small number of ports.
freshports.dev=# SELECT PA.category, PA.name, EP.pathname freshports.dev-# FROM ports_active PA, element_pathname EP freshports.dev-# WHERE PA.element_id = EP.element_id freshports.dev-# AND EP.pathname NOT LIKE '/ports/branches/%' freshports.dev-# AND PA.category = 'hungarian' freshports.dev-# ORDER BY EP.pathname freshports.dev-# LIMIT 10; category | name | pathname -----------+--------------------------+------------------------------------------------ hungarian | aspell | /ports/head/hungarian/aspell hungarian | hunspell | /ports/head/hungarian/hunspell hungarian | hyphen | /ports/head/hungarian/hyphen hungarian | jdictionary-eng-hun | /ports/head/hungarian/jdictionary-eng-hun hungarian | jdictionary-eng-hun-expr | /ports/head/hungarian/jdictionary-eng-hun-expr hungarian | libreoffice | /ports/head/hungarian/libreoffice hungarian | mythes | /ports/head/hungarian/mythes (7 rows) freshports.dev=#
Let’s compare that with what is on disk:
[dan@pkg01:~/ports/head/hungarian] $ ls -l total 13 -rw-r--r-- 1 dan dan 332 May 5 2020 Makefile -rw-r--r-- 1 dan dan 97 Oct 27 2019 Makefile.inc drwxr-xr-x 2 dan dan 6 Oct 27 2019 aspell drwxr-xr-x 2 dan dan 5 Oct 27 2019 hunspell drwxr-xr-x 2 dan dan 5 Oct 27 2019 hyphen drwxr-xr-x 2 dan dan 5 Oct 27 2019 jdictionary-eng-hun drwxr-xr-x 2 dan dan 5 Oct 27 2019 jdictionary-eng-hun-expr drwxr-xr-x 2 dan dan 4 Nov 12 15:09 libreoffice drwxr-xr-x 2 dan dan 5 Oct 27 2019 mythes [dan@pkg01:~/ports/head/hungarian] $ svn info
Don’t trust me. Look at subversion for ports/head/hungarian/
The FreshPorts count is correct. What is FreeBSD talking about then?
Comparing with https://www.freebsd.org/ports/hungarian.html, I see that FreeBSD is including:
- fr-jdictionary-fre-hun : german/jdictionary-ger-hun
- de-jdictionary-ger-hun : german/jdictionary-ger-hun
Will this account for the differences? I don’t know.
The 33399 count listed at https://www.freebsd.org/ports/ (on 2020-11-18) seems close the value contained within INDEX-12 (33406).
The category totals at https://www.freebsd.org/ports/categories-grouped.html include ports listed in their secondary categories. This counts some ports more than once.
Looking at INDEX
Let’s look at INDEX-12:
[dan@pkg01:~/ports/head] $ make fetchindex /usr/bin/env fetch -am -o /usr/home/dan/ports/head/INDEX-12.bz2 https://www.FreeBSD.org/ports/INDEX-12.bz2 /usr/home/dan/ports/head/INDEX-12.bz2 2315 kB 1436 kBps 02s [dan@pkg01:~/ports/head] $ [dan@pkg01:~/ports/head] $ wc -l INDEX-12 33406 INDEX-12 [dan@pkg01:~/ports/head] $ grep -c jdictionary-ger-hun INDEX-12 1
OK, it’s only counted once within INDEX.
So far, we know why the port counts on the web pages differ.
Let’s pick a category which is not language related: x11-servers
This is what FreshPorts has:
freshports.org=# SELECT PA.category, PA.name, EP.pathname FROM ports_active PA, element_pathname EP WHERE PA.element_id = EP.element_id AND EP.pathname NOT LIKE '/ports/branches/%' AND PA.category = 'x11-servers' ORDER BY EP.pathname; category | name | pathname -------------+-----------------+----------------------------------------- x11-servers | Xfstt | /ports/head/x11-servers/Xfstt x11-servers | x2vnc | /ports/head/x11-servers/x2vnc x11-servers | x2x | /ports/head/x11-servers/x2x x11-servers | xephyr | /ports/head/x11-servers/xephyr x11-servers | xorg-dmx | /ports/head/x11-servers/xorg-dmx x11-servers | xorg-nestserver | /ports/head/x11-servers/xorg-nestserver x11-servers | xorg-server | /ports/head/x11-servers/xorg-server x11-servers | xorg-vfbserver | /ports/head/x11-servers/xorg-vfbserver x11-servers | xwayland | /ports/head/x11-servers/xwayland x11-servers | xwayland-devel | /ports/head/x11-servers/xwayland-devel (10 rows) freshports.org=#
From disk:
[dan@pkg01:~/ports/head/x11-servers] $ ls -l total 34 -rw-r--r-- 1 dan dan 375 Feb 14 2020 Makefile drwxr-xr-x 3 dan dan 7 Sep 19 01:19 Xfstt drwxr-xr-x 2 dan dan 5 Nov 9 2019 x2vnc drwxr-xr-x 3 dan dan 6 Nov 9 2019 x2x drwxr-xr-x 2 dan dan 4 Feb 25 2020 xephyr drwxr-xr-x 2 dan dan 5 Feb 25 2020 xorg-dmx drwxr-xr-x 2 dan dan 4 Feb 25 2020 xorg-nestserver drwxr-xr-x 3 dan dan 8 Sep 19 01:19 xorg-server drwxr-xr-x 2 dan dan 4 Feb 25 2020 xorg-vfbserver drwxr-xr-x 2 dan dan 4 Oct 11 13:22 xwayland drwxr-xr-x 2 dan dan 5 Nov 16 15:28 xwayland-devel [dan@pkg01:~/ports/head/x11-servers] $
That matches.
Looking at https://www.freebsd.org/ports/x11-servers.html I find listings not found above:
- tigervnc-server net/tigervnc-server
- tigervnc-viewer net/tigervnc-viewer
- xorg-minima x11/xorg-minimal
Again, it is ports listed here, which are not actually in this category. Ports are being counted twice, at least in the web page.
This extracts the list of ports from INDEX:
[dan@pkg01:~/ports/head] $ cut -f 2 -d '|' INDEX-12 > ~/tmp/INDEX-12-list [dan@pkg01:~/ports/head] $ head -4 ~/tmp/INDEX-12-list /usr/ports/accessibility/accerciser /usr/ports/accessibility/at-spi2-atk /usr/ports/accessibility/at-spi2-core /usr/ports/accessibility/atkmm [dan@pkg01:~/ports/head] $ [dan@pkg01:~/ports/head] $ wc -l INDEX-12 ~/tmp/INDEX-12-list 33406 INDEX-12 33406 /usr/home/dan/tmp/INDEX-12-list 66812 total
The line count matches. Let’s get the same information out of FreshPorts, but this time, I’ll use production.
cat << EOF | psql -t freshports.org > INDEX.FreshPorts SELECT '/usr/ports/' || PA.category || '/' || PA.name FROM ports_active PA, element_pathname EP WHERE PA.element_id = EP.element_id AND EP.pathname NOT LIKE '/ports/branches/%' ORDER BY 1; EOF
We have 28759 entries there.
$ wc -l ~/INDEX.FreshPorts 28759 /usr/home/dan/INDEX.FreshPorts
That is far from the 33406 lines in INDEX-12.
Removing flavors from INDEX-12 list of ports
When I started comparing the output, I noticed that INDEX-12 listed accessibility/py-atspi twice. Why? Because of flavors. Here are the first two columns from INDEX-12:
py27-atspi-2.38.0|/usr/ports/accessibility/py-atspi py37-atspi-2.38.0|/usr/ports/accessibility/py-atspi
Let’s remove duplicate lines from INDEX-12:
[dan@pkg01:~/ports/head] $ wc -l INDEX-12 ~/tmp/INDEX-12-list ~/tmp/INDEX-12-list-nodups 33406 INDEX-12 33406 /usr/home/dan/tmp/INDEX-12-list 28755 /usr/home/dan/tmp/INDEX-12-list-nodups 95567 total [dan@pkg01:~/ports/head] $
That means 4651 lines relate directly to flavors.
That uniq output is much closer to the FreshPorts count of 28759. It is off by 4.
Comparing INDEX-12 and FreshPorts
Let’s do a diff.
All the + lines indicates a port included in FreshPorts, but not INDEX-12. I have annotated the output to indicate what my investigations found.
All the – lines indicate something not found on FreshPorts.
When you see DELETED, that means FreshPorts has marked this port was deleted.
[dan@pkg01:~/ports/head] $ diff -ruN ~/tmp/INDEX-12-list-nodups ~/INDEX.FreshPorts --- /usr/home/dan/tmp/INDEX-12-list-nodups 2020-11-18 16:58:12.360853000 +0000 +++ /usr/home/dan/INDEX.FreshPorts 2020-11-18 17:49:49.321133000 +0000 @@ -1290,6 +1290,7 @@ /usr/ports/audio/zita-resampler /usr/ports/audio/zrythm /usr/ports/audio/zynaddsubfx +/usr/ports/base/binutils NOT A PORT /usr/ports/benchmarks/ali /usr/ports/benchmarks/apib /usr/ports/benchmarks/autobench @@ -7405,7 +7406,6 @@ /usr/ports/devel/php80-sysvsem /usr/ports/devel/php80-sysvshm /usr/ports/devel/php80-tokenizer -/usr/ports/devel/phpunit6 PORT MARKED AS DELETED /usr/ports/devel/phpunit7 /usr/ports/devel/phpunit8 /usr/ports/devel/physfs @@ -7703,6 +7703,7 @@ /usr/ports/devel/py-cachy /usr/ports/devel/py-canonicaljson /usr/ports/devel/py-capstone +/usr/ports/devel/py-case NEWLY CREATED /usr/ports/devel/py-castellan /usr/ports/devel/py-castellan1 /usr/ports/devel/py-cbor @@ -9873,6 +9874,7 @@ /usr/ports/devel/rubygem-rspec-support /usr/ports/devel/rubygem-rspec_junit_formatter /usr/ports/devel/rubygem-rubocop +/usr/ports/devel/rubygem-rubocop-ast NOT IN INDEX-12 /usr/ports/devel/rubygem-ruby-atmos-pure /usr/ports/devel/rubygem-ruby-bugzilla /usr/ports/devel/rubygem-ruby-enum @@ -14078,7 +14080,6 @@ /usr/ports/korean/hanyangfonts /usr/ports/korean/hcode /usr/ports/korean/hmconv -/usr/ports/korean/hpscat PORT MARKED AS DELETED /usr/ports/korean/hunspell /usr/ports/korean/ibus-hangul /usr/ports/korean/imhangul-gtk2 @@ -14439,6 +14440,7 @@ /usr/ports/lang/spidermonkey24 /usr/ports/lang/spidermonkey52 /usr/ports/lang/spidermonkey60 +/usr/ports/lang/spidermonkey68 NOT IN SUBVERSION /usr/ports/lang/spidermonkey78 /usr/ports/lang/spl /usr/ports/lang/squeak @@ -14958,6 +14960,7 @@ /usr/ports/mail/py-dkimpy /usr/ports/mail/py-email-validator /usr/ports/mail/py-email_reply_parser +/usr/ports/mail/py-flanker NOT IN INDEX-12 /usr/ports/mail/py-flask-mail /usr/ports/mail/py-flufl.bounce /usr/ports/mail/py-fuglu [dan@pkg01:~/ports/head] $
Totals:
- NOT A PORT – base looks a category to FreshPorts so that is included
- PORT MARKED AS DELETED – FreshPorts thinks this port is deleted, but it is not
- NEWLY CREATED – this port was created today. INDEX-12 predates that
- NOT IN INDEX-12 – no idea why this is not included
- NOT IN SUBVERSION – this port is not listed in subversion.
Conclusion
FreshPorts has some errors, which I will look into.
The actual number of ports is wrong on both sites and the correct values is in the 28,800 range.