Nov 182020
 

A recent post on the FreeBSD Ports mailing list asked:

Hi,

I noticed a big difference between the number of ports on
freebsd.org/ports/ and on freshports.org. Currently, it’s 33348 vs.
41346.

The freebsd.org’s number equals roughly the number of lines of a current
INDEX, but how does FreshPorts count?

Best,
Moritz

In short, they are both wrong.

The FreeBSD value is based on INDEX, which includes flavors. The counts on the webpages under https://www.freebsd.org/ports/ will list some ports multiple times. See below for examples.

The FreshPorts total is wrong because it is including ports on branches.

The real number of ports is in the 28,800 range.

It is debatable whether py27-atspi and py37-atspi should be listed as separate ports. There are separate packages, yes, but they are both generated from one port: accessibility/py-atspi.

The rest of this post has background on how I reached these values.

Where is this FreshPorts count?

In the Statistics box on the right hand side of FreshPorts, you will see:

Statistics box saying Port Count 41418

Statistics box saying Port Count 41418

Let’s see where this value comes from.

FreshPorts count

Everything in the Statistics box is generated by the backend via a periodic job. Let’s grep the code and find out where:

[dan@dev-ingress01:~/scripts] $ grep -r 'Calculated hourly' *
hourly_stats.pl:		print FILE '<BR>Calculated hourly:<BR>';

If I look in there, I find: select Stats_PortCount()

Going to the sp.txt file, I find this stored procedure:

CREATE OR REPLACE FUNCTION Stats_PortCount() returns int8 AS $$
        DECLARE
                PortCount       int8;

        BEGIN
                SELECT count(*)
                  INTO PortCount
                  FROM ports, element
                 WHERE element.status = 'A'
                   AND ports.element_id = element.id;

                return PortCount;
        END
$$ LANGUAGE 'plpgsql';

Let’s run that query:

freshports.org=# select * from Stats_PortCount();
 stats_portcount 
-----------------
           41418
(1 row)

freshports.org=# 

FreshPorts count with branches

I know why this values is so far from the FreeBSD count. Branches. Let’s look at this output where I start pulling back the port names:

freshports.org=# SELECT EP.pathname
freshports.org-#                   FROM ports P , element E, element_pathname EP
freshports.org-#                  WHERE E.status = 'A'
freshports.org-#                    AND P.element_id = E.id
freshports.org-#                    AND E.id = EP.element_id
freshports.org-#                  ORDER BY EP.pathname LIMIT 10;
                     pathname                     
--------------------------------------------------
 /ports/branches/2016Q2/math/blitz++
 /ports/branches/2016Q4/archivers/file-roller
 /ports/branches/2016Q4/archivers/p7zip
 /ports/branches/2016Q4/archivers/p7zip-codec-rar
 /ports/branches/2016Q4/archivers/php56-bz2
 /ports/branches/2016Q4/archivers/php56-phar
 /ports/branches/2016Q4/archivers/php56-zip
 /ports/branches/2016Q4/archivers/php56-zlib
 /ports/branches/2016Q4/archivers/php70-bz2
 /ports/branches/2016Q4/archivers/php70-phar
(10 rows)

freshports.org=# 

FreshPorts count without branches

Let’s try the query and ignore branches.

freshports.org=# SELECT EP.pathname
freshports.org-#   FROM ports P , element E, element_pathname EP
freshports.org-#  WHERE E.status = 'A'
freshports.org-#    AND P.element_id = E.id
freshports.org-#    AND E.id = EP.element_id
freshports.org-#    AND EP.pathname NOT LIKE '/ports/branches/%'
freshports.org-#  ORDER BY EP.pathname desc
freshports.org-#  LIMIT 10;
          pathname           
-----------------------------
 /ports/head/x11/zenity
 /ports/head/x11/yelp
 /ports/head/x11/yeahconsole
 /ports/head/x11/yalias
 /ports/head/x11/yakuake
 /ports/head/x11/yad
 /ports/head/x11/xzoom
 /ports/head/x11/xxkb
 /ports/head/x11/xwud
 /ports/head/x11/xwit
(10 rows)

freshports.org=# 

That looks better.

Let’s get a count now.

freshports.org=# SELECT count(*)
freshports.org-#   FROM ports P , element E, element_pathname EP
freshports.org-#  WHERE E.status = 'A'
freshports.org-#    AND P.element_id = E.id
freshports.org-#    AND E.id = EP.element_id
freshports.org-#    AND EP.pathname NOT LIKE '/ports/branches/%';
 count 
-------
 28759
(1 row)

freshports.org=# 

Well, that’s not great either.

That can’t be right

Let’s suspect the element_pathname table and remove it from the query. Instead, I will create the pathname based on a function:

freshports.org=# 
freshports.org=# SELECT count(*) FROM (
freshports.org(# SELECT element_pathname(E.id) as pathname
freshports.org(#   FROM ports P , element E
freshports.org(#  WHERE E.status = 'A'
freshports.org(#    AND P.element_id = E.id) AS tmp
freshports.org-# WHERE pathname NOT LIKE '/ports/branches/%';
 count 
-------
 28759
(1 row)

freshports.org=# 

That matches the count via the element_pathname table.

So it’s not that table skewing the results. What is it then?

Looking at category counts

Let’s compare https://www.freebsd.org/ports/categories-alpha.html with FreshPorts.

Let’s start with this query on the port_active table, which is actually a view of non-deleted ports.

  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%' limit 10;
  category  |            name            |                  pathname                  
------------+----------------------------+--------------------------------------------
 textproc   | rubygem-raabro             | /ports/head/textproc/rubygem-raabro
 biology    | pyfasta                    | /ports/head/biology/pyfasta
 math       | symmetrica                 | /ports/head/math/symmetrica
 java       | sigar                      | /ports/head/java/sigar
 databases  | phpmyadmin5                | /ports/head/databases/phpmyadmin5
 devel      | rubygem-rbtrace            | /ports/head/devel/rubygem-rbtrace
 x11        | xfce4-screenshooter-plugin | /ports/head/x11/xfce4-screenshooter-plugin
 science    | hdf5-18                    | /ports/head/science/hdf5-18
 lang       | nhc98                      | /ports/head/lang/nhc98
 multimedia | xanim                      | /ports/head/multimedia/xanim
(10 rows)

Now, it’s get count by category.

freshports.dev=# SELECT PA.category, count(PA.name)
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%'
 GROUP BY PA.category
 ORDER BY PA.category;
   category    | count 
---------------+-------
 accessibility |    26
 arabic        |     8
 archivers     |   258
 astro         |   124
 audio         |   877
 base          |     1
 benchmarks    |   100
 biology       |   176
 cad           |   126
 chinese       |   106
 comms         |   213
 converters    |   178
 databases     |  1033
 deskutils     |   261
 devel         |  6875
 dns           |   238
 editors       |   263
 emulators     |   177
 finance       |   113
 french        |    14
 ftp           |    96
 games         |  1133
 german        |    21
 graphics      |  1128
 hebrew        |     7
 hungarian     |     7
 irc           |   114
 japanese      |   280
 java          |   122
 korean        |    39
 lang          |   364
 mail          |   709
 math          |   970
 misc          |   533
 multimedia    |   457
 net           |  1563
 net-im        |   176
 net-mgmt      |   404
 net-p2p       |    94
 news          |    67
 polish        |    14
 ports-mgmt    |    67
 portuguese    |     9
 print         |   256
 russian       |    32
 science       |   340
 security      |  1313
 shells        |    56
 sysutils      |  1538
 textproc      |  1896
 ukrainian     |     9
 vietnamese    |    16
 www           |  2358
 x11           |   534
 x11-clocks    |    42
 x11-drivers   |    44
 x11-fm        |    30
 x11-fonts     |   250
 x11-servers   |    10
 x11-themes    |   145
 x11-toolkits  |   240
 x11-wm        |   119
(62 rows)

freshports.dev=# 

Primary categories vs secondary categories

Remember that some categories are virtual, and do not appear on disk. The above count are only for primary categories, those which do appear on disk. For example, afterstep is not listed above, but you’ll find it in the FreeBSD list. The above SQL is for primary categories only. Virtual categories are covered in FreshPorts, but it’s not relevant to our search.

Also, a port exists on disk only within its primary category. There may be secondary categories, but the port should not be counted there as well. A port should only be counted once.

Picking on Hungarian

Let’s pick Hungarian, which has a small number of ports.

freshports.dev=# SELECT PA.category, PA.name, EP.pathname
freshports.dev-#   FROM ports_active PA, element_pathname EP
freshports.dev-#  WHERE PA.element_id = EP.element_id
freshports.dev-#    AND EP.pathname NOT LIKE '/ports/branches/%'
freshports.dev-#    AND PA.category = 'hungarian'
freshports.dev-#  ORDER BY EP.pathname
freshports.dev-# LIMIT 10;
 category  |           name           |                    pathname                    
-----------+--------------------------+------------------------------------------------
 hungarian | aspell                   | /ports/head/hungarian/aspell
 hungarian | hunspell                 | /ports/head/hungarian/hunspell
 hungarian | hyphen                   | /ports/head/hungarian/hyphen
 hungarian | jdictionary-eng-hun      | /ports/head/hungarian/jdictionary-eng-hun
 hungarian | jdictionary-eng-hun-expr | /ports/head/hungarian/jdictionary-eng-hun-expr
 hungarian | libreoffice              | /ports/head/hungarian/libreoffice
 hungarian | mythes                   | /ports/head/hungarian/mythes
(7 rows)

freshports.dev=# 

Let’s compare that with what is on disk:

[dan@pkg01:~/ports/head/hungarian] $ ls -l
total 13
-rw-r--r--  1 dan  dan  332 May  5  2020 Makefile
-rw-r--r--  1 dan  dan   97 Oct 27  2019 Makefile.inc
drwxr-xr-x  2 dan  dan    6 Oct 27  2019 aspell
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 hunspell
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 hyphen
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 jdictionary-eng-hun
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 jdictionary-eng-hun-expr
drwxr-xr-x  2 dan  dan    4 Nov 12 15:09 libreoffice
drwxr-xr-x  2 dan  dan    5 Oct 27  2019 mythes
[dan@pkg01:~/ports/head/hungarian] $ svn info

Don’t trust me. Look at subversion for ports/head/hungarian/

The FreshPorts count is correct. What is FreeBSD talking about then?

Comparing with https://www.freebsd.org/ports/hungarian.html, I see that FreeBSD is including:

Will this account for the differences? I don’t know.

The 33399 count listed at https://www.freebsd.org/ports/ (on 2020-11-18) seems close the value contained within INDEX-12 (33406).

The category totals at https://www.freebsd.org/ports/categories-grouped.html include ports listed in their secondary categories. This counts some ports more than once.

Looking at INDEX

Let’s look at INDEX-12:

[dan@pkg01:~/ports/head] $ make fetchindex
/usr/bin/env  fetch -am -o /usr/home/dan/ports/head/INDEX-12.bz2 https://www.FreeBSD.org/ports/INDEX-12.bz2
/usr/home/dan/ports/head/INDEX-12.bz2                 2315 kB 1436 kBps    02s
[dan@pkg01:~/ports/head] $ 


[dan@pkg01:~/ports/head] $ wc -l INDEX-12 
   33406 INDEX-12

[dan@pkg01:~/ports/head] $ grep -c jdictionary-ger-hun INDEX-12 
1

OK, it’s only counted once within INDEX.

So far, we know why the port counts on the web pages differ.

Let’s pick a category which is not language related: x11-servers

This is what FreshPorts has:

freshports.org=# SELECT PA.category, PA.name, EP.pathname
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%'
   AND PA.category = 'x11-servers'
 ORDER BY EP.pathname;
  category   |      name       |                pathname                 
-------------+-----------------+-----------------------------------------
 x11-servers | Xfstt           | /ports/head/x11-servers/Xfstt
 x11-servers | x2vnc           | /ports/head/x11-servers/x2vnc
 x11-servers | x2x             | /ports/head/x11-servers/x2x
 x11-servers | xephyr          | /ports/head/x11-servers/xephyr
 x11-servers | xorg-dmx        | /ports/head/x11-servers/xorg-dmx
 x11-servers | xorg-nestserver | /ports/head/x11-servers/xorg-nestserver
 x11-servers | xorg-server     | /ports/head/x11-servers/xorg-server
 x11-servers | xorg-vfbserver  | /ports/head/x11-servers/xorg-vfbserver
 x11-servers | xwayland        | /ports/head/x11-servers/xwayland
 x11-servers | xwayland-devel  | /ports/head/x11-servers/xwayland-devel
(10 rows)

freshports.org=# 

From disk:

[dan@pkg01:~/ports/head/x11-servers] $ ls -l
total 34
-rw-r--r--  1 dan  dan  375 Feb 14  2020 Makefile
drwxr-xr-x  3 dan  dan    7 Sep 19 01:19 Xfstt
drwxr-xr-x  2 dan  dan    5 Nov  9  2019 x2vnc
drwxr-xr-x  3 dan  dan    6 Nov  9  2019 x2x
drwxr-xr-x  2 dan  dan    4 Feb 25  2020 xephyr
drwxr-xr-x  2 dan  dan    5 Feb 25  2020 xorg-dmx
drwxr-xr-x  2 dan  dan    4 Feb 25  2020 xorg-nestserver
drwxr-xr-x  3 dan  dan    8 Sep 19 01:19 xorg-server
drwxr-xr-x  2 dan  dan    4 Feb 25  2020 xorg-vfbserver
drwxr-xr-x  2 dan  dan    4 Oct 11 13:22 xwayland
drwxr-xr-x  2 dan  dan    5 Nov 16 15:28 xwayland-devel
[dan@pkg01:~/ports/head/x11-servers] $ 

That matches.

Looking at https://www.freebsd.org/ports/x11-servers.html I find listings not found above:

  1. tigervnc-server net/tigervnc-server
  2. tigervnc-viewer net/tigervnc-viewer
  3. xorg-minima x11/xorg-minimal

Again, it is ports listed here, which are not actually in this category. Ports are being counted twice, at least in the web page.

This extracts the list of ports from INDEX:

[dan@pkg01:~/ports/head] $ cut -f 2 -d '|' INDEX-12 > ~/tmp/INDEX-12-list
[dan@pkg01:~/ports/head] $ head -4  ~/tmp/INDEX-12-list
/usr/ports/accessibility/accerciser
/usr/ports/accessibility/at-spi2-atk
/usr/ports/accessibility/at-spi2-core
/usr/ports/accessibility/atkmm
[dan@pkg01:~/ports/head] $ 
[dan@pkg01:~/ports/head] $ wc -l INDEX-12  ~/tmp/INDEX-12-list
   33406 INDEX-12
   33406 /usr/home/dan/tmp/INDEX-12-list
   66812 total

The line count matches. Let’s get the same information out of FreshPorts, but this time, I’ll use production.

cat << EOF | psql -t freshports.org > INDEX.FreshPorts
SELECT '/usr/ports/' || PA.category || '/' || PA.name
  FROM ports_active PA, element_pathname EP
 WHERE PA.element_id = EP.element_id
   AND EP.pathname NOT LIKE '/ports/branches/%'
 ORDER BY 1;
EOF

We have 28759 entries there.

$ wc -l ~/INDEX.FreshPorts 
   28759 /usr/home/dan/INDEX.FreshPorts

That is far from the 33406 lines in INDEX-12.

Removing flavors from INDEX-12 list of ports

When I started comparing the output, I noticed that INDEX-12 listed accessibility/py-atspi twice. Why? Because of flavors. Here are the first two columns from INDEX-12:

py27-atspi-2.38.0|/usr/ports/accessibility/py-atspi
py37-atspi-2.38.0|/usr/ports/accessibility/py-atspi

Let’s remove duplicate lines from INDEX-12:

[dan@pkg01:~/ports/head] $ wc -l INDEX-12  ~/tmp/INDEX-12-list ~/tmp/INDEX-12-list-nodups
   33406 INDEX-12
   33406 /usr/home/dan/tmp/INDEX-12-list
   28755 /usr/home/dan/tmp/INDEX-12-list-nodups
   95567 total
[dan@pkg01:~/ports/head] $ 

That means 4651 lines relate directly to flavors.

That uniq output is much closer to the FreshPorts count of 28759. It is off by 4.

Comparing INDEX-12 and FreshPorts

Let’s do a diff.

All the + lines indicates a port included in FreshPorts, but not INDEX-12. I have annotated the output to indicate what my investigations found.

All the lines indicate something not found on FreshPorts.

When you see DELETED, that means FreshPorts has marked this port was deleted.

[dan@pkg01:~/ports/head] $ diff -ruN ~/tmp/INDEX-12-list-nodups ~/INDEX.FreshPorts
--- /usr/home/dan/tmp/INDEX-12-list-nodups	2020-11-18 16:58:12.360853000 +0000
+++ /usr/home/dan/INDEX.FreshPorts	2020-11-18 17:49:49.321133000 +0000
@@ -1290,6 +1290,7 @@
 /usr/ports/audio/zita-resampler
 /usr/ports/audio/zrythm
 /usr/ports/audio/zynaddsubfx
+/usr/ports/base/binutils NOT A PORT
 /usr/ports/benchmarks/ali
 /usr/ports/benchmarks/apib
 /usr/ports/benchmarks/autobench
@@ -7405,7 +7406,6 @@
 /usr/ports/devel/php80-sysvsem
 /usr/ports/devel/php80-sysvshm
 /usr/ports/devel/php80-tokenizer
-/usr/ports/devel/phpunit6 PORT MARKED AS DELETED
 /usr/ports/devel/phpunit7
 /usr/ports/devel/phpunit8
 /usr/ports/devel/physfs
@@ -7703,6 +7703,7 @@
 /usr/ports/devel/py-cachy
 /usr/ports/devel/py-canonicaljson
 /usr/ports/devel/py-capstone
+/usr/ports/devel/py-case NEWLY CREATED
 /usr/ports/devel/py-castellan
 /usr/ports/devel/py-castellan1
 /usr/ports/devel/py-cbor
@@ -9873,6 +9874,7 @@
 /usr/ports/devel/rubygem-rspec-support
 /usr/ports/devel/rubygem-rspec_junit_formatter
 /usr/ports/devel/rubygem-rubocop
+/usr/ports/devel/rubygem-rubocop-ast NOT IN INDEX-12
 /usr/ports/devel/rubygem-ruby-atmos-pure
 /usr/ports/devel/rubygem-ruby-bugzilla
 /usr/ports/devel/rubygem-ruby-enum
@@ -14078,7 +14080,6 @@
 /usr/ports/korean/hanyangfonts
 /usr/ports/korean/hcode
 /usr/ports/korean/hmconv
-/usr/ports/korean/hpscat PORT MARKED AS DELETED
 /usr/ports/korean/hunspell
 /usr/ports/korean/ibus-hangul
 /usr/ports/korean/imhangul-gtk2
@@ -14439,6 +14440,7 @@
 /usr/ports/lang/spidermonkey24
 /usr/ports/lang/spidermonkey52
 /usr/ports/lang/spidermonkey60
+/usr/ports/lang/spidermonkey68 NOT IN SUBVERSION
 /usr/ports/lang/spidermonkey78
 /usr/ports/lang/spl
 /usr/ports/lang/squeak
@@ -14958,6 +14960,7 @@
 /usr/ports/mail/py-dkimpy
 /usr/ports/mail/py-email-validator
 /usr/ports/mail/py-email_reply_parser
+/usr/ports/mail/py-flanker NOT IN INDEX-12
 /usr/ports/mail/py-flask-mail
 /usr/ports/mail/py-flufl.bounce
 /usr/ports/mail/py-fuglu
[dan@pkg01:~/ports/head] $ 

Totals:

  1. NOT A PORT – base looks a category to FreshPorts so that is included
  2. PORT MARKED AS DELETED – FreshPorts thinks this port is deleted, but it is not
  3. NEWLY CREATED – this port was created today. INDEX-12 predates that
  4. NOT IN INDEX-12 – no idea why this is not included
  5. NOT IN SUBVERSION – this port is not listed in subversion.

Conclusion

FreshPorts has some errors, which I will look into.

The actual number of ports is wrong on both sites and the correct values is in the 28,800 range.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive