On November 19, 2024, these errors appeared on every FreshPorts node:
Nov 19 19:12:16 dvl-ingress01 process_vuxml.sh[50796]: FATAL process_vuxml.pl finished with an error: 1
That’s my clue to look more closely at why the processing failed. This processing is related to the security/vuxml port, which documents known vulnerabilities within the FreeBSD ports tree and operating system. This tool helps system administrators identify and patch known problems.
The data set is not trivial. It’s about 6MB in size. At one time, it was one file. A few years ago, the data was divided up by year and several files are now in use.
The errors
The following command reproduces the error:
[13:15 dvl-ingress01 dvl /usr/local/libexec/freshports] % echo /usr/local/bin/perl ./process_vuxml.pl --filename=/jails/freshports/usr/ports/security/vuxml/vuln.xml --showreasons | sudo su -fm freshports
process_vuxml.pl starts
reasons will be displayed
(there is usually a delay before further output)
limit on input amplification factor (from DTD and entities) breached at line 202, column 0, byte 6739
error in processing external entity reference at line 104, column 0, byte 3887 at /usr/local/lib/perl5/site_perl/mach/5.38/XML/Parser.pm line 187.
This is the error in question. It sounds like we’ve reached a limit in the code. FreshPorts is using textproc/p5-XML-Parser.
The message “there is usually a delay before further output” is displayed because it takes about 5 seconds for the code to read all the data.
It is a size issue
During my testing, I removed a VID entry from security/vuxml/vuln/2024.xml and ran the script again – the script ran without error. This tells me it is definitely a capacity/limit issue.
I ran a similar test after restoring that file and removing items from security/vuxml/vuln/2023.xml – success. This again points to some capacity limitation.
Why all the data?
Why does FreshPorts parse all the data? Because it must. It has no idea what has changed within the data. It only knows that the port has changed, and it has the list of files. We could start parsing just the modified files. That would require some Perl skills, much more than I have.
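As a rough illustration of what "parsing just the modified files" might start from, here is a minimal sketch in Python (not Perl, and not FreshPorts code). It assumes the list of changed files arrives as plain repository-relative paths and that the per-year files follow the security/vuxml/vuln/YYYY.xml naming seen in this post. The function name and the regular expression are hypothetical.

```python
import re

# Assumption: per-year data files are named security/vuxml/vuln/YYYY.xml,
# as with the 2023.xml and 2024.xml files mentioned in this post.
YEAR_FILE = re.compile(r"(?:^|/)security/vuxml/vuln/\d{4}\.xml$")


def modified_year_files(changed_paths):
    """Return only the per-year vuxml data files from a list of changed paths."""
    return sorted(path for path in changed_paths if YEAR_FILE.search(path))


if __name__ == "__main__":
    changed = [
        "security/vuxml/Makefile",
        "security/vuxml/vuln/2024.xml",
        "security/vuxml/vuln.xml",
    ]
    for path in modified_year_files(changed):
        print(path)
```

This only selects candidate files. Whether a single year file can be parsed on its own, outside the vuln.xml wrapper, is a separate question this sketch does not answer.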
What’s next
Next comes investigation of the error and, hopefully, a solution which works around the problem without much work.
expat
Matthew Seaman helped by pointing out the error message comes from https://github.com/libexpat/libexpat/blob/master/expat/lib/xmlparse.c#L2528
And that this document seems relevant: https://libexpat.github.io/doc/api/latest#XML_SetBillionLaughsAttackProtectionMaximumAmplification
The default amplification factor is 100 – somehow I need to change that. However, the docs say: “Note: If you ever need to increase this value for non-attack payload, please file a bug report.” So I created: https://github.com/libexpat/libexpat/issues/928
expat patch
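The following patch fixed the problem for me. It doubles the default activation threshold from 8388608 bytes (8 MiB, 2^23) to 16777216 bytes (16 MiB, 2^24); the trailing comment on the changed line was not updated and still reads 8 MiB: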
[17:15 pkg01 dvl /usr/local/poudriere/ports/default/textproc/expat2] % cat files/patch-lib_internal.h
--- lib/internal.h.orig 2024-11-23 17:12:40 UTC
+++ lib/internal.h
@@ -144,7 +144,7 @@
 #define EXPAT_BILLION_LAUGHS_ATTACK_PROTECTION_MAXIMUM_AMPLIFICATION_DEFAULT \
   100.0f
 #define EXPAT_BILLION_LAUGHS_ATTACK_PROTECTION_ACTIVATION_THRESHOLD_DEFAULT \
-  8388608 // 8 MiB, 2^23
+  16777216 // 8 MiB, 2^23
 /* NOTE END */
 
 #include "expat.h" // so we can use type XML_Parser below
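To see why raising the activation threshold can help even though the amplification factor is left at 100, here is a small Python sketch of the check as I understand it from the libexpat documentation: amplification is (direct + indirect) / direct, and the limit only applies once the total byte count reaches the activation threshold. This is a model of the arithmetic, not libexpat's code. The function names and the byte counts in the example are made up for illustration; they are not measurements from vuln.xml.

```python
# Defaults as they appear in the patch above.
DEFAULT_MAX_AMPLIFICATION = 100.0
DEFAULT_ACTIVATION_THRESHOLD = 8388608  # 8 MiB, 2^23


def amplification(direct_bytes, indirect_bytes):
    """(direct + indirect) / direct, per the libexpat docs."""
    if direct_bytes == 0:
        return 1.0  # simplification in this sketch: avoid dividing by zero
    return (direct_bytes + indirect_bytes) / direct_bytes


def limit_breached(direct_bytes, indirect_bytes,
                   max_amplification=DEFAULT_MAX_AMPLIFICATION,
                   activation_threshold=DEFAULT_ACTIVATION_THRESHOLD):
    """True when the total reaches the threshold AND the factor is too high."""
    total = direct_bytes + indirect_bytes
    if total < activation_threshold:
        return False  # protection not active yet
    return amplification(direct_bytes, indirect_bytes) > max_amplification


if __name__ == "__main__":
    # Illustrative numbers only: a small primary document that pulls in
    # several megabytes through external entities.
    direct, indirect = 4_000, 9_000_000
    print(amplification(direct, indirect))
    print(limit_breached(direct, indirect))
    print(limit_breached(direct, indirect, activation_threshold=16777216))
```

With these made-up numbers the factor is far above 100 either way. What changes the outcome is whether the total has reached the activation threshold, and that is the value the patch doubles.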
XML-Parser issue
https://github.com/cpan-authors/XML-Parser/issues/102 has also been opened.