Oct 31 2012
 

Why can’t things be simple? The basic premise of FreshPorts is: parse an email, put it in the database. Sounds simple? Well, it is.

Until you start dealing with the exceptions. Such as slave ports. Such as UTF-8….

I only just got used to dealing with base64-encoded messages when a real UTF-8 character shows up.

Case in point: this commit against www/nginx-devel produced this error:

Wide character in print at /usr/local/lib/perl5/5.12.4/mach/IO/Handle.pm line 406, <> line 97.

I suspect it was complaining about this:

*) Feature: the $bytes_sent, $connection, and $connection_requests
   variables can now be used not only in the "log_format" directive.
   Thanks to Benjamin Gr?F?U?ssing.

The above isn’t the real representation of the letters, but that’s as close as you’re going to see here.
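For anyone who hasn't hit this warning before, here is a minimal sketch of what triggers it, using only core Perl. The sample string and the variable names are made up for illustration; this is not the FreshPorts code. Perl warns when a string containing a code point above 255 is printed to a handle that has no encoding layer, and stays quiet when the handle has one.

```perl
use strict;
use warnings;

# Made-up sample text; "\x{263A}" is a code point above 255,
# which is what Perl calls a "wide character".
my $text = "Thanks \x{263A}";

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Print to an in-memory handle with no encoding layer: this warns.
my $raw = '';
open(my $plain, '>', \$raw) or die "cannot open in-memory handle: $!";
print {$plain} $text;
close($plain);

# Same print through a handle with a UTF-8 encoding layer: no warning.
my $encoded = '';
open(my $utf8, '>:encoding(UTF-8)', \$encoded) or die "cannot open in-memory handle: $!";
print {$utf8} $text;
close($utf8);

printf "warnings collected: %d\n", scalar @warnings;
print  "first warning: $warnings[0]" if @warnings;
```

Adding an encoding layer to the output handle is one way to silence the warning; flattening the text to ASCII before it gets that far is another.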

I decided to start using Text::Unidecode, otherwise known as converters/p5-Text-Unidecode.

After installing, I started invoking it like this:

    $encoding = myGetMessage_ContentTransferEncoding($message);
#    print $encoding  . "\n";

    if ($encoding eq 'base64')  
    {
        # we need to extract the body from this message, base64 decode it, and go from there...
        my $parsed = Email::MIME->new($message);
        my $content_type = $parsed->content_type;

        $parsed->body_set($parsed->body);

        my $header = $parsed->header_obj;
        # we decode to avoid UTF-8 characters... we want only ASCII
        $message = unidecode($header->as_string . $parsed->body_str);

    }

The solution is on line 14: the call to unidecode(). I guess I could run unidecode on only the body_str; the header is very unlikely to contain UTF-8….
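A rough sketch of that body-only idea, assuming Text::Unidecode is installed. The helper name ascii_message and the sample header and body are hypothetical, and the header and body are passed in as plain strings rather than pulled out of an Email::MIME object, so treat it as an illustration and not as what FreshPorts runs.

```perl
use strict;
use warnings;
use Text::Unidecode;

# Hypothetical helper: transliterate only the decoded body to ASCII
# and leave the header text untouched.
sub ascii_message {
    my ($header_text, $body_text) = @_;
    return $header_text . unidecode($body_text);
}

# Made-up sample data; "\x{fc}" is a u with an umlaut.
my $header = "Subject: example commit\n\n";
my $body   = "Thanks to J\x{fc}rgen.\n";

print ascii_message($header, $body);
```

The trade-off is that any non-ASCII character in the header would pass through unchanged, which is why the version above, running unidecode over both, is the safer default.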

Things are much better now that that commit is in.

Questions? Comments?
