Stripping HTML from announcements

Some of you may be noticing HTML in the daily outgoing messages. I apologise for that. You must have seen the various notices that appear at the top of each FreshPorts webpage. At present, they look like this:

Absolute BSD – $25 shippedorder yours now
I’m looking for work. I’m a software developer/sysadmin. Please read my resume and contact me if you have any leads. Thank you.

The candidates from the Design Contest are now available for review and voting.

Vote early. Vote often.

Those entries are stored in a table:

freshports.org=# \d announcements
                                    Table "public.announcements"
   Column   |    Type    |                         Modifiers
------------+------------+------------------------------------------------------------
 id         | integer    | not null default nextval('announcements_id_seq'::regclass)
 text       | text       | not null
 start_date | timestamp  |
 end_date   | timestamp  |
Indexes:
    "announcements_pkey" PRIMARY KEY, btree (id)

freshports.org=#

A long time ago, I wrote code to extract this information and place it into the outgoing announcements. The table allows the end_date to be null, and the announcement would run forever. However, the code that extracts this data for announcements does not consider this:

   SELECT text
      FROM announcements
     WHERE start_date <= current_date
       AND end_date   >= current_date;

I see two things I need to change here:

  1. Alter the SQL to make end_date optional
  2. Alter the table to add a new column for the plain text equivalent of the HTML version

The SQL might look like this:

   SELECT text
      FROM announcements
     WHERE start_date <= current_date
       AND (end_date IS NULL OR end_date >= current_date);

A very simple and subtle change, but with much improved results. :)

As for the table, I would do this:

ALTER TABLE announcements add column text_plain text;
UPDATE announcements SET text_plain = text;
ALTER TABLE announcements ALTER text_plain SET NOT NULL;

This would copy the HTML into the plain text field, but I can amend those values manually through the web interface later. I initially thought I would just strip the HTML tags from the original text and use that as plain text. That won’t work. Yes, the tags can be stripped easily enough, but if you take the above examples, that would leave you with this:

Absolute BSD – $25 shipped – order yours now
I’m looking for work. I’m a software developer/sysadmin. Please read my resume and contact me if you have any leads. Thank you.
The candidates from the Design Contest are now available for review and voting.
Vote early. Vote often.

That’s not very useful. We’ll need different versions. I’ll need to add a URL manually. Such as this:

The candidates from the Design Contest [http://news.freshports.org/2007/01/30/freshports-design-contest/] are now available for review [http://www.freshports.org/DesignContestVoting/] and voting.
Vote early. Vote often.

As you can see, the plain text version needs to be distinctly different. That is why I added a new column.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top