git: database upgrade process

This post documents the database upgrade process.

Issues encountered during initial attempts

There are permission issues on the database. Not everything is owned by the postgres super-user. In fact, many of them are owned by dan – this database is 20+ years old.

To easily resolve the ownership issue, I take the database dump (named freshports.org.dump) from production (the hostname is supernews) and load it into a development database using the –no-owner option on the pg_restore command. More on that later. The –no-owner option ensures that the restore process does not “set ownership of objects to match the original database“.

The permission issue also extends to functions. If I look at the production database, I see this:

freshports.org=# \df+ armor
                                                                          List of functions
 Schema | Name  | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Source code | Description 
--------+-------+------------------+---------------------+------+------------+----------+-------+----------+-------------------+----------+-------------+-------------
 public | armor | text             | bytea               | func | immutable  | unsafe   | dan   | invoker  |                   | c        | pg_armor    | 
(1 row)

freshports.org=# 

That permission issue is also fixed by the –no-owner option. After importing in the dev database, that function looks like this:

freshports.prod.test.owners=# \df+ armor
                                                                            List of functions
 Schema | Name  | Result data type | Argument data types | Type | Volatility | Parallel |  Owner   | Security | Access privileges | Language | Source code | Description 
--------+-------+------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+-------------+-------------
 public | armor | text             | bytea               | func | immutable  | unsafe   | postgres | invoker  |                   | c        | pg_armor    | 
(1 row)

freshports.prod.test.owners=# 

The current procedure

This section outlines the steps to convert the existing production database and copy it over to Amazon’s RDS.

Dump production

On the production server:

pg_dump -Fc freshports.org -f freshports.org.dump

Load into dev

Load that into the development server:

sudo su -l postgres
createdb -O postgres -T template0 -E SQL_ASCII freshports.prod
pg_restore -j 20 --no-owner -d freshports.prod freshports.org.dump 

Doing this as the postgres use ensures optimal permissions.

Estimated time: 25 minutes.

Adjust the database schema

Once we had the database loaded into the dev environment, I ran this script to update the schema to be git-ready.

This is taken from the SQL for git issue on the git_proc_commit project.

-- repo table needs to know which repository tool we are using

ALTER TABLE public.repo
    ADD COLUMN repository text NOT NULL DEFAULT 'subversion';

COMMENT ON COLUMN public.repo.repository
    IS 'subversion? git? cvs?';


-- Let's rename the svn_hostname column:
ALTER TABLE public.repo
    RENAME svn_hostname TO repo_hostname;


-- We don't need these unique indexes. We might do something with constraints later.

drop index repo_description;
drop index repo_name;
drop index repo_path_to_repo;

-- Add in the git data:

insert into repo (name, description, repo_hostname, path_to_repo, repository) values ('ports', 'The FreeBSD Ports tree', 'github.com', '/freebsd/freebsd-ports', 'git');
insert into repo (name, description, repo_hostname, path_to_repo, repository) values ('doc', 'The FreeBSD doc tree', 'github.com', '/freebsd/doc', 'git');
insert into repo (name, description, repo_hostname, path_to_repo, repository) values ('src', 'The FreeBSD src tree', 'github.com', '/freebsd/freebsd', 'git');


-- Also, the short hash.
ALTER TABLE public.commit_log
    ADD COLUMN commit_hash_short text;

COMMENT ON COLUMN public.commit_log. commit_hash_short
    IS 'This is the short version of the git hash stored in
 svn_revision.

If null/empty, it is not a git hash.';

-- we need an index on the short stuff

-- Index: commit_log_commit_hash_short

-- DROP INDEX public.commit_log_commit_hash_short;

CREATE INDEX commit_log_commit_hash_short
    ON public.commit_log USING btree
    (commit_hash_short COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

Permissions

This fixes up a lingering permissions issue – the permissions in question were too generous.

Verify this:

select grantee, privilege_type from information_schema.role_table_grants where table_name='pg_am';
select grantee, privilege_type from information_schema.role_table_grants where table_name='pg_authid';

If rsyncer is listed, then this:

revoke all PRIVILEGES ON all tables in schema pg_catalog from rsyncer;

Extensions made location

These steps are mentioned because it came up in response to these errors:

pg_restore: from TOC entry 351; 1255 81083 FUNCTION armor(bytea) dan
pg_restore: error: could not execute query: ERROR: permission denied for language c

Why is that? This is an old database, and the permissions issues I mentioned above.

This was the recommended fix. The –no-owner option is insufficient to fix this problem.

begin;
create extension pgcrypto      from unpackaged;
create extension fuzzystrmatch from unpackaged;
create extension plperl        from unpackaged;

If all good, also do a commit.

datatypes, relational integrity, and functions

With the table changes, we have updates to datatypes, triggers, and functions (stored procedures). These steps update those:

psql freshports.prod
begin;
\i datatype.txt
\i ri.txt
\i sp.txt
commit;

NOTICES during the datatype.txt command are expected. There should be no errors.

That is the last step.

Dump the fixed schema

Now we dump this new schema and all the data:

pg_dump -Fc freshports.prod > freshports.prod.new-schema.dump

Estimated time: 8 minutes

Copy to AWS

The dump was copied to AWS and loaded into the RDS instance.

scp freshports.prod.new-schema.dump ec2-user@aws:

Load fixed schema

On AWS, create and load the database:

createdb -h pg01.[redacted].us-east-1.rds.amazonaws.com -U postgres -E SQL_ASCII -T template0 --locale=C freshports.fixed
time pg_restore -h pg01.[redacted].us-east-1.rds.amazonaws.com -U postgres -d freshports.fixed freshports.prod.new-schema.dump

Estimate time: 35 minutes

These errors on loading is expected and does not affect the outcome.

pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 465; 1255 847666 FUNCTION plpgsql_call_handler() postgres
pg_restore: error: could not execute query: ERROR:  permission denied for language c
Command was: CREATE FUNCTION public.plpgsql_call_handler() RETURNS language_handler
    LANGUAGE c
    AS '$libdir/plpgsql', 'plpgsql_call_handler';


pg_restore: error: could not execute query: ERROR:  function public.plpgsql_call_handler() does not exist
Command was: ALTER FUNCTION public.plpgsql_call_handler() OWNER TO postgres;

pg_restore: warning: errors ignored on restore: 2

Using the new database

Now that the new database is over on AWS, you can modifying the configuration files to start using it.

The list of configuration files which need to be updated are covered by Moving devgit.freshports.org from GitHub to git.FreeBSD.org

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top