I did a pg_dump of our data warehouse, and it is 26 gigabytes. I wanted to load this data into a different database product (one that forked from Postgresql, so its syntax is very similar).
I needed to change the syntax of the "COPY" command in the pg_dump output, remove references to "public", and change the table owner name. sed did these for me (albeit slowly). The output from pg_dump is a file called "pg-dump-output.sql".
pg_dump produces this:
Code:
...
COPY table_name (col1, col2, col3, col4, col5) FROM stdin;
...
ALTER TABLE public.table_name OWNER TO postgres_owner;
I needed:
Code:
...
COPY table_name FROM stdin;
...
ALTER TABLE table_name OWNER TO otherdb_owner;
Here is how I did it:
Code:
$ sed -e "s:\(COPY [a-z_]\+\) ([a-zA-Z0-9_, ]\+):\1:" -e "s:public\.::" -e "s:OWNER TO postgres_owner:OWNER TO otherdb_owner:" -i pg-dump-output.sql
The last part of this problem is this: pg_dump puts 14 lines of unnecessary junk at the start of the file (unnecessary for the target database, at least) ...
Code:
--
-- PostgreSQL database dump
--
-- Started on 2009-07-08 12:46:10 UTC
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
I wanted to trim that off -- without re-writing the entire 26 gigabyte file.
I came up with a solution that worked for me, but am not sure it is the 'best' one. I thought I'd post here to see if anyone else had a better idea.
First, use 'wc' to find out how many bytes these unnecessary lines take up:
Code:
$ head -14 pg-dump-output.sql | wc -
14 37 286 -
Then I used the "offset" feature of losetup to create a loopback file that points 286 bytes into the beginning of the 26 gigabyte file:
Code:
$ sudo losetup -f -o 286 /somedir/otherdir/pg-dump-output.sql
$ sudo chmod 644 /dev/loop0
I then used /dev/loop0 as the input 'file' for the SQL command shell for the other database.
Thoughts?