Fedora Linux Support Community & Resources Center

Go Back   FedoraForum.org > Fedora 17/18 > Using Fedora
FedoraForum Search

Forgot Password? Join Us!

Using Fedora General support for current versions. Ask questions about Fedora and it's software that do not belong in any other forum.

Reply
 
Thread Tools Search this Thread Display Modes
  #1  
Old 15th July 2009, 03:03 PM
nt4cats Offline
Registered User
 
Join Date: Aug 2005
Location: Philadelphia, PA
Posts: 99
editing a 26 gigabyte text file

I did a pg_dump of our data warehouse, and it is 26 gigabytes. I wanted to load this data into a different database product (one that forked from Postgresql, so its syntax is very similar).

I needed to change the syntax of the "COPY" command in the pg_dump output, remove references to "public", and change the table owner name. sed did these for me (albeit slowly). The output from pg_dump is a file called "pg-dump-output.sql".

pg_dump produces this:
Code:
...
COPY table_name (col1, col2, col3, col4, col5) FROM stdin;
...
ALTER TABLE public.table_name OWNER TO postgres_owner;
I needed:
Code:
...
COPY table_name FROM stdin;
...
ALTER TABLE table_name OWNER TO otherdb_owner;
Here is how I did it:
Code:
$ sed -e "s:\(COPY [a-z_]\+\) ([a-zA-Z0-9_, ]\+):\1:" -e "s:public\.::" -e "s:OWNER TO postgres_owner:OWNER TO otherdb_owner:" -i pg-dump-output.sql
The last part of this problem is this: pg_dump puts 14 lines of unnecessary junk at the start of the file (unnecessary for the target database, at least) ...
Code:
--
-- PostgreSQL database dump
--

-- Started on 2009-07-08 12:46:10 UTC

SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

SET search_path = public, pg_catalog;
I wanted to trim that off -- without re-writing the entire 26 gigabyte file.

I came up with a solution that worked for me, but am not sure it is the 'best' one. I thought I'd post here to see if anyone else had a better idea.

First, use 'wc' to find out how many bytes these unnecessary lines take up:
Code:
$ head -14 pg-dump-output.sql | wc -
     14      37     286 -
Then I used the "offset" feature of losetup to create a loopback file that points 286 bytes into the beginning of the 26 gigabyte file:
Code:
$ sudo losetup -f -o 286 /somedir/otherdir/pg-dump-output.sql
$ sudo chmod 644 /dev/loop0
I then used /dev/loop0 as the input 'file' for the SQL command shell for the other database.

Thoughts?
__________________
OS: Fedora12 x86_64
Hardware: Dell Precision M4300 laptop, core2duo 2.4GHz (9575.30 BogoMIPS), 4GB RAM, NVidia Quadro FX 360M (512M) @ 1680x1050, 256GB SSD

Last edited by nt4cats; 15th July 2009 at 03:48 PM. Reason: I put the incorrect offest into the original post (oops)
Reply With Quote
  #2  
Old 15th July 2009, 03:55 PM
stevea's Avatar
stevea Offline
Registered User
 
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,346
Your methods seem fine - good use of stream tools. I think your regular expression after COPY might be improved to speed things a bit, but it's a one-time job.

Use of the losetup loop device is a clean way to solve the problem without re-writing. If the extra was at end-of-file we could truncate. At the beginning-OF there aren't a lot of good options.

If you do go to copy it, note the dd has a "seek" offsetitng option too, but itwould probably be best ti use the losetup anyway.
__________________
None are more hopelessly enslaved than those who falsely believe they are free.
Johann Wolfgang von Goethe
Reply With Quote
  #3  
Old 15th July 2009, 04:01 PM
Gödel's Avatar
Gödel Offline
Registered User
 
Join Date: Jul 2009
Location: London,England
Posts: 1,098
Yes, nice solution.

I would probably have written a small piece of c code to parse the file in one go, skipping the first lines and modifying the rest.

Another speed tip is to specify 'LANG=C' or 'LANG=en_US' before the sed command to bypass the slow UTF-8 handling.
Reply With Quote
  #4  
Old 16th July 2009, 01:36 PM
nt4cats Offline
Registered User
 
Join Date: Aug 2005
Location: Philadelphia, PA
Posts: 99
I'm curious to know how you'd improve the regex. I don't doubt it can be improved upon. I know enough about regex's to be occasionally useful, but I'm always interested in learning better ways to do things. I'm sure that other lurkers on this thread would like to know as well!
__________________
OS: Fedora12 x86_64
Hardware: Dell Precision M4300 laptop, core2duo 2.4GHz (9575.30 BogoMIPS), 4GB RAM, NVidia Quadro FX 360M (512M) @ 1680x1050, 256GB SSD
Reply With Quote
  #5  
Old 16th July 2009, 05:25 PM
RupertPupkin's Avatar
RupertPupkin Offline
Registered User
 
Join Date: Nov 2006
Location: Detroit
Posts: 4,728
You could have deleted the first 14 lines like this:
Code:
sed -i -e 1,14d pg-dump-output.sql

Last edited by RupertPupkin; 16th July 2009 at 05:44 PM. Reason: Separated the -i and -e options
Reply With Quote
Reply

Tags
editing, file, gigabyte, text

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] Disappearing thumbdrive Gigabyte bios file. JONOR Using Fedora 0 17th June 2009 05:21 PM
Text Editing Shortcuts stevevm Using Fedora 1 25th November 2004 07:40 PM
Searching for software with scanning, text/graphic editing, etc. functions satimis Using Fedora 3 8th November 2004 01:15 AM


Current GMT-time: 09:36 (Thursday, 20-06-2013)

TopSubscribe to XML RSS for all Threads in all ForumsFedoraForumDotOrg Archive
logo

All trademarks, and forum posts in this site are property of their respective owner(s).
FedoraForum.org is privately owned and is not directly sponsored by the Fedora Project or Red Hat, Inc.

Privacy Policy | Term of Use | Posting Guidelines | Archive | Contact Us | Founding Members

Powered by vBulletin® Copyright ©2000 - 2012, vBulletin Solutions, Inc.

FedoraForum is Powered by RedHat