Fedora Linux Support Community & Resources Center

Using Fedora: General support for current versions. Ask questions about Fedora and its software that do not belong in any other forum.

  #1  
28th May 2012, 07:07 AM
papori
Registered User

Join Date: Nov 2010
Posts: 58
Delete lines faster than sed

Hi all!

I have a huge text file (150 GB) and I want to delete the first 2 lines.
Can I do this faster than using:
sed '1,2d'
?

Thanks,
Pap
  #2  
28th May 2012, 08:22 AM
stevea
Registered User

Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,302
Re: Delete lines faster than sed

I suspect sed and awk will be pretty slow. How fast is this?

tail -n +3 filename > newfile

(tail -n +3 starts output at line 3, so the first 2 lines are skipped.)
This still reads and writes the whole 150 GB, so it will be pretty slow too.
Maybe an hour with a fast disk.
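For comparison, here is a minimal Java sketch of the same copy-to-a-new-file approach (the class name SkipLinesCopy is hypothetical, not something from this thread). It reads only the start of the file to find the byte offset just past the k-th newline, then hands the rest to FileChannel.transferTo. It still has to read and write everything after those lines, so it cannot avoid the cost described above; it only avoids per-line processing.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SkipLinesCopy {
    // Byte offset just past the k-th '\n' (the whole file size if it has
    // fewer than k newlines). Only the beginning of the file is read.
    static long offsetAfterLines(Path src, int k) throws IOException {
        long offset = 0;
        int seen = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(src))) {
            int b;
            while (seen < k && (b = in.read()) != -1) {
                offset++;
                if (b == '\n') {
                    seen++;
                }
            }
        }
        return offset;
    }

    // Roughly what `tail -n +(k+1) src > dst` does: copy everything after line k.
    static void copyWithoutFirstLines(Path src, Path dst, int k) throws IOException {
        long pos = offsetAfterLines(src, k);
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            // transferTo may move fewer bytes than asked for, so keep looping
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java SkipLinesCopy SRC DST LINES");
            System.exit(1);
        }
        copyWithoutFirstLines(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
    }
}
```

Usage, with placeholder file names: java SkipLinesCopy filename newfile 2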
__________________
None are more hopelessly enslaved than those who falsely believe they are free.
Johann Wolfgang von Goethe
  #3  
28th May 2012, 08:48 AM
papori
Registered User

Join Date: Nov 2010
Posts: 58
Re: Delete lines faster than sed

Thanks stevea!
You really helped me today!
  #4  
28th May 2012, 08:56 AM
pindakoe
Registered User

Join Date: Dec 2011
Posts: 32
Re: Delete lines faster than sed

Maybe use plain ed with some input redirection. Create a file called chop (for want of a better name) with the following contents:
Code:
1,2d
wq
and process your file 'largefile' as follows:
$ ed largefile < chop
  #5  
28th May 2012, 12:29 PM
george_toolan
Registered User

Join Date: Dec 2006
Posts: 1,718
Re: Delete lines faster than sed

If you have more than one HDD, it would probably be a lot faster to read the file from one drive and write the result to the other ;-)
  #6  
28th May 2012, 08:39 PM
RupertPupkin
Registered User

Join Date: Nov 2006
Location: Detroit
Posts: 4,619
Re: Delete lines faster than sed

Quote:
Originally Posted by pindakoe
Maybe use plain ed with some input redirection. Create a file called chop (for want of a better name) with the following contents:
Code:
1,2d
wq
and process your file 'largefile' as follows: $ ed largefile < chop
I tried that on a 1.9GB text file and it bombed out after 2 minutes, with no changes made.

Using ex, however, did work on the same file:
Code:
$ time printf ":1,2d\n:wq\n" | ex file.txt

real    8m57.601s
user    0m34.070s
sys     0m23.025s
I don't think there would be anything that's significantly faster than sed. Here's what I got using sed, awk, tail, and perl:
Code:
$ time sed -e '1,2d' file.txt > newfile.txt

real    2m16.408s
user    0m22.224s
sys     0m7.256s

$ time awk 'NR > 2' file.txt > newfile.txt

real    2m21.860s
user    0m42.957s
sys     0m7.440s

$ time tail -n +3 file.txt > newfile.txt

real    2m13.582s
user    0m0.385s
sys     0m6.899s

$ time perl -p -e '$_ = "" if ($. < 3);' file.txt > newfile.txt

real    3m29.129s
user    1m28.169s
sys     0m6.926s
In each case I re-used the original 1.9GB text file and verified that only the first 2 lines were deleted.
Both sed and perl have a -i option to edit the file in place, but that didn't change the time for perl and actually made it almost 3 times as long for sed (6m27.653s).

I did get a slight improvement using Java:
Code:
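import java.io.IOException;
import java.io.RandomAccessFile;

public class delFromStart {
  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: java delFromStart FILE BYTES_TO_REMOVE");
      System.exit(1);
    }
    long n = Long.parseLong(args[1]); // bytes to remove from the start
    // try-with-resources closes the file even if an I/O error is thrown
    try (RandomAccessFile myfile = new RandomAccessFile(args[0], "rw")) {
      if (n < 0 || n > myfile.length()) {
        throw new IllegalArgumentException("byte count out of range: " + n);
      }
      long writePos = 0; // where the next block goes
      long readPos = n;  // where the next block comes from
      byte[] buf = new byte[1024]; // a larger buffer means fewer read/write calls
      int m;
      myfile.seek(readPos);
      while ((m = myfile.read(buf)) != -1) {
        // copy this block n bytes closer to the start of the file
        myfile.seek(writePos);
        myfile.write(buf, 0, m);
        readPos += m;
        writePos += m;
        myfile.seek(readPos);
      }
      // the last n bytes are now stale duplicates; cut them off
      myfile.setLength(writePos);
    }
  }
}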
The program (delFromStart.java) takes 2 command-line parameters: the file name and the number of bytes to delete from the start of the file (which can be obtained automatically using the head command):
Code:
$ javac delFromStart.java
$ time java delFromStart file.txt $(head -2 file.txt | wc -c)

real    1m52.617s
user    0m3.497s
sys     0m7.402s
This edited the file in place, and I verified that it deleted only the first 2 lines.
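The same in-place shift can also be sketched with java.nio's FileChannel, whose positional read(ByteBuffer, long) and write(ByteBuffer, long) calls make the separate seek() calls unnecessary. This has not been benchmarked here and is not the program timed above; the class name ShiftLeft and the 1 MiB block size are arbitrary choices.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ShiftLeft {
    // Removes the first n bytes in place: each block is read at readPos and
    // written n bytes earlier, then the stale tail is truncated.
    static void removeLeadingBytes(Path file, long n) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            long size = ch.size();
            if (n < 0 || n > size) {
                throw new IllegalArgumentException("byte count out of range: " + n);
            }
            ByteBuffer buf = ByteBuffer.allocate(1 << 20); // 1 MiB blocks (arbitrary)
            long readPos = n;
            long writePos = 0;
            while (readPos < size) {
                buf.clear();
                int m = ch.read(buf, readPos); // positional read: no seek() needed
                if (m <= 0) {
                    break;
                }
                buf.flip();
                while (buf.hasRemaining()) {
                    writePos += ch.write(buf, writePos); // positional write
                }
                readPos += m;
            }
            ch.truncate(writePos);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ShiftLeft FILE BYTES_TO_REMOVE");
            System.exit(1);
        }
        removeLeadingBytes(Paths.get(args[0]), Long.parseLong(args[1]));
    }
}
```

It takes the same arguments as delFromStart, so the byte count can come from head in the same way: java ShiftLeft file.txt $(head -2 file.txt | wc -c)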

I think deleting lines from the top of a file will always be inefficient because (unless I'm wrong) it involves copying, or at least moving, everything after those lines "up" towards the top. Deleting from the end just means chopping off the bottom, with nothing moved or copied. For example, modifying the Java program to delete from the end of a file makes the deletion almost instantaneous (0m0.229s), and the 'truncate' utility from the GNU coreutils package is even faster (0m0.001s). It's possible, though, that someone could come up with a fast way of deleting from the top, perhaps using a functional language.
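The post doesn't show the modified Java program for deleting from the end, so the following is only a guess at that approach: scan backwards from the end of the file until k newlines have been seen, then cut the file with RandomAccessFile.setLength. Only the last part of the file is read, which is why this kind of deletion is nearly instantaneous. ChopTail is a hypothetical name, and a newline at the very end of the file is treated as the terminator of the last line.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class ChopTail {
    // Offset at which the last k lines start (0 if the file has k lines or fewer).
    static long cutPoint(RandomAccessFile f, int k) throws IOException {
        long size = f.length();
        if (k <= 0 || size == 0) {
            return size; // nothing to remove
        }
        long pos = size; // bytes [0, pos) are still to be scanned, backwards
        f.seek(size - 1);
        if (f.read() == '\n') {
            pos = size - 1; // the final newline belongs to the last line; skip it
        }
        byte[] buf = new byte[64 * 1024];
        int seen = 0;
        while (pos > 0) {
            int len = (int) Math.min(buf.length, pos);
            long blockStart = pos - len;
            f.seek(blockStart);
            f.readFully(buf, 0, len);
            for (int i = len - 1; i >= 0; i--) {
                if (buf[i] == '\n' && ++seen == k) {
                    return blockStart + i + 1; // keep everything up to this newline
                }
            }
            pos = blockStart;
        }
        return 0;
    }

    // Deletes the last k lines by truncating the file; nothing is copied.
    static void removeLastLines(String path, int k) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "rw")) {
            f.setLength(cutPoint(f, k));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ChopTail FILE LINES");
            System.exit(1);
        }
        removeLastLines(args[0], Integer.parseInt(args[1]));
    }
}
```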
__________________
OS: Fedora 18 x86_64 | CPU: AMD64 3700+ 2.2GHz | RAM: 2GB PC3200 DDR | Disk: 160GB PATA | Video: ATI Radeon 7500 AGP 64MB | Sound: Turtle Beach Santa Cruz CS4630 | Ethernet: Realtek 8110SC
  #7  
28th May 2012, 09:47 PM
jamielinux
Registered User

Join Date: Jun 2011
Posts: 64
Re: Delete lines faster than sed

Just a quick note that sed has a "-i" option that allows you to edit the file in place:

Code:
sed -i -e '1,2d' file.txt
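As far as I know, GNU sed's -i does not rewrite the file where it sits: it writes the result to a temporary file in the same directory and then renames that over the original, so it still reads and writes the whole 150 GB and needs that much free space while it runs. Below is a minimal Java sketch of that temp-file-and-rename pattern, not sed's actual code; ReplaceViaTemp is a hypothetical name and, unlike sed, it does not copy the original file's permissions.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ReplaceViaTemp {
    // Drops the first k lines by writing a new temp file in the same directory
    // and renaming it over the original: the old file is replaced, not edited.
    static void dropFirstLines(Path file, int k) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "drop", ".tmp");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
            int seen = 0;
            int b;
            while (seen < k && (b = in.read()) != -1) { // skip the first k lines
                if (b == '\n') {
                    seen++;
                }
            }
            byte[] buf = new byte[64 * 1024];
            int m;
            while ((m = in.read(buf)) != -1) { // copy the rest unchanged
                out.write(buf, 0, m);
            }
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // don't leave a half-written temp file behind
            throw e;
        }
        // Same directory, so this is a rename within one filesystem. The new file
        // keeps the temp file's permissions (rw------- on POSIX), not the original's.
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE,
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ReplaceViaTemp FILE LINES");
            System.exit(1);
        }
        dropFirstLines(Paths.get(args[0]), Integer.parseInt(args[1]));
    }
}
```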