Quote:
Originally Posted by pindakoe
Maybe use straight ed with some indirection. Create a file chop (for want of a better name) with the following contents:
and process your file 'largefile' as follows: $ ed largefile <chop
I tried that on a 1.9GB text file and it bombed out after 2 minutes, with no changes made.
Using ex, however, did work on the same file:
Code:
$ time printf ":1,2d\n:wq\n" | ex file.txt
real 8m57.601s
user 0m34.070s
sys 0m23.025s
I don't think there would be anything that's significantly faster than sed. Here's what I got using sed, awk, tail, and perl:
Code:
$ time sed -e '1,2d' file.txt > newfile.txt
real 2m16.408s
user 0m22.224s
sys 0m7.256s
$ time awk 'NR > 2' file.txt > newfile.txt
real 2m21.860s
user 0m42.957s
sys 0m7.440s
$ time tail -n +3 file.txt > newfile.txt
real 2m13.582s
user 0m0.385s
sys 0m6.899s
$ time perl -p -e '$_ = "" if ($. < 3);' file.txt > newfile.txt
real 3m29.129s
user 1m28.169s
sys 0m6.926s
In each case I re-used the original 1.9GB text file and verified that only the first 2 lines were deleted.
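For comparison, here is roughly what all four of those commands are doing, written as a Java sketch. The class and method names (SkipLines, skipLines) are made up, and it assumes Java 9+ (for InputStream.transferTo) and '\n' line endings. It throws away bytes until k newlines have gone past, then copies the rest to a new file, so every surviving byte is still read once and written once. That would probably explain why sed, awk and tail all land at about 2m15s of real time even though their user times differ by more than a factor of 100: most of the time seems to go on I/O rather than on the tools themselves.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name. For k = 2 this mirrors: sed -e '1,2d' in > out
public class SkipLines {
    // Copies `in` to `out`, dropping the first k lines. Works on raw bytes and
    // treats '\n' as the only line terminator, so no charset decoding is needed.
    public static void skipLines(Path in, Path out, long k) throws IOException {
        try (InputStream is = new BufferedInputStream(Files.newInputStream(in), 1 << 16);
             OutputStream os = new BufferedOutputStream(Files.newOutputStream(out), 1 << 16)) {
            long newlines = 0;
            int b;
            // Discard bytes until k newlines have gone past.
            while (newlines < k && (b = is.read()) != -1) {
                if (b == '\n') {
                    newlines++;
                }
            }
            // Every remaining byte is still read and written once.
            is.transferTo(os);
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java SkipLines <in> <out> <lines-to-drop>
        skipLines(Paths.get(args[0]), Paths.get(args[1]), Long.parseLong(args[2]));
    }
}
```

Usage would be something like: $ java SkipLines file.txt newfile.txt 2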
Both sed and perl have a -i option to edit the file in place, but using it made no difference to the time for perl and actually made sed almost 3 times slower (6m27.653s).
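As far as I know, GNU sed -i doesn't really edit the file in place: it writes the result to a temporary file in the same directory and then renames that over the original. A rough Java sketch of that approach (SkipLinesInPlace and deleteFirstLines are made-up names; Java 9+ assumed) shows why -i can't be faster than redirecting to a new file: the whole remainder is still copied, with a rename added at the end.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical class name. "In-place" deletion of the first k lines, done by
// writing a complete new copy to a temp file and renaming it over the original.
public class SkipLinesInPlace {
    public static void deleteFirstLines(Path file, long k) throws IOException {
        // Temp file beside the original, so the move is a same-filesystem rename.
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "skip", ".tmp");
        try {
            try (InputStream is = new BufferedInputStream(Files.newInputStream(file), 1 << 16);
                 OutputStream os = new BufferedOutputStream(Files.newOutputStream(tmp), 1 << 16)) {
                long newlines = 0;
                int b;
                // Skip the first k lines.
                while (newlines < k && (b = is.read()) != -1) {
                    if (b == '\n') {
                        newlines++;
                    }
                }
                // The whole remainder is copied, just as without -i.
                is.transferTo(os);
            }
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Only has an effect if the move did not happen.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java SkipLinesInPlace <file> <lines-to-drop>
        deleteFirstLines(Paths.get(args[0]), Long.parseLong(args[1]));
    }
}
```

Note that this sketch doesn't carry over the original file's permissions or ownership (on POSIX systems Files.createTempFile typically creates the file as owner-only), which a real tool would need to handle.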
I did get a slight improvement using Java:
Code:
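import java.io.IOException;
import java.io.RandomAccessFile;

public class delFromStart {
    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java delFromStart <file> <bytes-to-remove>");
            System.exit(1);
        }
        long n = Long.parseLong(args[1]); // bytes to remove from the start
        // try-with-resources closes the file even if an I/O error occurs
        try (RandomAccessFile myfile = new RandomAccessFile(args[0], "rw")) {
            long writePos = 0;           // where the next kept chunk is written
            long readPos = n;            // where the next chunk is read from
            byte[] buf = new byte[1024]; // a larger buffer may cut down on system calls
            int m;
            myfile.seek(readPos);
            // Move each chunk n bytes "up" towards the start of the file
            while ((m = myfile.read(buf)) != -1) {
                myfile.seek(writePos);
                myfile.write(buf, 0, m);
                readPos += m;
                writePos += m;
                myfile.seek(readPos);
            }
            // Cut off the leftover tail (the old copy of the last chunk of data)
            myfile.setLength(writePos);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}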
The program (delFromStart.java) takes 2 command-line parameters: the file name and the number of bytes to delete from the start of the file (which can be obtained automatically using the head command):
Code:
$ javac delFromStart.java
$ time java delFromStart file.txt $(head -n 2 file.txt | wc -c)
real 1m52.617s
user 0m3.497s
sys 0m7.402s
This edited the file in place, and I verified that it deleted only the first 2 lines.
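The byte count that head -n 2 file.txt | wc -c supplies could also be worked out in Java, which would let the program take a line count instead. Here is a small sketch (LinePrefixBytes and bytesInFirstLines are hypothetical names) that counts bytes up to and including the k-th '\n', or the whole file if it has fewer than k lines:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: the number of bytes in the first k lines of a file,
// i.e. what `head -n k file | wc -c` prints.
public class LinePrefixBytes {
    public static long bytesInFirstLines(Path file, long k) throws IOException {
        long bytes = 0;
        long newlines = 0;
        try (InputStream is = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            // Count every byte until the k-th newline (inclusive) or EOF.
            while (newlines < k && (b = is.read()) != -1) {
                bytes++;
                if (b == '\n') {
                    newlines++;
                }
            }
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // usage: java LinePrefixBytes <file> <lines>
        System.out.println(bytesInFirstLines(Paths.get(args[0]), Long.parseLong(args[1])));
    }
}
```

$ java LinePrefixBytes file.txt 2 should print the same number as head -n 2 file.txt | wc -c.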
I think deleting lines from the top of a file will always be inefficient because (unless I'm wrong) it involves copying, or at least moving, everything after the deleted lines "up" towards the start of the file. Deleting from the end just means chopping off the bottom, with nothing moved or copied. For example, modifying the Java program to delete from the end of a file made the deletion almost instantaneous (0m0.229s), and the truncate utility from the GNU coreutils package was even faster (0m0.001s). It's possible, though, that someone could come up with a fast way of deleting from the top, perhaps using a functional language.
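The modified end-deleting program isn't shown here, so the following is only a minimal sketch of the idea (DelFromEnd and deleteFromEnd are made-up names). It removes the last n bytes with RandomAccessFile.setLength, which only changes the file's length and copies no data:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical counterpart to delFromStart: removes the last n bytes of a file.
public class DelFromEnd {
    public static void deleteFromEnd(String path, long n) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "rw")) {
            // Shrinking the file just updates its length; never go below 0.
            f.setLength(Math.max(0, f.length() - n));
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java DelFromEnd <file> <bytes-to-remove>
        deleteFromEnd(args[0], Long.parseLong(args[1]));
    }
}
```

GNU truncate can do the same with a relative size, e.g. truncate -s -100 file.txt to remove the last 100 bytes.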