19th March 2012, 05:58 AM
Registered User
Join Date: Jan 2011
Posts: 20

help with awk command, I need the best and brightest
I work with a program that writes a lot of text files into a folder. Although they are plain text, each file has a sequential number as its extension: foo.98, foo.97, etc. I need a way to print, for example, the last eleven lines of each file or, even better, the five lines following a particular string. For example, if each document in the folder has a line containing "after this line is the data I need", the next five lines would be written to a file. I have been working with this command: awk 'FNR==2 {print} FNR>=140 && FNR<=170 {print}' foo.* > ~/output. It works as it should; however, the data I need to collect does not always end up in lines 140-170.
The data I need is always within the last 11 lines of the file and is always the five lines following a specific line (cut and pasted): " | delimiter offset |". In English, the script would say: from folder foofolder, scan the files called foo.* and, for each one, print line 2 followed by the five lines after the line that contains "delimiter offset". Any ideas are appreciated. Thank you.

19th March 2012, 10:57 AM
Registered User
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883

Re: help with awk command, I need the best and brightest
Please use more sensible titles for your posts in the future. A clear description such as "need help extracting text from files" would be better and would get you more help. If you mention a specific command, people who are not familiar with it might skip your post (of course, if you really do need help with that command, put it in the title; in this case other commands should do the job better).
Also, there is no need to ask for "the best and brightest". Firstly, the support on here is provided by people out of the goodness of their hearts, and it is offensive to imply some of them are not up to the task of helping you. Secondly, we are the best and brightest, so no need to ask.
As for your problem:
I think you will be better served by grep. It can find lines matching a string and print a chosen number of lines before and after each match.
Code:
-A NUM, --after-context=NUM
       Print NUM lines of trailing context after matching lines.
       Places a line containing a group separator (--) between
       contiguous groups of matches. With the -o or --only-matching
       option, this has no effect and a warning is given.
-B NUM, --before-context=NUM
       Print NUM lines of leading context before matching lines.
       Places a line containing a group separator (--) between
       contiguous groups of matches. With the -o or --only-matching
       option, this has no effect and a warning is given.
Let us look for "string_to_search_for" and then print it and the 5 lines after it.
Code:
grep 'string_to_search_for' -A 5 file_to_look_in
Then, assuming you do not need the first line and only want the last five, you could pipe it into "tail"
Code:
grep 'string_to_search_for' -A 5 file_to_look_in | tail -5
If they are very long files and you know that the data is ALWAYS in the last 11 lines of the file, you could speed it up by running tail first, so that grep reads only those lines from the pipe instead of the whole file:
Code:
tail -11 file_to_look_in | grep 'string_to_search_for' -A 5 | tail -5
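If you ever want the same behaviour inside a script, here is a minimal Python sketch of what the grep -A followed by tail combination does, assuming the search string appears only once per file. The function name lines_after and the sample lines are made up for illustration; they are not taken from your files.

```python
def lines_after(lines, needle, count):
    """Return up to `count` lines that follow the first line containing `needle`."""
    for index, line in enumerate(lines):
        if needle in line:
            return lines[index + 1:index + 1 + count]
    return []  # needle not found


# Hypothetical file content, only for illustration.
sample = [
    "header",
    "| delimiter offset |",
    "row 1",
    "row 2",
    "row 3",
]

print(lines_after(sample, "delimiter offset", 2))
```

Unlike grep, this stops at the first match, which is fine as long as the marker line occurs once per file.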
I suspect you will have more questions on the same issue. Ask away and I shall try my best to answer.

19th March 2012, 06:08 PM
Registered User
Join Date: Jan 2011
Posts: 20

Re: help with awk command, I need the best and brightest
I knew Fedora forums had the best and the brightest; that's why I posted here.
You were right about grep being the best tool for what I am trying to do. It works like a charm. However, is there a way to also print the first two or even better just the second line from the text file? The second line in these text files contains information about when the file was created and it would be great to have this printed just before the output from grep for each file. Thanks again.

19th March 2012, 06:17 PM
Registered User
Join Date: Aug 2009
Location: Waldorf, Maryland
Posts: 6,105

Re: help with awk command, I need the best and brightest
head -2 <file>
comes to mind for getting the first two lines.
head -2 <file>| tail -1
comes to mind for getting the second line only.
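For anyone doing this step from Python rather than the shell, a rough equivalent of the head and tail pipeline for the second line is sketched below. The function name second_line is my own, and io.StringIO only stands in for a real open file.

```python
import io
from itertools import islice


def second_line(stream):
    """Return the second line of an open text stream, or '' if there is none."""
    # islice(stream, 1, 2) skips the first line and yields only the second.
    return next(islice(stream, 1, 2), "")


# Hypothetical file content, only for illustration.
fake_file = io.StringIO("first line\nsecond line\nthird line\n")
print(second_line(fake_file), end="")
```

With a real file you would pass the object returned by open() instead of the StringIO.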

19th March 2012, 06:28 PM
Registered User
Join Date: Jan 2011
Posts: 20

Re: help with awk command, I need the best and brightest
I was playing around with the head command. As I have it now, the output file has the output of the head command followed by the output of the grep command, so the beginning of the file is a list of dates and times, and all the data I need to collect comes after that. What I am trying to get, for each of these files, is the second line from the file followed by the 7 lines after the string "delimiter offset". This is not as easy as I thought it would be.

19th March 2012, 07:30 PM
Registered User
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883

Re: help with awk command, I need the best and brightest
Could you provide an example (or pseudo-example) of the output file and what you want out of it?
It's just that I am struggling to understand what you need.

19th March 2012, 07:54 PM
Registered User
Join Date: Jan 2011
Posts: 20

Re: help with awk command, I need the best and brightest
I included an attachment. I have hundreds of these files, and I need to get just the line that starts with the word "Operator" at the top and the 7 lines after the string "CCMvdripper". It would be great to gather this information from all these files and write it into one long file that would contain:
Operator line
data from the seven lines after string
Operator line
data from the seven lines after string
etc etc
etc etc
Thanks so much for the help,

19th March 2012, 08:33 PM
Registered User
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883

Re: help with awk command, I need the best and brightest
I was going to hint at things, but I am in a generous mood tonight (and I was a little sharp in my first post).
Does this do what you want?
Code:
for i in sample.*.txt; do grep Operator "$i" >> tmp.file.txt; grep CCMvdripper -A 7 "$i" | tail -7 >> tmp.file.txt; done

20th March 2012, 12:59 AM
Registered User
Join Date: Jun 2004
Location: Laurel, MD USA
Posts: 5,449

Re: help with awk command, I need the best and brightest
I think this would do what you want. Something like this works better with a real language than struggling with complicated piped statements on the command line. You can modify the variables under the comment "below can be changed". Changing the triggers list correctly needs some understanding of Python regex strings, but basically each '|' has to be escaped with a \, and \s+ matches any amount of whitespace (one or more characters).
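To make the escaping concrete, here is a small sketch using the CCMvdripper pattern from the triggers list. The test lines are invented for illustration and the variable names pattern and unescaped are my own.

```python
import re

# A literal '|', some whitespace, the word CCMvdripper, some whitespace,
# and another literal '|'.
pattern = re.compile(r'\|\s+CCMvdripper\s+\|')

# Hypothetical lines, made up for illustration.
print(bool(pattern.match('|   CCMvdripper   |')))  # any amount of whitespace is fine
print(bool(pattern.match('| CCMvdripper|')))       # fails: no whitespace before the last '|'

# Without the backslashes, '|' means "or", which leaves an empty
# alternative that matches any line at all.
unescaped = re.compile(r'|\s+CCMvdripper\s+|')
print(bool(unescaped.match('anything')))
```

The raw-string prefix (r'...') is optional here but keeps Python from trying to interpret the backslashes itself.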
Run as dataget.py <directory path to data files>
e.g. dataget.py /home/username/datafiles
Code:
#!/usr/bin/env python3
#
# dataget.py <path to data>
#
import os
import re
import sys

if len(sys.argv) != 2:
    sys.exit("USAGE: {0} <path to data>".format(sys.argv[0]))

# the first arg is the path to the data dir; make it absolute so it
# is still valid after the chdir below
datadir = os.path.abspath(sys.argv[1])
if not os.path.isdir(datadir):
    sys.exit("data directory {0} does not exist".format(datadir))
os.chdir(datadir)

# below can be changed
outputfilename = 'output'
inputfileprefix = 'sample.'
linecount = 7
triggers = [r'Operator:', r'\|\s+CCMvdripper\s+\|']

# this is a list of all the regex patterns
patterns = [re.compile(reg) for reg in triggers]
inputfiles = [f for f in os.listdir(datadir) if f.startswith(inputfileprefix)]
#print("input files: ", inputfiles)

with open(os.path.join(datadir, outputfilename), 'w') as outf:
    for filename in inputfiles:
        with open(filename, 'r') as inf:
            step = 0
            for line in inf:
                #print("line is %s" % line)
                #print("testing with pattern %s step: %s" % (patterns[step], step))
                if patterns[step].match(line):
                    #print("line matched for step %s" % step)
                    if step == 0:  # just output this one line
                        outf.write(line)
                        #print("writing out line %s" % line.strip())
                        step = 1
                    else:  # output the multiple lines after trigger line
                        count = linecount
                        while count > 0:
                            # next() with a default avoids StopIteration on a short file
                            line = next(inf, '')
                            outf.write(line)
                            #print("writing out mline %s %s" % (line.strip(), count))
                            count -= 1
                        break
                else:
                    pass
                    #print("line %s did not match at step %s" % (line.strip(), step))
            else:
                # the for loop ended without break, so a trigger was never found
                print("file {0} did not have both delimiters".format(filename))
P.S. Note that regex match() is used instead of search(). match() is faster in this case since it only tries to match at the start of the string and gives up otherwise, whereas search() scans all the way to the end. But that means the script won't work if the trigger string is not at the start of the line. In that case, change "match()" to "search()".
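A quick way to see the difference between the two calls, using the Operator: trigger on two invented lines (the operator name and the leading spaces are assumptions for illustration, not taken from the attachment):

```python
import re

pattern = re.compile(r'Operator:')

at_start = 'Operator: jsmith'     # hypothetical line, trigger at the start
indented = '   Operator: jsmith'  # same trigger after leading spaces

print(bool(pattern.match(at_start)))   # True
print(bool(pattern.match(indented)))   # False: match() only tries position 0
print(bool(pattern.search(indented)))  # True: search() scans the whole line
```

So if your trigger lines turn out to be indented, search() is the one you want.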
Last edited by marko; 20th March 2012 at 02:26 PM.
Reason: badly placed commented out debug statement

20th March 2012, 03:27 PM
Registered User
Join Date: Jan 2011
Posts: 20

Re: help with awk command, I need the best and brightest
Man, that is a work of art, and it works like a charm. Thank you. I am trying to learn scripting when I have the time (which is few and far between). You saved the day. Thank you all for the great help. I learned a lot from these replies.