Fedora Linux Support Community & Resources Center

  #1  
Old 19th March 2012, 05:58 AM
chockfullonuts Offline
Registered User
 
Join Date: Jan 2011
Posts: 20
help with awk command, I need the best and brightest

I deal with a program that spits out a lot of .txt files into a folder. Each of these files, while plain text, has a sequential numeric extension: foo.98, foo.97, etc. I need a way to print, for example, the last eleven lines or, even better, the five lines following a particular string. For example, if each of the documents in this folder has a line with the characters "after this line is the data I need", the next five lines would be printed to a file. I have been working with this command: awk 'FNR==2 {print} FNR>=140 && FNR<=170 {print}' foo.* > ~/output. It works as it should; however, the data I need to collect does not always wind up in lines 140-170.
The data I need does always wind up in the last 11 lines of the file, and it is always the five lines following a specific line (cut and pasted): " | delimiter offset |". In English the script would say: from folder foofolder, scan the files called foo.* and print line 2 followed by the five lines after the line that has "delimiter offset" in it. Any ideas are appreciated. Thank you
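
For reference, the behaviour asked for here (line 2 of a file, then the five lines after the marker line) can be sketched in Python. This is only a sketch under assumptions: the names `second_line_and_following` and `collect` are made up for illustration, the marker is matched as a plain substring, and only the first marker line in each file is used.

```python
import glob


def second_line_and_following(lines, marker="delimiter offset", count=5):
    """Return line 2 plus the `count` lines after the first line containing `marker`."""
    picked = []
    remaining = 0        # lines after the marker that are still wanted
    seen_marker = False
    for number, line in enumerate(lines, start=1):
        if number == 2 or remaining > 0:
            picked.append(line)
        if remaining > 0:
            remaining -= 1
        elif not seen_marker and marker in line:
            seen_marker = True
            remaining = count
    return picked


def collect(pattern, output_path):
    """Hypothetical driver: apply the extraction to every file matching `pattern`."""
    with open(output_path, "w") as out:
        for name in sorted(glob.glob(pattern)):
            with open(name) as handle:
                out.writelines(second_line_and_following(handle))
```

Calling `collect("foo.*", "output")` from inside the data folder would write one block per input file, in sorted filename order.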
  #2  
Old 19th March 2012, 10:57 AM
Adunaic Offline
Registered User
 
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883

Please use more sensible titles for your posts in the future. A clear description such as "need help extracting text from files" would be better and would get you more help. If you mention a specific command, people might avoid your post because they are not familiar with that command (of course, if you really do need help with that command, put it in the title; in this case other tools should do the job better).

Also, there is no need to ask for "the best and brightest". Firstly, the support on here is provided by people out of the goodness of their hearts, and it is offensive to imply some of them are not up to the task of helping you. Secondly, we are the best and brightest, so no need to ask.

As for your problem:

I think you will be better served by grep. It can find lines matching a string, and so many lines above and below.

Code:
       -A NUM, --after-context=NUM
              Print NUM  lines  of  trailing  context  after  matching  lines.
              Places   a  line  containing  a  group  separator  (--)  between
              contiguous groups of matches.  With the  -o  or  --only-matching
              option, this has no effect and a warning is given.

       -B NUM, --before-context=NUM
              Print  NUM  lines  of  leading  context  before  matching lines.
              Places  a  line  containing  a  group  separator  (--)   between
              contiguous  groups  of  matches.  With the -o or --only-matching
              option, this has no effect and a warning is given.
Let us look for "string_to_search_for" and then print it and the 5 lines after it.
Code:
grep 'string_to_search_for' -A 5 file_to_look_in
Then, assuming you do not need the first line and only want the last five, you could pipe it into "tail"

Code:
grep 'string_to_search_for' -A 5 file_to_look_in | tail -5
If they are very long files and you know that they are ALWAYS in the last 11 lines of the file, you could speed it up by doing:

Code:
tail -11 file_to_look_in | grep 'string_to_search_for' -A 5 | tail -5
I suspect you will have more questions on the same issue. Ask away and I shall try my best to answer.
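
The tail-then-grep idea above can also be sketched in Python. This is a rough equivalent, not a drop-in replacement: the function name `tail_then_context` is made up for illustration, the search string is treated as a fixed substring rather than a grep pattern, and only the first match is used. Unlike `tail`, the deque still reads the whole input; it just keeps no more than the last 11 lines in memory.

```python
from collections import deque


def tail_then_context(lines, needle, tail_size=11, after=5):
    """Keep the last `tail_size` lines, then return the `after` lines following the first match."""
    last_lines = list(deque(lines, maxlen=tail_size))    # roughly `tail -11`
    for index, line in enumerate(last_lines):
        if needle in line:                                # fixed-string match
            return last_lines[index + 1:index + 1 + after]
    return []                                             # needle not in the tail
```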
  #3  
Old 19th March 2012, 06:08 PM
chockfullonuts Offline
Registered User
 
Join Date: Jan 2011
Posts: 20

I knew Fedora forums has the best and the brightest. That's why I posted here.
You were right about grep being the best tool for what I am trying to do. It works like a charm. However, is there a way to also print the first two or even better just the second line from the text file? The second line in these text files contains information about when the file was created and it would be great to have this printed just before the output from grep for each file. Thanks again.
  #4  
Old 19th March 2012, 06:17 PM
jpollard Offline
Registered User
 
Join Date: Aug 2009
Location: Waldorf, Maryland
Posts: 6,105

head -2 <file>

comes to mind for getting the first two lines.

head -2 <file>| tail -1

comes to mind for getting the second line only.
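
For comparison, the same two selections can be sketched in Python with `itertools.islice`. The helper names `first_lines` and `second_line` are made up for illustration; both accept any iterable of lines, including an open file.

```python
from itertools import islice


def first_lines(lines, count=2):
    """Like `head -2`: return the first `count` lines."""
    return list(islice(lines, count))


def second_line(lines):
    """Like `head -2 | tail -1`: return line 2, or None if the input has fewer than two lines."""
    return next(islice(lines, 1, 2), None)
```

One difference from the shell pipeline: on a one-line file, `head -2 <file> | tail -1` prints that single line, while this sketch returns None.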
  #5  
Old 19th March 2012, 06:28 PM
chockfullonuts Offline
Registered User
 
Join Date: Jan 2011
Posts: 20

I was playing around with the head command. As I have it now, the output file has the output of the head command, then the output of the grep command. So the beginning of the file is a list of dates and times, and after that comes all the data I need to collect. I am trying to get the output to be "the second line from a file followed by the 7 lines following the string "delimiter offset"" from each of these files. This is not as easy as I thought it would be.
  #6  
Old 19th March 2012, 07:30 PM
Adunaic Offline
Registered User
 
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883

Could you provide an example (or pseudo-example) of the output file and what you want out of it?
It's just that I am struggling to understand what you need.
  #7  
Old 19th March 2012, 07:54 PM
chockfullonuts Offline
Registered User
 
Join Date: Jan 2011
Posts: 20

I included an attachment. I have hundreds of these files and I need to get just the 7 lines after the string "CCMvdripper" and the line that starts with the word "Operator" at the top. It would be great to be able to gather this information from all these files and spit it out into one long file that would contain

Operator line
data from the seven lines after string
Operator line
data from the seven lines after string
etc etc
etc etc

Thanks so much for the help,
Attached Files
File Type: txt sample.12.txt (1.4 KB, 21 views)
  #8  
Old 19th March 2012, 08:33 PM
Adunaic Offline
Registered User
 
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883

I was going to hint at things, but I am in a generous mood tonight (and I was a little sharp in my first post).

Does this do what you want?

Code:
for i in sample.*.txt; do grep Operator "$i" >> tmp.file.txt; grep CCMvdripper -A 7 "$i" | tail -7 >> tmp.file.txt; done
  #9  
Old 20th March 2012, 12:59 AM
marko Offline
Registered User
 
Join Date: Jun 2004
Location: Laurel, MD USA
Posts: 5,449

I think this would do what you want. Something like this works better with a real language than
struggling with complicated piped command-line statements. You can modify the variables
under the comment "below can be changed". Changing the triggers list correctly needs some understanding
of Python regex strings, but basically each '|' has to be escaped with a \ and \s+ means
match one or more whitespace characters.

Run as dataget.py <directory path to data files>
e.g. dataget.py /home/username/datafiles

Code:
#!/usr/bin/python
#
# dataget.py   <path to data>
#

import os
import sys
import re

if len(sys.argv) != 2:
    sys.exit("USAGE: {0} <path to data>".format(sys.argv[0]))

datadir = os.path.abspath(sys.argv[1]) # the first arg is the path to the data dir (made absolute so it stays valid after the chdir below)
if not os.path.isdir(datadir):
    sys.exit("data directory {0} does not exist".format(datadir))

os.chdir(datadir)

# below can be changed
outputfilename = 'output'
inputfileprefix = 'sample.'
linecount = 7
triggers = [ r'Operator:', r'\|\s+CCMvdripper\s+\|' ]
# this is a list of all the regex patterns
patterns = [ re.compile(reg) for reg in triggers ]

inputfiles = [ f for f in os.listdir(datadir) if f.startswith(inputfileprefix) ]
#print "input files: ", inputfiles

with open(os.path.join(datadir, outputfilename), 'w') as outf:
    for filename in inputfiles:
        with open(filename, 'r') as inf:
            step = 0
            for line in inf:
                #print "line is %s" % line
                #print "testing with pattern %s step: %s" % (patterns[step], step)
                if patterns[step].match(line):
                    #print "line matched for step %s" % (step)
                    if step == 0: # just output this one line
                        outf.write(line)
                        #print "writing out line %s" % line.strip()
                        step = 1
                    else: # output the multiple lines after trigger line
                        count = linecount
                        while(count > 0):
                            line = next(inf)  # built-in next() works on Python 2.6+ and 3
                            outf.write(line)
                            #print "writing out mline %s %s" % (line.strip(), count)
                            count -= 1
                        break
                else:
                    pass
                    #print "line %s did not match at step %s" % (line.strip(),step)
            else:
                print("file {0} did not have both delimiters".format(filename))
P.S. Note that regex match() is used instead of search(). match() only tries the pattern at the start
of the string and gives up if it does not match there, whereas search() scans through the whole string,
so match() is faster here. But that means the script won't work if the trigger string is not at the start
of the line. In that case change "match()" to "search()".
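
The difference between the two calls can be seen in a small sketch using the CCMvdripper trigger pattern from the script. The two sample lines are invented for illustration (the real attachment is not shown in the thread), and the helper names are hypothetical.

```python
import re

# Same trigger pattern as in the script: a literal '|', whitespace,
# the word CCMvdripper, whitespace, and another literal '|'.
trigger = re.compile(r'\|\s+CCMvdripper\s+\|')

# Invented sample lines, only for demonstration.
at_start = "|  CCMvdripper  | 12.5"
indented = "   |  CCMvdripper  | 12.5"


def matches_at_start(line):
    """True only if the pattern matches beginning at the first character."""
    return trigger.match(line) is not None


def matches_anywhere(line):
    """True if the pattern matches anywhere in the line."""
    return trigger.search(line) is not None
```

With these samples, `matches_at_start(indented)` is False because of the leading spaces, while `matches_anywhere(indented)` is True; both are True for `at_start`.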

Last edited by marko; 20th March 2012 at 02:26 PM. Reason: badly placed commented out debug statement
  #10  
Old 20th March 2012, 03:27 PM
chockfullonuts Offline
Registered User
 
Join Date: Jan 2011
Posts: 20

Man, that is a work of art, and it works like a charm. Thank you. I am trying to learn scripting when I have the time (which is few and far between). You saved the day. Thank you all for the great help. I learned a lot from these replies.