Fedora Linux Support Community & Resources Center
  #1  
Old 19th September 2012, 12:03 AM
rudra-b Offline
Registered User
 
Join Date: Jun 2008
Posts: 173
get the full string when a word matches

I have files that contain lines like:
Code:
(2010) 4287-4293. doi:10.1016/j.physb.2010.07.028
etc.
where the doi:* part is what matters to me.
Is there any way, in C or a shell script, to get the string (even from the middle of a line) that begins with doi: and is delimited only by a space or a newline?
So that from the line above I would get
Quote:
10.1016/j.physb.2010.07.028
As far as I know I cannot use awk, because the column in which doi: appears is not fixed.
  #2  
Old 19th September 2012, 01:58 AM
marko Offline
Registered User
 
Join Date: Jun 2004
Location: Laurel, MD USA
Posts: 5,449
Re: get the full string when a word matches

I don't know how to get a C-shell script to do that, because I don't know of a way to get regex support into one, but I hacked up an example Perl script that works for me (below). Assuming the input file is "infile.fil", it reads the whole file into an array and then processes the array line by line.

Code:
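#!/usr/bin/perl
use strict;
use warnings;

# Read the whole input file into an array, one element per line.
open(my $fh, '<', 'infile.fil') or die "Cannot open infile.fil: $!";
my @slurp = <$fh>;
close($fh);

foreach my $line (@slurp) {
    # Capture everything after "doi:" up to the end of the line.
    if ($line =~ m/\s+doi:(.*)$/) {
        print "$1\n";
    }
}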
If the input file is really huge, you'd want to read it line by line instead of slurping the whole file into memory.

The \s+ means that one or more whitespace characters must appear in front of the "doi:"; any amount matches, but at least one is required. \s* would allow zero or more whitespace characters.
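For the line-by-line approach mentioned above, a minimal shell sketch might look like the following. It is not a translation of the Perl script: the function name extract_doi_stream is made up for illustration, it does not require whitespace before "doi:", and it stops the value at the first whitespace character rather than at the end of the line.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): read stdin one line at a time
# and print the token that follows the first "doi:" on each matching line.
extract_doi_stream() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *doi:*) ;;          # line contains "doi:", keep going
            *) continue ;;      # no "doi:", skip this line
        esac
        rest=${line#*doi:}            # drop everything through the first "doi:"
        rest=${rest%%[[:space:]]*}    # cut at the first whitespace character
        if [ -n "$rest" ]; then
            printf '%s\n' "$rest"
        fi
    done
    return 0
}

printf '%s\n' 'some stuff' \
    '(2010) 4287-4293. doi:10.1016/j.physb.2010.07.028 extra' | extract_doi_stream
# prints: 10.1016/j.physb.2010.07.028
```

Only one line is held in memory at a time, so the size of the input file does not matter.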

Last edited by marko; 19th September 2012 at 02:07 AM.
  #3  
Old 19th September 2012, 02:01 AM
jpollard Online
Registered User
 
Join Date: Aug 2009
Location: Waldorf, Maryland
Posts: 6,105
Re: get the full string when a word matches

I would use perl

Code:
$ perl -ne 'if (/doi:(.*)\s/){ print $1,"\n";}' <input_file >outputfile
It's shorter. Basic explanation: it only prints if the pattern is found. The pattern /doi:(.*)\s/ is enclosed in the "/" pattern delimiters. It matches doi: followed by any number of characters (the .*) and ends with a whitespace character (\s), which includes the newline at the end of the line. Because .* is greedy, the capture runs up to the last whitespace character on the line, which is normally that newline. The "(.*)" marks the part of the match to extract; each additional () would mark another substring, numbered starting from 1. So print $1 prints the first captured substring.

This does the same as the previous poster's script, but uses the perl options -ne: -e supplies the code on the command line, and -n wraps it in a loop over the input lines without echoing them. With -p instead of -n, every input line would be printed as well, so each extracted match would be followed by the line it came from.
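The capture-group idea also carries over to plain shell. As a rough sketch, the standard expr utility prints the first \( \) group of a basic regular expression; the helper name doi_from_line is an assumption, and unlike the Perl one-liner it stops the capture at the first whitespace character.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print capture group 1 for one line.
doi_from_line() {
    # expr anchors the pattern at the start of the string, hence the leading ".*".
    # The "X" prefix keeps expr from mistaking the text for one of its operators.
    # \([^[:space:]]*\) is group 1: the non-whitespace run right after "doi:".
    expr "X$1" : 'X.*doi:\([^[:space:]]*\)'
}

doi_from_line '(2010) 4287-4293. doi:10.1016/j.physb.2010.07.028 and more'
# prints: 10.1016/j.physb.2010.07.028
```

When nothing matches, expr prints an empty line and returns a non-zero status, so a caller can test for that.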

Last edited by jpollard; 19th September 2012 at 02:08 AM.
  #4  
Old 19th September 2012, 05:37 AM
RupertPupkin Offline
Registered User
 
Join Date: Nov 2006
Location: Detroit
Posts: 4,619
Re: get the full string when a word matches

Quote:
Originally Posted by rudra-b View Post
I cannot do awk(in my knowledge), as in which column of the line doi: will appear is uncertain.
You could use awk's gsub function. For example, suppose you have this input file (infile.txt):
Code:
some stuff
(2010) 4287-4293. doi:10.1016/j.physb.2010.07.028
(2010) 4287-4293. doi:10.1016/j.physb.2010.07.029
some other stuff
Hey doi:!
(2010) 4287-4293. doi:10.1016/j.physb.2010.07.030
some more stuff
Then you could do this:
Code:
$ awk '$0 ~ /doi:/ {gsub(/^.*doi:/, ""); print}' infile.txt
10.1016/j.physb.2010.07.028
10.1016/j.physb.2010.07.029
!
10.1016/j.physb.2010.07.030
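Since the concern was that the column holding doi: is not fixed, another option is to let awk walk over every field. The following is only a sketch: the wrapper name doi_fields is made up, and it assumes the DOI is a whitespace-separated field that begins with "doi:", so something like "(doi:..." would not be picked up.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): check every whitespace-separated
# field and print what follows a leading "doi:".
doi_fields() {
    awk '{
        for (i = 1; i <= NF; i++)
            # index() == 1 means the field starts with "doi:";
            # length > 4 skips a bare "doi:" with nothing after it.
            if (index($i, "doi:") == 1 && length($i) > 4)
                print substr($i, 5)
    }' "$@"
}

printf '%s\n' 'Hey doi:!' \
    '(2010) 4287-4293. doi:10.1016/j.physb.2010.07.028 trailing' | doi_fields
# prints:
# !
# 10.1016/j.physb.2010.07.028
```

Because awk splits on whitespace, anything after the DOI on the same line is dropped automatically.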
  #5  
Old 19th September 2012, 06:53 AM
stevea Offline
Registered User
 
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,302
Re: get the full string when a word matches

grep -o "doi:.*" /tmp/foo | cut -b5-
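If the DOI may be followed by more text on the same line, the same pipeline can be bounded at the first whitespace character. This is a sketch under that assumption; the function name doi_grep is invented here, and -o and -h are grep extensions (available in GNU and BSD grep) rather than POSIX options.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): grep -o with a bounded match.
doi_grep() {
    # -o prints only the matched text, -h suppresses file name prefixes,
    # [^[:space:]]+ requires at least one non-whitespace character after "doi:".
    grep -ohE 'doi:[^[:space:]]+' "$@" | cut -c5-
}

printf '%s\n' '(2010) 4287-4293. doi:10.1016/j.physb.2010.07.028 trailing text' | doi_grep
# prints: 10.1016/j.physb.2010.07.028
```

With file arguments it reads those files; with none it reads standard input.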
  #6  
Old 19th September 2012, 08:39 AM
ocratato Offline
Registered User
 
Join Date: Oct 2010
Location: Canberra
Posts: 551
Re: get the full string when a word matches

Most of the answers seem to have missed "and delimited only by space or newline?", which leads me to think that there might sometimes be more stuff at the end of the line.

Code:
sed -n 's/\(^.*\)doi:\([^ ]*\)\(.*$\)/\2/p' < infile > outfile
should do the job.
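One possible variation, in case the text after the DOI is separated by a tab rather than a space, is to use the [[:space:]] character class. This is a sketch, with doi_sed as a made-up wrapper name. As with the command above, the greedy leading .* means the last doi: on a line wins.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): sed with a single capture group.
doi_sed() {
    # \([^[:space:]]*\) captures up to the first space, tab, or end of line.
    sed -n 's/.*doi:\([^[:space:]]*\).*/\1/p' "$@"
}

# The sample line uses a tab after the DOI.
printf 'ref doi:10.1016/j.physb.2010.07.028\tnext column\n' | doi_sed
# prints: 10.1016/j.physb.2010.07.028
```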
  #7  
Old 19th September 2012, 09:29 PM
rudra-b Offline
Registered User
 
Join Date: Jun 2008
Posts: 173
Re: get the full string when a word matches

Thanks to all of you. ocratato's solution worked.