Fedora Linux Support Community & Resources Center
  #1  
Old 18th April 2012, 07:00 PM
mkass Offline
Registered User
 
Join Date: Apr 2012
Location: Denver, CO
Posts: 2
New Cluster

Hey all

I'm trying to do a simple operation on a new cluster we have, but this is my first attempt and I'm having trouble. I have five nodes, each with 12 threads. Right now, I'm the only one using this cluster.

I have a serial program, and I basically want to run multiple instances of it across all the available threads. I need to run it on 1000 datasets, each of which is independent. I wrote a serial bash script which simply runs the program 1000 times, iterating through the datasets. I would like to run 60 concurrent instances (or however many threads are available), with each one moving on to the next available dataset as it finishes. Since there are only 5 nodes, I have no problem manually splitting the data into five chunks and running one chunk on each node, but if it's simple to automate from the master node, that would be preferable.
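For the single-node part of this (keep N instances busy, each picking up the next dataset as soon as it finishes), one possible sketch uses the `-P` option of GNU/BSD `xargs`. The function name `run_pool` and the use of `echo` as a stand-in for the real serial program are illustrative assumptions, not part of the original setup.

```shell
#!/bin/bash
# run_pool NWORKERS FIRST LAST CMD [ARGS...]
# Runs CMD once for every index FIRST..LAST (the index is appended as the
# last argument), keeping up to NWORKERS instances running at a time.
# xargs launches the next instance as soon as one of the slots frees up.
run_pool() {
    nworkers=$1; first=$2; last=$3
    shift 3
    seq "$first" "$last" | xargs -P "$nworkers" -n 1 "$@"
}

# Stand-in for the real program: just print the dataset index.
# Lines may appear in any order because the instances run concurrently.
run_pool 12 1 20 echo dataset
```

This only fills the threads of the node it runs on; spreading the work over all five nodes still needs the scheduler (or one such pool per node, each given its own index range).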

My serial test script runs just fine using qsub, but it only uses a single thread.

I apologize for such a pedestrian question, but I don't even have a concept of the architecture required. Thanks so much.

---------- Post added at 08:20 AM ---------- Previous post was at 08:06 AM ----------

More information:

Here are my serial test scripts

Simple test program
Code:
#include <iostream>
#include <ctime>
#include <unistd.h>  // sleep()
using namespace std;

int main() {

    time_t curr=time(0);
    cout << endl;
    cout << "  Start time: " << ctime(&curr) << endl;
    sleep(5); //just to make sure they're not all running at once
    time_t curre=time(0);
    cout << "  End time: " << ctime(&curre) << endl;
    return 0;
}
and the script:

Code:
#!/bin/bash
COUNTER=1
while [ "$COUNTER" -lt 500 ]; do
    echo
    echo "  "$COUNTER 
    $HOME/Jobs/qsub_test/pauseexample
    let COUNTER=COUNTER+1
done


---------- Post added at 11:00 AM ---------- Previous post was at 08:20 AM ----------

Two partial solutions from these sites:

https://wikis.nyu.edu/display/NYUHPC/PBSDSH
using pbsdsh and

http://www.nas.nasa.gov/hecc/support...ltime_184.html
using mpiexec.
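
With pbsdsh, every launched copy of a script sees its own task number in `PBS_VNODENUM` (starting at 0). A minimal sketch of how a task could use that number to pick its share of the datasets, dealing them out round-robin, is shown here. The function name `indices_for_task` and the hard-coded task and dataset counts are assumptions for illustration only.

```shell
#!/bin/bash
# indices_for_task TASK NTASKS TOTAL
# Prints the 1-based dataset indices handled by task TASK (0-based, like
# PBS_VNODENUM) when TOTAL datasets are dealt out round-robin to NTASKS tasks.
indices_for_task() {
    task=$1; ntasks=$2; total=$3
    i=$((task + 1))
    while [ "$i" -le "$total" ]; do
        echo "$i"
        i=$((i + ntasks))
    done
}

# Assumed values: 60 tasks (5 nodes x 12 threads), shown here with only
# 200 datasets to keep the output short.
# Outside a pbsdsh launch PBS_VNODENUM is unset, so fall back to task 0.
indices_for_task "${PBS_VNODENUM:-0}" 60 200
```

Each task would then loop over the indices it gets and run the serial program on each one. This is a static split, so a task that draws slow datasets is not helped by tasks that finish early.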
  #2  
Old 23rd April 2012, 05:49 PM
mkass Offline
Registered User
 
Join Date: Apr 2012
Location: Denver, CO
Posts: 2
Re: New Cluster

OK, I have a set of scripts that work for me now. They're clunky and it's certainly not optimal, but they work.

Using qsub, I run a master script:
Code:
#!/bin/sh
#PBS -N pbsdsh
#PBS -l nodes=5:ppn=12

#cd $PBS_O_WORKDIR
cd /data/home/mkass/Jobs/aniak
octave parsejobs.m

pbsdsh -v sh $PBS_O_WORKDIR/thrrun.sh
I have an Octave script, parsejobs.m, that creates a script for each thread on each node (I think this is how Skynet starts).
It's a little extra complicated because my 'line numbers' are not sequential, and (for reasons evident in the next script) I need sequential, numeric values for each directory.
It's in this script that the jobs are divided up and assigned to each processor.
Code:
fid = fopen('linelist.txt','rt');
nNodes = 5;
nThr = 12;

ii = 0;
while ~feof(fid)
	temp = fscanf(fid,'%s',1);
	ii=ii+1;
end
fclose(fid);
nSounds = ii;

fid = fopen('linelist.txt','rt');
lines = cell(nSounds,1);
for ii = 1:nSounds
	lines{ii}=fscanf(fid,'%s',1);
end
fclose(fid);

b = floor(nSounds/nNodes);
ro = mod(nSounds,nNodes);

opspnode = zeros(nNodes,1);
opspnode(1:nNodes) = b;
for ii = 1:ro
	opspnode(ii) = opspnode(ii)+1;
end

opspthread = zeros(nNodes,nThr);

for ii = 1:nNodes
	b = floor(opspnode(ii)/nThr);
	ro = mod(opspnode(ii),nThr);
	opspthread(ii,:) = b;
	for jj = 1:ro
		opspthread(ii,jj) = opspthread(ii,jj) + 1;
	end
end
%disp(opspthread');

save 'parsejobs.mat' opspthread;
save 'lines.mat' lines;

opsthrcat = reshape(opspthread,nNodes*nThr,1);
%% Make 60 scripts
for ii = 1:(nNodes*nThr)
%	fid = fopen(strcat('thrrun.',num2str(ii),'.temp.m'),'wt');
	fid = fopen(strcat('thrrun.',num2str(ii-1),'.temp.sh'),'wt');
	%each script handles datasets
	%sum(opsthrcat(1:ii))-opsthrcat(ii)+1 : sum(opsthrcat(1:ii))
	fprintf(fid,'%s\n','#!/bin/bash');
	%fprintf(fid,'%s\n','cd $PBS_O_WORKDIR');
	fprintf(fid,'%s\n','cd /data/home/mkass/Jobs/aniak');
	t = strcat('COUNTER=',num2str(sum(opsthrcat(1:ii))-(opsthrcat(ii))+1));
	fprintf(fid,'%s\n',t);
	%t = strcat('while [ $COUNTER -lt ',num2str(sum(opsthrcat(1:ii))+1));
	t = ['while [ $COUNTER -lt ',num2str(sum(opsthrcat(1:ii))+1)];
	t2 = strcat(t,' ]; do');
	fprintf(fid,'%s\n',t2);
	fprintf(fid,'%s\n','mkdir $COUNTER');
	fprintf(fid,'%s\n','cp data/$COUNTER.txt $COUNTER/');
	fprintf(fid,'%s\n','cp makeEM1DFMobs.m $COUNTER/');
	fprintf(fid,'%s\n','cp writeEM1DFMobsV3.m $COUNTER/');
	fprintf(fid,'%s\n','cp em1dfmgen.m $COUNTER/');
	fprintf(fid,'%s\n','cp start.mod $COUNTER/');
	fprintf(fid,'%s\n','cp ref.mod $COUNTER/');
	fprintf(fid,'%s\n','cp em1dfm/* $COUNTER/');
	fprintf(fid,'%s\n','cd $COUNTER/');
	fprintf(fid,'%s\n','octave makeEM1DFMobs.m');
	fprintf(fid,'%s\n','octave em1dfmgen.m');
	fprintf(fid,'%s\n','wine em1dfm.exe');
	fprintf(fid,'%s\n','rm *.exe');
	fprintf(fid,'%s\n','cd ../');
	fprintf(fid,'%s\n','let COUNTER=COUNTER+1');
	fprintf(fid,'%s\n','done');
%	fclose(fid);
	fclose(fid);
end
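
The start and end values written into each generated script come from splitting the datasets into contiguous blocks of nearly equal size. As a rough sketch of that arithmetic, here is a single-level version in bash (the Octave script above splits per node first and then per thread, so its exact boundaries can differ). The function name `block_range` is hypothetical.

```shell
#!/bin/bash
# block_range TASK NTASKS TOTAL
# Prints "START END" (1-based, inclusive) for 0-based task TASK when TOTAL
# items are split into NTASKS contiguous blocks whose sizes differ by at
# most one. The first (TOTAL mod NTASKS) tasks get the one extra item.
block_range() {
    task=$1; ntasks=$2; total=$3
    base=$((total / ntasks))
    rem=$((total % ntasks))
    if [ "$task" -lt "$rem" ]; then
        count=$((base + 1))
        start=$((task * (base + 1) + 1))
    else
        count=$base
        start=$((rem * (base + 1) + (task - rem) * base + 1))
    fi
    echo "$start $((start + count - 1))"
}

# Assumed sizes: 1000 datasets over 60 tasks (5 nodes x 12 threads).
block_range 0 60 1000    # prints "1 17"
block_range 59 60 1000   # prints "985 1000"
```

Computing the range directly from `PBS_VNODENUM` in this way would be one alternative to writing out 60 separate scripts, at the cost of moving the bookkeeping into the wrapper script.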
And finally, this script executes each script written out above.
Code:
#!/bin/sh
cd /data/home/mkass/Jobs/aniak
sh thrrun.$PBS_VNODENUM.temp.sh
So there you have it. An operation that would have taken 13 days was done in 5 hours. There's plenty of room for improvement, but this works for me. Cheers!

Last edited by mkass; 23rd April 2012 at 05:50 PM. Reason: Typo in the last script

Tags
cluster, qsub
