PopcornKing
10th March 2010, 03:07 PM
So I was executing some bash scripts on a directory of data (~600 files) last night.
My processor usage was limited to 1 core and I couldn't really see it on a graph.
I have a dual quad-core machine and was thinking what a shame it is not to use the other cores.
So I threw together a little bash script to try to run my jobs in parallel.
My total processor usage now looks like a sawtooth function.
To make it a rectangular/uniform function, I think I need a more sophisticated wait/spawn subshell scheme.
I currently wait for all subshells to finish, but ideally once one is done I should start another.
What's the best way to implement that?
Thoughts? Comments in general are welcome.
# file_list is an array of filenames that has already been created
num_files=${#file_list[@]}
ncpu=$(grep -c ^processor /proc/cpuinfo)
# remainder when dividing the file_list size by the number of cpus/cores
n_start_files=$(( num_files % ncpu ))
# process the remainder in parallel
echo "Processing initial num of files"
for (( i=0; i < n_start_files; i++ ))
do
    # spawn all in subshells
    ( call_my_script "${file_list[i]}" ) &
done
# wait for the subshells/scripts to finish
wait
echo "Done processing initial num of files"
i=${n_start_files}
# number of full parallel loops required to run your data
num_loops=$(( num_files / ncpu ))
echo "Begin parallel processing of files"
for (( j=0; j < num_loops; j++ ))
do
    # this spawns $ncpu subshells executing the script
    for (( k=0; k < ncpu; k++ ))
    do
        ( call_my_script "${file_list[i + j*ncpu + k]}" ) &
    done
    # wait for subshells/scripts to finish
    wait
    echo "finished ${j}th parallel iteration"
done
echo "Done parallel processing of files"
echo "Leaving ${0}"
exit 0
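For what it's worth, the "start another as soon as one finishes" scheme could look roughly like the sketch below. This is only a minimal sketch under a few assumptions: it needs bash 4.3 or newer for `wait -n`, and the `call_my_script` function, the `file_list` contents, the `results` temp file, and the fallback of 2 cores are all hypothetical stand-ins so the snippet runs on its own.

```bash
#!/usr/bin/env bash
# Sketch: keep at most $ncpu jobs running and start a new one as soon as
# any running job finishes. Assumes bash >= 4.3 (for `wait -n`).

# Hypothetical stand-in for the real per-file script.
call_my_script() {
    sleep 0.1                 # pretend to do some work
    echo "processed $1"
}

# Hypothetical file list; in the real script this array already exists.
file_list=(a.dat b.dat c.dat d.dat e.dat f.dat)

# Count cores; fall back to 2 (an arbitrary assumption) if that fails.
ncpu=$(grep -c ^processor /proc/cpuinfo 2>/dev/null)
[ "${ncpu:-0}" -gt 0 ] || ncpu=2

results=$(mktemp)             # illustration only: collects one line per file
running=0
for f in "${file_list[@]}"; do
    # All slots busy: block until any one background job exits.
    if (( running >= ncpu )); then
        wait -n
        running=$(( running - 1 ))
    fi
    call_my_script "$f" >> "$results" &
    running=$(( running + 1 ))
done
# Wait for whatever is still running at the end.
wait
```

Because a new job is launched whenever a slot frees up, there is no need to handle the remainder separately, and the cores should stay busy until the list runs out. If `wait -n` is not available, `xargs -P` from GNU findutils is another common way to cap the number of parallel processes.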