So I've got this file server / torrent box running RAID 5 with 5 disks at the moment, but I can add more disks to it when it gets full, and I have been doing so for the last year. Lately we've been noticing terrible performance from it: browsing a folder over Samba can take about 30 seconds to load.

Looking into this, I found that the files the torrents download slowly are getting terribly fragmented. From filefrag:
old file: 629 extents found, perfection would be 29 extents

recently downloaded file: 56289 extents found, perfection would be 12 extents

So that's roughly 4,700 times worse than it should be (56289 / 12).
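
Those counts are just what filefrag prints per file; for example (the path here is made up):

filefrag /share/torrents/recent-download.iso
/share/torrents/recent-download.iso: 56289 extents found, perfection would be 12 extents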

This issue seems to have been somewhat neglected; people argue that as long as you have some free space, ext3 won't get fragmented, so there's no proper defragmenter for ext3. There was one for ext2, and since ext3 is backward compatible I could use that, but I'd have to take the file server down for a while, and I like using it. There are some other small tools ("Shake" and "defrag"), but I'm going to make my own.

I'm going to do some shitty defragging: copy each fragmented file to another place on the disk, delete the original, and move the copy into its place. The new file should be less fragmented, because the filesystem will try to allocate its space in one continuous block. There may be a good reason why I shouldn't do this, but I'm going to forge ahead and see what happens.
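
The core operation on one file is just this (a minimal sketch; the file names are made up):

# copy, verify, then swap the copy into place
cp big.bin big.bin.new
cmp big.bin big.bin.new && mv big.bin.new big.bin
filefrag big.bin # count the extents the relocated file landed in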

Here's a simple test on one file: copy it, then look at its fragmentation.
84025 to 434 : 1.5 GB file, perfection would be 13.

So it's not making things perfect, which is disappointing; I may end up doing multiple sweeps with this. But that's still about 193 times better than it was.

Second sweep:
434 to 223 : 1.5 GB file, perfection would be 13.

The second sweep only gives about a 2x improvement.

Here's my script to do the defragmenting. It's pretty over the top with the md5sum checks; I've never seen the copy corrupt a file, but I'd rather be safe than sorry. It sorts the files by most fragmented so it works on the worst files first, and it tracks the improvement on each file so it can stop if copying starts making things worse.

#!/bin/bash
# a script to fight file fragmentation: copy each file,
# verify the copy, then move it back over the original.

DIR=$1
if [[ -z "$DIR" || ! -d "$DIR" ]]
then
echo "usage: $0 <directory>"
exit 1
fi

TMP='/tmp/fragment_1.tmp'
CDIR="$DIR/copied_for_fragmentation/"
mkdir -p "$CDIR"

rm -f "$TMP"
# filefrag everything under the directory.

echo "calculating frags for files."

find "$DIR" |
while read line
do
FGS=`filefrag "$line" | awk -F':' '{print $2}' | awk '{print $1}'`
echo $FGS::$line >> $TMP
done

# sort by most fragmented first
sort -nr "$TMP" > "$TMP"2
mv "$TMP"2 "$TMP"

cat "$TMP" |
while read line
do
FG1=`echo "$line" | awk -F'::' '{print $1}'` # number of fragments the file is in
C1=`echo "$line" | awk -F'::' '{print $2}'` # the file itself

# copy the file to a different place on the disk.
C1copy="$CDIR/c1.bin"
cp "$C1" "$C1copy"
if [ $? -ne 0 ]; then echo "error copying $C1, aborting"; rm -f "$C1copy"; exit 1; fi

# check the copy actually matches the original
MD1=`md5sum "$C1" | awk '{print $1}'`
MD2=`md5sum "$C1copy" | awk '{print $1}'`

if [ "$MD1" != "$MD2" ]; then echo "md5s don't match for $C1"; rm "$C1copy"; exit 1; fi

# check fragmentation of the new copy
FG2=`filefrag "$C1copy" | awk -F':' '{print $2}' | awk '{print $1}'` # number of fragments the new copy is in
if (( FG1 < FG2 ))
then
echo "copying made this file more fragmented! $C1"
rm "$C1copy"
exit 1
fi

# move the new copy into the old file's place; it's on the same
# filesystem, so the mv keeps the copy's extents.
mv "$C1copy" "$C1"
if [ $? -ne 0 ]; then echo "error moving copy over top of $C1"; exit 1; fi

# put ownership and permissions back the way this share expects
chown root:shared "$C1" && chmod a+rwx "$C1"
if [ $? -ne 0 ]; then echo "error setting permissions on $C1"; exit 1; fi

echo "$FG1 to $FG2 : $C1"
done < "$TMP"

rmdir "$CDIR"
rm $TMP
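
Assuming it's saved as defrag.sh (the name and path here are made up), running it over the download directory looks like:

./defrag.sh /mnt/raid/torrents

It prints a "before to after : file" line for each file it fixes.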

That script changes all the file permissions and modification/creation times. They aren't important to me, but it would have been nice to preserve them.
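
If I did care, something like this on the copy just before the mv would carry most of the metadata over (a sketch using GNU coreutils; create time isn't settable on ext3, but mode, ownership and mtime are):

# make the copy look like the original before moving it into place
chown --reference="$C1" "$C1copy"
chmod --reference="$C1" "$C1copy"
touch -r "$C1" "$C1copy" # copy the access/modification times

Or just use cp -p for the copy in the first place; -p preserves mode, ownership and timestamps.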