ARCHIVE LARGE FILES USING SHELL SCRIPT

Shubham Soni
5 min read · Nov 21, 2023


This is a small real-world project that shows how a script like this gets written in practice.

In this project, we filter the files larger than 20 MB in a specific directory, compress them, and move the compressed files into a new folder named “archive” inside that same directory.

PROJECT REQUIREMENT

In the given directory, find files larger than a given size (20 MB) or older than a given number of days (10 days).

Compress those files and move them into an “archive” folder.

PURPOSE OF THE SCRIPT

In real life, a server runs on limited disk space (1 TB, 2 TB, …) and hosts an application. The application writes logs (1 GB, 2 GB, 5 GB, 10 GB, …) to a particular directory on a regular basis, so the disk gradually fills up and eventually runs out of space. By archiving the older or larger files into a dedicated folder, we free up disk space and can still access those files later if we need them.

STEPS OF SCRIPT:

Provide the path of the directory.

Check if the directory is present or not.

Create “archive” folder if not already present.

Find all the files larger than 20 MB.

Compress all files.

Move the compressed files into the “archive” folder.

Make a cron job to run the script every day at a given time.

SCRIPT:

Go to the folder where you want to keep the script and create a new file called “archive_files.sh” using the vim editor.

$ cd /home/Yuvraj/projects/
$ vim archive_files.sh

Step 1: Write the shebang line and header comments

After creating the file, we start with the shebang line of the script, followed by comments noting the version of the script and the date it was created.

#!/bin/bash
# Version- 1
# Date of creation of script –

Step 2: Create the variables used in the script

#variables
LOC=/home/yuvraj/project_data #Path of files you want to filter
DAYS=10
DEPTH=1
RUN=0

Here LOC is the path of the directory containing the files on which we have to perform the operation.

DAYS is the number of days after which a file is moved to the archive folder.

DEPTH controls how deep the script descends into the folder. Here 1 means the script only works on the files present in the given directory, not in its sub-directories.

RUN is a simple control flag: the archiving block later in the script runs only when RUN is 0, so setting it to any other value skips the action (handy while testing).

Step 3: Check whether the given path (LOC) is present or not

#Check if the directory is present or not
if [ ! -d "$LOC" ]
then
    echo "directory does not exist- $LOC"
    exit 1
fi

In the above step we check whether the data directory on which we have to perform the operation is present or not.

For this, we use an if condition.

if [ ! -d "$LOC" ] means: if the directory stored in the variable LOC does not exist,

then echo “directory does not exist-” followed by the given path, and exit the script with status 1.

fi marks the end of the if condition.
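As a quick sanity check (the path below is only a hypothetical example), you can run this block on its own against a directory that does not exist and confirm that it prints the message and stops:

#!/bin/bash
# Hypothetical test: LOC points to a directory that does not exist
LOC=/tmp/does_not_exist

if [ ! -d "$LOC" ]
then
    echo "directory does not exist- $LOC"
    exit 1
fi

# Expected output: directory does not exist- /tmp/does_not_exist
# Exit status: 1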

Step 4: Create ‘archive’ folder if not present

#create an 'archive' folder if not present
if [ ! -d "$LOC/archive" ]
then
    mkdir "$LOC/archive"
fi

In this step we check whether the archive folder is present or not; if it is not, we create a directory called ‘archive’ inside the same directory (LOC) where our data is present.
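If you prefer, this check-and-create step can be collapsed into a single command; mkdir -p creates the directory only if it is missing and does not complain when it already exists. This is just an equivalent alternative, not what the script above uses:

mkdir -p "$LOC/archive"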

Step 5: Find the files larger than 20 MB in the directory

#Find the list of files larger than 20mb
for i in `find $LOC -maxdepth $DEPTH -type f -size +20M`
do
    if [ $RUN -eq 0 ]
    then
        echo "$(date): archiving $i ==> $LOC/archive"
        gzip $i || exit 1
        mv $i.gz $LOC/archive || exit 1
    fi
done

Here we use a for loop: with the help of the find command, we collect only the files in the directory at LOC that are larger than 20 MB, and each matching path is stored in i one at a time.

Inside the loop, the if condition checks that RUN is 0; only then is the current file archived.

gzip compresses each file one by one, and if that fails the script exits immediately (“|| exit 1”).

Once a file has been gzipped, its name gains a “.gz” suffix, so we move “$i.gz” into the “archive” folder; again, if the move fails the script exits with “exit 1”.

The if condition is closed with “fi” and the for loop with “done”.
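To try the loop out, you can drop a dummy file larger than 20 MB into the data directory and run the script; the file name below is only an example:

# Create a 25 MB dummy file in the data directory
dd if=/dev/zero of=/home/yuvraj/project_data/big_test.log bs=1M count=25

# Run the script and confirm the file was archived
bash /home/Yuvraj/projects/archive_files.sh
ls -lh /home/yuvraj/project_data/archive
# Expected: big_test.log.gz is now listed in the archive folder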

NOTE: To filter files based on age (older than DAYS days) instead of size, change the find command to:

for i in `find $LOC -maxdepth $DEPTH -type f -mtime +$DAYS`
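The project requirement mentions both criteria (larger than 20 MB or older than 10 days). If you want a single loop that covers both, one way (a sketch, not part of the original script) is to combine the tests with find’s -o (logical OR) operator:

#Match files that are larger than 20 MB OR older than $DAYS days
for i in `find $LOC -maxdepth $DEPTH -type f \( -size +20M -o -mtime +$DAYS \)`
do
    if [ $RUN -eq 0 ]
    then
        echo "$(date): archiving $i ==> $LOC/archive"
        gzip $i || exit 1
        mv $i.gz $LOC/archive || exit 1
    fi
done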

Here the scripting is complete. To save the file and exit the editor, press ESC and then type

:wq

to write and quit.
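For reference, the complete script assembled from the steps above looks like this (with the paths and thresholds assumed throughout this post):

#!/bin/bash
# Version- 1
# Date of creation of script –

#variables
LOC=/home/yuvraj/project_data #Path of files you want to filter
DAYS=10
DEPTH=1
RUN=0

#Check if the directory is present or not
if [ ! -d "$LOC" ]
then
    echo "directory does not exist- $LOC"
    exit 1
fi

#create an 'archive' folder if not present
if [ ! -d "$LOC/archive" ]
then
    mkdir "$LOC/archive"
fi

#Find the list of files larger than 20mb
for i in `find $LOC -maxdepth $DEPTH -type f -size +20M`
do
    if [ $RUN -eq 0 ]
    then
        echo "$(date): archiving $i ==> $LOC/archive"
        gzip $i || exit 1
        mv $i.gz $LOC/archive || exit 1
    fi
done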

There are two ways to execute the script (the exact commands are shown below):

1. Use chmod +x archive_files.sh to give the file execute permission, then run it with ./archive_files.sh

2. Execute it directly with bash archive_files.sh
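Assuming you are in the directory that contains archive_files.sh, the two options look like this:

# Option 1: make the script executable, then run it
$ chmod +x archive_files.sh
$ ./archive_files.sh

# Option 2: run it through bash directly
$ bash archive_files.sh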

To verify that the script ran correctly, go to the directory given in LOC and check that the archive folder is present and contains the compressed files.
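For example (using the data path assumed earlier in the post):

$ ls -lh /home/yuvraj/project_data/archive
# The .gz files moved by the script should be listed here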

Step 6: Automate the script with a cron job:

$ crontab -e
00 12 * * * /home/Yuvraj/projects/archive_files.sh

This means our script runs every day at 12 o'clock noon.

To verify that the cron job was created, use “crontab -l”.
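The five fields before the command are minute, hour, day of month, month, and day of week. For example, to run the same script at 7:30 PM every day instead (a hypothetical alternative schedule):

30 19 * * * /home/Yuvraj/projects/archive_files.sh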

You can also visit my GitHub repository for the complete script.

Hope you liked my work and learned something new. Thank you!
