Modified backup script to improve efficiency.
The lists of metadata and data files that were last backed up are now stored in the /var/metacat/metacat-backup folder,
and these are diffed against the current file lists to discover new files, which are then copied to AWS S3. The AWS
commands were also made more efficient by using aws s3 cp with a parallel option, and xargs to launch multiple copies
of the aws client, to maximize throughput. These changes reduced backup time from ~3 days to a few minutes, depending
on how much new data has been added.
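
A minimal sketch of the approach described above, not the actual script: it compares a previous file list with
the current one and copies only the new entries, several at a time. The function names, the COPY_CMD override,
and the default of 8 parallel jobs are assumptions made for illustration. Parallelism here comes from xargs -P.

```shell
#!/bin/sh
# Illustrative sketch only; all names here are hypothetical, not from the real backup script.
set -eu

# new_files PREVIOUS_LIST CURRENT_LIST
# Print the paths listed in CURRENT_LIST that are absent from PREVIOUS_LIST.
new_files() {
    nf_sorted_curr=$(mktemp)
    LC_ALL=C sort "$2" > "$nf_sorted_curr"
    # comm -13 keeps only lines unique to the second input (the current list).
    LC_ALL=C sort "$1" | LC_ALL=C comm -13 - "$nf_sorted_curr"
    rm -f "$nf_sorted_curr"
}

# copy_new_files PREVIOUS_LIST CURRENT_LIST DEST [JOBS]
# Copy each new file to DEST, running up to JOBS copy commands at once.
# COPY_CMD defaults to "aws s3 cp"; set it to "echo" for a dry run.
copy_new_files() {
    # NUL-delimit the paths so names containing spaces reach xargs intact.
    new_files "$1" "$2" | tr '\n' '\0' |
        xargs -0 -P "${4:-8}" -I{} ${COPY_CMD:-aws s3 cp} {} "$3"
}
```

For example, with hypothetical list names and bucket:
COPY_CMD=echo copy_new_files last-files.txt current-files.txt s3://example-bucket/backup/ 4
prints the copy arguments without contacting S3. Note that a destination ending in "/" places every file
directly under that prefix, so a real script would need to build per-file keys to preserve directory layout.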