splitDL – Downloading huge files from slow and unstable internet connections

Imagine you want to install GNU/Linux but your bandwidth won’t let you…

tl;dr: I wrote a rather small Bash script which splits huge files into several smaller ones and downloads them. To ensure integrity, every small file is checked against its hashsum and file size.

That’s the problem I have been facing over the past days. At the school I’m working at (Moshi Institute of Technology, MIT) I set up a GNU/Linux server to provide services like file sharing, website design (on local servers to avoid the slow internet) and central backups. The next step is to set up 5-10 (and later more) new computers with a GNU/Linux OS to replace the ancient, non-free Windows XP installations – project "Linux Classroom" is officially born.

But to install an operating system on a computer you need an installation medium. At the school plenty of (dubious) Windows XP installation CD-ROMs are flying around, but no current GNU/Linux. In the first world you would just download an .iso file and ~10 minutes later start installing it on your computer.

But not here in Tanzania. With average download rates of 10 kb/s it takes ages to download even a single image file (not to mention the cost of internet usage, ~1-3 $ per GB). And that’s not all: periodic power cuts abort ongoing downloads abruptly. Of course you can restart a download, but the large file may already be damaged and you lose even more time.

My solution – splitDL

To work around this problem I wrote a rather small Bash program called splitDL. With this helper script you can split a huge file into smaller pieces. If the power cuts off during the download and damages a file, you only have to re-download that single small piece instead of the whole huge file. To detect whether a small file is unharmed, the script creates hashsums of the original huge file and of each of the small pieces. The script also supports resuming downloads thanks to wget, which is installed by default on most systems.
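To illustrate the underlying idea (this is just a minimal sketch using standard tools and made-up file names, not the actual script), the server-side preparation boils down to something like:

# split debian.iso into 10 MB pieces named debian.iso.part-aa, -ab, ...
split --bytes=10M debian.iso debian.iso.part-
# record a hashsum for the original file and for every piece
md5sum debian.iso debian.iso.part-* > checksums.md5
# record the file sizes as a second integrity check
du -b debian.iso.part-* > sizes.txt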

You might now think "BitTorrent (or some other program) can do the same, if not more!" Yes, but that requires a) installing another program and b) a download source which supports that protocol. splitDL, on the other hand, can handle any HTTP, HTTPS or FTP download.

The downside in the current state is that splitDL requires shell access to the server where the file is stored, in order to split it and create the necessary hashsums. So in my current situation I use my own virtual server in Germany, on which I download the wanted file at high speed and then use splitDL to prepare it for the slow download from my server to the Tanzanian school.
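In practice that means something like the following on the remote server (the URL is just a placeholder; the splitDL call itself is explained in the examples below):

# on the fast virtual server: fetch the image at full speed ...
wget https://example.org/debian.iso
# ... and prepare it for the slow connection
split-dl.sh -m server -f debian.iso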

The project is of course still work in progress and only tested in my own environment. Please feel free to have a look at it and download it from my Git instance. I’m always looking forward to feedback. The application is licensed under GPLv3 or later.

Some examples

Server-side

Split the file debian.iso into smaller parts with the default options (MD5 hashsum, 10 MB piece size).

split-dl.sh -m server -f debian.iso

Split the file, but use the SHA1 hashsum and split it into 50 MB pieces.

split-dl.sh -m server -f debian.iso -c sha1sum -s 50M

After running one of these commands, a new folder called dl-debian.iso/ will be created. It contains the split files and a document with the hashsums and file sizes. You just have to move the folder to a web-accessible location on your server.
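For example (the web root path is just the common default on Debian-based systems; adjust it to your setup):

mv dl-debian.iso/ /var/www/html/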

Client-side

Download the split files with the default options.

split-dl.sh -m client -f http://server.tld/dl-debian.iso/

Download the split files, but use the SHA1 hashsum (it has to be the same one that was used during the creation process) and override the wget options (default: -nv --show-progress).

split-dl.sh -m client -f http://server.tld/dl-debian.iso/ -c sha1sum -w --limit-rate=100k
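Conceptually, the client side does little more than the following (a simplified sketch reusing the made-up file names from the sketch above, not the actual implementation): fetch the hashsum list, resume-download every piece, verify it, and stitch the original file back together.

# fetch the hashsum list created on the server
wget -nv http://server.tld/dl-debian.iso/checksums.md5
# download every piece; -c lets wget resume a piece after a power cut
for part in debian.iso.part-aa debian.iso.part-ab; do
    wget -c -nv --show-progress "http://server.tld/dl-debian.iso/$part"
done
# verify the downloaded pieces (a damaged piece can simply be re-downloaded);
# --ignore-missing skips the entry for the not-yet-assembled debian.iso
md5sum -c --ignore-missing checksums.md5
# finally reassemble the original image from the intact pieces
cat debian.iso.part-* > debian.iso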

Current bugs/drawbacks

  • Currently only single files can be split. This will be fixed soon.
  • Currently the script only works with files in the current directory. Fixing this is also only a matter of a few lines of code.

