ADLDownloader Class

Download remote file(s) using chunks and threads

Launches multiple threads for efficient downloading, with chunksize assigned to each. The remote path can be a single file, a directory of files or a glob pattern.

Inheritance
builtins.object
ADLDownloader

Constructor

ADLDownloader(adlfs, rpath, lpath, nthreads=None, chunksize=268435456, buffersize=4194304, blocksize=4194304, client=None, run=True, overwrite=False, verbose=False, progress_callback=None, timeout=0)

Parameters

adlfs
ADL filesystem instance
Required
rpath
str
Required

Remote path or glob string used to find remote files. Recursive glob patterns using ** are not supported.

lpath
str
Required

Local path. If downloading a single file, the data is written to this specific file, unless it is an existing directory, in which case a file is created within it. If downloading multiple files, this is the root directory to write within. Directories are created as required.

nthreads
int[None]
default value: None

Number of threads to use. If None, uses the number of cores.

chunksize
int[2**28]
default value: 268435456

Number of bytes for a chunk. Large files are split into chunks. Files smaller than this number will always be transferred in a single thread.

buffersize
int[2**22]
default value: 4194304

Ignored in the current implementation. Number of bytes for the internal buffer. The buffer cannot be bigger than a chunk and cannot be smaller than a block.

blocksize
int[2**22]
default value: 4194304

Number of bytes for a block. Within each chunk, we write a smaller block for each API call. This block cannot be bigger than a chunk.
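The relationship between chunksize and blocksize can be illustrated with a little arithmetic. The sketch below is not the library's internal code: plan_chunks and blocks_per_chunk are hypothetical helper names, used only to show how a file of a given size divides into chunks and how many block-sized calls one chunk implies with the documented defaults.

```python
import math

DEFAULT_CHUNKSIZE = 2 ** 28  # 268435456 bytes, the documented default
DEFAULT_BLOCKSIZE = 2 ** 22  # 4194304 bytes, the documented default


def plan_chunks(file_size, chunksize=DEFAULT_CHUNKSIZE):
    """Return (offset, length) pairs covering a file of file_size bytes.

    Hypothetical helper for illustration; the library plans chunks itself.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    return [
        (offset, min(chunksize, file_size - offset))
        for offset in range(0, file_size, chunksize)
    ]


def blocks_per_chunk(chunk_length, blocksize=DEFAULT_BLOCKSIZE):
    """Number of block-sized writes needed to cover one chunk."""
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    return math.ceil(chunk_length / blocksize)


chunks = plan_chunks(2 ** 30)          # a 1 GiB file
print(len(chunks))                     # number of chunks
print(blocks_per_chunk(chunks[0][1]))  # block-sized calls for the first chunk
```

A file smaller than chunksize yields a single (offset, length) pair, which matches the statement that such files are always transferred in a single thread.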

client
ADLTransferClient[None]
default value: None

Set an instance of ADLTransferClient when finer-grained control over transfer parameters is needed. Ignores nthreads and chunksize set by constructor.

run
bool[True]
default value: True

Whether to begin executing immediately.

overwrite
bool[False]
default value: False

Whether to forcibly overwrite existing files/directories. If False and the local path is a directory, the download quits whether or not any files would actually be overwritten. If True, only matching filenames are actually overwritten.

progress_callback
callable[None]
default value: None

Callback for progress with signature function(current, total) where current is the number of bytes transferred so far, and total is the size of the blob, or None if the total size is unknown.
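A callback matching the documented function(current, total) signature might look like the following sketch. The names make_progress_printer and on_progress are hypothetical, and returning the formatted message is only a convenience for inspection; nothing in this reference says the return value is used.

```python
def make_progress_printer(label="download"):
    """Build a callback with the documented (current, total) signature."""

    def on_progress(current, total):
        if total:
            percent = 100.0 * current / total
            message = "%s: %d/%d bytes (%.1f%%)" % (label, current, total, percent)
        else:
            # total is None when the overall size is unknown; a total of 0
            # also lands here, which avoids a division by zero.
            message = "%s: %d bytes" % (label, current)
        print(message)
        return message

    return on_progress


callback = make_progress_printer()
callback(1048576, 4194304)
callback(1048576, None)
```

The resulting function can be passed as progress_callback when constructing the downloader.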

timeout
int(0)
default value: 0

The default value 0 means an infinite timeout. Otherwise, the time in seconds before the process stops and raises an exception if the transfer is still in progress.
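The parameters above can be combined as in the following sketch. It assumes the class is importable from azure.datalake.store.multithread and that adlfs is an ADL filesystem instance created elsewhere; download_tree is a hypothetical wrapper name. The downloader is created with run=False and started explicitly so that the blocking behaviour is visible in the code.

```python
def download_tree(adlfs, rpath, lpath, progress_callback=None):
    """Download rpath to lpath and report whether the transfer succeeded.

    adlfs is an existing ADL filesystem instance; creating one is out of
    scope for this sketch.
    """
    # Assumption: the class lives in azure.datalake.store.multithread.
    from azure.datalake.store.multithread import ADLDownloader

    downloader = ADLDownloader(
        adlfs,
        rpath=rpath,
        lpath=lpath,
        nthreads=None,        # None: use the number of cores
        chunksize=2 ** 28,    # documented default
        blocksize=2 ** 22,    # documented default
        overwrite=True,
        progress_callback=progress_callback,
        timeout=0,            # 0: no timeout
        run=False,            # do not start inside the constructor
    )
    downloader.run(monitor=True)  # block until the transfer finishes
    return downloader.successful()
```

A call such as download_tree(adlfs, "remote/dir", "local/dir") would then return True or False once the transfer is no longer active.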

Methods

active

Return whether the downloader is active

clear_saved

Remove references to all persisted downloads.

load

Load list of persisted transfers from disk, for possible resumption.

run

Populate transfer queue and execute downloads

save

Persist this download

Saves a copy of this transfer process in its current state to disk. This is done automatically for a running transfer, so that as a chunk is completed, this is reflected. Thus, if a transfer is interrupted, e.g., by user action, the transfer can be restarted at another time. All chunks that were not already completed will be restarted at that time.

See methods load to retrieve saved transfers and run to resume a stopped transfer.

successful

Return whether the downloader completed successfully.

It will raise AssertionError if the downloader is active.

active

Return whether the downloader is active

active()

clear_saved

Remove references to all persisted downloads.

static clear_saved()

load

Load list of persisted transfers from disk, for possible resumption.

static load()

Returns

A dictionary of download instances. The hashes are auto-generated and unique. The state of the chunks (completed, errored, etc.) can be seen in the status attribute. Instances can be resumed with run().
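Resuming persisted transfers could be sketched as below. The function resume_saved is a hypothetical helper; it takes the downloader class as an argument (for example ADLDownloader) and relies only on the load(), run() and successful() methods described in this reference.

```python
def resume_saved(downloader_cls, nthreads=None):
    """Resume every persisted transfer and return {hash: success flag}."""
    results = {}
    for key, transfer in downloader_cls.load().items():
        # monitor=True blocks until this transfer has finished.
        transfer.run(nthreads=nthreads, monitor=True)
        results[key] = transfer.successful()
    return results
```

Transfers are resumed one after another here; each one still uses its own threads internally.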

run

Populate transfer queue and execute downloads

run(nthreads=None, monitor=True)

Parameters

nthreads
int[None]
default value: None

Override default nthreads, if given

monitor
bool[True]
default value: True

Whether to watch and wait (block) until completion.
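When monitor is False, the caller is responsible for waiting. The sketch below assumes that run(monitor=False) returns without blocking and polls active() before calling successful(), since successful() raises AssertionError while the downloader is still active. The name run_in_background and its parameters are hypothetical; the downloader is expected to have been created with run=False.

```python
import time


def run_in_background(downloader, poll_seconds=1.0, on_tick=None):
    """Start a transfer without blocking, poll until done, return success."""
    downloader.run(monitor=False)
    while downloader.active():
        if on_tick is not None:
            on_tick()  # e.g. update a UI or log a heartbeat
        time.sleep(poll_seconds)
    # Safe to call now: the downloader is no longer active.
    return downloader.successful()
```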

save

Persist this download

Saves a copy of this transfer process in its current state to disk. This is done automatically for a running transfer, so that as a chunk is completed, this is reflected. Thus, if a transfer is interrupted, e.g., by user action, the transfer can be restarted at another time. All chunks that were not already completed will be restarted at that time.

See methods load to retrieve saved transfers and run to resume a stopped transfer.

save(keep=True)

Parameters

keep
bool(True)
default value: True

If True, the transfer is saved if some chunks remain to be completed; otherwise, the saved transfer is removed.

successful

Return whether the downloader completed successfully.

It will raise AssertionError if the downloader is active.

successful()

Attributes

hash