ADLTransferClient Class

Client for transferring data from/to Azure DataLake Store

This is intended as the underlying class for ADLDownloader and ADLUploader. If necessary, it can be used directly for additional control.

The transfer callable has the function signature fn(adlfs, src, dst, offset, size, buffersize, blocksize, shutdown_event). adlfs is the ADL filesystem instance. src and dst refer to the source and destination of the respective file transfer. offset is the location in src to read size bytes from. buffersize is the number of bytes used for internal buffering before transfer. blocksize is the number of bytes in a chunk to write at one time. The callable should return an integer representing the number of bytes written.

The merge callable has the function signature fn(adlfs, outfile, files, shutdown_event). adlfs is the ADL filesystem instance. outfile is the result of merging files.

For both callables, shutdown_event is optional. It is a threading.Event that is passed to the callable, and it will be set when a shutdown is requested. It is good practice to listen for this.

Internal State

self._fstates
StateManager

This captures the current state of each transferred file.

self._files
dict

Using a tuple of the file source/destination as the key, this dictionary stores the file metadata and all chunk states. The dictionary key is (src, dst) and the value is dict(length, cstates, exception).

self._chunks
dict

Using a tuple of the chunk name/offset as the key, this dictionary stores the chunk metadata and has a reference to the chunk's parent file. The dictionary key is (name, offset) and the value is dict(parent=(src, dst), expected, actual, exception).

self._ffutures
dict

Using a Future object as the key, this dictionary provides a reverse lookup for the file associated with the given future. The returned value is the file's primary key, (src, dst).

self._cfutures
dict

Using a Future object as the key, this dictionary provides a reverse lookup for the chunk associated with the given future. The returned value is the chunk's primary key, (name, offset).

Inheritance
builtins.object
ADLTransferClient

Constructor

ADLTransferClient(adlfs, transfer, merge=None, nthreads=None, chunksize=268435456, blocksize=33554432, chunked=True, unique_temporary=True, delimiter=None, parent=None, verbose=False, buffersize=33554432, progress_callback=None, timeout=0)

Parameters

adlfs
Required

The ADL filesystem instance.

transfer
Required

Callable invoked to transfer data. It must follow the transfer signature described above.

merge
default value: None

Callable invoked to merge files. It must follow the merge signature described above.

nthreads
default value: None
chunksize
default value: 268435456
blocksize
default value: 33554432
chunked
default value: True
unique_temporary
default value: True
delimiter
default value: None
parent
default value: None
verbose
default value: False
buffersize
default value: 33554432
progress_callback
default value: None
timeout
default value: 0
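
As an illustration of the transfer and merge signatures described for this class, the following is a minimal sketch of two callables. It assumes a hypothetical in-memory stand-in (InMemoryFS) in place of a real ADL filesystem instance, so it is not how ADLDownloader or ADLUploader implement their callables. buffersize is accepted only to match the signature and is not used here.

```python
import threading


class InMemoryFS:
    """Hypothetical stand-in for an ADL filesystem instance (not part of the library)."""

    def __init__(self):
        self.files = {}  # maps a path to its bytes


def transfer(adlfs, src, dst, offset, size, buffersize, blocksize,
             shutdown_event=None):
    """Copy `size` bytes starting at `offset` in `src` into `dst`.

    Data is written one block at a time, and the shutdown event is checked
    before each block. Returns the number of bytes written.
    """
    data = adlfs.files[src][offset:offset + size]
    out = bytearray()
    written = 0
    for start in range(0, len(data), blocksize):
        # Stop early if a shutdown has been requested.
        if shutdown_event is not None and shutdown_event.is_set():
            break
        block = data[start:start + blocksize]
        out.extend(block)
        written += len(block)
    adlfs.files[dst] = bytes(out)
    return written


def merge(adlfs, outfile, files, shutdown_event=None):
    """Concatenate `files`, in order, into `outfile`."""
    adlfs.files[outfile] = b"".join(adlfs.files[name] for name in files)
```

Callables of this shape would be passed as the transfer and merge arguments of the constructor. Here they can be exercised directly against the stand-in filesystem.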

Methods

monitor

Wait for the transfer to finish.

run
save
shutdown

Shut down task threads in an orderly fashion.

While this method runs, Ctrl+C keystroke events are disabled until all threads have exited. They are re-enabled before the method returns.

submit

Split a given file into chunks.

All submitted files/chunks start in the pending state until run() is called.

monitor

Wait for the transfer to finish.

monitor(poll=0.1, timeout=0)

Parameters

poll
default value: 0.1
timeout
default value: 0
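
The poll and timeout parameters suggest a polling loop. The sketch below shows one way such a loop could look. The helper name wait_until_inactive and the is_active callback are hypothetical, and treating timeout=0 as "no timeout" is an assumption rather than documented behavior.

```python
import time


def wait_until_inactive(is_active, poll=0.1, timeout=0):
    """Poll `is_active()` every `poll` seconds until it returns False.

    Assumption: timeout=0 means wait indefinitely. Returns True if the
    work finished, or False if a positive timeout elapsed first.
    """
    start = time.monotonic()
    while is_active():
        if timeout > 0 and time.monotonic() - start >= timeout:
            return False
        time.sleep(poll)
    return True
```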

run

run(nthreads=None, monitor=True, before_start=None)

Parameters

nthreads
default value: None
monitor
default value: True
before_start
default value: None

save

save(keep=True)

Parameters

keep
default value: True

shutdown

Shut down task threads in an orderly fashion.

While this method runs, Ctrl+C keystroke events are disabled until all threads have exited. They are re-enabled before the method returns.

shutdown()
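
The pattern described for shutdown (ignore Ctrl+C, stop the workers, restore Ctrl+C) can be sketched with the standard signal module. This is an illustrative sketch, not the library's implementation. The helper name shutdown_threads is hypothetical, and signal.signal may only be called from the main thread.

```python
import signal
import threading


def shutdown_threads(threads, shutdown_event):
    """Signal workers to stop and wait for them, ignoring Ctrl+C meanwhile."""
    # Ignore SIGINT (Ctrl+C) and remember the previous handler.
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
    try:
        shutdown_event.set()  # ask workers to stop
        for thread in threads:
            thread.join()     # wait until every worker has exited
    finally:
        # Re-enable Ctrl+C before leaving. `previous` is None only when the
        # old handler was not installed from Python.
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
```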

submit

Split a given file into chunks.

All submitted files/chunks start in the pending state until run() is called.

submit(src, dst, length)

Parameters

src
Required
dst
Required
length
Required
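
To make the chunk-splitting step concrete, here is a minimal sketch of how a file of a given length could be divided into (offset, size) pairs. The helper name split_into_chunks is hypothetical, and the real client may handle edge cases such as empty files or chunked=False differently.

```python
def split_into_chunks(length, chunksize=268435456):
    """Return (offset, size) pairs covering `length` bytes.

    Every chunk is `chunksize` bytes except possibly the last one.
    In this sketch a zero-length input yields an empty list.
    """
    return [
        (offset, min(chunksize, length - offset))
        for offset in range(0, length, chunksize)
    ]
```

With the default chunksize of 268435456 bytes, any file at or below that size yields a single chunk in this sketch.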

Attributes

active

Return whether the transfer is active

progress

Return a summary of all transferred files/chunks

status

successful

Return whether the transfer completed successfully.

Raises AssertionError if the transfer is still active.
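
The relationship between active and successful can be illustrated with a small stand-in class. The class name TransferStatus is hypothetical, and the state names "running" and "finished" are assumptions; only "pending" appears in this reference.

```python
class TransferStatus:
    """Hypothetical stand-in that tracks one state string per file."""

    def __init__(self, states):
        self._states = states  # maps (src, dst) to a state name

    @property
    def active(self):
        """True while any file is still pending or running."""
        return any(s in ("pending", "running") for s in self._states.values())

    @property
    def successful(self):
        """True if every file finished; only valid once the transfer is inactive."""
        if self.active:
            raise AssertionError("transfer is still active")
        return all(s == "finished" for s in self._states.values())
```

A caller would therefore check active (or wait for the transfer to finish) before reading successful.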