mltk.utils.archive_downloader

Utilities for downloading and extracting archives

See the source code on GitHub: mltk/utils/archive_downloader.py

Functions

download_url(url, dst_path[, show_progress, ...])

Downloads the tarball or zip file from url into dst_path.

download_verify_extract(url[, dest_dir, ...])

Download an archive, verify its hash, and extract

verify_extract(archive_path[, dest_dir, ...])

Verify the archive hash and extract

verify_file_hash(file_path, file_hash, ...)

Return True if the calculated hash of the file matches the given hash, otherwise False

verify_sha1(file_path, expected_sha1)

verify_sha256(file_path, expected_sha256)

download_verify_extract(url, dest_dir=None, dest_subdir=None, download_dir=None, archive_fname=None, show_progress=False, file_hash=None, file_hash_algorithm='auto', logger=None, extract_nested=False, remove_root_dir=False, clean_dest_dir=True, update_onchange_only=True, download_details_fname=None, extract=True, return_uptodate=False)[source]

Download an archive, verify its hash, and extract

Parameters:
  • url (str) – Download URL

  • dest_dir (str) – Directory to extract the archive into. If omitted, defaults to MLTK_CACHE_DIR/<dest_subdir>/ OR ~/.mltk/<dest_subdir>/

  • dest_subdir (str) – Destination sub-directory. If omitted, defaults to the archive path’s basename. This is only used if dest_dir is omitted

  • download_dir (str) – Directory to download the archive to. If omitted, defaults to MLTK_CACHE_DIR/downloads/<archive_fname> OR ~/.mltk/downloads/<archive_fname>

  • archive_fname (str) – Name of the downloaded archive file. If omitted, defaults to the URL filename

  • show_progress (bool) – Show a download progress bar

  • file_hash (str) – MD5, SHA-1, or SHA-256 hash of the file

  • file_hash_algorithm (str) – File hashing algorithm. If auto, the algorithm is determined automatically

  • extract_nested (bool) – If the archive has a sub archive, then extract that as well

  • remove_root_dir (bool) – If the archive has a root directory, then remove it from the extracted path

  • clean_dest_dir (bool) – Remove the destination directory BEFORE extracting

  • update_onchange_only (bool) – Only download and extract if the given url hasn’t been previously downloaded and extracted; otherwise return immediately

  • download_details_fname (str) – If update_onchange_only=True then a download details .json file is generated. This argument specifies the name of that file. If omitted, then the filename is <archive filename>-mltk.json

  • extract (bool) – If false, then do NOT extract the downloaded file. In this case, return the path to the downloaded file

  • return_uptodate (bool) – If true, then return a tuple, (path, <is up-to-date bool>)

  • logger (Logger) –

Return type:

Union[str, Tuple[str, bool]]

Returns:

If return_uptodate=False, the path to the extracted directory, or the path to the downloaded archive if extract=False. If return_uptodate=True, the tuple (<path>, <is up-to-date bool>)
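The default destination described for dest_dir and dest_subdir can be sketched as follows. This is a minimal illustration, not the MLTK implementation: resolve_dest_dir is a hypothetical helper, and it assumes that MLTK_CACHE_DIR is read from the environment and that the basename is the URL filename with a common archive extension removed.

```python
import os
from urllib.parse import urlparse

# Assumption: extensions stripped when deriving the default sub-directory name
_ARCHIVE_EXTENSIONS = ('.tar.gz', '.tar.bz2', '.tar.xz', '.tgz', '.tar', '.zip')


def resolve_dest_dir(url, dest_dir=None, dest_subdir=None):
    """Hypothetical helper mirroring the documented dest_dir defaults."""
    if dest_dir:
        # An explicit dest_dir wins; dest_subdir is ignored in that case
        return dest_dir

    if not dest_subdir:
        # Fall back to the archive filename taken from the URL path
        fname = os.path.basename(urlparse(url).path)
        for ext in _ARCHIVE_EXTENSIONS:
            if fname.endswith(ext):
                fname = fname[:-len(ext)]
                break
        dest_subdir = fname

    # Assumption: MLTK_CACHE_DIR is an environment variable, else use ~/.mltk
    cache_dir = os.environ.get('MLTK_CACHE_DIR') or os.path.join(
        os.path.expanduser('~'), '.mltk')
    return os.path.join(cache_dir, dest_subdir)
```

Passing dest_dir explicitly bypasses both defaults, which is usually the safer choice when the extracted data must live at a known location.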

verify_extract(archive_path, dest_dir=None, dest_subdir=None, show_progress=False, file_hash=None, file_hash_algorithm='auto', logger=None, extract_nested=False, remove_root_dir=False, clean_dest_dir=True, update_onchange_only=True, extract_details_fname=None)[source]

Verify the archive hash and extract

Parameters:
  • archive_path (str) – File path to archive

  • dest_dir (str) – Directory to extract the archive into. If omitted, defaults to MLTK_CACHE_DIR/<dest_subdir>/ OR ~/.mltk/<dest_subdir>/

  • dest_subdir (str) – Destination sub-directory. If omitted, defaults to the archive path’s basename. This is only used if dest_dir is omitted

  • show_progress (bool) – Show a progress bar

  • file_hash (str) – MD5, SHA-1, or SHA-256 hash of the file

  • file_hash_algorithm (str) – File hashing algorithm. If auto, the algorithm is determined automatically

  • extract_nested (bool) – If the archive has a sub archive, then extract that as well

  • remove_root_dir (bool) – If the archive has a root directory, then remove it from the extracted path

  • clean_dest_dir (bool) – Remove the destination directory BEFORE extracting

  • update_onchange_only (bool) – Only download and extract if the given url hasn’t been previously downloaded and extracted; otherwise return immediately

  • extract_details_fname (str) – If update_onchange_only=True then a details .json file is generated. This argument specifies the name of that file. If omitted, then the filename is <archive filename>-mltk.json

  • logger (Logger) –

Return type:

str

Returns:

Path to extracted directory
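The effect of remove_root_dir and clean_dest_dir can be illustrated with a short standard-library sketch. It is an assumption-laden illustration rather than the MLTK code: extract_zip is a hypothetical name, only zip archives are handled, and a "root directory" is taken to mean that every file in the archive shares the same first path component.

```python
import os
import shutil
import zipfile


def extract_zip(archive_path, dest_dir, remove_root_dir=False, clean_dest_dir=True):
    """Hypothetical sketch: extract a zip, optionally dropping its root directory."""
    if clean_dest_dir and os.path.isdir(dest_dir):
        # Remove the destination directory BEFORE extracting
        shutil.rmtree(dest_dir)
    os.makedirs(dest_dir, exist_ok=True)

    with zipfile.ZipFile(archive_path) as zf:
        members = [m for m in zf.infolist() if not m.is_dir()]
        # Assumption: a root directory exists when all files share one top-level folder
        roots = {m.filename.split('/', 1)[0] for m in members}
        has_root = len(roots) == 1 and all('/' in m.filename for m in members)

        for member in members:
            rel = member.filename
            if remove_root_dir and has_root:
                rel = rel.split('/', 1)[1]
            # Drop empty, '.' and '..' components so entries cannot escape dest_dir
            parts = [p for p in rel.split('/') if p not in ('', '.', '..')]
            target = os.path.join(dest_dir, *parts)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(member) as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst)
    return dest_dir
```

With remove_root_dir=True, an entry such as root/sub/b.txt lands at <dest_dir>/sub/b.txt instead of <dest_dir>/root/sub/b.txt.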

download_url(url, dst_path, show_progress=False, logger=None)[source]

Downloads the tarball or zip file from url into dst_path.

If a file already exists at dst_path, the local copy is returned without downloading.

Parameters:
  • url (str) – The URL of a tarball or zip file

  • dst_path (str) – The path where the file is downloaded

  • show_progress (bool) – Show a progress bar while downloading

Return type:

str
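The skip-if-present behavior described above can be sketched with the standard library alone. This is not the MLTK implementation: fetch_if_missing is a hypothetical name, and the temporary ".part" file is an added design choice rather than documented behavior.

```python
import os
import shutil
import urllib.request


def fetch_if_missing(url, dst_path):
    """Hypothetical sketch: download url to dst_path unless the file already exists."""
    if os.path.isfile(dst_path):
        # Already present: return the local copy without touching the network
        return dst_path

    os.makedirs(os.path.dirname(os.path.abspath(dst_path)), exist_ok=True)
    tmp_path = dst_path + '.part'
    with urllib.request.urlopen(url) as response, open(tmp_path, 'wb') as out:
        shutil.copyfileobj(response, out)
    # Rename only after a complete download, so a partial file is never
    # mistaken for a previously downloaded archive
    os.replace(tmp_path, dst_path)
    return dst_path
```

Because an existing file is returned as-is, a stale or corrupted local copy is not detected at this stage; pairing the download with a hash check covers that case.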

verify_file_hash(file_path, file_hash, file_hash_algorithm)[source]

Return True if the calculated hash of the file matches the given hash, otherwise False

Parameters:
  • file_path (str) –

  • file_hash (str) –

  • file_hash_algorithm (str) –
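A hash check of this kind might look like the sketch below. It is illustrative only: file_hash_matches is a hypothetical name, and inferring the algorithm from the length of the hex digest is an assumption about how auto could work, not a documented detail.

```python
import hashlib

# Assumption: 'auto' picks the algorithm from the hex digest length
_ALGORITHM_BY_HEX_LENGTH = {32: 'md5', 40: 'sha1', 64: 'sha256'}


def file_hash_matches(file_path, file_hash, file_hash_algorithm='auto'):
    """Hypothetical sketch: compare a file's digest against an expected hex hash."""
    expected = file_hash.strip().lower()
    if file_hash_algorithm == 'auto':
        if len(expected) not in _ALGORITHM_BY_HEX_LENGTH:
            raise ValueError(
                f'Cannot infer hash algorithm from a {len(expected)}-character hash')
        file_hash_algorithm = _ALGORITHM_BY_HEX_LENGTH[len(expected)]

    hasher = hashlib.new(file_hash_algorithm)
    with open(file_path, 'rb') as f:
        # Read in 1 MiB chunks so large archives are not loaded into memory at once
        for chunk in iter(lambda: f.read(1 << 20), b''):
            hasher.update(chunk)
    return hasher.hexdigest() == expected
```

The comparison is case-insensitive with respect to the expected hash, since published checksums appear in both upper and lower case.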

verify_sha1(file_path, expected_sha1)[source]
verify_sha256(file_path, expected_sha256)[source]