mltk.datasets.image.mnist

MNIST

This is a dataset of 60,000 28x28 grayscale training images of the 10 digits, along with a test set of 10,000 images. More information can be found at the MNIST homepage.

Example

from mltk.datasets.image import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
assert x_train.shape == (60000, 28, 28)
assert x_test.shape == (10000, 28, 28)
assert y_train.shape == (60000,)
assert y_test.shape == (10000,)

License

Yann LeCun and Corinna Cortes hold the copyright of the MNIST dataset, which is a derivative work of the original NIST datasets. The MNIST dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.

Variables

INPUT_SHAPE

The shape of each sample

CLASSES

Labels for dataset samples

DOWNLOAD_URL

Public download URL

VERIFY_SHA1

Hash of the archive file (the value has the length of a SHA-256 digest, not SHA-1)

Functions

load_data([dest_dir, dest_subdir, logger, ...])

Download the dataset, extract it, load it into memory, and return it as a tuple of NumPy arrays

load_data_directory([dest_dir, dest_subdir, ...])

Download the dataset, extract all sample images to a directory, and return the path to the directory.

INPUT_SHAPE = (28, 28)

The shape of each sample

CLASSES = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

Labels for dataset samples
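As a small illustration of how these labels relate to the integer values in y_train and y_test, the sketch below looks up a class name from a label. It uses a local copy of the documented CLASSES value so that it runs without mltk installed. The helper name label_to_class is hypothetical, and it is an assumption that an integer label indexes directly into CLASSES.

```python
# Local copy of the documented value (with mltk installed: mnist.CLASSES)
CLASSES = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


def label_to_class(label: int) -> str:
    """Return the class name for an integer label (hypothetical helper)."""
    label = int(label)  # also accepts NumPy integer scalars such as uint8
    if not 0 <= label < len(CLASSES):
        raise ValueError(f"label {label} is outside 0-{len(CLASSES) - 1}")
    return CLASSES[label]


print(label_to_class(7))
```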

DOWNLOAD_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'

Public download URL

VERIFY_SHA1 = '731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1'

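Hash of the archive file. Despite the variable name, the value is 64 hexadecimal characters long, which is the length of a SHA-256 digest rather than a SHA-1 digest (40 characters).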
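To check a manually downloaded archive against this value, something along the following lines may work. This is a minimal sketch using only the Python standard library. It assumes the value is a SHA-256 digest (based on its 64-character length), and the helper names file_sha256 and verify_archive are hypothetical, not part of mltk.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex string."""
    return file_sha256(path) == expected_hex.lower()
```

With mltk installed, a call such as verify_archive('mnist.npz', mnist.VERIFY_SHA1) would then report whether a local copy matches, where 'mnist.npz' stands for wherever the archive was saved.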

load_data(dest_dir=None, dest_subdir='datasets/mnist', logger=None, clean_dest_dir=False)[source]

Download the dataset, extract it, load it into memory, and return it as a tuple of NumPy arrays

Returns:

(x_train, y_train), (x_test, y_test)

Return type:

Tuple of NumPy arrays

x_train: uint8 NumPy array of grayscale image data with shape (60000, 28, 28), containing the training data. Pixel values range from 0 to 255.

y_train: uint8 NumPy array of digit labels (integers in range 0-9) with shape (60000,) for the training data.

x_test: uint8 NumPy array of grayscale image data with shape (10000, 28, 28), containing the test data. Pixel values range from 0 to 255.

y_test: uint8 NumPy array of digit labels (integers in range 0-9) with shape (10000,) for the test data.

Parameters:
  • dest_dir (str) –

  • logger (Logger) –
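Because the returned images are uint8 with values from 0 to 255, they are usually rescaled before training. The following is a hedged sketch of one common preprocessing step, not something load_data does itself: it scales pixels to float32 in [0, 1], appends a channel axis, and one-hot encodes the labels. The function name preprocess is hypothetical, and a small synthetic array with the documented dtype and per-sample shape stands in for the real data so the snippet runs without downloading anything.

```python
import numpy as np


def preprocess(x: np.ndarray, y: np.ndarray, num_classes: int = 10):
    """Scale uint8 images to [0, 1], add a channel axis, one-hot encode labels."""
    x = x.astype(np.float32) / 255.0
    x = x[..., np.newaxis]  # (N, 28, 28) -> (N, 28, 28, 1)
    # Row i of the identity matrix is the one-hot vector for label i
    y_one_hot = np.eye(num_classes, dtype=np.float32)[y]
    return x, y_one_hot


# Synthetic stand-in for (x_train, y_train): same dtype and per-sample shape
rng = np.random.default_rng(0)
x_fake = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
y_fake = rng.integers(0, 10, size=(8,), dtype=np.uint8)

x_norm, y_hot = preprocess(x_fake, y_fake)
print(x_norm.shape, y_hot.shape)
```

With the real dataset, the same function would be applied to the arrays returned by mnist.load_data().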

load_data_directory(dest_dir=None, dest_subdir='datasets/mnist', logger=None, clean_dest_dir=False)[source]

Download the dataset, extract all sample images to a directory, and return the path to the directory.

Each sample type is extracted to its corresponding subdirectory, e.g.:

~/.mltk/datasets/mnist/0
~/.mltk/datasets/mnist/1
…

Return type:

Path to the extraction directory

Parameters:
  • dest_dir (str) –

  • logger (Logger) –
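Given the one-subdirectory-per-class layout described above, a quick sanity check is to count the files in each class directory. The sketch below uses only pathlib. The helper name count_samples_per_class is hypothetical, and it makes no assumption about the image file format, only that each class directory directly contains its sample files.

```python
from pathlib import Path


def count_samples_per_class(root) -> dict:
    """Map each class subdirectory name to the number of files it contains."""
    root = Path(root)
    return {
        d.name: sum(1 for f in d.iterdir() if f.is_file())
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
```

With mltk installed, the path returned by mnist.load_data_directory() would be passed as root, assuming it is accepted by pathlib.Path.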