mltk.utils.audio_dataset_generator.AudioDatasetGenerator

class AudioDatasetGenerator[source]

Utility for generating synthetic keyword datasets

See the Synthetic Audio Dataset Generation tutorial for more details.

Note

The generated audio files are 16kHz, 16-bit PCM .wav files.
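As a quick sanity check on the format stated in the note, the sketch below inspects a .wav header with Python's standard `wave` module. The helper name `is_16khz_16bit_pcm` and the demo file are illustrative assumptions, not part of the MLTK API; the demo clip is written locally so the check can run without the generator.

```python
import os
import struct
import tempfile
import wave


def is_16khz_16bit_pcm(path: str) -> bool:
    """Return True if the .wav file at `path` is 16 kHz, 16-bit PCM."""
    with wave.open(path, "rb") as wav:
        # getsampwidth() reports bytes per sample, so 16-bit audio gives 2
        return wav.getframerate() == 16000 and wav.getsampwidth() == 2


# Write a short silent clip in the stated format (placeholder file name).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)  # mono is an assumption; the note does not state a channel count
    wav.setsampwidth(2)  # 2 bytes = 16 bits
    wav.setframerate(16000)
    wav.writeframes(struct.pack("<h", 0) * 1600)  # 0.1 s of silence

print(is_16khz_16bit_pcm(demo_path))
```

The same helper could be pointed at any file under out_dir after generation.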

Parameters:
  • out_dir (str) – Directory where the dataset will be generated

  • n_jobs (int) – Number of parallel processing jobs

Properties

is_running

Return whether the processing pool is active

out_dir

Return the output directory where the dataset is generated

Methods

__init__

count_characters

Count the number of characters that will be sent to each backend

generate

Generate a keyword using the given configuration

get_summary

Generate a summary of the given configurations

is_backend_loaded

Return whether the given backend has been loaded

join

Wait for all generation tasks to complete

list_configurations

Generate a list of generation configurations

list_languages

Return a list of the available language codes

list_supported_backends

Return a list of the available backends

list_voices

Return a list of the available "voices"

load_backend

Load the specified backend

shutdown

Shut down the underlying thread pool

__init__(out_dir, n_jobs=4)[source]
Parameters:
  • out_dir (str) – Directory where the dataset will be generated

  • n_jobs (int) – Number of parallel processing jobs

static list_supported_backends()[source]

Return a list of the available backends

Return type:

List[str]

property is_running: bool

Return whether the processing pool is active

Return type:

bool

property out_dir: str

Return the output directory where the dataset is generated

Return type:

str

is_backend_loaded(backend, raise_exception=False)[source]

Return whether the given backend has been loaded

Return type:

bool

Parameters:

backend (str) –

load_backend(name, install_python_package=False, **kwargs)[source]

Load the specified backend

NOTE: The backend’s corresponding “credentials” must be provided.

Additional kwargs may be passed to the backend’s initialization. Refer to the backend’s docs for the available kwargs.

Parameters:
  • name (str) – The name of the cloud backend, see list_supported_backends()

  • install_python_package – If true, then automatically install the backend’s corresponding Python package (if necessary)

  • kwargs – Additional keyword args to pass to the backend’s Python package (see comment above)

list_languages(backend=None)[source]

Return a list of the available language codes

Parameters:

backend (str) – If provided, then only return languages supported by backend, else return languages for all loaded backends

Return type:

List[str]

Returns:

List of language codes

list_voices(language_code=None, backend=None)[source]

Return a list of the available “voices”

Parameters:
  • language_code (str) – If provided, then only return voices that support the given language code, else return voices for all languages

  • backend (str) – If provided, then only return voices supported by backend, else return voices for all loaded backends

Return type:

List[Voice]

Returns:

List of voices

list_configurations(keywords, augmentations, voices, truncate=False, seed=None)[source]

Generate a list of generation configurations

Generate a list of all possible combinations of the given keywords, augmentations, and voices. If truncate is true, then shuffle the generated list and return it truncated to the max_count specified in the keywords.

Parameters:
  • keywords (List[Keyword]) – List of keywords to use for the generation configurations

  • augmentations (List[Augmentation]) – List of augmentations to apply to each keyword

  • voices (List[Voice]) – List of voices to use for keyword generation

  • truncate – If true, then randomly shuffle all possible combinations and return a truncated list of configurations. The truncated count is specified in the max_count field of the keywords

  • seed (int) – Seed to use for randomly shuffling the truncated list

Return type:

Dict[Keyword, List[GenerationConfig]]

Returns:

Dictionary of keywords and corresponding list of configurations
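The combine, shuffle, and truncate behavior described above can be illustrated with a small standalone sketch. This is not MLTK's implementation: `DemoKeyword`, `demo_list_configurations`, and the tuple-based configurations are hypothetical stand-ins, and applying max_count per keyword is an assumption based on the parameter description.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DemoKeyword:
    """Hypothetical stand-in for a keyword carrying a max_count limit."""
    value: str
    max_count: int


def demo_list_configurations(
    keywords: List[DemoKeyword],
    augmentations: List[str],
    voices: List[str],
    truncate: bool = False,
    seed: Optional[int] = None,
) -> Dict[DemoKeyword, List[Tuple[str, str, str]]]:
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    result: Dict[DemoKeyword, List[Tuple[str, str, str]]] = {}
    for keyword in keywords:
        # Every (augmentation, voice) pairing for this keyword
        combos = [
            (keyword.value, aug, voice)
            for aug, voice in itertools.product(augmentations, voices)
        ]
        if truncate:
            rng.shuffle(combos)
            combos = combos[: keyword.max_count]  # assumed per-keyword limit
        result[keyword] = combos
    return result


keywords = [DemoKeyword("on", max_count=3), DemoKeyword("off", max_count=10)]
configs = demo_list_configurations(
    keywords,
    augmentations=["slow", "fast"],
    voices=["voice_a", "voice_b", "voice_c"],
    truncate=True,
    seed=42,
)
for keyword, combos in configs.items():
    print(keyword.value, len(combos))
```

With 2 augmentations and 3 voices there are 6 combinations per keyword, so "on" is cut to 3 entries while "off" keeps all 6 because its limit is higher than the number of combinations.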

count_characters(config)[source]

Count the number of characters that will be sent to each backend

The cloud backends charge per character that is sent. This API returns the number of characters required for each keyword.

Parameters:

config (Dict[Keyword, List[GenerationConfig]]) – Dictionary of keywords and corresponding list of configurations returned by list_configurations()

Return type:

Dict[Keyword, Dict[str, int]]

Returns:

Dictionary<keyword, Dictionary<backend, char count>>
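To make the shape of the returned dictionary concrete, here is a minimal sketch of a per-keyword, per-backend character tally. The names `DemoConfig` and `demo_count_characters` are hypothetical, and the sketch simply counts the length of each text; how a real backend bills characters (for example, whether markup is counted) is not covered here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in: each configuration is a (backend name, text to synthesize) pair
DemoConfig = Tuple[str, str]


def demo_count_characters(
    config: Dict[str, List[DemoConfig]]
) -> Dict[str, Dict[str, int]]:
    counts: Dict[str, Dict[str, int]] = {}
    for keyword, generation_configs in config.items():
        per_backend: Dict[str, int] = defaultdict(int)
        for backend, text in generation_configs:
            per_backend[backend] += len(text)  # one request sends len(text) characters
        counts[keyword] = dict(per_backend)
    return counts


demo = {
    "on": [("backend_a", "on"), ("backend_a", "on"), ("backend_b", "on")],
    "off": [("backend_b", "off")],
}
print(demo_count_characters(demo))
# {'on': {'backend_a': 4, 'backend_b': 2}, 'off': {'backend_b': 3}}
```

Summing the inner values per backend gives a rough estimate of what each backend would be sent before any generation is started.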

get_summary(config, as_dict=False)[source]

Generate a summary of the given configurations

Parameters:
  • config (Dict[Keyword, List[GenerationConfig]]) – Dictionary of keywords and corresponding list of configurations returned by list_configurations()

  • as_dict – If true then return the summary as a dictionary, else return the summary as a string

Return type:

Union[dict, str]

Returns:

If as_dict=True, the summary as a dictionary; otherwise, the summary as a string

generate(config, on_finished=None)[source]

Generate a keyword using the given configuration

This will generate a keyword using the given configuration in the specified out_dir. Processing is done asynchronously in a thread pool. The on_finished callback will be invoked when processing is complete. Alternatively, call join() to wait for all processing to complete.

Parameters:
  • config (GenerationConfig) – The configuration to use for keyword generation

  • on_finished (Callable[[str], None]) – Optional callback to be invoked when generation completes. The parameter given to the callback contains the file path to the generated audio file
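The asynchronous pattern described above (submit work to a thread pool, receive the output path through a callback) can be sketched with the standard library alone. Everything here is a hypothetical stand-in: `demo_generate`, `demo_generate_one`, and the fake file paths are assumptions for illustration and do not reflect how MLTK implements generate().

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional

finished_paths: List[str] = []
lock = threading.Lock()


def demo_generate_one(keyword: str) -> str:
    """Hypothetical worker: pretend to synthesize audio and return the output path."""
    return f"out_dir/{keyword}.wav"


def demo_generate(
    pool: ThreadPoolExecutor,
    keyword: str,
    on_finished: Optional[Callable[[str], None]] = None,
) -> Future:
    future = pool.submit(demo_generate_one, keyword)
    if on_finished is not None:
        # The callback receives the path of the generated file
        future.add_done_callback(lambda f: on_finished(f.result()))
    return future


def record(path: str) -> None:
    # Callbacks may run on worker threads, so guard the shared list
    with lock:
        finished_paths.append(path)


with ThreadPoolExecutor(max_workers=4) as pool:
    for kw in ["on", "off", "stop"]:
        demo_generate(pool, kw, on_finished=record)
# Leaving the `with` block waits for all submitted tasks to finish

print(sorted(finished_paths))
# ['out_dir/off.wav', 'out_dir/on.wav', 'out_dir/stop.wav']
```

Because completion order is not guaranteed in a thread pool, the example sorts the collected paths before printing.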

join(timeout=None)[source]

Wait for all generation tasks to complete

Parameters:

timeout (float) – The maximum amount of time in seconds to wait. If not specified, then wait forever

Return type:

bool

Returns:

True if processing has completed, otherwise False
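The timeout semantics (return False if work is still pending when the timeout expires, True once everything has completed) can be illustrated with `concurrent.futures.wait`. The function `demo_join` is a hypothetical stand-in written for this sketch, not MLTK's join().

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import List, Optional


def demo_join(futures: List[Future], timeout: Optional[float] = None) -> bool:
    """Return True if all futures completed within `timeout` seconds (None waits forever)."""
    _done, not_done = wait(futures, timeout=timeout)
    return len(not_done) == 0


pool = ThreadPoolExecutor(max_workers=2)
futures = [pool.submit(time.sleep, 0.5) for _ in range(2)]

timed_out_result = demo_join(futures, timeout=0.05)  # tasks are still sleeping
completed_result = demo_join(futures)                # no timeout: block until done
pool.shutdown()

print(timed_out_result, completed_result)
```

A caller can use the boolean result to decide whether to keep polling, report progress, or give up and shut the pool down.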

shutdown()[source]

Shut down the underlying thread pool