mltk.utils.audio_dataset_generator.AudioDatasetGenerator

class AudioDatasetGenerator

Utility for generating synthetic keyword datasets

See the Synthetic Audio Dataset Generation tutorial for more details.

Note

The generated audio files are 16kHz, 16-bit PCM .wav files.
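
As a quick sanity check on generated files, the format stated in the note can be verified with Python's standard wave module. This is only a sketch: the helper name is_16khz_16bit_pcm is hypothetical and not part of MLTK, and the channel count is not checked because the note does not state it.

```python
import wave

EXPECTED_SAMPLE_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def is_16khz_16bit_pcm(path: str) -> bool:
    """Return True if the .wav file at `path` is 16 kHz, 16-bit PCM.

    Hypothetical helper for illustration; it is not part of MLTK.
    """
    try:
        # wave.open() raises wave.Error for files it cannot parse
        with wave.open(path, "rb") as wav:
            return (
                wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ
                and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            )
    except wave.Error:
        return False
```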

Parameters:

out_dir (str) – Directory where dataset will be generated
n_jobs (int) – Number of parallel processing jobs

Properties

`is_running`	Return whether the processing pool is active
`out_dir`	Return the output directory where the dataset is generated

Methods

`__init__`
`count_characters`	Count the number of characters that will be sent to each backend
`generate`	Generate a keyword using the given configuration
`get_summary`	Generate a summary of the given configurations
`is_backend_loaded`	Return whether the given backend has been loaded
`join`	Wait for all generation tasks to complete
`list_configurations`	Generate a list of generation configurations
`list_languages`	Return a list of the available language codes
`list_supported_backends`	Return a list of the available backends
`list_voices`	Return a list of the available "voices"
`load_backend`	Load the specified backend
`shutdown`	Shut down the underlying thread pool

__init__(out_dir, n_jobs=4)

Parameters:

out_dir (str) – Directory where dataset will be generated
n_jobs (int) – Number of parallel processing jobs

static list_supported_backends()

Return a list of the available backends

Return type: List[str]

property is_running: bool

Return whether the processing pool is active

Return type: bool

property out_dir: str

Return the output directory where the dataset is generated

Return type: str

is_backend_loaded(backend, raise_exception=False)

Return whether the given backend has been loaded

Return type: bool
Parameters: backend (str) –

load_backend(name, install_python_package=False, **kwargs)

Load the specified backend

NOTE: The backend’s corresponding “credentials” must be provided

Additional kwargs may be passed to the backend’s initialization. Refer to the backend’s docs for the available kwargs:

name=aws -> boto3.session.Session
name=azure -> azure.cognitiveservices.speech.SpeechConfig
name=gcp -> google.cloud.texttospeech.TextToSpeechClient

Parameters:

name (str) – The name of the cloud backend, see list_supported_backends()
install_python_package – If true, then automatically install the backend’s corresponding Python package (if necessary)
kwargs – Additional keyword args to pass to the backend’s Python package (see comment above)
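
For the aws backend, the extra kwargs are described above as going to boto3.session.Session, which accepts aws_access_key_id, aws_secret_access_key, and region_name. A minimal sketch of collecting these from the standard AWS environment variables follows. The helper names are hypothetical and not part of MLTK, and `generator` is assumed to be an AudioDatasetGenerator instance.

```python
import os
from typing import Dict


def aws_session_kwargs_from_env() -> Dict[str, str]:
    """Collect boto3 Session kwargs from the standard AWS environment variables.

    Hypothetical helper for illustration; it is not part of MLTK.
    """
    mapping = {
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "region_name": "AWS_DEFAULT_REGION",
    }
    # Only include variables that are actually set
    return {
        kwarg: os.environ[env_name]
        for kwarg, env_name in mapping.items()
        if env_name in os.environ
    }


def load_aws_backend(generator) -> None:
    """Load the aws backend once, passing any credentials found in the environment.

    `generator` is assumed to be an AudioDatasetGenerator instance.
    """
    if not generator.is_backend_loaded("aws"):
        generator.load_backend("aws", **aws_session_kwargs_from_env())
```

Keeping credentials in environment variables rather than in source code is a common choice; whether it suits a given deployment is up to the user.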

list_languages(backend=None)

Return a list of the available language codes

Parameters: backend (str) – If provided, then only return languages supported by backend, else return languages for all loaded backends
Return type: List[str]
Returns: List of language codes

list_voices(language_code=None, backend=None)

Return a list of the available “voices”

Parameters:

language_code (str) – If provided, then only return voices that support the given language code, else return voices for all languages
backend (str) – If provided, then only return voices supported by backend, else return voices for all loaded backends

Return type:

List[Voice]

Returns:

List of voices

list_configurations(keywords, augmentations, voices, truncate=False, seed=None)

Generate a list of generation configurations

Generate a list of all possible combinations of the given keywords, augmentations, and voices. If the truncate argument is true, then shuffle the generated list and truncate it to the max_count specified in each keyword.

Parameters:

keywords (List[Keyword]) – List of keywords to use for the generation configurations
augmentations (List[Augmentation]) – List of augmentations to apply to each keyword
voices (List[Voice]) – List of voices to use for keyword generation
truncate (bool) – If true, then randomly shuffle all possible combinations and return a truncated list of configurations. The truncated count is specified in the max_count field of the keywords
seed (int) – Seed to use for randomly shuffling the truncated list

Return type:

Dict[Keyword, List[GenerationConfig]]

Returns:

Dictionary of keywords and corresponding list of configurations
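
The combine, shuffle, and truncate behavior described above can be sketched in a few lines. This is not the MLTK implementation: DemoKeyword is a hypothetical stand-in for Keyword that models only a text and the max_count field, augmentations and voices are plain strings, and leaving a keyword untruncated when max_count is None is an assumption.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DemoKeyword:
    """Hypothetical stand-in for Keyword; only `text` and `max_count` are modeled."""
    text: str
    max_count: Optional[int] = None


def list_combinations(
    keywords: Sequence[DemoKeyword],
    augmentations: Sequence[str],
    voices: Sequence[str],
    truncate: bool = False,
    seed: Optional[int] = None,
) -> Dict[DemoKeyword, List[Tuple[str, str]]]:
    """Return every (augmentation, voice) pair for each keyword."""
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    result: Dict[DemoKeyword, List[Tuple[str, str]]] = {}
    for keyword in keywords:
        combos = list(itertools.product(augmentations, voices))
        # Assumption: a keyword without max_count is left untruncated
        if truncate and keyword.max_count is not None:
            rng.shuffle(combos)
            combos = combos[: keyword.max_count]
        result[keyword] = combos
    return result
```

With 2 augmentations and 3 voices, each keyword gets 6 combinations. With truncate=True, a keyword whose max_count is 3 keeps a random 3 of them, and the same seed yields the same selection on each run.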

count_characters(config)

Count the number of characters that will be sent to each backend

The cloud backends charge per character that is sent. This API returns the number of characters required for each keyword.

Parameters: config (Dict[Keyword, List[GenerationConfig]]) – Dictionary of keywords and corresponding list of configurations returned by list_configurations()
Return type: Dict[Keyword, Dict[str, int]]
Returns: Dictionary mapping each keyword to a dictionary of backend name to character count
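
Since billing is per backend rather than per keyword, it can help to collapse the returned dictionary into one total per backend. A minimal sketch, assuming the documented return shape; the helper name is hypothetical, and plain strings can stand in for Keyword objects when experimenting.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping


def total_characters_per_backend(
    char_counts: Mapping[Hashable, Mapping[str, int]],
) -> Dict[str, int]:
    """Sum the per-keyword character counts into one total per backend.

    Hypothetical helper for illustration; it is not part of MLTK.
    `char_counts` is assumed to have the shape returned by count_characters().
    """
    totals: Counter = Counter()
    for per_backend in char_counts.values():
        # Counter.update() adds counts for matching backend names
        totals.update(per_backend)
    return dict(totals)
```

For example, {"on": {"aws": 120, "gcp": 80}, "off": {"aws": 150}} yields {"aws": 270, "gcp": 80}. Multiplying each total by the provider's current per-character price gives a rough cost estimate before any audio is generated.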

get_summary(config, as_dict=False)

Generate a summary of the given configurations

Parameters:

config (Dict[Keyword, List[GenerationConfig]]) – Dictionary of keywords and corresponding list of configurations returned by list_configurations()
as_dict (bool) – If true, then return the summary as a dictionary, else return the summary as a string

Return type:

Union[dict, str]

Returns:

If as_dict=True, the summary as a dictionary; otherwise, the summary as a string

generate(config, on_finished=None)

Generate a keyword using the given configuration

This generates a keyword in the specified out_dir using the given configuration. Processing is done asynchronously in a thread pool. The on_finished callback, if provided, is invoked when processing is complete. Alternatively, call join() to wait for all processing to complete.

Parameters:

config (GenerationConfig) – The configuration to use for keyword generation
on_finished (Callable[[str], None]) – Optional callback to be invoked when generation completes. The parameter given to the callback contains the file path to the generated audio file
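
A typical pattern is to queue many configurations, collect the output paths through on_finished, and then block on join(). The sketch below uses only the generate() and join() calls documented on this page. The helper name generate_all is hypothetical, `generator` is assumed to be an AudioDatasetGenerator instance, and the lock reflects the assumption that callbacks may run on worker threads.

```python
import threading
from typing import Iterable, List, Optional


def generate_all(generator, configs: Iterable, timeout: Optional[float] = None) -> List[str]:
    """Queue every configuration and return the paths of the generated files.

    Hypothetical helper for illustration; it is not part of MLTK.
    `generator` is assumed to be an AudioDatasetGenerator instance.
    """
    paths: List[str] = []
    lock = threading.Lock()

    def on_finished(path: str) -> None:
        # Assumption: callbacks may run on worker threads, so guard the shared list
        with lock:
            paths.append(path)

    for config in configs:
        generator.generate(config, on_finished=on_finished)

    # join() returns False if the timeout expires before processing completes
    if not generator.join(timeout=timeout):
        raise TimeoutError("generation did not finish within the timeout")
    return paths
```

The returned paths arrive in completion order, which may differ from the order in which the configurations were queued.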

join(timeout=None)

Wait for all generation tasks to complete

Parameters: timeout (float) – The maximum amount of time in seconds to wait. If not specified, then wait forever
Return type: bool
Returns: True if processing has completed, otherwise False
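
Because join() returns False when the timeout expires, it can be called repeatedly with a short timeout to keep the caller responsive while still enforcing an overall deadline. A minimal sketch; the helper name and its parameters are hypothetical, and `generator` is assumed to be an AudioDatasetGenerator instance.

```python
import time


def wait_until_done(generator, poll_seconds: float = 1.0, max_wait_seconds: float = 60.0) -> bool:
    """Poll join() in short intervals until processing completes or the deadline passes.

    Hypothetical helper for illustration; it is not part of MLTK.
    Returns True if processing completed, False if the deadline passed first.
    """
    deadline = time.monotonic() + max_wait_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never wait past the overall deadline
        if generator.join(timeout=min(poll_seconds, remaining)):
            return True
        # Progress reporting or cancellation checks could go here
```

Once this returns, shutdown() can be called to release the thread pool.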

shutdown()

Shut down the underlying thread pool