mltk.utils.audio_dataset_generator.AudioDatasetGenerator
- class AudioDatasetGenerator
Utility for generating synthetic keyword datasets
See the Synthetic Audio Dataset Generation tutorial for more details.
Note
The generated audio files are 16 kHz, 16-bit PCM .wav files.
- Parameters:
out_dir (str) – Directory where the dataset will be generated
n_jobs (int) – Number of parallel processing jobs
Properties
is_running – Return whether the processing pool is active
out_dir – Return the output directory where the dataset is generated
Methods
count_characters() – Count the number of characters that will be sent to each backend
generate() – Generate a keyword using the given configuration
get_summary() – Generate a summary of the given configurations
is_backend_loaded() – Return whether the given backend has been loaded
join() – Wait for all generation tasks to complete
list_configurations() – Generate a list of generation configurations
list_languages() – Return a list of the available language codes
list_supported_backends() – Return a list of the available backends
list_voices() – Return a list of the available "voices"
load_backend() – Load the specified backend
Shut down the underlying thread pool
- static list_supported_backends()
Return a list of the available backends
- Return type:
List[str]
- property is_running: bool
Return whether the processing pool is active
- Return type:
bool
- property out_dir: str
Return the output directory where the dataset is generated
- Return type:
str
- is_backend_loaded(backend, raise_exception=False)
Return whether the given backend has been loaded
- Return type:
bool
- Parameters:
backend (str) – Name of the backend to check
- load_backend(name, install_python_package=False, **kwargs)
Load the specified backend
NOTE: The backend’s corresponding “credentials” must be provided
Additional kwargs may be passed to the backend’s initialization. Refer to the backend’s docs for the available kwargs:
name=aws –> boto3.session.Session
name=azure –> azure.cognitiveservices.speech.SpeechConfig
name=gcp –> google.cloud.texttospeech.TextToSpeechClient
- Parameters:
name (str) – The name of the cloud backend, see list_supported_backends()
install_python_package – If true, then automatically install the backend’s corresponding Python package (if necessary)
kwargs – Additional keyword args to pass to the backend’s Python package (see the note above)
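The loading flow described above could be wrapped in a small helper. This is only a sketch: it assumes generator is an AudioDatasetGenerator (or any object exposing the same three methods), that an already loaded backend does not need to be loaded again, and that backend_kwargs is a hypothetical mapping introduced here for illustration.

```python
from typing import Dict, Iterable, Optional


def ensure_backends_loaded(
    generator,
    backend_names: Iterable[str],
    backend_kwargs: Optional[Dict[str, dict]] = None,
) -> None:
    """Load each requested backend unless it is already loaded.

    `backend_kwargs` is a hypothetical mapping of backend name to the extra
    keyword arguments that should be forwarded to load_backend().
    """
    backend_kwargs = backend_kwargs or {}
    supported = generator.list_supported_backends()
    for name in backend_names:
        if name not in supported:
            raise ValueError(f"Unsupported backend {name!r}; choose from {supported}")
        if generator.is_backend_loaded(name):
            # Assumption: loading the same backend twice is unnecessary
            continue
        generator.load_backend(
            name,
            install_python_package=True,
            **backend_kwargs.get(name, {}),
        )
```

For example, passing {"aws": {"region_name": "us-east-1"}} would forward region_name to the aws backend; boto3.session.Session accepts that argument, provided the kwargs reach it unchanged. Credentials still have to be supplied as the note above requires.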
- list_languages(backend=None)
Return a list of the available language codes
- Parameters:
backend (str) – If provided, then only return languages supported by backend, else return languages for all loaded backends
- Return type:
List[str]
- Returns:
List of language codes
- list_voices(language_code=None, backend=None)
Return a list of the available “voices”
- Parameters:
language_code (str) – If provided, then only return voices that support the given language code, else return voices for all languages
backend (str) – If provided, then only return voices supported by backend, else return voices for all loaded backends
- Return type:
List[Voice]
- Returns:
List of voices
- list_configurations(keywords, augmentations, voices, truncate=False, seed=None)
Generate a list of generation configurations
Generate a list of all possible combinations of the given keywords, augmentations, and voices. If the truncate argument is true, then shuffle the generated list and return a list truncated to the max_count specified in each of the keywords.
- Parameters:
keywords (List[Keyword]) – List of keywords to use for the generation configurations
augmentations (List[Augmentation]) – List of augmentations to apply to each keyword
voices (List[Voice]) – List of voices to use for keyword generation
truncate – If true, then randomly shuffle all possible combinations and return a truncated list of configurations. The truncated count is specified in the max_count field of the keywords
seed (int) – Seed to use for randomly shuffling the truncated list
- Return type:
Dict[Keyword, List[GenerationConfig]]
- Returns:
Dictionary of keywords and corresponding list of configurations
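The combine, shuffle, and truncate behaviour described for list_configurations() can be illustrated with plain Python. The sketch below is not the library's implementation: DemoKeyword is a hypothetical stand-in for Keyword, and augmentations and voices are reduced to plain strings.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DemoKeyword:
    """Hypothetical stand-in for a keyword that carries a max_count field."""
    value: str
    max_count: int


def sketch_configurations(
    keywords: Sequence[DemoKeyword],
    augmentations: Sequence[str],
    voices: Sequence[str],
    truncate: bool = False,
    seed: Optional[int] = None,
) -> Dict[DemoKeyword, List[Tuple[str, str]]]:
    """Return every (augmentation, voice) pair per keyword, optionally truncated."""
    rng = random.Random(seed)  # A fixed seed makes the shuffle repeatable
    result: Dict[DemoKeyword, List[Tuple[str, str]]] = {}
    for keyword in keywords:
        combos = list(itertools.product(augmentations, voices))
        if truncate:
            rng.shuffle(combos)
            combos = combos[: keyword.max_count]
        result[keyword] = combos
    return result
```

With two augmentations and three voices, each keyword starts with six combinations; a keyword whose max_count is 3 keeps three randomly chosen ones when truncate is true, and the same seed yields the same selection.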
- count_characters(config)
Count the number of characters that will be sent to each backend
The cloud backends charge per character that is sent. This API returns the number of characters required for each keyword on each backend.
- Parameters:
config (Dict[Keyword, List[GenerationConfig]]) – Dictionary of keywords and corresponding list of configurations returned by list_configurations()
- Return type:
Dict[Keyword, Dict[str, int]]
- Returns:
Dictionary<keyword, Dictionary<backend, char count>>
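Because billing is per backend rather than per keyword, it can help to collapse the nested result into one total per backend before generating anything. The helper below is a sketch with a hypothetical name; it only assumes the documented return shape of count_characters().

```python
from collections import Counter
from typing import Dict, Hashable, Mapping


def total_characters_per_backend(
    counts: Mapping[Hashable, Mapping[str, int]],
) -> Dict[str, int]:
    """Sum the per-keyword character counts into one total per backend."""
    totals: Counter = Counter()
    for per_backend in counts.values():
        # Counter.update() adds the counts of a mapping to the running totals
        totals.update(per_backend)
    return dict(totals)
```

The totals can then be compared against each cloud provider's current pricing, which is not covered here.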
- get_summary(config, as_dict=False)
Generate a summary of the given configurations
- Parameters:
config (Dict[Keyword, List[GenerationConfig]]) – Dictionary of keywords and corresponding list of configurations returned by list_configurations()
as_dict – If true, then return the summary as a dictionary, else return the summary as a string
- Return type:
Union[dict, str]
- Returns:
If as_dict=True then return the summary as a dictionary, else return the summary as a string
- generate(config, on_finished=None)
Generate a keyword using the given configuration
This will generate a keyword using the given configuration in the specified out_dir. Processing is done asynchronously in a thread pool. The on_finished callback will be invoked when processing is complete. Alternatively, call join() to wait for all processing to complete.
- Parameters:
config (GenerationConfig) – The configuration to use for keyword generation
on_finished (Callable[[str], None]) – Optional callback to be invoked when generation completes. The parameter given to the callback contains the file path to the generated audio file
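A typical way to combine generate(), on_finished, and join() is to queue every configuration and then block until the pool drains. The helper below is a sketch with a hypothetical name; it assumes generator behaves as documented above and that every callback has fired by the time join() returns.

```python
import threading
from typing import Dict, Hashable, List


def generate_all(generator, configs_by_keyword: Dict[Hashable, list]) -> List[str]:
    """Queue every configuration, wait for completion, and return the file paths."""
    paths: List[str] = []
    lock = threading.Lock()

    def on_finished(path: str) -> None:
        # Callbacks may run on worker threads, so guard the shared list
        with lock:
            paths.append(path)

    for configs in configs_by_keyword.values():
        for config in configs:
            generator.generate(config, on_finished=on_finished)

    # Assumption: join() returns only after all queued tasks have completed
    generator.join()
    return sorted(paths)
```

Here configs_by_keyword is expected to be the dictionary returned by list_configurations(). Sorting the paths makes the result independent of the order in which the worker threads finish.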