mltk.core.preprocess.utils.audio

Utilities for processing audio data

Functions

`adjust_length`(sample[, target_sr, ...])	Adjust the audio sample length to fit the out_length parameter. This re-samples the audio to the target sample rate and pads with zeros or crops the input sample as necessary.
`apply_frontend`(sample, settings[, dtype])	Send the audio sample through the AudioFeatureGenerator and return the generated spectrogram
`read_audio_file`(path[, return_sample_rate, ...])	Reads and decodes an audio file.
`write_audio_file`(path, sample, sample_rate)	Write audio data to a file

read_audio_file(path, return_sample_rate=False, return_numpy=True, **kwargs)

Reads and decodes an audio file.

Note

Only mono data is returned as a 1D array/tensor

Parameters:

path (Union[str, ndarray, Tensor]) – Path to the audio file as a Python string, NumPy string, or TensorFlow string
return_sample_rate – If True, a tuple is returned: (audio data, audio sample rate)
return_numpy – If True, return a NumPy array; otherwise return a TF tensor

Return type:

Union[ndarray, Tensor]

Returns:

If return_sample_rate = False: audio data as a NumPy array or TF tensor. If return_sample_rate = True: a tuple of (audio data, sample rate).

write_audio_file(path, sample, sample_rate)

Write audio data to a file

Parameters:

path (str) – File path to save the audio to. If this does NOT end with .wav, then the path is assumed to be a directory. In this case, the audio path is generated as: <path>/<timestamp>.wav
sample (Union[ndarray, Tensor]) – Audio data to write. If the data type is int16, then it is converted to float32 and scaled by 32768
sample_rate (int) – Sample rate of audio

Return type:

Union[str, Tensor]

Returns:

Path to written file. If this is executing in a non-eager TF function then the path is a TF Tensor, otherwise it is a Python string
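
The int16 handling described for the sample parameter can be illustrated with a small standalone sketch. It assumes that "scaled by 32768" means dividing each sample by 32768, so that full-scale int16 audio maps to roughly [-1.0, 1.0); that reading is an assumption, not something this page confirms. `int16_to_float32` is a hypothetical helper that is not part of the library, and the standard-library `array` module stands in for NumPy to keep the sketch dependency-free.

```python
from array import array


def int16_to_float32(samples):
    """Convert int16 PCM samples to float32 values in roughly [-1.0, 1.0).

    Hypothetical helper for illustration only. It assumes "scaled by 32768"
    means dividing each sample by 32768.
    """
    # array("f") stores single-precision (float32) values
    return array("f", (s / 32768.0 for s in samples))


pcm = array("h", [-32768, 0, 16384])  # "h" = signed 16-bit integers
scaled = int16_to_float32(pcm)
print(list(scaled))  # [-1.0, 0.0, 0.5]
```

Under this assumption, float32 input would already be in the expected range and would need no scaling.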

adjust_length(sample, target_sr=None, original_sr=None, out_length=None, offset=0.0, trim_threshold_db=30.0)

Adjust the audio sample length to fit the out_length parameter. This re-samples the audio to the target sample rate and pads with zeros or crops the input sample as necessary.

Parameters:

sample (ndarray) – Audio sample as a numpy array
target_sr (int) – The sample rate to re-sample the audio to. The original_sr argument must also be provided
original_sr (int) – The original sample rate of the given audio
out_length (int) – The length of the output audio sample. If omitted, the input sample length is used
offset – If in_length > out_length, then this is the percentage offset from the beginning of the input to use for the output. If in_length < out_length, then this is the percentage to pad with zeros before the input sample
trim_threshold_db – The threshold (in decibels) below reference to consider as silence

Return type:

ndarray

Returns:

The adjusted audio sample
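
The pad-or-crop behaviour of the offset parameter can be sketched as follows. This is not the library's implementation: `fit_length` is a hypothetical helper, offset is assumed to be a fraction in [0.0, 1.0] of the length difference, and re-sampling and silence trimming are left out. Plain Python lists are used so the sketch has no dependencies; the library function itself operates on NumPy arrays.

```python
def fit_length(sample, out_length, offset=0.0):
    """Crop or zero-pad `sample` to exactly `out_length` items.

    Hypothetical helper, not the library's implementation. `offset` is
    assumed to be a fraction in [0.0, 1.0].
    """
    if not 0.0 <= offset <= 1.0:
        raise ValueError("offset must be within [0.0, 1.0]")
    in_length = len(sample)
    if in_length >= out_length:
        # Too long: skip `offset` of the surplus, then keep out_length items
        start = int((in_length - out_length) * offset)
        return list(sample[start:start + out_length])
    # Too short: put `offset` of the missing length before the input as zeros
    pad_total = out_length - in_length
    pad_before = int(pad_total * offset)
    pad_after = pad_total - pad_before
    return [0.0] * pad_before + list(sample) + [0.0] * pad_after


clip = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cropped = fit_length(clip, out_length=4, offset=0.5)       # surplus of 2, skip 1
padded = fit_length([1.0, 2.0], out_length=6, offset=0.5)  # 4 missing, 2 zeros first
print(cropped)  # [2.0, 3.0, 4.0, 5.0]
print(padded)   # [0.0, 0.0, 1.0, 2.0, 0.0, 0.0]
```

With the default offset of 0.0, this sketch keeps the start of a long input and appends all padding after a short one.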

apply_frontend(sample, settings, dtype=<class 'numpy.float32'>)

Send the audio sample through the AudioFeatureGenerator and return the generated spectrogram

Parameters:

sample (ndarray) – The audio sample to process in the AudioFeatureGenerator
settings (AudioFeatureGeneratorSettings) – The settings to use in the AudioFeatureGenerator
dtype –
The expected audio output data type. Supported types are:
- uint16: This is the raw value generated by the internal AudioFeatureGenerator library
- float32: This is the uint16 value cast directly to a float32
- int8: This is the int8 value generated by the TFLM “micro features” library.
  Refer to the following for the magic that happens here: micro_features_generator.cc#L84

Return type:

ndarray

Returns:

Generated spectrogram of audio