mltk.core.preprocess.utils.audio
Utilities for processing audio data
Functions
- adjust_length – Adjust the audio sample length to fit the out_length parameter. This re-samples the audio to the target sample rate and pads with zeros or crops the input sample as necessary.
- apply_frontend – Send the audio sample through the AudioFeatureGenerator and return the generated spectrogram.
- read_audio_file – Reads and decodes an audio file.
- write_audio_file – Write audio data to a file.
- read_audio_file(path, return_sample_rate=False, return_numpy=True, **kwargs)
Reads and decodes an audio file.
Note
Only mono data is returned as a 1D array/tensor
- Parameters:
path (Union[str, ndarray, Tensor]) – Path to the audio file as a Python string, NumPy string, or TensorFlow string
return_sample_rate – If True, a tuple is returned: (audio data, audio sample rate)
return_numpy – If True, return a NumPy array; otherwise return a TF tensor
- Return type:
Union[ndarray, Tensor]
- Returns:
If return_sample_rate = False, the audio data as a NumPy array or TF tensor. If return_sample_rate = True, a tuple of (audio data, sample rate).
- write_audio_file(path, sample, sample_rate)
Write audio data to a file
- Parameters:
path (str) – File path to save the audio. If this does NOT end with .wav, then the path is assumed to be a directory. In this case, the audio path is generated as: <path>/<timestamp>.wav
sample (Union[ndarray, Tensor]) – Audio data to write. If the data type is int16, it is converted to float32 and scaled by 32768
sample_rate (int) – Sample rate of the audio
- Return type:
Union[str, Tensor]
- Returns:
Path to the written file. If this is executing in a non-eager TF function, the path is a TF Tensor; otherwise it is a Python string.
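The path rule and the int16 handling described above can be sketched in a few lines. Both resolve_audio_path and normalize_int16 are hypothetical helpers. The timestamp format (milliseconds since the epoch) is an assumption, and so is reading "scaled by 32768" as dividing by 32768, which maps int16 audio into the float range [-1.0, 1.0).

```python
import os
import time

import numpy as np


def resolve_audio_path(path):
    """Hypothetical helper mirroring the documented path rule.

    If ``path`` ends with .wav it is used as-is; otherwise it is treated as a
    directory and a ``<path>/<timestamp>.wav`` file name is generated.
    The timestamp format (milliseconds since the epoch) is an assumption.
    """
    if path.endswith(".wav"):
        return path
    timestamp = int(time.time() * 1000)
    return os.path.join(path, f"{timestamp}.wav")


def normalize_int16(sample):
    """Convert int16 audio to float32 by dividing by 32768 (assumed meaning
    of "scaled by 32768"); other dtypes are returned unchanged."""
    sample = np.asarray(sample)
    if sample.dtype == np.int16:
        return sample.astype(np.float32) / 32768.0
    return sample
```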
- adjust_length(sample, target_sr=None, original_sr=None, out_length=None, offset=0.0, trim_threshold_db=30.0)
Adjust the audio sample length to fit the out_length parameter. This re-samples the audio to the target sample rate and pads with zeros or crops the input sample as necessary.
- Parameters:
sample (ndarray) – Audio sample as a NumPy array
target_sr (int) – The sample rate to re-sample the audio to. The original_sr arg must also be provided
original_sr (int) – The original sample rate of the given audio
out_length (int) – The length of the output audio sample. If omitted, the input sample length is used
offset – If in_length > out_length, this is the percentage offset from the beginning of the input to use for the output. If in_length < out_length, this is the percentage of zero padding to insert before the input sample
trim_threshold_db – The threshold (in decibels) below reference to consider as silence
- Return type:
ndarray
- Returns:
The adjusted audio sample
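Here is a NumPy sketch of the re-sample, then crop-or-pad behavior described above. adjust_length_sketch is hypothetical and should not be read as mltk's implementation. Three assumptions are made. Re-sampling uses simple linear interpolation (np.interp). offset is treated as a fraction (0.0 to 1.0) of the length difference between input and output. Silence trimming via trim_threshold_db is not modeled.

```python
import numpy as np


def adjust_length_sketch(sample, target_sr=None, original_sr=None,
                         out_length=None, offset=0.0):
    """Hypothetical illustration of adjust_length.

    Assumptions: linear-interpolation resampling, ``offset`` as a fraction
    of the length difference, and no trim_threshold_db silence trimming.
    """
    sample = np.asarray(sample, dtype=np.float32)

    # Re-sample only when both rates are given and they differ
    if target_sr is not None and original_sr is not None and target_sr != original_sr:
        new_len = int(round(len(sample) * target_sr / original_sr))
        old_t = np.arange(len(sample)) / original_sr
        new_t = np.arange(new_len) / target_sr
        sample = np.interp(new_t, old_t, sample).astype(np.float32)

    if out_length is None:
        return sample

    in_length = len(sample)
    if in_length > out_length:
        # Crop: offset selects where in the surplus the output window starts
        start = int((in_length - out_length) * offset)
        return sample[start:start + out_length]
    if in_length < out_length:
        # Pad: offset selects how much of the zero padding goes before the input
        pad_total = out_length - in_length
        pad_before = int(pad_total * offset)
        return np.pad(sample, (pad_before, pad_total - pad_before))
    return sample
```

With offset=0.0 a long input keeps its beginning and a short input is padded only at the end. With offset=1.0 the opposite happens.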
- apply_frontend(sample, settings, dtype=numpy.float32)
Send the audio sample through the AudioFeatureGenerator and return the generated spectrogram
- Parameters:
sample (ndarray) – The audio sample to process in the AudioFeatureGenerator
settings (AudioFeatureGeneratorSettings) – The settings to use in the AudioFeatureGenerator
dtype – The data type of the generated spectrogram. Supported types are:
- uint16: The raw value generated by the internal AudioFeatureGenerator library
- float32: The uint16 value directly cast to a float32
- int8: The int8 value generated by the TFLM “micro features” library. Refer to micro_features_generator.cc#L84 for the details of this conversion
- Return type:
ndarray
- Returns:
Generated spectrogram of audio
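The three dtype options can be illustrated starting from already-computed uint16 frontend values. This sketch does not run the AudioFeatureGenerator itself. convert_frontend_output is a hypothetical helper. Its int8 branch mirrors the integer arithmetic in TensorFlow Lite Micro's micro_speech micro_features_generator.cc, which the int8 entry above points to: input = (feature * 256) / (25.6 * 26.0) - 128, rounded and clamped to [-128, 127]. Whether mltk applies exactly this formula is an assumption.

```python
import numpy as np


def convert_frontend_output(features_u16, dtype=np.float32):
    """Hypothetical helper: map raw uint16 frontend output to the requested dtype."""
    features = np.asarray(features_u16, dtype=np.uint16)
    if dtype == np.uint16:
        # Raw frontend values, unchanged
        return features
    if dtype == np.float32:
        # Direct cast, no scaling
        return features.astype(np.float32)
    if dtype == np.int8:
        # Integer math modeled on TFLM micro_features_generator.cc
        value_scale = 256
        value_div = int(25.6 * 26.0 + 0.5)  # 666
        values = features.astype(np.int32)
        # Rounded integer division: (v * 256 + 333) // 666
        values = (values * value_scale + value_div // 2) // value_div
        values -= 128
        return np.clip(values, -128, 127).astype(np.int8)
    raise ValueError(f"unsupported dtype: {dtype}")
```

The float32 option keeps the uint16 magnitudes (roughly 0 to 670 in the TFLM comments). The int8 option rescales them into the signed 8-bit range expected by a quantized model input.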