mltk.core.preprocess.utils.audio

Utilities for processing audio data

Functions

adjust_length(sample[, target_sr, ...])

Adjust the audio sample length to fit the out_length parameter. This re-samples the audio to the target sample rate and pads with zeros or crops the input sample as necessary.

apply_frontend(sample, settings[, dtype])

Send the audio sample through the AudioFeatureGenerator and return the generated spectrogram

read_audio_file(path[, return_sample_rate, ...])

Reads and decodes an audio file.

write_audio_file(path, sample, sample_rate)

Write audio data to a file

read_audio_file(path, return_sample_rate=False, return_numpy=True, **kwargs)[source]

Reads and decodes an audio file.

Note

Only mono data is returned as a 1D array/tensor

Parameters:
  • path (Union[str, ndarray, Tensor]) – Path to the audio file as a Python string, NumPy string, or TensorFlow string

  • return_sample_rate – If true then a tuple is returned: (audio data, audio sample rate)

  • return_numpy – If true then return numpy array, else return TF tensor

Return type:

Union[ndarray, Tensor]

Returns:

If return_sample_rate = False: the audio data as a numpy array or TF tensor. If return_sample_rate = True: a tuple of (audio data, sample rate).
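The return convention described above (a bare 1D array by default, a tuple when the sample rate is requested) can be illustrated with a small stand-in. This is only a sketch: `read_mono_wav` is a hypothetical helper built on the standard-library `wave` module, not the mltk implementation, and it handles only 16-bit mono WAV files.

```python
import os
import tempfile
import wave

import numpy as np


def read_mono_wav(path, return_sample_rate=False):
    """Hypothetical stand-in: read a 16-bit mono WAV file as a 1D int16 array."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    audio = np.frombuffer(frames, dtype="<i2")
    # Mirror the documented flag: return a tuple only when it is requested.
    return (audio, sample_rate) if return_sample_rate else audio


# Write one second of silence at 16 kHz, then read it back both ways.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(np.zeros(16000, dtype="<i2").tobytes())

audio_only = read_mono_wav(demo_path)
audio, sample_rate = read_mono_wav(demo_path, return_sample_rate=True)
```

Callers that always need the sample rate can pass return_sample_rate=True and unpack the tuple, as in the last line.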

write_audio_file(path, sample, sample_rate)[source]

Write audio data to a file

Parameters:
  • path (str) – File path to save the audio to. If this does NOT end with .wav, then the path is assumed to be a directory. In this case, the audio path is generated as: <path>/<timestamp>.wav

  • sample (Union[ndarray, Tensor]) – Audio data to write. If the data type is int16, then it is converted to float32 and scaled by 32768

  • sample_rate (int) – Sample rate of audio

Return type:

Union[str, Tensor]

Returns:

Path to written file. If this is executing in a non-eager TF function then the path is a TF Tensor, otherwise it is a Python string
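The two rules in the parameter descriptions above (path resolution and int16 conversion) might look roughly like the following. This is a sketch under stated assumptions, not the library's code: `resolve_wav_path` and `int16_to_float32` are hypothetical names, the timestamp format is a guess, and "scaled by 32768" is assumed to mean division so that samples land in [-1.0, 1.0).

```python
import os
import time

import numpy as np


def resolve_wav_path(path):
    """Return `path` if it ends with .wav, else `<path>/<timestamp>.wav`."""
    if path.endswith(".wav"):
        return path
    # Assumption: millisecond epoch timestamp; the real format is not documented here.
    timestamp = int(time.time() * 1000)
    return os.path.join(path, f"{timestamp}.wav")


def int16_to_float32(sample):
    """Convert int16 PCM to float32, assuming the scaling is a division by 32768."""
    if sample.dtype == np.int16:
        return sample.astype(np.float32) / 32768.0
    return sample.astype(np.float32)
```

For example, `int16_to_float32(np.array([16384], dtype=np.int16))` yields a float32 array holding 0.5 under this assumption.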

adjust_length(sample, target_sr=None, original_sr=None, out_length=None, offset=0.0, trim_threshold_db=30.0)[source]

Adjust the audio sample length to fit the out_length parameter. This re-samples the audio to the target sample rate and pads with zeros or crops the input sample as necessary.

Parameters:
  • sample (ndarray) – Audio sample as a numpy array

  • target_sr (int) – The sample rate to re-sample the audio to. The original_sr arg must also be provided

  • original_sr (int) – The original sample rate of the given audio

  • out_length (int) – The length of the output audio sample. If omitted then return the input sample length

  • offset – If in_length > out_length, then this is the percentage offset from the beginning of the input to use for the output. If in_length < out_length, then this is the percentage to pad with zeros before the input sample

  • trim_threshold_db – The threshold (in decibels) below reference to consider as silence

Return type:

ndarray

Returns:

The adjusted audio sample
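The pad-or-crop behaviour of the offset parameter can be sketched in plain NumPy. This is an illustration only: `pad_or_crop` is a hypothetical helper, it skips re-sampling and silence trimming entirely, and it assumes offset is a fraction in [0, 1] of the length difference between input and output, which the description above does not state explicitly.

```python
import numpy as np


def pad_or_crop(sample, out_length=None, offset=0.0):
    """Pad with zeros or crop a 1D array so its length equals out_length."""
    in_length = len(sample)
    if out_length is None:
        # No target length given: keep the input length.
        return sample
    diff = abs(in_length - out_length)
    # Assumption: offset is a fraction of the length difference.
    start = int(diff * offset)
    if in_length > out_length:
        # Crop: skip `start` samples from the beginning of the input.
        return sample[start:start + out_length]
    if in_length < out_length:
        # Pad: `start` zeros before the input, the remainder after it.
        return np.pad(sample, (start, diff - start))
    return sample
```

Under this reading, offset=0.0 keeps the start of a long input (or left-aligns a short one), and offset=0.5 centres the input within the output.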

apply_frontend(sample, settings, dtype=<class 'numpy.float32'>)[source]

Send the audio sample through the AudioFeatureGenerator and return the generated spectrogram

Parameters:
  • sample (ndarray) – The audio sample to process in the AudioFeatureGenerator

  • settings (AudioFeatureGeneratorSettings) – The settings to use in the AudioFeatureGenerator

  • dtype

    The expected audio output data type. Supported types are:

    • uint16: This is the raw value generated by the internal AudioFeatureGenerator library

    • float32: This is the uint16 value cast directly to float32

    • int8: This is the int8 value generated by the TFLM “micro features” library.

      Refer to the following for the magic that happens here: micro_features_generator.cc#L84

Return type:

ndarray

Returns:

Generated spectrogram of audio
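The dtype options above amount to different views of the same uint16 spectrogram. The sketch below shows the float32 cast and, for int8, the scaling as recalled from the TFLM micro_speech feature generator; treat the int8 constants as an assumption and check them against the linked micro_features_generator.cc. Both function names are hypothetical and this is not the mltk code path.

```python
import numpy as np


def to_float32(spectrogram_u16):
    """float32 option: cast the raw uint16 values directly, with no rescaling."""
    return spectrogram_u16.astype(np.float32)


def to_int8_micro_features(spectrogram_u16):
    """int8 option, assuming the TFLM micro-features scaling (rounded integer math)."""
    value_scale = 256
    value_div = int(25.6 * 26.0 + 0.5)  # evaluates to 666
    values = spectrogram_u16.astype(np.int32)
    # Rounded integer division, then shift into the signed int8 range.
    scaled = (values * value_scale + value_div // 2) // value_div - 128
    return np.clip(scaled, -128, 127).astype(np.int8)
```

With these constants a raw value of 0 maps to -128, and raw values of about 666 or more saturate at 127.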