AudioFeatureGenerator

class mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator(settings)

AudioFeatureGenerator Interface

__init__(settings)
Parameters

settings (AudioFeatureGeneratorSettings) – The settings to use for processing the audio sample

Methods

  • __init__ – settings is of type AudioFeatureGeneratorSettings

  • activity_was_detected – Return whether activity was detected in the previously processed sample

  • process_sample – Convert the provided 1D audio sample to a 2D spectrogram using the AudioFeatureGenerator

process_sample(sample, dtype=numpy.float32)

Convert the provided 1D audio sample to a 2D spectrogram using the AudioFeatureGenerator

The generated 2D spectrogram dimensions are calculated as follows:

sample_length = len(sample) = int(sample_length_ms * sample_rate_hz / 1000)
window_size_length = int(window_size_ms * sample_rate_hz / 1000)
window_step_length = int(window_step_ms * sample_rate_hz / 1000)
height = n_features = (sample_length - window_size_length) // window_step_length + 1
width = n_channels = AudioFeatureGeneratorSettings.filterbank_n_channels
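
The calculation above can be sketched in plain Python. This is only an illustration of the arithmetic, not part of the MLTK API: the helper name spectrogram_shape is hypothetical, and the example values (1000 ms sample, 16 kHz, 30 ms window, 20 ms step, 40 channels) are assumptions chosen for the demonstration.

```python
def spectrogram_shape(sample_length_ms, sample_rate_hz,
                      window_size_ms, window_step_ms,
                      filterbank_n_channels):
    """Return (height, width) of the spectrogram for the given settings.

    Hypothetical helper that mirrors the formulas in the documentation;
    it does not call into the AudioFeatureGenerator itself.
    """
    # Convert the millisecond durations to sample counts
    sample_length = int(sample_length_ms * sample_rate_hz / 1000)
    window_size_length = int(window_size_ms * sample_rate_hz / 1000)
    window_step_length = int(window_step_ms * sample_rate_hz / 1000)

    # Number of full windows that fit in the sample
    height = (sample_length - window_size_length) // window_step_length + 1
    # One column per filterbank channel
    width = filterbank_n_channels
    return height, width


# Assumed example values, not defaults taken from the library
shape = spectrogram_shape(
    sample_length_ms=1000,
    sample_rate_hz=16000,
    window_size_ms=30,
    window_step_ms=20,
    filterbank_n_channels=40,
)
print(shape)  # (49, 40)
```

With these assumed values the sample holds 16000 points, each window covers 480 points and advances by 320, so 49 windows fit and the spectrogram is 49 x 40.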

The dtype argument specifies the data type of the returned spectrogram. This must be one of the following:

  • uint16: This is the raw value generated by the internal AudioFeatureGenerator library

  • float32: This is the uint16 value cast directly to float32

  • int8: This is the int8 value generated by the TFLM “micro features” library.

    For details on how this value is computed, refer to: micro_features_generator.cc#L84

Parameters
  • sample (ndarray) – [sample_length] int16 audio sample

  • dtype – Output data type, must be int8, uint16, or float32

Return type

ndarray

Returns

[n_features, n_channels] int8, uint16, or float32 spectrogram

activity_was_detected()

Return whether activity was detected in the previously processed sample

Return type

bool