AudioFeatureGenerator

class mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator(settings)

AudioFeatureGenerator Interface

process_sample(sample, dtype=numpy.float32)

Convert the provided 1D audio sample to a 2D spectrogram using the AudioFeatureGenerator

The generated 2D spectrogram dimensions are calculated as follows:

sample_length = len(sample) = int(sample_length_ms * sample_rate_hz / 1000)
window_size_length = int(window_size_ms * sample_rate_hz / 1000)
window_step_length = int(window_step_ms * sample_rate_hz / 1000)
height = n_features = (sample_length - window_size_length) // window_step_length + 1
width = n_channels = AudioFeatureGeneratorSettings.filterbank_n_channels
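As a quick check of these formulas, the following Python sketch computes the spectrogram shape from the settings. The helper name spectrogram_shape is our own and is not part of the mltk API. Its default arguments mirror the defaults documented for AudioFeatureGeneratorSettings below.

```python
from typing import Tuple


def spectrogram_shape(
    sample_rate_hz: int = 16000,
    sample_length_ms: int = 1000,
    window_size_ms: int = 25,
    window_step_ms: int = 10,
    filterbank_n_channels: int = 32,
) -> Tuple[int, int]:
    """Hypothetical helper: return (n_features, n_channels) per the formulas above."""
    sample_length = int(sample_length_ms * sample_rate_hz / 1000)
    window_size_length = int(window_size_ms * sample_rate_hz / 1000)
    window_step_length = int(window_step_ms * sample_rate_hz / 1000)
    # Count the full windows that fit in the sample, advancing one step at a time
    n_features = (sample_length - window_size_length) // window_step_length + 1
    return n_features, filterbank_n_channels


print(spectrogram_shape())  # (98, 32)
```

With the defaults (16 kHz, 1000 ms, 25 ms window, 10 ms step, 32 channels), the sample holds 16000 samples, each window spans 400 samples and the step is 160 samples. This gives (16000 - 400) // 160 + 1 = 98 frames.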

The dtype argument specifies the data type of the returned spectrogram. This must be one of the following:

  • uint16: This is the raw value generated by the internal AudioFeatureGenerator library

  • float32: This is the uint16 value cast directly to float32

  • int8: This is the int8 value generated by the TFLM “micro features” library.

    For details of this conversion, refer to micro_features_generator.cc#L84
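For reference, the following sketch reproduces the uint16 to int8 scaling as it appears in the TFLM micro_speech example referenced above. Each feature is scaled by 256 / (25.6 * 26.0), rounded with integer math, shifted by -128, and clamped to the int8 range. This is our reading of the TFLM code, not a copy of mltk's implementation, and the helper name uint16_to_int8 is hypothetical.

```python
import numpy as np


def uint16_to_int8(features: np.ndarray) -> np.ndarray:
    """Hypothetical helper approximating the TFLM micro_speech feature scaling:
    int8 = round(feature * 256 / (25.6 * 26.0)) - 128, clamped to [-128, 127].
    """
    value_scale = 256
    value_div = int(25.6 * 26.0 + 0.5)  # 666
    v = features.astype(np.int32)
    # Integer division with rounding: add half the divisor before dividing
    v = (v * value_scale + value_div // 2) // value_div
    v -= 128
    return np.clip(v, -128, 127).astype(np.int8)
```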

Parameters
  • sample (ndarray) – [sample_length] int16 audio sample

  • dtype – Output data type, must be int8, uint16, or float32

Return type

ndarray

Returns

[n_features, n_channels] int8, uint16, or float32 spectrogram

activity_was_detected()

Return whether activity was detected in the previously processed sample

Return type

bool

AudioFeatureGenerator Settings

class mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGeneratorSettings(*args, **kwargs)


See the Audio Feature Generator guide for more details.

property spectrogram_shape: Tuple[int, int]

Return the generated spectrogram shape as (height, width), i.e. (n_features, filterbank_n_channels)

Return type

Tuple[int, int]

property sample_rate_hz: int

The sample rate of the audio in Hz, default 16000

Return type

int

property sample_length_ms: int

The length of an audio sample in milliseconds, default 1000

Return type

int

property window_size_ms: int

Length of each time frame (analysis window) in milliseconds, default 25

Return type

int

property window_step_ms: int

Step size between the starts of consecutive frames in milliseconds, default 10

Return type

int

property filterbank_n_channels: int

The number of filterbank channels to use, default 32

Return type

int

property filterbank_upper_band_limit: float

The highest frequency, in Hz, included in the filterbanks, default 7500.0. NOTE: This should be no more than sample_rate_hz / 2 (the Nyquist frequency).

Return type

float

property filterbank_lower_band_limit: float

The lowest frequency, in Hz, included in the filterbanks, default 125.0

Return type

float
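The two band limits bound the filterbank. As an illustration, the sketch below assumes the channels are triangular filters spaced evenly on the mel scale between filterbank_lower_band_limit and filterbank_upper_band_limit, which is how the TFLM microfrontend filterbank is laid out. We have not confirmed that the AudioFeatureGenerator uses exactly the same layout, and the function names are hypothetical. The sketch also enforces the sample_rate_hz / 2 limit noted above.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    # Mel formula assumed here (the one used by the TFLM microfrontend)
    return 1127.0 * math.log1p(freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel
    return 700.0 * math.expm1(mel / 1127.0)


def filterbank_edges_hz(
    n_channels: int = 32,
    lower_hz: float = 125.0,
    upper_hz: float = 7500.0,
    sample_rate_hz: int = 16000,
) -> List[float]:
    """Return n_channels + 2 band edges in Hz. Channel i spans
    edges[i]..edges[i + 2] and peaks at edges[i + 1]."""
    if not 0.0 < lower_hz < upper_hz <= sample_rate_hz / 2:
        raise ValueError("require 0 < lower_hz < upper_hz <= sample_rate_hz / 2")
    mel_low, mel_high = hz_to_mel(lower_hz), hz_to_mel(upper_hz)
    step = (mel_high - mel_low) / (n_channels + 1)
    # Evenly spaced in mel, so channels become wider in Hz at higher frequencies
    return [mel_to_hz(mel_low + step * i) for i in range(n_channels + 2)]
```

Spacing the edges evenly in mel rather than in Hz gives more resolution at low frequencies, where most speech energy lies.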

property noise_reduction_enable: bool

Enable or disable the noise reduction module, default False

Return type

bool

property noise_reduction_smoothing_bits: int

Scale up the signal by 2^(noise_reduction_smoothing_bits) before noise reduction, default 10

Return type

int

property noise_reduction_even_smoothing: float

Smoothing coefficient for even-numbered channels, default 0.025

Return type

float

property noise_reduction_odd_smoothing: float

Smoothing coefficient for odd-numbered channels, default 0.06

Return type

float

property noise_reduction_min_signal_remaining: float

Fraction of the signal to preserve during smoothing, default 0.05

Return type

float
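To make the noise reduction parameters concrete, here is a floating-point sketch of per-channel spectral subtraction modeled on the TFLM microfrontend noise reduction stage (our assumption about what this module does internally). Each channel keeps a running noise estimate, updated with noise_reduction_even_smoothing or noise_reduction_odd_smoothing depending on the channel index. The estimate is subtracted from the signal while preserving at least noise_reduction_min_signal_remaining of it. The fixed-point scaling controlled by noise_reduction_smoothing_bits is omitted, and noise_reduction_step is a hypothetical name.

```python
from typing import List, Tuple


def noise_reduction_step(
    signal: List[float],
    estimate: List[float],
    even_smoothing: float = 0.025,
    odd_smoothing: float = 0.06,
    min_signal_remaining: float = 0.05,
) -> Tuple[List[float], List[float]]:
    """Process one frame of filterbank energies; return (output, new_estimate)."""
    output, new_estimate = [], []
    for i, x in enumerate(signal):
        # Even- and odd-numbered channels use different smoothing coefficients
        s = even_smoothing if i % 2 == 0 else odd_smoothing
        # Exponential moving average that slowly tracks the noise floor
        est = s * x + (1.0 - s) * estimate[i]
        new_estimate.append(est)
        # Subtract the noise estimate, but keep a minimum fraction of the input
        output.append(max(x - est, min_signal_remaining * x))
    return output, new_estimate
```

With a constant input the estimate converges to the input, so after enough frames only the preserved fraction (5% with the defaults) remains. A steady background hum is therefore suppressed, while sudden changes pass through largely intact.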

property pcan_enable: bool

Enable PCAN automatic gain control, default False

Return type

bool

property pcan_strength: float

Gain normalization exponent, default 0.95

Return type

float

property pcan_offset: float

Positive value added to the normalization denominator, default 80.0

Return type

float

property pcan_gain_bits: int

Number of fractional bits in the gain, default 21

Return type

int
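The PCAN parameters fit together roughly as follows. In the TFLM microfrontend, which we assume the AudioFeatureGenerator follows, the per-channel gain is (offset + noise_estimate) ** -strength, where noise_estimate comes from the noise reduction stage. That gain is stored as a fixed-point number with pcan_gain_bits fractional bits. The sketch below shows only this gain computation and omits the subsequent compression step; the function names are hypothetical.

```python
def pcan_gain(noise_estimate: float, strength: float = 0.95, offset: float = 80.0) -> float:
    # A louder background noise estimate yields a smaller gain,
    # which evens out levels across channels and over time
    return (offset + noise_estimate) ** -strength


def gain_to_fixed_point(gain: float, gain_bits: int = 21) -> int:
    # Represent the gain as an integer with gain_bits fractional bits
    return round(gain * (1 << gain_bits))


def pcan_normalize(signal: float, noise_estimate: float) -> float:
    # Apply the noise-dependent gain to a single channel value
    return signal * pcan_gain(noise_estimate)
```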

property log_scale_enable: bool

Enable logarithmic scaling of the filterbank outputs, default True

Return type

bool

property log_scale_shift: int

Scale the filterbank outputs by 2^(log_scale_shift), default 6

Return type

int
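Taken together, log_scale_enable and log_scale_shift produce values of approximately ln(x) * 2^log_scale_shift. This is what the fixed-point log stage of the TFLM microfrontend computes, and we assume the AudioFeatureGenerator behaves the same way. The floating-point sketch below, with the hypothetical name log_scale, clamps inputs below 1 so that they map to 0, then saturates at the uint16 maximum.

```python
import numpy as np


def log_scale(filterbank: np.ndarray, scale_shift: int = 6) -> np.ndarray:
    # Clamp inputs below 1 so that they map to ln(1) = 0
    x = np.maximum(filterbank.astype(np.float64), 1.0)
    # Natural log scaled by 2^scale_shift, rounded to the nearest integer
    out = np.round(np.log(x) * (1 << scale_shift))
    # Saturate at the largest uint16 value
    return np.minimum(out, 65535).astype(np.uint16)
```

With the default shift of 6, an input of 1000 maps to round(ln(1000) * 64) = 442.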

property activity_detection_enable: bool

Enable the activity detection block, which indicates when activity, such as a speech command, is detected in the audio stream. Default False

Return type

bool

property activity_detection_alpha_a: float

Activity detection filter A coefficient: the coefficient k of the activity detection “fast filter”. The filter is a single real-pole IIR filter that computes out = (1-k)*in + k*out. Default 0.5

Return type

float

property activity_detection_alpha_b: float

Activity detection filter B coefficient: the coefficient k of the activity detection “slow filter”. The filter is a single real-pole IIR filter that computes out = (1-k)*in + k*out. Default 0.8

Return type

float
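Both activity detection filters use the same one-pole recurrence, out = (1-k)*in + k*out. A smaller k weights the new input more heavily, so the “fast filter” (activity_detection_alpha_a, default 0.5) tracks changes sooner than the “slow filter” (activity_detection_alpha_b, default 0.8). The sketch below, with the hypothetical name one_pole_iir, applies both filters to a step input. How the detection block compares the two outputs against the arm and trip thresholds is not described in this reference and is not modeled here.

```python
from typing import Iterable, List


def one_pole_iir(samples: Iterable[float], k: float) -> List[float]:
    out: List[float] = []
    y = 0.0
    for x in samples:
        # out = (1-k)*in + k*out
        y = (1.0 - k) * x + k * y
        out.append(y)
    return out


step = [1.0] * 5
fast = one_pole_iir(step, k=0.5)  # default activity_detection_alpha_a
slow = one_pole_iir(step, k=0.8)  # default activity_detection_alpha_b
```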

property activity_detection_arm_threshold: float

Threshold for arming the detection block: the threshold at which possible activity is considered present in the audio stream. Default 0.75

Return type

float

property activity_detection_trip_threshold: float

Threshold for tripping the detection block: the threshold at which activity is considered detected in the audio stream. Default 0.8

Return type

float

property dc_notch_filter_enable: bool

Enable the DC notch filter, which helps remove any DC component from the audio signal. Default False

Return type

bool

property dc_notch_filter_coefficient: float

Coefficient used by the DC notch filter

The DC notch filter coefficient k, in Q(16,15) format, where H(z) = (1 - z^-1)/(1 - k*z^-1). Default 0.95

Return type

float
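Rearranging H(z) = (1 - z^-1)/(1 - k*z^-1) gives the difference equation y[n] = x[n] - x[n-1] + k*y[n-1]. The numerator places a zero at DC, and a pole at z = k close to 1 keeps the notch narrow. The floating-point sketch below implements that equation together with the Q(16,15) encoding of k (15 fractional bits, so 0.95 becomes 31130). The function names are hypothetical, and the actual filter may be implemented in fixed point.

```python
from typing import List


def to_q15(k: float) -> int:
    # Q(16,15): 16-bit signed value with 15 fractional bits
    return round(k * (1 << 15))


def dc_notch(samples: List[float], k: float = 0.95) -> List[float]:
    # y[n] = x[n] - x[n-1] + k * y[n-1], with zero initial state
    out: List[float] = []
    x_prev, y_prev = 0.0, 0.0
    for x in samples:
        y = x - x_prev + k * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out
```

A constant (pure DC) input decays as k^n, so at k = 0.95 it is almost entirely removed after a few hundred samples.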

property quantize_dynamic_scale_enable: bool

Enable dynamic quantization

Enable dynamic quantization of the generated audio spectrogram. When enabled, the maximum spectrogram value is mapped to +127, and the maximum minus quantize_dynamic_scale_range_db is mapped to -128. Anything below that level is also mapped to -128. Default False

Return type

bool

property quantize_dynamic_scale_range_db: float

The dynamic range in dB used by dynamic quantization, default 40.0

Return type

float
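The sketch below illustrates the mapping described for quantize_dynamic_scale_enable. It relies on two assumptions of ours: the input spectrogram is already expressed in dB, and values between the two endpoints are mapped linearly. The function name dynamic_quantize is hypothetical.

```python
import numpy as np


def dynamic_quantize(spectrogram_db: np.ndarray, range_db: float = 40.0) -> np.ndarray:
    top = float(spectrogram_db.max())
    bottom = top - range_db
    # Linear map: bottom -> -128, top -> +127
    scaled = (spectrogram_db - bottom) / range_db * 255.0 - 128.0
    # Anything below (max - range_db) is clamped to -128
    return np.clip(np.round(scaled), -128, 127).astype(np.int8)
```

Because the scale is anchored to the maximum of each spectrogram, quiet and loud recordings both use the full int8 range.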