mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGeneratorSettings

class AudioFeatureGeneratorSettings

Settings for the AudioFeatureGenerator

Example Usage

from mltk.core.preprocess.audio.audio_feature_generator import AudioFeatureGeneratorSettings

# Define the settings used to convert the audio into a spectrogram
frontend_settings = AudioFeatureGeneratorSettings()

frontend_settings.sample_rate_hz = 16000
frontend_settings.sample_length_ms = 1200
frontend_settings.window_size_ms = 30
frontend_settings.window_step_ms = 10
frontend_settings.filterbank_n_channels = 108
frontend_settings.filterbank_upper_band_limit = 7500.0
frontend_settings.filterbank_lower_band_limit = 125.0
frontend_settings.noise_reduction_enable = True
frontend_settings.noise_reduction_smoothing_bits = 10
frontend_settings.noise_reduction_even_smoothing = 0.025
frontend_settings.noise_reduction_odd_smoothing = 0.06
frontend_settings.noise_reduction_min_signal_remaining = 0.40
frontend_settings.quantize_dynamic_scale_enable = True # Enable dynamic quantization
frontend_settings.quantize_dynamic_scale_range_db = 40.0

# If this is used in a model specification file,
# be sure to add the Audio Feature generator settings to the model parameters.
# This way, they are included in the generated .tflite model file
# See https://siliconlabs.github.io/mltk/docs/guides/model_parameters.html
my_model.model_parameters.update(frontend_settings)

See the Audio Feature Generator guide for more details.

Properties

activity_detection_alpha_a

Activity detection filter A coefficient: the activity detection "fast filter" coefficient.

activity_detection_alpha_b

Activity detection filter B coefficient: the activity detection "slow filter" coefficient.

activity_detection_arm_threshold

Threshold for arming the detection block: the level at which activity is considered possible in the audio stream. Default 0.75

activity_detection_enable

Enable the activity detection block.

activity_detection_trip_threshold

Threshold for tripping the detection block: the level at which activity is considered detected in the audio stream. Default 0.8

dc_notch_filter_coefficient

Coefficient used by DC notch filter

dc_notch_filter_enable

Enable the DC notch filter, which helps remove any DC component from the audio signal. Default False

fft_length

The calculated size required to do an FFT.

filterbank_lower_band_limit

The lowest frequency included in the filterbanks, default 125.0

filterbank_n_channels

The number of filterbank channels to use, default 32

filterbank_upper_band_limit

The highest frequency included in the filterbanks, default 7500.0. NOTE: This should be no more than sample_rate_hz / 2

log_scale_enable

Enable logarithmic scaling of filterbanks, default True

log_scale_shift

Scale filterbanks by 2^(log_scale_shift), default 6

noise_reduction_enable

Enable/disable the noise reduction module, default False

noise_reduction_even_smoothing

Smoothing coefficient for even-numbered channels, default 0.025

noise_reduction_min_signal_remaining

Fraction of signal to preserve in smoothing, default 0.05

noise_reduction_odd_smoothing

Smoothing coefficient for odd-numbered channels, default 0.06

noise_reduction_smoothing_bits

Scale up the signal by 2^(noise_reduction_smoothing_bits) before reduction, default 10

pcan_enable

Enable PCAN auto gain control, default False

pcan_gain_bits

Number of fractional bits in the gain, default 21

pcan_offset

Positive value added in the normalization denominator, default 80.0

pcan_strength

Gain normalization exponent, default 0.95

quantize_dynamic_scale_enable

Enable dynamic quantization

quantize_dynamic_scale_range_db

The dynamic range in dB used by the dynamic quantization, default 40.0

sample_length

Calculated length of an audio sample in frames: sample_length = (self.sample_length_ms * self.sample_rate_hz) // 1000

sample_length_ms

The length of an audio sample in milliseconds, default 1000

sample_rate_hz

The sample rate of the audio in Hz, default 16000

spectrogram_shape

Return the generated spectrogram shape as (height, width) i.e. (n_features, filterbank_n_channels).

window_size_ms

Length of each time frame (window) in ms, default 25

window_step_ms

Step between the starts of consecutive frames in ms, default 10

Methods

__init__

clear

copy

Return a deep copy of the current settings

fromkeys

Create a new dictionary with keys from iterable and values set to value.

get

Return the value for key if key is in the dictionary, else default.

items

keys

pop

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem

Remove and return a (key, value) pair as a 2-tuple.

setdefault

Insert key with a value of default if key is not in the dictionary.

update

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values

__init__(**kwargs)
property spectrogram_shape: Tuple[int, int]

Return the generated spectrogram shape as (height, width) i.e. (n_features, filterbank_n_channels)

Return type:

Tuple[int, int]
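The page does not spell out how n_features is derived. The sketch below assumes the usual sliding-window frame count, 1 + (sample_length - window_size) // window_step, with all lengths converted from milliseconds to audio samples; `estimate_spectrogram_shape` is a hypothetical helper, not part of the MLTK API, and the library's own calculation may differ.

```python
from typing import Tuple


def estimate_spectrogram_shape(
    sample_rate_hz: int,
    sample_length_ms: int,
    window_size_ms: int,
    window_step_ms: int,
    filterbank_n_channels: int,
) -> Tuple[int, int]:
    """Estimate (n_features, filterbank_n_channels) for the given settings.

    ASSUMPTION: the number of frames follows the common framing formula;
    this is not taken from the MLTK source.
    """
    # Convert the millisecond settings to lengths in audio samples
    sample_length = (sample_length_ms * sample_rate_hz) // 1000
    window_size = (window_size_ms * sample_rate_hz) // 1000
    window_step = (window_step_ms * sample_rate_hz) // 1000

    # One frame for the first window, plus one per full step that still fits
    n_features = 1 + (sample_length - window_size) // window_step
    return (n_features, filterbank_n_channels)


# Values from the "Example Usage" block at the top of this page
print(estimate_spectrogram_shape(16000, 1200, 30, 10, 108))  # (118, 108)
```

For a real settings object, reading the spectrogram_shape property directly is the reliable way to obtain the shape.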

property sample_rate_hz: int

The sample rate of the audio in Hz, default 16000

Return type:

int

property sample_length_ms: int

The length of an audio sample in milliseconds, default 1000

Return type:

int

__new__(*args, **kwargs)
clear() → None. Remove all items from D.
fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
pop(key, default=<unrepresentable>, /)

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem(/)

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

property sample_length: int

Calculated length of an audio sample in frames: sample_length = (self.sample_length_ms * self.sample_rate_hz) // 1000

Return type:

int
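As a quick check of the formula above, the following standalone sketch reproduces the calculation outside the class. The free function `sample_length` is only an illustration (the real value is a read-only property of the settings object); note that the integer division rounds down.

```python
def sample_length(sample_length_ms: int, sample_rate_hz: int) -> int:
    """Length of an audio sample in frames, using the formula documented above."""
    return (sample_length_ms * sample_rate_hz) // 1000


print(sample_length(1200, 16000))  # 19200
print(sample_length(15, 44100))    # 661 (661.5 rounded down)
```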

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
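The three cases described above can be illustrated with a plain dict, used here as a stand-in for the settings object on the assumption that the inherited update() behaves like the built-in one.

```python
# A plain dict standing in for a settings object
settings = {"sample_rate_hz": 16000}

# E has a .keys() method: copied key by key
settings.update({"window_size_ms": 30})

# E lacks .keys(): treated as an iterable of (key, value) pairs
settings.update([("window_step_ms", 10)])

# Keyword arguments (F) are applied last, so they win over E
settings.update({"window_size_ms": 25}, window_size_ms=30)

print(settings)
# {'sample_rate_hz': 16000, 'window_size_ms': 30, 'window_step_ms': 10}
```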

values() → an object providing a view on D's values
property window_size_ms: int

Length of each time frame (window) in ms, default 25

Return type:

int

property window_step_ms: int

Step between the starts of consecutive frames in ms, default 10

Return type:

int

property filterbank_n_channels: int

The number of filterbank channels to use, default 32

Return type:

int

property filterbank_upper_band_limit: float

The highest frequency included in the filterbanks, default 7500.0. NOTE: This should be no more than sample_rate_hz / 2

Return type:

float
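The note above is the Nyquist constraint: a signal sampled at sample_rate_hz cannot represent frequencies above half that rate. A minimal sanity check might look like the sketch below; `check_band_limits` is a hypothetical helper, and this page does not say whether the library performs such a check itself.

```python
def check_band_limits(
    sample_rate_hz: int,
    lower_band_limit: float,
    upper_band_limit: float,
) -> float:
    """Validate filterbank band limits and return the Nyquist frequency.

    Hypothetical helper for illustration only; not part of the MLTK API.
    """
    nyquist_hz = sample_rate_hz / 2
    if not 0 <= lower_band_limit < upper_band_limit:
        raise ValueError("lower band limit must be >= 0 and below the upper band limit")
    if upper_band_limit > nyquist_hz:
        raise ValueError(
            f"upper band limit {upper_band_limit} exceeds sample_rate_hz / 2 = {nyquist_hz}"
        )
    return nyquist_hz


# The documented defaults fit comfortably inside a 16 kHz sample rate
print(check_band_limits(16000, 125.0, 7500.0))  # 8000.0
```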

property filterbank_lower_band_limit: float

The lowest frequency included in the filterbanks, default 125.0

Return type:

float

property noise_reduction_enable: bool

Enable/disable the noise reduction module, default False

Return type:

bool

property noise_reduction_smoothing_bits: int

Scale up the signal by 2^(noise_reduction_smoothing_bits) before reduction, default 10

Return type:

int

property noise_reduction_even_smoothing: float

Smoothing coefficient for even-numbered channels, default 0.025

Return type:

float

property noise_reduction_odd_smoothing: float

Smoothing coefficient for odd-numbered channels, default 0.06

Return type:

float

property noise_reduction_min_signal_remaining: float

Fraction of signal to preserve in smoothing, default 0.05

Return type:

float

property pcan_enable: bool

Enable PCAN auto gain control, default False

Return type:

bool

property pcan_strength: float

Gain normalization exponent, default 0.95

Return type:

float

property pcan_offset: float

Positive value added in the normalization denominator, default 80.0

Return type:

float

property pcan_gain_bits: int

Number of fractional bits in the gain, default 21

Return type:

int

property log_scale_enable: bool

Enable logarithmic scaling of filterbanks, default True

Return type:

bool

property log_scale_shift: int

Scale filterbanks by 2^(log_scale_shift), default 6

Return type:

int

property activity_detection_enable: bool

Enable the activity detection block. This indicates when activity, such as a speech command, is detected in the audio stream. Default False

Return type:

bool

property activity_detection_alpha_a: float

Activity detection filter A coefficient: the activity detection “fast filter” coefficient. The filter is a single real pole IIR filter that computes out = (1-k)*in + k*out. Default 0.5

Return type:

float

property activity_detection_alpha_b: float

Activity detection filter B coefficient: the activity detection “slow filter” coefficient. The filter is a single real pole IIR filter that computes out = (1-k)*in + k*out. Default 0.8

Return type:

float
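To show how the two coefficients differ in practice, the sketch below applies the documented recurrence out = (1-k)*in + k*out to a step input in floating point. `one_pole_filter` is a hypothetical helper; how the detection block combines the fast and slow outputs is not described on this page and is not modeled here.

```python
from typing import List, Sequence


def one_pole_filter(samples: Sequence[float], k: float, initial: float = 0.0) -> List[float]:
    """Apply out = (1 - k) * in + k * out to each input sample in turn."""
    out = initial
    result = []
    for x in samples:
        out = (1 - k) * x + k * out
        result.append(out)
    return result


# Step input: silence followed by a constant level
step = [0.0] * 3 + [1.0] * 5

fast = one_pole_filter(step, k=0.5)  # default activity_detection_alpha_a
slow = one_pole_filter(step, k=0.8)  # default activity_detection_alpha_b

for x, f, s in zip(step, fast, slow):
    print(f"in={x:.1f}  fast={f:.4f}  slow={s:.4f}")
```

A smaller k weights the newest input more heavily, so the "fast" filter tracks the step sooner than the "slow" one.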

property activity_detection_arm_threshold: float

Threshold for arming the detection block: the level at which activity is considered possible in the audio stream. Default 0.75

Return type:

float

property activity_detection_trip_threshold: float

Threshold for tripping the detection block: the level at which activity is considered detected in the audio stream. Default 0.8

Return type:

float

property dc_notch_filter_enable: bool

Enable the DC notch filter, which helps remove any DC component from the audio signal. Default False

Return type:

bool

property dc_notch_filter_coefficient: float

Coefficient used by DC notch filter

The DC notch filter coefficient k, in Q(16,15) format, for the transfer function H(z) = (1 - z^-1)/(1 - k*z^-1). Default 0.95

Return type:

float
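The transfer function H(z) = (1 - z^-1)/(1 - k*z^-1) corresponds to the difference equation y[n] = x[n] - x[n-1] + k*y[n-1]. The following floating-point sketch applies it to a constant (pure DC) input; `dc_notch_filter` is a hypothetical helper, and the Q(16,15) fixed-point arithmetic mentioned above is deliberately not modeled.

```python
from typing import List, Sequence


def dc_notch_filter(samples: Sequence[float], k: float = 0.95) -> List[float]:
    """Apply y[n] = x[n] - x[n-1] + k * y[n-1] with zero initial state."""
    prev_x = 0.0
    prev_y = 0.0
    result = []
    for x in samples:
        y = x - prev_x + k * prev_y
        result.append(y)
        prev_x, prev_y = x, y
    return result


# A constant input is pure DC, so the output should decay towards zero
dc_input = [1.0] * 500
filtered = dc_notch_filter(dc_input, k=0.95)

print(filtered[0])               # 1.0 (initial transient)
print(abs(filtered[-1]) < 1e-6)  # True
```

A coefficient closer to 1 gives a narrower notch around DC but a longer settling time.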

property quantize_dynamic_scale_enable: bool

Enable dynamic quantization

Enable dynamic quantization of the generated audio spectrogram. With this, the maximum spectrogram value is mapped to +127, and the maximum minus quantize_dynamic_scale_range_db is mapped to -128. Anything below that lower bound is also mapped to -128. Default False

Return type:

bool

property quantize_dynamic_scale_range_db: float

The dynamic range in dB used by the dynamic quantization, default 40.0

Return type:

float
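The mapping described for quantize_dynamic_scale_enable can be sketched as follows. This assumes the spectrogram values are already expressed in dB and that values between the two endpoints are interpolated linearly; both are assumptions made for illustration, and `quantize_dynamic` is a hypothetical helper rather than the library's implementation.

```python
from typing import List, Sequence


def quantize_dynamic(values_db: Sequence[float], range_db: float = 40.0) -> List[int]:
    """Map values to int8 so that max -> +127 and (max - range_db) -> -128.

    ASSUMPTIONS: inputs are in dB and the mapping between the two
    endpoints is linear. Values below (max - range_db) clamp to -128.
    """
    max_db = max(values_db)
    min_db = max_db - range_db
    quantized = []
    for value in values_db:
        clamped = max(value, min_db)  # everything below the range floor is clamped
        scaled = -128 + (clamped - min_db) * 255.0 / range_db
        quantized.append(int(round(scaled)))
    return quantized


print(quantize_dynamic([60.0, 50.0, 20.0, 0.0], range_db=40.0))
# [127, 63, -128, -128]
```

Because the scale is anchored to the maximum of each spectrogram, the same absolute level can map to different int8 values in different spectrograms.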

property fft_length: int

The calculated size required to do an FFT. This depends on the window_size_ms and sample_rate_hz values.

Return type:

int
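This page does not state the exact formula. A common choice, assumed in the sketch below, is the smallest power of two that is at least as large as the window length in audio samples; `next_power_of_two_fft_length` is a hypothetical helper and may not match the library's calculation.

```python
def next_power_of_two_fft_length(window_size_ms: int, sample_rate_hz: int) -> int:
    """Smallest power of two >= the window length in samples.

    ASSUMPTION: the FFT size is a power of two; this rule is not
    confirmed by the documentation above.
    """
    window_size = (window_size_ms * sample_rate_hz) // 1000
    fft_length = 1
    while fft_length < window_size:
        fft_length <<= 1
    return fft_length


print(next_power_of_two_fft_length(30, 16000))  # 480 samples -> 512
print(next_power_of_two_fft_length(25, 16000))  # 400 samples -> 512
```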

copy()

Return a deep copy of the current settings

Return type:

AudioFeatureGeneratorSettings