mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGeneratorSettings¶

class AudioFeatureGeneratorSettings[source]¶

Example Usage

from mltk.core.preprocess.audio.audio_feature_generator import AudioFeatureGeneratorSettings

# Define the settings used to convert the audio into a spectrogram
frontend_settings = AudioFeatureGeneratorSettings()

frontend_settings.sample_rate_hz = 16000
frontend_settings.sample_length_ms = 1200
frontend_settings.window_size_ms = 30
frontend_settings.window_step_ms = 10
frontend_settings.filterbank_n_channels = 108
frontend_settings.filterbank_upper_band_limit = 7500.0
frontend_settings.filterbank_lower_band_limit = 125.0
frontend_settings.noise_reduction_enable = True
frontend_settings.noise_reduction_smoothing_bits = 10
frontend_settings.noise_reduction_even_smoothing = 0.025
frontend_settings.noise_reduction_odd_smoothing = 0.06
frontend_settings.noise_reduction_min_signal_remaining = 0.40
frontend_settings.quantize_dynamic_scale_enable = True # Enable dynamic quantization
frontend_settings.quantize_dynamic_scale_range_db = 40.0

# If this is used in a model specification file,
# be sure to add the Audio Feature generator settings to the model parameters.
# This way, they are included in the generated .tflite model file
# See https://siliconlabs.github.io/mltk/docs/guides/model_parameters.html
my_model.model_parameters.update(frontend_settings)

See the Audio Feature Generator guide for more details.

Properties

`activity_detection_alpha_a`	Activity detection filter A coefficient. The activity detection "fast filter" coefficient.
`activity_detection_alpha_b`	Activity detection filter B coefficient. The activity detection "slow filter" coefficient.
`activity_detection_arm_threshold`	Threshold for arming the detection block. The threshold above which possible activity is considered present in the audio stream. Default 0.75
`activity_detection_enable`	Enable the activity detection block.
`activity_detection_trip_threshold`	Threshold for tripping the detection block. The threshold above which activity is considered detected in the audio stream. Default 0.8
`dc_notch_filter_coefficient`	Coefficient used by DC notch filter
`dc_notch_filter_enable`	Enable the DC notch filter. This helps remove any DC component in the audio signal. Default False
`fft_length`	The calculated size required to do an FFT.
`filterbank_lower_band_limit`	the lowest frequency included in the filterbanks, default 125.0
`filterbank_n_channels`	the number of filterbank channels to use, default 32
`filterbank_upper_band_limit`	Float, the highest frequency included in the filterbanks, default 7500.0. NOTE: This should be no more than sample_rate_hz / 2
`log_scale_enable`	enable logarithmic scaling of filterbanks, default true
`log_scale_shift`	scale filterbanks by 2^(scale_shift), default 6
`noise_reduction_enable`	Enable/disable noise reduction module, default false
`noise_reduction_even_smoothing`	smoothing coefficient for even-numbered channels, default 0.025
`noise_reduction_min_signal_remaining`	fraction of signal to preserve in smoothing, default 0.05
`noise_reduction_odd_smoothing`	smoothing coefficient for odd-numbered channels, default 0.06
`noise_reduction_smoothing_bits`	scale up signal by 2^(smoothing_bits) before reduction, default 10
`pcan_enable`	enable PCAN auto gain control, default false
`pcan_gain_bits`	number of fractional bits in the gain, default 21
`pcan_offset`	positive value added in the normalization denominator, default 80.0
`pcan_strength`	gain normalization exponent, default 0.95
`quantize_dynamic_scale_enable`	Enable dynamic quantization
`quantize_dynamic_scale_range_db`	The dynamic range in dB used by the dynamic quantization, default 40.0
`sample_length`	Calculated length of an audio sample in frames: sample_length = (self.sample_length_ms * self.sample_rate_hz) // 1000
`sample_length_ms`	The length of an audio sample in milliseconds, default 1000
`sample_rate_hz`	The sample rate of the audio in Hz, default 16000
`spectrogram_shape`	Return the generated spectrogram shape as (height, width) i.e. (n_features, filterbank_n_channels).
`window_size_ms`	length of desired time frames in ms, default 25
`window_step_ms`	length of step size for the next frame in ms, default 10

Methods

`__init__`
`clear`
`copy`	Return a deep copy of the current settings
`fromkeys`	Create a new dictionary with keys from iterable and values set to value.
`get`	Return the value for key if key is in the dictionary, else default.
`items`
`keys`
`pop`	If the key is not found, return the default if given; otherwise, raise a KeyError.
`popitem`	Remove and return a (key, value) pair as a 2-tuple.
`setdefault`	Insert key with a value of default if key is not in the dictionary.
`update`	If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
`values`

__init__(**kwargs)[source]¶

property spectrogram_shape: Tuple[int, int]¶

Return the generated spectrogram shape as (height, width) i.e. (n_features, filterbank_n_channels)

Return type:: Tuple[int, int]
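
To illustrate how the height (n_features) relates to the timing settings, here is a minimal sketch. It assumes the usual sliding-window frame count, where one feature row is produced per window step once a full window fits in the sample. The helper name estimate_spectrogram_shape is hypothetical and is not part of the MLTK API; the spectrogram_shape property remains the authoritative value.

```python
def estimate_spectrogram_shape(sample_rate_hz, sample_length_ms,
                               window_size_ms, window_step_ms,
                               filterbank_n_channels):
    """Estimate (height, width), assuming standard sliding-window framing."""
    # Convert the millisecond settings into sample counts
    sample_length = (sample_length_ms * sample_rate_hz) // 1000
    window_size = (window_size_ms * sample_rate_hz) // 1000
    window_step = (window_step_ms * sample_rate_hz) // 1000

    # No complete window fits in the sample
    if sample_length < window_size:
        return (0, filterbank_n_channels)

    # One row for the first window, plus one row per additional full step
    n_features = 1 + (sample_length - window_size) // window_step
    return (n_features, filterbank_n_channels)


# Settings taken from the example usage at the top of this page
shape = estimate_spectrogram_shape(16000, 1200, 30, 10, 108)
print(shape)
```

Under this assumption, longer samples or smaller steps increase the height, while filterbank_n_channels alone sets the width.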

property sample_rate_hz: int¶

The sample rate of the audio in Hz, default 16000

Return type:: int

property sample_length_ms: int¶

The length of an audio sample in milliseconds, default 1000

Return type:: int

__new__(*args, **kwargs)¶

clear() → None. Remove all items from D.¶

fromkeys(iterable, value=None, /)¶

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)¶

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items¶

keys() → a set-like object providing a view on D's keys¶

pop(key, default=<unrepresentable>, /)¶

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem(/)¶

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

property sample_length: int¶

Calculated length of an audio sample in frames: sample_length = (self.sample_length_ms * self.sample_rate_hz) // 1000

Return type:: int
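
The formula can be checked with a small standalone calculation. The function below simply restates the expression from the description; it is an illustrative helper and not the library's own code.

```python
def sample_length(sample_length_ms: int, sample_rate_hz: int) -> int:
    """Number of audio frames in a sample, using integer division."""
    return (sample_length_ms * sample_rate_hz) // 1000


# Default settings: 1000 ms at 16000 Hz
print(sample_length(1000, 16000))  # 16000

# Settings from the example usage: 1200 ms at 16000 Hz
print(sample_length(1200, 16000))  # 19200
```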

setdefault(key, default=None, /)¶

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.¶

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values¶

property window_size_ms: int¶

length of desired time frames in ms, default 25

Return type:: int

property window_step_ms: int¶

length of step size for the next frame in ms, default 10

Return type:: int

property filterbank_n_channels: int¶

the number of filterbank channels to use, default 32

Return type:: int

property filterbank_upper_band_limit: float¶

Float, the highest frequency included in the filterbanks, default 7500.0. NOTE: This should be no more than sample_rate_hz / 2

Return type:: float
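
The note about sample_rate_hz / 2 (the Nyquist frequency) can be expressed as a simple pre-flight check. This is a sketch only: the function name is hypothetical, and whether the library validates this itself is not stated here.

```python
def upper_band_limit_is_valid(filterbank_upper_band_limit: float,
                              sample_rate_hz: int) -> bool:
    """Return True if the upper band limit does not exceed sample_rate_hz / 2."""
    nyquist_hz = sample_rate_hz / 2
    return filterbank_upper_band_limit <= nyquist_hz


# The default 7500.0 Hz limit fits under the 8000 Hz Nyquist of 16 kHz audio
print(upper_band_limit_is_valid(7500.0, 16000))

# The same limit is too high for 8 kHz audio (Nyquist of 4000 Hz)
print(upper_band_limit_is_valid(7500.0, 8000))
```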

property filterbank_lower_band_limit: float¶

the lowest frequency included in the filterbanks, default 125.0

Return type:: float

property noise_reduction_enable: bool¶

Enable/disable noise reduction module, default false

Return type:: bool

property noise_reduction_smoothing_bits: int¶

scale up signal by 2^(smoothing_bits) before reduction, default 10

Return type:: int

property noise_reduction_even_smoothing: float¶

smoothing coefficient for even-numbered channels, default 0.025

Return type:: float

property noise_reduction_odd_smoothing: float¶

smoothing coefficient for odd-numbered channels, default 0.06

Return type:: float

property noise_reduction_min_signal_remaining: float¶

fraction of signal to preserve in smoothing, default 0.05

Return type:: float

property pcan_enable: bool¶

enable PCAN auto gain control, default false

Return type:: bool

property pcan_strength: float¶

gain normalization exponent, default 0.95

Return type:: float

property pcan_offset: float¶

positive value added in the normalization denominator, default 80.0

Return type:: float

property pcan_gain_bits: int¶

number of fractional bits in the gain, default 21

Return type:: int

property log_scale_enable: bool¶

enable logarithmic scaling of filterbanks, default true

Return type:: bool

property log_scale_shift: int¶

scale filterbanks by 2^(scale_shift), default 6

Return type:: int

property activity_detection_enable: bool¶

Enable the activity detection block. This indicates when activity, such as a speech command, is detected in the audio stream, default False

Return type:: bool

property activity_detection_alpha_a: float¶

Activity detection filter A coefficient. The activity detection “fast filter” coefficient. The filter is a 1-real pole IIR filter that computes out = (1-k)*in + k*out. Default 0.5

Return type:: float

property activity_detection_alpha_b: float¶

Activity detection filter B coefficient. The activity detection “slow filter” coefficient. The filter is a 1-real pole IIR filter that computes out = (1-k)*in + k*out. Default 0.8

Return type:: float
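
The recurrence out = (1-k)*in + k*out used by both filters can be sketched in floating point as follows. This illustrates only the filter equation with the two default coefficients; the function name is hypothetical, and how the library combines the fast and slow outputs to detect activity is not shown.

```python
def one_pole_filter(samples, k, initial=0.0):
    """Apply out = (1 - k) * in + k * out to each sample in turn."""
    out = initial
    outputs = []
    for x in samples:
        out = (1.0 - k) * x + k * out
        outputs.append(out)
    return outputs


# Step input: the signal jumps from 0 to 1 and stays there
step = [1.0] * 5

fast = one_pole_filter(step, k=0.5)  # default alpha_a
slow = one_pole_filter(step, k=0.8)  # default alpha_b

for f, s in zip(fast, slow):
    print(f"fast={f:.4f}  slow={s:.4f}")
```

A smaller k weights the newest input more heavily, so the k=0.5 filter follows the step faster than the k=0.8 filter.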

property activity_detection_arm_threshold: float¶

Threshold for arming the detection block. The threshold above which possible activity is considered present in the audio stream. Default 0.75

Return type:: float

property activity_detection_trip_threshold: float¶

Threshold for tripping the detection block. The threshold above which activity is considered detected in the audio stream. Default 0.8

Return type:: float

property dc_notch_filter_enable: bool¶

Enable the DC notch filter. This helps remove any DC component in the audio signal. Default False

Return type:: bool

property dc_notch_filter_coefficient: float¶

Coefficient used by DC notch filter

The DC notch filter coefficient k in Q(16,15) format, where H(z) = (1 - z^-1)/(1 - k*z^-1). Default 0.95

Return type:: float
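
The transfer function H(z) = (1 - z^-1)/(1 - k*z^-1) corresponds to the difference equation y[n] = x[n] - x[n-1] + k*y[n-1]. The sketch below applies it in floating point rather than the Q(16,15) fixed-point format, so it shows the filter's behavior but is not bit-exact with any embedded implementation. The function name is hypothetical.

```python
def dc_notch_filter(samples, k=0.95):
    """Apply y[n] = x[n] - x[n-1] + k * y[n-1] in floating point."""
    prev_x = 0.0
    prev_y = 0.0
    filtered = []
    for x in samples:
        y = x - prev_x + k * prev_y
        filtered.append(y)
        prev_x, prev_y = x, y
    return filtered


# A constant (pure DC) input should decay toward zero at the output
constant_input = [1.0] * 200
filtered = dc_notch_filter(constant_input, k=0.95)
print(filtered[0], filtered[-1])
```

With a constant input, each output is k times the previous one after the first sample, so values of k closer to 1 remove DC more slowly but disturb low frequencies less.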

property quantize_dynamic_scale_enable: bool¶

Enable dynamic quantization

Enable dynamic quantization of the generated audio spectrogram. With this, the maximum spectrogram value is mapped to +127, and the maximum spectrogram value minus quantize_dynamic_scale_range_db is mapped to -128. Anything below that lower bound is also mapped to -128. Default False

Return type:: bool
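
The mapping described above can be sketched as follows. Two assumptions are made here that the description does not confirm: the input values are already expressed in dB, and values between the two endpoints are mapped linearly. The function name is hypothetical and this is not the library's implementation.

```python
def dynamic_quantize_db(spectrogram_db, range_db=40.0):
    """Map dB values to int8: max -> +127, (max - range_db) and below -> -128."""
    max_db = max(spectrogram_db)
    min_db = max_db - range_db
    quantized = []
    for value in spectrogram_db:
        # Clip anything below the dynamic range to its lower bound
        clipped = max(value, min_db)
        # Position within the range, from 0.0 (lower bound) to 1.0 (maximum)
        scaled = (clipped - min_db) / range_db
        # Assumed linear mapping onto the 256 int8 levels
        quantized.append(int(round(scaled * 255.0)) - 128)
    return quantized


values_db = [0.0, -10.0, -40.0, -60.0]
quantized = dynamic_quantize_db(values_db, range_db=40.0)
print(quantized)
```

Because the scale is anchored to each spectrogram's own maximum, quiet and loud recordings use the full int8 range.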

property quantize_dynamic_scale_range_db: float¶

The dynamic range in dB used by the dynamic quantization, default 40.0

Return type:: float

property fft_length: int¶

The calculated size required to do an FFT. This is dependent on the window_size_ms and sample_rate_hz values

Return type:: int
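
As an illustration of the dependency on window_size_ms and sample_rate_hz, the sketch below assumes the FFT size is the smallest power of two that holds one window of samples, which is a common choice in FFT-based frontends. That rule is an assumption and is not confirmed by the description above; the function name is hypothetical.

```python
def estimate_fft_length(window_size_ms: int, sample_rate_hz: int) -> int:
    """Smallest power of two >= the window size in samples (assumed rule)."""
    window_size = (window_size_ms * sample_rate_hz) // 1000
    fft_length = 1
    while fft_length < window_size:
        fft_length <<= 1
    return fft_length


# 30 ms at 16 kHz is 480 samples per window
print(estimate_fft_length(30, 16000))

# 25 ms at 16 kHz is 400 samples per window
print(estimate_fft_length(25, 16000))
```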

copy()[source]¶

Return a deep copy of the current settings

Return type:: AudioFeatureGeneratorSettings