mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGeneratorSettings¶
- class AudioFeatureGeneratorSettings[source]¶
Settings for the AudioFeatureGenerator
Example Usage
from mltk.core.preprocess.audio.audio_feature_generator import AudioFeatureGeneratorSettings

# Define the settings used to convert the audio into a spectrogram
frontend_settings = AudioFeatureGeneratorSettings()
frontend_settings.sample_rate_hz = 16000
frontend_settings.sample_length_ms = 1200
frontend_settings.window_size_ms = 30
frontend_settings.window_step_ms = 10
frontend_settings.filterbank_n_channels = 108
frontend_settings.filterbank_upper_band_limit = 7500.0
frontend_settings.filterbank_lower_band_limit = 125.0
frontend_settings.noise_reduction_enable = True
frontend_settings.noise_reduction_smoothing_bits = 10
frontend_settings.noise_reduction_even_smoothing = 0.025
frontend_settings.noise_reduction_odd_smoothing = 0.06
frontend_settings.noise_reduction_min_signal_remaining = 0.40
frontend_settings.quantize_dynamic_scale_enable = True  # Enable dynamic quantization
frontend_settings.quantize_dynamic_scale_range_db = 40.0

# If this is used in a model specification file,
# be sure to add the AudioFeatureGenerator settings to the model parameters.
# This way, they are included in the generated .tflite model file.
# See https://siliconlabs.github.io/mltk/docs/guides/model_parameters.html
my_model.model_parameters.update(frontend_settings)
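As a quick sanity check, the derived properties documented below can be read back directly from the settings object (the values in the comments assume the example configuration above):

# Inspect the values derived from the settings configured above
print(frontend_settings.sample_length)      # audio sample length, (1200 * 16000) // 1000 = 19200
print(frontend_settings.spectrogram_shape)  # (n_features, filterbank_n_channels)
print(frontend_settings.fft_length)         # FFT size derived from window_size_ms and sample_rate_hz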
See the Audio Feature Generator guide for more details.
Properties
- activity_detection_alpha_a: The activity detection "fast filter" coefficient.
- activity_detection_alpha_b: The activity detection "slow filter" coefficient.
- activity_detection_arm_threshold: Threshold for arming the detection block, i.e. when possible activity should be considered present in the audio stream, default 0.75
- activity_detection_enable: Enable the activity detection block.
- activity_detection_trip_threshold: Threshold for tripping the detection block, i.e. when activity is considered detected in the audio stream, default 0.8
- dc_notch_filter_coefficient: Coefficient used by the DC notch filter
- dc_notch_filter_enable: Enable the DC notch filter, which helps negate any DC component in the audio signal, default False
- fft_length: The calculated size required to do an FFT.
- filterbank_lower_band_limit: The lowest frequency included in the filterbanks, default 125.0
- filterbank_n_channels: The number of filterbank channels to use, default 32
- filterbank_upper_band_limit: The highest frequency included in the filterbanks, default 7500.0. NOTE: This should be no more than sample_rate_hz / 2
- log_scale_enable: Enable logarithmic scaling of the filterbanks, default True
- log_scale_shift: Scale the filterbanks by 2^(scale_shift), default 6
- noise_reduction_enable: Enable/disable the noise reduction module, default False
- noise_reduction_even_smoothing: Smoothing coefficient for even-numbered channels, default 0.025
- noise_reduction_min_signal_remaining: Fraction of signal to preserve in smoothing, default 0.05
- noise_reduction_odd_smoothing: Smoothing coefficient for odd-numbered channels, default 0.06
- noise_reduction_smoothing_bits: Scale up the signal by 2^(smoothing_bits) before reduction, default 10
- pcan_enable: Enable PCAN auto gain control, default False
- pcan_gain_bits: Number of fractional bits in the gain, default 21
- pcan_offset: Positive value added in the normalization denominator, default 80.0
- pcan_strength: Gain normalization exponent, default 0.95
- quantize_dynamic_scale_enable: Enable dynamic quantization
- quantize_dynamic_scale_range_db: The dynamic range in dB used by the dynamic quantization, default 40.0
- sample_length: Calculated length of an audio sample in frames: sample_length = (self.sample_length_ms * self.sample_rate_hz) // 1000
- sample_length_ms: The length of an audio sample in milliseconds, default 1000
- sample_rate_hz: The sample rate of the audio in Hz, default 16000
- spectrogram_shape: Return the generated spectrogram shape as (height, width), i.e. (n_features, filterbank_n_channels)
- window_size_ms: Length of the desired time frames in ms, default 25
- window_step_ms: Length of the step size for the next frame in ms, default 10
Methods
- copy: Return a deep copy of the current settings
- fromkeys: Create a new dictionary with keys from iterable and values set to value.
- get: Return the value for key if key is in the dictionary, else default.
- pop: Remove the specified key and return the corresponding value; if the key is not found, return the default if given, otherwise raise a KeyError.
- popitem: Remove and return a (key, value) pair as a 2-tuple.
- setdefault: Insert key with a value of default if key is not in the dictionary.
- update: Update the settings from dict/iterable E and keyword arguments F.
- property spectrogram_shape: Tuple[int, int]¶
Return the generated spectrogram shape as (height, width) i.e. (n_features, filterbank_n_channels)
- Return type:
Tuple[int, int]
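For intuition, the height can be thought of as the number of window steps that fit into the audio sample. A hedged sketch, assuming the usual sliding-window frame count (the property itself performs the authoritative calculation):

# Hypothetical helper for illustration only; assumes
# n_features = 1 + (sample_length - window_size) // window_step
def estimated_spectrogram_shape(settings):
    window_size = (settings.window_size_ms * settings.sample_rate_hz) // 1000
    window_step = (settings.window_step_ms * settings.sample_rate_hz) // 1000
    n_features = 1 + (settings.sample_length - window_size) // window_step
    return (n_features, settings.filterbank_n_channels)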
- property sample_rate_hz: int¶
The sample rate of the audio in Hz, default 16000
- Return type:
int
- property sample_length_ms: int¶
The length of an audio sample in milliseconds, default 1000
- Return type:
int
- __new__(*args, **kwargs)¶
- clear() → None. Remove all items from D.¶
- fromkeys(iterable, value=None, /)¶
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)¶
Return the value for key if key is in the dictionary, else default.
- items() → a set-like object providing a view on D's items¶
- keys() → a set-like object providing a view on D's keys¶
- pop(key, default=<unrepresentable>, /)¶
If the key is not found, return the default if given; otherwise, raise a KeyError.
- popitem(/)¶
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- property sample_length: int¶
Calculated length of an audio sample in frames: sample_length = (self.sample_length_ms * self.sample_rate_hz) // 1000
- Return type:
int
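For example, with the defaults sample_length_ms=1000 and sample_rate_hz=16000:

sample_length = (1000 * 16000) // 1000  # = 16000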
- setdefault(key, default=None, /)¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) → None. Update D from dict/iterable E and F.¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
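This is the standard dict.update() behavior inherited from dict; a plain-dict illustration:

d = {'a': 1}
d.update({'b': 2})      # E has .keys():   for k in E: D[k] = E[k]
d.update([('c', 3)])    # E lacks .keys(): for k, v in E: D[k] = v
d.update(e=4)           # keyword args F:  for k in F: D[k] = F[k]
# d is now {'a': 1, 'b': 2, 'c': 3, 'e': 4}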
- values() → an object providing a view on D's values¶
- property window_size_ms: int¶
length of desired time frames in ms, default 25
- Return type:
int
- property window_step_ms: int¶
length of step size for the next frame in ms, default 10
- Return type:
int
- property filterbank_n_channels: int¶
the number of filterbank channels to use, default 32
- Return type:
int
- property filterbank_upper_band_limit: float¶
The highest frequency included in the filterbanks, default 7500.0. NOTE: This should be no more than sample_rate_hz / 2
- Return type:
float
- property filterbank_lower_band_limit: float¶
the lowest frequency included in the filterbanks, default 125.0
- Return type:
float
- property noise_reduction_enable: bool¶
Enable/disable noise reduction module, default false
- Return type:
bool
- property noise_reduction_smoothing_bits: int¶
scale up signal by 2^(smoothing_bits) before reduction, default 10
- Return type:
int
- property noise_reduction_even_smoothing: float¶
smoothing coefficient for even-numbered channels, default 0.025
- Return type:
float
- property noise_reduction_odd_smoothing: float¶
smoothing coefficient for odd-numbered channels, default 0.06
- Return type:
float
- property noise_reduction_min_signal_remaining: float¶
fraction of signal to preserve in smoothing, default 0.05
- Return type:
float
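To convey how the four noise reduction parameters relate, here is a simplified floating-point sketch of channel-wise noise estimation and subtraction. This is an assumption-laden illustration only, not the module's fixed-point implementation (which also uses noise_reduction_smoothing_bits for scaling):

# Illustration only: per-channel running noise estimate with different
# smoothing for even/odd channels, then subtraction with a floor.
def noise_reduce_sketch(channels, noise_estimate,
                        even_smoothing=0.025, odd_smoothing=0.06,
                        min_signal_remaining=0.05):
    out = []
    for i, signal in enumerate(channels):
        smoothing = even_smoothing if i % 2 == 0 else odd_smoothing
        noise_estimate[i] = smoothing * signal + (1 - smoothing) * noise_estimate[i]
        floor = signal * min_signal_remaining          # never remove everything
        out.append(max(signal - noise_estimate[i], floor))
    return out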
- property pcan_enable: bool¶
enable PCAN auto gain control, default false
- Return type:
bool
- property pcan_strength: float¶
gain normalization exponent, default 0.95
- Return type:
float
- property pcan_offset: float¶
positive value added in the normalization denominator, default 80.0
- Return type:
float
- property pcan_gain_bits: int¶
number of fractional bits in the gain, default 21
- Return type:
int
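As with the noise reduction parameters above, here is a heavily hedged, floating-point sketch of what per-channel amplitude normalization (PCAN) with these parameters could look like; the real block works in fixed point, with pcan_gain_bits fractional bits in the gain:

# Illustration only: normalize each channel by a gain derived from its
# noise estimate, using pcan_strength as the exponent and pcan_offset
# in the denominator.
def pcan_sketch(channels, noise_estimate, strength=0.95, offset=80.0):
    return [s / ((n + offset) ** strength) for s, n in zip(channels, noise_estimate)]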
- property log_scale_enable: bool¶
enable logarithmic scaling of filterbanks, default true
- Return type:
bool
- property log_scale_shift: int¶
scale filterbanks by 2^(scale_shift), default 6
- Return type:
int
- property activity_detection_enable: bool¶
Enable the activity detection block. This indicates when activity, such as a speech command, is detected in the audio stream, default False
- Return type:
bool
- property activity_detection_alpha_a: float¶
Activity detection filter A coefficient
The activity detection "fast filter" coefficient. The filter is a single real-pole IIR filter that computes: out = (1-k)*in + k*out
Default 0.5
- Return type:
float
- property activity_detection_alpha_b: float¶
Activity detection filter B coefficient
The activity detection "slow filter" coefficient. The filter is a single real-pole IIR filter that computes: out = (1-k)*in + k*out
Default 0.8
- Return type:
float
- property activity_detection_arm_threshold: float¶
Threshold for arming the detection block The threshold for when there should be considered possible activity in the audio stream Default 0.75
- Return type:
float
- property activity_detection_trip_threshold: float¶
Threshold for tripping the detection block The threshold for when activity is considered detected in the audio stream Default 0.8
- Return type:
float
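Taken together, the four activity detection properties describe two one-pole IIR smoothers plus an arm/trip threshold pair. The following is a hypothetical sketch of that structure for illustration only; the block's actual decision logic is not documented here:

# Hypothetical illustration: fast/slow IIR filters (out = (1-k)*in + k*out)
# feeding an arm/trip threshold comparison.
def activity_sketch(samples, alpha_a=0.5, alpha_b=0.8,
                    arm_threshold=0.75, trip_threshold=0.8):
    fast = slow = 0.0
    armed = False
    for x in samples:
        fast = (1 - alpha_a) * x + alpha_a * fast   # "fast filter"
        slow = (1 - alpha_b) * x + alpha_b * slow   # "slow filter"
        ratio = fast / slow if slow else 0.0
        if ratio >= arm_threshold:
            armed = True                            # possible activity
        if armed and ratio >= trip_threshold:
            return True                             # activity detected
    return False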
- property dc_notch_filter_enable: bool¶
Enable the DC notch filter This will help negate any DC components in the audio signal Default False
- Return type:
bool
- property dc_notch_filter_coefficient: float¶
Coefficient used by DC notch filter
The DC notch filter coefficient k in Q(16,15) format,
H(z) = (1 - z^-1)/(1 - k*z^-1)
Default 0.95
- Return type:
float
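The transfer function above corresponds to the difference equation y[n] = x[n] - x[n-1] + k*y[n-1]. A floating-point sketch of that relationship (the block itself uses the Q(16,15) fixed-point coefficient):

def dc_notch_sketch(samples, k=0.95):
    x_prev = y_prev = 0.0
    out = []
    for x in samples:
        y = x - x_prev + k * y_prev   # y[n] = x[n] - x[n-1] + k*y[n-1]
        out.append(y)
        x_prev, y_prev = x, y
    return out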
- property quantize_dynamic_scale_enable: bool¶
Enable dynamic quantization
Enable dynamic quantization of the generated audio spectrogram. With this, the maximum spectrogram value is mapped to +127 and the maximum value minus quantize_dynamic_scale_range_db is mapped to -128; anything below that is clipped to -128.
Default False
- Return type:
bool
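A sketch of the mapping described above, assuming the spectrogram values are already on a dB-like scale:

import numpy as np

def quantize_dynamic_sketch(spectrogram, range_db=40.0):
    max_val = np.max(spectrogram)
    min_val = max_val - range_db
    # Linearly map [max - range_db, max] -> [-128, 127]; clip anything below
    scaled = (spectrogram - min_val) / range_db * 255.0 - 128.0
    return np.clip(scaled, -128, 127).astype(np.int8)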
- property quantize_dynamic_scale_range_db: float¶
The dynamic range in dB used by the dynamic quantization, default 40.0
- Return type:
float
- property fft_length: int¶
The calculated size required to do an FFT. This is dependent on the window_size_ms and sample_rate_hz values
- Return type:
int
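In microfrontend-style pipelines the FFT size is typically the smallest power of two that covers the window length in samples; a hedged sketch of that assumption (the property itself returns the authoritative value):

# Assumption: fft_length = next power of 2 >= window size in samples
def estimated_fft_length(settings):
    window_size = (settings.window_size_ms * settings.sample_rate_hz) // 1000
    fft_length = 1
    while fft_length < window_size:
        fft_length *= 2
    return fft_length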