AudioFeatureGenerator
- class mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGenerator(settings)
AudioFeatureGenerator Interface
- process_sample(sample, dtype=<class 'numpy.float32'>)
Convert the provided 1D audio sample to a 2D spectrogram using the AudioFeatureGenerator.
The generated 2D spectrogram dimensions are calculated as follows:
sample_length = len(sample) = int(sample_length_ms * sample_rate_hz / 1000)
window_size_length = int(window_size_ms * sample_rate_hz / 1000)
window_step_length = int(window_step_ms * sample_rate_hz / 1000)
height = n_features = (sample_length - window_size_length) // window_step_length + 1
width = n_channels = AudioFeatureGeneratorSettings.filterbank_n_channels
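The dimension formulas above can be checked with plain Python. This is a minimal sketch: the helper name spectrogram_shape_from_settings is hypothetical and only re-implements the arithmetic listed above; it is not part of the MLTK API.

```python
from typing import Tuple


def spectrogram_shape_from_settings(
    sample_rate_hz: int = 16000,
    sample_length_ms: int = 1000,
    window_size_ms: int = 25,
    window_step_ms: int = 10,
    filterbank_n_channels: int = 32,
) -> Tuple[int, int]:
    """Hypothetical helper: return (height, width) = (n_features, n_channels)."""
    sample_length = int(sample_length_ms * sample_rate_hz / 1000)
    window_size_length = int(window_size_ms * sample_rate_hz / 1000)
    window_step_length = int(window_step_ms * sample_rate_hz / 1000)
    # Number of full windows that fit in the sample when stepping by window_step_length
    height = (sample_length - window_size_length) // window_step_length + 1
    width = filterbank_n_channels
    return height, width


# Defaults: 16000 samples, 400-sample windows, 160-sample steps
print(spectrogram_shape_from_settings())  # (98, 32)
```

With the defaults, (16000 - 400) // 160 + 1 = 97 + 1 = 98 frames, each with 32 filterbank channels.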
The dtype argument specifies the data type of the returned spectrogram. It must be one of the following:
- uint16: The raw value generated by the internal AudioFeatureGenerator library
- float32: The uint16 value cast directly to float32
- int8: The int8 value generated by the TFLM “micro features” library.
For details of how this int8 value is computed, refer to micro_features_generator.cc#L84
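A minimal NumPy sketch of the uint16 to int8 mapping, written from our reading of the TFLM micro_speech micro_features_generator.cc: value = feature * 256 / (25.6 * 26.0) - 128, computed in integer math with rounding and clamped to [-128, 127]. That MLTK's int8 path matches this line for line is an assumption, and the function name is hypothetical.

```python
import numpy as np


def uint16_to_int8_micro_features(features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map raw uint16 frontend values to int8 (TFLM-style, assumed)."""
    value_scale = 256
    value_div = int(25.6 * 26.0 + 0.5)  # 666
    # Integer division with rounding: (x * 256 + 666 // 2) // 666
    values = (features.astype(np.int32) * value_scale + value_div // 2) // value_div
    # Shift into the signed range, then saturate to int8
    values -= 128
    return np.clip(values, -128, 127).astype(np.int8)
```

Because the uint16 inputs are non-negative, NumPy's floor division gives the same result as C's truncating integer division here.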
- Parameters
sample (ndarray) – [sample_length] int16 audio sample
dtype – Output data type; must be int8, uint16, or float32
- Return type
ndarray
- Returns
[n_features, n_channels] int8, uint16, or float32 spectrogram
AudioFeatureGenerator Settings
- class mltk.core.preprocess.audio.audio_feature_generator.AudioFeatureGeneratorSettings(*args, **kwargs)
AudioFeatureGenerator Settings
See the Audio Feature Generator guide for more details.
- property spectrogram_shape: Tuple[int, int]
Return the generated spectrogram shape as (height, width), i.e. (n_features, filterbank_n_channels)
- Return type
Tuple[int, int]
- property sample_rate_hz: int
The sample rate of the audio in Hz, default 16000
- Return type
int
- property sample_length_ms: int
The length of an audio sample in milliseconds, default 1000
- Return type
int
- property window_size_ms: int
The length of each time frame (window) in milliseconds, default 25
- Return type
int
- property window_step_ms: int
The step size between the starts of consecutive frames in milliseconds, default 10
- Return type
int
- property filterbank_n_channels: int
The number of filterbank channels to use, default 32
- Return type
int
- property filterbank_upper_band_limit: float
The highest frequency in Hz included in the filterbanks, default 7500.0. NOTE: This should be no more than sample_rate_hz / 2 (the Nyquist frequency).
- Return type
float
- property filterbank_lower_band_limit: float
The lowest frequency in Hz included in the filterbanks, default 125.0
- Return type
float
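To illustrate how filterbank_lower_band_limit, filterbank_upper_band_limit, and filterbank_n_channels interact, the sketch below spaces channel centers evenly on the mel scale, mel = 1127 * ln(1 + f / 700), between the two limits. This follows our understanding of the TensorFlow Lite Micro audio frontend filterbank, which these settings appear to mirror; treat the exact spacing as an assumption. It also enforces the Nyquist constraint noted above. All function names are hypothetical.

```python
import math
from typing import List


def freq_to_mel(freq_hz: float) -> float:
    return 1127.0 * math.log1p(freq_hz / 700.0)


def mel_to_freq(mel: float) -> float:
    return 700.0 * math.expm1(mel / 1127.0)


def filterbank_center_freqs(
    sample_rate_hz: int = 16000,
    lower_band_limit: float = 125.0,
    upper_band_limit: float = 7500.0,
    n_channels: int = 32,
) -> List[float]:
    """Hypothetical helper: mel-spaced channel center frequencies in Hz (assumed spacing)."""
    # The upper limit must not exceed the Nyquist frequency
    if upper_band_limit > sample_rate_hz / 2:
        raise ValueError("filterbank_upper_band_limit must be <= sample_rate_hz / 2")
    if not 0.0 <= lower_band_limit < upper_band_limit:
        raise ValueError("filterbank_lower_band_limit must be in [0, upper_band_limit)")
    mel_low = freq_to_mel(lower_band_limit)
    mel_high = freq_to_mel(upper_band_limit)
    # n_channels + 1 equal mel steps, so every center lies strictly between the limits
    spacing = (mel_high - mel_low) / (n_channels + 1)
    return [mel_to_freq(mel_low + spacing * (i + 1)) for i in range(n_channels)]
```

Mel spacing puts more channels at low frequencies, where speech carries most of its discriminative detail.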
- property noise_reduction_enable: bool
Enable/disable the noise reduction module, default False
- Return type
bool
- property noise_reduction_smoothing_bits: int
Scale up the signal by 2^(smoothing_bits) before noise reduction, default 10
- Return type
int
- property noise_reduction_even_smoothing: float
Smoothing coefficient for even-numbered channels, default 0.025
- Return type
float
- property noise_reduction_odd_smoothing: float
Smoothing coefficient for odd-numbered channels, default 0.06
- Return type
float
- property noise_reduction_min_signal_remaining: float
The fraction of the signal to preserve in smoothing, default 0.05
- Return type
float
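The noise reduction settings can be read as a per-channel noise estimate (an exponential moving average whose smoothing coefficient alternates between even and odd channels) that is subtracted from the signal, while never keeping less than min_signal_remaining of it. The sketch below is a simplified floating-point version based on our reading of the TFLM noise reduction stage; it omits the fixed-point scaling controlled by noise_reduction_smoothing_bits, and the function name is hypothetical.

```python
from typing import List, Tuple


def noise_reduction_step(
    signal: List[float],
    estimate: List[float],
    even_smoothing: float = 0.025,
    odd_smoothing: float = 0.06,
    min_signal_remaining: float = 0.05,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: process one frame; return (output, updated noise estimate)."""
    output, new_estimate = [], []
    for i, (x, est) in enumerate(zip(signal, estimate)):
        s = even_smoothing if i % 2 == 0 else odd_smoothing
        # Update the running noise estimate for this channel
        est = s * x + (1.0 - s) * est
        new_estimate.append(est)
        # Subtract the noise without going negative...
        subtracted = x - min(est, x)
        # ...but always keep at least min_signal_remaining of the input
        output.append(max(subtracted, min_signal_remaining * x))
    return output, new_estimate
```

The estimate is carried from frame to frame, so a stationary background converges to the estimate and is suppressed down to min_signal_remaining of its level.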
- property pcan_enable: bool
Enable PCAN auto gain control, default False
- Return type
bool
- property pcan_strength: float
Gain normalization exponent, default 0.95
- Return type
float
- property pcan_offset: float
Positive value added in the normalization denominator, default 80.0
- Return type
float
- property pcan_gain_bits: int
Number of fractional bits in the gain, default 21
- Return type
int
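A rough sketch of the gain normalization the PCAN settings describe: each channel is multiplied by (noise_estimate + pcan_offset) ^ (-pcan_strength), and the gain can be stored as an integer with pcan_gain_bits fractional bits. That the noise estimate comes from the noise reduction stage, as in the TFLM frontend, is our assumption; the sketch also leaves out the lookup table and output compression a fixed-point implementation would use. All names are hypothetical.

```python
def pcan_gain(noise_estimate: float, strength: float = 0.95, offset: float = 80.0) -> float:
    """Hypothetical helper: per-channel gain (noise_estimate + offset) ** -strength."""
    return (noise_estimate + offset) ** -strength


def pcan_gain_fixed_point(
    noise_estimate: float, strength: float = 0.95, offset: float = 80.0, gain_bits: int = 21
) -> int:
    """The same gain as an integer with gain_bits fractional bits."""
    return int(round(pcan_gain(noise_estimate, strength, offset) * (1 << gain_bits)))


def pcan_normalize(
    signal: float, noise_estimate: float, strength: float = 0.95, offset: float = 80.0
) -> float:
    """Louder background noise -> smaller gain, so channel levels are normalized."""
    return signal * pcan_gain(noise_estimate, strength, offset)
```

pcan_offset keeps the denominator positive and limits the gain when the noise estimate is near zero.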
- property log_scale_enable: bool
Enable logarithmic scaling of the filterbanks, default True
- Return type
bool
- property log_scale_shift: int
Scale the filterbanks by 2^(log_scale_shift), default 6
- Return type
int
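As we understand the TFLM log-scale stage that these settings mirror, each filterbank value x > 1 becomes approximately ln(x) * 2^log_scale_shift, saturated to the uint16 range, and values of 0 or 1 become 0. The floating-point sketch below uses math.log in place of a fixed-point logarithm, so individual values may differ from the library's by rounding; the function name is hypothetical.

```python
import math


def log_scale(value: int, scale_shift: int = 6) -> int:
    """Hypothetical helper: approximate ln(value) * 2**scale_shift, saturated to uint16."""
    if value <= 1:
        return 0
    scaled = round(math.log(value) * (1 << scale_shift))
    return min(scaled, 0xFFFF)
```

A larger log_scale_shift gives finer resolution in the uint16 output at the cost of a smaller representable range.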
- property activity_detection_enable: bool
Enable the activity detection block, which indicates when activity, such as a speech command, is detected in the audio stream. Default False
- Return type
bool
- property activity_detection_alpha_a: float
Activity detection filter A coefficient: the activity detection “fast filter” coefficient. The filter is a single real-pole IIR filter that computes out = (1-k)*in + k*out, where k is this coefficient.
Default 0.5
- Return type
float
- property activity_detection_alpha_b: float
Activity detection filter B coefficient: the activity detection “slow filter” coefficient. The filter is a single real-pole IIR filter that computes out = (1-k)*in + k*out, where k is this coefficient.
Default 0.8
- Return type
float
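The single-pole IIR filter out = (1-k)*in + k*out can be sketched in a few lines. A smaller k (such as the 0.5 default for filter A) tracks changes faster than a larger k (0.8 for filter B). The helper name is hypothetical, and how the two filter outputs are compared against the arm and trip thresholds is not documented here, so the sketch shows only the filters.

```python
from typing import Iterable, List


def one_pole_iir(samples: Iterable[float], k: float, initial: float = 0.0) -> List[float]:
    """Hypothetical helper: apply out = (1 - k) * in + k * out sample by sample."""
    out = initial
    result = []
    for x in samples:
        out = (1.0 - k) * x + k * out
        result.append(out)
    return result


step = [1.0] * 5
fast = one_pole_iir(step, k=0.5)  # activity_detection_alpha_a default
slow = one_pole_iir(step, k=0.8)  # activity_detection_alpha_b default
```

For a unit step starting from 0, the n-th output (counting from 0) is 1 - k^(n+1), so the fast filter reaches 0.875 after three samples while the slow filter is still below 0.5.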
- property activity_detection_arm_threshold: float
Threshold for arming the detection block: the threshold at which possible activity in the audio stream should be considered. Default 0.75
- Return type
float
- property activity_detection_trip_threshold: float
Threshold for tripping the detection block: the threshold at which activity is considered detected in the audio stream. Default 0.8
- Return type
float
- property dc_notch_filter_enable: bool
Enable the DC notch filter, which helps remove any DC component from the audio signal. Default False
- Return type
bool
- property dc_notch_filter_coefficient: float
Coefficient used by the DC notch filter.
The DC notch filter coefficient k, in Q(16,15) format, with transfer function:
H(z) = (1 - z^-1) / (1 - k*z^-1)
Default 0.95
- Return type
float
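Cross-multiplying H(z) = (1 - z^-1) / (1 - k*z^-1) gives the difference equation y[n] = x[n] - x[n-1] + k*y[n-1]: a differencer that removes the DC component, followed by a pole at k that restores the response away from 0 Hz. The sketch below implements that equation in floating point. The Q15 helper assumes Q(16,15) denotes a 16-bit word with 15 fractional bits; both function names are hypothetical.

```python
from typing import List


def dc_notch_filter(samples: List[float], k: float = 0.95) -> List[float]:
    """Hypothetical helper: y[n] = x[n] - x[n-1] + k * y[n-1]."""
    prev_x = 0.0
    prev_y = 0.0
    out = []
    for x in samples:
        y = x - prev_x + k * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def to_q15(k: float) -> int:
    """Encode k with 15 fractional bits (assumed meaning of Q(16,15))."""
    return int(round(k * (1 << 15)))


# A constant (pure DC) input decays toward zero at the output
filtered = dc_notch_filter([1000.0] * 500)
```

A k closer to 1 gives a narrower notch around 0 Hz but a slower decay of the DC transient.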
- property quantize_dynamic_scale_enable: bool
Enable dynamic quantization.
Enable dynamic quantization of the generated audio spectrogram. With this, the max spectrogram value is mapped to +127, and the max spectrogram value minus quantize_dynamic_scale_range_db is mapped to -128. Anything below the max spectrogram value minus quantize_dynamic_scale_range_db is also mapped to -128. Default False
- Return type
bool
- property quantize_dynamic_scale_range_db: float
The dynamic range in dB used by the dynamic quantization, default 40.0
- Return type
float
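The dynamic quantization rule (max to +127, max minus quantize_dynamic_scale_range_db to -128, anything lower clamped to -128) can be sketched as a linear map followed by clipping. This assumes the spectrogram values are already expressed in dB and that the mapping between those two points is linear; how the library converts its internal values to dB is not covered here. The function name is hypothetical.

```python
import numpy as np


def dynamic_quantize(spectrogram: np.ndarray, range_db: float = 40.0) -> np.ndarray:
    """Hypothetical helper: linearly map [max - range_db, max] onto [-128, 127]."""
    max_val = float(np.max(spectrogram))
    min_val = max_val - range_db
    # min_val -> -128, max_val -> +127
    scaled = (spectrogram.astype(np.float64) - min_val) * (255.0 / range_db) - 128.0
    # Values more than range_db below the max saturate at -128
    return np.clip(np.round(scaled), -128, 127).astype(np.int8)
```

Because the scale is taken from each spectrogram's own maximum, quiet and loud recordings use the full int8 range.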