Page 1 of 1

MEMS I2S Audio and ESP-DSP FFT

Posted: Tue Aug 27, 2019 6:50 am
by technosf
I'd like to get feedback on an attempt to pull audio of a single MEMS I2S microphone on one channel and run it through the ESP-DSP FFT library...

The MEMS microphone produces I2S digital audio - 32bit words per channel, 24 bits of MSB aligned data per word, of which 18bits are significant. Data on non-connected channels are tri-state - adding a pull down resistor to the data line can clear up the unconected channel, which makes it easier to determine whats what.

Here's the I2S config:

Code: Select all

const i2s_config_t i2s_config = {
                .mode = i2s_mode_t( I2S_MODE_MASTER | I2S_MODE_RX ),    // Receive as Master
                .sample_rate = 22050,                         // 22kHz is equivalent to 11kHz maximum audio frequence, re Nyquist
                .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,    // 32 clock pulses per channel
                .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,    // Only capture the LHS channel
                .communication_format = i2s_comm_format_t( I2S_COMM_FORMAT_I2S | I2S_COMM_FORMAT_I2S_MSB ), // Philips format
                .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,     // Interrupt level 1
                .dma_buf_count = 9,                         // (9-1)*256 = 2048 samples per read (1 buffer in use and unavailable during read)
                .dma_buf_len = 256,                              //
                .use_apll = false,                              //
                .tx_desc_auto_clear = false,          //
                .fixed_mclk = 0 };
The I2S appears to be acting as one would expect, with values increasing in amplitude with sound pressure. I read off the uint32_t values and right shift them 14 bits and cast them into an int with 16 bit range. To this end I decide to try the ESP-DSP sc16 FFT following the DSP 32 bit float FFT basic_math demo logic. My FFT code looks like this:

Code: Select all

	// Create the complex vector
            for ( int i = 0; i < 2048; i++ ) 
            {
                int val = ( *(uint16_t*) &m_samples [ i ] );  // m_samples are the raw 32 bit words
                y_cf [ i * 2 + 0 ] = val;    // Not using windowing, but is in the basic math demo
                y_cf [ i * 2 + 1 ] = 0;
            }
            
            
            dsps_fft2r_sc16( data, length );			// The FFT calc
             dsps_bit_rev_sc16_ansi( data, length );		// Bit reverse used in the 32  - Do not know what this does or why       
            dsps_cplx2reC_sc16( data, length );	// Convert one complex vector to two complex vectors - Do not know what this does or why       
From here I sum the y_cf array into 1024 processed values (half of the 2048 raw data vales) using the formula in the demo where the processed value for bucket i is:

Code: Select all

10 * log10f(
              ( y_cf [ i * 2 + 0 ] * y_cf [ i * 2 + 0 ]
               + y_cf [ i * 2 + 1 ] * y_cf [ i * 2 + 1 ] ) / 2048) )
Looking at the data that results here, the value ranges are between 0 and 25, with the mean being around 15, but if I supply sound pressure, I'm not seeing any variation. As the raw data from I2S does vary I'm wondering if my use of the FFT is flawed.

Any thoughts appreciated!