Possible issue with VFS component in V4.4.8. Not acquiring lock during vfs_fat_lseek

NotMyRealName
Posts: 41
Joined: Thu Feb 13, 2020 1:35 am

Possible issue with VFS component in V4.4.8. Not acquiring lock during vfs_fat_lseek

Postby NotMyRealName » Wed Oct 30, 2024 4:42 am

I've been trying to debug an SD stability issue for a few days now and I think I might finally be onto something... Fingers crossed. :)

I have the SDMMC peripheral running in 1 Bit mode (also 4 bit on another design) and I'm having issues after startup with file access to the SD card tripping one of the assertions. This is an established commercial project with about a thousand devices running so far though admittedly most of those do not use the SD card.

Its crashing on this line here when trying to open/seek/write/close:
https://github.com/espressif/esp-idf/bl ... ff.c#L3466

When I inspect with gdb the whole 'obj' object is null (valid pointer, but empty structure fields).
I think this means it is somehow getting through the validation function in line 3463, and then having another thread modify the pointer or the object itself. (Or I have another major pointer/overflow issue). E.g. two threads opening files on the same disk at once.

Backtrace shows it is going through the lseek functions to get here.

I'm trying to understand why vfs_fat_lseek isn't acquiring the locks on the context object here?

Code: Select all

static off_t vfs_fat_lseek(void* ctx, int fd, off_t offset, int mode)
{
    vfs_fat_ctx_t* fat_ctx = (vfs_fat_ctx_t*) ctx;
    FIL* file = &fat_ctx->files[fd];
    off_t new_pos;
    if (mode == SEEK_SET) {
        new_pos = offset;
    } else if (mode == SEEK_CUR) {
        off_t cur_pos = f_tell(file);
        new_pos = cur_pos + offset;
    } else if (mode == SEEK_END) {
        off_t size = f_size(file);
        new_pos = size + offset;
    } else {
        errno = EINVAL;
        return -1;
    }

    ESP_LOGD(TAG, "%s: offset=%ld, filesize:=%d", __func__, new_pos, f_size(file));
    FRESULT res = f_lseek(file, new_pos);
    if (res != FR_OK) {
        ESP_LOGD(TAG, "%s: fresult=%d", __func__, res);
        errno = fresult_to_errno(res);
        return -1;
    }
    return new_pos;
}
https://github.com/espressif/esp-idf/bl ... c#L533-559

I've tried adding the lock and it seems to fix things so far, although I haven't done enough testing to be sure just yet.

Code: Select all

static off_t vfs_fat_lseek(void* ctx, int fd, off_t offset, int mode)
{
    vfs_fat_ctx_t* fat_ctx = (vfs_fat_ctx_t*) ctx;
    _lock_acquire(&fat_ctx->lock);					// ADDED LOCK HERE
    FIL* file = &fat_ctx->files[fd];
    off_t new_pos;
    if (mode == SEEK_SET) {
        new_pos = offset;
    } else if (mode == SEEK_CUR) {
        off_t cur_pos = f_tell(file);
        new_pos = cur_pos + offset;
    } else if (mode == SEEK_END) {
        off_t size = f_size(file);
        new_pos = size + offset;
    } else {
        errno = EINVAL;
        _lock_release(&fat_ctx->lock);                              // RELEASE LOCK - ERROR CONDITION
        return -1;
    }

    ESP_LOGD(TAG, "%s: offset=%ld, filesize:=%d", __func__, new_pos, f_size(file));
    FRESULT res = f_lseek(file, new_pos);
    _lock_release(&fat_ctx->lock);	    	                  // RELEASE LOCK - NORMAL FLOW
    if (res != FR_OK) {
        ESP_LOGD(TAG, "%s: fresult=%d", __func__, res);
        errno = fresult_to_errno(res);
        return -1;
    }
    return new_pos;
}
Is this an oversight in the vfs_implementation?

I think I have some secondary issue as well because the SD seems to unmount somehow after boot sometimes. It was doing that before and its still doing it now. I can find and fix that separately, but I haven't had any crashes since adding the above locks.

I am using SDK V4.4.8 (with a couple of our own minor patches - nothing different in VFS/SD components). Minor tweak to FAT to force allocation from External ram. It looks the same in the master branch at the moment.

Who is online

Users browsing this forum: No registered users and 84 guests