ESPUSB32 Full-speed USB Approach
I intend to TRY to get USB 1.1 full speed implemented on the ESP32 over the next stretch of time. There are some approaches I have considered, but I was thinking perhaps you guys could point me in the direction you think may be the most fruitful.
The requirements are:
* Read data in at 12 Msample/sec or faster.
* Very precise offset from first level change (GPIO), or the ability to sample fast enough to see transitions (I2S).
* VERY fast turn-around. With USB 1.1 you have only six bit-times between the end of one message and the point where you must send an ACK. I want to check CRC inline.
* Timing must be precise. Inaccuracies, such as instructions taking an unpredictable number of cycles instead of a fixed number, are unacceptable. Can the DMA or other subsystems make executing code pause? Or does the processor always get priority?
* Can I count on a dedicated core to answer an interrupt? Can I count on every instruction always taking the same number of cycles? I could on the ESP8266. Does the same hold for the ESP32?
* Need TWO pins. One is insufficient to detect a "stop" state.
* I am not worried about computation time. I find Xtensa much easier to play tricks with than ARM. I also learned several tricks when writing the first ESPUSB and ESPTHERNET. I think I can process about 2 or 3 USB timeslices per procedure call. And I can keep the procedure calls VERY small. <3 the Xtensa powerhouse.
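For the inline CRC check in the requirements above, a bit-at-a-time update is the form that fits between samples. Here is a minimal Python sketch of the USB data-packet CRC16 (polynomial x^16 + x^15 + x^2 + 1, bits taken LSB first), only to pin down the arithmetic. The function names are made up for illustration and this is not taken from ESPUSB; on the chip it would be a few Xtensa instructions per bit.

```python
POLY_REFLECTED = 0xA001   # x^16 + x^15 + x^2 + 1, bit-reversed for LSB-first data
RESIDUE = 0xB001          # register value left after a good payload + CRC


def crc16_update(crc, bit):
    """Fold one received data bit into the running CRC register."""
    if (crc ^ bit) & 1:
        return (crc >> 1) ^ POLY_REFLECTED
    return crc >> 1


def crc16_register(data, crc=0xFFFF):
    """Run whole bytes through the register, LSB of each byte first."""
    for byte in data:
        for i in range(8):
            crc = crc16_update(crc, (byte >> i) & 1)
    return crc


def crc16_usb(data):
    """CRC16 as it is appended to a data packet (register inverted)."""
    return crc16_register(data) ^ 0xFFFF


def packet_ok(payload_and_crc):
    """Inline check: feed payload and CRC through, compare to the residue."""
    return crc16_register(payload_and_crc) == RESIDUE
```

The point of the residue form is that the receiver never has to know where the payload ends and the CRC begins: it keeps folding bits in until end-of-packet and does one compare, which matters when the ACK is due a few bit-times later.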
Right now, I see a few approaches. I suppose I should get input from you guys before I trudge down any of these paths:
(0) Ultra-low overhead level-change interrupt?
Can I trigger an interrupt on IO change and have my code execute very quickly thereafter? Can I have it run on a dedicated core at highest priority? I have tried doing stuff like this on the ESP32 and found that the jitter is VERY low. So low, it should be fine. I am just worried about latency.
(1) Use the I2S engine.
Can the I2S engine have tiny block sizes in parallel mode? I.e. reading in 8- or 16- bits but only reading chains of 32 bytes or so? The ESP8266 cannot do this reliably. DMA with small blocks can become very buggy on the ESP8266, randomly incurring extra wait cycles between chains and DMA entries being skipped. Is the I2S engine in the ESP32 more robust with small block chains? The reason behind the small block size is because one has to return an answer to the host VERY fast. I can resynchronize every couple bits without much issue by using a software-PLL.
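To make the resynchronization idea concrete, here is a rough Python sketch of the crudest kind of software PLL: sample the line at 4x the bit rate, snap the bit-cell phase back to zero on every observed transition, and take the NRZI decision in the middle of each cell. The 4x oversampling factor, the single-level input (one differential line state per sample) and all names are assumptions for illustration; a real implementation would also need SE0 detection on the second pin and bit unstuffing.

```python
def nrzi_encode(bits, cell_lengths, idle=1):
    """Test helper: NRZI-encode bits (0 = toggle, 1 = hold) into line samples,
    with a per-bit cell length so clock jitter can be simulated."""
    level, samples = idle, []
    for bit, n in zip(bits, cell_lengths):
        if bit == 0:
            level ^= 1
        samples.extend([level] * n)
    return samples


def recover_bits(samples, oversample=4, idle=1):
    """Decode NRZI bits from oversampled line levels, resyncing on each edge."""
    bits = []
    last_sample = idle        # previous raw sample, used for edge detection
    cell_level = idle         # level decided for the previous bit cell
    phase = oversample - 1    # so the first sample lands on phase 0
    for s in samples:
        if s != last_sample:
            phase = 0                         # edge seen: a new cell starts here
        else:
            phase = (phase + 1) % oversample  # no edge: free-run
        last_sample = s
        if phase == oversample // 2:          # mid-cell decision point
            bits.append(1 if s == cell_level else 0)
            cell_level = s
    return bits
```

Because USB bit stuffing forces a transition at least every seven bit-times, the free-running stretch between resyncs stays short, which is why small per-cell errors (cells of 3 or 5 samples instead of 4) do not accumulate in this sketch.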
(2) Speed GPIO along.
After several tests, I found that the GPIO on the ESP32 is marginally faster than on the ESP8266. Though the wait states for reading from GPIO on the ESP32 are higher than on the ESP8266, the overall clock rate is higher as well. This gives me more time to actually work with the incoming data. THEORETICALLY there is enough time to do everything in GPIO. The only potential problems would be if program execution is non-deterministic, or if I can't synchronize off of the initial transition.
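A back-of-the-envelope cycle budget for the pure-GPIO route, as a small Python sketch. The 240 MHz CPU clock is an assumption (it is not stated above), and the GPIO read wait states are deliberately left out because I do not have a number for them; they would have to be subtracted from the per-bit figure.

```python
def cycles_per_bit(cpu_hz, bit_rate):
    """CPU cycles that elapse during one bit time on the wire."""
    return cpu_hz / bit_rate


def turnaround_cycles(cpu_hz, bit_rate, bit_times):
    """CPU cycles available between end of packet and the start of the reply."""
    return bit_times * cycles_per_bit(cpu_hz, bit_rate)


CPU_HZ = 240_000_000         # assumption: ESP32 running at its maximum clock
FULL_SPEED_BPS = 12_000_000  # USB 1.1 full speed

per_bit = cycles_per_bit(CPU_HZ, FULL_SPEED_BPS)
reply = turnaround_cycles(CPU_HZ, FULL_SPEED_BPS, 6)
print(f"{per_bit:.1f} cycles per bit, {reply:.0f} cycles to start the ACK")
```

Under that assumption it works out to 20 cycles per bit and 120 cycles for the six bit-time turnaround, so every GPIO read, CRC step and branch has to fit inside those 20 cycles with no variation.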
(3) Another side-channel
Is there another mechanism by which the IO can be read at high speed? Some mechanism to side-step the wait-states on the reading of I/O. Prefetching? Watching non-enabled interrupt flags? etc. I just can't find that much information about how the external IO is wired internally.
Re: ESPUSB32 Full-speed USB Approach
Not at the office, so I'll drop what I know:
You can tie a GPIO to a high-level interrupt. That should jump to that interrupt within a few cycles of the GPIO changing state. (High-prio interrupts are finicky though, e.g. you can't use the stack, but I'm sure you should be able to work with that.) We even have the NMI free; in theory, that should 100% guarantee your interrupt latency. The later versions of esp-idf actually have hooks so you don't need to go messing about in idf itself if you want to use high-level interrupts in your program.
I'm not entirely 100% sure if raw GPIO reads/writes are always latency-free. If you want to test this, make sure to do a write access to the slow RTC memory before poking a GPIO... if the two go over the same bus, you should see a big delay because the RTC memory only runs at 8MHz or so.
I'm not 100% sure, but I think accesses from the other core and DMA accesses can actually slow down the CPU. If this is an issue: look into HeapAllocCaps, the regions[] array, specifically the 'pool' comments. From what I understand, if you make sure nothing else (DMA/other CPU) is accessing the specific pool your CPU accesses data and instructions from, the CPU should run at 100% speed all the time.
Also: if you can get it to sync up, you may be able to use the RMT peripheral to capture USB packets. I've tried that and I have a low-speed proof-of-concept in pure C that enumerates... well... sometimes. It's not perfect, hence I haven't released it yet and for one, interrupt latency is too high in C for high speed. RMT has its own share of quirks if you want to use it at very high speeds, but it may be worth a look.
Re: ESPUSB32 Full-speed USB Approach
dis_gon_b_gud.gif
Will there be a build log? The questions already taught me stuff.
Re: ESPUSB32 Full-speed USB Approach
I've played more with the I2S bus to much success. Found the bugs preventing me using it for TXing. I haven't played much with the RXing, but it looks like I should be able to keep very short DMA chains and it should be able to keep up. Unlike the 8266, when it can't maintain DMA requests it looks like it just ... stops, which I guess is good.
Successes:
(1) Dual-channel, dual-fifo mode is definitely the fastest/best when (ab)using the I2S bus.
(2) You can get exactly 12 MHz off the D2 bus. I don't know why I didn't realize B/A+NUM really, literally means you can use fractional values! Herp derp.
(3) DMA engine works with short chunks, chunked into two-events-per-word. (Ok, I knew this, but just convenient)
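The "exactly 12 MHz" result in (2) can be checked with a few lines of Python. This sketch assumes the I2S clock is derived as source / (NUM + B/A), that the source is the 160 MHz D2 clock, and that A and B are limited to 6-bit values; those details are from my memory of the reference manual rather than from this thread, so treat them as assumptions. The helper name is made up.

```python
from fractions import Fraction


def i2s_divider(src_hz, target_hz, max_ab=63):
    """Return (NUM, B, A) such that src_hz / (NUM + B/A) == target_hz exactly."""
    ratio = Fraction(src_hz, target_hz)
    num = ratio.numerator // ratio.denominator   # integer part of the divider
    frac = ratio - num                           # fractional part, as B/A
    if frac.denominator > max_ab:
        raise ValueError("no exact B/A within the register width")
    return num, frac.numerator, frac.denominator


num, b, a = i2s_divider(160_000_000, 12_000_000)
print(f"NUM={num} B={b} A={a}")
```

With those inputs the ratio is 40/3, i.e. NUM=13 with B/A = 1/3. Whether the sample clock seen by the parallel interface is this clock directly or a further division of it is something I have not verified.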
There are a few big questions I still have that would be helpful before I start running some tests:
(1) What is the best mechanism for doing a NMI (or high-level interrupt) on pin change event? Even better: if it's on a specific core.
(2) How can I reset the timer on the I2S engine without resetting the engine? Can I pause it in such a way that when I trigger a start condition it actually resets the I2S clock?
(3) Are there special conditions/rules when reading data that could be accessed simultaneously by DMA? I.e. I want to see a variable AS SOON as it becomes available. Can I set one of the I2S data pins to be inputted from 0x38 and look for when it becomes set to a 1? Or can repeatedly reading data that DMA will access be a problem? I can't wait for full chains to complete, since I only have 6.5 time slices to respond!
Re: ESPUSB32 Full-speed USB Approach
Another question: Is there any way of routing GPIOs to different interrupts? This could be useful for monitoring the digital value of the pins from within an interrupt. I could monitor the INTERRUPT register of the processor. This would minimize wait states.
Re: ESPUSB32 Full-speed USB Approach
I remember this software USB 1.1 for the AVR https://www.obdev.at/products/vusb/index.html maybe the well documented C can be helpful?
Re: ESPUSB32 Full-speed USB Approach
I know that stack inside and out. I've used it a TON. It was my inspiration for when I did ESPUSB (for the ESP8266).
Re: ESPUSB32 Full-speed USB Approach
@Franco: for your and other readers' information, Cnlohr already did his own share of reverse-engineering to implement a working, bit-banged USB device on the ESP8266, with GPIO code in assembly and clever tricks (but even pushing it to the limits, hardware latencies prevented full-speed USB). A look at https://www.youtube.com/watch?v=-NxoNdTj_7U is definitely worth it.
Charles, I hope you will get answers to your questions here, as this is still a good opportunity to better understand low-level stuff and hardware behaviour, and to get some technical insights from Espressif people. Please, when you're ready, let us know of the schedule of an eventual ESPUSB32 livestream (or of where you are likely to announce it.. twitter?).
Looks like you opted for I2S. I don't know the details, but it seems I was wrong in thinking that the input-to-output round trip via the I2S FIFO's fixed length (32 bit x 64?) would be a problem for latency... Really looking forward to learning as many things as with the espusb code (many thanks for that already).
But, in order to keep both I2S interfaces available for audio/lcd/tv outputs, I'd be interested in the RMT implementation of usb host for low speed HID devices anyway, so I'd be glad if SpriteTM releases his own usb code soon, even if it's unstable or experimental.
Re: ESPUSB32 Full-speed USB Approach
It "should" be trivial to implement a low-speed device on the ESP32, simply by taking ESPUSB and changing the delay times from 53.3 to 160.0. Though that would take up a core, totally. P.S. I did more tests with signal 11 and I do think it to be an excellent pairing.
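One way to read the 53.3 and 160.0 figures is as CPU cycles per low-speed bit (1.5 Mbit/s) at 80 MHz and 240 MHz respectively; that reading, and both clock rates, are my assumption. The Python sketch below shows why the second number is more convenient: a fractional cycle count has to be spread over integer delays, while 160 needs no dithering. I don't know how ESPUSB actually spaces its delays; this only illustrates the arithmetic.

```python
from fractions import Fraction
from math import floor


def delay_schedule(cpu_hz, bit_rate, n_bits):
    """Integer per-bit delays (in CPU cycles) whose running total never drifts
    more than one cycle from the ideal bit edges."""
    per_bit = Fraction(cpu_hz, bit_rate)
    edges = [floor(per_bit * i) for i in range(n_bits + 1)]
    return [later - earlier for earlier, later in zip(edges, edges[1:])]


print(delay_schedule(80_000_000, 1_500_000, 6))    # fractional: mixed delays
print(delay_schedule(240_000_000, 1_500_000, 3))   # integral: constant delay
```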
For me, though, I want to implement device MIDI, since that is available from the web browser. Also RNDIS ethernet, serial and mass storage. Full-speed opens up so much.
I haven't done any round-trip tests and I am really worried about that. Won't be long before I can find out more about it though.
EVEN IF the I2S engine can't work, "theoretically" doing it in raw software SHOULD be possible, though very difficult. Using the I2S engine should simplify things greatly because you don't have complicated wait states.
I, too, wish there were three I2S engines. They are so darn amazing. I am still in awe that fractional divisions of D2 actually work. That's just insane.
Re: ESPUSB32 Full-speed USB Approach
Will pass the request for 3 I2S engines on.
Wrt the RMT implementation: the version I have is pretty early and I'm not sure if it still compiles with the current SDK. Will see if I can make it compile, maybe even work reliably, and post it.