USB protocol error checking

SJHardy
Posts: 46
Joined: Thu Oct 03, 2019 12:36 am

USB protocol error checking

Post by SJHardy » Thu Oct 03, 2019 5:07 am

Hi Tom,

We occasionally encounter bad USB cables, which can cause random weirdness. Of course, the ideal is to have perfect connections, but it is hard to achieve in practice: we find cables with 10 ohm impedance in the shield connection, USB connectors that don't reliably contact the shield, and so on.

How can we make this rock-solid?

The problem we are encountering is that when there is a bad connection, there is no checksum in the PC-to-KFLOP protocol, so nonsensical data gets through to the software. This is very hard to deal with, since users assume the problem is a bug in our software, and tech support becomes a nightmare. What we would like is to reliably recognize errors on the USB link so that we can shut down the software gracefully and notify the user.

We are using Linux and have our own KMotionServer, but it is based on the Windows code. Maybe the FTDI library for Linux just sucks, but you may have some suggestions, since you are more familiar with the FTDI chip.

It would be most useful if we could somehow validate the main status message, since it is critical to our software's function. It comes from the KFLOP as a big block of hex, which we parse into numbers. If it gets truncated we can reject the message, since the length is checked, but we are at its mercy if a few bits get flipped here and there. It really would be better if the FTDI library returned an error code, but it doesn't; it just seems to keep going (after a fashion) after mangling the data. I always thought USB would have some sort of checksum, but evidently it doesn't.
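To make that limitation concrete, here is a minimal C++ sketch of a length-checked status parser. The line format (space-separated hex words) and the word count `kExpectedWords` are assumptions for illustration, not the actual KFLOP status layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record size: the real number of words in the KFLOP status
// record depends on the firmware version and is not given in this thread.
constexpr std::size_t kExpectedWords = 4;

// Parse one status line of space-separated hex words (assumed format).
// Rejects bad hex digits and a wrong word count (truncation), but a bit
// flip that turns one valid hex digit into another still parses cleanly.
std::optional<std::vector<std::uint32_t>> ParseStatus(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::uint32_t> words;
    std::string tok;
    while (in >> tok) {
        if (tok.size() > 8) return std::nullopt;  // more than 32 bits
        std::uint32_t value = 0;
        for (char c : tok) {
            int digit;
            if (c >= '0' && c <= '9') digit = c - '0';
            else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
            else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
            else return std::nullopt;  // not a hex digit
            value = (value << 4) | static_cast<std::uint32_t>(digit);
        }
        words.push_back(value);
    }
    if (words.size() != kExpectedWords) return std::nullopt;  // length check
    return words;
}
```

For example, `ParseStatus("DEADBEEF 1 2")` is rejected as truncated, while `ParseStatus("DEADBEEF 1 6 3")` is accepted even though it differs from `"DEADBEEF 1 2 3"` by a single bit (ASCII '2' is 0x32, '6' is 0x36).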

My impression is that the comms are really fragile. With Ethernet, if you get a packet you know it's good, so it degrades gracefully with noise. But a serial protocol over USB does not have robust error detection: relying on CRLF at the end of a message gives no redundancy.

Whatever it takes, we want to either detect comms errors reliably or make the USB link electrically perfect. Our only option at this point seems to be to mount a Raspberry Pi or similar right next to the KFLOP and use Ethernet between the PC and the Pi. That would make the USB link perfect (or nearly so), but it's a PITA to have yet another device to program and fit in the cabinet.

Regards,
SJH

TomKerekes
Posts: 2891
Joined: Mon Dec 04, 2017 1:49 am

Re: USB protocol error checking

Post by TomKerekes » Thu Oct 03, 2019 5:37 pm

Hi SJH,

As far as I understand, USB has CRC checks and resending of packets, similar to Ethernet. Here is one article. Also this article. Also, section 8.7 of the USB 2.0 spec describes error retries. Under Windows I have never experienced corrupted data. Uncorrectable noise always seems to generate an error, and any error should cause a disconnect.
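For reference, USB 2.0 data packets carry a CRC16 with generator x^16 + x^15 + x^2 + 1, which the FTDI chip and the host controller check in hardware; a packet that fails the check is ignored by the receiver and the host retries the transaction. The sketch below only illustrates that computation in software, using the usual CRC-16/USB parameters (reflected polynomial 0xA001, initial value 0xFFFF, inverted result). Application code never needs to run it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CRC-16/USB: polynomial 0x8005 processed LSB-first (reflected form 0xA001),
// initial value 0xFFFF, final result inverted.
std::uint16_t Crc16Usb(const std::uint8_t* data, std::size_t len) {
    std::uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1u)
                crc = static_cast<std::uint16_t>((crc >> 1) ^ 0xA001);
            else
                crc = static_cast<std::uint16_t>(crc >> 1);
        }
    }
    return static_cast<std::uint16_t>(~crc);
}
```

Running it over the ASCII bytes "123456789" gives 0xB4C8, the published check value for CRC-16/USB, and any single flipped bit changes the result.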

Are you certain that you have received data with one or more bit errors? Is your code halting on any single disconnect? Or is it trying to automatically re-connect? Is it checking for an error everywhere one could occur?

Maybe this is a Linux-only issue? Pär Hansson describes the open source driver as being better than FTDI's:

2. Install libftdi

Skip this step if you will be using the ftd2xx driver. However, libftdi works a lot better on Linux. libftdi is an open source FTDI driver that can be used as a replacement when running on Linux or Mac OS X.

sudo apt-get install libftdi-dev


Which are you using?

There might be an issue with the port of KMotionServer to Linux?

I suppose we could add an additional CRC to the status record if you think that is absolutely necessary.
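If an extra CRC were added to the status record, the PC side could verify it along these lines. This is only a sketch: the framing (payload, a space, then the CRC-32 as 8 hex digits) and the helper names `AppendStatusCrc` and `CheckStatusCrc` are hypothetical, and the CRC-32 is the standard reflected IEEE polynomial 0xEDB88320 rather than anything KFLOP currently sends.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

// Standard CRC-32 (IEEE 802.3 / zlib), bitwise, reflected polynomial 0xEDB88320.
std::uint32_t Crc32(const std::string& s) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : s) {
        crc ^= c;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

// Hypothetical sender side: append the CRC as 8 uppercase hex digits.
std::string AppendStatusCrc(const std::string& payload) {
    char buf[9];
    std::snprintf(buf, sizeof buf, "%08X", static_cast<unsigned>(Crc32(payload)));
    return payload + " " + buf;
}

// Hypothetical receiver side: returns true and sets payload only if the
// trailing 8 hex digits match the CRC-32 of everything before the last space.
bool CheckStatusCrc(const std::string& line, std::string& payload) {
    const std::size_t sp = line.rfind(' ');
    if (sp == std::string::npos || line.size() - sp - 1 != 8) return false;
    const std::string trailer = line.substr(sp + 1);
    for (unsigned char c : trailer)
        if (!std::isxdigit(c)) return false;  // trailer must be pure hex
    const unsigned long sent = std::strtoul(trailer.c_str(), nullptr, 16);
    const std::string body = line.substr(0, sp);
    if (Crc32(body) != static_cast<std::uint32_t>(sent)) return false;
    payload = body;
    return true;
}
```

A single flipped bit anywhere in the line makes `CheckStatusCrc` return false, which is exactly the case a length check alone cannot catch. The bitwise loop is slower than a table-driven CRC, but for a short record polled periodically the cost is negligible.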

We have also found many USB cables that don't meet shielding specifications. See the wiki here.
Regards,

Tom Kerekes
Dynomotion, Inc.

SJHardy

Re: USB protocol error checking

Post by SJHardy » Thu Oct 03, 2019 9:13 pm

We are using libftdi. I have made a lot of enhancements to KMotionServer since Pär Hansson's initial port from Windows, so from what you say, the problem probably lies there. One enhancement is that the server autonomously polls for status and sends it to the PC application over a separate channel whenever anything significant changes. This let us make the application more event-driven, but it does seem that the server is not handling USB errors properly. I have made a hacked-up cable onto which I can inject noise and trigger errors at will, so with some additional logging I might gain some insight. I will post back if I find anything.

SJHardy

Re: USB protocol error checking

Post by SJHardy » Sun Oct 06, 2019 2:31 am

Some progress on this: we have improved reliability quite a lot by checking that the first word of the status response is the expected value (it is constant). We also had a few issues parsing the response when it wasn't quite as expected (that's my bad), although when there are USB errors, some data is deleted from the response, so it appears shifted down by 4 bytes, etc. That should not happen if the USB driver is working correctly, but I don't have enough expertise to debug at that level.
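A header check of this kind might look like the following sketch. The value of `kStatusHeader` is a placeholder (the thread does not give the real constant), and telling a "shifted" record apart from a plain bad header is an optional extra that mainly helps with logging.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constant; the actual first word of the KFLOP status
// record is not given in this thread.
constexpr std::uint32_t kStatusHeader = 0xDEADBEEFu;

enum class StatusCheck { kOk, kShifted, kBadHeader };

// Accept the record only if the constant header is the first word.
// If the header appears later, extra data (for example leftovers from a
// previous response) preceded it; either way the caller should treat the
// link as failed rather than parse the record.
StatusCheck CheckHeader(const std::vector<std::uint32_t>& words) {
    if (!words.empty() && words[0] == kStatusHeader) return StatusCheck::kOk;
    for (std::size_t i = 1; i < words.size(); ++i)
        if (words[i] == kStatusHeader) return StatusCheck::kShifted;
    return StatusCheck::kBadHeader;
}
```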

So now if I inject noise onto the USB cable, the software detects the errors and recovers (or closes the connection), which is satisfactory for us. We need to test a bit more, but I think for now it is good enough to go.

BTW, in case anyone is interested, here is how we inject noise: split the shield of the USB cable, then put an AC signal across the two halves of the shield. One of the 4 signal wires (not split) is 'ground' and is effectively connected to the shield through an impedance of about 10 ohms. The noise needs to be high frequency. We use 2 VAC at 60 Hz from a transformer, and drag the connection so that it sparks when we want to test. Sparking (arcing) generates a lot of nasty HF energy, which reliably disrupts the USB connection in all sorts of unpleasant ways. Don't go much above 2 V, otherwise you might smoke something.

Regards,
SJH

TomKerekes

Re: USB protocol error checking

Post by TomKerekes » Sun Oct 06, 2019 5:14 pm

Hi SJH,

Glad you are making progress.

SJHardy wrote:
although when there are USB errors the entire response gets some data deleted, so it appears shifted down by 4 bytes etc. That should not happen if the USB driver is working correctly, but I don't have enough expertise to debug at that level.

Again, I believe the data should be correct unless an error is returned, so I don't understand this.

Here is a thought. Whenever there is an error, a call to Failed() should be made. This causes KMotionServer to call:

int CKMotionIO::Failed()

which closes the USB driver and marks the board as disconnected. When the board reappears and is reconnected, a process runs that flushes any partial data from the buffers on both the PC and KFLOP sides. If this isn't always happening, leftover data from earlier could cause things to be shifted.
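The pattern described here (route every error through one failure routine, then discard stale data on reconnect) can be modeled in a few lines. This is a hypothetical, self-contained model of the PC side only: the `Link` class and its methods are invented for illustration and are not the real CKMotionIO code, which also closes the USB driver and coordinates flushing on the KFLOP side.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical model of the PC-side receive path; not the real CKMotionIO.
class Link {
public:
    bool connected() const { return connected_; }

    // Every read/write error should be routed here, as with Failed().
    void Failed() {
        connected_ = false;  // a real server would also close the USB driver
    }

    // On reconnect, discard partial data left over from before the failure
    // so it cannot be glued onto the front of the next response.
    void Reconnect() {
        rx_.clear();
        connected_ = true;
    }

    // Bytes arriving from the device (simulated).
    void Receive(const std::string& bytes) {
        rx_.insert(rx_.end(), bytes.begin(), bytes.end());
    }

    // Returns one complete '\n'-terminated line, or false if disconnected
    // or no complete line has arrived yet.
    bool ReadLine(std::string& line) {
        if (!connected_) return false;
        for (std::size_t i = 0; i < rx_.size(); ++i) {
            if (rx_[i] == '\n') {
                const auto end = rx_.begin() + static_cast<std::ptrdiff_t>(i);
                line.assign(rx_.begin(), end);
                rx_.erase(rx_.begin(), end + 1);  // drop the line and its '\n'
                return true;
            }
        }
        return false;
    }

private:
    bool connected_ = true;
    std::deque<char> rx_;
};
```

Without the `rx_.clear()` in `Reconnect()`, a partial line such as "DEADBEEF 1 2" received just before a failure would be prepended to the next response, which is one way shifted records could arise.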
Regards,

Tom Kerekes
Dynomotion, Inc.
