Yes, many TCP connections end in RSTs

George Ou made this argument at the FCC En Banc hearing at Stanford on 4/25 (A/V here).
It's actually quite common throughout the world that TCP RSTs are used.

...

Speaking of the 1:45 AM resets, ISPs all over the world, they've found that up to 12% of sessions get reset, all over the world. It's almost like there's this 12% of background noise of TCP resets that are happening that may not be coming from Comcast but could be coming from any device on the Internet, all routers, all firewalls support that feature and we don't really know where it's coming from.

Here's AT&T's response to Vuze's claim that they use RSTs for "network management purposes" (i.e., terminating connections they don't like):

In response to your specific question about AT&T's network management practices, AT&T does not use "false reset messages" to manage its network. We agree with Vuze that the use of the Vuze Plug-In to measure network traffic has numerous limitations and deficiencies, and does not demonstrate whether any particular network providers or their customers are using TCP Reset messages for network management purposes. Given that Vuze itself has recognized these problems with the measurements generated by its Plug-In, we believe that Vuze should not have published these misleading measurements, nor filed them with the FCC. Moreover, as Vuze and others have acknowledged, TCP resets are generated for many reasons wholly unrelated to the network management practices of broadband network providers, which explains why resets may appear on networks of companies such as AT&T who do not use TCP resets for network management (see, e.g., An Analysis of TCP Reset Behaviour on the Internet, University of Calgary (2004)).

I've reviewed the paper by Arlitt and Williamson to which AT&T is referring (Ou didn't cite his sources), and while it's interesting work, I don't think it really speaks to Ou's argument. The RSTs that Arlitt and Williamson are talking about are primarily ungraceful terminations of TCP connections that would be ending anyway. The authors suggest a number of cases here:

  • Servers aggressively closing connections after short idle times, but the client already has a request in flight and the server responds with an RST.
  • Clients responding to FINs from the server with an RST. The reasons for this are a bit unclear.
  • Servers closing connections with RSTs.
  • Connections to servers which aren't listening on a given port and so are rejecting it.

In all of these cases but the last, though, the Web transactions are actually over, so while there may be some negative effects from not going through the correct TCP finish handshake (cf. RFC 1337), neither side perceives this as failed transactions. And in the final case, the server explicitly is rejecting the connection, so this seems appropriate as well. It's also fairly straightforward to distinguish these cases as a passive observer (as the authors have done) with the appropriate tools.
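To make the distinction concrete, here is a minimal sketch of how a passive observer might sort an observed RST into the four cases listed above. This is my own illustration, not the method from the Arlitt and Williamson paper; the function name `classify_rst` and its parameters are hypothetical, and a real tool would have to derive these flags from a packet trace.

```python
def classify_rst(rst_from, handshake_done, fin_from=None, data_after_fin=False):
    """Sort an observed RST into one of the cases from the list above.

    rst_from:       "client" or "server", whoever sent the RST
    handshake_done: True if the three-way handshake completed
    fin_from:       who sent a FIN before the RST, or None if nobody did
    data_after_fin: True if the client sent data after the server's FIN
    """
    if not handshake_done:
        # RST in answer to a SYN: nothing is listening on that port.
        return "rejected: port not listening"
    if rst_from == "client" and fin_from == "server":
        return "client answered server FIN with RST"
    if rst_from == "server" and fin_from == "server" and data_after_fin:
        return "server closed idle connection, request was in flight"
    if rst_from == "server" and fin_from is None:
        return "server closed with RST"
    # Anything else needs a closer look (possibly a third-party RST).
    return "unclassified"


print(classify_rst("server", handshake_done=False))
print(classify_rst("client", handshake_done=True, fin_from="server"))
```

An RST that lands mid-transfer, with no FIN from either side and the handshake long since completed, would come out as either "server closed with RST" or "unclassified" here, which is exactly the kind of event that needs further forensics.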

What Comcast has done, however, is something different: they were (are?) using RSTs to abort other people's transactions. The base rate of normal RSTs isn't really that useful for assessing the appropriateness of third party RSTs as a network traffic management technique. As a hypothetical, imagine that Comcast were forging FINs instead of RSTs. One could expand Ou's argument to say "FINs are a natural feature of the Internet", but it doesn't really follow that it's desirable to have third parties forging FINs on your connections.

It does bear, on the other hand, on what we can infer from Vuze's data. Vuze hasn't really published that many details of their methodology, but they claim to be measuring the total number of RSTs, not just those of Azureus/Bittorrent connections (Incidentally, I'm not sure how I'd feel as a user about installing some app that sniffed all the traffic on my network and sent statistics to Vuze)[-- see update below; EKR]:

The Vuze Plug-In constantly monitors the rate of network interruptions occurring from RST ("reset") packets by measuring the total number of attempted network connections and the total number of network connections that were interrupted by a reset message. By comparing these two values, one can calculate the ratio of network connections interrupted by reset messages. We have chosen to reflect the median ratio in order to reduce variability in the data given the sample size.

The Plug-In collects data for all Internet connections, not just connections occurring due to use of the Vuze application, and logs it every ten minutes. Then, at the top of the hour, the Plug-In aggregates the data into one-hour blocks and transmits it to Vuze, Inc. By definition, each source of data had the Vuze application installed and launched in order connections.
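As a rough illustration of the arithmetic Vuze describes (resets divided by attempted connections, then the median across samples), here is a small sketch. The function names and the sample numbers are made up for the example; Vuze's report doesn't give enough detail to say exactly how they aggregate.

```python
from statistics import median


def reset_ratio(attempted, reset):
    """Fraction of attempted connections that were interrupted by a reset."""
    if attempted <= 0:
        return None  # nothing attempted in this block, so the ratio is undefined
    return reset / attempted


def median_reset_ratio(samples):
    """Median ratio over (attempted, reset) pairs, skipping empty blocks."""
    ratios = [reset_ratio(a, r) for a, r in samples]
    ratios = [x for x in ratios if x is not None]
    return median(ratios) if ratios else None


# Hypothetical one-hour blocks: (attempted connections, connections reset)
hourly = [(200, 10), (100, 25), (0, 0), (400, 20)]
print(median_reset_ratio(hourly))
```

Note that the median hides the one block where a quarter of the connections were reset, which is the sort of smoothing Vuze says they wanted.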

But if you're measuring all RSTs and not attempting to determine which ones are "normal" and which ones represent connection failure, then it's not clear how representative your data is. It is sort of interesting how much variation (about an order of magnitude, from 2.5% to 24%) there is in the rate of RSTs, but as Iljitsch van Beijnum observes, this could be the result of caching proxies and the like in the network. You may not particularly want your ISP interposing a proxy, but that's a different question than whether they're actually blocking your P2P traffic.

This isn't the only possible reason, either. For instance, users might just have different software profiles. Given that Vuze claims to have 8000 users on 1200 ASs (with the data being reported for ASs with greater than 20 users), there could well just be a lot of statistical variation. Some evidence of this is that the results from Comcast alone span from 14% to 24%. In order to really make sense of data like Vuze's we'd need to try to distinguish normal RSTs from those injected in the network, which requires more forensics (TTL inspection, IP ID, etc.) than Vuze's paper describes.
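To give a flavor of what TTL inspection means, here is a minimal sketch of one heuristic: an RST whose IP TTL doesn't match the TTLs seen on the ordinary packets from the same apparent sender probably took a different path, which is a hint that some device in the middle generated it. The `Packet` record, the function name, and the tolerance are all assumptions for illustration; a real tool would pull these fields out of a packet capture and would look at IP ID as well.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str    # claimed source address
    ttl: int    # IP TTL as observed at the capture point
    rst: bool   # True if the TCP RST flag is set


def suspicious_rsts(packets, ttl_tolerance=1):
    """Flag RSTs whose TTL matches none of the sender's earlier non-RST packets."""
    baseline = {}  # src -> set of TTLs seen on ordinary packets
    flagged = []
    for p in packets:
        if not p.rst:
            baseline.setdefault(p.src, set()).add(p.ttl)
            continue
        seen = baseline.get(p.src)
        # With no baseline for this sender we can't say anything.
        if seen and all(abs(p.ttl - t) > ttl_tolerance for t in seen):
            flagged.append(p)
    return flagged


trace = [
    Packet("192.0.2.10", 52, False),
    Packet("192.0.2.10", 52, False),
    Packet("192.0.2.10", 61, True),   # TTL jumps: looks like it came from elsewhere
    Packet("192.0.2.10", 52, True),   # consistent with the real endpoint
]
print(suspicious_rsts(trace))
```

This is only a heuristic: routes change and a careful injector can guess the right TTL, so a match proves little, but a mismatch is worth investigating.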

UPDATE (8:49 AM): I was wrong about this needing to be a packet sniffer. I just read the source (here; thanks to Danny McPherson for pointing out that I could download it.) They're just using netstat to read the network statistics and grabbing the reset counter out of the results. On the other hand, this means that they're not even in principle able to differentiate between RSTs generated on Azureus connections and those on other connections or between those generated by some man in the middle or by endpoints. While the variation in reported RSTs remains interesting, you'd need a significantly more advanced tool than this to really diagnose what's going on.
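For the curious, here's a sketch of what pulling reset counters out of netstat-style statistics might look like. This is not Vuze's code; the function name is mine, the sample text is illustrative, and the real output of `netstat -s` varies by operating system and version, so treat both patterns below as assumptions.

```python
import re


def reset_counters(netstat_output):
    """Return {label: count} for every statistics line that mentions resets."""
    counters = {}
    for line in netstat_output.splitlines():
        line = line.strip()
        if "reset" not in line.lower():
            continue
        # Assumed count-first layout, e.g. "37 connection resets received"
        m = re.match(r"^(\d+)\s+(.*)$", line)
        if m:
            counters[m.group(2)] = int(m.group(1))
            continue
        # Assumed label-first layout, e.g. "Reset Connections = 42"
        m = re.match(r"^(.*?)\s*=\s*(\d+)$", line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


# Illustrative sample, not captured from a real machine.
SAMPLE = """\
Tcp:
    1520 active connection openings
    37 connection resets received
    12 resets sent
"""
print(reset_counters(SAMPLE))
```

The point to notice is that these are host-wide totals: nothing in the counter says which application, which connection, or which device on the path was responsible for any given reset.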

1 Comment

I was at the PAM 2008 conference this week and one of the papers had statistics on connections ending in RSTs. If I remember correctly, the paper is "Trends and Differences in Connection Behavior within Classes of Internet Backbone Traffic" and the presenter was Wolfgang John. The paper isn't online yet, but the author might send you a copy if you're interested in the results.
