Recently in Networking Category


August 7, 2012

Yesterday, Microsoft published their CU-RTC-Web WebRTC API proposal as an alternative to the existing W3C WebRTC API being implemented in Chrome and Firefox. Microsoft's proposal is a "low-level API" proposal which basically exposes a bunch of media- and transport-level primitives to the JavaScript Web application, which is expected to stitch them together into a complete calling system. In contrast to the current "mid-level" API, the Microsoft API moves a lot of complexity from the browser into the JavaScript, but the authors argue that this makes it more powerful and flexible. I don't find these arguments that convincing, however: a lot of them seem fairly abstract and rhetorical, and when we get down to concrete use cases, the examples Microsoft gives seem like things that could easily be done within the existing framework. So, while it's clear that the Microsoft proposal is a lot more work for the application developer, it's a lot less clear that it's sufficiently more powerful to justify that additional complexity.

Microsoft's arguments for the superiority of this API fall into three major categories:

  • JSEP doesn't match with "key Web tenets"; i.e., it doesn't match the Web/HTML5 style.
  • It allows the development of applications that would otherwise be difficult to develop with the existing W3C API.
  • It will be easier to make it interoperate with existing VoIP endpoints.

Like any all-new design, this API has the significant advantage (which the authors don't mention) of architectural cleanliness. The existing API is a compromise between a number of different architectural notions and like any hybrid proposal has points of ugliness where those notions come into contact with each other (especially in the area of SDP). However, when we actually look at functionality rather than elegance, the advantages of an all-new design---not only one which is largely not based on preexisting technologies but one which involves discarding most of the existing work on WebRTC itself---start to look fairly thin.

Looking at the three claims listed above: the first seems more rhetorical than factual. It's certainly true that in the early years of the Web designers strove to keep state out of the Web browser, but that hasn't been the case with rich Web applications for quite some time. To the contrary, many modern HTML5 technologies (localstore, WebSockets, HSTS, WebGL) are about pushing state onto the browser from the server.

The interoperability argument is similarly weakly supported. Given that JSEP is based on existing VoIP technologies, it seems likely that JSEP will be easier to make interoperate with existing endpoints, since it's not first necessary to implement those technologies (principally SDP) in JavaScript before you can even try to interoperate. The idea here seems to be that it will be easier to accommodate existing noncompliant endpoints if you can adapt your Web application on the fly, but given the significant entry barrier to interoperating at all, this seems like an argument that needs rather more support than MS has currently offered.

Finally, with regard to the question of the flexibility/JavaScript complexity tradeoff, it's somewhat distressing that the specific applications that Microsoft cites (baby monitoring, security cameras, etc.) are so pedestrian and so easily handled by JSEP. This isn't, of course, to say that there are no as-yet-unenvisioned applications which JSEP would handle badly, but it rather undercuts this argument if the only examples you cite in support of a new design are those which are easily handled by the old one.

None of this is to say that CU-RTC-Web wouldn't be better in some respects than JSEP. Obviously, any design has tradeoffs and as I said above, it's always appealing to throw all that annoying legacy stuff away and start fresh. However, that also comes with a lot of costs and before we consider that we really need to have a far better picture of what benefits other than elegance starting over would bring to the table.

More or less everyone agrees about the basic objectives of the WebRTC effort: to bring real-time communications (i.e., audio, video, and direct data) to browsers. Specifically, the idea is that Web applications should be able to use these capabilities directly. This sort of functionality was of course already available either via generic plugins such as Flash or via specific plugins such as Google Talk, but the idea here was to have a standardized API that was built into browsers.

In spite of this agreement about objectives, from the beginning there was debate about the style of API that was appropriate, and in particular how much of the complexity should be in the browser and how much in the JavaScript. The initial proposals broke down into two main flavors:

  • High-level APIs — essentially a softphone in the browser. The Web application would request the creation of a call (perhaps with some settings as to what kinds of media it wanted) and then each browser would emit standardized signaling messages which the Web application would arrange to transit to the other browser. The original WHATWG HTML5/PeerConnection spec was of this type.
  • Low-level APIs — an API which exposed a bunch of primitive media and transport capabilities to the JavaScript. A browser that implemented this sort of API couldn't really do much by itself. Instead, you would need to write something like a softphone in JavaScript, including implementing the media negotiation, all the signaling state machinery, etc. Matthew Kaufman from Microsoft was one of the primary proponents of this design.

After a lot of debate, the WG ultimately rejected both of these and settled on a protocol called JavaScript Session Establishment Protocol (JSEP), which is probably best described as a mid-level API. That design, embodied in the current specifications [], keeps the transport establishment and media negotiation in the browser but moves a fair amount of the session establishment state machine into the JavaScript. While it doesn't standardize signaling, it also has a natural mapping to a simple signaling protocol as well as to SIP and Jingle, the two dominant standardized calling protocols. The idea is supposed to be that it's simple to write a basic application (indeed, a large number of such simple demonstration apps have been written) but that it's also possible to exercise advanced features by manipulating the various data structures emitted by the browser. This is obviously something of a compromise between the first two classes of proposals.

The decision to follow this trajectory was made somewhere around six months ago, and at this point Google has a fairly mature JSEP implementation available in Chrome Canary, while Mozilla has a less mature implementation which you can compile yourself but which hasn't been released in any public build.

Yesterday, Microsoft made a new proposal, called CU-RTC-Web. See the blog post and the specification.

Below is an initial, high-level analysis of this proposal.

Disclaimer: I have been heavily involved with both the IETF and W3C working groups in this area and have contributed significant chunks of code to both the Chrome and Firefox implementations. I am also currently consulting for Mozilla on their implementation. However, the comments here are my own and don't necessarily represent those of any other organization.

What Microsoft is proposing is effectively a straight low-level API.

There are a lot of different API points, and I don't plan to discuss the API in much detail, but it's helpful to look at it briefly to get a flavor of what's required to use it.

  • RealTimeMediaStream -- each RealTimeMediaStream represents a single flow of media (i.e., audio or video).
  • RealTimeMediaDescription -- a set of parameters for the RealTimeMediaStream.
  • RealTimeTransport -- a transport channel which a RealTimeMediaStream can run over.
  • RealTimePort -- a transport endpoint which can be paired with a RealTimePort on the other side to form a RealTimeTransport.

In order to set up an audio, video, or audio-video session, then, the JS has to do something like the following:

  1. Acquire local media streams on each browser via the getUserMedia() API, thus getting some set of MediaStreamTracks.
  2. Create RealTimePorts on each browser for all the local network addresses as well as for whatever media relays are available/required.
  3. Communicate the coordinates for the RealTimePorts from each browser to the other.
  4. On each browser, run ICE connectivity checks for all combinations of remote and local RealTimePorts.
  5. Select a subset of the working remote/local RealTimePort pairs and establish RealTimeTransports based on those pairs. (This might be one or might be more than one depending on the number of media flows, level of multiplexing, and the level of redundancy required).
  6. Determine a common set of media capabilities and codecs between each browser, select a specific set of media parameters, and create matching RealTimeMediaDescriptions on each browser based on those parameters.
  7. Create RealTimeMediaStreams by combining RealTimeTransports, RealTimeMediaDescriptions, and MediaStreamTracks.
  8. Attach the remote RealTimeMediaStreams to some local display method (such as an audio or video tag).
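To give a sense of what that list means in practice, here is a heavily condensed sketch of the offering side. CU-RTC-Web was never implemented, so this is illustrative pseudocode: the object names come from the proposal, but the constructor and method signatures are my guesses, and `signalingChannel`, `serialize`, `checkConnectivity`, and `logError` are hypothetical application-provided helpers.

```javascript
// Illustrative sketch only: CU-RTC-Web was never implemented, and the
// signatures below are guesses based on the proposal's object names.
navigator.getUserMedia({audio: true, video: true}, function (stream) {
  // 2. Gather candidate transport endpoints (one RealTimePort per local
  //    interface or relay); communicating their "coordinates" is our job.
  var ports = RealTimePort.createLocalPorts();
  signalingChannel.send(serialize(ports));

  signalingChannel.onmessage = function (msg) {
    var remotePorts = deserialize(msg.data);
    // 4-5. The application runs the connectivity checks itself and
    //      picks a working local/remote pair.
    checkConnectivity(ports, remotePorts, function (localPort, remotePort) {
      var transport = new RealTimeTransport(localPort, remotePort);
      // 6. Media negotiation is also the application's problem; the
      //    parameters here would come from our own negotiation logic.
      var description = new RealTimeMediaDescription(negotiatedParameters);
      // 7. Wire track + transport + description into a media flow.
      var rtStream = new RealTimeMediaStream(
          stream.audioTracks[0], transport, description);
      // 8. Incoming flows then get attached to an <audio>/<video> element.
    });
  };
}, logError);
```

Even with all the hard parts stubbed out as helper functions, the application is responsible for every step of transport and media setup.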

For comparison, in JSEP you would do something like:

  1. Acquire local media streams on each browser via the getUserMedia() API, thus getting some set of MediaStreamTracks.
  2. Create a PeerConnection() and call AddStream() for each of the local streams.
  3. Create an offer on one browser, send it to the other side; create an answer on the other side and send it back to the offering browser. In the simplest case, this just involves making some API calls with no arguments and passing the results to the other side.
  4. The PeerConnection fires callbacks announcing remote media streams which you attach to some local display method.
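For comparison, the JSEP flow on the offering side looks something like the sketch below. The API names follow the 2012-era drafts (prefixed as webkitRTCPeerConnection in Chrome), and `signalingChannel` is again a hypothetical application-provided message path; treat this as a browser-side sketch rather than runnable code.

```javascript
// JSEP sketch, offering side. API names per the 2012-era drafts;
// signalingChannel is an assumed application-provided message path.
var pc = new RTCPeerConnection({iceServers: [{url: "stun:stun.example.org"}]});

navigator.getUserMedia({audio: true, video: true}, function (stream) {
  pc.addStream(stream);                     // 2. hand local media to the browser
  pc.createOffer(function (offer) {         // 3. the browser produces the offer...
    pc.setLocalDescription(offer);
    signalingChannel.send(JSON.stringify(offer));  // ...we just ferry it across
  });
}, function (err) { console.error(err); });

// The answer comes back the same way:
signalingChannel.onmessage = function (msg) {
  pc.setRemoteDescription(new RTCSessionDescription(JSON.parse(msg.data)));
};

// 4. Remote media arrives via callback; attach it to a <video> element.
pc.onaddstream = function (evt) {
  document.getElementById("remoteView").src = URL.createObjectURL(evt.stream);
};
```

All of the ICE machinery, codec negotiation, and transport pairing from the CU-RTC-Web list happens inside the browser here.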

As should be clear, the CU-RTC-Web proposal requires significantly more complex JavaScript, and in particular requires that JavaScript to be a lot smarter about what it's doing. In a JSEP-style API, the Web programmer can be pretty ignorant about things like codecs and transport protocols, unless he wants to do something fancy, but with CU-RTC-Web, he needs to understand a lot of stuff to make things work at all. In some ways, the JSEP approach is a much better fit for the traditional Web style of having simple default behaviors which fit a lot of cases but which can then be customized, albeit in ways that are sometimes a bit clunky.

Note that it's not like this complexity doesn't exist in JSEP; it's just been pushed into the browser so that the user doesn't have to see it. As discussed below, Microsoft's argument is that this simplicity in the JavaScript comes at a price in terms of flexibility and robustness, and that libraries will be developed (think jQuery) to give the average Web programmer a simple experience, so that they won't have to absorb a lot of complexity themselves. However, since those libraries don't exist yet, it's unclear how well that's going to work.

Microsoft's proposal and the associated blog post make a number of major arguments for why it is a superior choice (the proposal just came out today, so there haven't really been any public arguments for why it's worse). Combining the blog post and the proposal, you get something like this:

  • That the current specification violates "fit with key web tenets", specifically that it's not stateless and that you can only make changes when in specific states. Also, that it depends on the SDP offer/answer model.
  • That it doesn't allow a "customizable response to changing network quality".
  • That it doesn't support "real-world interoperability" with existing equipment.
  • That it's too tied to specific media formats and codecs.
  • That JSEP requires a Web application to do some frankly inconvenient stuff if it wants to do something that the API doesn't have explicit support for.
  • That it's inflexible and/or brittle with respect to new applications and in particular that it's difficult to implement some specific "innovative" applications with JSEP.
Below we examine each of these arguments in turn.

MS writes:

Honoring key Web tenets — The Web favors stateless interactions which do not saddle either party of a data exchange with the responsibility to remember what the other did or expects. Doing otherwise is a recipe for extreme brittleness in implementations; it also raises considerably the development cost which reduces the reach of the standard itself.

This sounds rhetorically good, but I'm not sure how accurate it is. First, the idea that the Web is "stateless" feels fairly anachronistic in an era where more and more state is migrating from the server to the browser. To pick two examples, WebSockets involves forming a fairly long-term stateful two-way channel between the browser and the server, and localstore/localdb allow the server to persist data semi-permanently on the browser. Indeed, CU-RTC-Web requires forming a nontrivial amount of state on the browser in the form of the RealTimePorts, which represent actual resource reservations that cannot be reliably reconstructed if (for instance) the page reloads. I think the idea here is supposed to be that this is "soft state", in that it can be kept on the server and just reimposed on the browser at refresh time, but as the RealTimePorts example shows, it's not clear that this is the case. Similar comments apply to the state of the audio and video devices which are inherently controlled by the browser.

Moreover, it's never been true that neither party in the data exchange was "saddled" with remembering what the other did; rather, it used to be the case that most state sat on the server, and indeed, that's where the CU-RTC-Web proposal keeps it. This is the first time we have really built a Web-based peer-to-peer app. Pretty much all previous applications have been client-server applications, so it's hard to know what idioms are appropriate in a peer-to-peer case.

I'm a little puzzled by the argument about "development cost"; there are two kinds of development cost here: that to browser implementors and that to Web application programmers. The MS proposal puts more of that cost on Web programmers whereas JSEP puts more of the cost on browser implementors. One would ordinarily think that as long as the standard wasn't too difficult for browser implementors to develop at all, then pushing complexity away from Web programmers would tend to increase the reach of the standard. One could of course argue that this standard is too complicated for browser implementors to implement at all, but the existing state of Google and Mozilla's implementations would seem to belie that claim.

Finally, given that the original WHATWG draft had even more state in the browser (as noted above, it was basically a high-level API), it's a little odd to hear that Ian Hickson is out of touch with the "key Web tenets".

The CU-RTC-Web proposal writes:

Real time media applications have to run on networks with a wide range of capabilities varying in terms of bandwidth, latency, and noise. Likewise these characteristics can change while an application is running. Developers should be able to control how the user experience adapts to fluctuations in communication quality. For example, when communication quality degrades, the developer may prefer to favor the video channel, favor the audio channel, or suspend the app until acceptable quality is restored. An effective protocol and API will have to arm developers with the tools to tailor such answers to the exact needs of the moment, while minimizing the complexity of the resulting API surface.

It's certainly true that it's desirable to be able to respond to changing network conditions, but it's a lot less clear that the CU-RTC-Web API actually offers a useful response to such changes. In general, the browser is going to know a lot more about what the bandwidth/quality tradeoff of a given codec is going to be than most JavaScript applications will, and so it seems at least plausible that you're going to do better with a small number of policies (audio is more important than video, video is more important than audio, etc.) than you would by having the JS try to make fine-grained decisions about what it wants to do. It's worth noting that the actual "customizable" policies that are proposed here seem pretty simple. The idea seems to be not that you would impose policy on the browser but rather that since you need to implement all the negotiation logic anyway, you get to implement whatever policy you want.
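To make the "small number of policies" point concrete, here is a hypothetical sketch of the kind of coarse, application-level policy decision at issue. The function name and bandwidth thresholds are invented for illustration; no browser exposes anything this simple, and a real implementation would need actual bandwidth estimates from the transport layer.

```javascript
// Hypothetical illustration of an application-level adaptation policy.
// Names and thresholds are invented; real bandwidth estimation is much
// messier than a single kbps number.
function adaptToBandwidth(availableKbps, policy) {
  if (policy === "favor-audio") {
    // Keep audio alive as long as possible; shed video first.
    if (availableKbps < 64)  return { audio: false, video: false };
    if (availableKbps < 300) return { audio: true,  video: false };
    return { audio: true, video: true };
  }
  if (policy === "favor-video") {
    // Without enough bandwidth for video, drop the whole call.
    if (availableKbps < 300) return { audio: false, video: false };
    return { audio: true, video: true };
  }
  // "suspend": wait until acceptable quality is restored.
  return availableKbps < 350 ? { audio: false, video: false }
                             : { audio: true, video: true };
}
```

Note how little is actually gained over having the browser offer these same few policies as settings: the interesting work is in the bandwidth estimation and codec adaptation, which the application is poorly placed to do.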

Moreover, there's a real concern that this sort of adaptation will have to happen in two places: as MS points out, this kind of network variability is really common, so applications have to handle it. Unless you want to force every JS calling application in the universe to include adaptation logic, the browser will need some (potentially configurable and/or disableable) logic of its own. It's worth asking whether whatever logic you would write in JS is really going to be enough better to justify this design.

In their blog post today, MS writes about JSEP:

it shows no signs of offering real world interoperability with existing VoIP phones, and mobile phones, from behind firewalls and across routers and instead focuses on video communication between web browsers under ideal conditions. It does not allow an application to control how media is transmitted on the network.

I wish this argument had been elaborated more, since it seems like CU-RTC-Web is less focused on interoperability, not more. In particular, since JSEP is based on existing technologies such as SDP and ICE, it's relatively easy to build Web applications which gateway JSEP to SIP or Jingle signaling (indeed, relatively simple prototypes of these already exist). By contrast, gatewaying CU-RTC-Web signaling to either of these protocols would require developing an entire SDP stack, which is precisely the piece that the MS guys are implicitly arguing is expensive.

Based on Matthew Kaufman's mailing list postings, his concern seems to be that there are existing endpoints which don't implement some of the specifications required by WebRTC (principally ICE, which is used to set up the network transport channels) correctly, and that it will be easier to interoperate with them if your ICE implementation is written in JavaScript and downloaded by the application rather than in C++ and baked into the browser. This isn't a crazy theory, but I think there are serious open questions about whether it is correct. The basic problem is that it's actually quite hard to write a good ICE stack (though easy to write a bad one). The browser vendors have the resources to do a good job here, but it's less clear that random JS toolkits that people download will actually do that good a job (especially if they are simultaneously trying to compensate for broken legacy equipment). The result of having everyone write their own ICE stack might be good but it might also lead to a landscape where cross-Web application interop is basically impossible (or where there are islands of noninteroperable de facto standards based on popular toolkits or even popular toolkit versions).

A lot of people's instincts here seem to be based on an environment where updating the software on people's machines was hard but updating one's Web site was easy. But about half of the browser population (Chrome and Firefox) now does rapid auto-updates, so browsers are generally fairly modern. By contrast, Web applications often use downrev versions of their JS libraries (I wish I had survey data here, but it's easy to see just by opening up a JS debugger on your favorite sites). It's not at all clear that the "JS is easy to upgrade, native is hard" dynamic holds up any more.

The proposal says:

A successful standard cannot be tied to individual codecs, data formats or scenarios. They may soon be supplanted by newer versions, which would make such a tightly coupled standard obsolete just as quickly. The right approach is instead to support multiple media formats and to bring the bulk of the logic to the application layer, enabling developers to innovate.

I can't make much sense of this at all. JSEP, like the standards that it is based on, is agnostic about the media formats and codecs that are used. There's certainly nothing in JSEP that requires you to use VP8 for your video codec, Opus for your audio codec, or anything else. Rather, two conformant JSEP implementations will converge on a common subset of interoperable formats. This should happen automatically without Web application intervention.

Arguably, in fact, CU-RTC-Web is *more* tied to a given codec because the codec negotiation logic is implemented either on the server or in the JavaScript. If a browser adds support for a new codec, the Web application needs to detect that and somehow know how to prioritize it against existing known codecs. By contrast, when the browser manufacturer adds a new codec, he knows how it performs compared to existing codecs and can adjust his negotiation algorithms accordingly. Moreover, as discussed below, JSEP provides (somewhat clumsy) mechanisms for the user to override the browser's default choices. These mechanisms could probably be made better within the JSEP architecture.
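A hypothetical sketch of what "negotiation logic in the JavaScript" means in practice: at minimum, the application must rank and intersect codec lists itself. The function and codec names here are invented for illustration, not taken from either proposal.

```javascript
// Hypothetical sketch of application-level codec negotiation, the kind
// of logic CU-RTC-Web pushes into JavaScript. Names are invented.
function negotiateCodec(localPrefs, remoteCodecs) {
  // localPrefs is ordered most- to least-preferred; pick the first
  // entry the remote side also supports.
  for (var i = 0; i < localPrefs.length; i++) {
    if (remoteCodecs.indexOf(localPrefs[i]) !== -1) {
      return localPrefs[i];
    }
  }
  return null; // no codec in common: the call cannot proceed
}
```

The brittleness is visible even in this toy version: when a browser ships a new codec, it won't appear in the application's hard-coded preference list at all, whereas a browser-internal negotiator can rank it the day it ships.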

Based on Matthew Kaufman's interview with Janko Roettgers [], it seems like this may actually be about the proposal to have a mandatory-to-implement video codec (the leading candidates seem to be H.264 or VP8). Obviously, there have been a lot of arguments about whether such a mandatory codec is required (the standard argument in favor of it is that then you know that any two implementations have at least one codec in common), but this isn't really a matter of "tightly coupling" the codec to the standard. To the contrary, if we mandated VP8 today and then next week decided to mandate H.264, it would be a one-line change in the specification. In any case, this doesn't seem like a structural argument about JSEP versus CU-RTC-Web. Indeed, if IETF and W3C decided to ditch JSEP and go with CU-RTC-Web, it seems likely that this wouldn't affect the question of mandatory codecs at all.

Probably the strongest point that the MS authors make is that if the API doesn't explicitly support doing something, the situation is kind of gross:

In particular, the negotiation model of the API relies on the SDP offer/answer model, which forces applications to parse and generate SDP in order to effect a change in browser behavior. An application is forced to only perform certain changes when the browser is in specific states, which further constrains options and increases complexity. Furthermore, the set of permitted transformations to SDP are constrained in non-obvious and undiscoverable ways, forcing applications to resort to trial-and-error and/or browser-specific code. All of this added complexity is an unnecessary burden on applications with little or no benefit in return.

What this is about is that in JSEP you call CreateOffer() on a PeerConnection in order to get an SDP offer. This doesn't actually change the PeerConnection state to accommodate the new offer; instead, you call SetLocalDescription() to install the offer. This gives the Web application the opportunity to apply its own preferences by editing the offer. For instance, it might delete a line containing a codec that it didn't want to use. Obviously, this requires a lot of knowledge of SDP in the application, which is irritating to say the least, for the reasons in the quote above.
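A minimal sketch of the kind of string surgery this implies, assuming the application wants to strip one codec from an offer before installing it. Real SDP has more cross-references (a=fmtp lines, RTX, etc.) that a robust version would also have to track; this only handles the m= line and the matching a=rtpmap.

```javascript
// Sketch: delete one codec from an SDP blob before SetLocalDescription().
// Handles only the m= line and a=rtpmap; real SDP editing is messier.
function stripCodec(sdp, codecName) {
  var lines = sdp.split("\r\n");
  var payloadTypes = [];
  // Find a=rtpmap lines naming the codec, e.g. "a=rtpmap:100 VP8/90000",
  // record their payload types, and drop them.
  lines = lines.filter(function (line) {
    var m = /^a=rtpmap:(\d+)\s+([^\/]+)\//.exec(line);
    if (m && m[2] === codecName) {
      payloadTypes.push(m[1]);
      return false;
    }
    return true;
  });
  // Remove those payload types from the m= lines as well.
  lines = lines.map(function (line) {
    if (line.indexOf("m=") !== 0) return line;
    var parts = line.split(" ");
    return parts.slice(0, 3).concat(
        parts.slice(3).filter(function (pt) {
          return payloadTypes.indexOf(pt) === -1;
        })).join(" ");
  });
  return lines.join("\r\n");
}
```

Even this toy version illustrates the complaint: the application has to know the SDP grammar well enough to keep the m= line and the attribute lines consistent with each other.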

The major mitigating factor is that the W3C/IETF WG members intend to allow most common manipulations to be made through explicit settings parameters, so that only really advanced applications need to know anything about SDP at all. Obviously opinions vary about how good a job they have done, and of course it's possible to write libraries that would make this sort of manipulation easier. It's worth noting that there has been some discussion of extending the W3C APIs to have an explicit API for manipulating SDP objects rather than just editing the string versions (perhaps by borrowing some of the primitives in CU-RTC-Web). Such a change would make some things easier while not really representing a fundamental change to the JSEP model. However, it's not clear if there are enough SDP-editing tasks to make this project worthwhile.

With that said, in order to have CU-RTC-Web interoperate with existing SIP endpoints at all, you would need to know far more about SDP than would be required to do most anticipated transformations in a JSEP environment, so it's not like CU-RTC-Web frees you from SDP if you care about interoperability with existing equipment.

Finally, the MSFT authors argue that CU-RTC-Web is more flexible and/or less brittle than JSEP:

On the other hand, implementing innovative, real-world applications like security consoles, audio streaming services or baby monitoring through this API would be unwieldy, assuming it could be made to work at all. A Web RTC standard must equip developers with the ability to implement all scenarios, even those we haven't thought of.

Obviously the last sentence is true, but the first sentence provides scant support for the claim that CU-RTC-Web fulfills this requirement better than JSEP. The particular applications cited here, namely audio streaming, security consoles, and baby monitoring, seem not only doable with JSEP, but straightforward. In particular, security consoles and baby monitoring just look like one-way audio and/or video calls from some camera somewhere. This seems like a trivial subset of the most basic JSEP functionality. Audio streaming is, if anything, even easier. Audio streaming from servers already exists without any WebRTC functionality at all, in the form of the audio tag, and audio streaming from client to server can be achieved with the combination of getUserMedia and WebSockets. Even if you decided that you wanted to use UDP rather than WebSockets, audio streaming is just a one-way audio call, so it's hard to see that this is a problem.
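One plausible shape for that getUserMedia-plus-WebSockets combination is sketched below. This is a browser-side sketch, not runnable here, and it glosses over the encoding question entirely (it ships raw PCM, which no real service would want); the server URL is made up.

```javascript
// Sketch: client-to-server audio streaming with no WebRTC at all.
// Ships raw PCM over a WebSocket; a real service would encode first.
var ws = new WebSocket("wss://streaming.example.org/audio"); // hypothetical server
ws.binaryType = "arraybuffer";

navigator.getUserMedia({audio: true}, function (stream) {
  var ctx = new AudioContext();
  var source = ctx.createMediaStreamSource(stream);
  // A ScriptProcessorNode hands us raw samples in fixed-size chunks.
  var processor = ctx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = function (evt) {
    ws.send(evt.inputBuffer.getChannelData(0).buffer);
  };
  source.connect(processor);
  processor.connect(ctx.destination);
}, function (err) { console.error(err); });
```

The point isn't that this is a good way to build a streaming service; it's that none of it requires either JSEP or CU-RTC-Web.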

In e-mail to the W3C WebRTC mailing list, Matthew Kaufman mentions the use case of handling page reload:

An example would be recovery from call setup in the face of a browser page reload... a case where the state of the browser must be reinitialized, leading to edge cases where it becomes impossible with JSEP for a developer to write Javascript that behaves properly in all cases (because without an offer one cannot generate an answer, and once an offer has been generated one must not generate another offer until the first offer has been answered, but in either case there is no longer sufficient information as to how to proceed).

This use case, often called "rehydration," has been studied a fair bit, and it's not entirely clear that there is a convenient solution with JSEP. However, the problem isn't the offer/answer state, which is actually easily handled, but rather the ICE and cryptographic state, which are just as troublesome with CU-RTC-Web as they are with JSEP [for a variety of technical reasons, you can't just reuse the previous settings here]. So, while rehydration is an issue, it's not clear that CU-RTC-Web makes matters any easier.

This argument, which should be the strongest of MS's arguments, feels rather like the weakest. Given how much effort has already gone into JSEP, both in terms of standards and implementation, if we're going to replace it with something else that something else should do something that JSEP can't, not just have a more attractive API. If MS can't come up with any use cases that JSEP can't accomplish, and if in fact the use cases they list are arguably more convenient with JSEP than with CU-RTC-Web, then that seems like a fairly strong argument that we should stick with JSEP, not one that we should replace it.

What I'd like to see Microsoft do here is describe some applications that are really a lot easier with CU-RTC-Web than they are with JSEP. Depending on the details, this might be a more or less convincing argument, but without some examples, it's pretty hard to see what considerations other than aesthetic would drive us towards CU-RTC-Web.

Thanks to Cullen Jennings, Randell Jesup, Maire Reavy, and Tim Terriberry for early comments on this draft.


March 11, 2011

I've complained before about Farhad Manjoo's shallow analysis of the social implications of technical decisions, which seems to begin and end with what would be convenient for him. His latest post is an argument against anonymous comments on Internet forums/message boards/etc. Manjoo writes:

I can't speak for my bosses, who might feel differently than I do. But as a writer, my answer is no — I don't want anonymous commenters. Everyone who works online knows that there's a direct correlation between the hurdles a site puts up in front of potential commenters and the number and quality of the comments it receives. The harder a site makes it for someone to post a comment, the fewer comments it gets, and those comments are generally better.

I can appreciate how Manjoo might feel like that. No doubt as a writer it's annoying to get anonymous people telling you that you suck (and much as I find Manjoo's writing annoying, I'm forced to admit that even good writing gets that sort of reaction from time to time). However, this claim simply isn't true—or at least isn't supported by any evidence I know of—to the contrary, the Slate comments section (which Manjoo endorses later in his article) isn't really that great, and one of the most highly regarded blog comment sections, Obsidian Wings, is almost completely anonymous (though moderated), with the only barrier to posting being a CAPTCHA. Similarly, some of the most entertaining pure-comments sites such as Fark only require e-mail confirmation, which, as Manjoo admits, is virtually anonymous. I don't really know everything that makes a good comments section work, but it's a lot more complicated than just requiring people to use their real names.

I think Slate's commenting requirements—and those of many other sites—aren't stringent enough. Slate lets people log in with accounts from Google and Yahoo, which are essentially anonymous; if you want to be a jerk in Slate's comments, create a Google account and knock yourself out. If I ruled the Web, I'd change this. I'd make all commenters log in with Facebook or some equivalent third-party site, meaning they'd have to reveal their real names to say something in a public forum. Facebook has just revamped its third-party commenting "plug-in," making it easier for sites to outsource their commenting system to Facebook. Dozens of sites—including, most prominently, the blog TechCrunch—recently switched over to the Facebook system. Their results are encouraging: At TechCrunch, the movement to require real names has significantly reduced the number of trolls who tar the site with stupid comments.

This is an odd claim since Facebook actually makes no real attempt to verify your full name. Like most sites, they just verify that there is some e-mail address that you can respond at. It's not even clear how Facebook would go about verifying people's real names. Obviously, they could prune out people who claim to be Alan Smithee (though consider this), but the world is full of real John Smiths, so why shouldn't I be another one of them?

What's my beef with anonymity? For one thing, several social science studies have shown that when people know their identities are secret (whether offline or online), they behave much worse than they otherwise would have. Formally, this has been called the "online disinhibition effect," but in 2004, the Web comic Penny Arcade coined a much better name: The Greater Internet Fuckwad Theory. If you give a normal person anonymity and an audience, this theory posits, you turn him into a total fuckwad. Proof can be found in the comments section on YouTube, in multiplayer Xbox games, and under nearly every politics story on the Web. With so many fuckwads everywhere, sometimes it's hard to understand how anyone gets anything out of the Web.

I don't disagree that this is to some extent true, though I would observe that (a) the link Manjoo points to doesn't actually contain any studies as far as I can tell, just an article oriented towards the lay public and (b) it's not clear to what extent people's bad online behavior is a result of anonymity. Some of the most vicious behavior I've seen online has been on mailing lists where people's real-world identities (and employers!) are well-known and in some cases the participants actually know each other personally and are polite face-to-face.

As I said above, I don't think anyone really knows exactly what makes a good online community (though see here for some thoughts on it by others), but my intuition is that it's less an issue of anonymity than of getting the initial culture right, in a way that it resists trolling, flamewars, etc., or at least has a way to contain them. In comments sections that work, when someone shows up and starts trolling (even where this is easy and anonymous), the posters mostly ignore it and the moderators deal with it swiftly, so it never gets out of hand. Once the heat gets above some critical point on a regular basis, though, these social controls break down and it takes a really big hammer to get things back under control. It's not clear to me that knowing people's real names has much of an impact on any of that.


January 22, 2011

Slate has published another Farhad Manjoo screed against unlimited Internet service.
And say hooray, too, because unlimited data plans deserve to die. Letting everyone use the Internet as often as they like for no extra charge is unfair to all but the data-hoggiest among us--and it's not even that great for those people, either. Why is it unfair? For one thing, unlimited plans are more expensive than pay-as-you-go plans for most people. That's because a carrier has to set the price of an unlimited plan high enough to make money from the few people who use the Internet like there's no tomorrow. But most of us aren't such heavy users. AT&T says that 65 percent of its smartphone customers consume less than 200 MB of broadband per month and 98 percent use less than 2 GB. This means that if AT&T offered only a $30 unlimited iPhone plan (as it once did, and as Verizon will soon do), the 65 percent of customers who can get by with a $15 plan--to say nothing of the 98 percent who'd be fine on the $25 plan--would be overpaying.

This seems extremely confused. First, it's generally true that whenever a business offers a limited number of product offerings, each at a fixed price, some people overpay because they only want some cheaper offering that the company doesn't provide. For instance, when I bought my last car, Audi insisted on selling me the "winter sports package" (heated seats and a ski bag). I don't do a lot of skiing and didn't want either, but that's the way the thing came. By Manjoo's logic, it was unfair that I had to pay more for a ski bag I would never use (the heated seats are great, by the way), but that's just the way the product comes. Sure, I'd rather the company offered exactly the package I wanted, but a limited number of offerings is just a standard feature of capitalism.

It's worth observing that there's nothing special about the "unlimited" plan in Manjoo's logic. (It's not really unlimited anyway: the network has some finite amount of bandwidth available, which provides a hard upper limit on how much data you can transfer in a month; it's just that that limit is really high.) Say Verizon offered only a 2GB plan: would he be whining that he only used 200 MB of bandwidth and so was being made to overpay so Verizon could make money on the 2GB-using bandwidth hogs? This objection is pretty hard to take seriously.

Manjoo goes on:

But it's not just that unlimited plans raise prices. They also ruin service. Imagine what would happen to your town's power grid if everyone paid a flat rate for electricity: You and your neighbors would set your thermostats really high in the winter and low in the summer, you'd keep your pool heated year-round, you'd switch to plug-in electric cars, and you'd never consider replacing your ancient, energy-hogging appliances. As a result, you'd suffer frequent brownouts, you'd curse your power company, and you'd all wish for a better way. Economists call this a tragedy of the commons, and it can happen on data networks just as easily as the power grid--faced with no marginal cost, it's in everyone's interest to use as much of the service as they can. When that happens, the network goes down for everyone.

So, first, this is just wrong: it's actually reasonably common for utilities to be included in people's leases, and yet when that happens people don't automatically switch to plug-in cars or start up home aluminum refineries. That isn't to say that having to pay for each watt of power doesn't have some impact on your consumption, but there is only so much power that it's really convenient for people to use; it's not like power being free causes consumption to spin off into infinity. To take another example, it's absolutely standard for local voice telephony service to be sold flat rate, and yet practically nobody leaves their phone line tied up 24x7 just in case they want to say something to Mom and don't feel like taking the trouble to dial the phone. (Full disclosure: I actually have used dialup Internet as a replacement for a leased line this way, but that's a pretty rare use case.)

The second problem with this claim is that computer networks don't behave the way the electrical grid does in the face of contention. Like the electrical grid, computer networks are sized for a certain capacity, but unlike the grid, computers aren't built with the assumption that that capacity is effectively infinite. If the electrical grid in your area is operating at full capacity, and you turn on your AC, this can cause a brownout because there is no way for the power company to tell everyone to use 1% less power and even if there was, many of the devices in question are just designed to operate in a way where they draw constant power. By contrast, computer network protocols are already designed to operate in conditions where they can't use as much bandwidth as they would like because non-infinite bandwidth is a basic feature of the system. Even if there is no contention for the network, applications need to work behind a variety of connection types so people who build applications typically build them to automatically adapt to how much throughput they are actually getting. For instance, Netflix has adaptive streaming which means that it tries to detect how fast your network is and if it's slow it compresses the media harder to reduce the amount of data to send. What this means is that unlike the electrical grid where your computer may just crash if it doesn't get enough power, if the network suddenly gets slower, performance degrades relatively smoothly.

The other thing you need to know is that in data networks congestion is (almost) the only thing that matters. If nobody else is trying to use the network right now then it's fairly harmless if you decide to consume all the available capacity. What's important is that when other people do want to use the network you back off to give them room. So, to the extent to which there is a scarce resource it's not total download capacity but rather use of the network at times when it's actually congested. To a great extent network protocols (especially TCP) already do attempt to back off in the face of congestion, but there's also nothing stopping the provider from deliberately imposing balance on you (cf. fair queueing). In either case, this is a relatively orthogonal issue to the volume of data transferred; a cap on total transfer is an extremely crude proxy for the kind of externality Manjoo is talking about. Not only is it crude, it's inefficient: it discourages use of the network which would be cost-free for others and of value to the customer using the network.

All this stuff has of course been hashed out endlessly in the networking economics literature and the above is only the barest sketch. Suffice to say that just applying this sort of naive "tragedy of the commons" analysis doesn't really get you very far.


September 14, 2009

David Coursey complains about how long it took IEEE to develop 802.11n:
802.11n is the poster child for a standards process gone wrong. Seven years after it began and at least two years after 802.11 "draft" devices arrived, the IEEE has finally adopted a final standard for faster, stronger, more secure wireless.

Ideally, standards arrive before the products that implement them. However, the IEEE process moved so slowly that vendors adopted a draft standard and started manufacturing hardware. After a few little glitches, the hardware became compatible and many of us have--for years--been running multivendor 802.11n networks despite the lack of an approved standard.


If standards bodies expect to be taken seriously, they need to do their work in reasonable periods. Releasing a "final" standard long after customer adoption has begun is not only anti-climatic but undercuts the value of the standards making process.

In this case, the process failed. The IEEE should either improve its process or get out of the way and left industry leaders just create de facto standards as they see fit. That is not preferable, but if the IEEE process is stuck, it will be what happens.

My experience with IEEE standards making is limited, but I have extensive experience with IETF's process, and I'm a little puzzled as to what Coursey thinks the problem is here. Developing standards is like developing any other technical artifact: you start out with an idea, do some initial prototypes, test those prototypes, modify the design in response to the testing, and iterate till you're satisfied. Now, in the case of a protocol standard, the artifact is the document that defines how implementations are supposed to behave, and the testing phase, at least in part, is implementors building systems that (nominally) conform to the spec and seeing how well they work, whether they interoperate, etc. With any complicated system, this process needs to include building systems which will be used by end-users and seeing how they function in the field. If you don't do this, you end up with systems which only work in the lab.

There's not too much you can do to avoid going through these steps; it's just really hard to build workable systems without a certain level of testing. Of course, that still leaves you with the question of when you call the document done. Roughly speaking, there are two strategies: you can stamp the document "standard" before it's seen any real deployment and then churn out a revision a few years later in response to your deployment experience. Alternately, you can go through a series of drafts, refining them in response to experience, until eventually you just publish a finished standard, but it's based on what people have been using for years. An intermediate possibility is to have different maturity levels. For instance, IETF has "proposed standards", "draft standards", and then "standards". This doesn't work that well in practice: it takes so long to develop each revision that many important protocols never make it past "proposed standard." In all three cases, you go through mostly the same system development process, you just label the documents differently.

With that in mind, it's not clear to me that IEEE has done anything wrong here: if they decided to take the second approach and publish a really polished document, and if 802.11n is indeed nice and polished and won't need a revision for 5+ years, then this seems like a fairly successful effort. I should hasten to add that I don't know that this is true: 802.11n could be totally broken. However, the facts that Coursey presents sound like pretty normal standards development.


September 13, 2009

One of the results of Joe Wilson (R-South Carolina) calling President Obama a liar on national TV was that money started pouring in, both to Wilson and his likely opponent in 2010 (Rob Miller). Piryx, which hosts Wilson's site, claims that on Friday and Saturday they were subjected to a 10-hour DoS attack against their systems:
Yesterday (Friday) around 3:12pm CST we noticed the bandwidth spike on the downstream connections to server collocation facility. Our bandwidth and packet rate threshold monitors went off and we saw both traditional DOS bandwidth based attacks as well as very high packet rate, low bandwidth ICMP floods all destined for our IP address.

...At this point we have spent 40+ man hours, with 10 external techs fully monopolized in researching and mitigating this attack.

To give a sense of scale, the attacks were sending us 500+ Mbps of traffic, which would run about $147,500 per month in bandwidth overages.

I think most people would agree that technical attacks on candidates' Web sites, donation systems, etc. aren't good for democracy—just as it would be bad if candidates were regularly assassinated—and it would be good if they didn't happen. While there are technical countermeasures against DoS, they're expensive and only really work well if you have a site with a lot of capacity so that you can absorb the attack, which isn't necessarily something that every HSP has.

This may turn out to be a bad idea, but it occurred to me that one way to deal with this kind of attack might be for the federal government to simply run its own HSP, dedicated solely to hosting sites for candidates and to accepting payments on their behalf. Such a site could be large enough—though compared to big service providers, comparatively small—to resist most DoS attacks. Also, to the extent to which everyone ran their candidate sites there, it would remove the differential effect of DoS attacks: sure you can DoS the site, but you're damaging your own preferred candidate as much as the opposition. Obviously, this doesn't help if the event that precipitates the surge of donations massively favors one side, but in this case, at least, both sides saw a surge. I don't know if this is universally true though.

Of course, this would put the site operator (either the feds or whoever they outsourced it to) in a position to know who donated to which candidate, but in many cases this must be disclosed anyway, and presumably if the operation was outsourced, one could put a firewall in to keep the information not subject to disclosure away from the feds.


July 26, 2009

Hovav Shacham just alerted me to an Internet emergency: AT&T is blocking 4chan. I don't know any more than you, but I think it's probably time to upgrade to threatcon orange.

July 24, 2009

Ed Felten writes about the economic forces that drive cloud computing, arguing that a prime driver is the desire to reduce administrative costs:
Why, then, are we moving into the cloud? The key issue is the cost of management. Thus far we focused only on computing resources such as storage, computation, and data transfer; but the cost of managing all of this -- making sure the right software version is installed, that data is backed up, that spam filters are updated, and so on -- is a significant part of the picture. Indeed, as the cost of computing resources, on both client and server sides, continues to fall rapidly, management becomes a bigger and bigger fraction of the total cost. And so we move toward an approach that minimizes management cost, even if that approach is relatively wasteful of computing resources. The key is not that we're moving computation from client to server, but that we're moving management to the server, where a team of experts can manage matters for many users.

This certainly is true to an extent and it's one of the driving factors behind all sorts of outsourced hosting. Educated Guesswork, for instance, is hosted on Dreamhost, in large part because I didn't want the hassle of maintaining yet another public Internet-accessible server. I'm not sure I would call this "cloud computing", though, except retroactively.

That said, the term "cloud computing" covers a lot of ground (see the Wikipedia article), and I don't think Felten's argument holds up as well when we look at examples that look less like outsourced applications. Consider, for example, Amazon's Elastic Compute Cloud (EC2). EC2 lets you rapidly spin up a large number of identical servers on Amazon's hardware and bring them up and down as required to service your load. Now, there is a substantial amount of management overhead reduction at the hardware level in that you don't need to contract for Internet, power, HVAC, etc., but since you're running a virtualized machine, you still have all the software management issues Ed mentions, and they're somewhat worse since you have to work within Amazon's infrastructure (see here for some complaining about this). Much of the benefit of an EC2-type solution is extreme resource flexibility: if you have a sudden load spike, you don't need to quickly roll out a bunch of new hardware, you just bring up some EC2 instances. When the spike goes away, you shut them down.

A related benefit is that this reduces resource consumption via a crude form of statistical multiplexing: if EC2 is running a large number of Web sites, they're probably not all experiencing spikes at the same time, so the total amount of spare capacity required in the system is a lot smaller.

Both of these benefits apply as well to applications in the cloud (for instance, Ed's Gmail example). If you run your own mail server, it's idle almost all the time. On the other hand, if you use Gmail (or even a hosted service), then you are sharing that resource with a whole bunch of different people, and so Google just needs enough capacity to service the projected aggregate usage of all those people, most of whom aren't using the system very hard (what, you thought that Google really had 8G of disk for each user?). At the end of the day, I suspect that the management cost Ed cites is the dominant issue here, though, which, I suppose, argues that lumping outsourced applications ("software as a service") together with outsourced/virtualized hardware as "cloud computing" isn't really that helpful.


April 3, 2009

You may or may not have seen this article (Bill here courtesy of Lauren Weinstein; þ Joe Hall):
Key lawmakers are pushing to dramatically escalate U.S. defenses against cyberattacks, crafting proposals that would empower the government to set and enforce security standards for private industry for the first time.

OK, I'm going to stop you right there. I spend a large fraction of my time with computer security people and I don't think I've ever heard any of them use the term "cybersecurity", "cyberattacks", or pretty much "cyber-anything", except for when they're making fun of govspeak like this. Next they'll be talking about setting up speed traps on the Information Superhighway. Anyway, moving on...

The Rockefeller-Snowe measure would create the Office of the National Cybersecurity Adviser, whose leader would report directly to the president and would coordinate defense efforts across government agencies. It would require the National Institute of Standards and Technology to establish "measurable and auditable cybersecurity standards" that would apply to private companies as well as the government. It also would require licensing and certification of cybersecurity professionals.

So, it's sort of credible that NIST would generate some computer security standards. They've already done quite a few, especially in cryptography and communications security, with, I think it's fair to say, pretty mixed results. Some of their standards, especially the cryptographic ones like DES, AES, and SHA-1, have turned out OK, but as you start to move up the stack towards protocols and especially systems, the standards seem increasingly overconstrained and poorly matched to the kinds of practices that people actually engage in. In particular, there have been several attempts by USG to write standards about systems security (e.g., the Common Criteria and the Rainbow Books), and I think it's fair to say that uptake in the private sector has been minimal at best. Even more limited efforts like FIPS-140 (targeted at cryptographic systems) are widely seen as incredibly onerous and a hoop that developers have to jump through, rather than a best practice that they actually believe in.

I haven't gone through the bill completely, but check out this fun bit:

(4) SOFTWARE CONFIGURATION SPECIFICATION LANGUAGE.--The Institute shall, establish standard computer-readable language for completely specifying the configuration of software on computer systems widely used in the Federal government, by government contractors and grantees, and in private sector owned critical infrastructure information systems and networks.

I don't really know what this means, but it sounds pretty hard. Even UNIX systems, which are extremely text-oriented, don't have what you'd call a standard computer-readable configuration language. More like 10 such languages, I guess. I'm definitely looking forward to hearing about NIST's efforts to standardize

The licensing and certification clause seems even sillier. There are plenty of professional security certifications you can get, but most people I know view them as more a form of rent seeking by the people who run the certifying classes than as a meaningful credential. I don't think anyone I know actually has one of these certifications. I'm just imagining the day when we're told Bruce Schneier and Ed Felten aren't allowed to work on critical infrastructure systems because they're not certified.

More as I read through the actual document.


March 24, 2009

Leslie Daigle just summed up the situation with IPv6 at today's ISOC IPv6 press event: "It's [IPv6] sort of a broccoli technology; good for you but not necessarily attractive in its own right."

UPDATE: Corrected the quote a bit. Thanks to Greg Lebovitz for the correction.


February 25, 2009

I've got some code that needs to convert an IP address into a string. This is one of those cases where there's a twisty maze of APIs, all slightly different. The traditional API here is:

    char *
    inet_ntoa(struct in_addr in);

inet_ntoa() has two deficiencies, one important and one trivial: it doesn't support IPv6 and it returns a pointer to a statically allocated buffer, so it's not thread safe (I'll let you figure out which is which). Luckily, there's another API: addr2ascii():

    char *
    addr2ascii(int af, const void *addrp, int len, char *buf);

If you pass buf=0, addr2ascii() will return a pointer to a static buffer like inet_ntoa(). However, if you pass it an allocated buffer it will return the result in buf. Unfortunately, if you actually try to use addr2ascii() in threaded code you will quickly discover something unpleasant, at least on FreeBSD: you occasionally get the result "[inet_ntoa error]" or some fraction thereof. The answer is hidden in the EXAMPLES section of the man page:

In actuality, this cannot be done because addr2ascii() and ascii2addr() are implemented in terms of the inet(3) functions, rather than the other way around.

More specifically, on FreeBSD, it looks like this:

    case AF_INET:
        if (len != sizeof(struct in_addr)) {
            errno = ENAMETOOLONG;
            return 0;
        }
        strcpy(buf, inet_ntoa(*(const struct in_addr *)addrp));

In other words, even though addr2ascii() doesn't explicitly use a static buffer, since it depends on inet_ntoa() it's still not thread safe. In order to get thread safety, you need to use yet another API:

    const char *
    inet_ntop(int af, const void *restrict src, char *restrict dst,
        socklen_t size);


UPDATE: Clarified that this is a problem on FreeBSD. I don't know if it's an issue on all other platforms; Linux, for instance, doesn't have addr2ascii().
UPDATE2: Trivial vs. important.