News from Limbo

Monday, August 12, 2013

Available data hints
NSA's daily net message take
is on the order of 350 million
The NSA has released data that permit a ballpark estimate of how many internet messages it sifts and how many it examines. The NSA statement does not give percentages that would permit an estimate of the number of domestic messages picked up by the agency. But some 8700 messages of U.S. residents are being examined daily without warrants, according to a very rough calculation based on NSA data.

We follow in the footsteps of Enrico Fermi and try to arrive at reasonable estimates based on incomplete data. First, we have the agency's admission that its robots select messages that, altogether, carry 0.016 x 1.826 x 10^15 bytes of internet information daily (about 30 trillion bytes daily). I found one estimate that puts the average email message at 75 x 10^3 bytes, and another that puts the average web page at 10^6 bytes. We take this second average to include Facebook posts of foreigners and some Americans.
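As a rough sanity check on the figures above, here is a minimal Python sketch that computes the daily byte volume and then asks how many messages that volume would hold if it were all email or all web pages. The function and constant names are illustrative, and the "all one type" cases are only bounding assumptions, not anything the NSA statement claims.

```python
# Inputs quoted in the text; everything here is order-of-magnitude only.
INTERNET_BYTES_PER_DAY = 1.826e15   # 1,826 petabytes carried daily
TOUCHED_FRACTION = 0.016            # the 1.6% figure
AVG_EMAIL_BYTES = 75e3              # estimate cited in the text
AVG_WEB_PAGE_BYTES = 1e6            # estimate cited in the text


def touched_bytes(total=INTERNET_BYTES_PER_DAY, fraction=TOUCHED_FRACTION):
    """Bytes per day that pass through the sieve."""
    return total * fraction


def message_count(total_bytes, avg_bytes_per_message):
    """Number of messages if every message had the given average size."""
    return total_bytes / avg_bytes_per_message


if __name__ == "__main__":
    volume = touched_bytes()
    print(f"bytes per day:         {volume:.2e}")
    # Bounding assumptions: the whole volume is a single message type.
    print(f"if all were emails:    {message_count(volume, AVG_EMAIL_BYTES):.2e}")
    print(f"if all were web pages: {message_count(volume, AVG_WEB_PAGE_BYTES):.2e}")
```

Under these assumptions the count lands between tens of millions (all web pages) and a few hundred million (all email), which brackets the kind of answer a mixed estimate should give.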

Let's guess that a typical instant message in one direction is one or two sentences, and split the difference. A good rule of thumb is 10 words per sentence at 5 characters per word (plus spaces, which we can neglect). We put a message at about 1500 characters (conversations being composed of a sequence of messages), with one character worth one byte.

We have 0.016 x 1.826 x 10^15 bytes = 2.9 x 10^13 bytes scanned by the NSA. From here out, in order to keep our estimates kosher, we'll just stick with orders of magnitude. The proportions of types of messages sifted by the NSA we estimate -- reasonably but not certainly -- at 70% emails (averaging 10^5 bytes each), 20% web pages (10^6 bytes each) and 5% chat and instant messaging messages (10^4 bytes each). Of course, it's possible to shift these percentages about a bit and we'll get different results, but what we have is plausible.

Plugging in the numbers, we estimate that NSA robots are flagging on the order of 350 million messages a day, with some 875,000 messages a day examined. If 1% of the inspected messages involve U.S. residents, we get about 8700 messages per day being inspected without warrants.
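The arithmetic behind these figures can be sketched in a few lines of Python. This is a reconstruction under stated assumptions, not a record of the original calculation: the chat share is taken as 5% (0.05 as a fraction), which is the value that reproduces a total near 350 million, and the review fraction is left as a parameter. The 875,000 figure corresponds to 0.25% of 350 million; the 0.025% quoted in the NSA statement would give about 87,500 instead. All names are illustrative.

```python
# Rough inputs from the text; sizes are rounded to orders of magnitude.
INTERNET_BYTES_PER_DAY = 1.826e15
TOUCHED_FRACTION = 0.016

# kind -> (share of sifted bytes, average bytes per message)
MESSAGE_MIX = {
    "email": (0.70, 1e5),
    "web page": (0.20, 1e6),
    "chat": (0.05, 1e4),   # assumption: 5% share, see lead-in
}


def flagged_by_kind(total=INTERNET_BYTES_PER_DAY,
                    touched=TOUCHED_FRACTION,
                    mix=MESSAGE_MIX):
    """Estimated messages per day of each kind passing through the sieve."""
    sifted_bytes = total * touched
    return {kind: sifted_bytes * share / size
            for kind, (share, size) in mix.items()}


def examined(flagged_total, review_fraction):
    """Messages actually examined, given the share selected for review."""
    return flagged_total * review_fraction


def us_resident(examined_total, us_fraction=0.01):
    """Examined messages involving U.S. residents (1% is the text's guess)."""
    return examined_total * us_fraction


if __name__ == "__main__":
    counts = flagged_by_kind()
    total = sum(counts.values())
    print(f"flagged per day: {total:.3e}")
    # Two review fractions: the one matching the text's 875,000 figure,
    # and the 0.025% quoted in the NSA statement.
    for fraction in (0.0025, 0.00025):
        n = examined(350e6, fraction)
        print(f"review {fraction:.3%}: examined {n:,.0f}, "
              f"U.S. residents {us_resident(n):,.0f}")
```

Keeping the review fraction as an argument makes it easy to see how strongly the final count of examined messages depends on that one number.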

We might be off by 100 million or so for the bulk scanning, but we are very likely in the right range. Similarly, our estimate of messages of Americans inspected without warrants might be as low as 2,000 or so. But it seems like a good bet that thousands of messages of Americans are being subjected to warrantless examination daily. And, in light of the high probability -- recently pointed out in a New York Times report -- that virtually all internet messages to and from the United States appear to be going through an NSA sieve, the figures seem plausible enough.
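To get a feel for how far the bulk figure can move, here is a hedged Python sketch that varies only the chat share while holding the email and web page shares fixed. The 2% to 8% range is an assumption chosen purely for illustration; chat matters most because its small message size means each byte of share contributes many messages.

```python
# Sifted volume and message sizes as used in the estimate above.
SIFTED_BYTES = 0.016 * 1.826e15
EMAIL_SHARE, EMAIL_BYTES = 0.70, 1e5
WEB_SHARE, WEB_BYTES = 0.20, 1e6
CHAT_BYTES = 1e4


def total_messages(chat_share):
    """Total flagged messages per day for a given chat share of the bytes."""
    emails = SIFTED_BYTES * EMAIL_SHARE / EMAIL_BYTES
    pages = SIFTED_BYTES * WEB_SHARE / WEB_BYTES
    chats = SIFTED_BYTES * chat_share / CHAT_BYTES
    return emails + pages + chats


if __name__ == "__main__":
    # Hypothetical range of chat shares, not taken from any NSA figure.
    for share in (0.02, 0.05, 0.08):
        print(f"chat share {share:.0%}: {total_messages(share):.2e} messages/day")
```

Across that assumed range the total moves by roughly 100 million in either direction around the central estimate.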

This puts in context the agency's attempt to minimize its surveillance by showing that it only sieves a tiny amount of internet traffic.

The "scope and scale of NSA collection" can be seen, the agency said, in this:

"According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of
information per day. In its foreign intelligence mission, NSA touches about 1.6% of that. However,
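of the 1.6% of the data, only 0.025% is actually selected for review. The net effect is that NSA
analysts look at 0.00004% of the world's traffic in conducting their mission -- that's less than one
part in a million. Put another way, if a standard basketball court represented the global
communications environment, NSA's total collection would be represented by an area smaller than a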
dime on that basketball court."

However, in talking about scale, the statement omits the collection of telephone data.
