It’s making headlines now, and it really should have come as no surprise if you have been following the news. It’s a reductio ad absurdum of the Administration’s “creative” interpretation of the Constitution.
If you parse the latest denials and spin, you will see they’re not denying it. The only claim is that the profiles are anonymous. Their interpretation of the law seems to be that as long as the data isn’t tied to your Social Security number, no warrant is needed and the Fourth Amendment is not violated. (In their minds there is a separation between “unreasonable search and seizure” and “probable cause,” along the lines of “these are terrorists, this is war, and it is not unreasonable to search these records without warrants in times of war.”)
I’m not interested in talking about the legality of that. Instead I want to think about two things: 1) Why the outrage now? 2) Even if you take the most restrictive definition of what that database does, how useful is it?
Why the outrage?
The interesting thing is why this jumped from the front page of USA Today to the top of Google in a matter of hours. After all, they’ve been doing this with airline records, international phone calls, and nearly everything else.
I’d have to say it is not because it is “domestic” vs. “international”; it is because the database is 200 million Americans strong. That’s 2/3 of the United States. It is reasonable to assume you are on the list, that you have been tracked.
(Personally, I think the database is much larger and can cover pretty much every American with a phone. You only need AT&T, Verizon, or BellSouth (the sources of the leak) as one endpoint of the conversation; the profile can then be built from the caller ID on the other side. But this detail seems to have escaped the news reporters.)
Just like nobody cares about how many Iraqis have died in the war. Just like McNamara had to put what we did to Japan in terms of equivalent cities in the United States. Just like Bush’s approval is tied more strongly to the price at the gas pump than to the hundreds of billions spent, tens of thousands injured, and thousands of Americans who have died in this far-off adventure. No longer does anyone in America care unless it is about them. The ’80s greed morphed into the ’90s rationalization of it to create a ’00s apathy: me = capitalist = capitalism = free markets = efficiency = beats the Commies = good.
You tell them, “The NSA has a database of every phone call you made, every e-mail you sent, and every website you visited.” Then people get worked up.
What a bunch of selfish fuckers we are.
Reconfiguring our DLLs…
The interesting thing is what you could do with this information, even if you assume the most restrictive version of what the government is saying and take the most conservative estimate of what they do with it.
Let’s assume all you do is track end-to-end phone numbers (not internet packet records) and calling records (times and amounts). Let’s assume it’s only for the 200 million people who are customers of Verizon, AT&T (SBC), and BellSouth.
This is what is claimed by Bush defenders who dismiss any outrage over this as coming from “privacy advocates and Democrats.” The money quote comes from the business magazine Forbes, which spins it: “The paper said the NSA wasn’t wiretapping the calls and listening to the content, but was compiling extensive lists of who called who, and when they called them.”
Let’s ignore the obvious business intelligence angle of “Gee, it would be nice to know what other companies are vying for this company’s business. I wonder if access to that company’s phone call records would help?” Let’s look at it purely as a business, shall we?
Even if it were anonymous and all you had were the call records, you could build a detailed and useful profile of where each phone number stands politically, who is connected to whom, and which political arguments would be most effective. Given the NSA’s computing power and talent, that shouldn’t be much of a stretch. We’re talking about almost half the country’s computing power and the top graduates of America’s best math, science, and engineering schools here. (The former head of the NSA was from my school, and they recruit heavily from it: that school is Caltech.)
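To make “all you had were the call records” concrete, here is a minimal Python sketch. It assumes hypothetical call-detail records of (caller, callee, start time, duration in seconds) with no names attached, and folds them into a weighted “who is connected to whom” graph. The record layout and function names are my own illustration, not anything the NSA or the carriers have described.

```python
from collections import defaultdict

# Hypothetical call-detail record: (caller, callee, start_epoch, duration_seconds).
# No names and no content, just numbers and times, as in the "anonymous" claim.
CallRecord = tuple[str, str, int, int]


def build_contact_graph(records: list[CallRecord]) -> dict:
    """Fold call records into an undirected, weighted contact graph.

    graph[a][b] == {"calls": n, "seconds": s} for every pair that ever talked.
    """
    graph = defaultdict(lambda: defaultdict(lambda: {"calls": 0, "seconds": 0}))
    for caller, callee, _start, duration in records:
        # Record the tie in both directions: a call links both parties.
        for a, b in ((caller, callee), (callee, caller)):
            graph[a][b]["calls"] += 1
            graph[a][b]["seconds"] += duration
    return graph


def strongest_ties(graph: dict, number: str, k: int = 3) -> list[str]:
    """The k numbers this number spends the most time talking to."""
    neighbors = graph.get(number, {})
    return sorted(neighbors, key=lambda n: neighbors[n]["seconds"], reverse=True)[:k]


records = [
    ("555-0100", "555-0199", 0, 600),
    ("555-0100", "555-0142", 100, 60),
    ("555-0199", "555-0100", 200, 300),
]
graph = build_contact_graph(records)
print(strongest_ties(graph, "555-0100"))  # ['555-0199', '555-0142']
```

Nothing in there needs a name or a Social Security number; the phone number alone is the key.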
Let us look at how internet business databases are structured and compare them to the NSA’s. I will consider the advertising and search giant (Google), the “yet another social networking services” (MySpace, FaceBook, LinkedIn, Friendster), and a contact management service (Plaxo).
Google is building exactly such a profile to combat click fraud, but it’s not a very reliable one. While it covers your searching and browsing habits, it only covers the times you search on Google or visit a website that uses AdSense, and only if you don’t delete your cookies and eventually log into a Google service like Gmail or GTalk. Otherwise the profile can’t be attached to you. That’s pretty limited, given that few people use Google services beyond search.
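The “can’t be attached” problem is essentially a join that only succeeds when a cookie ID has been seen alongside a login. Here is a minimal sketch, assuming hypothetical (cookie_id, activity) events and a cookie-to-account map built from logins. It illustrates the limitation; it is not Google’s actual pipeline.

```python
from collections import defaultdict


def attach_profiles(events, logins):
    """Attach anonymous activity to accounts via the cookie seen at login.

    events: list of (cookie_id, activity) pairs from searches and AdSense pages.
    logins: dict mapping cookie_id -> account, recorded when someone signs in.
    Returns (profiles, orphans): activity grouped by account, plus activity
    whose cookie never logged in (deleted cookies, never used Gmail, etc.).
    """
    profiles = defaultdict(list)
    orphans = []
    for cookie_id, activity in events:
        account = logins.get(cookie_id)
        if account is None:
            orphans.append(activity)  # no login ever seen: can't be attached
        else:
            profiles[account].append(activity)
    return dict(profiles), orphans


events = [
    ("cookie-1", "search: cheap flights"),
    ("cookie-1", "AdSense page: travel blog"),
    ("cookie-2", "search: asbestos lawyer"),  # cookie deleted before any login
]
profiles, orphans = attach_profiles(events, {"cookie-1": "someone@gmail.com"})
```

Every orphan is a hole in the profile, which is exactly why the phone number makes a better key than a cookie.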
Google makes so much money that all the latest tech news cycle could talk about was Microsoft vs. Google. (It is important to remember that Microsoft is a monopolist convicted of leveraging its monopoly, so that’s a pretty high aspiration.) Unlike Microsoft, Google makes all its money off a single product: advertising. And click fraud is the chink in Google’s advertising armor. This is why Google rationalizes its “Don’t be evil” core value with a more relevant one: “organize the world’s information.”
Friendster, MySpace, LinkedIn, and FaceBook have connection profiles uniquely tied to the user, but they’re all in tiny verticals. For instance, LinkedIn does well among technology professionals and consultants, but not far beyond that. FaceBook expands instantly into any college in the country, but its security model doesn’t work as well for high schools, and it doesn’t scale to networks of companies that range from individuals to corporations. (The security model is tied to the domain of your e-mail address, and they “scale out” by adding machines based on that domain. Colleges are relatively isolated communities, so this works well, but it doesn’t “scale up.”)
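The e-mail-domain security model and the “scale out by domain” trick fit in a few lines. This is a hypothetical sketch of the idea as described above, not FaceBook’s code; the function names and the choice of CRC32 for shard placement are my own assumptions.

```python
import zlib


def domain_of(email: str) -> str:
    """The "network" a user belongs to is the domain of their e-mail address."""
    return email.rsplit("@", 1)[1].lower()


def can_view(viewer: str, owner: str) -> bool:
    """Domain-based security model: profiles are visible only within one domain."""
    return domain_of(viewer) == domain_of(owner)


def shard_for(email: str, num_shards: int) -> int:
    """Scale out by domain: every user from one domain lands on the same shard.

    crc32 is used because Python's built-in hash() of a str is salted per process.
    """
    return zlib.crc32(domain_of(email).encode("utf-8")) % num_shards
```

In this sketch, adding a college just means adding a shard, which is why scaling out is easy. But anyone whose connections cross domains (a consultant, a small company working with a big one) falls outside the one-domain-one-community assumption, which is the “doesn’t scale up” problem.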
And yet this is the best example of what you can do with such a database. I might mention that MySpace was bought by News Corp for over half a billion dollars, that Friendster makes a lot of money, and that LinkedIn and FaceBook are profitable, all based on advertising into those connected networks.
Plaxo, by contrast, has a nice niche (it serves you, not the advertisers), which allows it to be a Web 2.0 company (working with or as a plug-in to other web services and sites, instead of deriving all its value from locking competitors out of its network). But it’s not a database in the same sense as the NSA database.
(Disclaimer: I work for Plaxo. All opinions are my own… yadda yadda.)
Note that all these networks are relatively small, incomplete, and do not handle changes well.
…and the winnah is…
I’m calling it. The winner is the NSA.
That’s an amazing database. I’m really jealous. It’s like Google meets MySpace meets Plaxo, only for every American.
We can reasonably assume that Sprint and the others have been roped into something similar. If they haven’t, you can still build the profile as long as one endpoint of the call belongs to a participating carrier. So the size of this database is not 200 million users, but everyone with a phone or internet connection.
The connection profile is really complete because they can use the phone number, or an IP address mapped to a phone number (as with SBC DSL), as the UID to build the profile. So everyone you have called or e-mailed, and every site you have browsed to, becomes an input to your profile.
It is consistent because the phone companies do the work for the NSA. Heck, you can change phone providers, but your phone number stays the same. Sure, there is some double counting (say you have two phones), but statistics solves that one: your mobile, home, and work phone networks are not walled off from each other, since your contacts are spread non-exclusively among these phones.
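“Statistics solves that one” could be as simple as comparing contact lists: two numbers whose contacts overlap heavily probably belong to one person. Here is a minimal sketch using Jaccard similarity, which is my choice of statistic; the 0.5 threshold and the sample numbers are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two contact lists: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_same_person(contacts_by_phone: dict[str, set[str]], threshold: float = 0.5):
    """Pairs of phone numbers whose contact lists overlap at least `threshold`."""
    pairs = []
    for p, q in combinations(sorted(contacts_by_phone), 2):
        score = jaccard(contacts_by_phone[p], contacts_by_phone[q])
        if score >= threshold:
            pairs.append((p, q, score))
    return pairs


contacts = {
    "home-555-0100": {"mom", "sis", "boss", "dentist"},
    "cell-555-0177": {"mom", "sis", "boss", "buddy"},
    "other-555-0123": {"x", "y", "z"},
}
print(likely_same_person(contacts))  # [('cell-555-0177', 'home-555-0100', 0.6)]
```

The non-exclusive spread of your contacts is what makes this work: the home phone and the cell share three of five contacts, so they get merged, while the stranger shares none.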
Their database has the most users and the most complete set of connections, and it is consistent.
We haven’t even examined the database in terms of, say, which pr0n sites you have visited, or whether you ever made a disparaging comment about George Bush. We don’t really need to. Your network connections reveal the profile better than that. That’s how MySpace and the others make money. Add to it your actual habits (à la Google), and the NSA’s “journey to the Dark Side will be complete.”
After all, do you really think the NSA cares about COPA?
One of the interesting things about business is how one business’s practices and solutions map onto another’s. Mathematicians call this “homomorphism.”
Say you have a problem like “I want to show related products.” You build a database of the products’ “SKU” numbers, “who bought what,” and “how many.” You then write a statistical algorithm to determine relevance. That’s how Amazon shows you recommended products.
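Stripped to its skeleton, “related products” is a co-occurrence count. This is a much-simplified sketch, not Amazon’s actual algorithm: it counts how many customers bought each pair of SKUs and leaves out the quantity weighting and popularity normalization a real system would add.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(orders: dict[str, set[str]]) -> dict[str, Counter]:
    """orders maps each customer ("who") to the SKUs they bought ("what").

    together[a][b] counts how many customers bought both a and b ("how many").
    """
    together: dict[str, Counter] = defaultdict(Counter)
    for skus in orders.values():
        for a, b in combinations(sorted(skus), 2):
            together[a][b] += 1
            together[b][a] += 1
    return together


def related(together: dict[str, Counter], sku: str, k: int = 3) -> list[str]:
    """The k SKUs most often bought alongside `sku`."""
    return [other for other, _count in together.get(sku, Counter()).most_common(k)]


orders = {
    "alice": {"hp-book", "hp-dvd", "wand"},
    "bob": {"hp-book", "hp-dvd"},
    "carol": {"hp-book", "cookbook"},
}
together = co_purchase_counts(orders)
print(related(together, "hp-book", k=1))  # ['hp-dvd']
```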
Call the “SKU” a “website”; “who bought what” is now “who links to whom” and “how many” is “how many links.” You have the basics of how Google delivers relevant search results.
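The same skeleton with pages in place of products is the published PageRank idea: a page matters if pages that matter link to it. Below is a textbook power-iteration sketch. Google’s production ranking uses far more signals than this, and the damping factor of 0.85 is just the conventional textbook value.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    links maps each page to the pages it links to ("who links to whom").
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline from the "random jump".
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outbound links): spread its rank over everyone.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))  # c
```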
Take that same thing, and then say “SKU” is a string in your e-mail headers or body, “who bought what” is “marking e-mails with that string as spam,” and “how many” is how many pieces of mail contain that string. You have a junk mail filter.
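With strings as the SKUs, the count-and-score skeleton becomes a token-based spam filter. This sketch is a simplified naive Bayes (add-one smoothing, scoring only the tokens present in a message); the tokenizer and the sample mail are my own assumptions, not any particular product’s filter.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> set[str]:
    """Crude tokenizer: lowercase letters, digits, dollar signs, and apostrophes."""
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


class TokenSpamFilter:
    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()  # spam messages containing each token
        self.ham_counts: Counter = Counter()   # good messages containing each token
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        """A user "marking e-mails with that string as spam" (or not)."""
        if is_spam:
            self.spam_counts.update(tokens(text))
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens(text))
            self.n_ham += 1

    def spam_score(self, text: str) -> float:
        """Log-odds that `text` is spam; positive means "probably spam"."""
        score = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for t in tokens(text):
            p_spam = (self.spam_counts[t] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_counts[t] + 1) / (self.n_ham + 2)
            score += math.log(p_spam / p_ham)
        return score


f = TokenSpamFilter()
f.train("cheap pills now", is_spam=True)
f.train("cheap watches now", is_spam=True)
f.train("meeting notes attached", is_spam=False)
f.train("lunch tomorrow?", is_spam=False)
print(f.spam_score("cheap pills") > 0, f.spam_score("lunch meeting") < 0)  # True True
```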
We can say that the current business solutions to “determine related products,” “find relevant websites,” and “filter junk mail” are “homomorphic.”
“Gotta do evil…”
Now take the same thing. Your “product” is something like the latest Harry Potter movie. Your goal is to sell it to people who will actually buy it without wasting resources on those who won’t (it appeals to a certain demographic of children and fantasy lovers). You nail down the demographics of a few people in the network (both Harry Potter lovers and haters), and then you use this database to figure out which people might be interested in the Harry Potter movie, even if you don’t know they are. (The network tells you they are.)
This is how FaceBook, Friendster, and MySpace make money.
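“The network tells you they are” is, in its simplest form, label propagation: pin the few people whose tastes you know and let everyone else take on the average of their friends. Here is a minimal sketch under those assumptions. The names, the 0.5 starting guess, and the iteration count are illustrative, and I am not claiming this is what FaceBook, Friendster, or MySpace actually run.

```python
def propagate_interest(friends: dict[str, set[str]], seeds: dict[str, float],
                       iterations: int = 20) -> dict[str, float]:
    """Basic label propagation over a friendship graph.

    friends: symmetric adjacency (every friend must also be a key).
    seeds: known scores, e.g. 1.0 = Harry Potter fan, 0.0 = hater.
    Unknown people start at 0.5 and repeatedly take their friends' average;
    seeds stay pinned to their known values.
    """
    scores = {p: seeds.get(p, 0.5) for p in friends}
    for _ in range(iterations):
        new = {}
        for person, circle in friends.items():
            if person in seeds:
                new[person] = seeds[person]
            elif circle:
                new[person] = sum(scores[f] for f in circle) / len(circle)
            else:
                new[person] = scores[person]  # isolated: nothing to learn from
        scores = new
    return scores


friends = {
    "ann": {"bea"},
    "bea": {"ann", "cat"},
    "cat": {"bea", "hal"},
    "hal": {"cat"},
}
scores = propagate_interest(friends, seeds={"ann": 1.0, "hal": 0.0})
# bea (a friend of the fan) drifts toward 2/3; cat (a friend of the hater) toward 1/3.
```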
Now you do the same thing, where your “product” is “click fraudsters” and your goal is to nail them. So you use your network, nail down a few points, and then identify potential statistical aberrations: “That person is most likely a child; all of a sudden they’re clicking on Google AdWords ads for asbestos lawyers.”
That is why Google is building this database.
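For click fraud, the “statistical aberration” can be as simple as comparing someone’s ad clicks with their network’s. This is a hypothetical rule, not Google’s actual click-fraud detection; the categories, the sample data, and the 0.5 gap are made up for illustration.

```python
from collections import Counter


def click_shares(clicks: list[str]) -> dict[str, float]:
    """Fraction of someone's ad clicks that fall in each ad category."""
    counts = Counter(clicks)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def suspicious_categories(user: str, friends: dict[str, set[str]],
                          clicks: dict[str, list[str]], min_gap: float = 0.5) -> list[str]:
    """Ad categories where `user` clicks far more than their network does.

    Flags a category when the user's share of clicks exceeds the average
    share among their friends by at least `min_gap`.
    """
    mine = click_shares(clicks.get(user, []))
    circle = friends.get(user, set())
    flagged = []
    for category, share in mine.items():
        peer = sum(click_shares(clicks.get(f, [])).get(category, 0.0) for f in circle)
        peer /= max(len(circle), 1)
        if share - peer >= min_gap:
            flagged.append(category)
    return sorted(flagged)


friends = {"kid": {"pal1", "pal2"}, "pal1": {"kid"}, "pal2": {"kid"}}
clicks = {
    "kid": ["asbestos-lawyer"] * 8 + ["toys"] * 2,
    "pal1": ["toys"] * 5 + ["games"] * 5,
    "pal2": ["toys"] * 10,
}
print(suspicious_categories("kid", friends, clicks))  # ['asbestos-lawyer']
```

A rule this crude also flags plenty of innocent quirks, which is why the size and completeness of the network matter so much.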
These databases are “homomorphic” with one another.
Even the most conservative estimate would have to admit that the NSA’s database is the best advertising database in the world. Besides the mundane stuff Google and the YASNSs do with theirs, what can you do with it?
Let’s build some more homomorphisms:
Let’s start innocently. Say your goal is to find out who the terrorists are. Your “product” is “people who might be influenced to become terrorists or to aid terrorists,” and your goal is to use this database to figure out how that “product” might “sell.” You nail down the known terrorists, use this network to trace how their influence might “sell,” and you have a map of “the terrorist network.”
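Mapping “the terrorist network” out from a few known points is, at its core, a breadth-first walk over the contact graph, often called contact chaining. Here is a minimal sketch, assuming a symmetric contact graph keyed by phone number or name; the hop limit of 2 is an arbitrary illustration.

```python
from collections import deque


def contact_chain(contacts: dict[str, set[str]], seeds: list[str],
                  max_hops: int = 2) -> dict[str, int]:
    """Breadth-first "contact chaining" out from known seeds.

    Returns everyone within `max_hops` links of any seed, mapped to their hop count.
    """
    hops = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # don't expand past the hop limit
        for contact in contacts.get(person, set()):
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    return hops


contacts = {"seed": {"a"}, "a": {"seed", "b"}, "b": {"a", "c"}, "c": {"b"}}
print(contact_chain(contacts, ["seed"]))  # {'seed': 0, 'a': 1, 'b': 2}
```

Each extra hop pulls in everyone your contacts talk to, so the “map” grows very quickly once you point it at a real phone network.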
Say your “product” is “Tell people we plan on building a wall and deporting all ‘the Mexicans.’” Your goal is to get them to “buy your product” (“go vote for you to make up for the reality that the majority doesn’t like you”). And you have the NSA’s “advertising database”…
Want to test a talking point or a new frame? Why bother polling? You can inject the right frames into the right people to influence them without much fallout from those people to “the terrorists” (liberals).
Shorter Terry Chay
You don’t need to know anything about the person in the traditional sense. Who you are connected to (your network) reveals your personal preferences much better than anything else. Businesses today make a lot of money advertising onto database networks like this, but business databases are small, incomplete, and unreliable. There are a lot of (political) issues out there like that, like a business, are mappable to â€œadvertising a product for purchase.â€