
What's so bad about Total Information Awareness? by Ben Brunk

A very good analysis of the real problems such a system would face.
Just one sentence as an appetizer:
"Another matter that no journalist has touched on, and the one I think
is the biggest nail in TIA's coffin, is the matter of database error:
the errors are several orders of magnitude more numerous than the
terrorists in the world."

For previous coverage of the TIAO, see 

-------- Original Message --------
Subject: FC: What's so bad about Total Information Awareness? by Ben Brunk
Date: Mon, 09 Dec 2002 23:57:16 -0500
From: Declan McCullagh <declan -!
- well -
Reply-To: declan -!
- well -
To: politech -!
- politechbot -


Date: Mon, 09 Dec 2002 22:34:13 -0500
From: Ben Brunk <brunkb -!
- ils -
 unc -
To: declan -!
- well -
Subject: Debunking TIA


I'm in the middle of writing a dissertation relating to online privacy,
and I have been completely sidetracked by the recent discussion over the
Total Information Awareness program authorized by the Homeland Security
bill just passed into law.  All I've seen so far are a lot of reactionary
editorials written by people who haven't put an ounce of effort into
analyzing the proposed system.  They seem infatuated with the TIA logo,
slogan, and Poindexter.  I have read, with avid fascination, all the
predictions and scary stories about a new Big Brother system spearheaded
by a felon who managed to avoid accountability.  What I have yet to see
is a rational analysis of the idea itself from someone who knows
something about computers, databases and statistics.  I hope to fill in
that gap as best I can, though I'm sure there are experts out there with
even better background in the appropriate research fields.

From what I have been able to find out about the TIA program, it is
supposed to be a massive computerized dragnet that culls information
from dozens of different sources and is intended to locate potential
terrorists so that government agents can scrutinize them more closely.
This system will draw data from sources such as credit reports, bank
records, airline reservation systems, police records, gun purchase
records, and many others.

Many of these sources of information are private databases owned and
maintained by the corporations that rely on them.  Even if they were all
implemented in, say, Oracle, it would be difficult to match up records to
any reliable degree.  Who knows if the John Poindexter in one database is
the same as the Jon Pointdexter in another?  The social security number,
which is apparently the holy grail of database keys, is not necessarily
going to help, since many of these companies did not collect it or use it
as a key.  Name and address might make a good cross-referencing key, but
people move all the time, and I get three catalogs from a company that I
purchased items from three times.  Even their internal database is not
sophisticated enough to detect slight differences in spacing or my
apartment number written with a '#' instead of 'apt' or 'apartment'.
This is just inside one organization; we're not even trying to connect
any dots yet.  It will be easier to match records kept by the
government, especially if they include SSNs and fingerprints.  However,
errors in government databases are well documented (although not readily
admitted to).  Those systems contain large numbers of errors, and even
when errors are located and fixed, they have a nasty tendency of
recurring when data is shared or re-shared.  If you fix an error in your
Experian credit report, but not TRW, oftentimes the Experian error will
reappear.  Many people play this sort of "whack-a-mole" game for years.
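
To make the matching problem concrete, here is a minimal Python sketch
using only the standard library.  The normalization rules, the sample
addresses, and the function names are illustrative assumptions, not
anything TIA or any vendor is known to use.  It shows that a handful of
hand-written rules can reconcile '#', 'apt' and 'apartment', and that two
differently spelled names can score as very similar without that score
saying anything about whether they are the same person.

```python
import re
from difflib import SequenceMatcher


def normalize_address(addr: str) -> str:
    """Lowercase, unify apartment markers, drop punctuation, collapse spaces.

    These rules are hypothetical and cover only the variants mentioned
    in the text ('#', 'apt', 'apartment').
    """
    addr = addr.lower()
    # Rewrite 'apartment', 'apt', 'apt.' and '#' to the single token 'apt'.
    addr = re.sub(r"\b(apartment|apt)\b\.?\s*|#\s*", "apt ", addr)
    # Replace remaining punctuation with spaces, then collapse whitespace.
    addr = re.sub(r"[^\w\s]", " ", addr)
    return " ".join(addr.split())


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


if __name__ == "__main__":
    variants = ["12 Main St  #4", "12 Main St Apt. 4", "12 Main St, Apartment 4"]
    print({normalize_address(v) for v in variants})
    print(name_similarity("John Poindexter", "Jon Pointdexter"))
```

A high similarity score only says the strings look alike.  Deciding
whether two such records describe one person or two still needs a
reliable shared key, which is exactly what these databases lack.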

Another matter that no journalist has touched on, and the one I think is
the biggest nail in TIA's coffin, is the matter of database error: the
errors are several orders of magnitude more numerous than the terrorists
in the world.  All databases contain errors.  Data culled from multiple,
heterogeneous sources is going to have lots of errors.  I don't have
current estimates on the average expected error rate in a database, but
let's suppose it is 5%.  That means that in any given database, 95% of
the data is right and 5% of it is junk.  Garbage in, garbage out.  Errors
such as misspellings, flipped bits, transposed numbers, and transaction
records that never took place or were unintentionally duplicated or
omitted.  Five percent isn't a big deal until you look at it on the scale
of what TIA is proposing.  There are approximately 300 million people in
the United States.  Those 300 million people are very busy consumers, and
their data trail is enormous.  There are trillions of transaction
records, log entries, and records that TIA would have to amass,
standardize, and then examine.  Even if the government buys all the
necessary computing power and the very best staff, the government can't
do anything about randomness.  A 5% expected error rate is the monkey
wrench in the works.  5% of 300 million is 15,000,000.  Multiply that
number by however many data points will be looked at.  Say 500 data
points for each person.  Now we are looking at 300 million times 500, or
150,000,000,000 data points.  5% of that number leaves us with
7,500,000,000.  Seven and one half billion erroneous data points if they
want to look at every American.  Worse, this is not a one-time scan.  For
any hope of success, they would have to look longitudinally.  That is,
every year, month, day, hour, whatever.  Some indications of terrorism
are very subtle:  People who plan terror don't just run out and buy their
entire list of bomb making ingredients in one day and then book a flight.
Terrorists are slow and methodical.  They work over months and years.  So
what we're looking at here is 7.5 billion erroneous data points examined
day in and day out for years and years.  With a 5% error rate, the number
of false positives is outrageous, no matter what analysis technique is
used (and any analysis technique will have its own error rate).  There is
not enough manpower in the entire federal government to track down every
lead generated, even if much of that work is automated.  With each
passing day, homeland security will drown a little more in a hopeless
pile of randomly generated false leads that grow even on weekends and
holidays.
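
The arithmetic in the paragraph above can be checked with a few lines of
Python.  The figures are the ones supposed in the text (300 million
people, 500 data points each, a 5% error rate); none of them is a
measured value, and the function name is just a label for this sketch.

```python
POPULATION = 300_000_000        # approximate US population used in the text
DATA_POINTS_PER_PERSON = 500    # illustrative figure from the text
ERROR_RATE = 0.05               # supposed error rate, not a measurement


def expected_errors(population: int, points_per_person: int, error_rate: float):
    """Return (total data points, expected number of erroneous data points)."""
    total = population * points_per_person
    return total, round(total * error_rate)


if __name__ == "__main__":
    total, bad = expected_errors(POPULATION, DATA_POINTS_PER_PERSON, ERROR_RATE)
    print(f"{total:,} data points, of which about {bad:,} are expected to be wrong")
```

Changing ERROR_RATE to 0.01 still leaves 1.5 billion erroneous data
points under the same assumptions, so the conclusion does not hinge on
the exact rate.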

Let's suppose there are 1,000 terrorists hiding out in the USA, waiting
to strike, which I personally think is a greatly exaggerated number.  We
know from the actions taken on 9/11 that these people are fairly cunning.
They know how to hide from the system and how to hide in plain sight.
They pay in cash, or they buy what they need by proxy, and they don't act
any different than anyone else.  Like the millions of illegal immigrants
in the US, terrorist operatives are good at using social networks to "fly
below the radar" and subvert the system.  One thousand people is a lot,
but 1,000 out of 300 million is 3.33 * 10^-6, or .000333%.  In other
words, TIA would be looking for a minuscule fraction of 1% of the
population in their database, the exact people who are going out of their
way to escape detection.  With an error rate of even 1%, detecting such a
tiny fraction would be impossible.  You would not be able to separate the
signal from the noise, no matter what techniques were used.  Pollsters
run into this problem every election season when the 'margin of error'
rises to a level greater than the projected differential between the
candidates.  A 3% margin of error in a race where the candidates differ
by 1% is "too close to call."  The same problem exists for scanning all
airport baggage, but that is fodder for another day.  The only way TIA
would work is if some high percentage of Americans were terrorists: 20%,
50%, whatever.  Only then could there be enough comparison data in both
sets to draw testable conclusions from and be assured that those
conclusions were not just error phenomena.
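
The signal-to-noise point above is the classic base-rate problem, and a
short Python sketch makes the scale visible.  The 1,000 terrorists and
300 million people come from the text; the 99% hit rate and 1% false
alarm rate are assumptions chosen to be generous to the system, and the
function name is hypothetical.

```python
def screening_outcome(population: int, terrorists: int,
                      hit_rate: float, false_alarm_rate: float):
    """Return (true hits, false alarms, share of flagged people who are terrorists).

    hit_rate is the assumed fraction of real terrorists that get flagged;
    false_alarm_rate is the assumed fraction of innocent people flagged.
    """
    innocents = population - terrorists
    true_hits = terrorists * hit_rate
    false_alarms = innocents * false_alarm_rate
    precision = true_hits / (true_hits + false_alarms)
    return true_hits, false_alarms, precision


if __name__ == "__main__":
    hits, alarms, precision = screening_outcome(
        population=300_000_000, terrorists=1_000,
        hit_rate=0.99, false_alarm_rate=0.01,
    )
    print(f"true hits: {hits:,.0f}, false alarms: {alarms:,.0f}")
    print(f"share of flagged people who are terrorists: {precision:.4%}")
```

Under these assumptions roughly three million innocent people are
flagged alongside fewer than a thousand real targets, so well under one
flagged person in a thousand is an actual terrorist.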

Let's look at this on a much smaller scale:  Suppose the system worked
well enough each day to render a list of 10,000 people, one (1) of whom
is an actual terrorist (unbelievably good odds for the government).  The
government has a .01% probability (1 in 10,000) of successfully picking
the terrorist each day (using this system alone).  Could the
FBI/CIA/NSA/whatever even investigate 10,000 people with other techniques
carefully enough each day to locate the one terrorist?  Could they do it
in a month or a year?  I suppose the government could err on the side of
caution and detain large numbers of people, place them in custody, and
hold them indefinitely without due process until certain that they
weren't terrorists.  But that action presents nightmarish logistical and
humanitarian prospects.  The US prison population is bursting at the
seams with an all time high of two million.  There would have to be
enormous concentration camps for the millions of suspected terrorists who
would be detained until their innocence is proven.  That raises the
question:  Is it even possible to prove you are innocent in the current
legal climate?  The Red Scare (and the more recent FBI watch lists) have
already taught us the folly of black lists and unsubstantiated
accusations.
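
For the smaller-scale scenario above, a brief Python sketch shows the
odds and the workload.  It assumes exactly one terrorist on each daily
list of 10,000 and that investigators check names in no particular
order; both the function name and the random-order assumption are
illustrative only.

```python
from fractions import Fraction

LIST_SIZE = 10_000   # names the system produces per day, as supposed in the text


def chance_of_finding(names_checked: int) -> Fraction:
    """Probability that a random subset of the daily list contains the one terrorist."""
    return Fraction(min(names_checked, LIST_SIZE), LIST_SIZE)


if __name__ == "__main__":
    print(f"one random pick: {float(chance_of_finding(1)):.2%}")
    print(f"100 names checked: {float(chance_of_finding(100)):.2%}")
    print(f"names generated per year: {LIST_SIZE * 365:,}")
```

Even with 100 careful investigations a day, the chance of reaching the
right name is 1%, while the backlog grows by 3.65 million names a year.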

Lastly, data mining as a useful technique has been thoroughly debunked.
It never lived up to its promises.  This is why you don't hear much about
data mining in the CS and IS literature these days; what of it that is
left has morphed into the more esoteric "knowledge management" or KD.
Like AI, it turned out to be quite a bit more difficult to do than
expected and has been largely abandoned.  Had anyone in the government
actually bothered to read any of the literature, they would already know
this.

All in all, I can't see how TIA will do anything except harm innocent 
people and create new jobs for bureaucrats.  Any numerate person who
spends five minutes thinking about what is proposed will come to the same 
conclusion.  If our system is going to become this arbitrary, there are 
going to be an awful lot of lives ruined in this country.  I fail to see 
how the TIA approach could do anything positive for the war on terror or 
for America in general.  It will eat up resources better spent on more 
proven and acceptable approaches.  In fact, such a data-driven approach 
might actually be more successful if it simply took a random sampling of 
the population each day.

My hope is that this editorial will awaken those who are even more
knowledgeable in computer science, statistics, game theory, etc. and that
they find the courage to speak up so we can put the brakes on the
wasteful and 
destructive blind alley called TIA.

Benjamin Brunk

POLITECH -- Declan McCullagh's politics and technology mailing list
You may redistribute this message freely if you include this notice.
