When people ask me why I am not on facebook, or some of the other social networking sites (despite the title, this text is general), I usually joke it away by saying something like "oh, I know way too much computer science for that"; implying that, well, I don't like to share so much information about myself in one place, and with one closed company.
Most people sympathize with that, even if they do not agree, and some have thought in a similar way, or at least read something in a newspaper about how hard it is to get deleted from facebook, etc. So what I am about to say will probably not make sense to you at first: it does not matter, it is too late. If facebook wanted to, they would know a lot about you!
In fact, how they could do this is very, very fascinating once you start thinking about it! I will explain!
But first, let me say that I still won't join. Call it out of principle…
So, let me explain what I mean. Let us use facebook as our example, because people know them. (Although any social networking site above a critical size - that is, with enough users - will do.) Then take a person not on facebook, me for instance. I have never registered, and have not submitted any information.
However, one day a friend searches for me. Empty result, assuming my name is unique (which it is, more or less). Story ends there probably. No result, my friend does something else, facebook's memory of the search is cleared.
Now imagine instead that the facebook software made a table of all unmatched searches, and who made them. The first time we may say that we do not know anything; it could just be a misspelled name. However, say that more people searched. After a while enough people have typed in my name and come up empty-handed for a system to say with some certainty that there may actually exist a person somewhere with the name Lukas Ahrenberg. It cannot be a coincidence that (say) 10 people searched for that name.
So, the system then creates a secret file, let us call it a shadow page. So far it has only my name associated with it, but we will see how it can be filled with a lot of information.
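The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the thought experiment, not anything facebook is known to do; the class name `UnmatchedSearchLog`, the threshold value and the shape of a shadow page are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical threshold: the "(say) 10 people" from the text.
MIN_DISTINCT_SEARCHERS = 10


class UnmatchedSearchLog:
    """Table of searches that returned no result, and who made them."""

    def __init__(self, threshold=MIN_DISTINCT_SEARCHERS):
        self.threshold = threshold
        self.searchers_by_name = defaultdict(set)  # name -> ids of searchers
        self.shadow_pages = {}                     # name -> shadow page

    def record(self, searcher_id, name):
        """Store one empty search; return True if a shadow page exists."""
        # Normalise case and whitespace so trivial variations still match.
        key = " ".join(name.lower().split())
        # A set means repeated searches by the same person count only once.
        self.searchers_by_name[key].add(searcher_id)
        enough = len(self.searchers_by_name[key]) >= self.threshold
        if enough and key not in self.shadow_pages:
            # At first the page holds nothing but the name.
            self.shadow_pages[key] = {"name": name, "attributes": {}}
        return key in self.shadow_pages
```

Counting distinct searchers rather than raw searches is a design choice for the sketch: one person retyping a misspelled name several times should not be enough to conjure a person into existence.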
So, with time, more people search for me, and the system saves this. At some point, we can start looking at the information the searchers have about themselves on their facebook pages. What could be learned?
Well, probably most of them would be 28-35 years old. Pretty average for any facebook user, but the lack of people in their early twenties would suggest that I am at least 30. People from many different occupations made the searches, but a somewhat higher percentage (again, compared with the average over all facebook users) is in a computer-related field of work, so it is a good guess that so am I. Many people from different universities, so I may be an academic.
Then, let us see, a few of the people searching have marked each other as friends. They share some common features. Schools perhaps, so they would know where I went to university, and maybe other schools. Some workplaces too, with a bit of guessing.
A somewhat higher percentage of both schools and searching persons might be Swedish, so there is a good guess for my nationality. By analyzing the type of connection, and the location of the searchers, we could find out where I was born, or at least where I have lived during my life. Again, analyze who searches and their location/relation to others.
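The guessing above boils down to comparing the people who searched against the average user. A minimal sketch, assuming each searcher's profile is a plain dictionary and that the site knows the overall share of each value among all users; the function name `overrepresented`, the `min_lift` cut-off and the data layout are made up for illustration.

```python
from collections import Counter


def overrepresented(searchers, baseline, field, min_lift=1.5):
    """Find values of `field` that are unusually common among searchers.

    searchers: list of profile dicts, e.g. {"country": "Sweden"}
    baseline:  dict mapping a value to its share among all users (0..1)
    Returns a dict mapping value -> lift (share among searchers divided
    by share among all users) for values with lift >= min_lift.
    """
    values = [profile[field] for profile in searchers if field in profile]
    if not values:
        return {}
    result = {}
    for value, count in Counter(values).items():
        share = count / len(values)
        base = baseline.get(value)
        # Skip values with no known (or zero) baseline share.
        if base and share / base >= min_lift:
            result[value] = share / base
    return result
```

For example, if three out of four searchers list Sweden as their country while Swedes make up a tiny share of all users, "Sweden" comes out with a high lift and becomes the guess on the shadow page. The same function could be run on occupation, school or age group.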
There is already a lot of information on our shadow page, but we can do better. Keep it updated when new searches come in, narrow down the probabilities for all information, and check when the searches happened in time. For instance, within a week two searches for my name come in from people in Vancouver, Canada. Hmm, did he just get to know someone there? No, these two people have no direct relation, but they still live in the same city and work for the same university. Maybe he moved there?
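Checking when the searches happened could look something like the sketch below. It assumes each empty search is stored as a (timestamp, city of the searcher) pair; the name `location_bursts` and the defaults of one week and two searches are simply taken from the Vancouver example and are not meant as tuned values.

```python
from datetime import datetime, timedelta


def location_bursts(searches, window=timedelta(days=7), min_count=2):
    """Return the cities with at least `min_count` searches inside some
    time span no longer than `window`.

    searches: iterable of (timestamp, city) pairs, one per empty search.
    """
    stamps_by_city = {}
    for timestamp, city in searches:
        stamps_by_city.setdefault(city, []).append(timestamp)

    bursts = set()
    for city, stamps in stamps_by_city.items():
        stamps.sort()
        # Slide over the sorted timestamps and compare the first and
        # last of every run of `min_count` consecutive searches.
        for i in range(len(stamps) - min_count + 1):
            if stamps[i + min_count - 1] - stamps[i] <= window:
                bursts.add(city)
                break
    return bursts
```

A burst on its own says little; the interesting step in the text is the follow-up question of whether the searchers in that city know each other, which decides between "got to know someone there" and "moved there".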
And so on.
It is very interesting when you start to think about it!
By now you might say that "hey, this guy thinks that several thousand people will search for him on facebook, talk about ego"! True, probably not even 10 have searched for my name. But I am just trying to prove a point. I do not know how many would need to search for the uncertainty to become small enough, however I am fairly certain that it is not as many as you may think. Remember that the links in facebook carry information; or at least meta information, since they specify types of relations. This specifies distances between people in facebook space, so those searching already have a relation to each other. Combine this with some basic assumptions and statistics on the life patterns of modern-day people, and I think you can narrow it down pretty well.
The requirement here is more that of a broad base, that is, a lot of people need to already use the site. Which they do in the facebook case, so one could identify a specific critical mass for the system after which this kind of information collection would be feasible.
The metainformation is also useful for the 'John Smith'-type searches, where a name may not be as unique as mine. Sure, there may be a lot of them, but you can probably assume that the people searching for the same 'John Smith' have some kind of relation to each other as well (living in the same city, same job… school, you get it). So, you will probably find a small cluster, separating the different Smiths.
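One simple way to find such clusters is to treat the searchers as nodes, the friend links between them as edges, and split the result into connected groups. This is a sketch under that assumption only; `cluster_searchers` is a made-up name, and a real system would likely also weigh shared city, job or school rather than friend links alone.

```python
def cluster_searchers(searchers, friendships):
    """Group the people who searched for one name into clusters.

    searchers:   iterable of user ids that searched for the name
    friendships: iterable of (user_a, user_b) friend links
    Returns a list of sets, one set per group of connected searchers.
    """
    searchers = set(searchers)
    neighbours = {s: set() for s in searchers}
    for a, b in friendships:
        # Only links between two searchers matter here.
        if a in searchers and b in searchers:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for start in sorted(searchers):
        if start in seen:
            continue
        # Depth-first walk collecting everyone reachable from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

Each resulting cluster would then get its own shadow page: one 'John Smith' per group of searchers who know each other.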
There you have it. Shadow pages. Your life without the need to type in anything. It is enough that enough of your friends already have. Come to think of it, this is a business idea for facebook to serve already complete pages to stressed out CEOs.
The funny thing is that you can only "protect" yourself by being so average that you cannot be picked out statistically (disappear in the information, what I call online camouflage), or by having no friends - as soon as any network is big enough, the relations in the information will create a synergy hinting towards the "missing parts" (you).
Finally, I just want to say that this is a thought experiment. I do not seriously think, or want to imply that facebook, or any other site of that size, actually does this. Ah, and before the privacy people start to yell at someone, well, this is not a problem with the networking sites. It is the way of information. Check what I said above about no protection (this is actually not an adequate word because the need of protection is based on your initial mindset about the whole thing, but anyway), this can not be forbidden.
If you see this as a problem for you and not a thought experiment you should remember that: a) information wants to be free! (whether you like it or not) b) just because you are not paranoid it doesn't mean they're not out to get you!
- As my friend Grzegorz points out… I break with the principles outlined above by actually having been "tricked" into creating a linkedin profile (since deleted) and I'm also using google+ now and then. :) Shame on me I guess. ;) I can only defend myself by waving my hands, saying that this text is meant more as an extension of thought when it comes to social networking, and the role of the network.
- And while I am on the updating streak: I got a couple of questions regarding the t.h.e.m … from other readers; that is just a fun reference to Bosco from the Sam and Max games. (I am not really that paranoid, and I don't wear a tin-foil hat.)
- March 2010
- Seems like there is some research on this. It got noted on the web today. A study from MPI-SWS, and a news post on the subject. Will have a look at that paper.
- Update 2013-ish
- No more LinkedIn profile. Pissed me off.