Social Search

Given the dramatic and globally destabilizing effects of the deteriorating relationship between the Middle East and North Africa region and the West, efforts to increase the sheer number of interactions between these two regions should be a global priority. Our hope is to provide new opportunities for such positive interactions via the Web.

Between the immediate aftermath of 9/11 in 2001 and 2008, feelings of mistrust only deepened. A 2006 Washington Post/ABC News poll found that almost half (46%) of Americans had a negative view of Islam, seven percentage points higher than was found just after 9/11. In the Middle East, the data shows similar distrust of Western counterparts: a Gallup World Poll of Muslim opinion (2001-2007) found that significant percentages of people in Middle Eastern countries say that the West does not show concern for better relations (57% of Egyptians, 53% of Kuwaitis) and disagree with the statement that the US is serious about encouraging democracy in the region. Significantly, this negative view does not translate into isolationist sentiment; a Zogby poll found that more than 70% of Moroccans, Jordanians, and Lebanese responded that they would "like to know" an American citizen.

As a newly burgeoning planetary civilization we need to build tools to help us bridge these cultural barriers.  The alternative is ongoing war, needless conflict, environmental devastation and other such artifacts of a global society that is incapable of understanding itself.  We need to take advantage of emerging technology to deliver news with persistence, proximity and relevance.

Today we have the emerging phenomenon of 'social media'. Examples include blogs - personal online journals that allow users to share their thoughts and receive feedback on them - as well as Wikipedia, Flickr, and Digg, web sites that allow users to collaborate on, publish, discuss and rank text, photos, and web pages and news stories respectively.  Amazon's Mechanical Turk is another excellent example - solving problems by distributing them to a network of willing users.

The Web 2.0 social model is one of a shared garden.  Participants enter the shared space, browse and collectively groom the content of that space.  They repair dead links, they decorate articles with tags, they relate two articles together, relate media organizations to articles or to other content, provide geographic or temporal context, provide comments, bookmark or favorite various subjects, upscore or downscore content, forward links to friends, transclude, cite and contribute new content or new sources of content.

As Eric Raymond suggested in the context of open source, many eyes make hard problems shallow.  In our case many eyes make problems of information organization, media source quality and subjective quality more tractable.  However, today much of the activity captures only explicit interest gestures on the part of the user.  For example, a user "bookmarks" a link or "forwards" a photo.  Sites use this information to manufacture an "objective" point of view regarding which links are the most popular or which have the highest velocity. We instead need to manufacture "subjective points of view".  We want to let you spend a day in another person's shoes.

The primordial means of organizing meaning in our world is the virtual neighborhood, that community of overlapping interests: the social network. The function of a social network is threefold.  1) Social networks extend and reinforce a group network identity, as on MySpace, Facebook or Twitter.  2) They act as a medium for organizing and permissioning content.  3) They manufacture meaning by separating content that the community enjoys from content that the community does not.

In our view, a social network consists of both people and the objects that they interact with.  We push the definition of a social network to include concepts such as people, articles, blogs, comments, searches, attention, tags, media outlets, locations and social groups.

In a perfect world any point in a social network should be a 'lens' or 'vantage point' for viewing the rest of that network.  A determined reader should be able to surface the articles similar to an article, or surface the articles on an issue as seen from a given person's point of interest.  It should be possible to apply a geographic and temporal lens to an issue, to see the issue as filtered through the perspective of people who actually live there.  It should be possible to see how an event is first perceived, which new facts change that perception, and what the outcome is.  It should be possible to understand when a new event is rapidly unfolding.  It should be possible to have a durable gaze directed at an issue even when it passes outside the interest of the bread and circuses mainstream media event horizon.

How can we do this?

In the real world if your gaze is drawn to something specific: if you turn your head to look at something - other people around you may see that action and they may also look over to see what you are looking at.  This kind of bio-mechanical social signaling has a parallel with the simple act of reading an article on the web, or simply being in the same chat room as somebody else, or simply doing a search.  We can refer to this as a "digital gesture" or "an implicit publishing moment".

Watching user activity is not uncommon.  Industry has been looking at click-through traffic for a long time. That's one type of gesture (a view), but it's very weak.  Even so, there is an enormous industry largely built on it: online advertising. As well, other people are now beginning to look at views in a more sophisticated way, mostly as a way of making the online ad economy more efficient.  For example, Google Analytics has a variety of features that do clustering of user activity.  It's all part of a drive to understand gesture better: intention, meaning, discourse, relatedness.

However there are digital gestures that are routinely ignored or underused - some of which (from a social point of view) are considerably stronger than a click-through.  Consider the actions of forwarding or recommending. The former case illustrates a vote that a piece of data should be related to another person for some reason(s).  The latter is perhaps a weaker instance, where an individual recommends content for a group of people.

The key issue isn't that these gestures are not being caught but rather that the Internet does not use these digital gestures to benefit actual communities of use.  For example, many websites will capture the number of visitors who look at a photo, but there is rarely a sense of tracing out who those visitors were, what their interests were or how highly they scored in your personal social network.  The network shows absolute popularity and velocity, but it could also use what it knows to show you photos that your peers scored highly or that are semantically related to your profile, based on your previously signaled interests.
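The contrast between absolute popularity and a peer-weighted view can be sketched in a few lines. This is only an illustration under stated assumptions: the gesture weights, the `peer_trust` table and the function names are hypothetical, not part of any existing service.

```python
from collections import defaultdict

# Hypothetical weights: stronger social gestures count for more than a view.
GESTURE_WEIGHT = {"view": 1.0, "recommend": 3.0, "forward": 5.0}

def popularity(gestures):
    """Absolute popularity: every gesture counts once, whoever made it."""
    counts = defaultdict(int)
    for user, kind, item in gestures:
        counts[item] += 1
    return dict(counts)

def peer_scores(gestures, peer_trust):
    """Weight each gesture by its strength and by the reader's trust in the actor."""
    scores = defaultdict(float)
    for user, kind, item in gestures:
        # Users outside the reader's network default to a trust of 0.0.
        scores[item] += GESTURE_WEIGHT[kind] * peer_trust.get(user, 0.0)
    return dict(scores)

# Each gesture is (user, kind of gesture, item acted on).
gestures = [
    ("stranger1", "view", "photo_a"),
    ("stranger2", "view", "photo_a"),
    ("stranger3", "view", "photo_a"),
    ("friend", "forward", "photo_b"),
]
peer_trust = {"friend": 1.0}  # assumed trust scores for one reader

print(popularity(gestures))               # photo_a leads on raw counts
print(peer_scores(gestures, peer_trust))  # photo_b leads for this reader
```

The same stream of gestures produces two different rankings: the first is the "objective" view sites already offer, the second is one reader's subjective view.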

Taken together with a wide range of behavioral signals, digital gestures are rich resources for analysis - especially if they take for granted the existence of information in a temporal and social context.  That context is made up of a web of changing relationships between people and information.  Digital gestures are ephemeral instances of these relationships, and are a rich, largely neglected source of information about users.

We need to mine this social data for the people - not just for marketers or advertisers.

Effectively this mining is a form of what might be called social information processing. Social information processing allows users to collaborate, either explicitly or implicitly, to solve hard information problems by leveraging the opinions and expertise of others. In addition to collaborative problem solving and what Luis von Ahn calls "human computation", social information processing may lead to wholly new kinds of knowledge - knowledge that emerges from the rich data-sets that are not the end-product (like a folksonomy, or a reliable Wikipedia page) but rather the "byproduct" of the distributed activities of many richly identified users. For instance, I would like to be able to dissect a document by showing only the data created by individuals who identify themselves as 'x' or as 'belonging to a given network' or as 'located within a given country'. These filters - the social, geo, temporal, and keyword based - could be used additively to generate complex 'lenses' into an event, to see how opinion shifts, hardens, fractures or factionalizes as new facts about an event emerge and the social networks form conclusions.
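One way to picture filters being "used additively" is as small predicates that are combined into a single lens. The sketch below assumes a simplified `Contribution` record and hypothetical filter names; a real system would draw these fields from aggregated data.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Contribution:
    author: str
    network: str   # social filter
    country: str   # geo filter
    posted: date   # temporal filter
    text: str      # keyword filter

def by_network(name):
    return lambda c: c.network == name

def by_country(code):
    return lambda c: c.country == code

def between(start, end):
    return lambda c: start <= c.posted <= end

def mentions(keyword):
    return lambda c: keyword.lower() in c.text.lower()

def lens(*filters):
    """Combine filters additively: a contribution must pass every one."""
    return lambda c: all(f(c) for f in filters)

contributions = [
    Contribution("amal", "journalists", "JO", date(2008, 3, 1), "Water rights debate in Amman"),
    Contribution("ben", "journalists", "US", date(2008, 3, 2), "Water policy seen from Washington"),
    Contribution("chris", "students", "JO", date(2008, 3, 3), "Campus elections this week"),
    Contribution("dina", "journalists", "JO", date(2007, 1, 5), "Water shortages last year"),
]

jordan_press_on_water = lens(
    by_network("journalists"),
    by_country("JO"),
    between(date(2008, 1, 1), date(2008, 12, 31)),
    mentions("water"),
)
selected = [c.author for c in contributions if jordan_press_on_water(c)]
print(selected)  # ['amal']
```

Dropping or swapping a single filter yields a different lens on the same material, which is what makes it possible to compare how an event looks across networks, places and periods.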

This is a level of analysis that presents itself for machine learning and data mining applications, one where information is not just used but shared, deployed, mediated and socially constructed and which, importantly, recognizes the concept of temporal boundaries. Such an analysis would require tracing energized social network graphs to evaluate the strength of connections between a point of reference and a target set of documents or corpus.  This takes as a core the notion of "semi-structured information" and implies traversing the wide range of both implicit and explicit signals. The relationship network can be used to restrict the range of a targeted query, e.g., 'social search'.
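As a rough sketch of how a relationship network might restrict a query, the code below scores every node by its strongest path from a point of reference and then searches only the documents that are connected strongly enough. The choice of "product of edge weights along the best path" as the strength measure is an assumption made for illustration, as are the graph, weights and function names.

```python
import heapq

def connection_strength(graph, source):
    """Strongest-path strength from source to each reachable node.

    graph maps node -> {neighbor: weight}, with weights in (0, 1].
    A path's strength is the product of its edge weights.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated strengths
    while heap:
        neg_strength, node = heapq.heappop(heap)
        strength = -neg_strength
        if strength < best.get(node, 0.0):
            continue  # stale entry: a stronger path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = strength * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return best

def social_search(graph, source, documents, query, min_strength=0.2):
    """Keyword search restricted to documents connected to the source."""
    strengths = connection_strength(graph, source)
    hits = [
        (strengths[doc_id], doc_id)
        for doc_id, text in documents.items()
        if query.lower() in text.lower()
        and strengths.get(doc_id, 0.0) >= min_strength
    ]
    return [doc_id for _, doc_id in sorted(hits, reverse=True)]

# Hypothetical network: people and documents are both nodes.
graph = {
    "reader": {"friend": 0.9, "colleague": 0.5},
    "friend": {"doc_a": 0.8},
    "colleague": {"doc_b": 0.5, "doc_a": 0.2},
}
documents = {
    "doc_a": "Water rights in Amman",
    "doc_b": "Amman election coverage",
    "doc_c": "Amman travel guide",  # matches the query but is unconnected
}

print(social_search(graph, "reader", documents, "amman"))
```

Here `doc_c` matches the keyword but never appears, because nothing in the reader's network leads to it; raising `min_strength` narrows the result further to the documents reached through the strongest ties.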

This compute-intensive approach is possible only because we are now arriving at a point where we have enough server power and enough data to take in an increasingly large number of signals and variables. "The best data is more data" goes the adage. Techniques in machine learning, latent semantic indexing, contextual network graph clustering and other quantitative multidimensional analysis can crunch these relationships and present highly filtered views of the world on demand.

In implementation the starting point is a large-scale news aggregation engine: a service that collects blogs, articles and comments, tracks who responds to whom, and builds relationships between persons as well as between persons and other social objects.  Many such services already exist, and most blogging tools and social discourse tools publish their data in a way that is easy to aggregate. Geolocation can be done by algorithms such as those MetaCarta provides, which use contextual data to geolocate articles based on their semantic content.  We can also presume basic clustering of articles on similar topics, posted about the same place, and around the same time.
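A minimal sketch of the relationship-building step might look like the following. It assumes items have already been aggregated and geolocated into simple records; the field names (`reply_to`, `place`, `day`) are hypothetical and do not reflect any particular feed format or geolocation service.

```python
from collections import Counter, defaultdict

# Hypothetical, already-aggregated items with a place and a day attached.
items = [
    {"id": "p1", "author": "amal", "reply_to": None, "place": "Amman", "day": "2008-03-01"},
    {"id": "p2", "author": "ben", "reply_to": "p1", "place": "Amman", "day": "2008-03-01"},
    {"id": "p3", "author": "chris", "reply_to": "p1", "place": "Cairo", "day": "2008-03-01"},
    {"id": "p4", "author": "ben", "reply_to": "p3", "place": "Cairo", "day": "2008-03-02"},
]

def reply_edges(items):
    """Count who responds to whom, as (responder, original author) pairs."""
    author_of = {item["id"]: item["author"] for item in items}
    edges = Counter()
    for item in items:
        parent = item["reply_to"]
        if parent in author_of:  # skip top-level posts and unknown parents
            edges[(item["author"], author_of[parent])] += 1
    return edges

def group_by_place_and_day(items):
    """Crude clustering: items about the same place on the same day."""
    groups = defaultdict(list)
    for item in items:
        groups[(item["place"], item["day"])].append(item["id"])
    return dict(groups)

print(reply_edges(items))
print(group_by_place_and_day(items))
```

The reply counts give weighted person-to-person edges, and the place-and-day groups stand in for the topic, place and time clustering the analytics would build on.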

The focus of the labor however is precisely what is discussed; to find and deploy analytics on top of such a starting point - to catch gaze wherever possible - and to provide these analytic tools to a user through a web or mobile interface.

Such an approach feels like it must be part of the answer to the ever-increasing noise on the web - helping where traditional approaches to search have a difficult time keeping up. It can be used to complement human organizing activity.  Machine learning and related methods can spark new constructive social contexts for an individual user on demand - providing users a way to leap gaps. Effectively machine methods become a digital lens or prosthetic (in the Douglas Engelbart sense) that helps users new to an issue become informed about that issue.

There's also a possibility of a deeper outcome. Meaning itself becomes manufactured out of the iterative behavior of people reinforcing the life of social objects.  If we look closely at the digital traces things and people leave behind as a byproduct of their use, we see a net of values.  These traces include (as mentioned) the usual suspects: keywords, links, location, time stamp, affiliations - but they also include the deeper indicators of relevance: interest, attention and affiliations that are implied in word counts, kinds of words used, time spent on a page, forwarding behavior, simple proximity in a shared room, how deeply an article is pursued, linked and navigated, number of links inbound and outbound, and proximity to one's social network or other content objects.  Some subjects are valued more than others, and some relationships are valued more than others.  This hidden layer of values and relationships is the key to Ethan Zuckerman's beautiful notion of 'engineering for serendipity'. We think of these as 'ephemeral networks'. The networks themselves become the expressed meaning; the lens against which future data is evaluated. The algorithms that determine how we weight and filter will themselves be evolved.  As users try to incrementally order and understand their world, the network itself more accurately models their interests.

Our understanding is indebted to Wittgenstein's brilliant, simple observation 'use is meaning' - an insight that still reverberates today. We note that 'use' is terrifically complex, socially defined, and, at least somewhat, mappable. While Wittgenstein was pondering the notion of 'use' as it relates to language and as it is expressed and mediated via the elaborate, complex, social, and analogue 'rules of the language game', we think of the notion of 'use' in a digital context. In this less poetic but more tractable context we can think about mapping the digital semantic footprint of any given system entity, defined as it may be by words, places, urls, times, people, types and qualities of gestures (rate, view, translate, forward, observe, etc.) and any combination of these.
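To make the idea of a footprint slightly more concrete, one could treat it as the set of facets observed around an entity and compare two entities by how much their footprints overlap. Jaccard overlap is used here purely as an assumed, simple stand-in for whatever measure a real system would adopt; the facet names and sample events are hypothetical.

```python
def footprint(events):
    """Collect the distinct (facet, value) pairs observed around one entity."""
    return {(facet, value) for event in events for facet, value in event.items()}

def jaccard(a, b):
    """Share of facets two footprints have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical use events recorded for two articles.
article_1 = footprint([
    {"word": "water", "place": "Amman", "gesture": "forward"},
    {"word": "rights", "place": "Amman", "gesture": "view"},
])
article_2 = footprint([
    {"word": "water", "place": "Cairo", "gesture": "view"},
])

print(jaccard(article_1, article_2))  # 2 shared facets out of 6 distinct ones
```

The two articles share a word and a kind of gesture but not a place, so their footprints overlap only partially; richer facets such as people, times or urls would slot into the same structure.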

We began this by thinking about how we might create a system that would provide a way of comparing the various narratives that define the 'meaning' of a given event in the world. It's taken as axiomatic that in any conflict the various parties have radically different understandings of the 'truth'. It ought to be easier to see how this is constructed; it ought to be easier in a digital age to engage as real-time archaeologists, in the tradition of Foucault, to the end of better understanding what a given event in the world looks like through the lens of, say, Amman, Jordan.  The hope is that we might not only 'see' better the sometimes divergent narratives that tag to the same event, but by so doing support substantive and rich dialogues.  A dialogue is formed between nations not by repeating rhetoric and hardened stances, but by unearthing the hidden stiction points and bringing them to the light of day - by a structured dialogue. We can foster peace by finding a third way, by examining the super-set of all vested interests, as Paulo Freire speaks of - by employing a praxis of starting with a position, exploring another, coming to a new position, and then exploring again.

Ed Bice, David Gutelius, Anselm Hook