Friday, September 25, 2009

Decentralizing social media over HTTP and HTTPS

From a recent email I sent a friend:

I've been thinking about Facebook recently: their misuse of personal data, specifically. FB is a pretty fun site. And, for me, at least, it has been very useful. Too bad you end up locked in to a specific vendor like FB. Not only do you have to trust them, you also have no way of porting your personal data to a site like Ning.

A better scheme, I imagine, would involve making social media [personal] data, and indeed its whole ecosystem, more decentralized--more like the rest of the web.

I've been thinking about the requirements for a decentralized, standards-driven, web-based social media ecosystem. At the very least, I imagine you need an easy-to-configure access control mechanism that lets you choose which friends can read what. The picture I have in mind has three parts: a file specification (maybe a zip file with a standard directory structure) that completely describes the state of a user account; an (HTTP) container specification for loading and implementing the "intent" of the file specification; and a network protocol (over HTTP) that implements cross-container messaging for user accounts. The spec would not concern itself with the presentation layer.

Do you know of some such project already underway? (My searches came up naught.) And is this interesting, silly, or old?




Update [15 May 2010]: There is now Diaspora. I'm trying to find out how I can contribute time instead of money to the cause.


Update [1 Nov. 2009]: My friend sent me this link to a recent paper entitled Privacy, Cost, and Availability Tradeoffs in Decentralized OSNs. Here's the abstract:

Online Social Networks (OSNs) have become enormously popular. However, two aspects of many current OSNs have important implications with regards to privacy: their centralized nature and their acquisition of rights to users’ data. Recent work has proposed decentralized OSNs as more privacy-preserving alternatives to the prevailing OSN model. We present three schemes for decentralized OSNs. In all three, each user stores his own personal data in his own machine, which we term a Virtual Individual Server (VIS). VISs self-organize into peer-to-peer overlay networks, one overlay per social group with which the VIS owner wishes to share information. The schemes differ in where VISs and data reside: (a) on a virtualized utility computing infrastructure in the cloud, (b) on desktop machines augmented with socially-informed data replication, and (c) on desktop machines during normal operation, with failover to a standby virtual machine in the cloud when the primary VIS becomes unavailable. We focus on tradeoffs between these schemes in the areas of privacy, cost, and availability.

I've done a bit more reading and thinking since. First, I think it's a good idea. Second, it's not a particularly clever idea. We already have decentralized login (think OpenID): controlling access to personal data is a no-brainer. A lot of people have been thinking about this problem and have proposed various implementations (see Henry Story's RDF presentation, for example). So this is old, but as Marshall Kirkpatrick points out, perhaps it's an idea whose time has come.

Why then aren't people already developing such a thing? I would venture that it's because
  1. there is little profit motive in such an undertaking, or
  2. the community that could pull this off is all wrapped up in that proprietary, gated, winner-takes-all battle which Facebook dominates, or
  3. the W3C crowd, the folks you'd expect to be most involved in such a project, are too busy shoehorning RDF into real-world problems, or
  4. it's just a bad idea.
So what is it? Here's a sketch of what I'm imagining. (A sketch is all I have right now..)

Each individual controls a mini website, which we'll call an indisite. This indisite serves its authenticated owner (logged in, say, via an OpenID protocol) a customized view into their social universe. That view (over HTTPS) is something like what you see at Facebook or some other social media site.

Besides providing this individuated presentation layer for its owner, an indisite also serves other [usually] authenticated users (friends of the owner) raw data (without presentation markup) and files. Some files and data may be public (for discovery purposes, for example), in which case no authentication is required. For example, an indisite's default page might be the owner's profile page.

A key feature of an indisite is that it allows its owner to control access to their data and files. For example, as a user, I might not want to share a particular family album with all my friends. An indisite would allow me an easy, convenient way to assign access rights to only those friends I want to share the pictures with.
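As a rough illustration of the access-control idea, here is a minimal sketch of a per-resource access list with default deny. Everything in it is an assumption made for the example: the class name AccessList, the "*" marker for public resources, the resource paths, and the use of a friend's site URL as their identity. None of it comes from an existing specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

PUBLIC = "*"  # hypothetical marker: anyone may read, no authentication needed


@dataclass
class AccessList:
    """Maps a resource path to the identities allowed to read it."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def allow(self, resource: str, *friends: str) -> None:
        # Grant read access on one resource to specific friends only.
        self.grants.setdefault(resource, set()).update(friends)

    def make_public(self, resource: str) -> None:
        # Mark a resource (a profile page, say) as readable by anyone.
        self.grants.setdefault(resource, set()).add(PUBLIC)

    def can_read(self, resource: str, requester: Optional[str]) -> bool:
        # Default deny: a resource with no grants is readable by nobody.
        allowed = self.grants.get(resource, set())
        if PUBLIC in allowed:
            return True
        return requester is not None and requester in allowed
```

With this shape, sharing a family album with one friend is a single call such as acl.allow("/albums/family", "https://alice.example/"), and every other requester is refused by default.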

Indisites are designed to work with friend indisites (sites operated by the owner's friends). Privileged information is shared across sites over HTTPS. A user adds information to their network by publishing new information to their indisite. Their indisite in turn routes notifications to friend indisites (again, via RESTful HTTPS calls). How and when this routing is done requires much thought. Also, there obviously needs to be a way for an indisite to poll friend sites.
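The push half of that routing could be sketched roughly as below. The snippet only prepares one HTTPS POST per friend indisite and sends nothing; when to send, how to retry, and how polling fits in are left open, as they are in the text. The /notifications path and the function name are hypothetical, chosen only for this example.

```python
import urllib.request
from typing import Iterable, List

NOTIFY_PATH = "/notifications"  # hypothetical endpoint, not part of any spec


def build_notifications(friend_sites: Iterable[str],
                        body: bytes) -> List[urllib.request.Request]:
    """Prepare one POST request per friend indisite (nothing is sent here)."""
    prepared = []
    for site in friend_sites:
        # Privileged data should only ever travel over HTTPS.
        if not site.startswith("https://"):
            raise ValueError("refusing non-HTTPS friend site: " + site)
        url = site.rstrip("/") + NOTIFY_PATH
        prepared.append(urllib.request.Request(
            url,
            data=body,
            method="POST",
            headers={"Content-Type": "application/xml"},
        ))
    return prepared
```

Each prepared request could later be handed to urllib.request.urlopen, ideally from a background queue so that a slow friend site does not hold up publishing.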

Those considerations aside, the information exchange is XML-based. A "wall" posting notification (or meta description) may look something like..

<wall xmlns=.. >
  <posting id="https://friend2.host/wall/posting/549">
    <type> .. </type>
    <date> .. </date>
  </posting>
</wall>
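To make the notification format a little more concrete, here is a minimal sketch that builds and parses such a message with Python's standard xml.etree.ElementTree module. The namespace URI is a placeholder, since the snippet above leaves xmlns unspecified, and the function names and the example type and date values are likewise assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Placeholder namespace: the original leaves the xmlns value unspecified.
NS = "urn:example:indisite"


def build_wall_notification(posting_id: str, posting_type: str, date: str) -> bytes:
    """Serialize a one-posting wall notification to UTF-8 XML."""
    wall = ET.Element("{%s}wall" % NS)
    posting = ET.SubElement(wall, "{%s}posting" % NS, {"id": posting_id})
    ET.SubElement(posting, "{%s}type" % NS).text = posting_type
    ET.SubElement(posting, "{%s}date" % NS).text = date
    return ET.tostring(wall, encoding="utf-8")


def parse_wall_notification(xml_bytes: bytes) -> List[Dict[str, Optional[str]]]:
    """Extract the id, type, and date of every posting in a notification."""
    ns = {"w": NS}
    wall = ET.fromstring(xml_bytes)
    return [
        {
            "id": posting.get("id"),
            "type": posting.findtext("w:type", namespaces=ns),
            "date": posting.findtext("w:date", namespaces=ns),
        }
        for posting in wall.findall("w:posting", ns)
    ]
```

Because the posting id is itself a URL, a receiving indisite could fetch the full posting from the friend's site later, subject to that site's access rules.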

Other types of information may involve rule-based authentication schemes--for establishing a friend-of-a-friend relationship, for example.

An individual's indisite, then, is both an aggregator and publisher of user information. Information is exchanged and used based on an honor-system protocol: once a friend's indisite has received your data, you are trusting it to respect your wishes. It's honor-based because friendships themselves are.

Implementation route

I'm thinking an indisite could be packaged as a .war file to be run in a servlet container. But in order to use it, you'd need a trustworthy service provider who'd let you drop in the .war file as well as provide storage space for the data that will be published, aggregated, or cached. The web application would also allow the owner to download the entire state of the application as a single compressed file. This feature is what makes the application portable across service providers.
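The export feature could be as simple as the zip file with a standard directory structure floated earlier in this post. Below is a minimal sketch of a round trip, written in Python for brevity rather than as servlet code. The layout (a profile.json entry plus a files/ directory) and the function names are assumptions made for illustration, not a proposed standard.

```python
import io
import json
import zipfile
from typing import Dict, Tuple


def export_state(profile: dict, files: Dict[str, bytes]) -> bytes:
    """Pack an account's state into one zip archive (hypothetical layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("profile.json", json.dumps(profile, sort_keys=True))
        for name, data in files.items():
            zf.writestr("files/" + name, data)
    return buf.getvalue()


def import_state(archive: bytes) -> Tuple[dict, Dict[str, bytes]]:
    """Restore the profile and files from an archive made by export_state."""
    profile: dict = {}
    files: Dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in zf.namelist():
            if name == "profile.json":
                profile = json.loads(zf.read(name))
            elif name.startswith("files/"):
                files[name[len("files/"):]] = zf.read(name)
    return profile, files
```

Moving to a new service provider would then amount to calling export_state on the old host and import_state on the new one. A real format would also need to carry access rules and friend lists.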

At first glance, it's hard to see how anyone could make a business out of this (becoming a service provider) without charging users. There are few opportunities to sell advertising under such a scheme since a lot of privileged information is encrypted. (And so it should be!) But users (indisite operators) may opt to make a lot of information public (for example, if the scheme implements, or is bundled with, blogging) and rent advertising space. So perhaps there is a business angle to providing such services for free.

This would have to be a community-driven project. Some ideas need only a very basic reference implementation to take off. I can't see how this is one of those, but I'm hopeful that I'm wrong. I think I'll share and give it a try..