[berkman] Joshua Schachter
Joshua Schachter of del.icio.us is giving a lunchtime talk. His presence has sold out the small conference room at the Berkman Center so we’ve moved to a bigger room.
What follows are paraphrases of what he said; I am certain not only to have omitted much but to have gotten stuff wrong, so before you get pissed at Joshua for saying x, you might want to check that he in fact didn’t say y and I said he said x.
He built delicious in 2003 to manage his own links. He had been using a text file, but twenty entries in, he had already introduced a tag.
Currently at delicious: 5M links, about 10M posts, on average about two tags per item. About 500,000 unique tags. Growth in tags is slow.
The Chinese firewall blocks delicious now.
Hard-core tech pages have gone from 25% to 17% over the course of this year. “So interests are starting to broaden.”
Q: How would you describe delicious to a layperson?
A: It’s a way to remember stuff. Links initially but we’re adding some new types.
Q: How do you improve performance, i.e., latency?
A: We’re continually upgrading. At times scrapers/spiders are half the load.
Q: Delicious is aggressively without a user interface, so I think of it as a pipe instead of as a consumer destination…
A: I’ve finally hired people who have a different sense of user design than I do. We’ve done a round of UI testing – the one-way mirror, etc. That was an entirely terrifying day. Once they figured out the point and got through the URL, people liked the interface. It does what it does without a lot of jokey stuff, etc.
The API: People do get value out of it, but it’s also a political statement that it’s your data. Plus I’m lazy.
Q: What’s the financial model?
A: The same as any other advertising-backed discovery engine, like Google. The people who are using it are paying us with information. Ten times as many people use the site without signing in as use it signed in.
Q: What about the August spike on Alexa?
A: Everyone had it. Don’t know why.
If I had imported the categories out of DMOZ, people would have said “Screw this” and would have left. Tagging is the easiest thing people could do and still get any signal out of it at all. People tag things “read later,” which is useful to them but not to others.
I’ve spent too much time working with fuzzy models of the world to need discrete taxonomies. There’s no such thing as a perfect categorization. There is value in controlled vocabularies, but that doesn’t really map to the task. I’m not trying to categorize the web but helping people find stuff later.
Q: I found people because only 4 of us were using the “Africa” tag. How about making that more explicit?
A: I wrote the code to do that but it wasn’t pleasing. It tended to be dominated by people who have more tags overall.
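To illustrate the problem Joshua describes, here is a minimal sketch in Python. It is not his code; the function names and the sample data are made up for illustration. Ranking people by the raw number of tags they share with you favors whoever has the most tags overall, while one common fix (an assumption here, not something he mentioned) is to divide by the size of the combined tag set, i.e., Jaccard similarity.

```python
def raw_overlap(mine, theirs):
    """Number of tags two people have in common."""
    return len(mine & theirs)


def jaccard(mine, theirs):
    """Shared tags divided by all tags either person uses."""
    union = mine | theirs
    return len(mine & theirs) / len(union) if union else 0.0


def rank_similar_users(target, tags_by_user, score):
    """Return (user, score) pairs for everyone but target, best first."""
    mine = tags_by_user[target]
    others = [
        (user, score(mine, tags))
        for user, tags in tags_by_user.items()
        if user != target
    ]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: "heavy" uses hundreds of tags, "specialist" only a few.
tags_by_user = {
    "me": {"africa", "kenya", "blogs"},
    "specialist": {"africa", "kenya", "ngo"},
    "heavy": {"africa", "kenya", "blogs"} | {f"tag{i}" for i in range(200)},
}

by_raw = rank_similar_users("me", tags_by_user, raw_overlap)
by_jaccard = rank_similar_users("me", tags_by_user, jaccard)
```

With this data the raw count puts "heavy" first, and the normalized score puts "specialist" first.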
Q: Is this compatible with the Semantic Web?
A: It’s easy to express your tags in RDF. That’s easy. Doing OWL is as hard as everything else is, namely, impossibly hard.
Q: What’s the infrastructure?
A: Mason (?), SQL [did he say “mySQL”?], lots and lots of replicas of the database for scaling, which isn’t good. The data still fits on a single disk. The search engine is a full text store and the recommendation engine is a database I wrote by hand (BerkeleyDB).
Q (me): Which are you going to push, the individual or social uses?
A: You won’t use it if it’s not useful to you. But we’ll put in more social structure. Group tags are coming – tags that are lightly permissioned. You’d tag an item as being for a group, e.g., “groupname: tag.” (Example: nptech, a tag used by people in the non-profit tech field.) In the case of people collectively organizing around a tag, I think you want to amplify that. We’re trying to put in privacy now; it’s a little bit of a challenge to do and keep it fast.
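As a rough sketch of what the “groupname: tag” form might look like to software: the Python below splits such a string into its group and tag parts. The colon syntax is taken from the example in these notes; the function name and the handling of plain tags are assumptions, and nothing here reflects how delicious actually implements it.

```python
def split_group_tag(raw):
    """Split 'groupname: tag' into (group, tag).

    A plain tag with no colon comes back as (None, tag).
    """
    group, sep, tag = raw.partition(":")
    if not sep:
        return None, raw.strip()
    return group.strip(), tag.strip()


examples = ["berkman: china", "nptech"]
parsed = [split_group_tag(e) for e in examples]
```

Here "berkman: china" parses to the group "berkman" with the tag "china", and "nptech" stays an ordinary tag with no group.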
I worry about systems that stay in stealth mode. There’s stuff you’re not learning. We generally push code out to the live site two or three times a week.
Q: Say more about group tags and privacy…
A: Items can be private. If it’s tagged for you or your group you’ll be able to see them. The items won’t be visible (in order to avoid problems with totalitarian governments).
There are 8 people at Joshua’s company now.
Q: Why “tags” instead of “keywords” in coming up with the terminology?
A: It was inadvertently clever. I wish I could say I did it intentionally. Typically, when keywords are used, you don’t see a list of the aggregated keywords. Maybe it is a slightly new thing.
Q: (me) Will we see typed tags, e.g., for events you get a field for time and a field for place?
A: I would like to store richer data types, e.g., contacts and events, but that won’t happen immediately. You can make a date tag now: “date._____”. There’s stuff about the URL, stuff about the post, stuff that belongs to you. E.g., if you bookmark an Amazon URL, I could go get the book cover, the price, etc. Then how do you represent them? We have to figure out how to do that once we’ve got performance up.
Q: As delicious scales, certain tags become meaningless. E.g., the “china” feed is pretty useless. But if I could specify subsets or groups…
A: You’d create a group and let people in. It will be implemented as a tag, so you could get a feed of (say) “berkman” and “china.” (With your inbox you can map tags, i.e., this person’s “china” is that person’s “asia.”) We have something called “the network” coming; I originally called it “friends” but that was somewhat creepy. You identify people as being in your network and get feeds from them. [A group will be an established set of people who opt in. A network is a set of people you designate; they will not know they’re a member of your network. I point out that flickr tells you. Joshua says that every time he gets a notice from some random person that he’s been added as a contact “I want to rip my face off.”]
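The two ideas in that answer, a feed that requires both a group tag and a topic tag, and a per-person mapping from one tag to another, can be sketched in a few lines of Python. This is only an illustration under assumed data shapes (the post dictionaries, the example.org URLs, and the `tag_map` keyed by user and tag are all hypothetical), not a description of how delicious does it.

```python
def normalize(user, tag, tag_map):
    """Translate one person's tag into the reader's vocabulary, if mapped."""
    return tag_map.get((user, tag), tag)


def feed(posts, wanted, tag_map):
    """Return URLs of posts carrying every wanted tag after mapping."""
    wanted = set(wanted)
    urls = []
    for post in posts:
        tags = {normalize(post["user"], t, tag_map) for t in post["tags"]}
        if wanted <= tags:  # post has all of the requested tags
            urls.append(post["url"])
    return urls


# Hypothetical posts; bob files under "asia" what the reader calls "china".
posts = [
    {"user": "alice", "url": "http://example.org/a", "tags": ["berkman", "china"]},
    {"user": "bob", "url": "http://example.org/b", "tags": ["berkman", "asia"]},
    {"user": "carol", "url": "http://example.org/c", "tags": ["china"]},
]
tag_map = {("bob", "asia"): "china"}

mapped = feed(posts, ["berkman", "china"], tag_map)
unmapped = feed(posts, ["berkman", "china"], {})
```

Without the mapping only alice’s post matches; with it, bob’s post joins the feed, and carol’s stays out because it lacks the group tag.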
I’m not trying to build up the delicious community. There are plenty of communities.
Almost no one subscribes to a person/tag. Most subscribe either to a person or a tag. So, if you bookmark something and someone else has notes (née “extended”) on that thing, you’ll be able to see them in your inbox. (“Inbox” is badly named, Joshua says.)
About a third of people who create accounts never come back.
Q: Do people use ISBNs as tags?
A: Not many. Amazon is one of the top bookmarked things. The number one bookmarked site is delicious itself.
Q: Tag spam?
A: In general it’s not that big a deal. It shows up every couple of days, and they pop right out as outliers [or as “outliars”? :)]
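Joshua did not say how the outliers are spotted. As one hedged guess at the general idea, the Python sketch below flags accounts whose daily post count sits far above the median, measured in units of the median absolute deviation. The function name, the threshold of 10, and the sample counts are all assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(posts_per_user, threshold=10.0):
    """Return users whose post count is far above the typical count.

    Distance is measured from the median in units of the median absolute
    deviation (MAD); a MAD of zero falls back to 1.0 to avoid dividing by zero.
    """
    counts = list(posts_per_user.values())
    if not counts:
        return []
    mid = median(counts)
    mad = median([abs(c - mid) for c in counts])
    scale = mad or 1.0
    return sorted(
        user
        for user, count in posts_per_user.items()
        if (count - mid) / scale > threshold
    )


# Hypothetical daily post counts; "bot" is the obvious outlier.
daily_posts = {"ann": 3, "ben": 5, "cat": 4, "dov": 2, "bot": 400}
suspects = flag_outliers(daily_posts)
```

With these numbers only "bot" is flagged; the ordinary users cluster close enough to the median to stay under the threshold.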
Q: Are you building systems to monitor the trends of what people are doing?
A: Right now it’s not hard to identify the outliers. It’s not our focus. But my background is in analyzing bulk data.
Q: How about letting your users see that data?
A: I’m generally wary of this. If I publish the most clicked-on list, then it becomes a high score list that people will try to get on.
Q: Do you think there is a niche for something that is delicious but with more structured data?
A: That’s faceted classification and there are other people doing it.