Tuesday, May 22, 2007

Metacrap – the metadata myth

The mess is the message
I have long believed that the standards police have been wasting millions (usually flying to long meetings in exotic locations) while the world ignores their blinkered schemas. Wonderful article from Doctorow (thanks again to Seb Schmoller) on why metadata has turned out to be a top-down, hopelessly utopian, mythical solution.

http://www.well.com/%7Edoctorow/metacrap.htm

People lie
Metadata won’t stop people doing their own thing and undermining your metadata or using it to sell porn or any other damn thing that comes into their head – metadata is a spammers’ paradise.

People are lazy
People forget to send attachments, miss subject fields in emails and generally don’t tag and can’t spell – most folk are far too lazy to metatag.

People are stupid
Metadata standards rely on more basic standards in spelling, punctuation and grammar. These have been abandoned by most web users.

Mission: Impossible - know thyself
People are bad at describing their own behaviour. Nielsen’s log books showed families watching documentaries and Sesame Street. The set-top box data showed they were really watching naked midget wrestling and car-chase programmes.

Schemas aren't neutral
Classifications are fuzzy and hierarchies do not describe the real world. Try doing this for any concept and you’ll run into disputes and blurred boundaries.

Metrics influence results
Standards people want to promote their stuff through their metrics and think that everyone else is wrong. They’re usually wrong. Everybody else is also wrong. It’s all very messy.

There's more than one way to describe something
"This isn't smut, it's art." Language use is inherently vague.


2 comments:

ALT and Nxxxx said...

Nothing to do with metadata, but Doctorow has this comprehensive review of The Case Against Homework, which caught my eye.

Anonymous said...

The link that Seb provides presents a distinction between implicit and explicit metadata; a simple rule for any system design is not to ask the user to declare what the system might otherwise observe. That same rule might address some of your concerns.

Rigid taxonomies may be difficult to create, labour-intensive to maintain and prone to abuse but - until we have a natural-language parser worth its name - they're all we have.

As an aside: would you include user rating/popularity in your definition of 'metadata'?