Books and social data mining

Tom Owad's Data Mining 101: Finding Subversives with Amazon Wishlists is a superb read (via Boing Boing):

"Using a pair of 5-year-old computers, two home DSL connections, 42 hours of computer time, and 5 man hours, I now had documents describing the reading preferences of 260,000 U.S. citizens.

I downloaded all the files to an external 120 GB Firewire drive in UFS format. The raw data occupied little more than 5 GB. I initially wanted to move all the files into a single directory to facilitate searching, but as the directory contents exceeded 100,000 items, the speed became glacially slow, so I kept the data divided into chunks of 25,000 wishlists."

The sad part is, I can't even get my wishlist out of Amazon without some furious hacking.

That's why I'm using Library Thing (thanks Steve!). Sure, you could mine it all day, but at least I can get at my data by exporting my catalog as a CSV file. Turning this catalog into an OPML list is the next step (an export function would be nice, Tim).
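Converting a CSV catalog to OPML mostly comes down to mapping each row to an `<outline>` element. Here is a minimal Python sketch using only the standard library's `csv` and `xml.etree.ElementTree` modules. It assumes the export has `Title` and `Author` columns; those headers are hypothetical, and the real Library Thing export may name them differently. `csv_to_opml` is an illustrative name, not part of any Library Thing tool.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_opml(csv_text, title="My Library"):
    """Convert a CSV book catalog into an OPML 2.0 outline string.

    Assumption: the CSV has 'Title' and 'Author' header columns
    (hypothetical; adjust to match the actual export).
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    for row in csv.DictReader(io.StringIO(csv_text)):
        # Missing or short rows yield None, so fall back to "".
        book = (row.get("Title") or "").strip()
        author = (row.get("Author") or "").strip()
        label = f"{book} by {author}" if author else book
        # OPML displays each outline node using its 'text' attribute.
        ET.SubElement(body, "outline", text=label)

    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    sample = "Title,Author\nDune,Frank Herbert\nEmma,\n"
    print(csv_to_opml(sample))
```

Saving the returned string to a `.opml` file should be enough for most OPML-aware outliners to open it.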

Library Thing's tagging is great (tagtastic, even), and the social data on each book makes it a very RSS-friendly service.

Tim Spalding is the developer. Great job, Tim!

If you like (or love) books, you *have* to check out Library Thing.

Tags: Attention, OPML, RSS, books, tags