Big data is pretty useless by itself. So is a building-sized pile of paperclips, or an endless number of pictures of your cat. A few of those paperclips could save a school secretary some headaches with dead-tree records, and maybe a dozen of those photos of Mr. KittyPants are worth enlarging for that montage you have planned for the bathroom, but thousands of entries in a category need one thing to become useful: a filter.
Tech companies are constantly tweaking algorithms to sort through the huge dumps of data that come out of places like Facebook, Twitter, MMORPGs, or the whole of the Interwebz. Too much data exists for humans to handle, even if we hired entire continents of people to do it. It’s like a trip to another galaxy: we’d have to plan for multiple generations to be born during the trip, and it would still take eleventy billion years to get there.
But big data manipulators do have one advantage: humans populate the Internet. And what do humans do really, really well, even before they can speak? Categorize. Big sticks, little sticks, hard rocks, flaky rocks, young mates, old people, what have you. Our brains are programmed to filter.
Human behavior on the Internet is the same as human behavior in the caves of yore. We sort. We categorize. If we cannot sort or categorize, then we disregard the whole thing. The modern office supply shopper will walk past a “fill your own box” bin of unsorted paperclips to get to the nicely separated or packaged ones, even if they have to pay more. The enthusiastic home photographer may be smart enough to back up their massive photo files, but they rarely make the effort to re-label and sort their work. How many attachments have you received with a title like IMG_7869.jpg? Exactly.
So, what’s a non-psychology-non-sociology-trained engineer to do? Look for the human filtering, that’s what!
Incorporate into your design some of the following algorithm-ready human filters that already exist online:
- Twitter lists. Users sort the people they follow, and the people who follow them, into lists. They spend human hours sorting people into categories according to their own opinion of those people. For the most part, lists on Twitter are also named pretty aptly, like “philosophers” or “funny people” (we can also assume that those two categories are mutually exclusive). Your algorithm can aggregate the results of those human hours and build new results on top of them. Perhaps you are looking for who’s famous in the paperclip community? Compare a bunch of paperclip-related Twitter lists, then find the person who appears on the most of them. Twitter’s API exposes a great amount of human filtering; you just need to know where to look. List names also tend to reuse the terminology a culture or community shares, which makes similar lists easy to match up. Facebook groups will work in the same way (once the API is open).
- Tagging and Grouping on Photo Sites. Flickr is a great example of a community that puts in a lot of human filtering hours. Its users tag and group photos to within inches of their lives. Flickr users also have a low tolerance level for bullshit. They call out sneaky photoshopping, and they gripe about mis-tagged photos. Many of them also share the EXIF data (fancy photo tech details) for each photo. If a company needs to process photographic evidence that may come in droves, then a Flickr group is a perfect way to get humans to tell your algorithm whether or not the photos are legit. In Flickr’s design, human filtering is a key element. The same goes for Pinterest and other curation sites. Figure out a way to use that culture of filtering to your advantage. Then go pay Flickr lots o’ start-up cash for use of the API.
- Networks. The measurement and tracking of human networks online dominates the design thinking behind every new website and app. It drives me crazy. The credibility measurement algorithms of Klout, Kred, PeerIndex, etc., all take the number of Twitter followers into consideration. This is ludicrous and about as useful as our pile of perplexed paperclips. Followers can be bought and gamed, as #teamfollowback demonstrates. Facebook networks are almost as useless, since users add total strangers to their Friends lists. What is useful, if anything, about follower numbers are the ratios that surround them. We can assume, say, that a user who is followed 5 times more than they follow, has no history of mentioning the terms “follow,” “back” and “me” together, and has built lists of people who also have similarly high ratios, is a different sort of person from one who has mentioned those terms and does not build lists of users. It’s not about the numbers in networks; it’s about the human behavior of users.
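Here is a rough Python sketch of the list-counting idea from the Twitter lists item and the ratio idea from the Networks item. It assumes you have already pulled lists and user stats through whatever API you're using (that step isn't shown), and every function name, threshold, handle and data point below is made up for illustration. The 5:1 ratio comes from the example above; the check that the people a user lists also have high ratios is left out to keep it short.

```python
import re
from collections import Counter


# Hypothetical input: (list_name, member_handles) pairs already fetched
# from Twitter. Fetching them is not shown here.
def most_listed(lists, keyword, top_n=3):
    """Count how many keyword-named lists each account appears on."""
    counts = Counter()
    for name, members in lists:
        if keyword.lower() in name.lower():
            # set() so a duplicate entry within one list only counts once
            counts.update(set(members))
    return counts.most_common(top_n)


def begs_for_follows(tweet):
    """True if a tweet uses the words 'follow', 'back' and 'me' together."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    return {"follow", "back", "me"} <= words


def looks_organic(followers, following, recent_tweets, builds_lists, min_ratio=5.0):
    """Illustrative heuristic only: high follower/following ratio,
    no follow-back begging, and the user curates lists."""
    ratio = followers / max(following, 1)  # avoid dividing by zero
    begging = any(begs_for_follows(t) for t in recent_tweets)
    return ratio >= min_ratio and not begging and builds_lists


if __name__ == "__main__":
    sample_lists = [  # made-up data
        ("Paperclip Pros", ["@clipqueen", "@wirebender", "@staplesnot"]),
        ("paperclip collectors", ["@clipqueen", "@wirebender"]),
        ("funny people", ["@clipqueen", "@jokester"]),
        ("Paperclip nerds", ["@clipqueen"]),
    ]
    print(most_listed(sample_lists, "paperclip"))
    print(looks_organic(5000, 1000, ["New paperclip sculpture!"], True))
    print(looks_organic(5000, 1000, ["Follow me back! #teamfollowback"], True))
```

Run as a script, it prints the top members of the paperclip-named lists, then whether each of the two sample accounts passes the rough ratio-and-behavior check.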
These are just a few beginning thoughts on how to harness the power of human behavior in your algorithm. Hire a psychologist or sociologist, or me, for that matter, to find you more easily tapped, custom-fitted examples of online (and offline!) human filters that you can use in your website, application or algorithm design.
Anything to add? Let me know in the comments.