Chris Anderson: Beware of ‘Spurious Correlations,’ But Not Messy Data

SAN DIEGO — CIOs are “drowning in data,” don’t have the resources to analyze it, and haven’t figured out how to effectively make use of it, according to Chris Anderson, founder and CEO of 3D Robotics and former editor of Wired. The biggest danger confronting decision-makers is making the wrong decisions because of what Mr. Anderson called “spurious correlations.” He said “we need analytic skills to use [Big Data] and don’t have it.”

[Photo: 3D Robotics CEO Chris Anderson. Credit: Gary Fong/Genesis Photos]
But he also said CIOs shouldn’t be afraid to act on messy or imperfect data, and that they should follow the example of companies like General Electric Co., which create contests to solicit algorithms from people outside their organizations. Mr. Anderson was speaking on stage at the WSJ CIO Network Conference here.
Mr. Anderson was one of the early proponents of Big Data – the analysis of enormous amounts of data of all types, often in real time, using advanced algorithms and commodity hardware – and his 2008 article on its importance, “The End of Theory,” arguably spurred much of the current hype around the technology. And while most large organizations are at the very least experimenting with it, successful Big Data projects and case studies are still few and far between. The reason, according to Mr. Anderson, is not that the technology is overhyped (he says that, like the Internet, it seems overhyped in the short term but under-hyped in the long term), but rather that companies haven’t hired enough statisticians and analysts, and have been too timid when it comes to using imperfect data.
According to Mr. Anderson, CIOs should rely on traditional A-B testing, reducing the number of variables they test even as they increase the amount of data measured. He also said executives have to become more comfortable with the fact that data is often imperfect, and rely on the large data sets to smooth out anomalies in results. That approach will allow companies to experiment more quickly and act on what the data tells them. “You need to act on the 80% probability. It’s a messy approach but it’s necessary,” he said.
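To make the 80% rule concrete, here is a minimal sketch of one way to read an A/B test as a probability rather than a certainty. It is a simple Bayesian estimate, not necessarily the method Mr. Anderson had in mind. The function names, the uniform prior, and the visitor and conversion counts are all hypothetical, chosen only for illustration.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate the probability that variant B's true conversion rate
    exceeds variant A's, by sampling from Beta posteriors.

    Assumes a uniform Beta(1, 1) prior on each rate (an illustrative choice).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


def should_act(probability, threshold=0.80):
    """Act once the evidence clears the threshold, not when it is certain."""
    return probability >= threshold


if __name__ == "__main__":
    # Hypothetical counts: 1,000 visitors per variant.
    p = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
    print(f"P(B beats A) is about {p:.2f}; act on it: {should_act(p)}")
```

Only one variable (the variant shown) changes between the two groups, which matches the advice to test fewer variables while measuring more data.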
Acting on probabilities in this way is more a question of willingness than a purely technical issue. “It’s about dealing with probability rather than certainty and allowing the law of big numbers to compensate for that,” he said.
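The statistical idea behind that remark is usually called the law of large numbers: the average of many noisy measurements tends toward the true value as the sample grows. The sketch below simulates that effect under assumed conditions. The true value, the noise level, and the function name are hypothetical, and the noise is assumed to be random and unbiased, which real data may not be.

```python
import random
import statistics


def mean_abs_error(sample_size, true_value=10.0, noise_sd=5.0,
                   trials=200, seed=0):
    """Average distance between a sample mean and the true value,
    over repeated simulated samples of noisy measurements."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        sample = [rng.gauss(true_value, noise_sd) for _ in range(sample_size)]
        errors.append(abs(statistics.fmean(sample) - true_value))
    return statistics.fmean(errors)


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(f"sample size {size}: typical error {mean_abs_error(size):.3f}")
```

More data smooths out random noise in this simulation, but it would not correct a systematic bias in how the data was collected.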
Mr. Anderson also said CIOs shouldn’t obsess over developing better algorithms, and should rely instead on large data sets and raw processing power to overcome imperfections in the data. “Even a crappy algorithm is better than laboring for years to develop a more perfect algorithm,” he said. Instead, “throw more data and processing power at what you’ve already got and you’ll be surprised at what will work.”
While it’s important for companies to develop their own Big Data philosophies and skills, they can also reach out to game-players and other hobbyists for help developing algorithms. For example, he said GE is holding a competition for an algorithm that, applied to the petabytes of flight data accumulated by commercial airplanes, could help the company make more efficient use of fuel.
