Spreadsheets are a good tool for looking at data, but if you want
more robust insight into your information, software like SAS and SPSS
can be somewhat daunting for the non-statistically savvy. "There's a
huge gap between Excel and the high-end tools," argues Greg Laughlin,
whose fledgling startup Statwing hopes to fill part of that space.
In fact, Excel includes a reasonable number of statistical functions
-- the issue is more that even many power users don't know how and when
to use them. The idea behind Statwing is to provide some basic,
automated statistical analysis on data that users upload to the site --
correlations, frequencies, visualizations and so on -- without requiring
you to know when, say, to use a chi-squared test versus a z-test.
Once you upload (or copy and paste) data to Statwing, you can select
different variables to be used in analysis. The site determines what
tests to run on the data depending on the characteristics of the factors
you pick, such as your data's sample size and whether variables are
binary (e.g., "for" and "against") or continuous (such as a range of
numbers).
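
To make that selection step concrete, here is a minimal Python sketch of how a tool might map variable types to a test. It is an illustration only, not Statwing's actual logic: the function names classify and choose_test and the three-way rule are assumptions, and it ignores sample size, which the site reportedly also considers.

```python
def classify(values):
    """Label a column as 'binary', 'continuous', or 'categorical'.

    Hypothetical rule: exactly two distinct values means binary;
    otherwise all-numeric columns are treated as continuous.
    """
    if len(set(values)) == 2:
        return "binary"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "categorical"


def choose_test(x, y):
    """Pick a test family from the types of the two selected variables."""
    kinds = sorted([classify(x), classify(y)])
    if kinds == ["binary", "continuous"]:
        # Compare the continuous variable across the two groups.
        return "two-sample t-test"
    if kinds == ["continuous", "continuous"]:
        return "correlation"
    # Any remaining pairing of binary/categorical variables.
    return "chi-squared test of independence"


# Made-up example: a for/against position paired with a dollar amount.
position = ["for", "against", "for", "against"]
donations = [12000.0, 3000.0, 8500.0, 4100.0]
print(choose_test(position, donations))
```

A real service would need more rules than this (small samples, skewed data, more than two groups), but the basic idea is the same: the user picks variables, and the software picks the test.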
In one demo, data on Congressional SOPA/PIPA
positions was matched with campaign donations from both the
pro-SOPA/PIPA entertainment industry and anti-SOPA/PIPA tech lobbies.
Statwing's analysis showed a "medium clearly significant" correlation
between a legislator's support for SOPA/PIPA and the amount of
entertainment industry political contributions he or she received
(although there was no statistically significant relationship between
opposition to SOPA/PIPA and tech industry contributions).
In Statwing's advanced tab, you can see how the site reaches its conclusions. In the SOPA/PIPA example, the correlation was determined via a ranked t-test, a variation on a statistical test that checks for differences between two groups when their variances -- that is, how much the values are spread out from the group's average -- may be unequal.
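
For readers curious what such a test involves, the sketch below shows one plausible reading of a "ranked" unequal-variance t-test: pool the two groups, replace each value with its rank, then compute Welch's t statistic on the ranks. This is an assumption about the general technique, not a description of Statwing's code; the function names and the sample numbers are made up for illustration.

```python
from statistics import mean, variance


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a's mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def ranked_welch_t(a, b):
    """Rank the pooled values, then run Welch's t-test on the ranks."""
    ranks = average_ranks(list(a) + list(b))
    return welch_t(ranks[:len(a)], ranks[len(a):])


# Made-up contribution amounts for two hypothetical groups.
supporters = [52000, 18000, 75000, 31000, 44000]
opponents = [9000, 21000, 15000, 4000, 27000]
t, df = ranked_welch_t(supporters, opponents)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Working on ranks rather than raw dollar amounts keeps a few very large values from dominating the result, which is one reason a tool might prefer this variant for skewed data such as donations.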
The site's analysis also found a medium-strength, significant relationship between age and support for SOPA/PIPA, with the average age of Congressional supporters almost 6 years higher than that of opponents.
Statwing currently keeps all data and analyses private, but plans in the works will allow users to share links to data, download and export results, and eventually embed analyses and data into a Web page. For now, the company consists solely of its two founders: Laughlin, a former consultant and product manager who sought easier data analysis tools, and John Le, an engineer and data scientist. Both are Stanford grads who previously worked at CrowdFlower.
Statwing was built using the Clojure programming language, Laughlin said, for "actual math" and data handling (not using, as I'd assumed, the R Project for Statistical Computing as the statistics engine); some Ruby on Rails for packaging and Web basics; CoffeeScript, which aims to simplify JavaScript syntax; Backbone for organizing front-end JavaScript; and the D3 JavaScript library for visualization. The company just launched from the Y Combinator entrepreneurial incubator program last week.
Just how useful is Statwing? An automated data analysis service in the cloud is certainly no replacement for an in-house data scientist who can mine your mission-critical data. And I'd be hard-pressed to recommend making a multi-million-dollar business decision based on an automated analysis alone -- especially from a site that's still in beta. No automated tool can ask customized questions about the integrity of your data set or raise a red flag when you're jumping the gun from correlation to causation. Nevertheless, Statwing looks like an appealing resource for professionals who want to take their data skills up a notch from means, medians and pivot tables in Excel; it's an interesting way to learn at least one approach to statistically analyzing a data set, or perhaps brush up on statistical skills that have gone a little rusty since college.
If you sign up for the public beta, you can currently use the site for free. There will be a limited free option in the future, Laughlin said, with such accounts restricted to analyzing and storing just one data set at a time. Paid accounts will likely run anywhere from $20 to $30 a month up to a couple of hundred dollars a month.