Why We Need To Kill “Big Data”

It’s the New Year and along with resolutions about eating healthier, being kinder and exercising more frequently, I’d like to add one more to the list. Let’s banish the term “big data,” along with “pivot,” “cloud” and all the other meaningless buzzwords we have grown to hate.
To be completely honest–I have been one of the bigger abusers of the term in posts, as you can see here, here and here. It seems like every enterprise startup nowadays is in “big data.” There are even venture funds devoted to investing in “big data” startups.
Why have I grown to hate the words “big data”? Because I think the term itself is outdated, and consists of an overly general set of words that don’t reflect what is actually happening now with data. It’s no longer about big data, it’s about what you can do with the data. It’s about the apps that layer on top of data stored, and insights these apps can provide. And I’m not the only one who has tired of the buzzword. I’ve talked to a number of investors, data experts and entrepreneurs who feel the same way.
According to Vincent McBurney, ”Big Data” originates with Francis Diebold of the University of Pennsylvania, who in July 2000 wrote about the term in relation to financial modeling. That was more than a decade ago, and a great deal has happened since then in how, and what, people can do with these enormous data sets.
And big data is not just about the enterprise. The fact is that companies from consumer giants like Facebook and Twitter to fast-growing enterprise players like Cloudera, Box, Okta and GoodData are all big data companies by definition. Every technology company with a set of engaged regular users is collecting large amounts of data, a.k.a. “big data.” In a world where data is the key to most product innovation, being a “big data” startup isn’t that unique, and honestly doesn’t say much about the company at all.
According to IBM, big data spans four dimensions: Volume, Velocity, Variety, and Veracity. Nowadays, in the worlds of social networking, e-commerce, and even enterprise data storage, these factors apply across so many sectors. Large data sets are the norm. Big data doesn’t really mean much when there are so many different ways that we are sifting through and using these massive amounts of data.
That’s not to under-estimate the importance of innovation in cleaning, analyzing and sorting through massive amounts of data. In fact, the future of many industries, including e-commerce and advertising, rests on being able to make sense of the data. Startups like GoodData, Infochimps, Cloudera, Moat, and many others are tackling compelling ways to actually make use of data.
Another fact worth pointing out is that enterprise companies like IBM, large retailers, financial services giants and many others were parsing through massive amounts of data long before the term was even coined. It’s just that the types of data we now parse are different, and we no longer need to run these analytics systems in on-site data centers.
So let’s figure out a different way to describe startups that are dealing with large quantities of data. Perhaps it’s about the actual functionality of apps vs. the data. It’s the New Year and a great time to brainstorm over ways we can avoid “the term that must not be named.”


Forrester: SaaS And Data-Driven “Smart” Apps Fueling Worldwide Software Growth

Forrester Research is citing SaaS and data-driven smart apps as the major growth engines for the worldwide software market.
The SaaS software market will grow 25 percent in 2013 to $59 billion, and in 2014 it is expected to total $75 billion. Forrester uses the term “smart computing” to define apps that, for instance, provide direct access to data for decision-making. It also includes data analytics and business intelligence in the category.
The research firm forecasts the smart computing software market to be $41 billion in 2013, increasing to $48 billion in 2014. According to Forrester, these smart-process apps overlap with SaaS products “because the browser-based access model for SaaS products works better for collaboration among internal and external participants than behind-the-firewall deployments.” As a corollary, these smart computing products, like SaaS, are growing far faster than the overall software market.
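As a quick sanity check, here is a minimal sketch (not from Forrester) that computes the year-over-year growth implied by the forecast figures quoted above; the numbers are the article's, the code is illustrative only.

```python
# Quick sanity check (not from Forrester): the year-over-year growth implied by the
# forecast figures quoted above. All figures are in billions of dollars.
forecasts = {
    "SaaS":            (59, 75),  # 2013 forecast, 2014 forecast
    "Smart computing": (41, 48),
}

for segment, (y2013, y2014) in forecasts.items():
    growth = (y2014 - y2013) / y2013
    print(f"{segment}: ${y2013}B in 2013 -> ${y2014}B in 2014 ({growth:.0%} year-over-year)")
```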
Here’s a look at the overall market:
[Chart: tech market outlook]
And the overall rankings:
[Chart: fastest-growing apps]
Some of the more noteworthy findings come from what sectors will fall behind this year. Forrester cites Google Apps as putting pressure on desktop apps, which will increase just 4 percent this year. Enterprise content management will decrease 26 percent.
Interestingly, the area of highest growth is in what Forrester cites as “other,” meaning embedded apps that are added to software, such as CRM or ERP solutions.
Many of Forrester’s findings are not a surprise as we ease into 2013. The new year will be more about the dynamics of what comes into play as these trends unfold.

EMC Acquires Storage, Cloud Automation Software Developer iWave

EMC (NYSE:EMC) has quietly acquired iWave Software, a developer of storage and cloud automation software solutions for enterprises, solution providers and OEMs.
An EMC spokesperson confirmed the acquisition, noting that it is not uncommon for EMC to make a small acquisition and not formally announce it.
iWave, however, was a bit less quiet. The company on Friday replaced the homepage of its website with an open letter from CEO Brent Rhymes, who wrote that iWave has "successfully collaborated with EMC Corporation for years."

Rhymes wrote that iWave in 2012 established itself as an early storage automation market leader in large part because of its EMC partnership. "Now, as part of EMC, iWave is poised for some truly exciting and exponential growth in what is becoming one of the hottest storage software capabilities," he wrote.
Rhymes, however, did not mention other companies who partner with iWave, including EMC arch rivals NetApp and Dell.
iWave Software brings two primary product lines to EMC.
Its primary product is iWave IT Automator, an enterprise solution for process automation and orchestration in heterogeneous private cloud, private storage cloud and data center environments.
iWave Software on its website called iWave IT Automator a "secure, scalable and cost-effective solution for some of today's most complex IT administrative challenges," including server provisioning and enterprise sprawl, private cloud administration, storage provisioning and reclamation, event response and fault remediation, IT service management integration, and resource allocation.
The second product, iWave Transport Manager, provides consistent and business-driven change, configuration and release management as well as enforceable Sarbanes-Oxley compliance in SAP (NYSE:SAP) environments, iWave said on its website.
In an emailed response to CRN, an EMC spokesperson said EMC acquired iWave Software primarily for its Storage Automator and cloud technologies, which EMC expects will further strengthen its storage management portfolio and help its customers accelerate their journey to the public, private and hybrid cloud.
EMC will absorb iWave employees into its Advanced Storage Division (ASD), with Rhymes reporting to Jay Mastaj, senior vice president and chief operating officer of that organization.
EMC is dedicated to taking care of iWave's customers and keeping them whole after the acquisition, and is finalizing the plan to integrate iWave's products into EMC's product portfolio, the spokesperson said. The spokesperson did not directly address the question of how EMC will manage iWave's relationships with competing storage vendors, but said EMC will work to ensure that as many existing iWave partner relationships remain as mutually beneficial as possible.
The spokesperson said EMC is not disclosing the value of the acquisition, which actually closed on Dec. 27, but said it will not have a material impact on the company's GAAP or non-GAAP earnings for fiscal 2012.

Original Source

How To Make Big Decisions Quickly

What do I do all day? I get asked this question from time to time and it’s a fair one: as a CTO, what exactly do I do all day? The fact is my daily routine has changed dramatically over the years. While I still spend a lot of hands-on time immersed in technology, our code, architecture and checking in on development milestones, I would say a large portion of my time is now spent doing two things: growing a talent base and helping my team make big decisions.
In a future post, I will talk about the process and techniques I use to build technical and managerial talent. But today I want to spend a few minutes describing the steps I use to make decisions under uncertainty.
I’ve found that in my role, if an issue finds its way to my inbox, it’s not a problem that’s easy to solve. These decisions run the gamut – from the way we handle over a petabyte of gameplay data, to the topology of our data centers, to the development platform for the company, to the way we do the final go/no-go before releasing our products into live operations. Every decision I’m faced with directly impacts our consumer experience, materially impacts our P&L, and impacts our employees and company for many years to come. These decisions are generally high impact to the business and involve a variety of strong, diverse points of view.
There are five things I try to do every time I am confronted with this situation that help me evaluate the scenario and make a decision effectively:
1) Get to the heart of the matter
Many times I find that perspectives around a big decision are both passionate and accurate but reflect only a particular (sometimes narrow) view of the problem. The first step for any major decision is to critically and objectively dissect its different elements to get to the bottom of the issue. A technique I use and encourage is called precision questioning (PQ). It is a structured method for quickly getting to the underlying assumptions, sources of data, measures and cause-effect relationships, and for separating causal variables from mere associations. PQ was developed by Dennis Matthies, a long-time Stanford professor. It is a good framework that enables teams to create a shared vocabulary and a consistent way to develop critical thinking skills. For me personally, without working on each and every project day to day, precision questioning helps me understand all aspects of the issue at hand and make an informed decision.

2) Encourage dissenting points of view
Everybody has a set of experiences that inform their decision making. It’s the muscle memory they rely on when confronted with a big choice. I find this method dangerous because you can risk only listening to those arguments that support your predisposed points of view whereas you need diverse perspectives to make the best decision. I always try and seek out those that have a different perspective, listen seriously to the counter points of view and ensure every voice is heard. When people are reticent or shy to express an unpopular opinion, giving them a voice and an environment to disagree is incredibly important.

3) Foster boldness
A key element of good decision making is timeliness. Paralysis of analysis can render a great decision useless if it is made too late. A deep and accurate understanding of all aspects of an issue is important, but seeking perfection is not practical. One doesn’t need to be 100% sure of an answer before making a decision. In fact, I almost never will be – it would require entirely too much time to come to a conclusion where you are that certain. It’s about making a decision when you have sufficiently evaluated all the factors so the team can move forward and I’m not serving as a bottleneck.

4) Ensure the discussion is not personal
Getting to the heart of an issue in a rapid manner especially with techniques like PQ can be quite intense. Make this a natural part of your organization’s DNA to ensure there is respect and civility in all discussions. It takes time but once set, it pays large dividends and everyone is more likely to participate and share opinions the next time.

5) Provide clarity in the decision
Often times a decision is made but not well communicated and therefore not well understood. Once a decision is made, it is important for everyone involved to understand the “what” and “why” of the decision. Transparency builds the foundation for support. It provides predictability in the way an organization decides and picks between options. In addition, don’t forget that every decision has an inverse. It is equally critical to spell out clearly what you will not be doing.


You’ll never be right in every decision you make. But being bold, being thoughtful and clearly understanding all factors are key. When you've gone through this process and based your decisions on research and facts, it is also easier to modify direction if the underlying factors change substantially. I always “reserve the right to wake up smarter every morning” - meaning I have a framework to track the decision logic over time and course-correct if needed. I have found that using this five-step process to make decisions is the best way to foster an environment where we can make big bold bets and smart people thrive.


Original Source

Why You'll Need A Big Data Ethics Expert


Big data is a big deal for companies in 2013. The prospect of outdistancing your competition by leveraging your company's data with huge data sources such as NASA, the government, video and demographic services is compelling. But there's evidence that technology is advancing faster than companies and governments can manage it. Along with big data technology developers, your company should be thinking about adding a "big data ethicist."
The need for big data technologists is well covered in the media. At the annual Gartner IT Symposium, I reported on a looming gap between big data needs and technologists to fill those needs:
"The Gartner analysts predicted that by 2015, 4.4 million IT jobs globally will be created to support big data, with 1.9 million of those jobs in the United States. That employment projection carries further weight when, as the Gartner analysts pointed out, each of those jobs will create employment for three more people outside of IT.
"However, while the jobs will be created, there is no assurance that there will be employees to fill those positions. Sondergaard provided the dour prediction that only one-third of the jobs will be filled due to a lack of skilled big data applicants. One of the biggest tasks for CIOs is to rethink how to hire and train a workforce able to meet this demand for big data talent."
As InformationWeek executive editor Doug Henschen explained in an article on the big data talent war, technology executives need to engage a seven-point hiring and training plan for big data professionals.
So the need for technology talent in the big data segment is clear. But recent tragic events also show how big data extends far beyond a company's technology or marketing departments.


Recently, the Journal News, a newspaper based in White Plains, New York, touched off a furor when it published a Google map showing the locations of 44,000 registered handgun owners in Westchester, Rockland and Putnam counties in New York State. The registration information, obtained under the Federal Freedom of Information Act, is a vivid example of, as the Christian Science Monitor reported, the disputes that can arise when constitutional rights -- in this case, the First and Second Amendments -- clash.
The tragedy at the Sandy Hook Elementary School was the catalyst for the Journal News' decision to publish the gun owner information. The ease with which the paper published information obtained from the Freedom of Information Act and Google Maps shows how data is becoming more accessible in the big data era. While Putnam County officials have so far resisted providing the newspaper with gun ownership information, it appears they are unlikely to block access going forward.
Marc Parrish continues the discussion of big data's role in Second Amendment rights in The Atlantic. In his article he states, "Big data might have stopped the massacres in Newtown, Aurora, and Oak Creek. But it didn't, because there is no national database of gun owners, and no national record-keeping of firearm and ammunition purchases. Most states don't even require a license to buy or keep a gun.
"That's a tragedy, because combining simple math and the power of crowds could give us the tools we need to red flag potential killers even without new restrictions on the guns anyone can buy. Privacy advocates may hate the idea, but an open national database of ammunition and gun purchases may be what America needs if we're ever going to get our mass shooting problem under control."

While it is beyond the editorial mission of business publications like this one to take a side on the Second Amendment controversy, one technology-related aspect of Parrish's article is undeniably correct: The task of developing a database of 300 million guns and their owners is now so trivial it hardly falls into the big data category.
The looming issue in big data isn't technology but the decisions associated with how, when and if results should be provided. Widespread access to public information, interfaces that make it easy to combine big data sources, and the ability to publish information to the Internet is going to yield some difficult decisions for the big data community.
And those decisions are only going to become more intense. A substantial amount of data by government organizations, for example, is still locked up in paper format. However, companies such as Captricity have developed innovative ways to turn massive amounts of paper-based data into digital form. Companies such as Panjiva are using big data and business intelligence to meld buyer and seller data in multiple public and private databases to create a unique global commerce engine.
In the enthusiasm around big data, there has been little discussion about what that data might uncover. Privacy issues will surface as data analytics becomes able to reveal identities by combining what was previously considered anonymous data with location and purchasing information. Alistair Croll, at O'Reilly Media, put it succinctly in an article entitled "Big Data Is Our Generation's Civil Rights Issue, and We Don't Know It": "Time for you to plan for not just how your big data strategy will be implemented, but what are the implications of the data your company will be creating and publishing."

Original Source

Facebook Mobile User Counts Revealed: 192M Android, 147M iPhone, 48M iPad, 56M Messenger

Facebook keeps user counts for its mobile apps hidden, but analyst Benedict Evans found a way to uncover them and they provide critical insight into the direction and performance of Facebook’s mobile efforts. Most interestingly, Facebook’s Android user count is growing much faster than its iPhone user base, but is found on a lower percentage of Android devices. Let’s take a closer look at the data.
A year ago, Facebook stopped reporting user counts for its own mobile apps via the Graph API. But if you searched for one that none of your friends used and hovered over the search result, you could see its monthly active user (MAU) count. Evans, of Enders Analysis, meticulously recorded these numbers until “some time in November [2012], those disappeared and were replaced” with hover cards lacking the usage data, he tells me. He incorrectly calculated Facebook’s mobile web site stats due to overlap between native app and HTML5 site users. Facebook declined to comment, but solid analytics sources and old official numbers indicate the rest of his stats are accurate.
Evans gave me the raw data dump from his research, which is more current than his blog post, and here’s what it shows.

iOS vs. Android

As of September 2011, Facebook for Android had 66 million MAU and Facebook for iPhone had 91 million MAU. In December 2011, right before Facebook stopped openly publishing stats, Android surpassed the iOS app. Just 11 months later, in November 2012, Android had grown to 192.8 million MAU while iPhone had only 147.2 million.
This shows Android is a core source of growth that helped Facebook reach 604 million mobile users by the end of Q3 2012. This underscores the need for Facebook to speed up Android development. Many new features and sometimes entirely new apps like Pages Manager launch first on iPhone. This could be because Facebook defaulted to giving employees iPhones for a long time, and still more team members carry them than Androids.
While Facebook for Android may have more absolute users than its iPhone counterpart, the iPhone has a much better penetration rate. Facebook’s native app is actively used by 73.6 percent of the estimated 200 million iPhone install base. Only 35 percent of the estimated 550 million Android install base see monthly usage of Facebook’s native app. This may be in part due to the popularity of Android in China where Facebook is blocked. However, it may also show Facebook’s lagging penetration in emerging markets like India where Androids are common.
This all leaves out the iPad, though. Facebook for iPad rapidly grew from just a few million users in September 2011 to 48 million MAU in September 2012. If you estimate iPad’s install base at 100 million, 48 percent use the Facebook app monthly. That’s a lower penetration than on iPhone but worthy of regular updates.
Meanwhile, of the 195.2 million iOS devices regularly accessing Facebook’s native apps, only 53.8 million, or 27.5 percent, have turned on Facebook’s iOS 6 integration. That means there are a lot of people who aren’t using contact sync, easy sharing, and single sign-on for third-party apps. Facebook may need to come up with a way to convince more users to turn on the integration, both for its own benefit, and to convince Apple that Facebook is a powerful partner.
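For readers who want to reproduce the penetration figures above, here is a minimal sketch using only the MAU numbers and rough install-base estimates quoted in this article; the install-base figures are the article's estimates, not official numbers.

```python
# A back-of-envelope reproduction of the penetration figures quoted above, using the
# article's MAU numbers and rough install-base estimates (all in millions). This is
# illustrative only, not Facebook's or Evans' methodology.
platforms = {
    # platform: (monthly active users, estimated install base)
    "iPhone":  (147.2, 200),
    "Android": (192.8, 550),
    "iPad":    (48.0, 100),
}

for name, (mau, installed) in platforms.items():
    print(f"{name}: {mau / installed:.1%} of the install base uses the native app monthly")

# Share of Facebook-active iOS devices with the iOS 6 integration turned on
ios6_enabled, ios_active = 53.8, 195.2
print(f"iOS 6 integration enabled: {ios6_enabled / ios_active:.1%} of active iOS devices")
```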
The big takeaway from the iOS / Android platform battle is that Facebook needs to focus more on Android. If Facebook’s iOS and Android apps have continued on the same growth trajectories, by now Android likely has more MAU than the iPhone and iPad apps combined. Even if Android is not the preferred mobile OS of employees, building for it is critical to keeping its overall mobile usage growing.

Feature Phones Are Big. RIM, Nokia, Windows Not So Much

From September 2011 to November 2012, Facebook’s feature phone app, Facebook For Every Phone, which is built on Java Platform, Micro Edition, more than doubled its MAU to 82 million. The feature phone app’s growth shows emerging markets around the world are getting on mobile, and a decent number of those users are choosing Facebook.
We don’t hear much about this app from Facebook. That might be because most of its employees carry smartphones, and so it may be harder to see how important it is, brainstorm improvements, and test updates. But until low-cost smartphones start displacing feature phones in the developing world, Facebook needs to innovate here.
What it doesn’t need to worry as much about are the second-tier smartphone platforms. RIM’s BlackBerry still has a somewhat significant Facebook user base of 60.2 million as of December 2012. Unfortunately that was only up from 48.9 million in November 2011, and its failed PlayBook tablet’s Facebook app had just 690,000 MAU by December 2012. Meanwhile Nokia had 15.7 million MAU by November 2012, and Windows Phone had only a couple million Facebook users. Fracturing engineering resources across these platforms is likely inefficient for Facebook.

Messenger Grows Quickly, But Is Still Far Behind

Facebook does a lot. Having a ton of features on the web makes sense, but cramming them all into a single mobile app can make it feel bloated. That’s why Facebook began releasing standalone apps in August 2011. They give users quick access and a dedicated interface to a popular feature, and help Facebook experiment with new capabilities it might add to its primary smartphone apps.
After buying the group messaging and SMS-replacement app Beluga in March 2011, Facebook re-skinned it, and hooked it into its unified web/mobile messaging system. The result was Facebook Messenger which launched for iOS and Android in August 2011.
[Chart: Facebook Messenger and Camera stats]
A month later it had almost 3 million MAU. Growth picked up in the fall and it had 10 million users on each platform by November. It continued steadily gaining users, and Android pulled in front of iOS in Fall 2012. By late November 2012, Messenger had 22.8 million iOS MAU, 32.3 million Android MAU, and 1.6 million BlackBerry MAU for a combined 56.7 million MAU.
That sounds impressive, but Messenger still lags far behind several international messaging apps. WhatsApp is believed to have several hundred million users, and China’s Tencent says its WeChat app had 200 million users as of September 2012. That’s why we’ve heard Facebook has made inquiries about acquiring WhatsApp as well as Snapchat, which it instead ended up cloning as Poke. Owning the platform you private message on is critical to Facebook because knowing who you message with helps it refine its content-sorting relevancy algorithms. There are also potential monetization options within messaging.

Facebook Camera Can’t Compete With Instagram

Facebook knew it had to do something unique with photos on mobile. So, long before it began negotiations to buy Instagram, it started building Facebook Camera. The Instagram deal was signed quickly, and Camera was almost done, so Facebook launched the standalone app a month later, in May 2012.
Though Camera offered its own filters, a powerful bulk upload option, and more cropping flexibility, Instagram had too much momentum and a loyal user base. Instagram passed 100 million users in September 2012.
Thanks to Evans, we’re now getting our first look at Camera’s progress, and its lackluster performance. A month after its launch it hit 1.4 million users, dipped for a while, and now six months later it only has 1.5 million MAU. That doesn’t mean it’s not valuable to Facebook. It showed a slick photo selection flow, filters, and bulk uploads were popular, so Facebook added them to its primary apps. But in the end, Facebook may be better off dedicating development resources to Instagram.

What’s Next For Facebook Mobile?

To put it simply: going hard at Android, making its feature phone app more viral, and figuring out whether to concentrate on one omni-app or several standalone apps. Obviously there’s monetization, but that’s for another article.
Android’s growth momentum means Facebook for Android needs to become its premier app. Facebook’s competition with Google might make that painful, but it needs to stick to its social layer strategy. It should view building an incredible Android app as a way to take advantage of Google’s mobile install base, not the other way around.
There are a ton of feature phone users, and not enough of them are on Facebook. The social network should look at how it can convince its feature phone users to get their friends on board too. That might mean some kind of incentive program for feature phone recruiters or recruits, such as mobile data discounts.
Finally, with Messenger, Camera, Poke, and Pages Manager, its standalone app portfolio is starting to bulge. Users might not want a home screen full of Facebook, and that might lead them to bury the apps in a folder. Then again, Google has done well with a suite of standalone mobile apps. Either way, 2012 was about Facebook getting serious about mobile in general. 2013 will be about trading the shotgun for the scalpel.

Original Article

Like Big Data, Operational Intelligence is Evolving to Deliver Right Time Value

Ventana Research has been researching and advocating operational intelligence for the past 10 years, but not always under that name. The idea grew out of the use of events and analytics in business process management and the need for hourly and daily operational business intelligence, but its alignment with traditional BI architecture didn’t allow for a seamless system. A few years later the discussion began to focus on business process management and companies’ ability to monitor and analyze BPM on top of their enterprise applications. Business activity monitoring became the vogue term, but that term did not denote the action orientation necessary to accurately describe this emerging area. Ventana Research had already defined a category of technology and approaches that allow both monitoring and management of operational activities and systems, along with taking action on critical events. Today, Ventana Research defines Operational Intelligence as a set of event-centered information and analytics processes operating across the organization that enable people to take effective actions and make better decisions.
The challenge in defining a category in today’s enterprise software market is that prolific innovation is driving a fundamental reassessment of category taxonomies. It’s nearly impossible to define a mutually exclusive and collectively exhaustive set of categories, and without that, there will necessarily be overlapping categories and definitions. Take the category of big data: when we ask our community for a definition, we get many perspectives and ideas of what big data represents.
Operational intelligence overlaps in many ways with big data. In technological terms, both deal with a diversity of data sources and data structures, both need to provide data in a timely manner, and both must deal with the exponential growth of data.
Also, business users and technologists often see both from different perspectives. Much like the wise men touching the elephant, each group feels that OI has a specific purpose based on their perspective. The technologist looks at operational intelligence from a systems and network management perspective, while business users look at things from a business performance perspective. This is apparent when we look into the data sources used for operational intelligence: IT places more importance on IT systems management (79% vs. 40% for business), while business places more importance on financial data (54% vs. 39% for IT) and customer data (40% vs. 27% for IT). Business is also more likely to use business intelligence tools for operational intelligence (50% vs. 43%), while IT is more likely to use specialized operational intelligence tools (17% vs. 9% for business).
The last and perhaps biggest parallel is that in both cases, the terms are general, but their implementations and business benefits are specific. The top use cases in our study for operational intelligence were managing performance (59%), fraud and security (59%), compliance (58%) and risk management (58%). Overall we see relative parity in the top four, but when we drill down by industry, in areas such as financial services, government, healthcare and manufacturing, we see many differences. We conclude that each industry has unique requirements for operational intelligence, and this is very similar to what we see with big data.
It is not surprising that our definition of operational intelligence is still evolving. As we move from the century of designed data to the century of organic data (terminology coined by Census Director Robert Groves), many of our traditional labels are evolving. Business intelligence is beginning to overlap with categories such as big data, advanced analytics and operational intelligence. As I discussed in a recent blog post, The Brave New World of Business Intelligence, the business intelligence category was mature and was showing incremental growth only a few years ago, but it is difficult to call the BI category mature any longer.
Based on the results of our latest operational intelligence benchmark research, we feel confident that our current definition encompasses the evolving state of the market. As operational intelligence advances, we will continue to help put a frame around it. For now, it acts very much like what might be called “right-time big data.”
Regards,
Tony Cosentino
VP & Research Director

Original

9 + 5 Reasons to Upgrade to Informatica PowerCenter 9.5

  1. Increase Agility in Delivering Critical Data and Reports to the Business
    Access, profile and merge existing data and “big” data in real-time to deliver new insights rapidly with the PowerCenter Data Virtualization Edition.
  2. Harness the Power of Big Data
    Leverage leading Hadoop distributions with enhanced capabilities for loading to, processing on, and extracting from Hadoop.
  3. Connect with Social Media Data
    Capture feedback from social media streams with a streaming API for incoming data feeds, including enhanced Facebook, Twitter and LinkedIn connectivity now optimized for data services, data quality, profiling and MDM use cases.
  4. Enable True Business-IT Collaboration
    Empower analysts to find business terms and data directly using Business Glossary and to define the mappings they need from higher-level specifications with the Data Integration Analyst Tool. Analysts can now perform simple data integration tasks on their own with guided mappings that auto-map source to target, provide join recommendations, and preview data midstream (a generic sketch of the auto-mapping idea follows this list).
  5. Simplify Web Services Development
    Build Web services from WSDL or existing data objects, transformations and mapplets, handle complex data types, consume RESTful Web services and reuse services more easily than ever before with the new Data Services Web Services Option.
  6. Boost productivity with metadata management
    Manage change better while preserving data integrity with Metadata Manager and Business Glossary’s optimized data lineage and impact analysis.
  7. Test Changes and Upgrades Up to 10x Faster and Increase Test Coverage
    Any time you move or transform data, you run the risk of introducing errors. Become 10x more productive and reduce the risk of negatively impacting the business, while increasing test coverage, with the Informatica Data Validation Option.
  8. Validate Your Production Data with the Informatica Data Validation Option
    Schedule automatic data reconciliation of any data updates to production systems and help identify issues before they impact the business using the Informatica Data Validation Option.
  9. Proactively Identify Data Integration Risks
    Protect data integration projects by leveraging more than 25 pre-built rules and templates that can help identify and alert on situations that may lead to process failures, with Proactive Monitoring for PowerCenter Operations.
  10. Minimize risk through data governance
    Implement data governance with Business Glossary, Metadata Manager, and integrated data quality support, now with an approval workflow and change notifications for business terms in Business Glossary.
  11. Improve mainframe access and performance
    Maximize your mainframe investments with access and performance enhancements including real-time change data capture, VSAM partitioning, DB2 bulk loading and high availability. Reduce MIPS cost by offloading processing to zIIP processors.
  12. Harden your real-time operations
    Take advantage of improvements to PowerCenter Real Time Edition including real-time change data capture enhancements, even greater Web services scalability, and guaranteed once-only message delivery.
  13. Bullet-proof your security
    Secure your data integration environment with centralized user and role administration, enhanced LDAP integration, SSL remote access and support for Internet Protocol version 6 (IPv6).
  14. Support mission critical 24x7 operations
    Access, integrate and deliver your critical data whenever you need it with enhanced capabilities including high availability for PowerExchange and dynamic lookup cache for real-time data.
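As promised in item 4, here is a hypothetical, generic illustration of the guided auto-mapping idea. It is not Informatica's API or algorithm; it simply pairs source and target columns by name similarity and surfaces the candidates for an analyst to review.

```python
# Hypothetical illustration of guided source-to-target auto-mapping (see item 4).
# This is NOT Informatica's API or algorithm -- just a generic sketch that pairs
# columns by name similarity and surfaces the candidates for an analyst to review.
from difflib import SequenceMatcher

def auto_map(source_cols, target_cols, threshold=0.7):
    """Suggest a source -> target column mapping based on normalized-name similarity."""
    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    suggestions = {}
    for src in source_cols:
        best = max(target_cols, key=lambda tgt: similarity(src, tgt))
        score = similarity(src, best)
        if score >= threshold:
            suggestions[src] = (best, round(score, 2))
    return suggestions

# Example: an analyst reviews these suggestions before accepting the mapping.
print(auto_map(["cust_name", "cust_id", "order_dt"],
               ["customer_name", "customer_id", "order_date", "region"]))
```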
Source

Big Data Apps: The Next Big Thing?

What's the best solution to the looming shortage of data scientists, those high priests of analytics who glean meaning from big data? One option is to build big data applications that automate many data scientist tasks, thereby enabling less technical business workers to make data-driven decisions without first consulting the resident data guru.
In a similar vein, big data can play a major role in the development of learning machines that make recommendations, not simply serve up results and leave the analysis and interpretation up to humans.
In a phone interview with InformationWeek, Opera Solutions chief strategy officer Laura Teller predicted that a growing sophistication in software and machine learning will help enterprises cope with the rising velocity, variety and volume of data in the coming years.
Opera Solutions is a predictive analytics firm that employs more than 230 data scientists -- nearly a third of its staff. In 2012 it partnered with Oracle and SAP to connect the Oracle Exadata and SAP HANA data appliances with Opera Solutions' Signal Hub technologies, which use machine learning and data science to pull domain- and business-problem information from big data flows.
"The human brain was not meant to deal with this massive flood of information," Teller told InformationWeek. "And a machine has to stand between that flood of information and humans' ability to interpret and take action based on the information."
A new generation of learning machines needs to distill core information from the noise of big data and present it in ways that allow humans to take action. "We spend a lot of time thinking about this with our interfaces," Teller said. "We want the machine to serve up a set of directed actions in every application that we create. We want the machine to make recommendations to humans about what you should do, what you can do."
The development of big data applications is an emerging trend that Opera Solutions predicts could grow significantly in 2013. "If you can prepackage the science into something that's prebuilt, you can insert it on top of existing systems and workflows, and push it into the world of the operator," Teller explained.
For instance, healthcare is one industry that could benefit from prepackaged big data apps. "The area of healthcare billing, particularly hospital billing, is fraught with errors," Teller said. "A lot of it is handwritten and happens very quickly. So hospitals miss a ton of bills that they could -- and should -- legitimately bill for."
Hospitals today often use rules-based systems for billing. For instance, if one medical procedure appears on a bill, then an associated required procedure should be listed too. But Opera Solutions suggests an alternative: a patterns-based approach that studies how patients, diagnoses, and hospitals "behave" in the billing process. "We can find -- with much greater accuracy -- things that have been potentially dropped off the bill and serve those up to humans," said Teller. "We lay this on top of their existing system. It takes us about 500 man-hours to be able to hook in and train the models, which really isn't very much when you think about how much money is at stake here."
Another benefit of a patterns-based app is that it can continue to learn without human intervention. "You don't have to stop and reprogram it -- like you have to do for a rules-based system every time the rules change," Teller pointed out.
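To make the contrast concrete, here is a minimal, hedged sketch of the two approaches Teller describes: a hand-written rule versus a model that learns which procedures usually co-occur and flags a bill when a strongly associated code is missing. The procedure codes, the rule and the toy billing history are invented for illustration and are not Opera Solutions' method.

```python
# Hedged sketch contrasting the two approaches described above. The procedure codes,
# the rule and the toy billing history are invented; a real system would learn its
# co-occurrence statistics from historical claims data.
from collections import Counter
from itertools import combinations

# Rules-based: a hand-written "if A is billed, B should be too" check.
REQUIRED_PAIRS = {"ANESTHESIA": "SURGERY"}

def rules_check(bill):
    return [f"{a} billed but {b} missing"
            for a, b in REQUIRED_PAIRS.items() if a in bill and b not in bill]

# Patterns-based: learn how often codes co-occur, then flag a bill that is missing a
# code which usually accompanies one of the codes it contains.
def learn_cooccurrence(history):
    pair_counts, code_counts = Counter(), Counter()
    for bill in history:
        code_counts.update(bill)
        pair_counts.update(combinations(sorted(bill), 2))
    return pair_counts, code_counts

def pattern_check(bill, pair_counts, code_counts, min_support=0.8):
    flags = set()
    for seen in bill:
        for other in code_counts:
            if other in bill:
                continue
            support = pair_counts[tuple(sorted((seen, other)))] / code_counts[seen]
            if support >= min_support:
                flags.add(f"{other} usually accompanies {seen} but is missing")
    return sorted(flags)

history = [{"SURGERY", "ANESTHESIA", "RECOVERY"}] * 9 + [{"SURGERY", "ANESTHESIA"}]
pairs, codes = learn_cooccurrence(history)
print(rules_check({"ANESTHESIA"}))                             # hand-written rule fires
print(pattern_check({"SURGERY", "ANESTHESIA"}, pairs, codes))  # learned pattern fires
```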
Another emerging trend to watch: The linking of a company's valuation with its big data stockpile. "I think there's going to be (more) people who help investors value companies on the basis of big data equity," said Teller. She predicts that a new "science and art" of valuing a company based on the data it has, the data it can attract, and what it can do with that data is going to come to the forefront. "And it's going to be as important as brand equity."

Original Article

'The Hobbit' Creates Big Data Challenge For Moviemaker

When it comes to manipulating massive amounts of digital information, few creative industries can match the data-intensive workloads of movie and TV production. And the advent of cutting-edge motion picture and HDTV technologies, including high frame rate (HFR) 3-D and 4K TV, will generate even more data.
"High-definition cameras and the new wave in which people create effects and try to enthrall the audience have resulted in a pretty big explosion in the amount of content produced per movie," said Jeff Denworth, vice president of marketing for DataDirect Networks (DDN), in a phone interview with InformationWeek.

DDN sells high-performance, scalable storage systems for a variety of industries, including film and post production. "We say that we've been in big data long before the term was even invented," said Denworth, who noted that 2013 will mark the company's 15th year in business.
The amount of data generated by movie and TV productions continues to rise. In 2009, director James Cameron's 3-D sci-fi epic "Avatar" was one of the first movies to generate about a petabyte of information, according to Denworth. (One petabyte equals 1,048,576 gigabytes.)
"Obviously, 3-D films weren’t anything new -- people had been releasing 3-D (movies) for a long time," said Denworth. "But Avatar was the first film to show that you could commercialize 3-D in a big way. So the market really rallied around the concept of 3-D technology."
And that meant a lot more data. "The 3-D effect on filmmaking will create anywhere from 100% to 200% more data per frame," Denworth said.
While the allure of 3-D movies has certainly cooled among moviegoers, filmmakers continue to try new technologies designed to heighten the cinematic experience.
For instance, "The Hobbit: An Unexpected Journey," which is currently in theaters, was shot in a new digital format called High Frame Rate 3-D (HFR 3-D). The format shows the movie at 48 frames per second (fps), twice the standard 24-fps rate that's been in place for more than 80 years. (Only select theaters with HFR equipment can show "The Hobbit" in HFR 3-D.) For viewers, HFR delivers a more realistic, immersive experience, according to "Hobbit" director Peter Jackson.
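To see why doubling the frame rate and capturing two eyes multiplies the data so quickly, here is a back-of-envelope sketch of uncompressed footage size. The resolution, bit depth and running time are assumptions for illustration, not figures from DDN or the article.

```python
# Back-of-envelope sketch of how frame rate and stereo 3-D multiply raw footage size.
# Resolution (2K), 16-bit RGB colour and a two-hour running time are assumptions for
# illustration, not figures from DDN or the article.

def raw_gigabytes(minutes, fps, width=2048, height=1080, bytes_per_pixel=6, eyes=1):
    """Uncompressed size of the captured footage, in gigabytes."""
    frames = minutes * 60 * fps
    return frames * width * height * bytes_per_pixel * eyes / 1024**3

two_hours = 120
flat_24   = raw_gigabytes(two_hours, fps=24)            # conventional 2-D at 24 fps
hfr_3d_48 = raw_gigabytes(two_hours, fps=48, eyes=2)    # stereo 3-D at 48 fps

print(f"2-D, 24 fps : {flat_24:,.0f} GB")
print(f"3-D, 48 fps : {hfr_3d_48:,.0f} GB ({hfr_3d_48 / flat_24:.0f}x the baseline, before effects and extra takes)")
```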
"The thing around 'The Hobbit' is that we're now crossing over a new threshold into the next wave of filmmaking, which will ultimately create even more data in the film-production process," said Denworth.
And 48-fps may be just the beginning. Filmmaker Cameron, for instance, reportedly plans to use 60-fps cameras for his upcoming "Avatar" sequels. On the TV side of things, television manufacturers are starting to beat the drum for 4K TV -- or nearly four times the resolution of 1080p -- as the next big thing in home entertainment.
"Within the last five years, we've seen something in the order of an 8x increase in the amount of content being generated per every two-hour cinematic piece," Denworth said.
As every data aficionado knows, the term "big data" isn’t defined by volume alone. The velocity and variety of data are equally important, particularly when you're managing massive amounts of information.
"The data sizes are growing, but an 8x increase in (content) can't result in an 8x increase in the amount of time it takes to create a movie," said Denworth. As a result, filmmakers will need extremely high-frame-rate processing that grows with their data sizes.
"The commercial success of 'The Hobbit' could certainly wake up the whole industry to this new way of filmmaking, and that will create a lot of havoc for people who have lesser-capable storage platforms," added Denworth.

Original article

Why do we need Business Intelligence?

It’s a classic question that has a classic answer – companies need to translate data into information in order to make strategic business decisions.
Companies continuously create data whether they store it in flat files, spreadsheets or databases.  This data is extremely valuable to your company.  It’s more than just a record of what was sold yesterday, last week or last month.  It should be used to look at sales trends in order to plan marketing campaigns or to decide what resources to allocate to specific sales teams.  It should be used to analyse market trends to ensure that your products are viable in today’s marketplace.  It should be used to plan for future expansion of your business.  It should be used to analyse customer behaviour.  The bottom line is that your data should be used to maximize revenue and increase profit.
All companies produce reports from the data they collect from their business activities.  Every manager has a manager who needs reports unless you’re the CEO in which case you just need reports.
Some important questions you need to ask are:
1. How many resources (i.e. people, time, dollars) does it take to produce these reports?
2. Am I sure that the data in these reports is accurate?
3. Am I concerned with the security of these reports?
4. Am I receiving these reports in a timely manner?
If the answers to these questions are “too many,” “no,” “yes,” and “no,” then you need a Business Intelligence solution.
IT is usually the first to begin the process of creating a report.  They need to extract the required data and pass it to the person creating the report.  That person then has to spend time manipulating the data to create the required report.  This process can take many hours, even days, of effort.  And it needs to be carried out for each and every report that the company requires.
Business Intelligence solutions automate the process of extracting data and producing reports thereby eliminating all of the manual effort of IT and the people creating the reports from raw data.
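As a minimal sketch of that automation idea, the snippet below pulls figures straight from a database and renders a simple report on demand, rather than hand-copying extracts into spreadsheets. The database, table and column names are invented examples; a real BI deployment would sit on a cleansed warehouse or data mart, as described below.

```python
# A minimal sketch of that automation idea: pull the figures straight from the
# database and render the report on demand, instead of hand-copying extracts into
# spreadsheets. The database, table and column names here are invented examples.
import sqlite3

def weekly_sales_report(db_path="sales.db"):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        """
        SELECT region, SUM(amount) AS revenue
        FROM sales
        WHERE sale_date >= date('now', '-7 days')
        GROUP BY region
        ORDER BY revenue DESC
        """
    ).fetchall()
    conn.close()
    print(f"{'Region':<15}{'Revenue':>12}")
    for region, revenue in rows:
        print(f"{region:<15}{revenue:>12,.2f}")

if __name__ == "__main__":
    weekly_sales_report()
```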
A number of studies have been conducted on spreadsheet errors.  An often-cited report, What We Know About Spreadsheet Errors by Raymond R. Panko of the University of Hawaii, concludes that “every study that has attempted to measure errors [in spreadsheets] has found them and has found them in abundance”.  If decisions are made based on inaccurate reports then these decisions are more than likely the wrong decisions.  This could lead to disastrous results for the company involved.
A Business Intelligence solution produces reports using data that has been automatically extracted from a cleansed data source (typically a database or data mart) to produce accurate reports.  In order to make important business decisions, for example, as to what new products to carry or what products to drop, it is vital that managers have accurate data in the reports on which they base these decisions.
Data security is a very real problem.  As soon as data is extracted to spreadsheets the potential for abuse is greatly increased.  Spreadsheets can be “lost”, private corporate and sensitive data can be copied onto a number of portable devices, and laptops can be stolen or misplaced.  Cases where private data is made public through negligence occur daily. Think Wiki.
Business Intelligence solutions take advantage of existing security infrastructures to keep private data secure and within the company.  Data within reports is typically presented to employees via the company’s intranet and employees are given access to only the data they require to carry out their specific job functions.
Without a Business Intelligence solution companies may have to resort to dumping vast amounts of data into spreadsheets from their databases.  This in itself is a manual and, in most cases, an extremely time-consuming task.  The spreadsheets then have to be delivered to the person creating the report.  Spreadsheets then have to be consolidated and the data manipulated manually to produce the desired reports.  All this takes time and the data within the reports may be days or weeks old by the time the reports are complete and delivered to the manager.
A Business Intelligence solution provides real-time reports directly to the manager on demand from any location.  The data in these reports is typically as recent as the data in the data source it is being extracted from, which allows the manager to monitor the business in real time.  The manager can then base decisions on what is happening now and not last week or even yesterday.
There is a reason that Business Intelligence continues to show up on CIOs’ priority lists.  The amount of data being stored by companies is growing exponentially and it needs to be managed.  It needs to be secured and distributed efficiently to enable employees to make important, up-to-date business decisions.  CIOs are beginning to understand the realities of this problem and are working to implement Business Intelligence solutions that fit their particular company’s requirements.

Source


30+ free tools for data visualization and analysis

The chart below originally accompanied our story 22 free tools for data visualization and analysis (April 20, 2011). We're updating it as we cover additional tools, including 8 cool tools for data analysis, visualization and presentation (March 27, 2012), Startup offers 1-click data analysis (Aug. 29, 2012), Infogr.am offers quick Web charts (Oct. 16, 2012) and Create simple, free charts with Datawrapper (Nov. 21, 2012). Click through to those articles for full tool reviews.
Features: You can sort the chart by clicking on any column header once to sort in ascending order and a second time to sort in descending order (browser JavaScript required).
Skill levels are represented as numbers from easiest to most difficult to learn and use:
  1. Users who are comfortable with basic spreadsheet tasks
  2. Users who are technically proficient enough not to be frightened off by spending a couple of hours learning a new application
  3. Power users
  4. Users with coding experience or specialized knowledge in a field like GIS or network analysis.

Data visualization and analysis tools


| Tool | Category | Multi-purpose visualization | Mapping | Platform | Skill level | Data stored or processed | Designed for Web publishing? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Data Wrangler | Data cleaning | No | No | Browser | 2 | External server | No |
| Google Refine | Data cleaning | No | No | Browser | 2 | Local | No |
| R Project | Statistical analysis | Yes | With plugin | Linux, Mac OS X, Unix, Windows XP or later | 4 | Local | No |
| Google Fusion Tables | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes |
| Impure | Visualization app/service | Yes | No | Browser | 3 | Varies | Yes |
| Many Eyes | Visualization app/service | Yes | Limited | Browser | 1 | Public external server | Yes |
| Tableau Public | Visualization app/service | Yes | Yes | Windows | 3 | Public external server | Yes |
| VIDI | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes |
| Zoho Reports | Visualization app/service | Yes | No | Browser | 2 | External server | Yes |
| Choosel | Framework | Yes | Yes | Chrome, Firefox, Safari | 4 | Local or external server | Not yet |
| Exhibit | Library | Yes | Yes | Code editor and browser | 4 | Local or external server | Yes |
| Google Chart Tools | Library and visualization app/service | Yes | Yes | Code editor and browser | 2 | Local or external server | Yes |
| JavaScript InfoVis Toolkit | Library | Yes | No | Code editor and browser | 4 | Local or external server | Yes |
| Protovis | Library | Yes | Yes | Code editor and browser | 4 | Local or external server | Yes |
| Quantum GIS (QGIS) | GIS/mapping: Desktop | No | Yes | Linux, Unix, Mac OS X, Windows | 4 | Local | With plugin |
| OpenHeatMap | GIS/mapping: Web | No | Yes | Browser | 1 | External server | Yes |
| OpenLayers | GIS/mapping: Web, Library | No | Yes | Code editor and browser | 4 | Local or external server | Yes |
| OpenStreetMap | GIS/mapping: Web | No | Yes | Browser or desktops running Java | 3 | Local or external server | Yes |
| TimeFlow | Temporal data analysis | No | No | Desktops running Java | 1 | Local | No |
| IBM Word-Cloud Generator | Word clouds | No | No | Desktops running Java | 2 | Local | As image |
| Gephi | Network analysis | No | No | Desktops running Java | 4 | Local | As image |
| NodeXL | Network analysis | No | No | Excel 2007 and 2010 on Windows | 4 | Local | As image |
| CSVKit | CSV file analysis | No | No | Linux, Mac OS X or Linux with Python installed | 3 | Local | No |
| DataTables | Create sortable, searchable tables | No | No | Code editor and browser | 3 | Local or external server | Yes |
| FreeDive | Create sortable, searchable tables | No | No | Browser | 2 | External server | Yes |
| Highcharts* | Library | Yes | No | Code editor and browser | 3 | Local or external server | Yes |
| Mr. Data Converter | Data reformatting | No | No | Browser | 1 | Local or external server | No |
| Panda Project | Create searchable tables | No | No | Browser with Amazon EC2 or Ubuntu Linux | 2 | Local or external server | No |
| PowerPivot | Analysis and charting | Yes | No | Excel 2010 on Windows | 3 | Local | No |
| Weave | Visualization app/service | Yes | Yes | Flash-enabled browsers; Linux server on backend | 4 | Local or external server | Yes |
| Statwing | Visualization app/service | Yes | No | Browser | 1 | External server | Not yet |
| Infogr.am | Visualization app/service | Yes | Limited | Browser | 1 | External server | Yes |
| Datawrapper | Visualization app/service | Yes | No | Browser | 1 | Local or external server | Yes |
*Highcharts is free for non-commercial use and $80 for most single-site-wide licenses.

Startup offers 1-click data analysis

Spreadsheets are a good tool for looking at data, but if you want more robust insight into your information, software like SAS and SPSS can be somewhat daunting for the non-statistically savvy. "There's a huge gap between Excel and the high-end tools," argues Greg Laughlin, whose fledgling startup Statwing hopes to fill part of that space.
In fact, Excel includes a reasonable number of statistical functions -- the issue is more that even many power users don't know how and when to use them. The idea behind Statwing is to provide some basic, automated statistical analysis on data that users upload to the site -- correlations, frequencies, visualizations and so on -- without requiring you to know when, say, to use a chi-squared distribution versus a z-test.
Once you upload (or copy and paste) data to Statwing, you can select different variables to be used in analysis. The site determines what tests to run on the data depending on the characteristics of the factors you pick, such as your data's sample size and whether variables are binary (i.e. "for" and "against") or continuous (such as a range of numbers).
In one demo, data on Congressional SOPA/PIPA positions was matched with campaign donations from both the pro-SOPA/PIPA entertainment industry and anti-SOPA/PIPA tech lobbies. Statwing's analysis showed a "medium clearly significant" correlation between a legislator's support for SOPA/PIPA and the amount of entertainment industry political contributions he or she received (although there was no statistical significance between opposition to SOPA/PIPA and tech industry contributions).
[Image: sample Statwing analysis card]

In the Statwing advanced tab, you can see how the site reaches its conclusions. In the SOPA/PIPA example, the correlation was determined via a ranked T-test, a variation on a statistical test that checks for differences between two groups when their variances -- that is, how much the values are spread out from the group's average -- may be unequal.
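For the statistically curious, here is a minimal sketch of that kind of test: rank-transform a continuous variable (donations) split by a binary one (SOPA/PIPA position) and run a t-test with Welch's correction on the ranks. The donation values are invented for illustration; this is not Statwing's code or data.

```python
# Minimal sketch of that kind of test: rank-transform a continuous variable
# (donations) split by a binary one (SOPA/PIPA position) and run a t-test with
# Welch's correction on the ranks. The donation values are invented for illustration.
import numpy as np
from scipy import stats

supporters = np.array([25_000, 40_000, 12_000, 60_000, 33_000, 48_000])  # donations, $
opponents  = np.array([5_000, 8_000, 15_000, 2_000, 11_000])

ranks = stats.rankdata(np.concatenate([supporters, opponents]))
supporter_ranks, opponent_ranks = ranks[:len(supporters)], ranks[len(supporters):]

t, p = stats.ttest_ind(supporter_ranks, opponent_ranks, equal_var=False)  # Welch on ranks
print(f"ranked t-test: t = {t:.2f}, p = {p:.3f}")
```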
The site's analysis also found a medium-significance relationship between age and support for SOPA/PIPA, with the average age of Congressional supporters almost 6 years higher than that of opponents.
Statwing currently keeps all data and analyses private, but plans in the works will allow users to share links to data, download and export results and eventually embed analyses and data into a Web page. For now, the company consists solely of its two founders: Laughlin, a former consultant and product manager who sought easier data analysis tools, and John Le, an engineer and data scientist. Both are Stanford grads who previously worked at CrowdFlower.
Statwing was built using the Clojure programming language, Laughlin said, for "actual math" and data handling (not using, as I'd assumed, the R Project for Statistical Computing as the statistics engine); some Ruby on Rails for packaging and Web basics; CoffeeScript, which aims to simplify JavaScript syntax; Backbone for organizing front-end JavaScript; and the D3 JavaScript library for visualization. The company just launched from the Y Combinator entrepreneurial incubator program last week.
Just how useful is Statwing? An automated data analysis service in the cloud is certainly no replacement for an in-house data scientist who can mine your mission-critical data. And, I'd be hard pressed to recommend making a multi-million-dollar business decision based on an automated analysis alone -- especially from a site that's still in beta. No automated tool can ask customized questions about the integrity of your data set or raise a red flag when you're jumping the gun from correlation to causation. Nevertheless, Statwing looks like an appealing resource for professionals who want to try taking their data skills up a notch from means, medians and pivot tables in Excel; it's an interesting way to learn at least one approach to statistically analyzing a data set, or perhaps brush up on statistical skills that have gone a little rusty since college.
If you sign up for the public beta, you can currently try and use the site for free. There will be a limited free option in the future, Laughlin said, with such accounts restricted to analyzing and storing just one data set at a time. Paid accounts will likely run anywhere from $20-$30/month to a couple of hundred dollars a month.


Source