Big data involves interplay between different data management
approaches and business intelligence and operational systems, which
makes it imperative that all sources of business data be integrated
efficiently and that organizations be able to easily adapt to new data
types and sources. Our recent big data benchmark research confirmed
that big data storage technologies
continue to follow many approaches, including appliances, Hadoop, and
in-memory and specialized DBMSes. With the variety, velocity and volume
of big data being part of today’s information architecture, and the
potential for big data to be a source to feed other systems, integration
should be a top priority.

Many organizations that have already deployed big data technology now
struggle to access, transform,
transport and load information using conventional technology. Even
replication or migration of data from existing sources can be
troublesome, requiring custom programming and manual processing, which
are always a tax on resources and time. Barriers such as having data
spread across too many applications and systems, which our benchmark
research found in 67 percent of organizations, do not go away just because an
organization is using big data technology; in fact, they get more
complicated. However, big data also creates opportunities to use
information to innovate and to improve business processes. To avoid the
risks and take advantage of the opportunities, organizations need
efficient processes and effective technology that make information
drawn from big data available to all people who need it.

Organizations need integration technology flexible enough to handle big data
regardless of whether it originates in the enterprise or across the
Internet. For this reason, tools for big data integration must be able
to work with a range of underlying architectures and data technologies,
including appliances, flat files, Hadoop, in-memory computing and
conventional databases, and move data seamlessly between relational and
non-relational structures. They must be able to adapt to events or
streams of data, and they must harvest data from transactional systems
and business applications in enterprise data warehouses. Big data
integration must also support data quality and master data management
needs.
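
As one illustration of moving data between non-relational and relational
structures, the following minimal Python sketch flattens a nested JSON
document into two relational tables. The record layout, the table and
column names and the use of an in-memory SQLite database as the target are
all assumptions chosen for illustration; they are not drawn from our
research or from any particular integration product.

```python
import json
import sqlite3

# Hypothetical non-relational record, such as one document from a document
# store or one line of a JSON log file. All field names are illustrative.
RAW_DOCUMENT = """
{"customer_id": 42, "name": "Acme Corp",
 "orders": [{"order_id": "A-1", "total": 120.50},
            {"order_id": "A-2", "total": 75.00}]}
"""


def flatten(document):
    """Split one nested document into a parent row and a list of child rows."""
    customer_row = (document["customer_id"], document["name"])
    order_rows = [
        (order["order_id"], document["customer_id"], order["total"])
        for order in document.get("orders", [])
    ]
    return customer_row, order_rows


def load(connection, documents):
    """Create the two target tables and insert the flattened rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id TEXT PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    for document in documents:
        customer_row, order_rows = flatten(document)
        connection.execute("INSERT INTO customers VALUES (?, ?)", customer_row)
        connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", order_rows)
    connection.commit()


# An in-memory SQLite database stands in for the relational target.
connection = sqlite3.connect(":memory:")
load(connection, [json.loads(RAW_DOCUMENT)])

# Once loaded, the nested orders can be queried with ordinary SQL.
order_count, order_total = connection.execute(
    "SELECT COUNT(*), SUM(total) FROM orders WHERE customer_id = 42"
).fetchone()
print(order_count, order_total)
```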

Selecting the right approach to big data integration is difficult when
organizations lack knowledge of the
functional requirements and best practices relevant to their industries,
lines of business and IT. Deficiencies in existing software and data
environments can further complicate the ability to choose wisely and so
should be factored into the deployment decision-making process.
Organizations must identify the types of integration being used or under
consideration to handle data other than that formatted for relational
databases, and evaluate processing capabilities and techniques to handle
the proliferation of big data. IT professionals therefore must
understand how to work with analysts and business management to deliver
timely, benefit-based big data deployments.

IT should evaluate whether it can use existing skills to shorten the time
it takes to get big data to users. Our research found lack of resources
to be the top barrier to using innovative technology, cited by 51 percent
of organizations, so businesses should make sure their IT staff makes the
most of internal skills and resources and does not waste them on custom,
manual silos of effort.
Having the right data integration processes and data management methods
can help IT work more efficiently and partner better with the business
units.

Not having a dialogue about what information management competencies a
business needs is a mistake. Most content I have seen from IT industry
analyst firms deals with just a portion of the big data picture,
discussing, for example, only the technologies for storing and accessing
data, with a fixation on variety, velocity and volume.
However, decision-makers must consider the efficient flow of data across
its entire path of travel, from its origins to user systems, to ensure
the effective functioning of any big data project. Failure to do that
means failing to optimize information across its life cycle for business
value. Without the ability to see the entire big data value chain, a
business may find that its initiatives exceed their cost and time limits
and damage a business case built on time-to-value metrics.
According to our research, the most important benefits of big data
technologies include retaining and analyzing more data (74 percent) and
increasing the speed of analysis (70 percent). Organizations need to make
sure they do not increase the number of manual processes they run or the
time spent on them, which would impair the value of big data.

We have begun research to assess the latest big data integration technologies
and best practices to help advance these efforts, as we outlined in our
research agenda on big data and information optimization for 2013.
We will document emerging best practices in big data integration to
meet business needs, from basic access and replication to
transformational migration. Until we can share our results, be sure to
consider big data integration as part of your business case and project,
because it is essential to gaining the most value from your big data
investments.