Big data is commonly described in terms of the three Vs: Volume, Variety and Velocity. The three Vs sum up the characteristics of big data and convey that it is heterogeneous, noisy, dynamic, inter-related and not trustworthy. Companies now strive to convert the three Vs into Big Promises, and Big Data’s promise can be summarized by three new descriptive terms: Veracity, Value and Victory.
1. Three Vs of Big Promise - Veracity, Value and Victory
Just as the three Vs of big data are related to one another (volume builds on both variety and velocity), the three Vs of Big Promise also have an internal relationship. The Veracity mined from big data, based on volume and variety, determines the Value of big data. The Value in turn determines the Victory, provided a business applies it appropriately and in a timely manner. The higher the Veracity mined from raw data, the more valuable the result, the smarter the decisions a business can make, and the more successful the business will become. All of this leads to a big Victory for the business.
While much around big data remains hype, many companies are in the fledgling stages of drawing value from their big data corpus. Despite an army of discussions and opinions around the topic, it’s still hard to find a clear roadmap to arrive at the Big Promise.
Step 1: Big Data Collection – Gathering Organic Material
Regardless of where you are in the journey, it has to start with understanding the nature of big data as defined by the three Vs. Some voices add more dimensions to big data, such as value and veracity. However, I do not think these are characteristics of big data in its raw form. Instead, I define them as two characteristics of the big data promise.
Step 2: Big Data Analytics – Gleaning Big Insight
The core technologies are the big data platform and big data analytics. The big data platform provides the power of speedy processing, handling millions of records per second. It harnesses integrated technologies for transforming organic (raw) content into designed content, such as Natural Language Processing (NLP), data cleansing, transformation (ETL) and filtering methods. The goal is to transform semi-structured or unstructured data into a structured format for easier understanding, analysis and visualization.
The world of analytics uses many different terms, such as text analytics, social media analytics, customer analytics, social network analytics, business analytics and sentiment analytics. On closer thought, however, analytics can be grouped functionally into three categories: Descriptive Analytics, Relationship Analytics and Prescriptive Analytics. Each is explained below.
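As a small illustration of the raw-to-structured step, here is a minimal sketch in Python. The line format, the field names and the `to_record` and `clean` helpers are all assumptions made up for this example; a real platform would apply the same idea at far larger scale.

```python
import re

# Hypothetical raw lines; this pipe-delimited format is an assumption.
RAW_LINES = [
    "2014-03-01 | user=alice | amount= 12.50 ",
    "2014-03-01 | user=BOB | amount=7",
    "malformed line without fields",
    "2014-03-02 | user=alice | amount=n/a",
]

LINE_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s*\|\s*user=(\w+)\s*\|\s*amount=\s*(\S+)\s*$"
)


def to_record(line):
    """Parse one raw line into a structured dict, or return None."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None  # filtering: the line does not fit the expected shape
    date, user, amount = match.groups()
    try:
        value = float(amount)
    except ValueError:
        return None  # cleansing: drop records with an unusable amount
    # transformation: normalize the user name to lower case
    return {"date": date, "user": user.lower(), "amount": value}


def clean(lines):
    """Keep only the lines that could be turned into structured records."""
    parsed = (to_record(line) for line in lines)
    return [record for record in parsed if record is not None]


records = clean(RAW_LINES)
for record in records:
    print(record)
```

Only the two well-formed lines survive; the malformed line and the one with a non-numeric amount are filtered out rather than guessed at.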
1. Descriptive Analytics
Once organic data have been transformed into designed data in the data processing phase, the first kind of analytics is descriptive, or exploratory. This phase uses simple statistics to get a general understanding of the data: data properties such as dimensions and field types, and statistical profiles or summaries such as the number of records, missing values, the maximum, minimum and median of a field, the distribution of field values, and so on. The exploratory analysis gives us initial knowledge about the raw content without digging any deeper into internal relationships, and it can suggest the right strategies for deeper analysis. This phase can be done on a randomly sampled dataset with simple tools like an Excel spreadsheet and visualized with basic chart types such as bar charts, pie charts and scatter plots. The characteristics of descriptive analytics are:
- Autonomy: the analysis is based on individual fields and their values. Each field is treated independently, without considering any connections between different fields and contents.
- Shallow and straightforward: the result of the analysis is usually basic statistics, such as word count frequencies, or the number and percentage of employees earning about 5k within a certain geographic area.
- Simple and easy to understand: because the method is basic statistical profiling without any extra effort involved, the result is also simple and easy to understand and visualize.
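The kind of statistical profile described above can be sketched with nothing more than the Python standard library. The `employees` sample and the helper names `profile_numeric` and `share_by_value` are hypothetical and only serve the illustration.

```python
import statistics
from collections import Counter

# Hypothetical sample of structured records; None marks a missing value.
employees = [
    {"region": "east", "salary": 4200},
    {"region": "east", "salary": 5600},
    {"region": "west", "salary": None},
    {"region": "west", "salary": 6100},
    {"region": "east", "salary": 5000},
]


def profile_numeric(rows, field):
    """Summarize one numeric field on its own, ignoring all other fields."""
    values = [row[field] for row in rows if row[field] is not None]
    return {
        "count": len(rows),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


def share_by_value(rows, field):
    """Return the fraction of records for each distinct value of a field."""
    counts = Counter(row[field] for row in rows)
    return {value: n / len(rows) for value, n in counts.items()}


salary_profile = profile_numeric(employees, "salary")
region_share = share_by_value(employees, "region")
print(salary_profile)
print(region_share)
```

Each function looks at a single field in isolation, which is exactly the "autonomy" property: no relationship between region and salary is examined here.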
2. Relationship Analytics
This level of analytics aims to dig out the valuable insight embedded in big data. Compared with descriptive analytics, the analysis is deeper. To succeed at this level, it requires a rich set of mining algorithms and methods, such as advanced statistics, sophisticated machine learning, interdisciplinary studies, and meta or scalable algorithms. The process involved is usually complicated and demanding in terms of both speed and volume.
I call the analytics at this level relationship analytics because its primary goal is to find connections among data elements. The connection may be time based, like a sequential dependency; geolocation based; functional category based, like the relationship between products and customer purchasing patterns; or transaction based, like market basket analysis.
The methods used at this level include the following:
- Inferential or association analysis draws insight from data through random processes that are developed with statistical methods. Inference depends on choosing the right population and sampling it randomly. For example, children of parents who are shorter than the average adult tend to grow taller than their parents. In basket analysis, mining millions of transactions shows that some items have a higher probability of being bought together by customers, like coffee and coffee creamer. Some of these conclusions are easy to understand and match common sense; however, the highest value comes from conclusions that go against people’s common sense or wrong assumptions.
- Model-based analysis uses a model developed in advance from known, observed data to infer or predict what will happen in the future. Two sub-categories are commonly known: classification and predictive modeling. When the target variable is categorical, the method is called classification; when it is a numerical or continuous variable, it is called predictive modeling. Both methods need a well-labeled training dataset and a test dataset drawn from the same population as the training dataset. The analysis has two phases: first a model is built with the training dataset, then it is evaluated with the test dataset to measure its performance. Once the model is developed, it is used to predict future events or target variables from the independent variables. For example, a linear regression model can be built from the factors that affected sales over the last three months and then used to predict next month’s sales; a decision tree model can be built to predict whether a specific Twitter message is positive or negative. Sometimes classification and predictive methods overlap, depending on the business application.
- Segmentation dynamically groups data into different clusters based on a predefined measurement, such as a distance metric. Unlike classification or predictive modeling, it does not need training data or test data. For example, an algorithm can be used to dynamically group similar Twitter messages into different clusters.
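To make the basket analysis idea concrete, here is a minimal sketch that counts how often pairs of items appear in the same transaction and derives two common association measures, support and confidence. The `transactions` data and the function names are hypothetical, and a real analysis over millions of transactions would normally rely on a dedicated algorithm such as Apriori rather than this brute-force counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is the content of one basket.
transactions = [
    {"coffee", "creamer", "sugar"},
    {"coffee", "creamer"},
    {"coffee", "bread"},
    {"bread", "butter"},
    {"coffee", "creamer", "bread"},
]


def pair_counts(baskets):
    """Count in how many baskets each unordered pair of items co-occurs."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical (alphabetical) form
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def support(item_a, item_b, baskets, counts):
    """Fraction of all baskets that contain both items."""
    pair = tuple(sorted((item_a, item_b)))
    return counts[pair] / len(baskets)


def confidence(antecedent, consequent, baskets, counts):
    """Among baskets with the antecedent, the fraction that also hold the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    with_antecedent = sum(1 for basket in baskets if antecedent in basket)
    return counts[pair] / with_antecedent


counts = pair_counts(transactions)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
print(support("coffee", "creamer", transactions, counts))
print(confidence("coffee", "creamer", transactions, counts))
```

Unlike the descriptive profile, this analysis is about the connection between items: the result says nothing about coffee or creamer alone, only about how they occur together.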
3. Prescriptive Analytics
Prescriptive analysis is actually a business decision based on the conclusions or results drawn from relationship analysis. For a given situation, what is the best action to take so that we gain the expected result in the future? Suppose a patient goes to see a doctor. First, the doctor performs descriptive analysis, a fact-finding phase, to understand what happened to the patient along with related factors such as daily activities, workload and nutrition. Next, the doctor performs relationship analysis to find the possible factors that made the patient sick. Finally, the doctor gives the patient a prescription, such as medicines to take, so that the patient can get well.
Step 3: Reap Big Promise