Over the last 20 years, there has been plenty of hype about Big Data. At the Big Data Innovation Summit in San Jose, CA last year, I was surprised to find that although most sophisticated, leading-edge companies are interested in big data, many have yet to see the payoff they expected, and few have been able to tap into the power they intuitively feel is still at their fingertips.
Drawing conclusions and generating insight from a vast sea of Big Data can be daunting, even for the most senior data scientist.
Gartner describes Big Data as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”
So what is a cost-effective, innovative form of information processing? You could call it the elusive Algorithm of Desire: the most lightweight, most powerful, most elegant way of solving your business problem.
We know Big Data comes from the information a company collects as it does business. Each team in the business has its own set of tools and its own pieces of information it gathers to work effectively day to day.
For example, Salesforce has many fields, forms, and objects you can create to enhance your sales team’s knowledge of and relationship to customers. Your product will have usage analytics of some form to capture events and performance metrics, usually from a system like Mixpanel, Segment, or Tealeaf. Marketing will use marketing automation, like Marketo or Eloqua, and may have another tool to send transactional email, like Mandrill. IT may use a bug-tracking system, such as JIRA. Customer success may have a ticketing system like Zendesk.
All of these systems contain a multitude of data and signals, all of which can be collected, analyzed, and interpreted for insights.
So how do you do that? Well, you must find the pattern — the relationship between the metrics, the answer hidden in the math behind the sea of values. And what is a metric, but a quantified measure of a specific business operation — a relation between the operation and success.
So the challenge at hand is to take these massive data sets from multiple sources and make sense of them in a way that is meaningful, impactful, and accurate, and then to validate that the relationships you find still hold.
And I believe this is possible through understanding and applying relationships.
In mathematics, a relation is a set of ordered pairs and describes the way things can be connected. Relations can have properties such as being reflexive, symmetric, or transitive.
One of the best-known types of relation is the equivalence relation, which is reflexive, symmetric, and transitive.
So how does this apply to relationships in business operations data between disconnected systems?
In a set of business operations data, if you can find equality between object field values, then you can link the two records as having that equivalence relation, and therefore traverse, manipulate, and make inferences from them.
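As a minimal sketch of that idea, the snippet below links records from two hypothetical, disconnected systems (a CRM and a ticketing tool) by equality of a shared field. All system names, records, and field names here are illustrative assumptions, not taken from any real product API:

```python
# Hypothetical records from two disconnected systems, sharing an "email" field.
crm_contacts = [
    {"email": "ana@example.com", "deal_stage": "closed-won"},
    {"email": "bo@example.com", "deal_stage": "negotiation"},
]
support_tickets = [
    {"email": "ana@example.com", "open_tickets": 0},
    {"email": "bo@example.com", "open_tickets": 3},
]

def link_by_equality(left, right, key):
    """Join records whose `key` field values are equal -- the equality
    relation that lets us traverse across the two systems."""
    index = {rec[key]: rec for rec in right}
    return [
        {**l, **index[l[key]]}   # merge the matching record pair
        for l in left
        if l[key] in index
    ]

linked = link_by_equality(crm_contacts, support_tickets, "email")
# Each linked record now carries fields from both systems, e.g.
# {"email": "bo@example.com", "deal_stage": "negotiation", "open_tickets": 3}
```

This is just an equi-join; the point is that once two systems agree on the value of one field, every other field in both records becomes reachable from either side.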
When I was working as an analyst in travel, our biggest Big Data challenge was breadth: the size of the legacy systems, the internal services and teams required to manage them, and the data scattered all over the place.
We were drowning in information. You could never consolidate the data into One Source of Truth. But there were many attempts. Every team that decided they were going to get the edge from Big Data would kick off a project to make their tool the source of truth, and funnel all relevant systems data into theirs for “True” analytics.
Many problems exist with this approach. First, you have to choose which systems you are going to collect from, because every collection and connection means development and/or configuration, which in turn equates to cost. So you select the systems that seem most relevant to your business problems; for us, it was booking management systems and team tools.
Next, you realize that the amount of data you can collect is at once massive, duplicative, and incomplete. So you pick the slice of that data you think is important, which may also mean removing or transforming entries, whether because of their type or their value.
Now, from a data validity standpoint, you have to admit that the data is no longer the same. It is data you have massaged, changed, altered, manipulated. Can you truly say you did not alter it in a way that skews it toward your ultimate goals? Can you really say you were objective in how you filtered, normalized, or sampled? And from that standpoint, can you honestly believe that the answer or insight you plan to draw from the data is accurate? Is it valid? Is it precise?
Maybe not, but seasoned professionals know that it is possible to get precise “enough.” That makes it valid. That equates to accurate. There is an entire branch of mathematics built on close-enough estimates: statistics.
Earlier on in my career I worked for a behavioral assessment firm. We used massive sets of data from behavioral surveys to tell an employer which candidate would be the best for an open position.
To do that, you had to crunch all of the equations for traits defined by the company’s psychology PhDs and find the relationship between them, represented as a fit score. Each candidate’s performance “DNA” was compared to the DNA of the highest-performing existing employees, and those with the highest fit scores were presented as the suggested hires. The ROI was very high, because companies could show time and time again that the relationship between behavior and performance was quantifiable.
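The firm’s actual scoring model was proprietary and defined by its psychologists, but the general shape of a fit score can be sketched as a similarity between trait vectors. Below is one assumed formulation (cosine similarity); the trait values and candidate names are invented for illustration:

```python
import math

def fit_score(candidate, benchmark):
    """Cosine similarity between a candidate's trait vector and the
    benchmark 'DNA' of top performers (closer to 1.0 = better fit)."""
    dot = sum(c * b for c, b in zip(candidate, benchmark))
    norm = (math.sqrt(sum(c * c for c in candidate))
            * math.sqrt(sum(b * b for b in benchmark)))
    return dot / norm if norm else 0.0

# Illustrative trait averages of the highest-performing employees.
top_performer_dna = [0.8, 0.6, 0.9]

# Hypothetical candidates with trait vectors from a behavioral survey.
candidates = {
    "A": [0.7, 0.5, 0.8],
    "B": [0.1, 0.9, 0.2],
}

# Rank candidates by fit score, highest first.
ranked = sorted(candidates,
                key=lambda name: fit_score(candidates[name], top_performer_dna),
                reverse=True)
```

Candidate A’s vector points in nearly the same direction as the benchmark, so A ranks above B, mirroring how the highest-fit candidates were surfaced as suggested hires.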
Now as a Product Manager in Analytics, I spend my time helping customers use the data they have at their disposal to uncover unknown problems, and debug their hardest known problems. Understanding the relationship between behavior and revenue is an ever-increasing superpower in Analytics.
Usermind’s powerful system can mine through all of these data collections, jumping across the separations with ease. You can automate to ensure that your preferred source of truth has the most up-to-date information, as early as real-time for some applications.
You can respond more rapidly and efficiently, and you can do so without constantly having to clean up or massage the data. You can easily maintain your separate systems while also aggregating their data in one place.
With our intelligent storage system, you can find those hidden relationships between your systems. You don’t need business analysts on hand who must ramp up on each system and stay long enough to impact your bottom line. With Usermind, you can find equality relationships across your systems and build those relationships synthetically within our pipeline.