Predictive Analytics: How to Forecast the Future
One of the most popular applications of Big Data is predictive analytics. Far from being the latest business buzzword, predictive analytics is a set of techniques that have become fundamental to the business strategies of many household-name firms, such as Netflix, Google, and Amazon. These firms, and many others, dominate their respective markets, due in large part to their extensive use of predictive analytics.
Predictive analytics is a form of business intelligence gathering, the strategic business use of which is powerful enough to upend an industry. Driven by the tremendous revenue-generating potential of predictive analytics, more firms are investing in the necessary infrastructure, such as data storage and processing hardware and software, as well as in database administrators and data analysts. As they do so, predictive analytics tools and techniques grow in sophistication and refinement. Moreover, as more firms adopt predictive analytics and incorporate it into their existing strategies, they fuel its wider adoption, because competitors must adopt it or risk losing significant market share.
In this article, we will cover 1) the definition of predictive analytics; 2) data analysis; 3) the types of predictive analytics; 4) using predictive analytics; 5) the benefits of predictive analytics; 6) the risks of predictive analytics; and 7) a real-life example of a firm using predictive analytics.
WHAT IS PREDICTIVE ANALYTICS?
Predictive analytics is an assortment of statistical and mathematical techniques used to predict the probability of future events occurring. Fundamentally, statisticians and data scientists combine and standardize a variety of historical datasets to develop correlative statistical models that firms, research organizations, and even governments use to forecast a wide range of phenomena.
The field’s origins lie in the beginnings of the computer age in the 1940s, specifically with Allied governments’ use of computational models during World War II. Notable examples include the development of the Kerrison Predictor in 1940, which automated anti-aircraft weapon targeting, and the use of computer simulations by the Manhattan Project to determine the probable results of nuclear chain reactions in 1944.
Just as computers and computing technology have grown exponentially since then, so too has the field of predictive analytics. In 2012, technology users generated 2.5 exabytes of data per day, an estimated three-quarters of which was text, audio, or video messages. That’s a lot of data for firms to leverage, and with data storage prices and space requirements having shrunk exponentially since the 1940s (indeed, even over the past decade), the adoption of predictive analytics is an increasingly cost-effective proposition, if not exactly a simple one.
DATA ANALYSIS
Whether it develops the necessary infrastructure in-house or outsources its business intelligence gathering, a firm must determine what questions it will use predictive analytics to answer. Predictive analytics, whether done externally or internally, is costly in terms of time and labor, as the answers to these questions are the result of intensive research involving multiple datasets with many variables.
It is important for data scientists to be able to link and visualize datasets in order to interpret them better. While computers have gotten faster and better at processing vast amounts of data, human insights lie at the root of the answers to Big Data questions. It is also important to understand that the answers predictive analytics provides are, for the most part, correlative, not causative, by nature. This means that data scientists are estimating the probability of an event based on how often the event has happened under similar conditions. A failure to understand the deeper underlying reasons (the causes) of the event can lead to inaccurate predictions.
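To make the correlation-versus-causation point concrete, here is a minimal Python sketch that computes a Pearson correlation coefficient by hand. The weekly figures are made up for illustration: both sales series rise with temperature, so they correlate strongly with each other even though neither causes the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly data: temperature is the hidden common driver.
temperature = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [120, 150, 185, 210, 250, 275]
sunscreen_sales = [40, 55, 63, 80, 92, 110]

# Strong correlation between the two sales series, yet promoting ice cream
# would not be expected to sell more sunscreen.
r = pearson(ice_cream_sales, sunscreen_sales)
print(f"correlation between ice cream and sunscreen sales: {r:.3f}")
```

A model built only on this correlation would mispredict as soon as the underlying cause (here, the weather) stopped moving both series together.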
TYPES OF PREDICTIVE ANALYTICS
There are several types of predictive analytics methods, including predictive modeling, decision analysis and optimization, transaction profiling, and predictive search.
Predictive modeling
When most laypeople discuss predictive analytics, they usually mean predictive modeling. Indeed, predictive modeling is at the heart of predictive analytics, and has been popularized in science fiction as well as by the financial services industry.
Predictive modeling involves mathematically modeling associations between variables in historical data in order to predict or forecast the likelihood of a future event. Commonly used in the financial services industry to predict the behavior of capital markets, predictive modeling is increasingly being used for sales and revenue forecasting, dynamic pricing, online recommendation systems, strategic planning, and other business areas requiring decision-making about the future.
Predictive modeling yields the probabilities of event occurrences based on previous event occurrences; as such, there is no guarantee that a desired event will occur (or, conversely, that an undesired event will fail to occur). Understanding this can reduce overreliance on the models.
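As a rough illustration of estimating event probabilities from previous occurrences, the Python sketch below computes a purchase probability per customer segment from historical outcomes. The segment names, the sample history, and the use of Laplace smoothing are assumptions made for this example; real predictive models are usually far richer.

```python
from collections import defaultdict

def purchase_probabilities(history, smoothing=1.0):
    """Estimate P(purchase | segment) from past outcomes.

    history: iterable of (segment, purchased) pairs, where purchased is a bool.
    Laplace smoothing keeps estimates away from exactly 0 or 1, which
    reflects that the output is a probability, not a guarantee.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [purchases, total]
    for segment, purchased in history:
        counts[segment][0] += int(purchased)
        counts[segment][1] += 1
    return {
        segment: (bought + smoothing) / (total + 2 * smoothing)
        for segment, (bought, total) in counts.items()
    }

# Hypothetical historical records.
history = [
    ("returning", True), ("returning", True), ("returning", False),
    ("new", False), ("new", False), ("new", True), ("new", False),
]

probs = purchase_probabilities(history)
for segment, p in sorted(probs.items()):
    print(f"{segment}: {p:.2f}")
```

Even the higher-probability segment here is well short of certainty, which is the point of the caution above.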
Decision analysis and optimization
Decision analysis and optimization is a subfield of predictive analytics that deals with reducing the uncertainty inherent in decision-making. Specifically, it involves analyzing aspects of a decision, or of multiple decisions, to determine the one likely to yield the most success. Firms often use decision analysis and optimization in functional areas such as supply chain management to ensure that their decisions maximize revenue and help them meet or exceed other key performance goals.
For example, a distribution chain optimization problem might involve determining the ideal mix of online and brick-and-mortar retailers to use to achieve a target revenue goal. Using SAS Analytics, IBM SPSS Modeler, another popular predictive modeling application suite, or internal proprietary software, a data scientist can import multiple datasets (such as historical wholesale prices, local and online retailers, distribution costs by distribution method, and more), build models, and test and retest results.
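The channel-mix problem above can be sketched as a small grid search in Python. The revenue model is purely hypothetical: it assumes each channel's revenue grows with the square root of its share of distribution (diminishing returns), and the scale values are invented. A real study would fit such a model to historical data in a dedicated tool.

```python
from math import sqrt

def best_channel_mix(online_scale, store_scale, steps=100):
    """Grid-search the online share of distribution that maximizes revenue.

    Assumes (hypothetically) that each channel's revenue is proportional
    to the square root of its share, i.e. diminishing returns per channel.
    """
    best_share, best_revenue = 0.0, float("-inf")
    for i in range(steps + 1):
        share = i / steps  # fraction of distribution sent through online retailers
        revenue = online_scale * sqrt(share) + store_scale * sqrt(1 - share)
        if revenue > best_revenue:
            best_share, best_revenue = share, revenue
    return best_share, best_revenue

# Hypothetical scale parameters for the two channels.
share, revenue = best_channel_mix(online_scale=400.0, store_scale=300.0)
print(f"best online share: {share:.2f}, modeled revenue: {revenue:.1f}")
```

Exhaustive search is used here only because the decision has a single variable; with many interacting decisions, an optimization solver would be the more usual choice.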
Transaction profiling
Transaction profiling involves aggregating and filtering information from transactions involving enterprise software. These can include, but are not limited to, credit card transactions on an online retailer’s website and logins to a proprietary social network; these are often isolated data points. This subfield involves standardizing this data and clustering it with relevant data in ways that allow a firm to create predictive models of transactional data.
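A minimal Python sketch of the aggregation and standardization steps follows. It rolls isolated transaction events up into per-customer profiles and converts one feature to z-scores so that it can be compared or clustered with other features. The field names and sample transactions are hypothetical, and the clustering step itself is left out.

```python
from collections import defaultdict
from statistics import mean, pstdev

def customer_profiles(transactions):
    """Aggregate raw (customer_id, amount) events into per-customer features."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, amount in transactions:
        totals[customer_id] += amount
        counts[customer_id] += 1
    return {cid: {"orders": counts[cid], "spend": totals[cid]} for cid in totals}

def standardize(profiles, feature):
    """Convert one feature to z-scores so features share a common scale."""
    values = [profile[feature] for profile in profiles.values()]
    mu, sigma = mean(values), pstdev(values)
    return {
        cid: (profile[feature] - mu) / sigma if sigma else 0.0
        for cid, profile in profiles.items()
    }

# Hypothetical isolated transaction events.
transactions = [("a", 20.0), ("b", 5.0), ("a", 40.0), ("c", 15.0), ("b", 10.0)]

profiles = customer_profiles(transactions)
spend_z = standardize(profiles, "spend")
print(profiles)
print(spend_z)
```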
Predictive search
Predictive search, fundamentally, involves creating algorithms that take a set of inputs and find a particular output. However, the increasing sophistication and, in some cases, the incompleteness of inputs require algorithms that return the best possible answer.
To illustrate this, consider two co-workers. The first asks the second for a restaurant suggestion for a business lunch. The second can make the recommendation based on their knowledge of the first co-worker’s personal preferences, likes and dislikes, and knowledge of the area. A search engine, hypothetically, has reams of data with which to make a strong recommendation, such as the user’s geographic location and online mentions of personal preferences.
Further, the second co-worker might immediately realize that the first co-worker actually needs a vegetarian restaurant for this particular meeting. Predictive search likewise involves deep dives into multiple datasets to provide you with a personalized output that gets at the underlying reason for your input. Ideally, a search query might “recognize” that the restaurant recommendation is likely for a particular meeting on your online calendar, further “recognize” that the client is a vegetarian, and return restaurants that fit this need. Predictive search developments will harness more and more data in assessing the best possible answer to return.
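The restaurant scenario can be sketched as a simple scoring heuristic in Python. This is an illustration only, not how any real search engine works: the restaurant records, the context keys (`diet`, `max_distance_km`), and the score weights are all invented. The sketch shows how the ranking changes once an inferred need, such as a vegetarian client, is added to otherwise incomplete inputs.

```python
def rank_candidates(candidates, context):
    """Order candidates by a simple score built from whatever context is known."""
    def score(candidate):
        total = candidate["rating"] / 5.0  # baseline: normalized rating
        diet = context.get("diet")
        if diet is not None and diet in candidate["diets"]:
            total += 2.0  # an inferred dietary need outweighs rating
        max_km = context.get("max_distance_km")
        if max_km is not None and candidate["distance_km"] <= max_km:
            total += 1.0  # prefer places within the known travel radius
        return total
    return sorted(candidates, key=score, reverse=True)

# Hypothetical candidate data.
restaurants = [
    {"name": "Grill House", "rating": 4.8, "diets": {"omnivore"}, "distance_km": 0.5},
    {"name": "Green Table", "rating": 4.2, "diets": {"vegetarian", "vegan"}, "distance_km": 1.2},
    {"name": "Far Bistro", "rating": 4.9, "diets": {"vegetarian"}, "distance_km": 9.0},
]

# Query with location only, then with the inferred dietary need added.
plain = rank_candidates(restaurants, {"max_distance_km": 2.0})
informed = rank_candidates(restaurants, {"max_distance_km": 2.0, "diet": "vegetarian"})
print([r["name"] for r in plain])
print([r["name"] for r in informed])
```

Missing context keys are simply skipped, so the function still returns its best available answer when the inputs are incomplete.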
USING PREDICTIVE ANALYTICS
Predictive analytics can be used for a variety of business strategies, and has even given rise to many business models, such as search, search advertising, and recommendation engines. Firms must weigh the costs and benefits of developing the in-house capabilities to do this against those of outsourcing their Big Data needs to a third-party market research firm. Both approaches have time, cost, and labor benefits and drawbacks for any firm; however, with other firms increasingly using predictive analytics, each firm will have to map its Big Data strategy now or in the near future. Once a strategy has been determined, the firm must determine what insights will best inform its strategy and then use predictive analytics to obtain them.
BENEFITS OF PREDICTIVE ANALYTICS
Predictive analytics benefits any decision by providing executives, managers, and other decision-makers with the tools to make the best possible decision. Applications include, but are not limited to, predictions of customer purchasing likelihood for use in targeted marketing and upselling; sales and revenue forecasting; marketing channel, supply chain, distribution chain, and manufacturing optimization; and new product development.
The potential applications of predictive analytics for optimization and forecasting extend well beyond business. Even scientific organizations and governments have begun to invest in the resources necessary to leverage predictive analytics.
RISKS OF PREDICTIVE ANALYTICS
There are several risks to using predictive analytics, though most stem from overreliance on this set of tools. Executives and managers must understand that predictive analytics involves probabilities and correlation, which are not absolute. Data scientists must strive to filter out all of the noise from datasets to ensure accurate and replicable modeling results. They must further strive to present these results as actionable insights with risk parameters for each choice.
Asking the wrong questions
With firms awash in reams of data, it is critical that they ask the right questions. Predictive analytics is most efficient when used to answer a narrow inquiry, such as the likelihood of customer A buying product X at time Y for price Z, rather than the likelihood of customers buying product X (as a layman might ask). Further, data scientists must be able to test assumptions and pivot quickly from erroneous ones. For example, if a question involves the impact of a marketing technique on sales (one favored by the CEO and widely assumed to have a significant impact), and later studies determine that it has no effect, the data scientist must be free to assess the remainder of the question on its merits.
Data scientists must take the general questions that may come from executives and managers and extract the root business need. To fulfill this need, they must use the data to create appropriate recommendations by determining the appropriate datasets, filtering out extraneous information, building models, and testing and retesting them.
Bad data
Data scientists must be aware that not all data is accurate, estimate the share of bad data, and correct for it in their studies. Data can be bad for any number of reasons, including self-reporting errors, corrupted files, poorly phrased questions, incomplete data aggregation, and poor standardization methods.
It is critical that data scientists quickly recognize and filter bad data from their datasets. They must also make sure they do not create bad data themselves, for example through an incorrectly specified transformation function. Further, they must take the time to improve aggregation and standardization methods to limit the collection of bad data. Without reasonably accurate data, data scientists cannot build predictive analytics models whose assumptions will hold.
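A minimal Python sketch of recognizing and filtering bad records is shown below. The validity rules (required fields, a numeric amount within a plausible range), the field names, and the sample records are assumptions for illustration; real checks depend on the dataset. The function also reports the share of bad records, a rough estimate of how much bad data a study must correct for.

```python
def split_bad_records(records, required=("customer_id", "amount"), max_amount=10_000):
    """Separate usable records from bad ones and report the bad-data rate."""
    good, bad = [], []
    for record in records:
        amount = record.get("amount")
        # A record is bad if a required field is absent or None...
        missing = any(record.get(field) is None for field in required)
        # ...or if the amount is non-numeric or outside the plausible range.
        out_of_range = (
            not isinstance(amount, (int, float)) or not 0 <= amount <= max_amount
        )
        (bad if missing or out_of_range else good).append(record)
    rate = len(bad) / len(records) if records else 0.0
    return good, bad, rate

# Hypothetical records with typical defects.
records = [
    {"customer_id": "a", "amount": 25.0},
    {"customer_id": None, "amount": 12.0},   # missing identifier
    {"customer_id": "b", "amount": -5.0},    # impossible value
    {"customer_id": "c"},                    # incomplete aggregation
    {"customer_id": "d", "amount": 80},
]

good, bad, bad_rate = split_bad_records(records)
print(f"kept {len(good)} records, rejected {len(bad)} ({bad_rate:.0%} bad)")
```

Keeping the rejected records, rather than silently dropping them, makes it possible to trace defects back to the aggregation or standardization step that produced them.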
Complexity and unpredictability
Big Data is messy, consisting of everything from social media mentions to traffic camera images to website logs. Predictive analytics, being a set of statistical techniques, requires all data to be standardized and quantified. Quantifying non-numeric data has its own risks and creates uncertainty.
Further, data is unpredictable, especially dynamic data. A model that accurately forecasts future events could be thrown into disarray by a sudden, unanticipated cascade of events that were not part of its initial estimates. Such was the case in 2007, when the majority of financial services firms failed to incorporate the possibility of sudden credit defaults into their models; those defaults triggered a series of other events that, prior to 2007, would have seemed improbable.
Privacy and security
Many privacy advocates find such data usage invasive and alarming. There is something inherently intrusive about firms collecting information about individuals in order to predict their behavior. Advocacy efforts include lobbying for limitations to data collection types, amounts and methods in nations across the globe. Executives and data managers must be aware of the ever-changing Big Data regulatory landscape.
Privacy is a huge concern for another reason – security. Hackers target data storage devices and facilities for financial gain, ideological reasons, and thrills. With many nations holding firms at least partially responsible for the damage caused by loss of secured data, firms must ensure they keep up-to-date with the latest data security measures. If they outsource their data analysis to a business intelligence vendor, they are likewise compelled to ensure that the business intelligence vendor secures the firm’s data appropriately.
REAL-LIFE EXAMPLE: AMAZON
Predictive analytics is a major source of competitive advantage for Amazon, so much so that Amazon has taken market share from many brick-and-mortar retailers across the U.S., and even in other parts of the world. Amazon uses predictive analytics to power the recommendation algorithms that help the retailing giant upsell, as well as to make its distribution system more efficient.
Amazon provides site visitors with product recommendations based on their viewing history. As that viewing history grows, Amazon’s algorithms, using the increased data, create increasingly useful and accurate recommendations. The firm also offers discounted pricing and/or package deals in order to upsell, as well as premium pricing when demand is high and inventory is low.
Beyond Amazon’s on-screen predictive analytics applications, the retailer has begun to ship products in advance of customer orders, based on the results of its predictive models. Amazon filed a patent on a “method and system for anticipatory package shipping” in 2012, designed to increase the efficiency of its distribution chain. By harnessing this method during peak volume periods, such as the holidays, Amazon, whose predictive analytics models have already demonstrated a high probability of accuracy, can ensure that it has the inventory on hand to distribute and that goods are distributed beforehand, minimizing customer dissatisfaction.
Amazon’s use of predictive analytics has been instrumental in its dominance of the online retail space in the U.S., in which it is the market leader as of 2014, with net sales of nearly $60 billion.