Machine Learning and Alternative Data Approach to Investing
Big Data and Machine Learning Overview
Fund managers are increasingly using quantitative methods in their pursuit of alpha and uncorrelated strategies. With the availability of alternative data sources and new Machine Learning tools to analyze them, a new source of competitive advantage is emerging, beyond strategies focused on alternative risk.
In this regard, Big Data has the potential to significantly alter the investment landscape and to further shift the market from a discretionary to a quantitative investment approach.
The Big Data revolution was made possible by three trends:
- The amount of data available has increased exponentially.
- Processing power and data storage have increased while becoming more affordable.
- Improvements in machine learning techniques for complicated dataset analysis.
Figure 1: Factors leading to Big Data Revolution
Over the past ten years, the systematic, large-scale collection, organization, and distribution of data has given rise to the concept of Big Data; see Laney (2001). The designation “Big” denotes three salient qualities:
Volume: The quantity of data gathered and stored through transactions, tables, files, records, etc., is very large, and the subjective lower bound for being called “Big” is continuously being revised upward.
Velocity: The speed at which data is sent or received often marks it as Big Data. Data can be streamed or received in batch mode, and it can arrive in real time or near real time.
Variety: Data is frequently received in a variety of formats, including structured data (such as SQL tables or CSV files), semi-structured data (such as JSON or HTML), and unstructured data (e.g., blog posts or video messages).
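To make the Variety point concrete, the following is a minimal Python sketch of how the same workflow might touch a CSV table, a JSON record, and free text such as a blog post. The tickers, field names, and values are hypothetical and chosen purely for illustration; they do not come from any dataset discussed in this primer.

```python
import csv
import io
import json

# Structured data: fixed columns, and every row has the same fields.
# (Hypothetical tickers and prices, for illustration only.)
csv_text = "ticker,date,close\nABC,2020-01-02,10.5\nXYZ,2020-01-02,20.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
closes = [float(r["close"]) for r in rows]

# Semi-structured data: self-describing, but fields may nest or be missing.
json_text = (
    '{"ticker": "ABC", "mentions": '
    '[{"source": "blog", "score": 0.4}, {"source": "forum"}]}'
)
record = json.loads(json_text)
# .get() returns None when a field is absent, so missing scores are explicit.
scores = [m.get("score") for m in record["mentions"]]

# Free text: no schema at all, so any structure has to be extracted.
post = "ABC announced strong results today"
tokens = post.lower().split()

print(closes)
print(scores)
print(tokens)
```

The CSV rows can be read straight into numeric columns, the JSON record needs handling for absent fields, and the text needs a processing step (here, naive whitespace tokenization) before any analysis is possible.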
Machine Learning (ML)
Machine learning is part of the broader disciplines of computer science and statistics. Its aim is to give computers the ability to learn from their experience in certain tasks, so that the system becomes more effective over time. Machine learning is a model-free (that is, statistical or data-driven) approach to finding patterns in massive amounts of data. Deep learning is based on neural network techniques and is used both for pattern detection in structured data and for the processing of unstructured data (such as photos, speech, sentiment, etc.).
Artificial Intelligence (AI)
Artificial intelligence (AI) is a more general concept: enabling machines to possess cognitive intelligence similar to that of humans (note that in this report we use this term sparingly because of its ambiguous interpretation). In the early stages of artificial intelligence development, many rules and pieces of knowledge were hardcoded into computer memory. Machine learning and, more particularly, deep learning represent the most serious attempt at attaining AI to date. Deep learning has already achieved some outstanding successes in pattern and image recognition, language comprehension and translation, and the automation of difficult tasks such as driving a car.
Big Data Industry
Currently valued at $130 billion, the worldwide market for big data, related technologies, and analytics is predicted to reach $200 billion by 2020. One of the main forces behind this increase is the financial sector, which accounts for 15% of all spending. We estimate Big Data spending in the investment management sector at $2–3 billion, with double-digit annual growth expected (e.g., 10–20 percent, in line with Big Data growth in other sectors). This figure includes spending on dataset acquisition, the development of Big Data technology, and the hiring of qualified personnel.
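As a rough illustration of what these figures imply, the sketch below works through the arithmetic in Python. It assumes that the 15% financial-sector share applies to the $130 billion total, and it uses a hypothetical three-year horizon for the compounding; neither the horizon nor the helper name `project_spend` comes from the source.

```python
def project_spend(spend_bn: float, annual_growth: float, years: int) -> float:
    """Compound a spending figure (in $bn) at a constant annual growth rate."""
    return spend_bn * (1.0 + annual_growth) ** years


# Figures quoted in the text.
total_market_bn = 130.0
financial_share = 0.15

# Assumption: the 15% share is taken of the $130bn total market.
financial_spend_bn = total_market_bn * financial_share

# Hypothetical horizon, chosen only to illustrate compounding.
horizon_years = 3

# Low case: $2bn growing at 10% a year; high case: $3bn growing at 20% a year.
low_bn = project_spend(2.0, 0.10, horizon_years)
high_bn = project_spend(3.0, 0.20, horizon_years)

print(f"Financial sector spending: ${financial_spend_bn:.1f}bn")
print(f"Investment management after {horizon_years} years: "
      f"${low_bn:.2f}bn to ${high_bn:.2f}bn")
```

Under these assumptions the financial sector would account for roughly $19.5 billion, and the $2–3 billion estimate would grow to roughly $2.7–5.2 billion over three years. The range widens quickly because the low and high cases differ in both starting level and growth rate.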
The market for alternative data providers is currently very fragmented. Our database of alternative data sources includes over 500 specialist data companies. As the Big Data industry develops, we anticipate some degree of consolidation among data providers.
A Snippet of Big Data Analysis History
In this primer by DCPTG, we refer to our data sets as Big/Alternative Data. Big Data refers to very large data sets, such as financial time series like tick-level order book data, and is commonly characterized by the three Vs of volume, velocity, and variety. Alternative data refers to information that has received less attention from market participants but may still be useful in predicting future returns of some financial assets. This information is often, but not always, non-financial. Alternative data is distinguished from traditional data, by which we mean common financial information such as daily market prices, corporate filings, and management reports.