Analysis of Things (AoT)

Big Data & Machine Learning Applied to Brent Crude

Executive Summary

Data: Selecting & Visualising Data
We select historical, monthly, fundamental data.
We check for correlations between variables using multiple techniques.
We check for linear relationships between variables.

Relations: Understanding Important Parameters & their Relations
We reduce the number of parameters to reach a model that balances performance and accuracy.
We deploy recursive algorithms to guide variable selection.

Prediction: The Holy Grail
We plot decision trees to understand the inner workings of the model.
We make a basic prediction of Brent prices through 2016 based on the model.

Data Selection: Visualising and Normalising Data

Data: Fundamental, Monthly, Historical
1. OECD Real Gross Domestic Product
2. Non-OECD Real Gross Domestic Product
3. OPEC Total Spare Crude Oil Production Capacity
4. Crude Oil Production, Saudi Arabia
5. Unplanned crude oil production disruptions, OPEC
6. Unplanned liquid-fuel production disruptions, non-OPEC
7. OECD End-of-period Commercial Crude Oil and Other Liquids Inventory
8. Crude Oil and Liquid Fuels Supply, Total Non-OECD
9. Net Inventory Withdrawals, Total Non-OECD Crude Oil and Other Liquids
10. OECD Petroleum Production
11. Non-OECD Liquid Fuels consumption
12. OECD Liquid Fuels consumption
13. Crude Prices, End of Month
Source: https://www.quandl.com/data/eia

Visualising Pair-wise Relationships

Correlations: Sample Pairs
Distribution, Correlation Ellipses, Means and Loess Smoothing
Loess: Local Polynomial Regression; Perfect Ellipse: Weak Correlation

Correlations: Pearson Coefficient
Viewing the correlations between the variables gives a general sense of what kind of model to expect.

Correlations: Kendall Coefficient
The same correlation chart, computed with another method: Kendall's.
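The difference between the two coefficients can be illustrated with a minimal, standard-library-only sketch. The toy series below are made-up values, not the EIA data, and the function names are illustrative. The Kendall version shown is the simple tau-a form, which assumes no tied values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson coefficient: strength of the *linear* relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / all pairs. Assumes no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data (hypothetical): y rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print("Pearson:", round(pearson(x, y), 4))
print("Kendall:", kendall_tau(x, y))
```

Because Kendall's coefficient only looks at the ordering of pairs, it reaches 1 for any strictly increasing relationship, while Pearson stays below 1 when the relationship is curved. Comparing the two charts is one way to spot monotone but non-linear relationships.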

Variable Selection: Which Variables Give the Optimal Model?

Prioritising Parameters: Which Variables to Keep and Discard
Analysing R-squared: each set of features is analysed to see which ones should make the cut without increasing model complexity and without compromising model quality.
A recursive elimination algorithm is used to select the best possible combination of parameters.

Recursive Elimination
The automated algorithm tells us to use 5 variables for the optimal model.

Var   RMSE    R squared   RMSE SD   R squared SD
1     6.339   0.961       1.747     0.0198
2     6.176   0.963       1.847     0.0207
3     6.213   0.963       1.574     0.0159
4     5.519   0.970       1.655     0.0163
5     5.443   0.971       1.697     0.0161
6     5.391   0.972       1.638     0.0156
7     5.325   0.972       1.638     0.0158
8     5.544   0.970       1.837     0.0194
9     5.702   0.968       1.774     0.0188

Top five variables suggested by the automated algorithm
1. OECD Real GDP
2. Non-OECD Real GDP
3. OECD Inventory
4. Non-OECD Supply
5. Non-OECD LF consumption
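The slides do not say which rule turned the RMSE column into a subset size. The sketch below compares two common rules on the figures from the table: taking the lowest RMSE outright, and taking the smallest subset whose RMSE is within a tolerance of the lowest. The 3% tolerance and the function names are assumptions chosen for illustration, not values from the source.

```python
# RMSE by number of variables, copied from the recursive elimination table.
rmse_by_size = {
    1: 6.339, 2: 6.176, 3: 6.213, 4: 5.519, 5: 5.443,
    6: 5.391, 7: 5.325, 8: 5.544, 9: 5.702,
}


def best_size(rmse):
    """Subset size with the lowest RMSE."""
    return min(rmse, key=rmse.get)


def smallest_within_tolerance(rmse, tol_pct):
    """Smallest subset size whose RMSE is within tol_pct percent of the best."""
    best = min(rmse.values())
    eligible = [size for size, value in rmse.items()
                if (value - best) / best * 100 <= tol_pct]
    return min(eligible)


print("Lowest RMSE at size:", best_size(rmse_by_size))
# tol_pct=3.0 is a hypothetical setting, not one stated in the slides.
print("Smallest size within 3%:", smallest_within_tolerance(rmse_by_size, 3.0))
```

Under these assumptions the strict minimum sits at 7 variables, while the tolerance rule settles on 5, trading a small loss in RMSE for a simpler model.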

Optimizing Selection: All 9 Variables Considered

                                   Number of variables in model
#   Description                    9        8        7        6        5
1   Real.GDP.NonOECD
2   Real.GDP.OECD                  ***      ***      ***      ***      ***
3   OECD.Inventory                 ***      ***      ***      ***      ***
4   Non.OECD.Supply..
5   Non.OECD.LF.Consump            **       *        **       *
6   Spare.Crude.Prod.Cap.OPEC      ***      ***      ***      ***      ***
7   OECD.Prodn                     ***      ***      ***      ***      ***
8   Percent.chg.RealGDP.OECD.
9   Crude.Prod.Saudi.Arabia        ***      ***      ***      ***      ***
    Multiple R^2                   0.9006   0.8994   0.8988   0.8968   0.8935
    Adjusted R^2                   0.8954   0.8947   0.8947   0.8932   0.8904

Significance codes: *** very significant
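The Adjusted R^2 row can be checked against the Multiple R^2 row with the standard adjustment, 1 - (1 - R^2)(n - 1)/(n - p - 1). The sketch below assumes n = 180 observations, the figure quoted in the end notes; whether every regression used exactly that many rows is an assumption.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R^2: penalises R^2 for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 180  # assumption: the 180 month-end data points quoted in the end notes

# Multiple R^2 and reported Adjusted R^2, keyed by number of variables.
multiple_r2 = {9: 0.9006, 8: 0.8994, 7: 0.8988, 6: 0.8968, 5: 0.8935}
reported_adj = {9: 0.8954, 8: 0.8947, 7: 0.8947, 6: 0.8932, 5: 0.8904}

for p, r2 in multiple_r2.items():
    computed = adjusted_r2(r2, N_OBS, p)
    print(f"{p} variables: computed {computed:.4f}, reported {reported_adj[p]:.4f}")
```

With that sample size the computed values agree with the reported row to within rounding, which also shows why dropping from 9 to 7 variables costs almost nothing on the adjusted measure.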

Selected Parameters: After Elimination / Pruning
1. OECD Real Gross Domestic Product
2. OPEC Total Spare Crude Oil Production Capacity
3. Crude Oil Production, Saudi Arabia
4. OECD End-of-period Commercial Crude Oil and Other Liquids Inventory
5. OECD Petroleum Production
6. Non-OECD Real Gross Domestic Product
7. Unplanned crude oil production disruptions, OPEC
8. Unplanned liquid-fuel production disruptions, non-OPEC
9. Percentage change in Real GDP of OECD Countries
10. Crude Oil and Liquid Fuels Supply, Total Non-OECD
11. Net Inventory Withdrawals, Total Non-OECD Crude Oil and Other Liquids
12. Non-OECD Liquid Fuels consumption
13. OECD Liquid Fuels consumption

Correlations: Selected Variables
Distribution, Correlation Ellipses, Means and Loess Smoothing

End Result: Decision Trees and Predictive Analytics

Decision Tree: Final Model
Taking the most relevant trees only

Decision Tree: Simplified Final Model
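As a rough illustration of what each node in such a tree does, the sketch below finds the single best split of one feature for a regression target by minimising the summed squared error on both sides. It is a toy building block, not the deck's model: the data values and function names are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, cost) of the split on x that minimises total SSE of y."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Toy data (hypothetical): the target jumps once the feature passes 3.
feature = [1, 2, 3, 4, 5, 6]
target = [10, 11, 10, 30, 31, 29]

threshold, cost = best_split(feature, target)
print("Split at", threshold, "with SSE", round(cost, 3))
```

A full tree repeats this search recursively on each side of the split, and a forest averages many such trees grown on resampled data.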

Result: Predicting Brent in 2016
Applying Big Data Analytics & Machine Learning
The price is predicted to range between 50 and 68 $/barrel (only month-end prices were projected).
The prediction is based on 1-year-forward estimates of fundamental factors as given by the EIA.

End Notes: Disclaimers & Conclusions

So, the point of it all?
To show that Machine Learning, a part of Big Data Analytics, can be applied to commodities.
To understand the relations between various fundamental factors and how they affect prices.

How accurate is the prediction?
We've used Decision Forest and similar models, which show higher accuracy than other Machine Learning algorithms, but the prediction should still be read as a range rather than as exact numbers. Such models applied to fundamental data like demand and supply might provide better accuracy.

Could it have been better?
We've used month-end data for our analysis, from 2001 (180 data points). So yes, more frequent data with different variables could have given better results. Assumptions were made on future crude production in Saudi Arabia.

Can there be other applications of this model?
Similar models can be built to predict supply and demand, detect fraud in trading or operations, assess counterparty credit risk, support sales analytics, predict prices, and deliver many other custom solutions.

THANK YOU