Market Basket Analysis of Ingredients and Flavor Products. by Yuhan Wang A THESIS. submitted to. Oregon State University.


Market Basket Analysis of Ingredients and Flavor Products
by Yuhan Wang

A THESIS
submitted to Oregon State University Honors College
in partial fulfillment of the requirements for the degree of
Honors Baccalaureate of Science in Business Information System (Honors Associate)

Presented August 31, 2016
Commencement June 2017


AN ABSTRACT OF THE THESIS OF

Yuhan Wang for the degree of Honors Baccalaureate of Science in Business Information System presented on August 31, 2016.
Title: Market Basket Analysis of Ingredients and Flavor Products
Abstract approved: Bin Zhu

Information plays an increasingly important role in today's business world. Companies hold large amounts of data but extract little valuable information from it. Data mining, the process of turning data viewed from a variety of perspectives into useful information, is a growing trend for businesses. It can be used to make more targeted decisions about business models and strategies. Market basket analysis is one of the most useful modeling techniques in data mining. It is usually used to analyze customers' purchasing behavior, and its results can give decision-makers valuable information for marketing strategy, inventory control, and cross-selling. The main objective of this thesis is to study how the sales of 15 flavor products sold by Zengcheng Handyware Seasoning Company in five regions interrelate and, based on the resulting associations, how to provide information that improves the company's marketing activities.

Key Words: data mining, market basket analysis, association rules
Corresponding e-mail address: jasminewang417@gmail.com

Copyright by Yuhan Wang
August 31, 2016
All Rights Reserved

Table of Contents

Chapter 1: Introduction
  1.1 Overview
  1.2 Research Problem Description
Chapter 2: Data
  2.1 Data Description
    2.1.1 Basic description
    2.1.2 Description of regions
    2.1.3 Description of dataset
  2.2 Dataset Adjustment
  2.3 Considerations and Assumptions
  2.4 Research Questions
Chapter 3: Research Methodology
  3.1 Market Basket Analysis
  3.2 Association Rule
    3.2.1 Definition of Association Rule
    3.2.2 Process of Association Rule
    3.2.3 Frequent Set of Items Generation
    3.2.4 Association Rules Generation
Chapter 4: Data analysis and results
  4.1 Market Basket Analysis
    4.1.1 Analysis Package Selection -- SPSS
    4.1.2 SPSS Modeling Process
  4.2 Results
    4.2.1 Result for All Regions
    4.2.2 Result for the North China Region
    4.2.3 Result for the South China Region
    4.2.4 Result for the Mid China Region
    4.2.5 Result for the East China Region
    4.2.6 Result for the South West China Region
Chapter 5: Conclusions
  5.1 General Discussion
  5.2 Academic Contribution
  5.3 Business Contribution
  5.4 Limitations of the study
  5.5 Directions for Future Research
Bibliography
Appendices
  Appendix A: Data Description
  Appendix B: Results

Chapter 1: Introduction

1.1 Overview

Today we live in the age of information, surrounded by more and more data from multiple dimensions. For companies, this age provides an opportunity to collect enormous amounts of data. To turn that data into information, data mining has become increasingly popular in the business world. Data mining usually starts with data collection, and clients are the source of the data: every purchase transaction carries data about the client from several angles. By analyzing and summarizing large amounts of data, useful information can be extracted from the raw records, and hidden patterns or models can be found. This information, including the hidden patterns, can be used in many business activities, such as increasing profit, decreasing cost, and setting marketing strategy.

Data mining draws on a variety of techniques. For instance, classification, clustering, regression, artificial intelligence, neural networks, decision trees, and association rules are all important data mining techniques for learning from data (Ramageri). In this study, we use association rules, also known as market basket analysis, to mine the provided dataset.

As a data mining technique, market basket analysis focuses on what goes with what. It can tell the researcher more than just which products are in the shopping cart. In this research, market basket analysis helps us study the relationships between purchases of different flavor products in different regions. The results can then give the company more useful marketing recommendations and guidance for each specific region.

1.2 Research Problem Description

In past decades, market basket analysis was usually applied in the retail and e-commerce markets. However, advances in technology mean that retailers are no longer the only ones able to collect customer data: manufacturers can also gather information about their clients in order to provide high-quality products and technical support. Companies in many industries can use market basket analysis to support their decision-making. The technique helps a business eliminate blind spots in its market and learn more about its customers. By applying market basket analysis, we can find valuable information about customers and hidden connections between products, which helps decision-makers develop further strategies.

The main objective of this thesis is to find out how to create or improve recommendations of flavor products to customers in different regions based on their purchasing behavior. Mining association rules from a transaction-based dataset provides useful information about co-purchased products. Partitioning the dataset by region helps us identify the product preferences of each region.

Chapter 2: Data

2.1 Data Description

2.1.1 Basic description

The dataset used in this study is a collection of sales records for 15 different flavor products from 2013 to 2015. The study is based on a simulated dataset provided by Zengcheng Handyware Seasoning Company; the simulated dataset follows the trend and characteristics of the original dataset. The sales records are classified by both year and region. The dataset covers five main regions: the South China Region, the North China Region, the Mid China Region, the East China Region, and the South West China Region.

2.1.2 Description of regions

There is an obvious gap between the South China Region and the other four regions: the sales volume of the South West China Region is only about half that of the South China Region. See Table 2.1 for more detailed information.

Table 2.1 Regional Monthly Sales (kg) Comparison

Month   South China   North China   Mid China   East China   South West China
Jan.       43270.00      26500.33    25019.33     30633.67           23089.67
Feb.       20637.67      10923.00    13426.33     13542.00           11363.00
Mar.       38186.67      24626.67    23568.00     27279.33           21145.00
Apr.       35506.67      24187.33    22812.00     25886.67           19758.33
May.       38698.33      22212.67    20531.67     27705.67           15933.67
Jun.       32966.33      22851.00    20334.67     26991.33           19392.33
Jul.       47471.33      29238.33    24165.67     29502.00           20293.67
Aug.       55559.00      29363.67    29857.33     32669.67           23031.33
Sep.       62206.33      34186.00    29722.00     38224.00           29586.00
Oct.       47238.67      22657.33    22453.33     29942.67           16073.67
Nov.       59163.00      27041.00    25675.67     28846.67           19397.33
Dec.       56090.67      33801.67    28716.00     30300.33           23652.67

The average monthly sales volume of each region also shows the strength of the South China Region. Its average monthly sales volume is roughly double that of the South West China Region, which was the weakest region over the three years. See Table 2.2 for more detailed statistics.

Table 2.2 Regional Sales Comparison

Region       Average Sales per Month (kg)   2013 Monthly Average (kg)   2014 Monthly Average (kg)   2015 Monthly Average (kg)
South                           44,749.56                   39,591.50                   42,350.17                   52,307.00
North                           25,632.42                   24,102.17                   23,889.67                   28,905.42
Mid                             23,856.83                   20,150.50                   22,078.08                   29,341.92
East                            28,460.33                   28,556.83                   29,613.67                   27,210.50
South West                      20,226.39                   18,443.00                   19,870.00                   22,366.17

The seasonal pattern of the company's total monthly sales was stable from 2013 to 2015. Sales volume is lowest in February and then rises to a first high in March. After fluctuating smoothly from March to June, it rises substantially to its peak in September. From September to the following February, the sales volume usually fluctuates smoothly. From 2013 to 2015, the peaks fall at the end of summer (August or September), and the lowest point falls in February.

February and October have lower monthly sales because of nationwide holidays. In February there is a 15-day nationwide break in China to celebrate the Spring Festival, and in October there is a 7-day National Day holiday. During these holidays there are no orders and no production, so both months usually have lower sales than other months. The peak in August is related to the summer break: in China, primary schools, middle schools, high schools, and universities all have their summer break between July and September. Students are the main purchasing power for snacks and consume more snack products during the summer break, so sales in August are usually strong. Another peak in November reflects manufacturers preparing for the Spring Festival. Snack consumption during the Spring Festival is very high every year, so November is usually another high point of the year.

[Figure 2.1: Regional Monthly Sales, 2013-2015. Vertical axis: kilograms (0 to 80,000); horizontal axis: month (Jan. to Dec.); one series per year (2013, 2014, 2015).]

2.1.3 Description of dataset

The raw dataset is classified by region, and the sales records of all five regions are listed. The transaction database consists of the following five elements:

Product name: name of the products sold in all regions. Abbreviation descriptions are listed in Appendix A, Table 1.
Month: month in which the information was collected
Year: year in which the information was collected
Region: the name of one of the five regions
Sales volume: the sales volume in kilograms

The dataset includes 15 products, which are sold in five regions. Some products, such as Tomato Flavor and Crab Flavor, have a greater sales volume than other flavors because they have wider market acceptance; most end products include these flavors. Sales of other products, such as Chives Flavor or Hot & Spicy Sichuan Flavor, are much lower than the average monthly sales because of their small client base. These products are currently not mainstream in the market and are accepted only by a small group of customers. Hence, there is a large sales gap between popular products and niche products. For instance, the most popular product, Tomato Flavor, has nearly ten times the average monthly sales of Hot & Spicy Sichuan Flavor.

Most products' average monthly sales are below the overall average across products. Table 2.3 provides more detailed information.

Table 2.3

Product Name                  Average number of unique items per month   Difference with Total Average Sales
Tomato Flavor                                                  8462.61                               6556.93
Crab Flavor                                                    5406.44                               3500.77
Sauced Beef Flavor                                             1726.86                               -178.81
Cheese Corn Flavor                                             1632.90                               -272.77
Barbecue Flavor                                                1617.11                               -288.56
Pepper Beef Steak Flavor                                       1395.00                               -510.67
Chicken Flavor                                                 1203.71                               -701.96
Curry Barbecue Flavor                                          1168.66                               -737.01
Cumin Barbecue Flavor                                          1160.65                               -745.02
Hot & Spicy Salt Flavor                                         959.89                               -945.78
Pork Steak Flavor                                               920.27                               -985.41
Yolk Flavor                                                     902.23                              -1003.44
Kimchi Flavor                                                   541.23                              -1364.45
Chives Flavor                                                   592.36                              -1313.31
Hot & Spicy Sichuan Flavor                                      895.18                              -1010.50

2.2 Dataset Adjustment

One of the crucial points of market basket analysis is studying how products cross-sell to customers. In other words, researchers usually focus more on which products customers purchased than on how many they purchased. Therefore, depending on the research questions, the raw dataset should be converted from the original sales volumes to a true-or-false dataset. A true-or-false dataset includes only the information about whether customers purchased each product or not. Usually, 1 represents true (the customer purchased the specific product) and 0 represents false (the customer did not purchase it).

To adjust the dataset, we decided to find a cut-off point that separates the data into 1 and 0. In the given dataset, the differences in sales volume between products were too large to be ignored, so we cannot use one fixed number as the cut-off point for all products. Moreover, since the provided dataset is simulated, the simulation process may have introduced bias. To limit the effect of this bias and of simulation error, we decided to use the mean as the cut-off point. For the dataset that includes all sales records in the five regions, the cut-off point for each product is its average monthly sales volume across all regions. For a dataset that includes only one specified region, the cut-off point for each product is its average monthly sales volume in that region. If a monthly sales value is higher than the cut-off point, it is marked as 1; if it is lower, it is marked as 0. The adjusted datasets are available (see Appendix A, Table 2).

2.3 Considerations and Assumptions

1. Sales Record: The given dataset represents the monthly sales of the 15 products in five regions from 2013 to 2015. However, since market basket analysis requires a transaction-based dataset, we treat one region's sales record for one month as one transaction.
2. Language: The initial dataset is in Mandarin, so the names of all products have been translated into English (see Appendix A, Table 1).
3. Simulation: The given dataset is not the real sales record. All of the data have been simulated for confidentiality purposes, but the simulated dataset keeps the main trend and characteristics of the original. Bias may exist because of the uncertainty of the simulation.
4. Correlation: A general assumption in this study is that sales of different products are correlated.
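The adjustment in Section 2.2, combined with the assumption that one region's one-month record is one transaction, can be illustrated with a minimal Python sketch. The sales figures, the `binarize_by_mean` helper, and the choice to mark a value exactly equal to the mean as 0 are assumptions made for illustration only; the thesis performed this step in Excel and does not specify the tie case.

```python
from statistics import mean

# Hypothetical monthly sales (kg) of two products in one region.
# These figures are made up for illustration; they are not thesis data.
sales = {
    "Tomato Flavor": [9000.0, 7000.0, 8500.0, 9500.0],
    "Chives Flavor": [500.0, 700.0, 600.0, 400.0],
}


def binarize_by_mean(series):
    """Mark each monthly value 1 if it is above the product's mean, else 0."""
    cutoff = mean(series)
    # Assumption: a value exactly equal to the cut-off is marked 0.
    return [1 if value > cutoff else 0 for value in series]


flags = {product: binarize_by_mean(values) for product, values in sales.items()}

# Each month (position in the lists) becomes one transaction: the set of
# products whose flag is 1 in that month.
n_months = len(next(iter(sales.values())))
transactions = [
    {product for product in sales if flags[product][m] == 1}
    for m in range(n_months)
]

print(flags)
print(transactions)
```

Because each product is compared with its own mean, a niche product and a best-seller are flagged on the same relative scale, which is the reason the text gives for not using one fixed cut-off.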

2.4 Research Questions

Region-oriented research questions:

1. Which products are frequently purchased together across all five regions?
2. Which products are frequently purchased together in the South China Region?
3. Which products are frequently purchased together in the North China Region?
4. Which products are frequently purchased together in the Mid China Region?
5. Which products are frequently purchased together in the East China Region?
6. Which products are frequently purchased together in the South West China Region?

The 15 products sold in all five regions will be analyzed to find co-purchases in each region. This region-oriented analysis will give the company valuable information for setting further marketing strategies. Association rule analysis, the output of market basket analysis, is used to find frequent purchasing patterns, correlations, and associations in the given dataset (Tan, Steinbach and Kumar). It helps us mine the given dataset and make recommendations.

Chapter 3: Research Methodology

3.1 Market Basket Analysis

Given the dataset, market basket analysis (MBA) is well suited to this study. The raw data provided by Zengcheng Handyware Seasoning Company does not come with a clear analytical direction. As an undirected data mining technique, market basket analysis is a good starting point for getting to know a large dataset (Padoe). Moreover, market basket data has three levels: customers, orders, and items (Padoe). After inspecting the dataset, we know that it is transaction-based and item-oriented. It includes five main customers (regions), 180 orders (transactions), and 15 items (flavor products), so all requirements for market basket analysis are satisfied.

However, the main problem of this study is determining the right cut-off point during the data adjustment. Using each product's mean sales as the cut-off point may introduce error, and this error may affect the results. A possible solution is to run the analysis with different cut-off points and compare the results to see whether the error affects the final conclusions.

3.2 Association Rule

As the output of market basket analysis, association rule mining is a useful method for finding frequent patterns, associations, correlations, or causal structures among sets of items in transaction databases (Han and Kamber). The main idea of the technique is to produce rules about associations between products from a transaction-based dataset.

3.2.1 Definition of Association Rule

Following the textbooks Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber and Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach and Vipin Kumar, we can define an association rule in the following steps.

Let $I = \{i_1, i_2, \ldots, i_m\}$ be the collection of $m$ items in the market basket data. Let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of all transactions in the market basket data.

Each transaction $t_i$ contains a subset of items from $I$. A transaction $t_i$ is said to contain $X$, a collection of items, if $X$ is a subset of $t_i$ ($X \subseteq t_i$, $t_i \in T$). An association rule is an implication expression of the form $X \rightarrow Y$, where $X$ and $Y$ are disjoint collections of items ($X \subset I$, $Y \subset I$, $X \cap Y = \emptyset$).

The main idea of an association rule is to find the relationship between purchases of different products. It can be represented as:

IF {purchase A & B ($A, B \in X$)} THEN {purchase C ($C \in Y$)}

$X$ is the antecedent and $Y$ is the consequent, so we can write:

Antecedent $\rightarrow$ Consequent [support, confidence]

The strength of an association rule is measured by support and confidence. Support is the percentage of all transactions that include both the antecedent and the consequent. It determines how often a rule is applicable to the given dataset. If $P$ denotes probability, then

$\mathrm{Support}(X \rightarrow Y) = P(X \cup Y)$

Table 3.1 Example of Support

ID   Items     Support calculation
1    A, B, C   Total number of transactions = 5
2    A, B, D   {A, B}: 2; Support{A, B} = 2/5 = 40%
3    A, C      {A, C}: 2; Support{A, C} = 2/5 = 40%
4    B, C      {B, C}: 3; Support{B, C} = 3/5 = 60%
5    B, C, D   {A, B, C}: 1; Support{A, B, C} = 1/5 = 20%

Confidence is the percentage of the transactions containing the antecedent that also contain the consequent. It determines how frequently the items in the consequent ($Y$) appear in transactions that contain the antecedent ($X$). If $P$ denotes probability, then

$\mathrm{Confidence}(X \rightarrow Y) = P(Y \mid X)$
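The two measures defined above can be checked against the five example transactions with a short Python sketch. The function names `support_count`, `support`, and `confidence` are illustrative choices, not part of any library or of the thesis workflow.

```python
# The five example transactions from Table 3.1.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "C"},
    {"B", "C"},
    {"B", "C", "D"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of antecedent transactions that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


print(support({"B", "C"}, transactions))       # 3 of 5 transactions
print(confidence({"B"}, {"C"}, transactions))  # 3 of the 4 transactions with B
```

Working with counts rather than probabilities keeps the arithmetic identical to the hand calculation: confidence is simply the count of transactions containing both sides divided by the count containing the antecedent.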

Table 3.2 Example of Confidence

ID   Items
1    A, B, C
2    A, B, D
3    A, C
4    B, C
5    B, C, D

Confidence calculation:
Confidence{A $\rightarrow$ B} = {A, B}/{A} = 2/3 = 67%
Confidence{B $\rightarrow$ C} = {B, C}/{B} = 3/4 = 75%
Confidence{C $\rightarrow$ D} = {C, D}/{C} = 1/4 = 25%
Confidence{A, B $\rightarrow$ C} = {A, B, C}/{A, B} = 1/2 = 50%

Both support and confidence are very important for association rules. As measures of interestingness, they respectively reflect the usefulness and the certainty of discovered rules (Han and Kamber). A low support percentage means that there is a low probability that X and Y were purchased together. A low confidence percentage means that a low percentage of the customers who purchased X also bought Y. Therefore, both a minimum support threshold and a minimum confidence threshold are necessary for this study. We want to find all rules $X \rightarrow Y$ that satisfy the following two criteria:

The percentage of all given transactions in which X and Y both appear must be equal to or higher than the minimum support threshold.
The percentage of the transactions containing X in which Y also appears must be equal to or higher than the minimum confidence threshold.

The performance of an association rule is measured by lift, which is one of the correlation measures. The occurrence of the set of items X is independent of the occurrence of Y if $P(X \cup Y) = P(X)\,P(Y)$; otherwise, X and Y are dependent and correlated. That is,

$$\mathrm{Lift}(X, Y) = \frac{\mathrm{Confidence}(X \rightarrow Y)}{\mathrm{Support}(Y)} = \frac{P(X \cup Y)}{P(X)\,P(Y)}$$

If the value of lift is greater than 1, the rule is useful for finding the consequent set of items. In other words, the occurrences of X and Y are positively correlated. If the value of lift is less than 1, X is negatively correlated with Y.

If the value of lift is equal to 1, there is no correlation between X and Y. Lift therefore shows how much better a rule identifies the consequent than selecting transactions at random.

3.2.2 Process of Association Rule

In general, the association rule problem is solved in a two-step process (Han and Kamber).

1. Generating frequent sets of items: Set a minimum support threshold based on the given dataset. Then generate all sets of items in the given dataset whose support is equal to or higher than the minimum support threshold.
2. Generating association rules: Set a minimum confidence threshold based on the given dataset. Then generate, from the frequent sets of items, all rules whose confidence is equal to or higher than the minimum confidence threshold.

3.2.3 Frequent Set of Items Generation

The first step in association rule generation is finding all sets of items that have high support (equal to or higher than the minimum support threshold). A lattice structure is usually used to list all possible sets of items. Figure 3.1 shows the lattice for 5 items ($I = \{a, b, c, d, e\}$). A dataset with 5 items has $2^5 = 32$ possible subsets, one of which is the null set. Therefore, a 5-item dataset has $2^5 - 1 = 31$ candidate sets of items that may turn out to be frequent.

Figure 3.1 Itemset Lattice for Five Products (Tan, Steinbach and Kumar)

In general, if the given dataset has $k$ items, there are $2^k - 1$ candidate sets of items. Since the number of items ($k$) can be very large, the number of candidate sets, $2^k - 1$, grows exponentially. In other words, the computational complexity also grows exponentially (Tan, Steinbach and Kumar). To determine the support count of every candidate set of items in the lattice structure, the most basic technique is the brute-force approach: each candidate set of items is compared against every transaction. Figure 3.2 shows the whole operation.
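A minimal Python sketch of the brute-force idea follows: enumerate every non-empty itemset and scan all transactions for each one. The `brute_force_counts` helper and the small transaction list (the example data from Section 3.2.1) are illustrative assumptions, not the thesis implementation.

```python
from itertools import combinations

# Illustrative data: the five example transactions from Section 3.2.1.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "C"},
    {"B", "C"},
    {"B", "C", "D"},
]


def brute_force_counts(items, transactions):
    """Support count of every non-empty itemset, found by scanning all transactions."""
    counts = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate_set = frozenset(candidate)
            counts[candidate_set] = sum(
                1 for t in transactions if candidate_set <= t
            )
    return counts


items = sorted(set().union(*transactions))  # ['A', 'B', 'C', 'D']
counts = brute_force_counts(items, transactions)

print(len(counts))                     # 2**4 - 1 candidates for 4 items
print(counts[frozenset({"B", "C"})])   # transactions containing both B and C

# With five items, as in Figure 3.1, the lattice has 2**5 - 1 candidates.
print(len(brute_force_counts(list("abcde"), [])))
```

Every candidate is checked against every transaction, so the work is proportional to the number of candidates times the number of transactions, which is why the approach only suits small item counts.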

Figure 3.2 Brute-force Approach Operations (Tan, Steinbach and Kumar)

Because it is exhaustive and straightforward, the brute-force approach can find the frequent combinations in most situations, and it is simple to apply. However, as mentioned before, the computation grows exponentially with the number of itemsets. The brute-force approach is therefore suitable only for studies with a small dataset. For a large dataset, we prefer the Apriori algorithm. Apriori is the first association rule mining algorithm that pioneered the use of support-based pruning to systematically control the exponential growth of candidate itemsets (Tan, Steinbach and Kumar). It is a better way to lower the complexity of the computation. The operations of the Apriori algorithm are shown in the following example.

Table 3.3 Original Sample Data (Minimum Support Count = 3)

Transaction ID   Items
1                {A, B, C, D}
2                {A, B, C, E}
3                {A, B, D}
4                {B, D}
5                {A, B, B}

Step 1: Count the number of transactions for each item. (Note: even though product B is bought 6 times, it occurs in only 5 transactions.)

Table 3.4 Candidate 1-itemsets

Item   No. of transactions
A      4
B      5
C      2
D      3
E      1

Step 2: Discard the candidate itemsets whose number of transactions is lower than the minimum support count.

Table 3.5 Candidate 1-itemsets-cont

Item   No. of transactions
A      4
B      5
D      3

Step 3: Make pairs of all products in Table 3.5. Then count the number of transactions for each pair.

Table 3.6 Candidate 2-itemsets

Itemset   No. of transactions
AB        4
AD        2
BD        3

Step 4: Repeat step 2.

Table 3.7 Candidate 2-itemsets-cont

Itemset   No. of transactions
AB        4
BD        3

Step 5: Repeat step 3 and step 4.

Table 3.8 Candidate 3-itemsets

Itemset   No. of transactions
ABD       2
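The five steps above can be sketched in Python as a level-wise loop over the Table 3.3 sample. This is a simplified illustration of Apriori under stated assumptions: the function and variable names are hypothetical, transactions are stored as sets (so the repeated B in transaction 5 counts once), and candidates are pruned when any of their subsets is infrequent.

```python
from itertools import combinations

# Transactions from Table 3.3; a set ignores the duplicate B in transaction 5.
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "E"},
    {"A", "B", "D"},
    {"B", "D"},
    {"A", "B"},
]
MIN_SUPPORT_COUNT = 3


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    items = sorted(set().union(*transactions))
    # Steps 1 and 2: count single items and keep the frequent ones.
    current = {}
    for item in items:
        n = count(frozenset([item]), transactions)
        if n >= min_count:
            current[frozenset([item])] = n
    frequent = dict(current)
    k = 2
    while current:
        # Step 3: join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Support-based pruning: drop candidates with an infrequent subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Step 4: count the survivors and keep the frequent ones.
        current = {}
        for c in candidates:
            n = count(c, transactions)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1  # Step 5: repeat for the next level.
    return frequent


frequent = apriori(transactions, MIN_SUPPORT_COUNT)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

In this sketch the candidate {A, B, D} is discarded without being counted, because its subset {A, D} is already infrequent. Table 3.8 counts it explicitly and finds 2, which is below the minimum support count, so both routes end with the same frequent itemsets.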

The example above shows how the Apriori algorithm controls the exponential growth in the number of candidate itemsets. The Apriori property states that all subsets of a frequent itemset must also be frequent (Tan, Steinbach and Kumar). Conversely, if we already know that an itemset is infrequent, there is no need to examine its supersets, because they are also infrequent. Therefore, the Apriori algorithm is more efficient than the brute-force approach for mining frequent itemsets in large datasets.

3.2.4 Association Rules Generation

The second step in association rule generation is generating the actual association rules from the frequent itemsets (Han and Kamber). Each frequent itemset $Y$ with $k$ items can produce up to $2^k - 2$ association rules. An association rule is extracted by partitioning the itemset $Y$ into two non-empty subsets, $X$ and $Y - X$, such that $X \rightarrow Y - X$ satisfies the confidence threshold (Tan, Steinbach and Kumar; Han and Kamber). For instance, if $Y$ = {A, B, C}, it can produce up to $2^3 - 2 = 6$ association rules. All possible association rules are listed below:

{A} $\rightarrow$ {B, C}
{B} $\rightarrow$ {A, C}
{C} $\rightarrow$ {A, B}
{A, B} $\rightarrow$ {C}
{A, C} $\rightarrow$ {B}
{B, C} $\rightarrow$ {A}

The Apriori algorithm can also be used in association rule generation. It is a level-wise approach in which each level corresponds to the number of items that belong to the rule consequent (Tan, Steinbach and Kumar). High-confidence rules are used to generate new candidate rules by merging, and low-confidence rules allow other rules to be discarded. For instance, if {A, B, C} $\rightarrow$ {D} and {A, B, D} $\rightarrow$ {C} are high-confidence rules, a new candidate rule can be generated by merging their consequents: {A, B} $\rightarrow$ {D, C}. If {B, C, D} $\rightarrow$ {A} is a low-confidence rule, all rules from the same itemset whose consequent includes A can be discarded.
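As a hedged illustration of rule generation, the sketch below splits one itemset into every antecedent and consequent pair and computes confidence and lift as defined in Section 3.2.1. The helper name `rules_from_itemset`, the reuse of the Table 3.1 transactions, and the 0.5 threshold are assumptions for the example; the sketch enumerates all splits directly and does not implement the level-wise merging described above.

```python
from itertools import combinations

# Illustrative data: the five example transactions from Table 3.1.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "C"},
    {"B", "C"},
    {"B", "C", "D"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rules_from_itemset(itemset, transactions, min_confidence=0.0):
    """List (antecedent, consequent, confidence, lift) for each split of `itemset`.

    Assumes every antecedent occurs in at least one transaction.
    """
    itemset = frozenset(itemset)
    n = len(transactions)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            x = frozenset(antecedent)
            y = itemset - x
            conf = count(itemset, transactions) / count(x, transactions)
            lift = conf / (count(y, transactions) / n)
            if conf >= min_confidence:
                rules.append((sorted(x), sorted(y), conf, lift))
    return rules


all_rules = rules_from_itemset({"A", "B", "C"}, transactions)
print(len(all_rules))  # 2**3 - 2 = 6 possible rules

# Keep only rules that reach a hypothetical 50% confidence threshold.
strong = rules_from_itemset({"A", "B", "C"}, transactions, min_confidence=0.5)
for x, y, conf, lift in strong:
    print(x, "->", y, conf, lift)
```

On this small sample only {A, B} -> {C} and {A, C} -> {B} reach 50% confidence, and both have a lift below 1, so a lift threshold of 1 would reject them as well.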

Figure 3.3 Apriori Algorithm Rule Generation (Lanzi, 2009)

The graph above visualizes what happens after the apriori algorithm detects a rule that does not satisfy the confidence threshold. If one rule is determined to be useless, then all the rules that fall under it are also determined to be useless without any further calculation. This directly saves time when analyzing large datasets.

Chapter 4: Data analysis and results

4.1 Market Basket Analysis

Market basket analysis allows researchers to learn more about customer behavior and sales patterns by analyzing historical transactions. The basic data adjustment has been completed in Excel. For the frequent itemset generation and association rule generation procedures, we use the apriori algorithm as implemented in the selected analysis package. The minimum support is set to 10%, the minimum confidence to 80%, and the minimum lift to 1.

4.1.1 Analysis Package Selection -- SPSS

Several software packages can be used in data analysis research. For instance, SPSS, R, SAS, Excel and Matlab are all good packages for data analysis, and each has its own advantages. Table 4.1 shows more detailed comparative information (Connor).

Table 4.1 Comparison of data analysis software

Name      Advantages                                           Limitations                                     Open Source
SPSS      Large datasets; Visualization; Statistical analysis  Expensive                                       No
R         Library support; Statistical analysis                Programming-oriented; Steep learning curve      Yes
Excel     Easy to use; Complete function                       Poor performance in large datasets              No
Matlab    Elegant matrix support                               Programming-oriented; Poor statistical ability  No

Based on this comparison, SPSS is the best choice for this study. Although Excel is an easy-to-use and fully functional package for data analysis, we cannot choose it because of its poor performance on large datasets. For the other two packages, R and Matlab, we

have to give them up because of their weak visualization and programming-oriented characteristics. As a predictive analytics package, SPSS supports data collection, statistics, modeling and deployment (IBM). It can not only build predictive models based on the collected data, but also present detailed statistical analysis visually. Therefore, for this study, we decided to use SPSS to determine which kinds of flavor products are purchased together in the specified region.

4.1.2 SPSS Modeling Process

The following figures show how to build an association rules model in SPSS. All six research questions mentioned in the previous chapter can be answered by loading different classified datasets into this model.

Figure 4.2 SPSS Modeling Explanation 1

Step 1: Load the Excel source.

Step 2: Assign a Type node to the loaded Excel source. Change the role of every product field to Both, which means that the field is both an input (predictor) and a target (predicted). Then change the measurement level of every product field to Flag, since the data use 0 and 1.

Figure 4.3 SPSS Modeling Explanation 2

Step 3: Add an Apriori model to the Type node. Set the minimum antecedent support to 10%, the minimum rule confidence to 80% and the maximum number of antecedents to 2 (products). We limit the maximum number of antecedents to two in order to limit the number of generated rules. If we allowed more antecedents, more rules would be generated, which would make the rules harder to mine.

Step 4: Run the Apriori node to get the results. The results include all rules that satisfy the 10% minimum support and the 80% minimum confidence. We can sort the list by either support or confidence for further analysis.

Figure 4.4 SPSS Modeling Explanation 3

Step 5: Generate the rule sets based on the results. Set each product in turn as the target field, the minimum support to 10%, the minimum confidence to 90%, and the default value to 0. This gives the scoring of the association rules. Setting the minimum confidence to 90% when generating the rule sets makes them easier to mine: we do not need to analyze all rules, only the rules with high confidence. A higher minimum confidence therefore helps us focus on the valuable rules.

Moreover, the Association Rules node can also be used to generate rules. The steps are the same as for the Apriori node.
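The settings used in Steps 3 and 4 can be illustrated outside SPSS with a small Python sketch. It is not the SPSS implementation: it is a brute-force enumeration over 0/1 flag records that applies the same three settings (minimum antecedent support, minimum rule confidence, maximum number of antecedents). The function name mine_rules, the field names in the returned records, and the example products X, Y and Z are all hypothetical.

```python
from itertools import combinations


def mine_rules(rows, min_antecedent_support=0.10, min_confidence=0.80, max_antecedents=2):
    """Enumerate rules 'antecedent -> one consequent' from 0/1 flag rows.

    rows: list of dicts mapping product name to 0 or 1, one dict per transaction.
    Both thresholds are assumed to be greater than zero.
    """
    n = len(rows)
    products = sorted(rows[0])
    baskets = [{p for p in products if row[p] == 1} for row in rows]

    def share(items):
        # Fraction of transactions that contain all of the given items.
        return sum(1 for b in baskets if set(items) <= b) / n

    rules = []
    for size in range(1, max_antecedents + 1):
        for antecedent in combinations(products, size):
            antecedent_support = share(antecedent)
            if antecedent_support == 0 or antecedent_support < min_antecedent_support:
                continue
            for consequent in products:
                if consequent in antecedent:
                    continue
                rule_support = share(antecedent + (consequent,))
                confidence = rule_support / antecedent_support
                if confidence < min_confidence:
                    continue
                rules.append({
                    "antecedent": antecedent,
                    "consequent": consequent,
                    "antecedent_support": antecedent_support,
                    "rule_support": rule_support,
                    "confidence": confidence,
                    "lift": confidence / share((consequent,)),
                })
    return rules


# Hypothetical flag data: one row per transaction, 1 means the product was ordered.
example_rows = [
    {"X": 1, "Y": 1, "Z": 0},
    {"X": 1, "Y": 1, "Z": 1},
    {"X": 1, "Y": 1, "Z": 0},
    {"X": 0, "Y": 1, "Z": 1},
    {"X": 0, "Y": 0, "Z": 1},
]
for rule in mine_rules(example_rows):
    print(rule["antecedent"], "->", rule["consequent"], round(rule["confidence"], 2))
```

Raising min_confidence from 0.80 to 0.90, as in Step 5, simply removes the rules whose confidence lies between the two values.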

4.2 Results

4.2.1 Result for All Regions

Figure 4.6 Levels of Link between Two Products

Figure 4.6 gives a graphical view of the strength of the links between products. The heavier the line, the stronger the link. It shows one-to-one relationships between products. Some strong links, such as Chives Flavor and Sauced Beef Flavor, Crab Flavor and Cheese Corn Flavor, and Chives Flavor and Yolk Flavor, are visible in this figure.

Table 4.2 Rule Statistics in all regions

Measurements         Minimum    Maximum    Mean      Stand. Deviation
Condition Support    10.59%     55.29%     29.04%    8.76%
Confidence           80.00%     100.00%    88.55%    5.35%
Rule Support         10.00%     48.82%     25.58%    7.42%
Lift                 1.45       2.89       1.82      0.24
Deployability        0.00%      9.41%      3.47%     2.09%

Number of Rules: 365

Table 4.3 Information for Most Frequent Items

Item Name             Records (%)    Conditions (%)    Predictions (%)
Curry beef            55.29          13.70             12.05
Sauced beef flavor    54.12          14.79             14.25
Pepper beef steak     54.12          13.70             8.49
Hot & spices salt     51.18          12.05             0.82
Chicken flavor        50.59          12.33             11.51
BBQ flavor            50.00          13.15             12.88
Cumin BBQ             47.06          15.62             13.42
Kimchi flavor         45.88          18.36             9.32
Tomato flavor         44.12          3.56              0.82
Yolk                  44.12          13.15             5.75
Hot&spice sichuan     43.53          14.25             2.19
Cheese corn flavor    41.18          3.29              1.64
Pork steak            37.65          19.45             4.93
Chives flavor         31.76          21.37             1.64
Crab flavor           30.59          3.29              0.27

Table 4.2 shows summary statistics for the generated rules and gives a basic understanding of them. Since 365 rules were generated, we need to score them to find the most valuable or interesting ones. Table 4.3 shows which products are popular in the original dataset and which products are likely to be popular in the future based on the predictions. For instance, Curry Beef Flavor appears in 55.29% of the records, which makes it the most popular flavor in the original dataset. Sauced Beef Flavor has the highest prediction percentage, which means that 14.25% of the rules include this flavor as a prediction. In other words, Sauced Beef Flavor may become the most popular product in the future. Therefore, the company should focus on these high-prediction products, either by promoting them to increase sales or by controlling their inventory and production cycle time. We will also focus our analysis on the rules that include these products. This result is also strong evidence for the selection of valuable rules that follows.

Table 4.4 Most Interesting Association Rules for All Regions (2013-2015)

Rule       Probability    Antecedent                                  Consequent
Rule 1     1              Chives Flavor and Pepper Beef Steak         Sauced beef flavor
Rule 2     0.909          Cheese Corn Flavor and Chives Flavor        Yolk flavor
Rule 3     0.917          Chives Flavor and Hot & Spice Sichuan       BBQ flavor
Rule 4     0.963          Pork Steak Flavor and Hot & Spice Salt      Curry beef flavor
Rule 5     1              Cheese Corn and Pepper Beef Steak           Curry beef flavor
Rule 6     0.958          Crab Flavor and BBQ Flavor                  Cheese corn flavor
Rule 7     0.955          Crab Flavor and Sauced Beef Flavor          Cheese corn flavor
Rule 8     0.912          Chives Flavor and Pepper Beef Steak         Curry beef flavor
Rule 9     1              Chives Flavor and Hot & Spice Sichuan       Pork Steak flavor
Rule 10    0.923          Yolk and Cheese Corn Flavor                 BBQ flavor

Table 4.4 shows the most interesting association rules found in the dataset for all regions. The results were scored based on the rule sets generated by targeting each product. (The full result is available in Appendix B, Table 1.) We first chose the rules with high confidence and high lift but low support. A rule that satisfies all three conditions usually means that the products do not appear often in the dataset but are associated with each other. In other words, such a rule could not have been predicted from previous research. We then picked the highest-probability rules related to the high-prediction products. The full results of rule sets generation and scoring are available in Appendix B, Table 2. The detailed

transaction and scoring information in tabular format for the top ten rules is also available in Appendix B, Table 3.

High affinity products:

1. If a customer purchases Crab Flavor and BBQ Flavor at the same time, there is a 95.8% probability that the customer will order Cheese Corn Flavor. As one of the most popular flavors in the snack market, BBQ Flavor has long had many mature products nationwide. In the past three years, Crab Flavor has become more and more popular in the nuts and beans end-product market, and it is gradually expanding to other kinds of end products such as chips and crackers. Both flavors show good trends in the cracker market. Cheese Corn Flavor, a Western-style flavor usually used in snack and cracker products, has been widely accepted among young end-product customers in recent years. Customers who target the snack and cracker market therefore have a high probability of purchasing all three flavors at the same time, so it is reasonable that Cheese Corn Flavor is ordered together with the other two.

2. If a customer purchases Cheese Corn Flavor and Yolk Flavor at the same time, there is a 92.3% probability that the customer will order BBQ Flavor. Both Cheese Corn Flavor and Yolk Flavor are sweet flavors. They are usually used in puffed snacks, such as Taiwan Pei Tien energy 99 sticks and Japanese rice rolls. BBQ Flavor, however, is a salty flavor. In fact, the shift from sweet to salty is the current direction of flavor development in the Chinese flavor market. Hence, this rule represents the trend of future flavor development well.

The company can exploit the association rules listed above by applying them in future marketing strategies. The sales team can use these rules for product recommendation and promotion. For instance, when a customer orders Cheese Corn Flavor and Chives Flavor together, there is a 90.9% probability that the customer is also interested in Yolk Flavor.
In order to increase sales, Yolk Flavor can be recommended to the

customer by providing samples before being asked. Moreover, these rules can also be used to guide future internal technology development. The technical department can pay more attention to improving the quality of the products with a high prediction rate. For example, Sauced Beef Flavor, Cumin BBQ Flavor and BBQ Flavor have the top three prediction rates, so they should be emphasized in the future internal technical development schedule.

4.2.2 Result for the North China Region

Figure 4.7 Levels of Link between Two Products (North China)

The strength of the connection between every two flavors is visualized in Figure 4.7. The following flavor pairs are strongly linked: Cumin BBQ Flavor and Sauced Beef Flavor, Cumin BBQ Flavor and Yolk Flavor, Chives Flavor and Yolk Flavor, and Cheese Corn Flavor and Yolk Flavor.

Table 4.5 Rule Statistics (North China)

Measurements         Minimum    Maximum    Mean      Stand. Deviation
Condition Support    12.50%     59.38%     32.77%    8.70%
Confidence           90.00%     100.00%    95.93%    4.28%
Rule Support         12.50%     56.25%     31.26%    7.95%
Lift                 1.31       2.29       1.86      0.26
Deployability        0.00%      3.13%      1.51%     1.56%

Number of Rules: 395

Table 4.6 Information for Most Frequent Items (North Region)

Item name              Records (%)    Conditions (%)    Predictions (%)
Kimchi flavor          68.75          9.77              11.76
Sauced beef flavor     62.50          9.49              8.22
Curry beef             59.38          11.19             4.82
Hot & Spice salt       59.38          10.76             1.42
Pepper beef steak      56.25          12.61             5.24
Tomato flavor          53.13          11.47             2.83
Crab flavor            53.13          13.03             2.97
Cheese corn flavor     53.13          11.47             11.47
Hot & Spice Sichuan    53.13          10.48             7.65
BBQ flavor             50.00          12.75             9.21
Chives flavor          50.00          13.03             7.65
Chicken flavor         43.75          14.87             7.51
Cumin BBQ              43.75          17.00             7.22
Pork steak             43.75          17.28             6.80
Yolk                   40.63          16.43             5.24

Table 4.5 gives the rule statistics. In total, 395 rules were generated. The support ranges from 12.5% to 59.4%, the confidence ranges from 90% to 100%, and all rules have a lift higher than 1. Table 4.6 relates the products to their popularity. Kimchi flavor is the most popular product in the original dataset, and its prediction percentage is also the highest. This means that Kimchi flavor has the highest probability of being popular in the future. This result is an important reference for the selection of valuable rules that follows.

Table 4.7 Most Interesting Association Rule for North Region (2013-2015)

Rule       Antecedent                                  Consequent
Rule 1     Hot & Spice Salt and BBQ Flavor             Kimchi Flavor
Rule 2     Yolk and Pepper Beef Steak Flavor           Cheese Corn Flavor
Rule 3     Hot & Spice Salt and Pork Steak Flavor      BBQ Flavor
Rule 4     Hot & Spice Salt and Chives Flavor          Sauced Beef Flavor
Rule 5     Pork Steak and Curry Beef Flavor            Cheese Corn Flavor
Rule 6     Yolk and Pepper Beef Steak                  Kimchi Flavor
Rule 7     Cumin BBQ and Curry Beef Flavor             Sauced Beef Flavor
Rule 8     Pork Steak and Curry Beef Flavor            Sauced Beef Flavor
Rule 9     Pork Steak and Pepper Beef Steak Flavor     BBQ Flavor
Rule 10    Hot & Spice Salt and Chicken Flavor         Kimchi Flavor

Table 4.7 lists the top ten interesting association rules for the North China region. As in section 4.2.1, this result is based on apriori rule sets generation and analysis. (The full result is available in Appendix B, Table 4.) We sorted all rules by both confidence and support. A rule with high confidence but low support can provide more unexpected information, and we define such a rule as an interesting rule. We then picked the high-probability rules, especially those for high-prediction products. The full results of rule sets generation and scoring are available in Appendix B, Table 5. The scores in tabular format are also available in Appendix B, Table 6.
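The selection of interesting rules described above (high confidence first, then low support, with a lift of at least 1) can be sketched as a simple sort in Python. The rule records below are invented for illustration and do not come from the dataset; the function name rank_interesting and the dictionary keys are hypothetical.

```python
def rank_interesting(rules, min_lift=1.0):
    """Keep rules with lift >= min_lift, then sort by confidence (high first)
    and, among equal confidence, by support (low first)."""
    kept = [r for r in rules if r["lift"] >= min_lift]
    return sorted(kept, key=lambda r: (-r["confidence"], r["support"]))


# Hypothetical rule records, for illustration only.
example_rules = [
    {"name": "r1", "support": 0.40, "confidence": 0.95, "lift": 1.6},
    {"name": "r2", "support": 0.15, "confidence": 1.00, "lift": 2.1},
    {"name": "r3", "support": 0.30, "confidence": 1.00, "lift": 1.9},
    {"name": "r4", "support": 0.20, "confidence": 0.92, "lift": 0.9},
]
for rule in rank_interesting(example_rules):
    print(rule["name"], rule["confidence"], rule["support"])
```

In this invented example, r2 is ranked ahead of r3 because both have full confidence but r2 has the lower support, and r4 is dropped because its lift is below 1.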

High affinity products:

1. If a customer purchases Hot & Spice Sichuan Flavor and Chicken Flavor at the same time, there is a high probability that the customer will order Kimchi Flavor. All three flavors are salty flavors. One of their main uses is jerky production, and jerky products have a wider acceptance in this region. Moreover, consumers in the North China region prefer stronger tastes than consumers in other regions. Kimchi Flavor may be the next trend in the jerky market. It can be combined with other flavors, such as Chicken Flavor, to develop new products. Therefore, it is reasonable for manufacturers that target jerky products in the North China region to order these three flavors together. This rule suggests that some manufacturers in the North China region may be starting to develop new Kimchi Flavor end products.

All these rules can be considered when making future marketing strategies. They give the salesman a more interesting basis for product recommendation. For instance, when a customer orders Pork Steak Flavor and Curry Beef Flavor together, the salesman can recommend Sauced Beef Flavor and Cheese Corn Flavor to the customer. Although these flavors are not commonly connected, the association rules reveal a strong relationship between them. Moreover, the rules can also help the company improve its current recommendation combinations. If a customer suddenly changes his or her usual purchase combination, the company needs to notice the change and find out what lies behind it. Then, if the customer changed the purchases for a reason related to market strategy, the company should make new recommendations based on the customer's current purchasing preferences.

4.2.3 Result for the South China Region

Figure 4.8 Levels of Link between Two Products (South China)

Figure 4.8 shows the strength of the connections between two flavors. Cumin BBQ Flavor, Yolk Flavor, Pork Steak Flavor and Chicken Flavor all have strong connections to each other.

Table 4.8 Rule Statistics (South China)

Measurements         Minimum    Maximum    Mean      Stand. Deviation
Condition Support    11.76%     58.82%     30.29%    8.70%
Confidence           90.00%     100.00%    96.59%    4.14%
Rule Support         11.76%     52.94%     29.07%    7.97%
Lift                 1.53       3.09       1.95      0.36
Deployability        0.00%      5.88%      1.21%     1.47%

Number of Rules: 383

Table 4.9 Information for Most Frequent Items (South China)

Item name             Records (%)    Conditions (%)    Predictions (%)
Sauced beef flavor    58.82          10.18             10.18
spices salt           58.82          8.36              4.18
Kimchi flavor         58.82          11.75             9.92

Cheese corn flavor    55.88          12.27             9.40
Curry beef            55.88          8.62              2.61
Tomato flavor         52.94          10.70             4.70
BBQ flavor            52.94          10.97             10.44
Pepper beef steak     52.94          9.66              4.44
Hot Sichuan flavor    52.94          9.40              8.09
Crab flavor           50.00          10.44             2.09
Chives flavor         50.00          12.27             10.18
Chicken flavor        41.18          19.58             8.36
Cumin BBQ             38.24          19.84             7.57
Yolk                  38.24          19.32             5.48
Pork steak            32.35          20.10             2.35

Table 4.8 shows the basic statistics of the 383 rules. The support ranges from 11.8% to 58.8%, the confidence ranges from 90% to 100%, and all rules have a lift higher than 1. Based on Table 4.9, Sauced Beef Flavor, Hot & Spice Salt Flavor and Kimchi Flavor each appear in 58.82% of the records. In the predictions, however, BBQ Flavor has the highest percentage of rules (10.44%).

Table 4.10 Most Interesting Association Rule for South China (2013-2015)

Rule       Antecedent                               Consequent
Rule 1     Pork Steak Flavor and Crab Flavor        Chicken Flavor
Rule 2     Curry Beef Flavor and Chicken Flavor     Cheese Corn Flavor
Rule 3     Yolk Flavor and Crab Flavor              BBQ Flavor
Rule 4     Pepper Beef Steak and Chicken Flavor     Kimchi Flavor
Rule 5     Cumin BBQ and Pepper Beef Steak          BBQ Flavor
Rule 6     Yolk Flavor and Tomato Flavor            Sauced Beef Flavor
Rule 7     Hot & Spice Salt and Chives Flavor       Kimchi Flavor
Rule 8     Pork Steak Flavor and Crab Flavor        Chives Flavor
Rule 9     Hot & Spice Salt and Chicken Flavor      Cheese Corn Flavor
Rule 10    Crab Flavor and Cumin BBQ Flavor         BBQ Flavor

Table 4.10 lists the top ten interesting association rules for the South China region. As in section 4.2.1, this result is based on apriori rule sets generation and analysis. (The full result is available in Appendix B, Table 7.) Following the high-confidence and low-support conditions, we picked the high-probability rules for high-prediction products. The full results of rule sets generation and scoring are available in Appendix B, Table 8. The scores in tabular format are also available in Appendix B, Table 9.

High affinity products:

1. If a customer purchases Pork Steak Flavor and Crab Flavor at the same time, there is a high probability that the customer will order Chives Flavor. Pork Steak Flavor is one of the flavors that are popular nationwide. It can be used in many different kinds of end products, so purchases of this flavor are more stable than those of others. Crab Flavor is a fast-growing product whose acceptance keeps increasing, especially in South China. Another product with a strong regional preference is Chives Flavor. As a new product with growth potential, Chives Flavor is becoming more popular in the South China region. Its popularity is influenced by dietary habits in South East Asia, which is one of the target markets for cracker manufacturers in the South China region. This rule represents a strong regional preference, which is valuable for the company when making specific market strategies in this region.

2. If a customer purchases Yolk Flavor and Tomato Flavor at the same time, there is a high probability that the customer will order Sauced Beef Flavor. Tomato Flavor has a long history in China and is a common flavor in multiple kinds of snacks. Yolk Flavor, a sweet flavor, is one of the highly accepted products in South China and can be used in puffed food or snacks. There is a market for both of these products in the South China region.
Sauced Beef Flavor is a distinctly salty flavor that used to be popular mainly in North China. However, its acceptance in South China keeps increasing.

This rule also represents the current flavor development trend in China: customer preferences are shifting from sweet to salty.

All these rules are useful for the company in marketing decision making. The hidden information they reveal can be applied in product recommendation and promotion. For instance, when Pork Steak Flavor and Crab Flavor appear in a transaction, there is a very high probability that Chicken Flavor and Chives Flavor also appear. In other words, when a customer orders Pork Steak Flavor and Crab Flavor together, he or she may also want to order Chicken Flavor or Chives Flavor.

4.2.4 Result for the Mid China Region

Figure 4.9 Levels of Link between Two Products (Mid China)

Figure 4.9 represents the different levels of links between two flavors. In the graph, we can see that Chives Flavor, Yolk Flavor and Pork Steak Flavor all have strong links to each other. Cumin BBQ Flavor also has strong links with Chives Flavor, Chicken Flavor, Yolk Flavor and Pork Steak Flavor.

Table 4.11 Rule Statistics (Mid China)

Measurements         Minimum    Maximum    Mean      Stand. Deviation
Condition Support    12.50%     56.25%     31.36%    8.39%
Confidence           90.00%     100.00%    96.95%    4.12%
Rule Support         12.50%     53.13%     30.24%    7.72%
Lift                 1.44       2.67       2         0.39
Deployability        0.00%      3.13%      1.12%     1.50%

Number of Rules: 424

Table 4.12 Information about Most Frequent Items (Mid China)

Item name             Records (%)    Conditions (%)    Predictions (%)
Sauced beef flavor    62.50          10.85             8.49
Cheese corn flavor    62.50          11.08             4.25
spices salt           59.38          7.31              4.72
Kimchi flavor         59.38          12.74             11.79
Hot sichuan           59.38          11.32             9.91
Curry beef            56.25          6.60              2.36
Pepper beef steak     53.13          8.02              4.01
Tomato flavor         50.00          9.20              1.89
BBQ flavor            46.88          15.09             11.79
Crab flavor           43.75          13.44             0.00
Chicken flavor        43.75          16.27             11.79
Cumin BBQ             40.63          18.87             10.38
Yolk                  40.63          16.98             7.78
Pork steak            37.50          17.69             5.42
Chives flavor         37.50          17.69             5.42

Table 4.11 shows the basic statistics. The apriori analysis produced 424 rules. The support ranges from 12.5% to 56.25%, the confidence ranges from 90% to 100%, and all rules have a lift higher than 1. Table 4.12 gives the information about the most frequent flavors. Sauced Beef Flavor, together with Cheese Corn Flavor, has the highest appearance percentage among all 15 flavors. However, the flavors with the highest prediction percentage are Kimchi Flavor, BBQ Flavor and Chicken Flavor.

Table 4.13 Most Interesting Association Rule for Mid Region (2013-2015)

Rule       Antecedent                                 Consequent
Rule 1     Curry Beef and BBQ Flavor                  Chicken Flavor
Rule 2     Pepper Beef Steak and Cumin BBQ Flavor     BBQ Flavor
Rule 3     Hot & Spice Salt and Cumin BBQ             Kimchi Flavor
Rule 4     Pork Steak and Crab Flavor                 Cumin BBQ Flavor
Rule 5     Chives Flavor and Crab Flavor              Kimchi Flavor
Rule 6     Crab Flavor and Yolk Flavor                Hot & Spice Sichuan
Rule 7     Pork Steak Flavor and Crab Flavor          BBQ Flavor
Rule 8     Curry Beef and Chicken Flavor              BBQ Flavor
Rule 9     Hot & Spice Salt and BBQ Flavor            Chicken Flavor
Rule 10    Crab Flavor and Yolk Flavor                Cumin BBQ Flavor

Table 4.13 shows the most interesting association rules for the Mid China region. As in section 4.2.1, apriori rule sets analysis was used to generate it. (The full table is available in Appendix B, Table 10.) After sorting the results by highest confidence and lowest support, we picked the rules by considering the prediction percentage. The full results of rule sets generation and scoring are available in Appendix B, Table 11. The scores in tabular format are also available in Appendix B, Table 12.

High affinity products:

1. If a customer purchases Crab Flavor and Pork Steak Flavor at the same time, there is a high probability that the customer will order BBQ Flavor. Both Pork Steak Flavor and BBQ Flavor are typical salty flavors. Crab Flavor, as a fast-growing flavor, becomes more and more

popular in multiple regions. All three flavors are often used in nuts and beans products. Moreover, the Mid China region has plenty of small manufacturers of nuts and beans snacks, so this region has a wider nut snack market than other regions. This rule can be used to target the nut snack manufacturers in the Mid China region.

2. If a customer purchases Yolk Flavor and Crab Flavor at the same time, there is a high probability that the customer will order Hot & Spice Sichuan Flavor. Yolk Flavor and Crab Flavor have been commonly used in snack production in the past few years. Even in the Mid China region, especially Hunan and Hubei provinces, which have a fondness for spicy flavors, these two flavors are still growing. Hot & Spice Sichuan Flavor, in turn, is a flavor with a very strong regional preference. In the past it could only be used in jerky or meat products because of technological limitations. However, today's production technology provides more opportunities for this flavor. Several manufacturers in Mid China have already started to launch new nut and even puffed products with Hot & Spice Sichuan Flavor to satisfy the end customers in this region. Manufacturers that focus on snack production in the Mid China region therefore have a high probability of purchasing these three products at the same time.

By considering the rules, the salesman can provide more targeted recommendations to customers. For instance, if a customer orders Yolk Flavor and Crab Flavor at the same time, the salesman can recommend two flavors to him or her, Hot & Spice Sichuan Flavor and Cumin BBQ Flavor, based on the rules.
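The recommendation step described above can be sketched as a simple rule lookup in Python. The antecedents and consequents below are transcribed from Table 4.13, with the word Flavor dropped from the names for brevity; the function name recommend and the data layout are hypothetical, and the sketch ignores rule confidence because Table 4.13 does not list it.

```python
# Rules transcribed from Table 4.13 (Mid China): (antecedent set, consequent).
MID_CHINA_RULES = [
    ({"Curry Beef", "BBQ"}, "Chicken"),
    ({"Pepper Beef Steak", "Cumin BBQ"}, "BBQ"),
    ({"Hot & Spice Salt", "Cumin BBQ"}, "Kimchi"),
    ({"Pork Steak", "Crab"}, "Cumin BBQ"),
    ({"Chives", "Crab"}, "Kimchi"),
    ({"Crab", "Yolk"}, "Hot & Spice Sichuan"),
    ({"Pork Steak", "Crab"}, "BBQ"),
    ({"Curry Beef", "Chicken"}, "BBQ"),
    ({"Hot & Spice Salt", "BBQ"}, "Chicken"),
    ({"Crab", "Yolk"}, "Cumin BBQ"),
]


def recommend(order, rules=MID_CHINA_RULES):
    """Return the consequents of every rule whose antecedent is fully contained
    in the order, skipping flavors the customer has already ordered."""
    order = set(order)
    suggestions = []
    for antecedent, consequent in rules:
        if antecedent <= order and consequent not in order and consequent not in suggestions:
            suggestions.append(consequent)
    return suggestions


print(recommend({"Yolk", "Crab"}))
```

For an order containing Yolk and Crab, the lookup returns Hot & Spice Sichuan and Cumin BBQ, which corresponds to Rules 6 and 10 of Table 4.13.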