Discrimination of Civet and Non-civet Coffee by Linear Discriminant Analysis (LDA), Partial Least Squares (PLS-DA), and Orthogonal Projection to Latent Structures (OPLS-DA)

Madelene R. Datinginoo¹, Christine Angelica L. Losañes¹,*, and Rechel G. Arcilla¹
¹ Mathematics Department, De La Salle University
*Corresponding Author: christine.losanes@gmail.com

Abstract: One of the issues faced by coffee traders and consumers is the widespread availability of adulterated civet coffee (kopi luwak) in the market. To address this problem, the industry needs a way to discriminate between civet and non-civet coffee. Metabolomics data from 24 coffee beans were subjected to linear discriminant analysis (LDA), partial least squares discriminant analysis (PLS-DA), and orthogonal projection to latent structures discriminant analysis (OPLS-DA). LDA identified isonicotinic acid, 3-hydroxybenzoic acid, arbutin, and propane-1,3-diol NIST as discriminant markers. PLS-DA, on the other hand, described three factors as responsible for successful class separation, highly represented by (1) sugars and organic acids, (2) aroma acids, and (3) taste acids. Lastly, OPLS-DA showed that isonicotinic acid, 5-aminovaleric acid, beta-glutamic acid, pentitol, and urea were the most significant in discriminating the data. All fitted models yielded misclassification rates of zero. The LDA model exhibited an R² of 88.56%, while the OPLS-DA and PLS-DA models both demonstrated an R²Y of 87.9%. Unlike LDA, PLS-DA is not governed by a set of assumptions. The PLS-DA model also had a higher Q² (62.6%) than the OPLS-DA model (51.5%). Hence, among the three discriminant analyses, PLS-DA is the recommended tool for discriminating between civet and non-civet coffee samples.

Key Words: linear discriminant analysis; partial least squares; orthogonal projection to latent structures; kopi luwak; civet coffee

1. INTRODUCTION

Civet coffee, known as kopi luwak in Indonesia, is considered the world's most expensive coffee (Yee, 2016).
The main reason for its costliness is the process by which it is produced. Palm civets, also known as civet cats, eat the ripest coffee cherries, and during digestion a unique kind of fermentation takes place that gives kopi luwak its special aroma and taste. Because of its high selling potential, some farmers and coffee traders adulterate it to avoid the laborious production it requires. According to Canadian food scientist Massimo Marcone, about 42% of all the kopi luwak presently on sale is either adulterated or completely fake (Watson, 2007). Also, due to the increasing demand for kopi luwak, many farmers have abandoned traditional civet coffee bean collection and resorted to farming civet cats in awful conditions. In fact, PETA Asia has revealed that in several civet coffee farms in Indonesia and the Philippines, palm civets are imprisoned in cages for up to three years, where they are fed an all-coffee diet.

To address these issues, there is a need for a credible and standardized method to assess the authenticity of civet coffee beans. From the large set of metabolomic compounds identified in each coffee bean, statistically significant compounds that differentiate civet from non-civet coffee beans are identified as discriminant markers. For a set of observations consisting of several quantitative variables (metabolomic compound readings) and a classification variable (civet or non-civet), discriminant analysis is the most suitable approach for developing a model that classifies observations into one of the classes. To arrive at optimal results, three discriminant procedures were compared: linear discriminant analysis (LDA), partial least squares discriminant analysis (PLS-DA), and orthogonal projection to latent structures discriminant analysis (OPLS-DA).

Since the study intends to provide a standardized method for assessing the authenticity of civet coffee, the results would benefit coffee traders by helping ensure that authentic, high-quality civet coffee beans are sold in the market. This study also recommends which of the three discriminant procedures yields the best results.

2. CONCEPTUAL FRAMEWORK

2.1 Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) was developed to classify objects into one of c qualitative groups based on a set of measurements x recorded for each observation. A linear combination of these x variables describes the separation between the groups of observations by maximizing the ratio of between-group variance to within-group variance (Okwonu & Othman, 2012).

2.2 Partial Least Squares Discriminant Analysis

Partial Least Squares (PLS) uses a matrix X, re-expressed through scores and loadings, to predict a response variable represented by a matrix Y. It uses the variability in Y together with the variability in X to find the best model. X and Y are related through a score vector t, which serves as the link between them. The goal of the algorithm is to find the t that represents the largest amount of variation in X and Y simultaneously (Barker & Rayens, 2003).

2.3 Orthogonal Projection to Latent Structures Discriminant Analysis

Orthogonal Projection to Latent Structures Discriminant Analysis (OPLS-DA) uses a modified version of the PLS-DA algorithm. Its objective is to remove systematic variation in the predictive components that is orthogonal, that is, unrelated, to the response variable. As a result, it is expected to produce a more parsimonious model that is easier to interpret than PLS-DA (Trygg & Wold, 2002).

3. METHODOLOGY

3.1 Data

The data consisted of measurements for 24 coffee beans, 12 of which were predetermined as civet and the other 12 as non-civet. For each of the two coffee species, Coffea liberica (Liberica) and Coffea canephora (Robusta), six beans were roasted while the rest were unroasted. For each coffee sample, 459 metabolomic compound readings were recorded. For ease of interpretation, statistical analyses were performed only on the 201 known metabolomic compounds.

3.2 Analysis

A significance level of 5% was used in all statistical analyses. Results were generated using SAS 9.3, except for OPLS-DA and the plots for PLS-DA, which were produced in R version 3.3.2.

4. RESULTS AND DISCUSSION

By LDA, four metabolomic compounds were identified as significant discriminant markers. The discriminant criterion described by the four compounds yielded an error count estimate of zero and a posterior probability error rate estimate of 0.0019. This means that only about 2 out of every 1000 samples are expected to be misclassified. Additionally, 88.56% of the variation in class membership can be explained by the LDA model. Results presented in Table 1 suggest that coffee samples with high concentrations of isonicotinic acid tend to be identified as civet coffee. On the other hand, coffee samples with high concentrations of 3-hydroxybenzoic acid, arbutin, and propane-1,3-diol NIST have a higher tendency to be classified as non-civet.

Table 1. Discriminant Criterion for Civet and Non-civet Coffee

Variable                  Coefficient
isonicotinic acid         -0.0000961989
3-hydroxybenzoic acid      0.0000825884
arbutin                    0.0104961376
propane-1,3-diol NIST      0.0104265571

The PLS-DA procedure extracted three factors, from which 16 compounds can be considered discriminant markers. As shown in Fig. 1, the goodness-of-fit statistics R²Y and Q² were 87.9% and 62.6%, respectively. Hence, 87.9% of the response variation and 62.6% of the prediction variation can be explained by the model. Moreover, the RMSEE of 19% indicates a low deviation of the predicted values from the actual values, as it is below 30%. As shown in Table 2, the factors were identified according to the function of their most significant compounds. The 11 compounds for the first factor were labeled sugars and organic acids. The three Factor 2 compounds were identified as aroma acids. Lastly, the two compounds for the third factor were labeled taste acids.

Fig. 1. PLS-DA score plot and model fit statistics

Table 2. Summary of Significant Compounds by PLS-DA Factor

Compound                 Parameter Estimate   Model Effect Loading   VIP
Factor 1 (Sugars and Organic Acids)
isonicotinic acid        -0.0957              -0.1213                2.6922
5-aminovaleric acid       0.0355               0.1195                1.5219
beta-glutamic acid        0.0581               0.1192                1.9160
pentitol                  0.0199               0.1192                1.5353
urea                      0.0234               0.1165                1.7619
threitol                  0.0261               0.1133                1.2240
glucose                   0.0350               0.1129                1.2946
malonic acid              0.0284               0.1074                1.3000
fructose                  0.0319               0.1055                1.2316
gluconic acid             0.0189               0.1050                1.3051
guanosine                 0.0523               0.1045                1.2494
Factor 2 (Aroma Acids)
melezitose               -0.0243              -0.1100                1.0281
enolpyruvate NIST         0.0453               0.1068                1.4388
glutamic acid            -0.0390              -0.1063                1.2585
Factor 3 (Taste Acids)
succinic acid            -0.0540              -0.1811                1.4022
phosphate                -0.0104              -0.1795                1.1552
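The fit statistics reported for the PLS-DA model (R²Y, Q², and RMSEE) can be illustrated with a short sketch. This is not the authors' SAS or R code. It assumes the class is coded 1 for civet and 0 for non-civet, uses made-up fitted and cross-validated predictions, and takes RMSEE as sqrt(RSS / (n - A - 1)) for A components, a convention used by some chemometrics software (others divide by n). The function name fit_statistics is hypothetical.

```python
import math


def fit_statistics(y, y_fit, y_cv, n_components):
    """Return (R2Y, Q2, RMSEE) for a single 0/1-coded response.

    y            : observed class codes (assumed 1 = civet, 0 = non-civet)
    y_fit        : predictions from the model fitted on all samples
    y_cv         : predictions for each sample when it was held out
    n_components : number of extracted factors (A)
    """
    n = len(y)
    y_mean = sum(y) / n
    # Total sum of squares of the response around its mean.
    tss = sum((yi - y_mean) ** 2 for yi in y)
    # Residual sum of squares from the fitted values (drives R2Y and RMSEE).
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))
    # Predictive residual sum of squares from held-out predictions (drives Q2).
    press = sum((yi - ci) ** 2 for yi, ci in zip(y, y_cv))
    r2y = 1.0 - rss / tss
    q2 = 1.0 - press / tss
    # Assumed degrees-of-freedom convention: n - A - 1.
    rmsee = math.sqrt(rss / (n - n_components - 1))
    return r2y, q2, rmsee


# Hypothetical toy data: four civet and four non-civet samples.
y = [1, 1, 1, 1, 0, 0, 0, 0]
y_fit = [0.9, 0.8, 1.1, 0.7, 0.2, -0.1, 0.1, 0.3]
y_cv = [0.8, 0.6, 1.2, 0.5, 0.4, -0.2, 0.2, 0.5]

r2y, q2, rmsee = fit_statistics(y, y_fit, y_cv, n_components=3)
print(f"R2Y = {r2y:.3f}, Q2 = {q2:.3f}, RMSEE = {rmsee:.3f}")
```

Because Q² is computed from held-out predictions, it is normally lower than R²Y, which matches the pattern in the reported values (87.9% versus 62.6%).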

By OPLS-DA, five discriminant compounds were found. As shown in Fig. 2, the resulting model had an R²Y and Q² of 87.9% and 51.5%, respectively. The RMSEE of 19% for OPLS-DA indicates good predictive ability of the model.

Fig. 2. OPLS-DA score plot and model fit statistics

Although LDA was easier to implement in SAS and provided straightforward interpretations, its results may be biased because of a possible violation of multivariate normality. It must be noted that the discriminant markers from OPLS-DA were the same markers that had the highest loadings on Factor 1 of PLS-DA. However, since PLS-DA resulted in a higher Q², it is recommended over OPLS-DA for discriminating between civet and non-civet coffee.

5. CONCLUSIONS

The three forms of discriminant analysis discussed in this paper successfully identified markers that discriminate between civet and non-civet coffee beans. LDA had low error count estimates and posterior probability error rates using isonicotinic acid, 3-hydroxybenzoic acid, arbutin, and propane-1,3-diol NIST as discriminant compounds. From PLS-DA, the three significant factors were described by the following 16 compounds: isonicotinic acid, 5-aminovaleric acid, beta-glutamic acid, pentitol, urea, threitol, glucose, malonic acid, fructose, gluconic acid, guanosine, melezitose, enolpyruvate NIST, glutamic acid, succinic acid, and phosphate. The model was considered acceptable based on the high R²Y and Q² statistics as well as the low RMSEE. Similarly, OPLS-DA produced an acceptable model for classification based on R²Y, Q², and RMSEE. The five compounds identified by the model as possible discriminant markers were isonicotinic acid, 5-aminovaleric acid, beta-glutamic acid, pentitol, and urea. Lastly, since the primary objective of the study is the discrimination between civet and non-civet coffee, PLS-DA was found to be the most appropriate multivariate analysis based on its zero misclassification rate, high R²Y, higher Q² compared to OPLS-DA, and low RMSEE.

6. ACKNOWLEDGMENTS

The researchers would like to extend their gratitude to Dr. Emmanuel Garcia from the Chemistry Department of De La Salle University Manila for providing the metabolomics data utilized in this study.

7. REFERENCES

Barker, M., & Rayens, W. (2003). Partial least squares for discrimination. Journal of Chemometrics, 17, 166-173. doi:10.1002/cem.785

Brereton, R., & Lloyd, G. (2013). Partial least squares discriminant analysis: taking the magic away. Journal of Chemometrics, 28, 213-225. doi:10.1002/cem.2609

Briandet, R., Kemsley, E.K., & Wilson, R. (1996). Discrimination of arabica and robusta in instant coffee by Fourier transform infrared spectroscopy and chemometrics. Journal of Agricultural and Food Chemistry, 44, 170-174. doi:10.1021/jf950305a

Geladi, P., & Kowalski, B. (1986). Partial least-squares regression: a tutorial. Analytica Chimica Acta, 185, 1-17. doi:10.1016/0003-2670(86)80028-9

Gromski, P.S., Muhamadali, H., Ellis, D.I., Xu, Y., Correa, E., Turner, M.L., & Goodacre, R. (2015). A tutorial review: Metabolomics and partial least squares-discriminant analysis - a marriage of convenience or a shotgun wedding. Analytica Chimica Acta, 879, 10-23. doi:10.1016/j.aca.2015.02.012

Okwonu, F., & Othman, A. (2012). Comparative performance of classical Fisher linear discriminant analysis and robust Fisher linear discriminant analysis. Paper presented at the 1st ISM International Statistical Conference, Malaysia. Retrieved from https://www.researchgate.net/publication/307546624_Comparative_Performance_of_Classical_Fisher_linear_discriminant_analysis_and_robust_fisher_linear_discriminant_analysis

Trygg, J., & Wold, S. (2002). Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics, 16, 119-128. doi:10.1002/cem.695