Managing many models
February 2016
Hadley Wickham (@hadleywickham), Chief Scientist, RStudio

There are 7 key components of data science: Import, Tidy, Transform, Visualise, Model, Communicate, Automate.
[Diagram of the seven components, with an "Understand" label.]

Today I want to focus on understanding: exploratory data analysis.

Gapminder data

[Figure: lifeexp (y-axis, roughly 40 to 80) against year (x-axis, 1950 to 2000) for 142 countries.]

One way to handle this is to fit a model to each country
New Zealand:
    year  lifeexp
    1952  69.4
    1957  70.3
    1962  71.2
    1967  71.5
    ...   ...
Model:
    lm(lifeexp ~ year, data = nz)
glance:  R² = 0.95
tidy:    Intercept -307.7, Slope 0.19
augment:
    year  resid
    1952   0.70
    1957   0.61
    1962   0.63
    1967  -0.05
    ...    ...
Broom, by David Robinson, makes this easy!
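A minimal sketch of the three broom verbs named on this slide. It assumes the broom package is installed and uses only the four New Zealand rows shown above, so the fitted numbers will not match the slide's values, which come from the full series.

```r
library(broom)

# Assumption: only the four rows visible on the slide, not the full data set.
nz <- data.frame(
  year    = c(1952, 1957, 1962, 1967),
  lifeexp = c(69.4, 70.3, 71.2, 71.5)
)

mod <- lm(lifeexp ~ year, data = nz)

model_summary <- glance(mod)   # one row per model (r.squared, sigma, ...)
coefs         <- tidy(mod)     # one row per term: (Intercept) and year
observations  <- augment(mod)  # one row per observation (.fitted, .resid, ...)
```

Each result is an ordinary data frame, which is what makes it practical to collect the output of many models later.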

To do that for many countries, we need a list of data frames
    Country      Year  LifeExp
    Afghanistan  1952  28.9
    Afghanistan  1957  30.3
    Afghanistan  ...   ...
    Albania      1952  55.2
    Albania      1957  59.3
    Albania      ...   ...
    Algeria      ...   ...
    ...          ...   ...

A nested data frame has one row per group
    Country      Data
    Afghanistan  <data>
    Albania      <data>
    Algeria      <data>
    ...          <data>
Each <data> entry is itself a data frame:
    Afghanistan        Albania
    Year  LifeExp      Year  LifeExp
    1952  28.9         1952  55.2
    1957  30.3         1957  59.3
    ...   ...          ...   ...
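One way to build such a nested data frame, sketched with tidyr. The tiny `gap` table below is an assumption: it holds only the four rows visible on the slides and stands in for the full Gapminder data.

```r
library(dplyr)
library(tidyr)

# Assumption: a four-row stand-in for the Gapminder data.
gap <- tibble(
  country = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  year    = c(1952, 1957, 1952, 1957),
  lifeexp = c(28.9, 30.3, 55.2, 59.3)
)

# One row per country; the remaining columns move into a list-column `data`.
by_country <- gap %>%
  group_by(country) %>%
  nest()

# Each element of the list-column is a data frame for one country.
by_country$data[[1]]
```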

We can use purrr::map() to fit each model
    map(by_country$data, ~ lm(lifeexp ~ year1950, data = .))
    Afghanistan  <data>  ->  lm(lifeexp ~ year1950, data = afghanistan)
    Albania      <data>  ->  lm(lifeexp ~ year1950, data = albania)
    Algeria      <data>  ->  ...
    ...          <data>  ->  ...
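A runnable sketch of the same idea on a toy data set. Two things are assumptions here: the four-row `gap` table stands in for the real data, and `year1950` is taken to mean years since 1950. With only two points per country the fits are exact, so this illustrates the mechanics rather than the modelling.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Assumption: four-row stand-in data; year1950 = years since 1950.
gap <- tibble(
  country = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  year    = c(1952, 1957, 1952, 1957),
  lifeexp = c(28.9, 30.3, 55.2, 59.3)
) %>%
  mutate(year1950 = year - 1950)

by_country <- gap %>%
  group_by(country) %>%
  nest()

# Hypothetical helper: the model fitted to each country's data frame.
fit_country <- function(df) lm(lifeexp ~ year1950, data = df)

# One lm object per country, in the same order as by_country$data.
models <- map(by_country$data, fit_country)

# Pull one number out of each model: the yearly change in life expectancy.
slopes <- map_dbl(models, function(m) coef(m)[["year1950"]])
```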

Why for loops are "bad" (or rather, suboptimal)
A digression with cupcakes

Vanilla cupcakes
The Hummingbird Bakery Cookbook
    1 cup flour
    a scant ¾ cup sugar
    1 ½ t baking powder
    3 T unsalted butter
    ½ cup whole milk
    1 egg
    ¼ t pure vanilla extract
Preheat oven to 350°F. Put the flour, sugar, baking powder, salt, and butter in a freestanding electric mixer with a paddle attachment and beat on slow speed until you get a sandy consistency and everything is combined. Whisk the milk, egg, and vanilla together in a pitcher, then slowly pour about half into the flour mixture, beat to combine, and turn the mixer up to high speed to get rid of any lumps. Turn the mixer down to a slower speed and slowly pour in the remaining milk mixture. Continue mixing for a couple of more minutes until the batter is smooth but do not overmix. Spoon the batter into paper cases until 2/3 full and bake in the preheated oven for 20-25 minutes, or until the cake bounces back when touched.

Chocolate cupcakes
The Hummingbird Bakery Cookbook
    ¾ cup + 2T flour
    2 ½ T cocoa powder
    a scant ¾ cup sugar
    1 ½ t baking powder
    3 T unsalted butter
    ½ cup whole milk
    1 egg
    ¼ t pure vanilla extract
Preheat oven to 350°F. Put the flour, cocoa, sugar, baking powder, salt, and butter in a freestanding electric mixer with a paddle attachment and beat on slow speed until you get a sandy consistency and everything is combined. Whisk the milk, egg, and vanilla together in a pitcher, then slowly pour about half into the flour mixture, beat to combine, and turn the mixer up to high speed to get rid of any lumps. Turn the mixer down to a slower speed and slowly pour in the remaining milk mixture. Continue mixing for a couple of more minutes until the batter is smooth but do not overmix. Spoon the batter into paper cases until 2/3 full and bake in the preheated oven for 20-25 minutes, or until the cake bounces back when touched.

For loops bury the lede
    df <- data.frame(...)
    means <- double(ncol(df))
    for (i in seq_along(df)) {
      means[[i]] <- mean(df[[i]], na.rm = TRUE)
    }
    medians <- double(ncol(df))
    for (i in seq_along(df)) {
      medians[[i]] <- median(df[[i]], na.rm = TRUE)
    }

1. Convert units
Vanilla cupcakes
The Hummingbird Bakery Cookbook
    120g flour
    140g sugar
    1.5 t baking powder
    40g unsalted butter
    120ml milk
    1 egg
    0.25 t pure vanilla extract
Preheat oven to 170°C. Put the flour, sugar, baking powder, salt, and butter in a freestanding electric mixer with a paddle attachment and beat on slow speed until you get a sandy consistency and everything is combined. Whisk the milk, egg, and vanilla together in a pitcher, then slowly pour about half into the flour mixture, beat to combine, and turn the mixer up to high speed to get rid of any lumps. Turn the mixer down to a slower speed and slowly pour in the remaining milk mixture. Continue mixing for a couple of more minutes until the batter is smooth but do not overmix. Spoon the batter into paper cases until 2/3 full and bake in the preheated oven for 20-25 minutes, or until the cake bounces back when touched.

2. Rely on domain knowledge
Vanilla cupcakes
The Hummingbird Bakery Cookbook
    120g flour
    140g sugar
    1.5 t baking powder
    40g butter
    120ml milk
    1 egg
    0.25 t vanilla
Beat flour, sugar, baking powder, salt, and butter until sandy. Whisk milk, egg, and vanilla. Mix half into flour mixture until smooth (use high speed). Beat in remaining half. Mix until smooth. Bake 20-25 min at 170°C.

For loops emphasise the data
    df <- data.frame(...)
    means <- double(ncol(df))
    for (i in seq_along(df)) {
      means[[i]] <- mean(df[[i]], na.rm = TRUE)
    }
    medians <- double(ncol(df))
    for (i in seq_along(df)) {
      medians[[i]] <- median(df[[i]], na.rm = TRUE)
    }

Purrr emphasises the action
    library(purrr)
    means <- map_dbl(df, mean)
    medians <- map_dbl(df, median)
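To see that the two styles compute the same thing, here is a small side-by-side sketch. The built-in `mtcars` data set is an assumption, standing in for the elided `df`.

```r
library(purrr)

df <- mtcars  # assumption: any all-numeric data frame would do

# For-loop version: allocate the output, then fill it column by column.
means_loop <- double(ncol(df))
for (i in seq_along(df)) {
  means_loop[[i]] <- mean(df[[i]], na.rm = TRUE)
}

# purrr version: the verb (mean) is the first thing you read.
means_map <- map_dbl(df, mean, na.rm = TRUE)
```

The only difference in the result is that `map_dbl()` keeps the column names, while the loop leaves the vector unnamed.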

3. Use variables
Vanilla cupcakes
The Hummingbird Bakery Cookbook
    120g flour
    140g sugar
    1.5 t baking powder
    40g butter
    120ml milk
    1 egg
    0.25 t vanilla
Beat dry ingredients + butter until sandy. Whisk together wet ingredients. Mix half into dry until smooth (use high speed). Beat in remaining half. Mix until smooth. Bake 20-25 min at 170°C.

4. Extract out common code
Cupcakes
Beat dry ingredients + butter until sandy. Whisk together wet ingredients. Mix half into dry until smooth (use high speed). Beat in remaining half. Mix until smooth. Bake 20-25 min at 170°C.
    Vanilla                Chocolate
    120g flour             100g flour
                           20g cocoa
    140g sugar             140g sugar
    1.5t baking powder     1.5t baking powder
    40g butter             40g butter
    120ml milk             120ml milk
    1 egg                  1 egg
    0.25 t vanilla         0.25 t vanilla

Similarly, purrr lets you create more complex recipes
    df <- data.frame(...)
    col_sum <- function(df, f) {
      df %>%
        keep(is.numeric) %>%
        map_dbl(f)
    }
    means <- col_sum(df, mean)
    medians <- col_sum(df, median)

Similarly, purrr lets you create more complex recipes
    df <- data.frame(...)
    col_sum <- function(df, f) {
      map_dbl(keep(df, is.numeric), f)
    }
    means <- col_sum(df, mean)
    medians <- col_sum(df, median)
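A runnable version of this recipe, as a sketch. The built-in `iris` data set is an assumption chosen because it mixes four numeric columns with one factor column, so the `keep()` step has something to drop.

```r
library(purrr)

# Summarise every numeric column of df with the function f.
col_sum <- function(df, f) {
  map_dbl(keep(df, is.numeric), f)
}

# iris has four numeric columns and one factor (Species), which is skipped.
means   <- col_sum(iris, mean)
medians <- col_sum(iris, median)
```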

5. Store as data
Cupcakes
                Flour  Baking powder  Sugar  Butter  Egg  Extra
    Vanilla     120    1.5            140    40      1    0.25t vanilla
    Chocolate   100    1.5            140    40      1    20g cocoa, 0.25t vanilla
    Lemon       120    1.5            140    40      1    2T lemon zest
    Red velvet  150    0              150    60      1    10g cocoa, 20ml red colouring, 1.5t vinegar, 0.5t baking soda

In R, we can store functions in lists
    funs <- list(
      mean = mean,
      median = median,
      sd = sd
    )
    map(funs, col_sum, df = df)
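The same call, made runnable as a sketch. It repeats a `col_sum()` helper so the block stands alone, and uses the built-in `iris` data set as an assumed stand-in for `df`. Because `df` is passed by name, each function in `funs` lands in the `f` argument of `col_sum()`.

```r
library(purrr)

# Summarise every numeric column of df with the function f.
col_sum <- function(df, f) {
  map_dbl(keep(df, is.numeric), f)
}

# Functions are ordinary objects, so they can be stored in a named list.
funs <- list(
  mean   = mean,
  median = median,
  sd     = sd
)

# One named numeric vector per summary function.
summaries <- map(funs, col_sum, df = iris)
```

`summaries$sd`, for example, then holds the standard deviation of each numeric column.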

Back to gapminder

We can use purrr::map() to fit each model
    map(by_country$data, ~ lm(lifeexp ~ year1950, data = .))
    Afghanistan  <data>  ->  lm(lifeexp ~ year1950, data = afghanistan)
    Albania      <data>  ->  lm(lifeexp ~ year1950, data = albania)
    Algeria      <data>  ->  ...
    ...          <data>  ->  ...

    map(by_country$data, ~ lm(lifeexp ~ year1950, data = .))
    # same as
    out <- vector("list", length(by_country$data))
    for (i in seq_along(by_country$data)) {
      df <- by_country$data[[i]]
      out[[i]] <- lm(lifeexp ~ year1950, data = df)
    }

Multiple lists make it easy to lose context.
So use a data frame!
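A sketch of what "use a data frame" can look like: the models go into a list-column next to the country they belong to, so nothing has to be matched up by position. The four-row `gap` table and the meaning of `year1950` (years since 1950) are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Assumption: four-row stand-in data; year1950 = years since 1950.
gap <- tibble(
  country = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  year    = c(1952, 1957, 1952, 1957),
  lifeexp = c(28.9, 30.3, 55.2, 59.3)
) %>%
  mutate(year1950 = year - 1950)

# Hypothetical helpers for fitting and summarising one country.
fit_country <- function(df) lm(lifeexp ~ year1950, data = df)
get_slope   <- function(m) coef(m)[["year1950"]]

# Data, model and summary for a country all live in the same row.
by_country <- gap %>%
  group_by(country) %>%
  nest() %>%
  mutate(
    model = map(data, fit_country),
    slope = map_dbl(model, get_slope)
  )
```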

Unnesting is the reverse of nesting
nest() turns the long table
    Country      Year  LifeExp
    Afghanistan  1952  28.9
    Afghanistan  1957  30.3
    Afghanistan  ...   ...
    Albania      1952  55.2
    Albania      1957  59.3
    Albania      ...   ...
    Algeria      ...   ...
into the nested table
    Country      Data
    Afghanistan  <data>
    Albania      <data>
    Algeria      <data>
    ...          <data>
and unnest() turns it back.
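A short round-trip sketch with tidyr, using an assumed four-row stand-in for the Gapminder data: nesting and then unnesting should give back the original rows.

```r
library(dplyr)
library(tidyr)

# Assumption: a four-row stand-in for the Gapminder data.
gap <- tibble(
  country = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  year    = c(1952, 1957, 1952, 1957),
  lifeexp = c(28.9, 30.3, 55.2, 59.3)
)

# Long form to one row per country...
nested <- gap %>%
  group_by(country) %>%
  nest()

# ...and back to one row per country and year.
flat <- nested %>%
  unnest(data) %>%
  ungroup()
```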

Cross-validation

[Diagram, built up over four slides: the Original data is split into Training and Test sets; a Model is fitted to the Training set; the Model is used to Predict on the Test set; the predictions are given a Score.]

         Test  Training  Model  Prediction  Score
    1    df    df        lm     vector      number
    2    df    df        lm     vector      number
    3    df    df        lm     vector      number
    4    df    df        lm     vector      number
    ...  ...   ...       ...    ...         ...

    crossv <- partition(mtcars, 100, c(
      test = 0.2,
      training = 0.8
    ))
    crossv <- crossv %>%
      mutate(
        # Fit the models
        model = map(training, ~ lm(mpg ~ wt, data = .)),
        # Make predictions on test data
        pred = map2(model, test, predict),
        # Evaluate difference between predicted and actual
        diff = map2_dbl(pred, test %>% map("mpg"), msd)
      )
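The slide does not show where `partition()` and `msd()` come from, so the sketch below replaces them with hand-written helpers. Both `split_once()` and this definition of `msd()` (mean squared difference) are assumptions made for illustration, not the functions used in the talk. The rest follows the same steps: split, fit, predict, score.

```r
library(dplyr)
library(purrr)

set.seed(2016)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical helper: one random training/test split of a data frame.
split_once <- function(df, test_frac = 0.2) {
  test_rows <- sample(nrow(df), size = round(nrow(df) * test_frac))
  list(training = df[-test_rows, ], test = df[test_rows, ])
}

# Assumed definition of msd: mean squared difference.
msd <- function(pred, actual) mean((pred - actual)^2)

# Hypothetical helper: the model fitted to each training set.
fit_model <- function(df) lm(mpg ~ wt, data = df)

# 100 independent random splits of mtcars.
splits <- map(seq_len(100), function(i) split_once(mtcars))

crossv <- tibble(
  training = map(splits, "training"),
  test     = map(splits, "test")
) %>%
  mutate(
    model = map(training, fit_model),           # fit on training data
    pred  = map2(model, test, predict),         # predict on test data
    diff  = map2_dbl(pred, map(test, "mpg"), msd)  # score each split
  )
```

Every row of `crossv` now carries its own training set, test set, model, predictions and score, matching the table of list-columns shown earlier.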

Conclusion

1. Store related objects in list-columns.
2. Learn FP so you can focus on verbs, not objects.
3. Use broom to convert models to tidy data.

[Diagram: dplyr works with data frames and purrr with lists; broom links models to data frames, and tidyr links data frames to lists.]

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/