LASSO for Poisson Regression

Published February 16, 2025

Agenda

  • Overview
  • Bike data
  • Using cv.glmnet directly
  • Tuning in tidymodels
  • Comparing to linear regression
  • Using a different dataset

Overview

I am tuning a Poisson regression model with LASSO and am confused by the results compared with using cv.glmnet directly.

I am hoping someone can explain to me what it is that I do not understand.

library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
✔ broom        1.0.7     ✔ recipes      1.1.0
✔ dials        1.3.0     ✔ rsample      1.2.1
✔ dplyr        1.1.4     ✔ tibble       3.2.1
✔ ggplot2      3.5.1     ✔ tidyr        1.3.1
✔ infer        1.0.7     ✔ tune         1.2.1
✔ modeldata    1.4.0     ✔ workflows    1.1.4
✔ parsnip      1.2.1     ✔ workflowsets 1.1.0
✔ purrr        1.0.2     ✔ yardstick    1.3.1
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ purrr::discard() masks scales::discard()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
✖ recipes::step()  masks stats::step()
• Learn how to get started at https://www.tidymodels.org/start/
library(glmnetUtils)
library(poissonreg)
tidymodels_prefer()

Bike data

We use the bikeshare data as given in ISLR2

data(Bikeshare, package = "ISLR2")

head(Bikeshare)
  season mnth day hr holiday weekday workingday   weathersit temp  atemp  hum
1      1  Jan   1  0       0       6          0        clear 0.24 0.2879 0.81
2      1  Jan   1  1       0       6          0        clear 0.22 0.2727 0.80
3      1  Jan   1  2       0       6          0        clear 0.22 0.2727 0.80
4      1  Jan   1  3       0       6          0        clear 0.24 0.2879 0.75
5      1  Jan   1  4       0       6          0        clear 0.24 0.2879 0.75
6      1  Jan   1  5       0       6          0 cloudy/misty 0.24 0.2576 0.75
  windspeed casual registered bikers
1    0.0000      3         13     16
2    0.0000      8         32     40
3    0.0000      5         27     32
4    0.0000      3         10     13
5    0.0000      0          1      1
6    0.0896      0          1      1
?ISLR2::Bikeshare

Skipping over the EDA, we will make only this change to the dataset.

Bikeshare <- Bikeshare %>% 
            mutate(season = factor(season)) %>% 
            mutate(season = recode(season, "1" = "Winter","2" = "Spring",
                                  "3" = "Summer","4" = "Fall" ))

We note that some features are factors and some are numerical. We are going to attempt to predict bikers. We suspect Poisson regression to be appropriate since the number of bikers can only be a non-negative integer and, under the conditions of a specific day, we will assume that bike usage follows a Poisson process.

So we learn from the EDA that we cannot use registered and casual to predict bikers, because that’s cheating!
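One quick check worth running first (this snippet is my addition, not part of the original EDA): a Poisson model assumes the mean and variance of the response are comparable, so we can compare them for bikers. A large gap would hint at overdispersion.

# Poisson assumes the mean and variance are comparable; a big gap suggests overdispersion
mean(Bikeshare$bikers)
var(Bikeshare$bikers)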

First things first, we will split training and test and create a cross folds object (that we will use later).

Splits

set.seed(212)

bike_split <- initial_split(Bikeshare, prop = .7)

bike_train <- training(bike_split)
bike_test <-  testing(bike_split)


bike_folds <- vfold_cv(bike_train, v = 6, repeats = 6)

Modeling with glmnet

bike_cv_fits <- cv.glmnet(formula = bikers ~ . - registered - casual,
                          data = bike_train, family = poisson)

Taking a look at how this turns out.

plot(bike_cv_fits$lambda, bike_cv_fits$cvm)

Hmm, let’s see if we can tune over small lambda values (though it seems no regularization at all might be better).

bike_cv_fits <- cv.glmnet(formula = bikers ~ . - registered - casual,
                          lambda = seq(.005, .3, .005),
                          data = bike_train, family = poisson)

plot(bike_cv_fits$lambda, bike_cv_fits$cvm)

So we see the best value of lambda is about 0.07.

bike_cv_fits$lambda.min
[1] 0.07

We can also see what impact the lambda values have on the number of nonzero coefficients.

plot(bike_cv_fits$lambda, bike_cv_fits$nzero)
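As an aside (this snippet is my addition), we can also pull the coefficients at the selected penalty straight from the cv.glmnet object to see which predictors LASSO kept:

# Coefficients at lambda.min; predictors LASSO dropped show as "."
coef(bike_cv_fits, s = "lambda.min")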

We also make predictions and get a benchmark against the test set.

bike_fit1 <- glmnet(formula = bikers ~ . - registered - casual,
                    lambda = bike_cv_fits$lambda.min,
                    data = bike_train, family = poisson)
results <- data.frame(bikers = bike_test$bikers, 
                      preds1 = as.vector(predict(bike_fit1, newdata = bike_test,
                                                 type = "response")))

And we can now compute metrics on this, e.g.

poisson_log_loss(results, truth = bikers, estimate = preds1)
# A tibble: 1 × 3
  .metric          .estimator .estimate
  <chr>            <chr>          <dbl>
1 poisson_log_loss standard        15.9
rmse(results, truth = bikers, estimate = preds1)
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard        68.1
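As a convenience (my addition, using the standard yardstick API), both metrics can be bundled into a single metric set and computed in one call:

# metric_set() returns a function that behaves like a single metric
bike_metrics <- metric_set(poisson_log_loss, rmse)
bike_metrics(results, truth = bikers, estimate = preds1)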

Tidymodels

Next we attempt to tune this using tidymodels.

Create a workflow

# model spec


pr_model <- poisson_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet") 


# recipe

bikerec <- bike_train %>% 
            recipe(bikers ~ .) %>%
            step_rm(registered) %>% 
            step_rm(casual) %>% 
            step_normalize(all_numeric_predictors()) %>% 
            step_dummy(all_factor_predictors())


# Make workflow

bike_WF <- workflow() %>% 
          add_model(pr_model) %>% 
          add_recipe(bikerec)
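Before tuning, a quick sanity check (my addition) is to prep and bake the recipe to see the design matrix glmnet will actually receive:

# Inspect the processed predictors: normalized numerics plus dummy columns
bikerec %>% prep() %>% bake(new_data = NULL) %>% glimpse()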

Now we make our tuning grid.

tune_params <- extract_parameter_set_dials(pr_model)

tune_grid <- grid_regular(tune_params, levels = c(penalty = 40))
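Note that the default dials range for the penalty runs from 1e-10 to 1 on a log10 scale, which we can confirm directly:

# Default penalty() object: log10 transform, transformed range [-10, 0]
penalty()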

Our intuition is that the smaller penalties will be best (based on the above). However, let’s see what happens.

ncores <- parallel::detectCores() - 2

library(doParallel)
Loading required package: foreach

Attaching package: 'foreach'
The following objects are masked from 'package:purrr':

    accumulate, when
Loading required package: iterators
Loading required package: parallel
cl <- makePSOCKcluster(ncores)
registerDoParallel(cl)
bike_tuning <-  tune_grid(bike_WF, resamples = bike_folds, grid = tune_grid,
                          metrics = metric_set(rmse, poisson_log_loss ))

We then see what is happening.

show_best(bike_tuning, metric = "poisson_log_loss")
# A tibble: 5 × 7
  penalty .metric          .estimator  mean     n std_err .config              
    <dbl> <chr>            <chr>      <dbl> <int>   <dbl> <chr>                
1  1      poisson_log_loss standard    17.2    36   0.135 Preprocessor1_Model40
2  0.307  poisson_log_loss standard   401.     36   2.29  Preprocessor1_Model38
3  0.170  poisson_log_loss standard   401.     36   2.29  Preprocessor1_Model37
4  0.554  poisson_log_loss standard   401.     36   2.29  Preprocessor1_Model39
5  0.0943 poisson_log_loss standard   401.     36   2.29  Preprocessor1_Model36

This is my puzzle: only the largest penalty in the grid gets a sensible loss, while every smaller penalty shown collapses to a mean of roughly 401. But I can change the tuning grid as follows.

tune_grid2 <- tibble(penalty = seq(.5, 1.5, .1))

tune_grid3 <- tibble(penalty = seq(1.0, 3.0, .5))
bike_WF %>% 
        tune_grid( resamples = bike_folds, grid = tune_grid2, 
                   metrics = metric_set( poisson_log_loss )) %>% 
    show_best()
Warning in show_best(.): No value of `metric` was given; "poisson_log_loss"
will be used.
# A tibble: 5 × 7
  penalty .metric          .estimator  mean     n std_err .config              
    <dbl> <chr>            <chr>      <dbl> <int>   <dbl> <chr>                
1     1.5 poisson_log_loss standard    17.8    36   0.141 Preprocessor1_Model11
2     0.5 poisson_log_loss standard   401.     36   2.29  Preprocessor1_Model01
3     0.6 poisson_log_loss standard   401.     36   2.29  Preprocessor1_Model02
4     0.7 poisson_log_loss standard   401.     36   2.29  Preprocessor1_Model03
5     0.8 poisson_log_loss standard   401.     36   2.29  Preprocessor1_Model04

And

bike_WF %>% 
        tune_grid( resamples = bike_folds, grid = tune_grid3, 
                   metrics = metric_set( poisson_log_loss )) %>% 
    show_best()
Warning in show_best(.): No value of `metric` was given; "poisson_log_loss"
will be used.
# A tibble: 5 × 7
  penalty .metric          .estimator  mean     n std_err .config             
    <dbl> <chr>            <chr>      <dbl> <int>   <dbl> <chr>               
1     3   poisson_log_loss standard    19.8    36   0.151 Preprocessor1_Model5
2     1   poisson_log_loss standard   402.     36   2.30  Preprocessor1_Model1
3     1.5 poisson_log_loss standard   402.     36   2.30  Preprocessor1_Model2
4     2   poisson_log_loss standard   402.     36   2.30  Preprocessor1_Model3
5     2.5 poisson_log_loss standard   402.     36   2.30  Preprocessor1_Model4

???

Does the best model fit similarly to our previous model?

best_parameters <- select_best(bike_tuning, metric = "poisson_log_loss")

bike_WF_final <- finalize_workflow(bike_WF,best_parameters)

bike_WF_final <- bike_WF_final %>% 
                  fit(data = bike_train)
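As a diagnostic sketch (my addition, not part of the original workflow), we can pull the raw glmnet object out of the fitted workflow and compare its coefficients with those of bike_fit1. The recipe normalizes the predictors, so the magnitudes will differ, but the pattern of zeroed coefficients should be comparable.

# Raw glmnet object sitting inside the fitted workflow
glmnet_fit <- extract_fit_engine(bike_WF_final)

# Coefficients at the tuned penalty vs. the direct glmnet fit
coef(glmnet_fit, s = best_parameters$penalty)
coef(bike_fit1)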

Check against test

results <- results %>% 
              bind_cols( predict(bike_WF_final, new_data = bike_test,
                                 type = "numeric"))
poisson_log_loss(results, truth = bikers, estimate = preds1)
# A tibble: 1 × 3
  .metric          .estimator .estimate
  <chr>            <chr>          <dbl>
1 poisson_log_loss standard        15.9
poisson_log_loss(results, truth = bikers, estimate = .pred)
# A tibble: 1 × 3
  .metric          .estimator .estimate
  <chr>            <chr>          <dbl>
1 poisson_log_loss standard        17.0
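A quick visual comparison of the two prediction sets (again, my addition) can also show where they diverge:

# cv.glmnet predictions (preds1) vs. tidymodels predictions (.pred)
ggplot(results, aes(x = preds1, y = .pred)) +
  geom_point(alpha = 0.3) +
  geom_abline(colour = "red")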

So I have two questions:

  1. Why is the tuning in tidymodels poorer (and different) than via cv.glmnet?
  2. What is going on with the penalty parameters? It does not make sense to me.

Comparison to simple linear regression

We will see if the issue relates to the dataset or to Poisson regression by repeating this process with a linear model.

glmnet

bike_cv_fits_lr <- cv.glmnet(formula = bikers ~ . - registered - casual,
                             data = bike_train, family = gaussian)

Taking a look at how this turns out.

plot(bike_cv_fits_lr$lambda, bike_cv_fits_lr$cvm)

plot(tail(bike_cv_fits_lr$lambda, -40), tail(bike_cv_fits_lr$cvm, -40))

bike_cv_fits_lr$lambda.min
[1] 0.1716935

So, somewhat similar to the Poisson case: little or no shrinkage works best.

Tidymodels

lr_model <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet") 





# Make workflow

bike_WF_lm <- workflow() %>% 
          add_model(lr_model) %>% 
          add_recipe(bikerec)
tune_params <- extract_parameter_set_dials(lr_model)

tune_grid <- grid_regular(tune_params, levels = c(penalty = 40))

tune_grid
# A tibble: 40 × 1
    penalty
      <dbl>
 1 1   e-10
 2 1.80e-10
 3 3.26e-10
 4 5.88e-10
 5 1.06e- 9
 6 1.91e- 9
 7 3.46e- 9
 8 6.24e- 9
 9 1.13e- 8
10 2.03e- 8
# ℹ 30 more rows
bike_tuning <-  tune_grid(bike_WF_lm, resamples = bike_folds, 
                          grid = tune_grid, metrics = metric_set(rmse))


show_best(bike_tuning)
Warning in show_best(bike_tuning): No value of `metric` was given; "rmse" will
be used.
# A tibble: 5 × 7
   penalty .metric .estimator  mean     n std_err .config              
     <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>                
1 2.89e- 2 rmse    standard    75.8    36   0.335 Preprocessor1_Model34
2 1.60e- 2 rmse    standard    75.8    36   0.335 Preprocessor1_Model33
3 1   e-10 rmse    standard    75.8    36   0.335 Preprocessor1_Model01
4 1.80e-10 rmse    standard    75.8    36   0.335 Preprocessor1_Model02
5 3.26e-10 rmse    standard    75.8    36   0.335 Preprocessor1_Model03

This looks more reasonable: the RMSE is essentially flat across the grid rather than collapsing for everything but the largest penalty.

tune_metrics <- collect_metrics(bike_tuning)

ggplot(tune_metrics, aes(x = penalty, y = mean)) +
  geom_point()
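Since the penalties are sampled over ten orders of magnitude, a log scale on the x-axis (my tweak) makes the pattern easier to read:

# Same plot with a log10 x-axis to match how penalty was sampled
ggplot(tune_metrics, aes(x = penalty, y = mean)) +
  geom_point() +
  scale_x_log10()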

This is what I would expect given what happens with cv.glmnet.

So the issue seems specific to Poisson regression, perhaps combined with the fact that shrinkage does not add much value on this dataset.

Let’s try Poisson regression on a different dataset where shrinkage does help.

New Data Set

data(SingaporeAuto, package = "insuranceData")

head(SingaporeAuto)
  SexInsured Female VehicleType PC Clm_Count Exp_weights    LNWEIGHT NCD AgeCat
1          U      0           T  0         0   0.6680356 -0.40341383  30      0
2          U      0           T  0         0   0.5667351 -0.56786326  30      0
3          U      0           T  0         0   0.5037645 -0.68564629  30      0
4          U      0           T  0         0   0.9144422 -0.08944106  20      0
5          U      0           T  0         0   0.5366188 -0.62246739  20      0
6          U      0           T  0         0   0.7529090 -0.28381095  20      0
  AutoAge0 AutoAge1 AutoAge2 AutoAge VAgeCat VAgecat1
1        0        0        0       0       0        2
2        0        0        0       0       0        2
3        0        0        0       0       0        2
4        0        0        0       0       0        2
5        0        0        0       0       0        2
6        0        0        0       0       0        2
summary(SingaporeAuto)
 SexInsured     Female         VehicleType         PC           Clm_Count      
 F: 700     Min.   :0.00000   A      :3842   Min.   :0.0000   Min.   :0.00000  
 M:3145     1st Qu.:0.00000   G      :2882   1st Qu.:0.0000   1st Qu.:0.00000  
 U:3638     Median :0.00000   Q      : 358   Median :1.0000   Median :0.00000  
            Mean   :0.09355   M      : 188   Mean   :0.5134   Mean   :0.06989  
            3rd Qu.:0.00000   P      :  88   3rd Qu.:1.0000   3rd Qu.:0.00000  
            Max.   :1.00000   Z      :  71   Max.   :1.0000   Max.   :3.00000  
                              (Other):  54                                     
  Exp_weights          LNWEIGHT            NCD            AgeCat    
 Min.   :0.005476   Min.   :-5.2074   Min.   : 0.00   Min.   :0.00  
 1st Qu.:0.279261   1st Qu.:-1.2756   1st Qu.: 0.00   1st Qu.:0.00  
 Median :0.503764   Median :-0.6856   Median :20.00   Median :2.00  
 Mean   :0.519859   Mean   :-0.8945   Mean   :19.85   Mean   :1.94  
 3rd Qu.:0.752909   3rd Qu.:-0.2838   3rd Qu.:30.00   3rd Qu.:4.00  
 Max.   :1.000000   Max.   : 0.0000   Max.   :50.00   Max.   :7.00  
                                                                    
    AutoAge0         AutoAge1          AutoAge2          AutoAge     
 Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.000  
 1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.000  
 Median :0.0000   Median :0.00000   Median :0.00000   Median :1.000  
 Mean   :0.3905   Mean   :0.05867   Mean   :0.05987   Mean   :0.509  
 3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:1.000  
 Max.   :1.0000   Max.   :1.00000   Max.   :1.00000   Max.   :1.000  
                                                                     
    VAgeCat         VAgecat1    
 Min.   :0.000   Min.   :2.000  
 1st Qu.:0.000   1st Qu.:2.000  
 Median :1.000   Median :2.000  
 Mean   :2.019   Mean   :2.933  
 3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :6.000   Max.   :6.000  
                                

Splits

set.seed(212)

SA_split <- initial_split(SingaporeAuto, prop = .7)

SA_train <- training(SA_split)
SA_test <-  testing(SA_split)


SA_folds <- vfold_cv(SA_train, v = 6, repeats = 6)

We will predict Clm_Count using Poisson regression and LASSO.

SA_cv_fits <- cv.glmnet(formula = Clm_Count ~ .,
                        data = SA_train, family = poisson)


plot(SA_cv_fits$lambda, SA_cv_fits$cvm)
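Mirroring the bike example (this line is my addition; output not shown), we can also report the cross-validated penalty:

# Penalty selected by cross-validation on the Singapore data
SA_cv_fits$lambda.min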

Tidymodels

# model spec


pr_model <- poisson_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet") 


# recipe

SArec <- SA_train %>% 
            recipe(Clm_Count ~ .) %>%
            step_normalize(all_numeric_predictors()) %>% 
            step_dummy(all_factor_predictors())


# Make workflow

SA_WF <- workflow() %>% 
          add_model(pr_model) %>% 
          add_recipe(SArec)

Now we make our tuning grid.

tune_params <- extract_parameter_set_dials(pr_model)

tune_grid <- grid_regular(tune_params, levels = c(penalty = 40))

Tuning part

SA_tuning <- tune_grid(SA_WF, resamples = SA_folds, grid = tune_grid,
                       metrics = metric_set(poisson_log_loss))
show_best(SA_tuning, metric = "poisson_log_loss")
# A tibble: 1 × 7
  penalty .metric          .estimator  mean     n std_err .config              
    <dbl> <chr>            <chr>      <dbl> <int>   <dbl> <chr>                
1       1 poisson_log_loss standard   0.249    36 0.00399 Preprocessor1_Model40
tune_metrics <- collect_metrics(SA_tuning)

ggplot(tune_metrics, aes(x = penalty, y = mean)) +
  geom_point()
Warning: Removed 39 rows containing missing values or values outside the scale range
(`geom_point()`).

tune_metrics
# A tibble: 40 × 7
    penalty .metric          .estimator  mean     n std_err .config             
      <dbl> <chr>            <chr>      <dbl> <int>   <dbl> <chr>               
 1 1   e-10 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
 2 1.80e-10 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
 3 3.26e-10 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
 4 5.88e-10 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
 5 1.06e- 9 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
 6 1.91e- 9 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
 7 3.46e- 9 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
 8 6.24e- 9 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
 9 1.13e- 8 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
10 2.03e- 8 poisson_log_loss standard     NaN     0      NA Preprocessor1_Model…
# ℹ 30 more rows
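To dig into why most penalties come back as NaN (a diagnostic I am adding, assuming the failures were recorded during resampling), tune captures any warnings and errors raised while fitting:

# Any glmnet warnings/errors captured during tuning show up here
collect_notes(SA_tuning)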

Well, this suggests that something is going on with how the penalty parameters are being fed to glmnet in the case of Poisson regression. There is clearly something I do not understand.

stopCluster(cl)