CLADAG 2021
BOOK OF ABSTRACTS AND SHORT PAPERS
13th Scientific Meeting of the Classification and Data Analysis Group
Firenze, September 9-11, 2021
edited by
Giovanni C. Porzio, Carla Rampichini, Chiara Bocci
FIRENZE UNIVERSITY PRESS
2021
CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS : 13th Scientific Meeting of the Classification and Data Analysis Group Firenze, September 9-11, 2021/ edited by Giovanni C. Porzio, Carla Rampichini, Chiara Bocci. — Firenze : Firenze University Press, 2021.
(Proceedings e report ; 128)
https://www.fupress.com/isbn/9788855183406 ISSN 2704-601X (print)
ISSN 2704-5846 (online) ISBN 978-88-5518-340-6 (PDF) ISBN 978-88-5518-341-3 (XML) DOI 10.36253/978-88-5518-340-6
Graphic design: Alberto Pizarro Fernández, Lettera Meccanica SRLs
Front cover: Illustration of the statue by Giambologna, Appennino (1579-1580) by Anna Gottard
FUP Best Practice in Scholarly Publishing (DOI https://doi.org/10.36253/fup_best_practice)
All publications are submitted to an external refereeing process under the responsibility of the FUP Editorial Board and the Scientific Boards of the series. The works published are evaluated and approved by the Editorial Board of the publishing house, and must be compliant with the Peer review policy, the Open Access, Copyright and Licensing policy and the Publication Ethics and Complaint policy.
Firenze University Press Editorial Board
M. Garzaniti (Editor-in-Chief), M.E. Alberti, F. Vittorio Arrigoni, E. Castellani, F. Ciampi, D. D’Andrea, A.
Dolfi, R. Ferrise, A. Lambertini, R. Lanfredini, D. Lippi, G. Mari, A. Mariani, P.M. Mariano, S. Marinai, R.
Minuti, P. Nanni, A. Orlandi, I. Palchetti, A. Perulli, G. Pratesi, S. Scaramuzzi, I. Stolzi.
The online digital edition is published in Open Access on www.fupress.com.
Content license: except where otherwise noted, the present work is released under Creative Commons Attribution 4.0 International license (CC BY 4.0: http://creativecommons.org/licenses/by/4.0/legalcode). This license allows you to share any part of the work by any means and format, modify it for any purpose, including commercial, as long as appropriate credit is given to the author, any changes made to the work are indicated and a URL link is provided to the license.
Metadata license: all the metadata are released under the Public Domain Dedication license (CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/legalcode).
© 2021 Author(s)
Published by Firenze University Press
Università degli Studi di Firenze via Cittadella, 7, 50144 Firenze, Italy
CLAssification and Data Analysis Group (CLADAG)
of the Italian Statistical Society (SIS)
INDEX
Preface 1
Keynote Speakers
Jean-Michel Loubes
Optimal transport methods for fairness in machine learning 5
Peter Rousseeuw, Jakob Raymaekers and Mia Hubert
Class maps for visualizing classification results 6
Robert Tibshirani, Stephen Bates and Trevor Hastie
Understanding cross-validation and prediction error 7
Cinzia Viroli
Quantile-based classification 8
Bin Yu
Veridical data science for responsible AI: characterizing V4 neurons through deepTune 9
Plenary Session
Daniel Diaz
A simple correction for COVID-19 sampling bias 14
Jeffrey S. Morris
A seat at the table: the key role of biostatistics and data science in the COVID-19 pandemic 15
Bhramar Mukherjee
Predictions, role of interventions and the crisis of virus in India: a data science call to arms 16
Danny Pfeffermann
Contributions of Israel’s CBS to rout COVID-19 17
Invited Papers
Claudio Agostinelli, Giovanni Saraceno and Luca Greco
Robust issues in estimating modes for multivariate torus data 21
Emanuele Aliverti
ACCOUNTING FOR RESPONSE BEHAVIOR IN LONGITUDINAL RATING DATA
Roberto Colombi1, Sabrina Giordano2 and Maria Kateri3
1 Department of Management, Information and Production Engineering, University of Bergamo, Italy (e-mail: roberto.colombi@unibg.it)
2 Department of Economics, Statistics and Finance “Giovanni Anania”, University of Calabria, Italy (e-mail: sabrina.giordano@unical.it)
3 Institute for Statistics, RWTH Aachen University, Germany (e-mail: maria.kateri@rwth-aachen.de)
ABSTRACT: We present a hidden Markov model for repeated ordinal responses observed on some units at different time occasions. The responses reflect the levels of unobservable latent constructs and can be observed under two latent regimes according to whether the respondents are confident with their preference or take shelter in the extremes/middle points of the rating scale.
KEYWORDS: latent variables; response style; financial capability.
Hidden Markov models with two regimes
Consider one ordinal response observed on $n$ units at $T$ time occasions. Let $Y_{it}$ denote the response of unit $i$, $i \in \mathcal{I} = \{1, \dots, n\}$, at occasion $t$, $t \in \mathcal{T} = \{1, \dots, T\}$, with $Y_{it} \in \mathcal{C} = \{1, \dots, c\}$. The response is assumed to reflect the levels of unobservable latent constructs $L_{it}$, $i \in \mathcal{I}$, $t \in \mathcal{T}$, and can be observed under two different latent regimes: awareness (AWR) and middle or extreme categories response style (EMRS), which are captured by binary latent variables $U_{it}$, $i \in \mathcal{I}$, $t \in \mathcal{T}$. The presence of two regimes is based on the idea that, when required to express their opinion on one item, respondents either place their true preference in one category of the rating scale or, when in doubt or reluctant to disclose their opinion, take shelter by opting for the extreme or middle categories. This is the case, for example, of patients asked to give a subjective assessment of their health or disability in daily living, or of people required to evaluate their financial capability; all of them can feel confident or reluctant to answer. The proposal is a hidden Markov model (HMM) defined by two components that describe the distribution of the latent variables and the conditional distribution of the response given the latent variables. It generalizes the models of Bartolucci et al. (2012) to a bivariate latent Markov process. Here, we describe the main features of the model proposed by Colombi et al. (2021).
The latent Markov model. For every $i \in \mathcal{I}$, $t \in \mathcal{T}$, the latent construct $L_{it}$ (e.g., health status, financial capability) has a finite discrete state space $S_L = \{1, \dots, k\}$, while the latent binary response style indicator $U_{it}$ has state space $S_U = \{1, 2\}$, where 1 and 2 denote the EMRS and AWR states, respectively. The latent variables are independent across units and, for every unit, $\{L_{it}, U_{it}\}_{t \in \mathcal{T}}$ is a first-order bivariate Markov process with states $(u, l)$, $u \in S_U$, $l \in S_L$. The initial probabilities ($t = 1$) of the process are $\pi_{i1}(u, l)$, and $\pi_{it}(u, l \mid \bar{u}, \bar{l})$ are the transition probabilities. The latter are simplified to
\[ \pi_{it}(u, l \mid \bar{u}, \bar{l}) = \pi^{U|L}_{it}(u \mid l, \bar{u}) \, \pi^{L}_{it}(l \mid \bar{l}), \quad t = 2, \dots, T, \]
by assuming that $L_{it}$, given its past, does not depend on the past of $U_{it}$, and that the current $U_{it}$ depends on its own past and on the contemporaneous latent construct but not on the past of the latent construct. The row vectors $x^{(m)}_{i}$ and $z^{(m)}_{it}$, $m \in \{L, U\}$, stand for the covariates, not necessarily different, influencing the initial and transition probabilities, respectively, of the latent variables. Assuming independence between the latent variables at the first occasion, the latent model is specified by the following logit models:
A) a baseline logit model for the initial probabilities of the latent construct,
\[ \log \frac{\pi^{L}_{i1}(l)}{\pi^{L}_{i1}(1)} = \alpha_{0l} + \alpha_{1l} x^{(L)}_{i}, \quad l = 2, \dots, k; \]
B) a logit model for the initial probabilities of the response style indicator,
\[ \log \frac{\pi^{U}_{i1}(1)}{\pi^{U}_{i1}(2)} = \bar{\alpha}_{0} + \bar{\alpha}_{1} x^{(U)}_{i}; \]
C) baseline logit models for the marginal transition probabilities of the latent construct, with reference category the state $\bar{l}$ of the previous time point, i.e. for $\bar{l} \in S_L$,
\[ \log \frac{\pi^{L}_{it}(l \mid \bar{l})}{\pi^{L}_{it}(\bar{l} \mid \bar{l})} = \beta_{0l\bar{l}} + \beta_{1l\bar{l}} z^{(L)}_{it}, \quad l \in S_L, \; l \neq \bar{l}, \; t = 2, \dots, T; \]
D) a logit model for the conditional transition probabilities of the response style indicator, for each response style state $\bar{u}$ of the previous occasion and for each current state $l$ of the latent construct,
\[ \log \frac{\pi^{U|L}_{it}(1 \mid l, \bar{u})}{\pi^{U|L}_{it}(2 \mid l, \bar{u})} = \bar{\beta}_{0l\bar{u}} + \bar{\beta}_{1l\bar{u}} z^{(U)}_{it}, \quad l \in S_L, \; \bar{u} \in S_U, \; t = 2, \dots, T. \]
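As a rough illustration of how logit models C and D combine into the factorized transition probabilities, the following sketch inverts the two logits and multiplies the results. It is only a sketch under stated assumptions: the function names, the nested-list layout of the parameters, the 0-based coding of the states (0 = EMRS, 1 = AWR for the response style) and all numerical values are hypothetical and are not taken from the paper.

```python
import math


def construct_transition(beta0, beta1, z, l_prev):
    """Model C: probabilities of L_t = 0..k-1 given L_{t-1} = l_prev."""
    k = len(beta0)
    eta = [0.0] * k  # the reference state l_prev keeps linear predictor 0
    for l in range(k):
        if l != l_prev:
            eta[l] = beta0[l][l_prev] + sum(b * v for b, v in zip(beta1[l][l_prev], z))
    m = max(eta)  # subtract the maximum for numerical stability
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [x / total for x in w]


def style_transition(bbar0, bbar1, z, l_curr, u_prev):
    """Model D: [P(EMRS), P(AWR)] at time t given L_t = l_curr and U_{t-1} = u_prev."""
    eta = bbar0[l_curr][u_prev] + sum(b * v for b, v in zip(bbar1[l_curr][u_prev], z))
    p_emrs = 1.0 / (1.0 + math.exp(-eta))
    return [p_emrs, 1.0 - p_emrs]


def joint_transition(beta0, beta1, bbar0, bbar1, z_L, z_U, u_prev, l_prev):
    """Return {(u, l): pi^{U|L}(u | l, u_prev) * pi^L(l | l_prev)}."""
    p_l = construct_transition(beta0, beta1, z_L, l_prev)
    joint = {}
    for l, pl in enumerate(p_l):
        p_u = style_transition(bbar0, bbar1, z_U, l, u_prev)
        for u, pu in enumerate(p_u):
            joint[(u, l)] = pu * pl
    return joint


# Hypothetical parameters for k = 2 states and one covariate (not estimates from the paper).
beta0 = [[0.0, -1.0], [-0.5, 0.0]]          # beta0[l][l_prev]
beta1 = [[[0.0], [0.8]], [[1.2], [0.0]]]    # beta1[l][l_prev] is a coefficient list
bbar0 = [[1.0, 1.5], [-2.0, 1.0]]           # bbar0[l][u_prev]
bbar1 = [[[-0.7], [0.1]], [[1.5], [0.9]]]   # bbar1[l][u_prev]
z = [1.0]

P = joint_transition(beta0, beta1, bbar0, bbar1, z, z, u_prev=0, l_prev=0)
print({state: round(prob, 3) for state, prob in P.items()})
```

The four joint probabilities sum to one, and summing them over the response style recovers the marginal transition probabilities of the latent construct, which is what the factorization is meant to guarantee.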
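The observation model. Independence is assumed among units. The conditional probability functions of $Y_{it}$, given the EMRS $(1, l)$ and AWR $(2, l)$ latent states, are both time and subject invariant, and are denoted by $f(y \mid l, u)$, $u \in S_U$, $l \in S_L$, $y \in \mathcal{C}$, for $t \in \mathcal{T}$, $i \in \mathcal{I}$. Given the EMRS regime, $f(y \mid l, 1)$, $l \in S_L$, is parameterized by the logits
\[ \log \frac{f(y \mid l, 1)}{f(y - 1 \mid l, 1)} = \phi_{0l} + \phi_{1l} s(y), \quad y = 2, \dots, c, \]
where the scores are the known constants
\[ s(y) = \frac{c/2 - y}{\sum_{j=1}^{c-1} (j - c/2)^2}, \quad y \in \mathcal{C}, \]
$\phi_{0l}$ governs the skewness and $\phi_{1l}$ the U or bell shape. Given the AWR regime, $f(y \mid l, 2)$, $l \in S_L$, is parameterized by the logits
\[ \log \frac{f(y \mid l, 2)}{f(y - 1 \mid l, 2)} = \varphi_{yl}, \quad y = 2, \dots, c. \]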
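To make the role of the two EMRS parameters more concrete, the sketch below turns a set of adjacent-category logits into a probability function on $c = 6$ categories. Several things here are assumptions rather than the authors' specification: the logits are read as $\log f(y \mid l, u) - \log f(y-1 \mid l, u)$, the function names are hypothetical, the parameter values are invented, and the scores are illustrative decreasing constants centred at zero that merely stand in for $s(y)$.

```python
import math


def pmf_from_local_logits(logits):
    """Turn adjacent-category logits log f(y)/f(y-1), y = 2..c, into f(1), ..., f(c)."""
    log_f = [0.0]  # unnormalised log f(1)
    for eta in logits:
        log_f.append(log_f[-1] + eta)
    m = max(log_f)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in log_f]
    total = sum(w)
    return [x / total for x in w]


def emrs_pmf(phi0, phi1, scores):
    """EMRS regime: the logit for category y is phi0 + phi1 * s(y)."""
    return pmf_from_local_logits([phi0 + phi1 * s for s in scores])


# Illustrative scores for the c - 1 = 5 logits (assumed, not the paper's constants).
scores = [1.0, 0.5, 0.0, -0.5, -1.0]

u_shape = emrs_pmf(0.0, -2.0, scores)   # negative phi1: mass on the two extremes
bell = emrs_pmf(0.0, 2.0, scores)       # positive phi1: mass on the middle categories
skewed = emrs_pmf(-0.5, -2.0, scores)   # nonzero phi0 tilts the U towards one end

# AWR regime: one free logit per category y = 2..c (hypothetical values).
awr = pmf_from_local_logits([0.5, 0.3, -0.2, -0.6, -1.0])

for name, pmf in [("U", u_shape), ("bell", bell), ("skewed", skewed), ("AWR", awr)]:
    print(name, [round(p, 3) for p in pmf])
```

With these stand-in scores, the sign of the slope switches between a U-shaped and a bell-shaped distribution, while the intercept breaks the symmetry; the AWR distribution is left unconstrained because each logit has its own parameter.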
Application to Bank of Italy data. We applied the model to the panel data from the Survey on Household Income and Wealth (Bank of Italy), collected every 2 years from 2006 to 2016 on 1109 Italian households. The ordinal response of interest is the perception of the household’s financial ability to make ends meet (ve = very easily, e = easily, fe = fairly easily, sd = with some difficulty, d = with difficulty, gd = with great difficulty). The covariates are: G (female, male), J (Jse: self-employed, Jhrs: housekeeper/retired/student, employee), CH (with children, no children), D (with debts, no debts), S (with savings, no savings), E (up to secondary school, over high school), R (not risk averse in managing financial investments, risk averse), with the reference category of each covariate listed last. The minimum BIC corresponds to the model with $k = 2$ states, meaning that households can be grouped according to whether they feel financially confident ($l = 1$) or deal with financial stress ($l = 2$). Fig. 1 allows us to characterize the choices of the respondents in the 4 latent states. In the financially confident latent state, individuals who are in doubt about their perception tend to choose the optimistic extreme points, whereas AWR people are more inclined to the intermediate ratings. Reluctant households (EMRS) in the latent group that deals with financial stress have the highest probabilities of reporting great difficulties, whereas AWR people in the same group are more likely to point out just some difficulties. The behavior in the 4 states is well distinguished, and optimistic/pessimistic choices are mainly due to the EMRS tendency.
Figure 1. Observation probability functions of AWR and EMRS respondents in the two latent states of the perceived financial condition.
Table 1. Estimates (EM algorithm) of the parameters of logit models A, B, C, D.
row  parameters                                  cst      G       Jse     Jhrs    CH      D       S       E       R
1    $(\alpha_{02}, \alpha_{12})$                2.8      0.44*   -1.38*  -0.75*  -0.15   0.02    -1.44*  -1.86*  -0.35*
2    $(\bar\alpha_{0}, \bar\alpha_{1})$          -0.06    -0.03   0.16    0.08    -0.04   0.32    0.63*   0.04    0.14
3    $(\beta_{021}, \beta_{121})$                -0.86    1.32*   0.27    -0.49   -0.89*  0.48    -1.69*  -1.16*  -0.17
4    $(\beta_{012}, \beta_{112})$                -11.93   0.18    -0.91   -0.21   -0.36   -0.23   8.44*   1.38*   -8.83*
5    $(\bar\beta_{011}, \bar\beta_{111})$        1.10     0.45    -0.29   0.00    -0.20   0.13    -0.79*  -0.47*  -0.06
6    $(\bar\beta_{021}, \bar\beta_{121})$        -3.36    -0.05   1.09*   -0.33   0.45    -0.37   1.97*   0.81*   -0.37
7    $(\bar\beta_{012}, \bar\beta_{112})$        1.91     -0.07   -0.35   -0.23   0.00    -0.05   -0.19   -0.29   -0.39*
8    $(\bar\beta_{022}, \bar\beta_{122})$        1.69     -0.50   -0.34   -0.08   0.10    -0.07   1.80*   -0.09   -0.37
cst: constant; * 95% confidence interval does not contain zero
By the sign of the estimates in row 1 of Table 1, we deduce that at the first occasion women, employees, people without savings, people with high education and risk-averse people are more likely to be in a worse financial status. Further, respondents with savings show a greater propensity to a response style at the beginning of the survey (row 2). From row 3, it seems that, between two consecutive occasions, women move from a financially confident condition ($l = 1$) to a worse status ($l = 2$) with higher probability, while low-educated households with children and savings are more likely to remain in the previous, more comfortable financial status ($l = 1$). Individuals who have savings and a low education pass with greater probability from the financially stressed status ($l = 2$) to the better condition ($l = 1$), while financially stressed households are more likely to remain in the same worse status when they are not risk averse (row 4). From rows 5-6, a change from the EMRS status ($\bar{u} = 1$) to an AWR behavior ($u = 2$) is more likely for low-educated persons with savings who currently belong to the group of financially confident households, while self-employed and low-educated respondents with savings show a greater probability of remaining in the EMRS status if they were reluctant on the previous occasion ($\bar{u} = 1$) and are currently financially stressed ($l = 2$). Those who are not risk averse and currently feel financially confident have a higher probability of keeping their previous awareness in revealing their own financial capability. On the other hand, individuals with savings who are in the latent financially worrying status are more prone to give up the previous AWR behavior and opt for a response style (rows 7-8).
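The sign-based reading of row 1 of Table 1 can be checked with a small calculation that converts the estimates of logit model A into initial probabilities of the financially stressed state. This is a sketch under assumptions: each covariate is taken to be a 0/1 dummy equal to 1 for the non-reference category (female, self-employed, with savings, and so on), a coding consistent with the interpretation in the text but not stated explicitly, and the function and variable names are hypothetical.

```python
import math

# Row 1 of Table 1: intercept and covariate effects of model A (k = 2).
intercept = 2.8
effects = {"G": 0.44, "Jse": -1.38, "Jhrs": -0.75, "CH": -0.15,
           "D": 0.02, "S": -1.44, "E": -1.86, "R": -0.35}


def prob_stress(profile):
    """Initial probability of the financially stressed state (l = 2)."""
    eta = intercept + sum(effects[name] * value for name, value in profile.items())
    return 1.0 / (1.0 + math.exp(-eta))


# Assumed dummy coding: all zeros is the reference household.
reference = {name: 0 for name in effects}
with_savings = dict(reference, S=1)

print(round(prob_stress(reference), 3))
print(round(prob_stress(with_savings), 3))
```

Under this coding, the reference household starts in the stressed state with probability of about 0.94, and having savings lowers it to about 0.80, in line with the negative sign of the S estimate.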
References
BARTOLUCCI, F., FARCOMENI, A., & PENNONI, F. 2012. Latent Markov Models for Longitudinal Data. CRC Press.
COLOMBI, R., GIORDANO, S., & KATERI, M. 2021. Hidden Markov models for longitudinal rating data with dynamic response styles: evidence on household financial capability. Submitted.