Accounting for response behavior in longitudinal rating data


CLADAG 2021

BOOK OF ABSTRACTS AND SHORT PAPERS

13th Scientific Meeting of the Classification and Data Analysis Group

Firenze, September 9-11, 2021

edited by

Giovanni C. Porzio, Carla Rampichini, Chiara Bocci

FIRENZE UNIVERSITY PRESS

2021


CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS : 13th Scientific Meeting of the Classification and Data Analysis Group Firenze, September 9-11, 2021/ edited by Giovanni C. Porzio, Carla Rampichini, Chiara Bocci. — Firenze : Firenze University Press, 2021.

(Proceedings e report ; 128)

https://www.fupress.com/isbn/9788855183406
ISSN 2704-601X (print)
ISSN 2704-5846 (online)
ISBN 978-88-5518-340-6 (PDF)
ISBN 978-88-5518-341-3 (XML)
DOI 10.36253/978-88-5518-340-6

Graphic design: Alberto Pizarro Fernández, Lettera Meccanica SRLs

Front cover: Illustration of the statue by Giambologna, Appennino (1579-1580) by Anna Gottard

FUP Best Practice in Scholarly Publishing (DOI https://doi.org/10.36253/fup_best_practice)

All publications are submitted to an external refereeing process under the responsibility of the FUP Editorial Board and the Scientific Boards of the series. The works published are evaluated and approved by the Editorial Board of the publishing house, and must be compliant with the Peer review policy, the Open Access, Copyright and Licensing policy and the Publication Ethics and Complaint policy.

Firenze University Press Editorial Board

M. Garzaniti (Editor-in-Chief), M.E. Alberti, F. Vittorio Arrigoni, E. Castellani, F. Ciampi, D. D’Andrea, A. Dolfi, R. Ferrise, A. Lambertini, R. Lanfredini, D. Lippi, G. Mari, A. Mariani, P.M. Mariano, S. Marinai, R. Minuti, P. Nanni, A. Orlandi, I. Palchetti, A. Perulli, G. Pratesi, S. Scaramuzzi, I. Stolzi.

The online digital edition is published in Open Access on www.fupress.com.

Content license: except where otherwise noted, the present work is released under Creative Commons Attribution 4.0 International license (CC BY 4.0: http://creativecommons.org/licenses/by/4.0/legalcode). This license allows you to share any part of the work by any means and format, modify it for any purpose, including commercial, as long as appropriate credit is given to the author, any changes made to the work are indicated and a URL link is provided to the license.

Metadata license: all the metadata are released under the Public Domain Dedication license (CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/legalcode).

Published by Firenze University Press

Università degli Studi di Firenze, via Cittadella, 7, 50144 Firenze, Italy

CLAssification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS)

INDEX

Preface

Keynote Speakers

Jean-Michel Loubes
Optimal transport methods for fairness in machine learning

Peter Rousseeuw, Jakob Raymaekers and Mia Hubert
Class maps for visualizing classification results

Robert Tibshirani, Stephen Bates and Trevor Hastie
Understanding cross-validation and prediction error

Cinzia Viroli
Quantile-based classification

Bin Yu
Veridical data science for responsible AI: characterizing V4 neurons through deepTune

Plenary Session

Daniel Diaz
A simple correction for COVID-19 sampling bias

Jeffrey S. Morris
A seat at the table: the key role of biostatistics and data science in the COVID-19 pandemic

Bhramar Mukherjee
Predictions, role of interventions and the crisis of virus in India: a data science call to arms

Danny Pfeffermann
Contributions of Israel’s CBS to rout COVID-19

Invited Papers

Claudio Agostinelli, Giovanni Saraceno and Luca Greco
Robust issues in estimating modes for multivariate torus data

Emanuele Aliverti


ACCOUNTING FOR RESPONSE BEHAVIOR IN LONGITUDINAL RATING DATA

Roberto Colombi¹, Sabrina Giordano² and Maria Kateri³

¹ Department of Management, Information and Production Engineering, University of Bergamo, Italy (e-mail: roberto.colombi@unibg.it)

² Department of Economics, Statistics and Finance “Giovanni Anania”, University of Calabria, Italy (e-mail: sabrina.giordano@unical.it)

³ Institute for Statistics, RWTH Aachen University, Germany (e-mail: maria.kateri@rwth-aachen.de)

ABSTRACT: We present a hidden Markov model for repeated ordinal responses observed on some units at different time occasions. The responses reflect the levels of unobservable latent constructs and can be observed under two latent regimes according to whether the respondents are confident with their preference or take shelter in the extremes/middle points of the rating scale.

KEYWORDS: latent variables; response style; financial capability.

Hidden Markov models with two regimes

Consider one ordinal response observed on $n$ units at $T$ time occasions. Let $Y_{it}$ denote the response of unit $i$, $i \in \mathcal{I} = \{1, \dots, n\}$, at occasion $t$, $t \in \mathcal{T} = \{1, \dots, T\}$, with $Y_{it} \in \mathcal{C} = \{1, \dots, c\}$. The response is assumed to reflect the levels of unobservable latent constructs $L_{it}$, $i \in \mathcal{I}$, $t \in \mathcal{T}$, and can be observed under two different latent regimes, awareness (AWR) and middle or extreme categories response style (EMRS), which are captured by the binary latent variables $U_{it}$, $i \in \mathcal{I}$, $t \in \mathcal{T}$. The two regimes rest on the idea that, when required to express their opinion on an item, respondents either identify their true preference with one category of the rating scale or, when in doubt or reluctant to disclose their opinion, take shelter in the extreme or middle categories. This is the case, for example, of patients asked to give a subjective assessment of their health or disability in daily living, or of people required to evaluate their financial capability: all of them may feel confident or reluctant to answer. The proposal is a hidden Markov model (HMM) defined by two components, which describe the distribution of the latent variables and the conditional distribution of the response given the latent variables. It generalizes the models by Bartolucci et al., 2012 to a bivariate latent Markov process. Here, we describe the main features of the model proposed by Colombi et al., 2021.

The latent Markov model. For every $i \in \mathcal{I}$, $t \in \mathcal{T}$, the latent construct $L_{it}$ (e.g. health status, financial capability) has a finite discrete state space $\mathcal{S}_L = \{1, \dots, k\}$, while the latent binary response style indicator $U_{it}$ has state space $\mathcal{S}_U = \{1, 2\}$, where 1 and 2 denote the EMRS and AWR states, respectively. The latent variables are independent across units and, for every unit, $\{L_{it}, U_{it}\}_{t \in \mathcal{T}}$ is a first-order bivariate Markov process with states $(u, l)$, $u \in \mathcal{S}_U$, $l \in \mathcal{S}_L$. The initial probabilities ($t = 1$) of $\{L_{it}, U_{it}\}_{t \in \mathcal{T}}$ are $\pi_{i1}(u, l)$, and $\pi_{it}(u, l \mid \bar{u}, \bar{l})$ are the transition probabilities. The transition probabilities are simplified to

$$\pi_{it}(u, l \mid \bar{u}, \bar{l}) = \pi^{U|L}_{it}(u \mid l, \bar{u}) \, \pi^{L}_{it}(l \mid \bar{l}), \quad t = 2, \dots, T,$$

by assuming that $L_{it}$, given its past, does not depend on the past of $U_{it}$, and that the current $U_{it}$ depends on its own past and on the contemporaneous latent construct but not on the past of the latent construct. The row vectors $x^{(m)}_{i}$ and $z^{(m)}_{it}$, $m \in \{L, U\}$, stand for the covariates, not necessarily different, influencing the initial and transition probabilities, respectively, of the latent variables. Assuming independence between the latent variables at the first occasion, the latent model is specified by the following logit models:

A) a baseline logit model for the initial probabilities of the latent construct,

$$\log \frac{\pi^{L}_{i1}(l)}{\pi^{L}_{i1}(1)} = \alpha_{0l} + \alpha_{1l} x^{(L)}_{i}, \quad l = 2, \dots, k;$$

B) a logit model for the initial probabilities of the response style indicator,

$$\log \frac{\pi^{U}_{i1}(1)}{\pi^{U}_{i1}(2)} = \bar{\alpha}_{0} + \bar{\alpha}_{1} x^{(U)}_{i};$$

C) baseline logit models for the marginal transition probabilities of the latent construct, with reference category the state $\bar{l}$ of the previous time point, i.e. for $\bar{l} \in \mathcal{S}_L$,

$$\log \frac{\pi^{L}_{it}(l \mid \bar{l})}{\pi^{L}_{it}(\bar{l} \mid \bar{l})} = \beta_{0l\bar{l}} + \beta_{1l\bar{l}} z^{(L)}_{it}, \quad l \in \mathcal{S}_L, \ l \neq \bar{l}, \ t = 2, \dots, T;$$

D) a logit model for the conditional transition probabilities of the response style indicator, for each response style state $\bar{u}$ of the previous occasion and for each current state $l$ of the latent construct,

$$\log \frac{\pi^{U|L}_{it}(1 \mid l, \bar{u})}{\pi^{U|L}_{it}(2 \mid l, \bar{u})} = \bar{\beta}_{0l\bar{u}} + \bar{\beta}_{1l\bar{u}} z^{(U)}_{it}, \quad l \in \mathcal{S}_L, \ \bar{u} \in \mathcal{S}_U, \ t = 2, \dots, T.$$
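The factorization of the transition probabilities, together with logit models C and D, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names and array layouts are hypothetical, the intercepts are the constants reported in Table 1, and the covariate effects are set to zero so that the result refers to a household with all covariates at their reference levels (assuming 0/1 dummy coding).

```python
import numpy as np

def latent_transition(beta0, beta1, z):
    """Model C: P[lbar, l] = pi^L(l | lbar), baseline logits with reference lbar.

    beta0[l, lbar] and beta1[l, lbar, :] are used only for l != lbar.
    """
    k = beta0.shape[0]
    P = np.zeros((k, k))
    for lbar in range(k):
        eta = np.zeros(k)  # the reference category l = lbar keeps logit 0
        for l in range(k):
            if l != lbar:
                eta[l] = beta0[l, lbar] + beta1[l, lbar] @ z
        w = np.exp(eta - eta.max())  # subtract the max for numerical stability
        P[lbar] = w / w.sum()
    return P

def style_transition(bbar0, bbar1, z):
    """Model D: Q[l, ubar, u] = pi^{U|L}(u | l, ubar); u index 0 = EMRS, 1 = AWR."""
    eta = bbar0 + bbar1 @ z  # logit of EMRS versus AWR, shape (k, 2)
    p_emrs = 1.0 / (1.0 + np.exp(-eta))
    return np.stack([p_emrs, 1.0 - p_emrs], axis=-1)

def joint_transition(P, Q):
    """J[ubar, lbar, u, l] = pi^{U|L}(u | l, ubar) * pi^L(l | lbar)."""
    return np.einsum("bl,lau->abul", P, Q)

# Intercepts from Table 1 (0-based indices: state 1 -> 0, state 2 -> 1).
k, p = 2, 8
beta0 = np.zeros((k, k))
beta0[1, 0] = -0.86    # beta_{021}: move from l = 1 to l = 2
beta0[0, 1] = -11.93   # beta_{012}: move from l = 2 to l = 1
bbar0 = np.array([[1.10, 1.91],    # l = 1: ubar = 1, ubar = 2
                  [-3.36, 1.69]])  # l = 2: ubar = 1, ubar = 2
beta1 = np.zeros((k, k, p))  # covariate effects left at zero in this sketch
bbar1 = np.zeros((k, 2, p))
z = np.zeros(p)              # all covariates at the (assumed) reference level

P = latent_transition(beta0, beta1, z)
Q = style_transition(bbar0, bbar1, z)
J = joint_transition(P, Q)
print(P.round(3))
print(J.sum(axis=(2, 3)))  # each previous state (ubar, lbar) sums to one
```

Keeping the two factors as separate arrays mirrors the conditional independence assumptions: the construct evolves on its own, and the response style is updated given the current construct and the previous style.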

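The observation model. Independence is assumed among units. The conditional probability functions of $Y_{it}$, given the EMRS $(1, l)$ and AWR $(2, l)$ latent states, are both time and subject invariant and are denoted by $f(y \mid l, u)$, $u \in \mathcal{S}_U$, $l \in \mathcal{S}_L$, $y \in \mathcal{C}$, for $t \in \mathcal{T}$, $i \in \mathcal{I}$. Given the EMRS regime, $f(y \mid l, 1)$, $l \in \mathcal{S}_L$, is parameterized by the logits

$$\log \frac{f(y \mid l, 1)}{f(y - 1 \mid l, 1)} = \varphi_{0l} + \varphi_{1l} s(y), \quad y = 2, \dots, c,$$

where the scores are the known constants

$$s(y) = \frac{c/2 - y}{\sum_{j=1}^{c-1} (j - c/2)^{2}}, \quad y \in \mathcal{C};$$

$\varphi_{0l}$ governs the skewness and $\varphi_{1l}$ the U or bell shape of the distribution. Given the AWR regime, $f(y \mid l, 2)$, $l \in \mathcal{S}_L$, is parameterized by the logits

$$\log \frac{f(y \mid l, 2)}{f(y - 1 \mid l, 2)} = \phi_{yl}, \quad y = 2, \dots, c.$$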
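As a rough illustration of how adjacent-category logits determine a probability function on the rating scale, the sketch below converts the logits of either regime into $f(y \mid l, u)$. The function names are hypothetical, the parameter values are made up for the example, and the default scores follow one reading of the formula for $s(y)$ given in the text, so they should be treated as an assumption; any other score vector can be passed explicitly.

```python
import numpy as np

def pmf_from_adjacent_logits(logits):
    """Turn logits log f(y)/f(y-1), y = 2..c, into the probabilities f(1..c)."""
    log_f = np.concatenate([[0.0], np.cumsum(logits)])  # log f(y) up to a constant
    f = np.exp(log_f - log_f.max())                     # stabilise before normalising
    return f / f.sum()

def emrs_scores(c):
    """Scores s(y) for y = 2..c, as read from the text (assumption)."""
    y = np.arange(2, c + 1)
    j = np.arange(1, c)
    return (c / 2 - y) / np.sum((j - c / 2) ** 2)

def emrs_pmf(phi0, phi1, c, scores=None):
    """EMRS regime: logits phi0 + phi1 * s(y)."""
    s = emrs_scores(c) if scores is None else np.asarray(scores, dtype=float)
    return pmf_from_adjacent_logits(phi0 + phi1 * s)

def awr_pmf(phi):
    """AWR regime: one free logit phi_y per adjacent pair of categories."""
    return pmf_from_adjacent_logits(np.asarray(phi, dtype=float))

# Hypothetical parameter values on a c = 6 point scale.
print(emrs_pmf(phi0=0.0, phi1=0.0, c=6))   # all logits zero: uniform distribution
print(emrs_pmf(phi0=0.2, phi1=5.0, c=6))
print(awr_pmf([np.log(2.0), np.log(0.5)]))  # three categories, mass on the middle
```

The EMRS regime uses only two parameters per latent state whatever the number of categories, whereas the AWR regime leaves all $c - 1$ logits free.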

Application to Bank of Italy data. We applied the model to the panel data from the Survey on Household Income and Wealth (Bank of Italy), collected every 2 years from 2006 to 2016 on 1109 Italian households. The ordinal response of interest is the perception of the household’s financial ability to make ends meet (ve = very easily, e = easily, fe = fairly easily, sd = with some difficulty, d = with difficulty, gd = with great difficulty). The covariates are: G (female, male), J (Jse: self-employed, Jhrs: housekeeper/retired/student, employee), CH (with children, no children), D (with debts, no debts), S (with savings, no savings), E (up to secondary school, over high school), R (not risk averse in managing financial investments, risk averse), with the last category listed for each covariate as the reference. The minimum BIC corresponds to the model with $k = 2$ states, meaning that households can be grouped according to whether they feel financially confident ($l = 1$) or deal with financial stress ($l = 2$).

Figure 1. Observation probability functions of AWR and EMRS respondents in the two latent states of the perceived financial condition.

Fig. 1 allows us to characterize the choices of the respondents in the 4 latent states. In the financially confident latent state, individuals who are in doubt about their perception (EMRS) are more likely to choose the optimistic extreme points, whereas AWR respondents are more inclined towards the intermediate ratings. Reluctant households (EMRS) in the latent group that deals with financial stress have the highest probabilities of reporting great difficulties, while AWR respondents in the same group are more likely to report just some difficulties. The behavior in the 4 states is well distinguished, and optimistic/pessimistic choices are mainly due to the EMRS tendency.

Table 1. Estimates (EM algorithm) of the parameters of logit models A, B, C, D.

row  parameters                              cst      G       Jse     Jhrs    CH      D       S       E       R
1    $(\alpha_{02}, \alpha_{12})$            2.8      0.44*   -1.38*  -0.75*  -0.15   0.02    -1.44*  -1.86*  -0.35*
2    $(\bar\alpha_{0}, \bar\alpha_{1})$      -0.06    -0.03   0.16    0.08    -0.04   0.32    0.63*   0.04    0.14
3    $(\beta_{021}, \beta_{121})$            -0.86    1.32*   0.27    -0.49   -0.89*  0.48    -1.69*  -1.16*  -0.17
4    $(\beta_{012}, \beta_{112})$            -11.93   0.18    -0.91   -0.21   -0.36   -0.23   8.44*   1.38*   -8.83*
5    $(\bar\beta_{011}, \bar\beta_{111})$    1.10     0.45    -0.29   0.00    -0.20   0.13    -0.79*  -0.47*  -0.06
6    $(\bar\beta_{021}, \bar\beta_{121})$    -3.36    -0.05   1.09*   -0.33   0.45    -0.37   1.97*   0.81*   -0.37
7    $(\bar\beta_{012}, \bar\beta_{112})$    1.91     -0.07   -0.35   -0.23   0.00    -0.05   -0.19   -0.29   -0.39*
8    $(\bar\beta_{022}, \bar\beta_{122})$    1.69     -0.50   -0.34   -0.08   0.10    -0.07   1.80*   -0.09   -0.37

cst: constant; * 95% confidence interval does not contain zero

From the signs of the estimates in row 1 of Table 1, we deduce that at the first occasion women, employees, people without savings, people with high education and risk-averse people are more likely to be in the worse financial status. Further, respondents with savings show a greater propensity to a response style at the beginning of the survey (row 2). From row 3, it appears that, between two consecutive occasions, women move from the financially confident condition ($l = 1$) to the worse status ($l = 2$) with higher probability, while low-educated households with children and savings are more likely to remain in the previous, more comfortable financial status ($l = 1$). Individuals who have savings and a low education pass with greater probability from the financially stressed status ($l = 2$) to the better condition ($l = 1$), while financially stressed households are more likely to remain in the same worse status when they are not risk averse (row 4). From rows 5-6, a change from the EMRS status ($\bar{u} = 1$) to AWR behavior ($u = 2$) is more likely for low-educated persons with savings who currently belong to the group of financially confident households, while self-employed and low-educated respondents with savings show a greater probability of remaining in the EMRS status if they were reluctant on the previous occasion ($\bar{u} = 1$) and are currently financially stressed ($l = 2$).

Respondents who are not risk averse and currently feel financially confident have a higher probability of keeping their previous awareness in revealing their financial capability (row 7). On the other hand, individuals with savings who are in the financially stressed latent state are more inclined to give up their previous AWR behavior and opt for a response style (row 8).
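To make the reading of the signs concrete, the estimates in rows 1 and 2 of Table 1 can be turned into initial probabilities for a given household profile. The sketch below is illustrative only: the helper names are hypothetical, and it assumes 0/1 dummy coding in which a covariate equals 1 for the first listed category (for example, G = 1 for women, S = 1 for households with savings) and 0 for the reference category.

```python
import math

COVARIATES = ["G", "Jse", "Jhrs", "CH", "D", "S", "E", "R"]

# Row 1 of Table 1: log-odds of financial stress (l = 2) versus confidence (l = 1) at t = 1.
ROW1 = {"cst": 2.8, "G": 0.44, "Jse": -1.38, "Jhrs": -0.75,
        "CH": -0.15, "D": 0.02, "S": -1.44, "E": -1.86, "R": -0.35}

# Row 2 of Table 1: log-odds of EMRS (u = 1) versus AWR (u = 2) at t = 1.
ROW2 = {"cst": -0.06, "G": -0.03, "Jse": 0.16, "Jhrs": 0.08,
        "CH": -0.04, "D": 0.32, "S": 0.63, "E": 0.04, "R": 0.14}

def linear_predictor(row, profile):
    """Constant plus the coefficients of the dummies switched on in `profile`."""
    return row["cst"] + sum(row[name] * profile.get(name, 0) for name in COVARIATES)

def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

reference = {}           # every dummy at 0 (assumed reference household)
with_savings = {"S": 1}  # same household, but with savings

for label, profile in [("reference", reference), ("with savings", with_savings)]:
    p_stress = inv_logit(linear_predictor(ROW1, profile))
    p_emrs = inv_logit(linear_predictor(ROW2, profile))
    print(f"{label}: P(l=2) = {p_stress:.3f}, P(u=1) = {p_emrs:.3f}")
```

Under this coding, having savings lowers the initial probability of the financially stressed state and raises the initial probability of the EMRS regime, in line with the interpretation of rows 1 and 2.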

References

BARTOLUCCI, F., FARCOMENI, A., & PENNONI, F. 2012. Latent Markov Models for Longitudinal Data. CRC Press.

COLOMBI, R., GIORDANO, S., & KATERI, M. 2021. Hidden Markov models for longitudinal rating data with dynamic response styles: evidence on household financial capability. Submitted.
