Interpretable Discriminative Dimensionality Reduction and Feature Selection
on the Manifold
Babak Hosseini*, Barbara Hammer
*Bielefeld University (formerly); Dortmund University (currently)
Twitter: @Babak_hss
ECML 2019, 19 September 2019
Outline:
• Introduction
• Proposed Method
• Experiments
• Conclusion
Dimensionality reduction (DR):
• Mapping: high-dimensional data → a low-dimensional space
• Visualization
• Reducing data complexity
Relational representation:
• No vectorial representation anymore (data given only through pairwise relations, e.g. a kernel matrix)
DR on the manifold:
[Figure: data in the input space → relational representation in the feature space → dimensionality reduction into the projected space]
Interpretation of the projection:
[Figure: which parts of the feature space do the projected dimensions correspond to?]
Class-based interpretation:
• Applicable to kernel-based DR methods
• Kernel-PCA: each embedding dimension is reconstructed from a selection of data points
• Q: are all of them selected from one class?
• If yes → the dimension represents (or is related to) class q
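As a toy check of this question, kernel-PCA's dual coefficients can be inspected directly: each column of the eigenvector matrix tells which training points (and hence which classes) reconstruct an embedding dimension. A minimal sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def kernel_pca_coeffs(K, n_dims):
    """Dual coefficients of kernel-PCA: embedding dimension i is
    reconstructed as sum_j alpha[j, i] * Phi(x_j)."""
    N = K.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    vals, vecs = np.linalg.eigh(J @ K @ J)     # centered kernel
    order = np.argsort(vals)[::-1][:n_dims]    # top eigenvectors
    return vecs[:, order]

def class_purity(alpha, labels):
    """Fraction of each dimension's coefficient mass on its dominant class:
    close to 1 -> the dimension is reconstructed from (mostly) one class."""
    w = np.abs(alpha)
    mass = np.array([[w[labels == c, i].sum() for c in np.unique(labels)]
                     for i in range(alpha.shape[1])])
    return mass.max(axis=1) / mass.sum(axis=1)

# toy data: two well-separated classes, linear kernel
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
A = kernel_pca_coeffs(X @ X.T, 2)
print(class_purity(A, y))   # unsupervised KPCA: purity is typically far from 1
```

Even with perfectly separated classes, the class-separating dimension draws roughly equal coefficient mass from both classes, which is exactly the weak class-based interpretation motivating the talk.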
Babak Hosseini, Barbara Hammer ECML 2019, 19 September 2019
Class-based interpretation:
• a & b: each dimension uses all classes
• c & d: each dimension uses almost only one class
• → separation of data in the label space
Class-based interpretation:
Supervised kernel-based DR methods
• e.g. K-FDA (kernel Fisher discriminant analysis)
• Within-class (S_w) and between-class (S_b) covariance matrices
• Good class separation
• Weak class-based interpretation
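For reference, the within-class and between-class scatter matrices of (kernel) FDA can be sketched as follows; for brevity this works in the input space rather than the RKHS, and the function names are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def fda_scatter(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices,
    here in the input space (K-FDA uses their RKHS analogues)."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    return Sw, Sb

def fda_direction(X, y, reg=1e-6):
    """Direction maximizing the Fisher ratio w^T Sb w / w^T Sw w."""
    Sw, Sb = fda_scatter(X, y)
    vals, vecs = eigh(Sb, Sw + reg * np.eye(Sw.shape[0]))  # generalized EVP
    return vecs[:, -1]                                     # largest eigenvalue

# toy data: the classes differ only along the first coordinate
rng = np.random.default_rng(1)
Xa = rng.normal([0.0, 0.0], [0.1, 1.0], (30, 2))
Xb = rng.normal([4.0, 0.0], [0.1, 1.0], (30, 2))
X, y = np.vstack([Xa, Xb]), np.array([0] * 30 + [1] * 30)
w = fda_direction(X, y)
```

The recovered direction concentrates on the discriminative coordinate, which gives the good class separation noted above; nothing in the construction, however, ties a projection to a single class, hence the weak class-based interpretation.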
Outline:
• Introduction
• Proposed Method
• Experiments
• Conclusion
Notations:
• Training matrix: X = [x_1, …, x_N]
• Label matrix: one-hot class labels per column
• Mapping to the RKHS (relational representation): Φ(X)
• Embedding dimensions: directions in the RKHS reconstructed from the training data
• Embedding of the data in the low-dimensional space
Objectives:
• O1: Increase the class-based interpretability of the embedding dimensions.
• O2: The embedding should make the classes more separated in the low-dimensional space.
• O3: The classes should be locally more condensed in the embedded space.
• O4: Perform feature selection if a multiple-kernel representation is provided.
Optimization framework:
Interpretability term (O1):
• Each embedding vector is reconstructed from the training data in the RKHS
• The term prefers 1. few non-zero weights, which are 2. large for nearby points
• → reconstruction from close data points in the RKHS
• → labeling is smooth in local neighborhoods, so each dimension tends towards one class
Inter-class dissimilarity (O2):
• Defined on the projected vectors
Goal:
• Reduce the similarity between each class and all other classes in the embedded space
Intra-class similarity (O3):
• Acts on the non-zero entries of each embedding vector that belong to one class
Goal:
• If such an entry is large, the embedding dimension is constructed mostly from that class
Feature selection (O4):
• m projections into the RKHS, one per base kernel
• Multiple-kernel representation of the data
Feature selection (O4):
• One base kernel per dimension/group in the data, e.g.:
• multivariate time-series
• multi-view image data
• multi-domain information
• …
• Scaling of the RKHS via the kernel weights
Goal:
• Given the supervised information, only the relevant dimensions are chosen
Feature selection (O4):
• Injecting the kernel weights into the optimization framework
• Affine constraint + non-negativity constraint → interpretable solution
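A minimal sketch of the scaled multiple-kernel representation, assuming the common convention K(β) = Σ_m β_m K_m with β ≥ 0 and Σ_m β_m = 1 (function names are illustrative):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix on the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def combine_kernels(K_list, beta):
    """Scaled RKHS: K(beta) = sum_m beta_m K_m with beta >= 0, sum(beta) = 1.
    A zero weight discards the corresponding feature group entirely."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    return sum(b * K for b, K in zip(beta, K_list))

# two "views" of the same 10 samples; beta = [1, 0] keeps only view 1
rng = np.random.default_rng(0)
V1, V2 = rng.normal(size=(10, 3)), rng.normal(size=(10, 4))
K = combine_kernels([rbf_kernel(V1), rbf_kernel(V2)], [1.0, 0.0])
```

Because the weights are non-negative and sum to one, a learned β reads directly as the relative importance of each feature group, which is what makes the selection interpretable.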
Optimization scheme:
Convexity of the terms:
• PSD terms → convex
• One non-convex term (w.r.t. the embedding vectors)
• → relaxation of the optimization problem
• → alternating optimization scheme
Sub-problems solved via:
• Closed-form solution
• ADMM algorithm
• QP (quadratic programming)
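The affine plus non-negativity constraints restrict the kernel weights to the probability simplex, so the constrained QP step involves projecting onto that set; a standard routine for this (the sorting-based algorithm of Duchi et al., 2008, shown here as an illustrative building block, not as the paper's exact solver) looks like:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {b : b >= 0, sum(b) = 1}, the feasible
    set of the affine + non-negativity constraints (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]   # support size - 1
    theta = (1.0 - css[rho]) / (rho + 1)                 # shift
    return np.maximum(v + theta, 0.0)
```

Feasible points are left unchanged, while infeasible ones are mapped to the nearest valid weight vector, e.g. `project_simplex([0.5, 0.2, -0.1])` zero-clips and renormalizes.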
Outline:
• Introduction
• Proposed Method
• Experiments
• Conclusion
Datasets:
Different domains:
• face, text, image, etc.
• UCI & feature-selection repositories
• a wide range of dimensionalities
Alternative methods:
• Supervised: K-FDA, LDR, SDR, KDR
• Unsupervised: JSE, S-KPCA, KEDR
Dimensionality reduction results:
• Classification accuracy (%)
• 1-NN classifier on the projected data
• 10-fold cross-validation
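The evaluation protocol above can be sketched with scikit-learn; the 2-d slice of iris below is only a stand-in for a learned low-dimensional embedding:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 1-nn classification accuracy of a projection under 10-fold cross-validation
X, y = load_iris(return_X_y=True)
Y_proj = X[:, :2]          # placeholder for the projected data
scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), Y_proj, y, cv=10)
print(f"mean 1-nn accuracy: {scores.mean():.3f}")
```

Using 1-NN keeps the classifier assumption-free, so differences in accuracy reflect the quality of the embedding rather than of the classifier.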
Interpretation of the embedding dimensions:
• Interpretability measure:
• becomes 1 if a dimension is reconstructed using one class
• becomes small if it is reconstructed using all the classes
• Projecting the embedding dimensions onto the label space
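One plausible instantiation of such a measure is the class purity of a dimension's reconstruction weights (the function name and exact definition are assumptions, not taken from the paper):

```python
import numpy as np

def interpretability(gamma, labels):
    """Class purity of one embedding dimension's reconstruction weights:
    1 when a single class contributes, about 1/c when all c classes
    contribute equally."""
    w = np.abs(gamma)
    mass = np.array([w[labels == c].sum() for c in np.unique(labels)])
    return mass.max() / mass.sum()
```

For example, weights concentrated on one class score 1.0, while weights split evenly over two classes score 0.5.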
Feature selection results:
• Multiple-kernel (MK) representation of the data
• Number of non-zero entries in beta (selected kernels)
• Alternative methods:
• MKL algorithms: MKL-TR, MKL-DR, KNMF-MKL, and DMKL
• Classification accuracy & …
Conclusion:
• A novel method for discriminative dimensionality reduction.
• Focuses on local neighborhoods in the RKHS.
• Aims at class-based interpretation of the embedding dimensions.
• A good trade-off between interpretation and separation of classes.
• Feature-selection extension using a multiple-kernel data representation.
Thank you very much!
Questions?
Twitter: @Babak_hss