

Socioeconomic Status based on Social Media Data

4.3 The Proposed Model

4.3.2 Coupled Social Media Content Representation Model

In this part, we first present the social media text representation method and the coupled user level attribute representation method. Then, the social media text representation and the platform-based user level attribute representation are aggregated into a vector representation of social media content. Finally, based on the social media content representation, we build a 3-way classifier to assign an SES label to each social media user.

Social Media Text Representation. Long Short-Term Memory (LSTM) [43], a variant of the RNN, is widely adopted for textual data modeling due to its excellent performance on sequence modeling. LSTM is able to capture long-term dependencies in a sequence by introducing a memory cell. To model the semantic representation of social media text while preserving word order, we adopt BiLSTM (Bidirectional LSTM) to encode the social media text in both the forward and backward directions, which increases the amount of input information available to the network compared with a unidirectional LSTM. Besides, to take into account the hierarchical structure of social media text, and inspired by the principle of compositionality [30], we model a social media user's text through a hierarchical structure composed of three levels, i.e., the word level, the microblog level and the user level.

As shown in Figure 4.2, in the word level, we first embed each word in a microblog b_i into a low dimensional semantic space, i.e., each word w_i^j is mapped to its embedding w_i^j ∈ R^d. The word embedding method and its settings will be described in Section 4.5.1. At each step, given


Figure 4.2: The architecture of the proposed model.

an input word embedding w_i^j, the current cell state c_i^j and hidden state h_i^j can be updated from the previous cell state c_i^{j-1} and hidden state h_i^{j-1} as follows:

i = σ(W_i [h_i^{j-1}; w_i^j] + b_i),
f = σ(W_f [h_i^{j-1}; w_i^j] + b_f),
o = σ(W_o [h_i^{j-1}; w_i^j] + b_o),
c̃ = tanh(W_c [h_i^{j-1}; w_i^j] + b_c),
c_i^j = f ⊙ c_i^{j-1} + i ⊙ c̃,
h_i^j = o ⊙ tanh(c_i^j),

where i, f, o denote the gate activations, ⊙ denotes element-wise multiplication, σ is the logistic sigmoid function and W, b are the trainable parameters. Therefore, for a sequence of words {w_i^1, w_i^2, ..., w_i^{l_i}}, the forward LSTM reads the word sequence from w_i^1 to w_i^{l_i} and the backward LSTM reads the word sequence from w_i^{l_i} to w_i^1. We then concatenate the forward hidden state →h_i^j and the backward hidden state ←h_i^j, i.e., h_i^j = [→h_i^j; ←h_i^j], where [·;·] denotes the concatenation operation. In BiLSTM, the hidden state h_i^j encodes the information of the whole sequence centered around w_i^j. As a result, the BiLSTM network receives [w_i^1, w_i^2, ..., w_i^{l_i}] and generates the hidden states [h_i^1, h_i^2, ..., h_i^{l_i}]. We then feed the hidden states to an average pooling layer to obtain the microblog text representation b_i for microblog b_i.
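These update equations and the forward/backward pass can be sketched as follows. This is a minimal illustration with toy, randomly initialized parameters (dimensions d = 8 and h = 16 are arbitrary, and the same parameters are reused for both directions for brevity; a trained BiLSTM would use separate parameters per direction):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 16  # word-embedding size, hidden size (toy values)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One (W, b) pair per gate, acting on the concatenation [h_prev; w_t].
params = {g: (rng.normal(scale=0.1, size=(h, h + d)), np.zeros(h))
          for g in ("i", "f", "o", "c")}

def lstm_step(w_t, h_prev, c_prev):
    x = np.concatenate([h_prev, w_t])
    i = sigmoid(params["i"][0] @ x + params["i"][1])   # input gate
    f = sigmoid(params["f"][0] @ x + params["f"][1])   # forget gate
    o = sigmoid(params["o"][0] @ x + params["o"][1])   # output gate
    c_tilde = np.tanh(params["c"][0] @ x + params["c"][1])
    c = f * c_prev + i * c_tilde       # element-wise (Hadamard) updates
    h_new = o * np.tanh(c)
    return h_new, c

def lstm(words):
    # Run over a word sequence and return all hidden states.
    h_t, c_t = np.zeros(h), np.zeros(h)
    states = []
    for w_t in words:
        h_t, c_t = lstm_step(w_t, h_t, c_t)
        states.append(h_t)
    return np.stack(states)

words = rng.normal(size=(5, d))        # a 5-word microblog (toy embeddings)
fwd = lstm(words)                      # forward pass
bwd = lstm(words[::-1])[::-1]          # backward pass, re-aligned to word order
h_cat = np.concatenate([fwd, bwd], axis=1)   # h_j = [->h_j ; <-h_j]
b_vec = h_cat.mean(axis=0)             # average pooling -> microblog vector b_i
print(b_vec.shape)                     # (32,)
```

The same encode-then-pool pattern is applied again at the microblog level, with microblog vectors in place of word embeddings.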

Figure 4.3: An overview of coupled user level attribute representation.

In the microblog level, given the microblog representation vectors of a user {b_1, ..., b_n}, we also utilize BiLSTM to encode the microblogs as follows:

→h_i = →LSTM(b_i), (4.5)
←h_i = ←LSTM(b_i), (4.6)

We then concatenate the forward hidden state →h_i and the backward hidden state ←h_i, i.e., h_i = [→h_i; ←h_i]. h_i summarizes the neighboring microblogs around the i-th microblog while still focusing on the i-th microblog. We then feed the hidden states to an average pooling layer to obtain the final social media text representation u_t for user u.
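Putting the two levels together, the hierarchical text encoder composes as sketched below. The BiLSTM-plus-average-pooling encoder is replaced here by a fixed random projection of the mean input, a hypothetical stand-in used only to show how the shapes compose (the dimensions 8, 32 and 64 are arbitrary toy values):

```python
import numpy as np

rng = np.random.default_rng(3)
d2 = 32  # microblog-vector size (2 x word-level hidden size)

def toy_bilstm_avg(seq, out_dim, key):
    # Stand-in for a trained BiLSTM + average pooling: a fixed random
    # projection of the mean input (illustrative only, not a real encoder).
    proj = np.random.default_rng(key).normal(size=(out_dim, seq.shape[1]))
    return proj @ seq.mean(axis=0)

# Word level: each microblog (a sequence of 8-dim word embeddings) -> b_i.
# The same word-level encoder (key=0) is shared across all microblogs.
microblogs = [rng.normal(size=(n_words, 8)) for n_words in (4, 7, 5)]
b = np.stack([toy_bilstm_avg(m, d2, key=0) for m in microblogs])   # n x 32

# Microblog level: the sequence b_1..b_n -> user text representation u_t.
u_t = toy_bilstm_avg(b, 2 * d2, key=1)
print(u_t.shape)   # (64,)
```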

Coupled User Level Attribute Representation. Besides social media text, each social media user generally has platform-based user level attributes. For example, some attributes, like the number of followees, indicate platform impact, while others, like the number of microblogs, indicate platform behaviors. Following previous related work, we assume that these user level attributes can contribute to the representation of social media content for individual SES prediction.

To the best of our knowledge, most previous works only leverage the original user level attributes without considering the relations among attributes. However, as observed in previous work [18, 96], in the real world attributes are more or less coupled via explicit or implicit relationships. Therefore, it is natural to hypothesize that the user level attributes are related to each other in some way. To this end, this work employs a coupled representation method [96] to represent user level attributes, which is able to capture such latent relations among attributes.

To be more specific, as illustrated in Figure 4.3, we consider two kinds of interaction relations among platform-based user level attributes: the intra-coupled interaction within an attribute, which captures the correlations between each attribute and its own powers, and the inter-coupled interaction among different attributes, which captures the correlations between each attribute and the powers of the other attributes.


Firstly, we map the original attribute space to an expanded space for incorporating linear and nonlinear information by means of a power expansion as follows:

{⟨a_1⟩^1, ⟨a_1⟩^2, ..., ⟨a_1⟩^L, ⟨a_2⟩^1, ⟨a_2⟩^2, ..., ⟨a_2⟩^L, ..., ⟨a_m⟩^1, ⟨a_m⟩^2, ..., ⟨a_m⟩^L}, (4.7)

where ⟨a_j⟩^p (1 ≤ p ≤ L, p ∈ Z, 1 ≤ j ≤ m) denotes the p-th power of the corresponding value of attribute a_j.
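The power expansion of Eq. 4.7 can be sketched as below, with a hypothetical attribute matrix of 3 users and 2 attributes (in practice attribute values would typically be normalized first, since high powers of large raw counts overflow quickly):

```python
import numpy as np

# Toy attribute matrix: 3 users x 2 attributes (hypothetical values,
# e.g. number of followees and number of microblogs).
V = np.array([[120.,  45.],
              [ 30., 210.],
              [ 75.,  90.]])
L = 3  # highest power kept in the expansion

# Eq. 4.7: for each attribute a_j, its powers <a_j>^1 ... <a_j>^L,
# grouped attribute by attribute.
expanded = np.concatenate(
    [V[:, [j]] ** p for j in range(V.shape[1]) for p in range(1, L + 1)],
    axis=1)
print(expanded.shape)   # (3, 6): m * L expanded columns per user
```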

Leveraging the power expansion, the intra-coupled interaction within attribute a_j is defined as an L×L matrix M^{Ia}(a_j) that captures the correlations between attribute a_j and its own powers ⟨a_j⟩^p:

M^{Ia}(a_j) = [θ_{pq}(j)]_{L×L}, (4.8)

where θ_{pq}(j) denotes the Pearson's product-moment correlation coefficient between ⟨a_j⟩^p and ⟨a_j⟩^q. Here, we use a revised correlation coefficient that takes into account the p-values for testing the hypothesis of no correlation between attributes, i.e., if the p-value is no less than 0.05, the correlation coefficient is set to 0.

Besides, the inter-coupled interaction between numerical attribute a_j and the other attributes a_k (k ≠ j) is defined as an L × L·(m−1) matrix M^{Ie}(a_j|{a_k}_{k≠j}):

M^{Ie}(a_j|{a_k}_{k≠j}) = [δ_{pq}(j, k_i)]_{L×L·(m−1)}, (4.9)

where δ_{pq}(j, k_i) denotes the Pearson's product-moment correlation coefficient between ⟨a_j⟩^p and ⟨a_{k_i}⟩^q, and {a_k}_{k≠j} = {a_{k_1}, ..., a_{k_{m−1}}} is the set of attributes other than a_j.

For each user object u_i, the attribute value of a_j and its powers are represented as a vector:

z_i^e(a_j) = [⟨v_{ij}⟩^1, ⟨v_{ij}⟩^2, ..., ⟨v_{ij}⟩^L], (4.10)

while the attribute values of the other attributes {a_k}_{k≠j} and their powers are denoted as another vector:

z_i^e({a_k}_{k≠j}) = [⟨v_{ik_1}⟩^1, ⟨v_{ik_1}⟩^2, ..., ⟨v_{ik_1}⟩^L, ..., ⟨v_{ik_{m−1}}⟩^1, ⟨v_{ik_{m−1}}⟩^2, ..., ⟨v_{ik_{m−1}}⟩^L]. (4.11)

Here, the attribute value of user u_i on attribute a_j is v_{ij}. We incorporate the intra-coupled interaction and the inter-coupled interaction into a new coupled attribute representation, a 1×L vector r_i(a_j), for user object u_i on the numerical attribute a_j as follows:

r_i(a_j) = z_i^e(a_j) ⊙ w ⊗ [M^{Ia}(a_j)]^T + z_i^e({a_k}_{k≠j}) ⊙ [w, w, ..., w]_{m−1 copies} ⊗ [M^{Ie}(a_j|{a_k}_{k≠j})]^T, (4.12)

where w = [1/(1!), 1/(2!), ..., 1/(L!)], ⊙ denotes the Hadamard product and ⊗ indicates matrix multiplication. After considering all the m original numerical attributes, we obtain the final coupled user level attribute representation for the user object u_i as follows:

r_i^a = [r_i(a_1), r_i(a_2), ..., r_i(a_m)] ∈ R^{L·m}. (4.13)
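Eqs. 4.8–4.13 can be sketched end to end as follows. This is a minimal illustration on toy random data (sizes n, m, L are arbitrary); the p-value screening of correlation coefficients described above is omitted here for brevity:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, m, L = 50, 3, 2            # users, attributes, highest power (toy sizes)
V = rng.random((n, m))        # toy attribute values, assumed pre-scaled

# Per-attribute power blocks <a_j>^1 ... <a_j>^L (Eq. 4.7), each n x L.
P = [np.stack([V[:, j] ** p for p in range(1, L + 1)], axis=1) for j in range(m)]

def corr(x, y):
    # Pearson correlation; the p >= 0.05 zeroing rule is omitted in this sketch.
    return np.corrcoef(x, y)[0, 1]

w = np.array([1.0 / math.factorial(p) for p in range(1, L + 1)])

def coupled_rep(i, j):
    # Intra-coupled matrix (Eq. 4.8): a_j's powers against its own powers.
    M_intra = np.array([[corr(P[j][:, p], P[j][:, q]) for q in range(L)]
                        for p in range(L)])                 # L x L
    # Inter-coupled matrix (Eq. 4.9): a_j's powers against the others' powers.
    others = [k for k in range(m) if k != j]
    M_inter = np.array([[corr(P[j][:, p], P[k][:, q])
                         for k in others for q in range(L)]
                        for p in range(L)])                 # L x L(m-1)
    z_self = P[j][i]                                        # Eq. 4.10
    z_others = np.concatenate([P[k][i] for k in others])    # Eq. 4.11
    # Eq. 4.12: Hadamard-weight by w, then multiply by the transposed matrices.
    return (z_self * w) @ M_intra.T + (z_others * np.tile(w, m - 1)) @ M_inter.T

# Eq. 4.13: concatenate over all m attributes for user 0.
r_a = np.concatenate([coupled_rep(0, j) for j in range(m)])
print(r_a.shape)   # (6,) = L * m
```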

Before fusing the user level attributes, to capture the latent relationships between high level features, we map the raw attribute vector r^a to the k-length representation vector u^a via a fully connected network as follows:

u^a = r^a · W^a, (4.14)

where the weight matrix W^a encodes the interaction strength over attributes in the fully-connected layer.

Consequently, in the user level, we aggregate the user level attributes and the social media text into a single representation vector. More specifically, we concatenate the social media text representation and the coupled user level attribute representation to obtain the social media content representation u = [u_t; u_a].

Individual SES Prediction based on Social Media Content. Given the high level representation of social media content, we employ a linear layer and a softmax layer to project the social media content representation u into an SES distribution over C classes as follows:

p_c = softmax(W u + b), (4.15)

where p_c is the predicted probability of SES label c. In this model, the cross-entropy error between the ground truth SES level distribution and the predicted SES level distribution is used as the loss function for optimization during training:

L = −Σ_{u∈U} Σ_{c=1}^{C} p_c^g(u) · log(p_c(u)), (4.16)

where p_c^g denotes the gold probability of SES label c, with the ground truth label being 1 and all others being 0, and U represents the set of training social media users.
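The prediction head of Eqs. 4.15–4.16 can be sketched for a single user as follows, with toy random parameters (dimensions and class count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
dim, C = 12, 3                     # content-vector size, number of SES classes
u = rng.normal(size=dim)           # u = [u_t ; u_a], the fused content vector
W = rng.normal(scale=0.1, size=(C, dim))
b = np.zeros(C)

def softmax(z):
    z = z - z.max()                # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = softmax(W @ u + b)             # Eq. 4.15: predicted SES distribution
gold = np.zeros(C); gold[1] = 1.0  # one-hot ground-truth SES level
loss = -np.sum(gold * np.log(p))   # Eq. 4.16, single-user cross-entropy
print(p.shape, round(float(p.sum()), 6))   # (3,) 1.0
```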


Figure 4.4: A demonstration of the user search function in Sina Weibo.