Unsupervised online multitask learning of behavioral sentence embeddings

Appropriate embedding transformation of sentences can aid in downstream tasks such as NLP and emotion and behavior analysis. Such efforts evolved from word vectors which were trained in an unsupervised manner using large-scale corpora. Recent research, however, has shown that sentence embeddings trained using in-domain data or supervised techniques, often through multitask learning, perform better than unsupervised ones. Representations have also been shown to be applicable in multiple tasks, especially when training incorporates multiple information sources. In this work we aspire to combine the simplicity of using abundant unsupervised data with transfer learning by introducing an online multitask objective. We present a multitask paradigm for unsupervised learning of sentence embeddings which simultaneously addresses domain adaption. We show that embeddings generated through this process increase performance in subsequent domain-relevant tasks. We evaluate on the affective tasks of emotion recognition and behavior analysis and compare our results with state-of-the-art general-purpose supervised sentence embeddings. Our unsupervised sentence embeddings outperform the alternative universal embeddings in both identifying behaviors within couples therapy and in emotion recognition.


I. INTRODUCTION
Representation learning has become a crucial tool for obtaining superior results in many machine learning tasks [1]. In the scope of natural language processing (NLP), a notable example of transforming input into more informative nonlinear abstractions is word embeddings, such as word2vec [2]. Word embeddings exploit the use of language by learning semantic regularities based on a context of neighboring words. This form of contextual learning is unsupervised, which allows learning from large-scale corpora and is a main reason for its strength and effectiveness in improving performance on many tasks such as constituency parsing [3], sentiment analysis [4], [5], natural language inference [6], and video/image captioning [7], [8].
Later, with the introduction of sequence-to-sequence models [9], embeddings were extended to encode entire sentences and allowed representation of higher levels of concept through transformation of longer contexts. For example, [10] obtained sentence embeddings, which they referred to as skip-thoughts, by training models to generate the surrounding sentences of extracts from contiguous pieces of text from novels. The authors showed that the embeddings were adept at representing the semantic and syntactic properties of sentences through evaluation on various semantic-related tasks. In [11] the authors extracted sentence embeddings from an LSTM-RNN which was trained using user click-through data logged from a web search engine. They then showed that embeddings generated by their models were especially useful for web document retrieval tasks. Later, [12] extracted sentence embeddings from a conversation model and showed the richness of semantic content by applying an additional weakly-supervised architecture to estimate the behavioral ratings of couples therapy sessions. Many other works have focused on obtaining general-purpose sentence representations: sentence embeddings that are adept at multiple NLP tasks [13], [14], [15].
The benefit of many of the methods in the aforementioned works is that the embedding transformation is learned on large amounts of unlabeled data. Since natural language is an extremely complex process, it is crucial to leverage large corpora when learning embeddings so as to capture true semantic concepts instead of regularities of the data, e.g., domain-specific topics [16]. Unsupervised learning allows us to utilize as much data as possible to increase the breadth of language understanding while minimizing the effort of data annotation.
However, a common issue with unsupervised training of sentence embeddings is the unpredictability of the resulting embedding transformation. In other words, the embedding distribution is highly random and often contains redundant or irrelevant information. In addition, depending on training conditions such as architecture or dataset, it might fail to capture informational concepts or even semantics of the input data [14]. This is to be expected since the amount of information increases significantly as we move from words to sentences. It has also been noted that the quality of sentence embeddings is often highly dependent on the training dataset [11], [12], so much so that embeddings trained on small domain-relevant datasets can yield better results than those trained on larger generic unsupervised datasets [10].
In this work we propose an online multitask learning (MTL) framework which aims to guide unsupervised sentence embeddings into a space that is more discriminative in a final task. In our framework, transfer of domain knowledge is achieved through an additional task in parallel with unsupervised contextual learning. The labels for the multitask objective are generated online from the unlabeled data to maintain the low annotation effort of an unsupervised scenario. Finally, we apply the sentence embeddings to a final task of annotating human behaviors as evaluation and show improvement in the potency of unsupervised contextual learning through MTL.

II. RELATED WORK
Many works have focused on leveraging multitask learning to enhance the informational content of sentence embeddings. These methods can generally be categorized into task-specific or general-purpose applications.
In task-specific implementations a multitask function is often added to a primary supervised objective. For example, [17] jointly learned sentence embeddings with an additional pivot prediction task in conjunction with sentiment classification. [18] predicted neighboring words as a secondary objective to improve accuracy of various sequence labeling tasks.
On the other hand, general-purpose sentence embeddings aim to provide pre-trained features which, when transferred to unrelated tasks, improve overall performance. [19] achieved this by combining various tasks such as machine translation, constituency parsing, and image caption generation, which improved the translation quality between English and German. Recently, [20] presented a large-scale multitask framework for learning general-purpose sentence embeddings by training with a multitude of NLP tasks, including skip-thought training, machine translation, entailment classification, and constituent parsing. Similarly, universal sentence representations were also proposed in [14] and [15]. [14] used a single Natural Language Inference (NLI) task as the training objective whereas [15] also included tasks such as skip-thought and response generation.
Our work differs in that we build on contextual learning and attempt to guide unsupervised learning through a related multitask objective. Unlike prior works, we target unsupervised scenarios and use a simple scheme to generate multitask labels online. Although unsupervised learning has historically required more data and training time, recent implementations of general-purpose sentence embeddings have greatly scaled up training in both dataset size and model complexity. We show that through multitask guidance unsupervised sentence embeddings can still excel in targeted tasks without requiring extensive labeled datasets or complicated models.
In this paper we evaluate the performance of the unsupervised multitask sentence embeddings in identifying various human behaviors exhibited in conversational dialogue. In order to assess different sentence embedding methods fairly, we apply simple machine learning techniques to obtain results for the final task rather than neural networks, which would be able to exploit minor gains in the features. We then provide an analysis of the results to give insight into the benefits of our proposed framework.

A. Sequence-to-sequence sentence embeddings
The sequence-to-sequence model (seq2seq) [9] maps input sequences to output sequences using an encoder-decoder architecture. Given an input sentence $x = (x_1, x_2, \ldots, x_T)$ and output sentence $y = (y_1, y_2, \ldots, y_T)$, where $x_t$ and $y_t$ represent individual words, the standard sequence model can be expressed as computing the conditional probability
$$P(y_1, \ldots, y_T \mid x_1, \ldots, x_T) = \prod_{t=1}^{T} P(y_t \mid h, s, y_1, \ldots, y_{t-1}) \tag{1}$$
where $s$ is the sequence of outputs $s_t$ from the encoder and $h$ is the internal representation of the input given by the last hidden state of the encoder. For a given dataset $D = \{(x_n, y_n)\}_{n=1}^{N}$, the internal representation $h$ can be expressed as
$$h_\theta = f(x; \theta) \tag{2}$$
where $f(\cdot)$ is the encoder function and $\theta$ is the set of parameters resulting from $D$.
The internal representation $h_\theta$ encodes the input $x$ in a form that allows the decoder to generate the best estimate of $y$. In cases where $D$ contains semantically related data pairs, $h_\theta$ can be viewed as a semantic vector representation of the input, or sentence embedding, which can be useful for subsequent NLP tasks. In our case we apply contextual learning and designate consecutive sentences in continuous corpora as $x$ and $y$.
While this model allows us to obtain semantically rich embeddings through training on unsupervised data, the quality of the embeddings is highly influenced by biases in the data, which prevents the embeddings from becoming specialized in any target task [14]. Therefore we propose to enhance the quality of unsupervised sentence embeddings through multitask learning.

B. Multitask embedding training
The addition of a multitask objective can guide embeddings into a space that is more discriminative in a target application. We hypothesize that this holds true even when the multitask labels are generated online from unsupervised data with no assumption of label reliability, as long as there is some relation between the multitask and target application.
Assuming an online system which generates multitask labels $b$ for each input $x$, we can augment the dataset to yield
$$D_{aug} = \{(x_n, y_n, b_n)\}_{n=1}^{N} \tag{3}$$
We then aim to predict this new label $b$ in conjunction with the original output sequence $y$. This is implemented in our seq2seq model by adding another head, or multitask network, after the internal representation $h$, as shown in Figure 1. In addition to Eq. 1, the model now also estimates the conditional probability
$$P(b \mid x) = g(h_{\theta_{aug}}) \tag{4}$$
where $g(\cdot)$ is the multitask network and $h_{\theta_{aug}}$ is the new internal representation given by $D_{aug}$. In this work, the multitask network $g(\cdot)$ is implemented with a multilayer perceptron.
The training loss is then the weighted sum of losses from the multiple tasks, defined as
$$L = \lambda L_1 + (1 - \lambda) L_2 \tag{5}$$
where $L_1$ and $L_2$ are the cross-entropy losses for contextual learning and the additional task, respectively. In most multitask setups, a practical issue is how to control the training ratio $\lambda$ to account for different data sources. For example, if there is no overlap in the inputs of the multiple tasks, then $\lambda$ can only alternate between 0 and 1 during training to switch between the different tasks. However, since we propose a multitask objective whose labels are generated from incoming data, we are able to freely adjust $\lambda$. It is possible to adjust the multitask ratio as training progresses to put emphasis on different tasks, but we do not make any assumptions about the optimal weighting scheme and give equal importance to both tasks by setting $\lambda$ to 0.5.
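As a minimal sketch of the weighted training loss described above, the snippet below assumes the loss takes the form λ·L1 + (1 − λ)·L2, which is consistent with λ switching between tasks at 0 and 1 and weighting them equally at 0.5. The function names and toy probabilities are illustrative assumptions, not the authors' implementation.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])

def multitask_loss(l_context, l_affect, lam=0.5):
    """Weighted sum of the two task losses; lam=0.5 weights them equally.

    Assumed form: lam * L1 + (1 - lam) * L2.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * l_context + (1.0 - lam) * l_affect

# Toy example: one decoder step over a 3-word vocabulary and one affect prediction.
l1 = cross_entropy([0.7, 0.2, 0.1], target=0)  # contextual learning loss
l2 = cross_entropy([0.4, 0.6], target=1)       # multitask (affect) loss
total = multitask_loss(l1, l2, lam=0.5)
```

Because both losses are computed on the same input, λ can take any value in [0, 1] at every step rather than alternating between tasks.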
[Figure 1: panel labels "Unsupervised contextual learning" and "Online transfer learning"; see the caption for Fig. 1.]

C. Online multitask label generation
To guide the embeddings toward being more relevant to human behavior, we select a multitask objective that attempts to classify the affective state of input sentences. The definition of human behavior is more complicated than these states; however, we hypothesize that this is a suitable method of transferring related domain knowledge into the unsupervised sentence embeddings.
We generate the affective labels for each input during training using an online mechanism. We apply the simplest approach: inputs are labeled automatically using a lookup table of affective words [21]. Specifically, we use words categorized in the two top-level affective states: negative and positive emotion. An input sentence is assigned a Negative or Positive label based on the count of words corresponding to each affective state. Although this labeling approach is extremely naive and has a high rate of misclassification, we hypothesize that the inclusion of affective knowledge in embeddings will be beneficial in identifying more complex behaviors or emotions later. Some examples of affective words in our lookup table are shown in Table I.
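The count-based labeling described above can be sketched as follows. The four-word lexicon is a hypothetical stand-in for the lookup table of [21], and returning `None` on ties is an assumption, since the text does not say how ties or sentences without affective words are handled.

```python
import re

# Hypothetical miniature lexicon; the lookup table used in the paper [21] is far larger.
AFFECT_LEXICON = {
    "positive": {"love", "nice", "happy", "great"},
    "negative": {"hate", "hurt", "angry", "awful"},
}

def affect_label(sentence, lexicon=AFFECT_LEXICON):
    """Label a sentence by counting lexicon hits; None when the counts tie."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    pos = sum(tok in lexicon["positive"] for tok in tokens)
    neg = sum(tok in lexicon["negative"] for tok in tokens)
    if pos == neg:
        return None  # tie handling is an assumption; the source does not specify it
    return "Positive" if pos > neg else "Negative"
```

For instance, `affect_label("You hurt me and I hate it")` counts two negative hits and no positive ones, so the sentence is labeled Negative.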

IV. BEHAVIOR IDENTIFICATION USING EMBEDDINGS
After unsupervised multitask training, the encoder in the seq2seq model is used to extract embeddings for use as features in behavior identification in long pieces of text (which we refer to as sessions). We define sentence embeddings to be the concatenation of the final output states of both the forward and backward RNNs in the encoder. We also concatenated the output states from all the intermediate layers of the encoder. This is an extension of history-of-word embeddings [22] and is motivated by the intuition that intermediate layers represent different levels of concept. By utilizing intermediate representations of the sentence, we hypothesize that more information related to human behavior can be captured. Annotation of human behavior using sentence embeddings was then performed using various unsupervised and supervised methods.
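The concatenation described above can be illustrated with a short sketch. The ordering of states (layer by layer, forward before backward) and the plain-list representation are assumptions made for illustration.

```python
def sentence_embedding(final_states):
    """Concatenate forward and backward final states from every encoder layer.

    final_states: list over layers of (forward_state, backward_state) pairs,
    each state being a list of floats. The ordering is an assumption.
    """
    embedding = []
    for forward, backward in final_states:
        embedding.extend(forward)
        embedding.extend(backward)
    return embedding

states = [([0.1, 0.2], [0.3, 0.4]),  # layer 1: (forward, backward)
          ([0.5, 0.6], [0.7, 0.8])]  # layer 2: (forward, backward)
emb = sentence_embedding(states)
```

With this scheme the embedding size is the number of layers times two directions times the per-direction state size.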

A. Unsupervised clustering
As an initial step we analyzed the performance of the embeddings on a behavior classification task without any supervision. We applied a simple k-means clustering method on individual sentence embeddings to obtain multiple clusters. We then labeled the clusters by randomly selecting a single seed session and assigning the session label to the centroid which the majority of embeddings in the session were closest to. During evaluation, session labels were predicted based on the centroid which the majority of embeddings from the session were closest to.
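A rough sketch of the seed-based cluster labeling and majority-vote prediction follows. It assumes the centroids have already been obtained from k-means, that there are exactly two clusters, and that the cluster not matched by the seed session receives the opposite binary label; none of these details are stated in the text.

```python
from collections import Counter

def nearest_centroid(vec, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    dists = [sum((v - c) ** 2 for v, c in zip(vec, cen)) for cen in centroids]
    return dists.index(min(dists))

def majority_centroid(session, centroids):
    """Centroid index that most of a session's sentence embeddings are closest to."""
    votes = Counter(nearest_centroid(vec, centroids) for vec in session)
    return votes.most_common(1)[0][0]

def label_clusters(seed_session, seed_label, centroids):
    """Assumed binary case: the seed's majority centroid takes the seed label."""
    if len(centroids) != 2:
        raise ValueError("this sketch assumes exactly two clusters")
    seed_idx = majority_centroid(seed_session, centroids)
    return {seed_idx: seed_label, 1 - seed_idx: 1 - seed_label}

def predict_session(session, centroids, cluster_labels):
    """Predict a session label from the majority centroid of its embeddings."""
    return cluster_labels[majority_centroid(session, centroids)]

centroids = [[0.0, 0.0], [1.0, 1.0]]              # stand-ins for k-means output
seed = [[0.9, 1.1], [1.2, 0.8], [0.1, 0.0]]       # seed session with label 1
cluster_labels = label_clusters(seed, 1, centroids)
prediction = predict_session([[0.1, 0.2], [0.0, -0.1], [0.9, 0.9]],
                             centroids, cluster_labels)
```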

B. k-Nearest neighbors
For supervised classification we applied a simple k-nearest neighbor (k-NN) approach. In k-NN, an embedding is labeled according to its k-nearest neighbors in the training set. The final session label was then obtained by a majority vote over all embeddings in the session.
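The two-level vote described above (neighbors per embedding, then embeddings per session) might be sketched as below. The Euclidean distance, the value k=3, and the toy vectors are assumptions chosen for illustration.

```python
from collections import Counter

def knn_label(vec, train_vecs, train_labels, k=3):
    """Label one embedding by majority vote among its k nearest training embeddings."""
    order = sorted(
        range(len(train_vecs)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, train_vecs[i])),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def knn_session_label(session, train_vecs, train_labels, k=3):
    """Label a session by majority vote over the k-NN labels of its embeddings."""
    votes = Counter(knn_label(v, train_vecs, train_labels, k) for v in session)
    return votes.most_common(1)[0][0]

train_vecs = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = [0, 0, 0, 1, 1, 1]
session = [[5.2, 5.1], [4.9, 5.5], [0.2, 0.1]]
session_label = knn_session_label(session, train_vecs, train_labels)
```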

C. Rating estimation using neural networks
Finally, we applied a neural network on top of the embeddings to estimate actual behavior ratings. For this section we applied the framework proposed in [12]. Sessions were segmented into sentences and represented as a sequence of embeddings. A sliding window of size 3 was applied over the embeddings followed by an RNN using LSTM units. The RNN was trained to predict the session rating from each window. The final session label was obtained by training a Support Vector Regressor to map from the median of the window predictions to the session rating. For more details the reader can refer to [12].
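The windowing and aggregation steps around the LSTM can be sketched as follows. A stride of 1 is an assumption, the per-window predictions are hypothetical placeholders for the RNN outputs, and the LSTM and Support Vector Regressor themselves are omitted.

```python
from statistics import median

def sliding_windows(seq, size=3):
    """All contiguous windows of `size` items (a stride of 1 is an assumption)."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]

def session_feature(window_predictions):
    """Median of per-window rating predictions, the input to the final regressor."""
    return median(window_predictions)

embeddings = ["e1", "e2", "e3", "e4", "e5"]  # stand-ins for sentence embeddings
windows = sliding_windows(embeddings)
preds = [0.2, 0.9, 0.4]                      # hypothetical per-window LSTM outputs
feature = session_feature(preds)
```

Using the median rather than the mean makes the session-level feature less sensitive to a few windows with extreme predictions.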

V. EXPERIMENTAL SETUP
A. Datasets
1) OpenSubtitles: We used separate datasets to train the unsupervised and supervised portions of our proposed method. Since our final task is behavior annotation of human interaction, we wish to use a dataset that contains conversational speech when learning the unsupervised sentence embedding. A natural choice for a source rich in dialogue is movie subtitles. To this end we used the OpenSubtitles Corpus [23]. This corpus was generated using data from the website opensubtitles.org and contains user-submitted subtitles of movies and TV shows.
We applied pre-processing in addition to the steps already taken in [23]. Mainly, we attempted to generate a back-and-forth conversation by taking consecutive lines in the subtitles and assigning them as utterances and replies in an interaction. As there is no speaker information in the corpus, it is hard to distinguish between dialogues and monologues without the use of advanced content analysis methods. However, we assume that this difference in conversational continuity will be dampened by the large amount of data available. We further assume that monologues represent some form of internal dialogue which closely ties with the concepts between sentences.
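The pairing of consecutive subtitle lines into utterance and reply can be sketched in a few lines. Skipping blank lines and letting each line serve as both a reply and the next utterance are assumptions; the text does not specify whether pairs overlap.

```python
def consecutive_pairs(lines):
    """Pair each subtitle line with the one that follows it as (utterance, reply)."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    return list(zip(cleaned, cleaned[1:]))

subtitles = ["Where were you?", "", "At work.", "You never called."]
pairs = consecutive_pairs(subtitles)
```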
Finally, we applied standard text processing techniques to clean up the text further. These included auto-correction of commonly misspelled words, contraction removal, and replacement of proper nouns through part-of-speech tagging. The final unsupervised training set consists of 30 million sentence pairs.
2) Couples Therapy Corpus: We applied our unsupervised sentence embeddings to the task of annotating behaviors in human interactions. For this we used data from the UCLA/UW Couple Therapy Research Project [24], which contains recordings of 134 real couples with marital issues interacting over multiple sessions. In each session the couples each discussed a self-selected topic for around 10 minutes. The recordings of the session were then rated by multiple annotators based on the Couples Interaction [25] and Social Support [26] Rating Systems. Together, these rating systems describe 33 behavioral codes rated on a Likert scale of 1 to 9, where 1 indicates strong absence and 9 indicates strong presence of the given behavior. The number of annotators per session ranged from 2 to 12; however, the majority of sessions (∼90%) had 3 to 4 annotators. Annotator ratings were then averaged to obtain a 33-dimensional vector of behavior ratings per interlocutor for every session. The ratings were binarized to produce labels for the classification task, and the Likert scale values were used for behavior rating estimation.
In this work we focused on the behaviors Acceptance, Blame, Humor, Sadness, Negativity, and Positivity. Similar to prior works ([27], [12]), we used only the top and bottom 20% of the dataset in terms of averaged behavior ratings. To train our models the dataset was split into train and test sets using a leave-one-couple-out scheme. That is, for each fold, one couple was used as the test set and the remaining couples as the train set. This resulted in 85-fold cross-validation.
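The selection of extreme sessions and the leave-one-couple-out split can be sketched as follows. The dictionary layout, the field names `couple` and `rating`, and the assignment of label 0 to the bottom group and 1 to the top group are assumptions made for illustration.

```python
def select_extremes(sessions, fraction=0.2):
    """Keep the bottom and top `fraction` of sessions by averaged rating.

    Bottom sessions get label 0 and top sessions label 1 (an assumed convention).
    """
    ranked = sorted(sessions, key=lambda s: s["rating"])
    n = int(len(ranked) * fraction)
    low = [dict(s, label=0) for s in ranked[:n]]
    high = [dict(s, label=1) for s in ranked[len(ranked) - n:]]
    return low + high

def leave_one_couple_out(sessions):
    """Yield (held_out_couple, train, test), holding out one couple per fold."""
    for couple in sorted({s["couple"] for s in sessions}):
        train = [s for s in sessions if s["couple"] != couple]
        test = [s for s in sessions if s["couple"] == couple]
        yield couple, train, test

# Toy data: 10 sessions from 3 hypothetical couples.
raw = [("A", 1.5), ("B", 8.0), ("A", 2.0), ("C", 7.5), ("B", 5.0),
       ("C", 4.0), ("A", 6.0), ("B", 3.0), ("C", 9.0), ("A", 5.5)]
sessions = [{"couple": c, "rating": r} for c, r in raw]
extremes = select_extremes(sessions)
folds = list(leave_one_couple_out(extremes))
```

Splitting by couple rather than by session keeps all sessions of the held-out couple out of the training set, so the number of folds equals the number of couples that remain after selection.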
3) IEMOCAP: We also evaluated the effectiveness of our sentence embeddings in emotion recognition using the Interactive Emotional Dyadic Motion Capture Database (IEMOCAP) [28]. This dataset contains recordings from five male-female pairs of actors performing both scripted and improvised dyadic interactions. Utterances from the interactions were then rated by multiple annotators for dimensional and categorical emotions. Similar to other works [29], [30], we focused on four categorical labels where there was majority agreement between annotators: happiness, sadness, anger, and neutral, with excitement considered as happiness. We used the transcripts from the dataset and removed any acoustic annotations such as laughter or breathing. After discarding empty sentences, our final dataset consisted of 5,500 utterances (1,103 for anger, 1,078 for sadness, 1,615 for happiness, and 1,704 for neutral).
To train the supervised layers we used a leave-one-pair-out scheme, which resulted in 5-fold cross-validation.

B. Model architectures and training details
1) Sentence embeddings: The sequence-to-sequence model with multitask objective, shown in Figure 1, can be described as three sections: the encoder, the decoder, and the multitask network. The encoder was constructed using a multi-layered bidirectional RNN using GRU units. We performed a grid search over hyper-parameter settings of 2 and 3 layers and of 100 and 300 dimensions in each direction per layer. For the decoder a unidirectional RNN using GRU units was used instead of a bidirectional one. The number of layers in the decoder was the same as in the encoder, while the dimension size was doubled to account for the concatenation of states and outputs from both directions.
The multitask network was implemented using a neural network with four hidden layers of sizes 512, 512, 256, and 128. We used the rectified linear unit (ReLU) as the activation function in the hidden layers and a 2-dimensional softmax before the final output. No other network hyper-parameters were tried for the multitask network.
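A forward pass through a multitask head of this shape can be sketched in plain Python. The weight initialization, zero biases, and input size are assumptions; only the hidden sizes (512, 512, 256, 128), the ReLU activations, and the 2-dimensional softmax come from the text. The demo uses smaller layers to keep it fast.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def linear(v, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def init_mlp(in_dim, hidden=(512, 512, 256, 128), out_dim=2, seed=0):
    """Random weights (uniform initialization is an assumption) and zero biases."""
    rng = random.Random(seed)
    dims = [in_dim, *hidden, out_dim]
    layers = []
    for d_in, d_out in zip(dims, dims[1:]):
        scale = 1.0 / math.sqrt(d_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(d_in)]
                   for _ in range(d_out)]
        layers.append((weights, [0.0] * d_out))
    return layers

def forward(layers, h):
    """ReLU after every hidden layer, softmax over the two affect classes at the end."""
    for weights, bias in layers[:-1]:
        h = relu(linear(h, weights, bias))
    weights, bias = layers[-1]
    return softmax(linear(h, weights, bias))

layers = init_mlp(in_dim=8, hidden=(16, 16, 8, 4))  # small sizes for a quick demo
probs = forward(layers, [0.1] * 8)
```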
The sentence embedding models were trained with the OpenSubtitles dataset for 5 epochs using SGD with momentum. The learning rate was set to 0.05 and momentum set to 0.9. We also reduced the learning rate by a factor of 10 every epoch.
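The learning-rate schedule above amounts to a simple step decay, sketched below. Whether the first reduction is applied after the first epoch, as done here, is an assumption.

```python
def learning_rate(epoch, base_lr=0.05, decay=10.0):
    """Learning rate for a zero-indexed epoch, divided by `decay` after each epoch."""
    return base_lr / (decay ** epoch)

schedule = [learning_rate(e) for e in range(5)]  # one value per training epoch
```

Under this reading, the rate falls by four orders of magnitude over the 5 epochs, so most of the parameter movement happens in the first one or two epochs.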
2) Supervised behavior annotation: Similar to [12], we used an RNN with LSTM units to estimate behavior ratings in the Couples Therapy Corpus. The RNN had a single layer with a dimension size of 50 in the LSTM unit. A sigmoid function was applied before the output to estimate the normalized rating value. In each fold one couple was randomly selected as validation to select the best model.
3) Supervised emotion recognition: A neural network with four hidden layers was used to classify emotions using embeddings of sentences from the IEMOCAP dataset. The hidden layers were of size 256 and used ReLU as the activation function. The model was trained for 20 epochs using Adagrad [31] as the optimization method. No other network hyper-parameters were tried for the emotion recognition network. A subset of the training data (∼10%) was used as validation in selecting the best model.
Table III (fragment). Accuracy (%) on IEMOCAP:
Lex-eVector [32]: 57.40
E-vector + MCNN [30]: 59.63
mLRF [33]: 63.80
InferSent [14]: (value not recovered)

VI. EXPERIMENTAL RESULTS
We compared our unsupervised multitask sentence embeddings to general-purpose embeddings such as InferSent [14], GenSen [20], and Universal Sentence Encoder [15]. Table II shows the results of behavior identification using sentence embeddings for different behaviors in the Couples Therapy Corpus. The addition of the multitask objective improved the classification accuracy of unsupervised sentence embeddings from the conversation model across all behaviors except Positivity in unsupervised classification with k-means. Under supervised learning using k-NN, our multitask embeddings improved accuracy on all behaviors except Humor. In terms of mean accuracy, our multitask embeddings performed better than other sentence embeddings, with an absolute improvement over no multitasking of 1.07% and 3.24% for unsupervised and supervised methods, respectively. Our multitask embeddings also achieved the highest mean accuracy over all the sentence embeddings tested.
The results of emotion recognition on IEMOCAP are shown in Table III. In addition to general-purpose embeddings we also compared with other works that only used transcripts ([30], [32], [33]). It should be noted that there is no consensus on data split and evaluation conditions in IEMOCAP, and while we made every effort to be consistent with other works, the results may not be directly comparable. However, when comparing among our implementations using sentence embeddings, we observed that online MTL improved the weighted accuracy (WA) of unsupervised embeddings by an absolute value of 8.02%, which is more than a 14% relative improvement. The highest accuracy was obtained using embeddings from the Universal Sentence Encoder; however, our implementation was a close second, trailing by less than one percent.
Finally, we analyzed the performance of our sentence embeddings on Negativity classification in behavior identification over the progression of training across different model architectures. From the standard error plot, shown in Figure 2, we can observe that the addition of the multitask learning objective collectively increases performance in the final task. This shows that online transfer learning through multitask was successful at improving the performance of unsupervised sentence embeddings in our final task.

VII. CONCLUSION
In this work we explored the benefits of introducing additional objectives to unsupervised contextual learning of sentence embeddings. We found empirical evidence that supports the hypothesis that multitask learning can increase affective concepts in unsupervised sentence embeddings, even when the multitask labels are generated online and extremely unreliable. Our proposed model has the benefit of not requiring additional effort in generating or collecting data for multitasks. This allows learning from large-scale corpora in an unsupervised manner while simultaneously applying transfer learning. In contrast to general-purpose sentence embeddings, our model learns sentence representations using less complex models and training effort, while at the same time yielding higher performance in our target task. We argue that when learning sentence embeddings, it is more beneficial to apply guided unsupervised learning than to overemphasize universality before domain transfer.
While we do expect that further improvements can be obtained through better labels for the multitask objective, that would entail additional effort in system design and label generation. In addition, we also expect that multitask labels that are too domain-specific (e.g., focusing on a specific way or definition of affective expression) may actually hinder the performance of unsupervised embeddings. However, we do not verify this claim and leave it to future work.

Fig. 1. Bidirectional sequence-to-sequence conversation model with multitask objective. The GRU blocks represent multi-layered RNNs using GRU units, C is the concatenation function, and Attn is an attention mechanism.

Fig. 2. Standard error plot of classification accuracy on Negativity across checkpoints for various model configurations.