Incorporating word embeddings into open directory project based large-scale classification

Kang Min Kim, Aliyeva Dinara, Byung Ju Choi, Sang Keun Lee

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

Recently, implicit representation models, such as embedding or deep learning, have been successfully adopted to text classification task due to their outstanding performance. However, these approaches are limited to small- or moderate-scale text classification. Explicit representation models are often used in a large-scale text classification, like the Open Directory Project (ODP)-based text classification. However, the performance of these models is limited to the associated knowledge bases. In this paper, we incorporate word embeddings into the ODP-based large-scale classification. To this end, we first generate category vectors, which represent the semantics of ODP categories by jointly modeling word embeddings and the ODP-based text classification. We then propose a novel semantic similarity measure, which utilizes the category and word vectors obtained from the joint model. The evaluation results clearly show the efficacy of our methodology in large-scale text classification. The proposed scheme exhibits significant improvements of 10% and 28% in terms of macro-averaging F1-score and precision at k, respectively, over state-of-the-art techniques.

Original languageEnglish
Title of host publicationAdvances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference, PAKDD 2018, Proceedings
EditorsBao Ho, Dinh Phung, Geoffrey I. Webb, Vincent S. Tseng, Mohadeseh Ganji, Lida Rashidi
PublisherSpringer Verlag
Pages376-388
Number of pages13
ISBN (Print)9783319930367
DOIs
StatePublished - 2018
Event22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018 - Melbourne, Australia
Duration: 3 Jun 20186 Jun 2018

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume10938 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018
Country/TerritoryAustralia
CityMelbourne
Period3/06/186/06/18

Bibliographical note

Publisher Copyright:
© 2018, Springer International Publishing AG, part of Springer Nature.

Keywords

  • Text classification
  • Word embeddings

Fingerprint

Dive into the research topics of 'Incorporating word embeddings into open directory project based large-scale classification'. Together they form a unique fingerprint.

Cite this