Annotated Corpus for Citation Context Analysis

  • Myriam Hernández-Álvarez, National Polytechnic School
  • José Gómez Soriano, Universidad de Alicante
  • Patricio Martínez-Barco, Universidad de Alicante
Keywords: Corpus, annotation, methodology, machine-learning, function, polarity, aspects, schema, keywords, labels, classification

Abstract

In this paper, we present a corpus of 85 scientific articles containing 2092 citations annotated through context analysis. Three coders annotated the corpus independently, and the high inter-annotator agreement we obtained supports the reliability and reproducibility of the annotation. We applied this corpus to classify citations according to qualitative criteria, using a medium-granularity categorization scheme enriched with annotated keywords and labels to reach high granularity. The annotation schema handles three dimensions: PURPOSE: POLARITY: ASPECTS. Citation purpose defines the function classification: use, critique, comparison, and background, with more specific classes established using keywords: Based on, Supply; Useful; Contrast; Acknowledge, Corroboration, Debate; Weakness and Hedges. Citation aspects complement the citation characterization: concept, method, data, tool, and task, among others. Polarity has three levels: Positive, Negative, and Neutral. We developed the schema and annotated the corpus with a focus on applications for citation influence assessment, but we suggest that other applications, such as summary generation and information retrieval, could also use this annotated corpus because the scheme is organized into clearly defined general dimensions.
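
As an illustration only, the three dimensions described in the abstract could be represented as a small record type. The sketch below is not the corpus's actual file format: the class and field names, the `label()` serialization, and the example annotation are hypothetical, and only the category names themselves come from the abstract. The abstract does not state which keyword refines which purpose, so the sketch does not encode that mapping.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Purpose(Enum):
    """The four citation functions named in the abstract."""
    USE = "use"
    CRITIQUE = "critique"
    COMPARISON = "comparison"
    BACKGROUND = "background"


class Polarity(Enum):
    """The three polarity levels named in the abstract."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


# Keywords listed in the abstract. Their assignment to purposes is not
# given there, so they are kept as a flat set (an assumption of this sketch).
KEYWORDS = frozenset({
    "Based on", "Supply", "Useful", "Contrast", "Acknowledge",
    "Corroboration", "Debate", "Weakness", "Hedges",
})

# Aspects listed in the abstract. The list ends with "among others",
# so the aspect field below is left as a free string rather than validated.
KNOWN_ASPECTS = frozenset({"concept", "method", "data", "tool", "task"})


@dataclass(frozen=True)
class CitationAnnotation:
    """Hypothetical record for one annotated citation."""
    purpose: Purpose
    polarity: Polarity
    aspect: str
    keyword: Optional[str] = None  # optional finer-grained class

    def __post_init__(self) -> None:
        # Reject keywords that the abstract does not list.
        if self.keyword is not None and self.keyword not in KEYWORDS:
            raise ValueError(f"unknown keyword: {self.keyword!r}")

    def label(self) -> str:
        # Hypothetical PURPOSE:POLARITY:ASPECT rendering of the annotation.
        return f"{self.purpose.value}:{self.polarity.value}:{self.aspect}"


# Hypothetical example annotation, not taken from the corpus.
example = CitationAnnotation(
    Purpose.USE, Polarity.POSITIVE, "method", keyword="Based on"
)
print(example.label())
```

Keeping the purpose and polarity as closed enumerations while leaving the aspect open mirrors the abstract, which fixes the first two dimensions but lists the aspects non-exhaustively.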



Published
2016-05-20
How to Cite
[1] M. Hernández-Álvarez, J. Gómez Soriano, and P. Martínez-Barco, “Annotated Corpus for Citation Context Analysis”, LAJC, vol. 3, no. 1, pp. 35-42, May 2016.
Section
Research Articles for the Regular Issue