Annotated Corpus for Citation Context Analysis
Keywords:
Corpus, annotation, methodology, machine-learning, function, polarity, aspects, schema, keywords, labels, classification
Abstract
In this paper, we present a corpus of 85 scientific articles containing 2092 citations annotated through citation context analysis. Three coders annotated the corpus independently and reached high inter-annotator agreement, which supports the reliability and reproducibility of the annotation. We applied the corpus to classify citations according to qualitative criteria, using a medium-granularity categorization scheme that annotated keywords and labels enrich to reach high granularity. The annotation schema covers three dimensions, written PURPOSE: POLARITY: ASPECTS. Citation purpose defines the function classes (use, critique, comparison, and background), with more specific classes established through keywords: Based on, Supply; Useful; Contrast; Acknowledge, Corroboration, Debate; Weakness and Hedges. Citation aspects complement the characterization of a citation: concept, method, data, tool, and task, among others. Polarity has three levels: Positive, Negative, and Neutral. We developed the schema and annotated the corpus with citation influence assessment in mind, but we suggest that other applications, such as summary generation and information retrieval, could also use this annotated corpus because the scheme is organized into clearly defined general dimensions.
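As an illustration only, the three-dimension schema described in the abstract could be represented as a small record type. The following is a minimal sketch, not the authors' annotation format: the names `Purpose`, `Polarity`, `CitationAnnotation`, `KEYWORDS`, and `label` are hypothetical, the example sentence is invented, and the abstract does not state which keyword belongs to which purpose class, so no such mapping is encoded.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Purpose(Enum):
    """The four function classes named in the abstract."""
    USE = "use"
    CRITIQUE = "critique"
    COMPARISON = "comparison"
    BACKGROUND = "background"


class Polarity(Enum):
    """The three polarity levels named in the abstract."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


# Keywords listed in the abstract. Which purpose class each keyword refines
# is an open question here, so they are kept as a flat set (assumption).
KEYWORDS = frozenset({
    "Based on", "Supply", "Useful", "Contrast", "Acknowledge",
    "Corroboration", "Debate", "Weakness", "Hedges",
})


@dataclass(frozen=True)
class CitationAnnotation:
    """One annotated citation (hypothetical record layout)."""
    context: str
    purpose: Purpose
    polarity: Polarity
    # Aspects are free strings because the abstract's list ends with
    # "among others", i.e. the set is not closed.
    aspects: Tuple[str, ...] = ()
    keywords: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject keywords that the abstract does not list.
        unknown = set(self.keywords) - KEYWORDS
        if unknown:
            raise ValueError(f"unknown keywords: {sorted(unknown)}")

    def label(self) -> str:
        """Render the annotation as purpose:polarity:aspects."""
        aspects = "+".join(self.aspects) if self.aspects else "-"
        return f"{self.purpose.value}:{self.polarity.value}:{aspects}"


example = CitationAnnotation(
    context="We follow the parsing approach of [12].",  # invented sentence
    purpose=Purpose.USE,
    polarity=Polarity.POSITIVE,
    aspects=("method",),
    keywords=("Based on",),
)
print(example.label())  # use:positive:method
```

Keeping purpose and polarity as closed enumerations while leaving aspects open mirrors the abstract, which fixes the first two dimensions and describes the third as extensible.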
License
Copyright notice
Authors who publish in this journal accept the following conditions:
- Authors retain copyright and grant the journal the right of first publication, with the work licensed under the Creative Commons Attribution-Non-Commercial-Share-Alike 4.0 International license, which allows third parties to use the published work provided they credit the authorship of the work and its first publication in this journal.
- Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.
- Authors are permitted and encouraged to share their work online (for example, in institutional repositories or on personal websites) before and during the manuscript submission process, as this can lead to productive exchanges and to earlier and greater citation of the published work.
Disclaimer
LAJC shall in no case be liable for any direct, indirect, incidental, punitive, or consequential claim of copyright infringement related to articles that have been submitted for evaluation or published in any issue of this journal. More information is available in our Disclaimer Notice.