Items - RISIS2 Open Data VRE Catalogue

dataset

CodiEsp-abstracts: Abstracts from Lilacs and Ibecs with ICD10 codes

JSON file containing abstracts from Lilacs and Ibecs together with their associated ICD10 codes (ICD10-CM and ICD10-PCS; CIE10 in Spanish). These databases have MeSH terms...

dataset

CodiEsp codes: list of valid CIE10 codes for the CodiEsp task

This compressed folder contains two files: + codiesp-D_codes.tsv: list of CIE10-Diagnósticos terms (2018 version) with their description in Spanish and in English....

dataset

dedup_wf_001--ef0fc5ab66a6d95fe2db127c359d6cda

This release contains data sets for experiments with document-level machine translation. The data sets have been used in previous studies and are provided here for replicability and...

dataset

dedup_wf_001--b719e38f8d8cb6229c959d496ae1b5d1

In order to analyze the impact on model quality of reducing the number of dimensions, strictly controlled training runs of word embeddings are performed on Wikipedia corpora of...

dataset

Transfer fine-tuned BERT models by paraphrases

Transfer fine-tuned BERT models by phrasal paraphrases. transferFT_bert-base-uncased.pkl is based on the bert-base-uncased model; transferFT_bert-large-uncased.pkl is based...

dataset

Gold Standard Corpus, Ontologies, And Entity-Quality Ontology Annotations For...

This data set includes a gold-standard corpus of evolutionary phenotype descriptions (in the form of character state descriptions pulled from a variety of phylogenetic...

dataset

Security Bug Conversations

This dataset will be released as part of the following publication. Benjamin S. Meyers, Nuthan Munaiah, Andrew Meneely, and Emily Prud'hommeaux. Pragmatic...

dataset

MeSpEn_Parallel-Corpora

MeSpEn is a resource of heterogeneous health-related documents in Spanish and English, useful for building parallel corpora for training and evaluating Spanish <->...

publication

Deep Learning Approaches to Text Production

Text production is a key component of many NLP applications. In data-driven approaches, it is used, for instance, to generate dialogue turns from dialogue moves, to verbalise the...

publication

Adapting text mining tools to noisy text

Invited talk given at Text Mining for Science Studies Workshop, Berlin

publication

Curation Technologies for a Cultural Heritage Archive: "Project Tongilbu"

We are developing a platform for generic curation technologies, using various NLP procedures, that is specifically targeted at, but not limited to, document collections that are...

publication

Data Discovery on Siren

Tutorial given on May 4th 2020 at the Knowledge Graph Conference

publication

dedup_wf_001--0d7b92f2f6bde9215ae00f45018692e2

POSTDATA focused on poetry analysis, the publication of poetic resources and their exploration, applying Digital Humanities methods. This is a trans-domain project, as it...

publication

The Knowledge Graph that Listens

Enterprises that are building Knowledge Graphs are rapidly getting a grip on unstructured data with current advances in Natural Language Processing (NLP) techniques. But there...

publication

Adapted TextRank for Term Extraction

Automatic Term Extraction is a fundamental Natural Language Processing task often used in many knowledge acquisition processes. It is a challenging NLP task due to its high...

publication

Semantically Aware Text Categorisation for Metadata Annotation

In this paper we illustrate a system aimed at solving a longstanding and challenging problem: acquiring a classifier to automatically annotate bibliographic records by starting...

publication

Machine Learning for ontologies: the KNOWMAK experience

Webinar "Can machine learning technologies be useful to create or complete ontologies in agriculture?" as part of the CGIAR Ontologies Communities of Practice Platform for Big...

software

Extracting Terms Concerning AI Based On Web Of Science Data

This is the accompanying code for extracting AI-related terms from the titles and abstracts of Web of Science data.

dataset

Magi Practical Web Article Corpus

This corpus contains 10 million Chinese articles consisting of more than 10 billion words, which have been extracted from the Internet and refined such that only the main body...

19 items found