Twitter pre-trained word vectors

Clean-up of glove.twitter.27B.zip (ODC Public Domain Dedication and Licence (PDDL) 1.0): "2B tweets, 27B tokens, 1.2M vocab, uncased".

Changes from original
- Headers added to allow loading by gensim (added via scripts.glove2word2vec).
- Recompressed as individual gzip files (instead of a single combined zip).
These changes make the files easier to work with and increase compatibility.

Headers
Example of an added header line: 1193513 200
The header gives the number of tokens and the number of dimensions.

Statistics
Entries: 1,193,513
Token length (characters): Min: 1, Max: 140, Avg: 6.73
Number of words per token: Min: 0, Max: 17, Avg: 1.00669200921984
Tokens with more than one word: 4874 (0.41%)
Twitter data collection date: unknown

History
?? Aug 2014: GloVe v.1.0 released
16 Aug 2014: Files first appear as headerless .txt.gz files; some files have mislabeled links (via the Wayback Machine)
?? Oct 2015: GloVe v.1.2 released
?? ??? ????: Files replaced with a .zip file
03 June 2019: (These files) Repackaged like the original as .txt.gz, with added headers for increased compatibility

Example of a 17-word token: سكس_طيز_قحبه_عنيف_اغتصاب_سكسيه_فحل_زب_نيك_بنات_مكوه_شهوه_لحس_عنف_تومبوي_ليدي_سبورت

200d file
Normalized: no
Values: Min: -6.7986, Max: 4.609, Avg: 0.009065093
Zero values (exactly zero): none
Zero values (approximately zero) per entry: Min: 0 (0.00%), Max: 2 of 200 (1.0%), Avg: 0.00375865197949247 (0.00%)
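The added header line (number of tokens, then number of dimensions, e.g. 1193513 200) is what turns a headerless GloVe text file into the word2vec text format that gensim expects. The following is a minimal, stdlib-only Python sketch of both directions: prepending such a header to a headerless .txt.gz file, and reading a headered file back. The helper names add_word2vec_header, load_word2vec_text and value_stats are hypothetical; this is not gensim's glove2word2vec code. It assumes UTF-8 text, one entry per line, space-separated values, and a first token without internal spaces. For the real files, gensim's KeyedVectors.load_word2vec_format(path, binary=False) reads the headered .txt.gz files directly.

```python
import gzip
import os
import tempfile
from typing import Dict, List, Tuple


def add_word2vec_header(src_path: str, dst_path: str) -> Tuple[int, int]:
    """Hypothetical helper: prepend '<num_tokens> <num_dims>' to a headerless
    gzip-compressed GloVe text file and write the result to dst_path."""
    num_tokens = 0
    num_dims = 0
    # First pass: count non-blank entries; infer dimensions from the first one.
    with gzip.open(src_path, "rt", encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue
            if num_tokens == 0:
                num_dims = len(line.rstrip().split(" ")) - 1
            num_tokens += 1
    # Second pass: write the header, then copy each vector line unchanged.
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        dst.write(f"{num_tokens} {num_dims}\n")
        for line in src:
            if line.strip():
                dst.write(line.rstrip() + "\n")
    return num_tokens, num_dims


def load_word2vec_text(path: str) -> Dict[str, List[float]]:
    """Hypothetical helper: read a headered word2vec text file (.gz)."""
    vectors: Dict[str, List[float]] = {}
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as f:
        num_tokens, num_dims = map(int, f.readline().split())
        for line in f:
            line = line.rstrip()
            if not line:
                continue
            # Split from the right so a token containing spaces stays intact.
            parts = line.rsplit(" ", num_dims)
            token, values = parts[0], parts[1:]
            if len(values) != num_dims:
                raise ValueError(f"expected {num_dims} values for {token!r}")
            vectors[token] = [float(v) for v in values]
            count += 1
    # Check the entry count against the header (duplicate tokens still count).
    if count != num_tokens:
        raise ValueError(f"header says {num_tokens} entries, found {count}")
    return vectors


def value_stats(vectors: Dict[str, List[float]]) -> Tuple[float, float, float]:
    """Min, max and mean over every value of every vector."""
    all_values = [v for vec in vectors.values() for v in vec]
    return min(all_values), max(all_values), sum(all_values) / len(all_values)


if __name__ == "__main__":
    # Tiny made-up headerless file, standing in for a real GloVe .txt.gz.
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "headerless.txt.gz")
    dst = os.path.join(tmp, "with_header.txt.gz")
    with gzip.open(src, "wt", encoding="utf-8") as f:
        f.write("hello 0.5 -1.0 2.0\nworld 0.0 1.5 -3.0\n")
    print(add_word2vec_header(src, dst))
    vecs = load_word2vec_text(dst)
    print(vecs["hello"], value_stats(vecs))
```

Two passes are used when adding the header because the token count has to be written before any vectors, and streaming avoids holding the whole file in memory. Loading about 1.2M 200-dimensional vectors as Python lists of floats takes a lot of memory, so for the full files gensim or a NumPy-based reader is the more practical choice.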

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.5281/zenodo.3237457
PID https://www.doi.org/10.5281/zenodo.3237458
URL https://figshare.com/articles/Twitter_pre-trained_word_vectors/11640300
URL http://dx.doi.org/10.5281/zenodo.3237458
URL https://zenodo.org/record/3237458
URL http://dx.doi.org/10.5281/zenodo.3237457
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author University, Stanford
Contributor Halasz, Peter
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Zenodo; Datacite; figshare
Hosted By Zenodo; figshare
Publication Date 2019-06-03
Publisher Zenodo
Additional Info
Field Value
Language UNKNOWN
Resource Type Dataset
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/dataset?datasetId=dedup_wf_001::8a406f96222f107fd60c0ecd68b9ce3c
Author jsonws_user
Version None
Last Updated 16 December 2020, 00:17 (CET)
Created 16 December 2020, 00:17 (CET)