Ondřejov Dataset - Items - RISIS2 Open Data VRE Catalogue

Ondřejov Dataset

The Ondřejov dataset contains 12936 labelled stellar spectra from the Ondřejov CCD700 archive. The spectra were observed with the Ondřejov Perek 2m Telescope. The code used to generate the dataset is in the podondra/ondrejov-dataset GitHub repository.

The dataset was created to support the discovery of emission-line spectra in the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) survey. The main idea was to train a machine learning algorithm on the Ondřejov dataset and, in combination with domain adaptation, use it to find interesting objects in that large spectral archive.

The dataset is released as a CSV file containing the following columns for each spectrum:

Column	Description
id	unique identifier (FITS file name)
label	assigned class
object	title of the observation
ra	right ascension
dec	declination
expval	exposure value in photon counts [Mcounts]
gratang	diffraction grating angle
detector	name of the detector
chipid	name of the CCD chip
specfilt	spectral filter
date-obs	UTC date of the start of the observation
dichmir	dichroic mirror number
fluxes	140 columns of fluxes sampled uniformly between 6519 and 6732 Ångströms

Spectra are divided into 3 classes according to the profile of the H-alpha spectral line:

Class	Spectra	Share
absorption	6102	47.17%
emission	5301	40.98%
double-peak	1533	11.85%

Double-peak is a special type of emission profile with the typical disk geometry common in Be stars.

Spectra from the Ondřejov CCD700 archive are in air wavelengths, but LAMOST spectra use vacuum wavelengths. Therefore, the Ondřejov spectra were converted to vacuum wavelengths according to the formulas provided on the Vienna Atomic Line Database Wiki.

The spectral resolving power of the LAMOST spectrograph is between 500 and 1800, which is much lower than the resolving power of 13000 at H-alpha of the Ondřejov spectrograph. To overcome this difference, spectra from the dataset were blurred with a Gaussian filter with a standard deviation of 7.
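The air-to-vacuum conversion mentioned above can be illustrated with a minimal sketch. The function name `air_to_vacuum` is hypothetical, and the refractive-index coefficients are written as commonly quoted from the Vienna Atomic Line Database Wiki; they are an assumption here and should be checked against the wiki and the podondra/ondrejov-dataset repository before relying on them.

```python
# Sketch of an air-to-vacuum wavelength conversion (wavelengths in Angstroms).
# Coefficients are assumed to match the formula quoted on the VALD wiki;
# this is not necessarily the exact code used to build the dataset.

H_ALPHA_AIR = 6562.8  # approximate air wavelength of H-alpha in Angstroms


def air_to_vacuum(wavelength_air):
    """Convert an air wavelength in Angstroms to a vacuum wavelength."""
    s = 1e4 / wavelength_air  # wavenumber in inverse micrometres
    s2 = s * s
    # Refractive index of air as a function of wavenumber.
    n = (
        1.0
        + 0.00008336624212083
        + 0.02408926869968 / (130.1065924522 - s2)
        + 0.0001599740894897 / (38.92568793293 - s2)
    )
    return wavelength_air * n


h_alpha_vacuum = air_to_vacuum(H_ALPHA_AIR)
shift = h_alpha_vacuum - H_ALPHA_AIR
print(f"H-alpha: air {H_ALPHA_AIR:.2f} A, vacuum {h_alpha_vacuum:.2f} A")
```

Under these assumptions the shift near H-alpha is a little under 2 Ångströms, which is comparable to the spacing of the 140 flux samples and so is worth correcting before comparing with LAMOST spectra.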
Machine learning algorithms require their inputs to be a fixed set of features. To have the same features for all spectra, the spectra need to be resampled so that fluxes are measured at the same wavelengths across all spectra. It is then easy to build a design matrix in which each row is a spectrum and each column contains the flux at a specified wavelength between 6519 and 6732 Ångströms.
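The resampling step can be sketched as follows, using only the Python standard library. The names `uniform_grid` and `resample` are hypothetical, linear interpolation is an assumption (the interpolation scheme is not stated in the description), and the two spectra are synthetic toy data rather than values from the dataset.

```python
from bisect import bisect_right

START, STOP, N_POINTS = 6519.0, 6732.0, 140  # grid taken from the description


def uniform_grid(start=START, stop=STOP, n=N_POINTS):
    """Return n wavelengths spaced uniformly from start to stop inclusive."""
    return [start + (stop - start) * i / (n - 1) for i in range(n)]


def resample(wavelengths, fluxes, grid):
    """Linearly interpolate a spectrum onto grid.

    wavelengths must be strictly increasing and must cover the whole grid.
    """
    if wavelengths[0] > grid[0] or wavelengths[-1] < grid[-1]:
        raise ValueError("spectrum does not cover the target grid")
    resampled = []
    for w in grid:
        # Index of the first sample to the right of w, clamped to a valid interval.
        j = min(max(bisect_right(wavelengths, w), 1), len(wavelengths) - 1)
        w0, w1 = wavelengths[j - 1], wavelengths[j]
        f0, f1 = fluxes[j - 1], fluxes[j]
        resampled.append(f0 + (f1 - f0) * (w - w0) / (w1 - w0))
    return resampled


def toy_flux(w):
    """Synthetic linear flux used only for this illustration."""
    return 0.001 * w - 5.0


# Two toy spectra sampled on different wavelength axes.
wave_a = [6500.0 + 0.25 * i for i in range(1001)]
wave_b = [6510.0 + 0.4 * i for i in range(600)]
spectra = [
    (wave_a, [toy_flux(w) for w in wave_a]),
    (wave_b, [toy_flux(w) for w in wave_b]),
]

grid = uniform_grid()
# Design matrix: one row per spectrum, one column per grid wavelength.
design_matrix = [resample(w, f, grid) for w, f in spectra]
print(len(design_matrix), len(design_matrix[0]))
```

Because both toy spectra end up on the same 140-point grid, their rows can be stacked directly, which is the property the design matrix relies on.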

Item URL

http://data.d4science.org/ctlg/RISIS2OpenData/dedup_wf_001--0b8b001e42a9e8c89d5128e63211dc5a

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field	Value
PID	https://www.doi.org/10.5281/zenodo.2640971
PID	https://www.doi.org/10.5281/zenodo.2640970
URL	https://zenodo.org/record/2640971
URL	http://dx.doi.org/10.5281/zenodo.2640970
URL	http://dx.doi.org/10.5281/zenodo.2640971
URL	https://figshare.com/articles/Ond_ejov_Dataset/8787305

Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field	Value
Access Right	Open Access

Attribution

Description: Authorships and contributors

Field	Value
Author	Podsztavek, Ondřej, 0000-0002-9187-6619
Author	Škoda, Petr, 0000-0002-7434-9518

Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field	Value
Collected From	Zenodo; figshare; Datacite
Hosted By	Zenodo; figshare
Publication Date	2019-04-15
Publisher	Zenodo

Additional Info

Field	Value
Language	English
Resource Type	Dataset
system:type	dataset

Management Info

Field	Value
Source	https://science-innovation-policy.openaire.eu/search/dataset?datasetId=dedup_wf_001::0b8b001e42a9e8c89d5128e63211dc5a
Author	jsonws_user
Last Updated	11 January 2021, 18:31 (CET)
Created	11 January 2021, 18:31 (CET)