The training dataset of CNIT contains human (GRCh38) and Arabidopsis (EnsemblPlants v37) protein-coding and noncoding transcripts. To evaluate the performance of CNIT across species, we further built a test set for 10 animals: Mouse, Anole lizard, Chicken, Gorilla, Xenopus, Macaque, Chimpanzee, Orangutan, Zebrafish, Worm and 25 plants: Aegilops tauschii, Amborella trichopoda, Arabidopsis lyrata, Beta vulgaris, Brachypodium distachyon, Brassica napus, Brassica oleracea, Brassica rapa, Chlamydomonas reinhardtii, Galdieria sulphuraria, Glycine max, Medicago truncatula, Musa acuminata, Oryza brachyantha, Oryza sativa, Physcomitrella patens, Populus trichocarpa, Selaginella moellendorffii, Setaria italica, Solanum lycopersicum, Solanum tuberosum, Sorghum bicolor, Theobroma cacao, Vitis vinifera and Zea mays. For animals, protein-coding and noncoding transcripts were selected from the RefSeq database. For plants, coding and noncoding transcripts were obtained from the RefSeq and EnsemblPlants (v37) databases, respectively, with transcript status “KNOWN”. Users can download all sequences in the training and test sets via the links below.
Animal | mRNAs | LncRNAs | Plant | mRNAs | LncRNAs | Plant | mRNAs | LncRNAs |
---|---|---|---|---|---|---|---|---|
Human (Training set) | Download | Download | Arabidopsis thaliana (Training set) | Download | Download | Musa acuminata | Download | Download |
Mouse | Download | Download | Aegilops tauschii | Download | Download | Oryza brachyantha | Download | Download |
Anole lizard | Download | Download | Amborella trichopoda | Download | Download | Oryza sativa | Download | Download |
Chicken | Download | Download | Arabidopsis lyrata | Download | Download | Physcomitrella patens | Download | Download |
Gorilla | Download | Download | Beta vulgaris | Download | Download | Populus trichocarpa | Download | Download |
Xenopus | Download | Download | Brachypodium distachyon | Download | Download | Selaginella moellendorffii | Download | Download |
Macaque | Download | Download | Brassica napus | Download | Download | Setaria italica | Download | Download |
Chimpanzee | Download | Download | Brassica oleracea | Download | Download | Solanum lycopersicum | Download | Download |
Lamprey | Download | Download | Brassica rapa | Download | Download | Solanum tuberosum | Download | Download |
Orangutan | Download | Download | Chlamydomonas reinhardtii | Download | Download | Sorghum bicolor | Download | Download |
Zebrafish | Download | Download | Galdieria sulphuraria | Download | Download | Theobroma cacao | Download | Download |
| | | | Glycine max | Download | Download | Vitis vinifera | Download | Download |
| | | | Medicago truncatula | Download | Download | Zea mays | Download | Download |
To compare the performance of CNIT with that of other software tools, we also used the independent test sets for human, mouse, zebrafish, fly, worm and Arabidopsis from the CPC2 dataset.
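Once the mRNA and lncRNA files for a species have been downloaded, they can be merged into a single labeled evaluation set. The following is a minimal sketch, not part of CNIT itself. It assumes the downloads are plain FASTA files, and the function names (`parse_fasta`, `build_labeled_set`, `count_labels`) and the file names in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines.

    Assumption: the downloaded files are plain FASTA text.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first whitespace-delimited token as the identifier.
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def build_labeled_set(
    coding_lines: Iterable[str], noncoding_lines: Iterable[str]
) -> List[Tuple[str, str, str]]:
    """Combine both files into (identifier, sequence, label) records."""
    records = [(i, s, "coding") for i, s in parse_fasta(coding_lines)]
    records += [(i, s, "noncoding") for i, s in parse_fasta(noncoding_lines)]
    return records


def count_labels(records: List[Tuple[str, str, str]]) -> Dict[str, int]:
    """Count how many records carry each label."""
    counts: Dict[str, int] = {}
    for _, _, label in records:
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    # In-memory stand-ins for two downloaded files (made-up toy sequences).
    coding = [">tx1 example", "ATGGCC", "TTTTAA", ">tx2", "ATGAAATAG"]
    noncoding = [">lnc1", "ACGTACGT"]
    print(count_labels(build_labeled_set(coding, noncoding)))
```

With real downloads, the same functions accept open file handles, for example `build_labeled_set(open("mouse_mRNA.fa"), open("mouse_lncRNA.fa"))`, where both file names are placeholders for whatever the download links provide.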