PubMA Exercise.1
Search GEO to find public datasets related to one's project
[ Main_Page | Hands-on Analysis of public microarray datasets | PubMA_Exercise.2 ]
Introduction
<<The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications are available to help users query and download the studies and gene expression patterns stored in GEO. For more information about various aspects of GEO, please see our documentation listings and publications>> [1].
If you are new to GEO, have a look at their handout first [2].
Find GEO datasets relevant to your biological question
You can use the NCBI search page in a very basic way: enter your gene of interest to look for related knock-out experiments, or search for a compound or disease name that is relevant to your research. Note that such simple queries will sometimes return too many datasets or miss the one you are really looking for.
Other, better ways of finding out whether GEO data exist
- The current NCBI search page (and its advanced counterpart) also lets you restrict your queries in smart ways to find the best-suited data in the repository. A related How-To page can be found at Find_GEO_datasets. Please read this page and discover the advantages of adding filters to your queries.
- For a good resource on building top-notch queries in the GEO advanced search page, look at the NCBI tutorial [3], which includes examples of good syntax to copy and recycle [4]. As an example, search for rat experiments with 100 to 500 samples using '(100:500[Number of Samples]) AND rat[Organism]'.
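The example query above can also be assembled programmatically, which helps when the same filter has to be reused for several organisms or sample ranges. The following is a minimal sketch, not an official NCBI client: the helper names `build_geo_term` and `build_esearch_url` are made up for this illustration, and it assumes that the query is sent to the NCBI E-utilities `esearch` endpoint with the GEO DataSets database (`db=gds`). The code only builds the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here as the programmatic
# counterpart of the GEO web search box).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_term(organism, min_samples, max_samples):
    """Combine a sample-count range with an organism filter (hypothetical helper)."""
    return f"({min_samples}:{max_samples}[Number of Samples]) AND {organism}[Organism]"


def build_esearch_url(term, retmax=20):
    """Return an esearch URL for GEO DataSets (db=gds); nothing is downloaded."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


term = build_geo_term("rat", 100, 500)
url = build_esearch_url(term)
print(term)
print(url)
```

Pasting the printed term into the GEO search box should behave like the hand-written query; the printed URL can be opened in a browser to inspect the matching record identifiers.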
Get information about GSE6943, the dataset used in this session
We will run a simple NCBI GEO search here and look for one particular experiment by its GEO accession number: GSE6943. This dataset is used by CLC in their tutorial and was published by van Lunteren E, Spiegler S, Moyer M [5]. The paper authors omitted one control sample because of its low quality [6]. Full details can be found in the GEO record of this dataset.
The red link lets the user download the CEL files of this experiment, which contain the raw microarray data.
The green links let the user download the SOFT or Series file of this experiment, which contains the normalized data. Some software tools, such as the CLC Main Workbench, cannot perform the probe-level normalization and expect the data in this format. The data in these files should normally be normalized, but this is not always the case: it is left to the goodwill of the submitter to make sure that these files contain normalized data.
The purple link 'Analyze with GEO2R' opens a new window with the GEO2R submission form, which is further detailed in our GEO2R tutorial.
The grey link shows the grouping of the samples: which samples belong to the Diaphragm group and which belong to the Heart tissue group?
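The links described above usually follow a predictable pattern, so they can be derived from the accession number alone. The sketch below assumes the usual GEO layout, in which series are grouped into directories where the last three digits of the accession are replaced by "nnn" (for example GSE6nnn), and the usual file names for the raw archive, the SOFT family file and the series matrix. The helper names `series_stub` and `geo_links` are invented for this illustration, and the presence of each file should be confirmed on the GEO record itself. Nothing is downloaded here.

```python
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
GEO_FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def series_stub(accession):
    """Return the grouping directory, e.g. 'GSE6943' -> 'GSE6nnn' (hypothetical helper)."""
    digits = accession[3:]
    if not accession.startswith("GSE") or not digits.isdigit():
        raise ValueError(f"not a GEO series accession: {accession!r}")
    # Drop the last three digits and append 'nnn'; short accessions give 'GSEnnn'.
    return "GSE" + digits[:-3] + "nnn"


def geo_links(accession):
    """Build the conventional record and download URLs for a GEO series."""
    base = f"{GEO_FTP_ROOT}/{series_stub(accession)}/{accession}"
    return {
        "record": f"{GEO_ACC_URL}?acc={accession}",                  # GEO record page
        "raw": f"{base}/suppl/{accession}_RAW.tar",                  # raw data (CEL files), if provided
        "soft": f"{base}/soft/{accession}_family.soft.gz",           # SOFT family file
        "matrix": f"{base}/matrix/{accession}_series_matrix.txt.gz", # Series matrix file
    }


links = geo_links("GSE6943")
for name, link in links.items():
    print(f"{name}: {link}")
```

Keeping the URL construction separate from any download step makes it easy to check the links by hand in a browser before fetching large archives.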
Download exercise files
Download exercise files here.
References:
- ↑ http://www.ncbi.nlm.nih.gov/geo/info/faq.html#What
- ↑ http://www.ncbi.nlm.nih.gov/geo/info/GEOHandoutFinal.pdf
- ↑ http://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html
- ↑ http://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html#fields
- ↑ Erik van Lunteren, Sarah Spiegler, Michelle Moyer. Contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol: 2008, 161(1);41-53. [PubMed:18207466]
- ↑ http://www.bioconductor.org/packages/release/data/experiment/html/parathyroidSE.html