GV Exercise.5
Analyze human cancer data and find tumor markers
last edit: October 31, 2014
When setting up enrichment analyses, remember that the GV data used here originate from the Affymetrix 47k chip, also known as the GeneChip® Human Genome U133 Plus 2.0 Array.
Contents
- 1 Find liver neoplasm specific markers as compared to normal liver
- 2 Find markers specific for HCC and absent in other tumors
- 3 Perform functional analysis using free web-resources
- 4 [VIB license required] Upload gene lists to IPA and run a core analysis
- 5 Download the exercise files
Find liver neoplasm specific markers as compared to normal liver
Find malignant liver cell markers absent in normal liver tissue
We want to get markers differentially expressed between liver neoplasm cells and normal liver cells. At this stage, we do not care about expression of these genes outside of the liver context.
create a sample list with all human 47k samples
- Using human_47k, start the Gene Search Neoplasm tool
- search for the top 100 liver neoplasm markers
- exclude metastatic entries (the result may then correspond to primary tumor markers)
- take normal liver as background
- exclude 'cell lines' (this is a new feature since the last training)
Check the top bar and choose meaningful options: first use 'collapse all' to shorten the list, then use the search box to find what you need.
Run the tool and inspect the results to see whether these markers appear in other conditions. Then refine the selection to exclude cholangiocarcinoma markers, and search for the top 100 markers that are now specific to hepatocellular carcinoma (HCC).
- Save the last probe list to a text file
- copy the probes to your clipboard
- open a text editor, paste and save the list; name it hepathocellular_carcinoma-vs-liver.txt
- also create the corresponding list in GV with the New button and name it hepathocellular_carcinoma-vs-liver
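Before feeding the saved list to other tools, it can help to check that the pasted text is clean. The following is a minimal sketch, not part of Genevestigator: the helper name `clean_probe_list` is hypothetical, and the probe IDs are placeholders that only illustrate the Affymetrix ID format, not results from this exercise.

```python
def clean_probe_list(lines):
    """Strip whitespace, drop blank lines, and remove duplicates.

    The first occurrence of each probe ID is kept, so the ranking
    order of the exported list is preserved.
    """
    seen = set()
    probes = []
    for line in lines:
        probe = line.strip()
        if probe and probe not in seen:
            seen.add(probe)
            probes.append(probe)
    return probes


# Placeholder IDs for illustration only; not markers found in the exercise.
pasted = ["1007_s_at\n", "  1053_at\n", "\n", "117_at\n", "1053_at\n"]
probes = clean_probe_list(pasted)
print(len(probes), "unique probe IDs")

# With the real file (name taken from the exercise), one would use:
# with open("hepathocellular_carcinoma-vs-liver.txt") as handle:
#     probes = clean_probe_list(handle)
```

A list that comes back shorter than the expected 100 entries would suggest blank lines or duplicated probes in the pasted text.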
Find normal tissues that express HCC-specific markers
We just identified HCC markers and wish to know if a majority of these are found in some other 'normal' tissue(s). In order to do so, we create a new sample list with all human 47k samples that are not from tumor and not from cell lines (name it: human_47k-noTumor-noCellLines). We then build a heatmap with all markers from the list and all tissues from the new sample group.
Perform clustering using
- the hepathocellular_carcinoma-vs-liver list
- the human_non-tumor sample set (the human_47k-noTumor-noCellLines list created above)
- find in which 'normal' tissues (Anatomy) these markers, or a subset of them, are differentially expressed
- search for 'hepatocytes' in the large heatmap to check your initial filtering
Find markers specific for HCC and absent in other tumors
From here you can proceed in two ways: create a sample list with only tumor experiments, or use the full human sample list and restrict your search to neoplasms. We take the first approach, but you are free to try the second.
- create a new sample list with all human tumor samples
- select the human neoplasm samples and search for hepatocellular carcinoma-specific markers
Starting from all samples, this would show a longer list, but with the same annotations.
- run the tool and inspect the results with neoplasms
- create a new gene list (hepathocellular_carcinoma-vs-all-neoplasms) with the top 100 markers
- save the probe list to a text file and name it hepathocellular_carcinoma-vs-all-neoplasms.txt
Perform functional analysis using free web-resources
The canonical DAVID, the more recent Enrichr and WebGestalt, and other web enrichment tools allow complex yet easy enrichment computation starting from a list of IDs. The enrichment step is vital because humans cannot efficiently interpret raw gene lists; biological functions are far easier to reason about.
We illustrate here the first step of such an analysis using one of our lists, and invite users to explore these tools further.
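To make the idea of 'enrichment' concrete, the sketch below computes a one-sided hypergeometric p-value for a single annotation category. It only illustrates the general over-representation test; it does not reproduce the exact statistic or the multiple-testing correction applied by DAVID, Enrichr, or WebGestalt. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """Upper-tail hypergeometric probability P(X >= list_hits).

    X is the number of annotated genes seen when drawing `list_size`
    genes at random from a background of `background_size` genes,
    of which `background_hits` carry the annotation.
    """
    max_hits = min(list_size, background_hits)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 8 of our 100 markers fall in a category that
# annotates 200 of the 20000 genes in the background.
p_value = enrichment_p_value(
    list_hits=8, list_size=100, background_hits=200, background_size=20000
)
print(f"p = {p_value:.2e}")
```

The background size enters the calculation directly, which is why the background must match the array that produced the list.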
Using DAVID to perform functional enrichment
DAVID is the oldest of these tools, but it is still appreciated by many biologists for its ease of use.
Access DAVID at http://david.abcc.ncifcrf.gov/home.jsp
- upload the hepathocellular_carcinoma-vs-all-neoplasms.txt list to DAVID and set it as a gene-list
- run the enrichment using standard parameters (or tune them!)
- review clusters
- review charts
- review tables
Each output type has its own specifics and useful extras. This is not a DAVID training; extensive documentation is available online.
DAVID and BioMart conversion from probe IDs to gene symbols
The DAVID built-in ID converter
DAVID not only performs functional enrichment from many kinds of ID lists, it can also be used simply to convert IDs from one type to another (http://david.abcc.ncifcrf.gov/conversion.jsp).
BioMart conversion from probe IDs to gene symbols
Besides its extensive database export capabilities, BioMart recently gained a web portal for ID conversion (http://central.biomart.org/converter/#!/ID_converter/gene_ensembl_config_2).
Using this portal does not require any knowledge of Ensembl. It is illustrated below to convert our list of probe IDs to a list of gene symbols (HUGO), which is required in the next exercise.
Performing enrichment with the BioMart enrichment tool
This recent tool aggregates several sources for enrichment.
Access the BioMart enrichment tool at http://central.biomart.org/enrichment/#/gui/Enrichment/. The tool is relatively simple in design and offers only a limited number of annotations.
We can try the tool with the first exported list, hepathocellular_carcinoma-vs-liver.txt, remembering to specify the matching background (Affymetrix human u133_a).
Setup, Gene Ontology, and MIM results
Performing enrichment with Enrichr
This recent tool aggregates several sources for enrichment and returns very dynamic content.
Access Enrichr at http://amp.pharm.mssm.edu/Enrichr/
Unlike DAVID, Enrichr does not support probe IDs. We first need to convert our probes to gene symbols using BioMart or DAVID, and then de-duplicate the resulting list.
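The de-duplication step can be sketched as follows, assuming the conversion result was saved as a tab-separated table with one probe per row. The column names `probe_id` and `gene_symbol`, the function name, and the example rows are hypothetical; adapt them to the header actually produced by BioMart or DAVID.

```python
import csv
import io


def symbols_from_conversion_table(handle, symbol_col="gene_symbol"):
    """Return unique gene symbols from a probe-to-symbol table.

    Rows without a symbol (unmapped probes) are skipped, and the first
    occurrence of each symbol is kept so the original order is preserved.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    seen = set()
    symbols = []
    for row in reader:
        symbol = (row.get(symbol_col) or "").strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return symbols


# Made-up table: two probes map to the same gene, one probe is unmapped.
table = io.StringIO(
    "probe_id\tgene_symbol\n"
    "probe_1\tGENE_A\n"
    "probe_2\tGENE_A\n"
    "probe_3\t\n"
    "probe_4\tGENE_B\n"
)
symbols = symbols_from_conversion_table(table)
print("\n".join(symbols))  # one symbol per line, ready to paste
```

Several probes on the array can target the same gene, so the symbol list is usually shorter than the probe list it came from.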
Setup and result categories
Performing enrichment with WebGestalt
WebGestalt is another recent tool that also aggregates several sources for enrichment and returns very dynamic content. Please register first (free: http://bioinfo.vanderbilt.edu/webgestalt/login.php), then start using this intuitive tool.
Setup, Gene Ontology, and MIM results
Many more such tools exist, as well as Bioconductor packages that produce excellent results once you have invested some time in learning them.
[VIB license required] Upload gene lists to IPA and run a core analysis
Because the VIB IPA license limits the number of concurrent users, we should not all try this at the same time. Please review the pictures and tables generated for you and included here, especially if you have no experience working in IPA.
click here to go to the IPA login page
Use the 100 probes saved as hepathocellular_carcinoma-vs-all-neoplasms.txt [1] or hepathocellular_carcinoma-vs-liver.txt [2]; copy and paste them, or upload them, into IPA.
We first report results from the Venn intersection of the hepathocellular_carcinoma-vs-all-neoplasms.txt and hepathocellular_carcinoma-vs-liver.txt lists with the IPA knowledge base HCC biomarker list. Only a few biomarkers are specifically expressed in HCC. This may seem strange at first sight, but it is in fact very common, since tumor-specific antigens do not really exist.
IPA venn diagram from three GV lists
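The same kind of three-way comparison can be approximated outside IPA with plain set operations. This is a minimal sketch, not IPA's implementation: the function name and the placeholder identifiers are hypothetical, and all three lists must first be expressed in the same ID type (for example gene symbols) for the overlaps to be meaningful.

```python
def venn3_counts(a, b, c):
    """Count the members of each of the seven regions of a 3-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Placeholder identifiers; not the actual exercise lists.
hcc_vs_neoplasms = ["G1", "G2", "G3", "G4"]
hcc_vs_liver = ["G3", "G4", "G5"]
known_biomarkers = ["G4", "G6"]

counts = venn3_counts(hcc_vs_neoplasms, hcc_vs_liver, known_biomarkers)
for region, size in counts.items():
    print(region, size)
```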
We now reproduce some of the other types of results one can obtain in IPA. The aim is to show that IPA can uncover the biology hidden behind the different lists and inform the user about what may be happening in the system (in this case, in HCC tumors).
Networks
- IPA networks are pre-built entities showing known relations between proteins that are common to known functions or processes. Networks are often more informative than 'canonical pathways' as they group proteins that play together in a shared context rather than showing knowledge assembled from encyclopedic sources.
Top IPA networks (hcc-vs-neoplasms, hcc-vs-liver)
Best network from each core analysis (hcc-vs-neoplasms, hcc-vs-liver)
Biological and Tox functions
The biological and tox functions enriched in these two lists are very relevant given the origin of the data.
Best tox lists from each core analysis (hcc-vs-neoplasms, hcc-vs-liver)
In the comparison with liver, one top Tox annotation is 'cholangiocarcinoma' which was not specified in GV but is apparent here. This could be due to some GV samples being mislabeled and in fact belonging to this type or simply to a large overlap in features between the two tumor types.
IPA demonstrated its superiority over the free tools: it was able to identify the very nature of the data from a simple list of fewer than 100 markers selected by GV.
Download the exercise files
- download hepathocellular_carcinoma-vs-liver.txt and open it with your default worksheet application
- download hepathocellular_carcinoma-vs-all-neoplasms.txt and open it with your default worksheet application
IPA-results for carcinoma-vs-liver & carcinoma-vs-all-neoplasms
- download the IPA_core-hcc_vs_liver.pdf report
- download the IPA_core-hcc_vs_neoplasms.pdf report
Genevestigator workspace
- download ex5.gv4 and open it from within Genevestigator (File > Load Workspace)
References: