PubMA Exercise.4
RobiNA analysis
[ Main_Page | Hands-on Analysis of public microarray datasets | PubMA_Exercise.3 | PubMA_Exercise.4 |
| PubMA_Exercise.5 ]
Contents
Introduction
" Several commercial programs such as GeneSpring featuring a rich and simple user interface to statistically analyze high-throughput "omics" data are available. However, these are usually very expensive and might even require an annual subscription. On the other hand, there are free, open source tools such as BioConductor which offer great statistical options and support for free, but this power often comes at the price of usability, since these tools often feature either only a command line interface or solely very basic user interaction.
Finally, there are tools such as RMA Express which offer a rich user interface but only a very limited set of options.
RobiNA tries to bridge this gap by providing a flexible user friendly graphical interface to unleash the power of R/BioConductor for the individual biologist. RobiNA comes as a convenient all-in-one installation package, that automatically installs the application itself plus all required external tools (i.e. the R and BioConductor frameworks and bowtie). "[1]
Although RobiNA can handle several data type including two color microarray and even RNASeq data, we here provide a simple tutorial to perform QC and differential analysis of Affymetrix microarray data comparable to what can be achieved using the CLC Main workbench
Please see the RobiNA quick guide and Robin and RobiNA user's manual which contain step-by-step walk through and detailed information. RobinA can be downloaded from http://mapman.gabipd.org/web/guest/robin
Required before starting RobiNA
RobiNA working great ... or NOT
RobiNA should work under Windows, Mac, and Unix as it uses Java which is a universal language. However, we have found that some computers and operating systems are refractory to RobiNA and even with 6GB RAM may have issues running RobiNA with as few as 10 CEL files. RobiNA requires a recent version of Java JDK (you can obtain JDK from: [1]). The RobiNA developers do not actively support the software right now and you will need to try things by yourself if you have issues
The CDF annotation file
RobiNA needs a CDF file to work. CDF files are complex text tables allowing the identification of the probes and genes associated with the chip spots, such files are sometimes difficult to find. The RobinA software was developed by a plant groups and has plant as main focus and does not include non-plant annotations. For those who want to analyze other living organisms, the microarray annotation file corresponding to the used chip (CDF) should be obtained before starting RobiNA. A place where to find Affymetrix CDF files should be the company website but it is often difficult to locate the CDF among the long lists of available Affymetrix resources (http://www.affymetrix.com/estore/ | free registration required)[2].
The easiest way to obtain the correct Affymetrix CDF file is probably by using the Affymetrix Expression Console.
the Affymetrix Expression Console software is only available for Windows and downloading/using it requires a free user registration
After installing and starting the Affymetrix Expression Console program, Use the File menu to access the 'library' download manager
Log into the Affy support area using your credentials
Search and check the chip corresponding to your data, and let the manager download the associated files
In the example above we highlight the rat array that was already downloaded and was used to generate the demo data in the CLC Bio Main Workbench (measurements of gene expression in tissues from cardiac left ventricle and diaphragm muscle of rats; Lunteren et al., 2008).
The Affymetrix Expression Console will download the CDF file to the folder that you have specified in the library path (typically, the folder where your CEL files are stored). The file shown here as example is for Arabidopsis
You can change this folder via Edit in the top menu. Click Set library path.
Step by Step RobiNA analysis workflow
You can use RobiNA for...
- quality assessment of your data
- normalization of your microarray data
- detection of differentially expressed genes
- preparation of the data for an import into MapMan and/or excel
- generation of informative plots on your experiment
Pre-requisites are the CEL files containing each hybridization data (obtained from the GEO page as a '.tar' archive and decompressed to individual CEL files) and a CDF file listing all probes present on the chip used in the experiment. Today's CDF file can be obtained from the Affymetrix site [2] after registering (free) and searching for the library file corresponding to the platform reported in the GEO pages (in our case: Rat Expression Set 230, aka RAE230A).
The decompressed archive contains the required CDF file under 'CD_RAE230_rev04/Full/RAE230A/LibFiles/RAE230A.CDF' that can be copied in the RobiNA project area.
start RobiNA and create a new project for results
Importing CDF & CEL files
While preparing this training, we discovered that GEO had a damaged file (GSM160097) for one of the sample, we will therefore do this training with only 5-replicate in the Diaphragm group while the CLC analysis was done with the full data
You are now ready to import the CEL files and the matching CDF annotation database.
Performing QC on each CEL file
When QC finishes, a number of single plots can be reviewed by clicking on each section. These figures report issues or demonstrate good quality of the data and are also saved to disk for later inclusion in your reports.
Overview of all QC-Plots | |
---|---|
Evaluating QC results
One plot of each kind is reproduced below and shows that RobiNA generates classical QC plots showing how 'good' the data is and how well it divides according to the defined groups.
Details for each QC-Plot type | |
---|---|
# one for each sample | |
<- one for each pairwise sample comparison (n^2) | |
# distribution of the variance across principal components and PCA plot |
Define design
In this part, we define contrasts and start the differential analysis. Please note that complex design are possible here by defining 'metagroups' next to the sample groups (eg. mouse background, stimulant, ...). This is not the aim of this very simplistic design comparing two tissues. Please refer to the software documentation for more detailed examples.
Computing differential expression
RobiNA was developed for plants and can use annotation files from the website to add gene descriptions to the differential expression table. For non-plant data, users will need to add these annotations using external tools
We choose 'Skip' as we do not have annotation files for rat. The differential expression gets computed. When done, we select 'Exit' from the front window.
Reviewing DE results
RobiNA saves a number of files to the disk in its folder structure. Text tables can be found in the 'detailed results' folder and are partly reproduced below. A short summary lists the parameters applied to perform the analysis while two tables report the full results and the top-100 results after sorting lines in decreasing adj.P.Val order. Three PNG pictures (below) report the Up-regulated gene count, DE-gene count and Down-regulated gene-count respectively.
Final plots summarize key facets of the analysis as done in classical MA analysis. The left plot shows in red probesets that are differentially expressed with the selected statistics; the right plot confirms that the samples segregate as defined in the grouping. The bottom simple venn diagrams report counts of DE probesets (Total, Up, Down).
Plots | |
---|---|
In addition to the plots, several text tables are saved in 'detailed_results' and sampled below.
Analysis summary
# Robin affymetrix data analysis summary
# RobiNA-results
# 8/18/2014; 16:35:28
###################################
# Input files:
# Normalization settings for quality control
normalization method: rma
P-value correction method: BH
analysis strategy: Limma
# Normalization settings for main analysis
normalization method: rma
P-value correction method: BH
Multiple testing strategy: nestedF
P-value cut-off value for significant differential expression: 0.05
Genes that showed a log2-fold change smaller than two ignored: yes
# The analysis produced the following warnings
none
Head of 'full_table_Heart - Diaphragm.txt'
1371247_at 8.8690486752098 9.2066061992338 138.180384786338 7.72673860255015e-23 1.23032858768406e-18 40.3028038935592
1370412_at 8.00700839973324 8.66636177255196 112.671243745562 1.24362501567467e-21 9.90112056229389e-18 38.5147635203019
1387787_at 8.51384348690296 10.0136513602682 107.791218026326 2.27212516655303e-21 1.2059683009008e-17 38.0893208171574
1372195_at 9.44110458586408 9.1061642897453 100.528072562373 5.87151716284703e-21 2.01180346646277e-17 37.3935367235781
1370971_at 8.88483556476391 8.44782914943697 99.9889507276205 6.31728778013807e-21 2.01180346646277e-17 37.3386444895643
1367896_at 8.17319442370653 8.32878048386055 97.7761545656647 8.56617301567765e-21 2.27331954881059e-17 37.1083132290366
1374248_at 7.92028193041167 9.06490481778715 95.9358621573261 1.10936083087999e-20 2.52347893001459e-17 36.9103955684498
1368093_at -8.61265104186292 8.23846334693981 -81.9875192287634 9.40341381711377e-20 1.87163197762378e-16 35.195893580546
1388139_at 8.60278366808529 8.70065810953839 80.0047250421511 1.31181945592154e-19 2.32090013295985e-16 34.9170369337463
Head of 'mean_rma_normalized_expression_values.txt'
1367452_at 9.5293566110383 9.5889443412077
1367453_at 10.1758428259477 10.1726921462891
1367454_at 8.87231561088045 8.86593395245015
1367455_at 10.4751392501094 10.3325900097593
1367456_at 10.7698324122072 10.286304116399
1367457_at 8.75165515455833 8.73159768841635
1367458_at 7.54000953569876 7.71781966096578
1367459_at 11.0420305904582 11.1519497142209
1367460_at 9.90893064678104 9.5235636384544
Head of 'raw_rma_normalized_expression_values.txt'
1367452_at 9.64295883656058 9.65178912787566 9.32863584112902 9.48475646698994 9.4716451160078 9.59635427766681 9.65872553515732 9.63503499785589 9.50781431044761 9.58763738444463 9.55550947813306
1367453_at 10.1553644909771 10.2499020063438 10.1861859768652 10.0965629262007 10.159434523851 10.2076070314483 10.2269807954293 10.195253118062 10.1861859768652 10.2883928784824 9.96664796260638
1367454_at 8.73905780465407 9.08252980064849 9.07161018095027 8.68762900735943 8.81499836330383 8.83806850836661 8.88654295440762 9.05310137910709 8.68462564392101 8.8673312764484 8.83806850836661
1367455_at 10.4763615418993 10.4471239467834 10.5710504733759 10.3264804870339 10.4745349426514 10.5552841089126 10.5959387062089 10.5139881779964 10.2210244191821 10.3038610413289 10.0281377040801
1367456_at 10.9984593902943 10.8441477983143 10.673570257498 10.6700249981427 10.6750331606405 10.7577588683534 10.1986166073672 10.3974448991471 10.2291761637936 10.3422547513151 10.2640281603721
1367457_at 8.78904682538038 8.72229122394877 8.67050724938203 8.9448960477874 8.72229122394877 8.66089835690261 8.76340465790531 8.76513510798621 8.68486622829271 8.72229122394877 8.72229122394877
1367458_at 7.53098410601309 7.35718220143744 7.71709223686958 7.65558449687549 7.45098048336888 7.52823368962808 7.7461437721208 7.40572408979464 7.74545079274697 7.7485040416615 7.94327560850499
1367459_at 11.1105760949228 11.2159762048115 10.9101244239086 11.149432063699 10.8953374468102 10.970737308597 11.2355512291063 11.2441035772632 11.0435353491247 11.219515240616 11.0170431749943
1367460_at 9.96660388228079 9.84511446842093 9.94477628710098 9.91078389672863 9.84447947912025 9.94182586703467 9.87080073408934 9.70116063385167 9.49343722267783 9.27274366865815 9.279675932995
Adding annotations to the RobiNA data
Because RobiNA does not support non-plant organisms, it saves the data in a quite anonymous way with only probe IDs. The code below is borrowed from the GEO2R script and adds gene symbols and additional annotations to the RobiNA table.
R-code to annotate the RobiNA data
# Add annotations to the RobiNA full table # the code below is borrowed to the GEO2R code and adapted library(GEOquery) # make sure that the surrent directory is set to folder enclosing the 'RobiNA-results' folder base <- getwd() # load RobiNA full table in a data-frame robina.folder <- paste(base, "RobiNA-results", "detailed_results", sep="/") robina.data <- paste(robina.folder, "full_table_Heart - Diaphragm.txt", sep="/") robina.full <- read.delim(robina.data, as.is = TRUE) # load NCBI platform annotation gpl <- "GPL341" platf <- getGEO(gpl, AnnotGPL=TRUE, destdir = base) ncbifd <- data.frame(attr(dataTable(platf), "table")) # replace original platform annotation data <- merge(robina.full, ncbifd, by="ID") data <- data[order(data$P.Value), ] # restore correct order # preview first 10 columns head(data)[,c(1:10)]
ID logFC AveExpr t P.Value adj.P.Val B
3796 1371247_at 8.869049 9.206606 138.18038 7.726739e-23 1.230329e-18 40.30280
2961 1370412_at 8.007008 8.666362 112.67124 1.243625e-21 9.901121e-18 38.51476
11906 1387787_at 8.513843 10.013651 107.79122 2.272125e-21 1.205968e-17 38.08932
4744 1372195_at 9.441105 9.106164 100.52807 5.871517e-21 2.011803e-17 37.39354
3520 1370971_at 8.884836 8.447829 99.98895 6.317288e-21 2.011803e-17 37.33864
445 1367896_at 8.173194 8.328780 97.77615 8.566173e-21 2.273320e-17 37.10831
Gene.title Gene.symbol
3796 troponin T type 3 (skeletal, fast) Tnnt3
2961 troponin T type 1 (skeletal, slow) Tnnt1
11906 myosin light chain, phosphorylatable, fast skeletal muscle Mylpf
4744 troponin C type 2 (fast) Tnnc2
3520 myosin, heavy chain 2, skeletal muscle, adult///myosin, heavy chain 1, skeletal muscle, adult Myh2///Myh1
445 carbonic anhydrase 3 Car3
Gene.ID
3796 24838
2961 171409
11906 24584
4744 296369
3520 691644///287408
445 54232
# save to new file outfile <- paste(robina.folder, "GSE6943_DE-Robina.txt", sep="/") write.table(data, file=outfile, row.names=F, sep="\t", quote=FALSE) colnames(data)
The resulting file has gained 20 columns in addition to the first 7 created by RobiNA
[5] "P.Value" "adj.P.Val" "B" "Gene.title"
[9] "Gene.symbol" "Gene.ID" "UniGene.title" "UniGene.symbol"
[13] "UniGene.ID" "Nucleotide.Title" "GI" "GenBank.Accession"
[17] "Platform_CLONEID" "Platform_ORF" "Platform_SPOTID" "Chromosome.location"
[21] "Chromosome.annotation" "GO.Function" "GO.Process" "GO.Component"
[25] "GO.Function.ID" "GO.Process.ID" "GO.Component.ID"
Conclusion
RobiNA is a wrapper for R-code, developed and standardized by the authors to run reproducibly and do as they are expected. Although simple in layout, RobiNA appears as a quite performant alternative top more top-notch analysis methods reported in the literature. A computer with sufficient resources is wished to perform QC steps in a reasonable time but current strong laptops and desktops should do the job.
For those who cannot afford the more expensive CLC-Main workbench and do not wish to learn [R] and Bioconductor, RobiNA seems a good choice when in need of MA analysis (and it can also do standard RNASeq analysis - as will be described in a separate hands-on)
The complete file is accessible here (use right-click download linked file as or navigate from the page bottom-link)
download exercise files
Download exercise files here.
References:
- ↑
Marc Lohse, Adriano Nunes-Nesi, Peter Krüger, Axel Nagel, Jan Hannemann, Federico M Giorgi, Liam Childs, Sonia Osorio, Dirk Walther, Joachim Selbig, Nese Sreenivasulu, Mark Stitt, Alisdair R Fernie, Björn Usadel
Robin: an intuitive wizard application for R-based expression microarray quality assessment and analysis.
Plant Physiol: 2010, 153(2);642-51
[PubMed:20388663] ##WORLDCAT## [DOI] (I p)http://mapman.gabipd.org/web/guest/robin
- ↑ 2.0 2.1 http://www.affymetrix.com/estore/
[ Main_Page | Hands-on Analysis of public microarray datasets | PubMA_Exercise.3 | PubMA_Exercise.4 |
| PubMA_Exercise.5 ]