Analyze GEO data with the Affymetrix software

Analyzing a selected GEO dataset using the Affymetrix Expression Console (EC) and Transcriptome Analysis Console (TAC)

affymetrix.png


Introduction

Note: The data used in this how-to tutorial is the same as that used for the BITS hands-on training Hands-on_Analysis_of_public_microarray_datasets.

The Affymetrix online training page dedicated to microarray (MA) and transcriptome analysis can be browsed here [1]. This main page contains links to download the necessary software as well as links to other Affymetrix resources needed to perform a full expression analysis. Also refer to the Affymetrix Transcriptome Analysis Console (TAC) Software and Expression Console Software tutorial pages [2].

Note: You will need to set up a free NetAffx account to download the software and access the data pages.


tac_data_workflow.jpg

As summarized in the above picture, we will now perform a full analysis starting from a set of CEL files obtained from the GEO repository. The method is divided into two steps, detailed below: the first step uses the Expression Console to convert the CEL data to a format better suited for differential expression analysis; the second step uses the Transcriptome Analysis Console to compute differential expression based on user-defined sample groups. Results presented here correspond to the blue highlights in the above workflow.

The Affymetrix Expression Console (EC)

The EC software allows step-by-step processing of the data by sequentially clicking each tool on the right-hand side of the window.

EC_workflow.png

Other 'Configuration' tools are not detailed here.

Converting CEL data to CHP format required for TAC

Using the 'Study' tools, the CEL files downloaded from GEO are loaded into the software, then normalized using one of three methods (RMA, MAS5 or PLIER). We use RMA here as it is a widely used standard.

EC-load-CEL-files.png

The interface for defining the data quality controls used by the software can be reached through the 'report-controls' item of the workflow on the right.

EC-report-controls.png

The choice of the appropriate normalization method is not detailed here; please refer to the BITS microarray training session and material for more information about this topic (Introduction to Affymetrix Microarray Analysis). The normalization method is selected from a drop-down menu.

EC-normalization-options.png

The process takes some time and ends with a summary page. It saves new files with the extension '.chp' to disk; these contain the normalized data, one file for each imported CEL file. The '.chp' files are ready for import into the TAC tool.

EC-normalization-summary.png

As seen above, several samples are reported 'outside bounds' by the RMA workflow. It means that some control probe sets did not meet the quality requirements. We looked it up and saw that the sample prep control probe sets (targeting B. subtilis genes: dap, thr, phe and lys) were not behaving as expected. Dap RNA is added in higher concentrations than thr RNA so the signal of dap should be higher than that of thr and this was not the case for the samples that were flagged 'outside bounds'. The other control probes behaved as they should. So it might be that in some samples the reverse transcription of the high abundance transcripts was not completely efficient (because of saturation...).

Technical note: As part of the standard Affymetrix microarray processing, control molecules are added to the mRNA at different concentrations prior to producing the cDNA. Other molecules (cDNA) are added later in the sample preparation to control for hybridization on the chip. The 'outside bounds' flags reported above result from a discrepancy between the known spiked-in quantities and the readout after scanning the chip: a control added at a higher concentration does not produce a higher final value than a control added at a lower concentration, which raises an alarm and shows the 4 affected samples with a colored background. Full details about the identity of the faulty probes and the obtained values can be found in the table at the bottom of the full report linked in the next paragraph (PDF).
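
The ordering check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the test that EC itself applies: the function name `spike_order_violations` and the signal values are hypothetical, and the only ordering taken from this page is that dap should read higher than thr.

```python
from typing import Dict, List, Tuple


def spike_order_violations(
    signals: Dict[str, float], expected_low_to_high: List[str]
) -> List[Tuple[str, str]]:
    """Return adjacent control pairs whose signals are not strictly increasing."""
    violations = []
    pairs = zip(expected_low_to_high, expected_low_to_high[1:])
    for low, high in pairs:
        # The control spiked in at the higher concentration must read higher.
        if signals[high] <= signals[low]:
            violations.append((low, high))
    return violations


# Hypothetical signals for two samples (illustrative values only).
sample_ok = {"thr": 9.1, "dap": 10.4}
sample_flagged = {"thr": 10.2, "dap": 9.8}

print(spike_order_violations(sample_ok, ["thr", "dap"]))
print(spike_order_violations(sample_flagged, ["thr", "dap"]))
```

The first sample yields an empty list (no violation), while the second reports the pair (thr, dap), which corresponds to the situation flagged 'outside bounds' above.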

Performing QC on the data and generating summarizing plots

A number of QC plots can be generated using the tools on the right-hand side. The full QC report can then be saved as a PDF file; reports for the RMA, MAS5 and PLIER methods are available on our server. Users are welcome to evaluate each QC plot themselves, using the data available on the server as input (see the link at the bottom of this page).

The Affymetrix Transcriptome Analysis Console (TAC)

Importing EC data and defining Groups

TAC-load-chp-data.png

Each group is defined in turn by moving the CHP files to the appropriate group window. This is done for the 'Heart' and the 'Diaphragm' samples.

TAC-diaphragm-group.png


TAC-defined-groups.png


Computing 'gene-level' Differential expression

Click 'Run analysis' to compute differential expression between the two groups.

TAC-run-DEA.png

Other expression analyses can be performed when the probe type is compatible with transcript level analysis (discerning between alternative transcripts). However, this is not demonstrated here and we only provide the example of gene-level analysis.

The summary of a standard differential expression (DE) analysis is shown, with counts of up-regulated (UR) and down-regulated (DR) genes under the standard filtering values (more than two-fold difference between the groups and adjusted p-value < 0.05).

TAC-filtered-data.png


TAC-RMA-summary.png
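
For readers who want to reproduce this kind of filtering outside TAC, here is a minimal Python sketch. It assumes a signed linear fold change (+2.5 meaning 2.5-fold up, -3.0 meaning 3-fold down); the `GeneResult` record, the gene names and the values are hypothetical, and the default thresholds are the ones quoted above.

```python
from typing import List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    gene: str
    fold_change: float  # signed linear fold change (assumed convention)
    adj_p: float        # adjusted p-value


def filter_de(
    results: List[GeneResult], min_fold: float = 2.0, max_adj_p: float = 0.05
) -> Tuple[List[GeneResult], List[GeneResult]]:
    """Split results into up- and down-regulated genes passing both filters."""
    significant = [r for r in results if r.adj_p < max_adj_p]
    up = [r for r in significant if r.fold_change > min_fold]
    down = [r for r in significant if r.fold_change < -min_fold]
    return up, down


# Hypothetical results, for illustration only.
results = [
    GeneResult("geneA", 3.2, 0.001),
    GeneResult("geneB", -2.6, 0.01),
    GeneResult("geneC", 1.4, 0.0005),  # significant, but below two-fold
    GeneResult("geneD", -4.0, 0.2),    # large change, but not significant
]

up, down = filter_de(results)
print(len(up), "UR genes;", len(down), "DR genes")
```

Calling `filter_de(results, min_fold=1.3)` on the same data would also keep geneC, which shows how relaxing a threshold lengthens the gene list.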

Adjusting Differential expression limits

The user can adjust the filtering values to shorten or lengthen the list of DE genes, and generate new plots from the result.

Adjusting the fold-change limit

TAC-filter-FC.png

Adjusting the limit for the adjusted p-value

TAC-filter-FDR.png
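
This page does not state how TAC computes its adjusted p-values. As background only, the sketch below shows the Benjamini-Hochberg procedure, a common way of deriving FDR-adjusted p-values from raw ones; treating it as what TAC does is an assumption, and the function name `bh_adjust` and the input values are hypothetical.

```python
from typing import List


def bh_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

With a larger number of tested probe sets, the adjusted values move further from the raw ones, which is why a gene with a small raw p-value can still fail the adjusted p-value filter.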

Plots based on the filtered differential expression table

Additional graphs can be obtained to view the data from different angles. The scatter plot highlights potential differences between the groups, showing UR and DR genes. The graphs are interactive: the user can query the full data to find which probe sets or genes are UR or DR by selecting an area around points with the mouse.


RMA-scatter-plot.png

Volcano plots are very popular; they show both how statistically confident each change is and how many genes deviate from the steady state.


RMA-volcano-plot.png
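
As a reminder of what a conventional volcano plot displays, the sketch below converts one gene's result into plot coordinates: log2 fold change on the x axis and -log10 of the p-value on the y axis. The axes actually used by TAC may differ; the signed linear fold-change input and the function name `volcano_point` are assumptions made for illustration.

```python
import math
from typing import Tuple


def volcano_point(fold_change: float, p_value: float) -> Tuple[float, float]:
    """Map a signed linear fold change and a p-value to volcano coordinates."""
    if abs(fold_change) < 1.0:
        # In the assumed signed convention, magnitudes below 1 do not occur.
        raise ValueError("signed linear fold change must have magnitude >= 1")
    # x: log2 of the magnitude, keeping the direction of the change.
    x = math.copysign(math.log2(abs(fold_change)), fold_change)
    # y: larger values mean smaller p-values, i.e. higher confidence.
    y = -math.log10(p_value)
    return x, y


# Hypothetical values: a 4-fold increase and a 2-fold decrease.
print(volcano_point(4.0, 0.001))
print(volcano_point(-2.0, 0.05))
```

Genes that are both strongly changed and highly significant therefore end up in the upper-left and upper-right corners of the plot.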

The interactive nature of the plot allows identifying outliers or significantly DE genes using the mouse.

TAC-volcano-select-heart-down.png


TAC-volcano-select-heart-up.png

The count of UR and DR genes is reported in the summary page

TAC-filtered-genes-summary.png

Additional annotations can be added using the dedicated menu

TAC-customize-annotations.png

A plot of differential expression per chromosome may highlight local regulatory biases (hot spot loci)

RMA-chrom-plot.png

Heatmaps can be generated to show genes with similar patterns of variation across samples.

TAC-run-hclust.png


TAC-hclust-results.png

Exporting results

The tabular results can finally be exported to local file(s) for further use (IPA, ...)

TAC-export-table-options.png

Additional columns can be added to the table if the user needs them

TAC-show-hide-column.png


After export to 'txt' files, the results can easily be converted and filtered in the Excel spreadsheet editor.

RMA-top-heart-down.png


RMA-top-heart-up.png
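
The same kind of ranking can also be done with a short script instead of a spreadsheet. The Python sketch below assumes the export is tab-delimited; the column names 'Gene Symbol' and 'Fold Change' and the table content are hypothetical and will need to be adapted to the columns actually present in the exported file.

```python
import csv
import io
from typing import List, TextIO

# Hypothetical excerpt of an exported table; real column names may differ.
EXPORT_TEXT = (
    "Gene Symbol\tFold Change\tAdjusted P\n"
    "geneA\t3.2\t0.001\n"
    "geneB\t-2.6\t0.01\n"
    "geneC\t-5.1\t0.002\n"
    "geneD\t1.1\t0.6\n"
)


def top_genes(
    handle: TextIO,
    n: int,
    up: bool = True,
    fc_col: str = "Fold Change",
    gene_col: str = "Gene Symbol",
) -> List[str]:
    """Return the n genes with the highest (up) or lowest (down) fold change."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    rows.sort(key=lambda row: float(row[fc_col]), reverse=up)
    return [row[gene_col] for row in rows[:n]]


print(top_genes(io.StringIO(EXPORT_TEXT), 2, up=False))
print(top_genes(io.StringIO(EXPORT_TEXT), 1, up=True))
```

For a real export, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`.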

Conclusion

The combination of the Affymetrix Expression Console and Transcriptome Analysis Console allows Windows-PC users without any knowledge of [R] to perform a standard analysis of Affymetrix microarray data and obtain differential expression tables that can be used for downstream biological interpretation. Note that other, more specific options and alternative analysis workflows are available with the same tools, and that this tutorial is only an introduction with a selection of basic methods.

The main added value of these tools over [R] is the full range of QC plots they generate, of the kind classically produced by expert bioinformaticians, as well as the very rapid processing of public Affymetrix CEL data (within minutes). We therefore recommend exploring the EC and TAC tools and combining them with IPA and other downstream tools that allow biological evaluation of public microarray data.


YouTube videos from the Affymetrix training team

Please follow the video webcasts below to get familiar with the Affymetrix Expression Console and Transcriptome Analysis Console.

A series of YouTube videos can be found on the Affymetrix web site.

(Hosted by John Burrill, PhD, Sr. Director Application Science at Affymetrix)

How to run an analysis in Expression Console Software

How to perform QC in Expression Console Software

How to Customize QC reports and graphs in Expression Console

Setting up an analysis in Transcriptome Analysis Console

Gene-level analysis in Transcriptome Analysis Console

Splice variant analysis in Transcriptome Analysis Console

Download exercise files

Download exercise files here.

Use the appropriate application to open the files present in result-files.

For re-analysis, download the selected ZIP files and decompress them in a local directory.

Note that additional RAE230A library files need to be installed from within EC and TAC to allow re-analysis of this rat data.

References:
  1. http://www.affymetrix.com/estore/browse/level_seven_software_products_only.jsp?productId=131414&categoryId=35623&productName=Affymetrix%2526%2523174%253B-Expression-Console%2526%2523153%253B-Software#1_1
  2. http://www.affymetrix.com/support/learning/training_tutorials/tac_ec/index.affx#1_2
