Genome Graphs User's Guide
Genome Graphs is a tool for displaying genome-wide data sets
such as the results of genome-wide SNP association studies,
linkage studies and homozygosity mapping.
Using the Genome Graphs tool, you can:
- upload several sets of genome-wide data and display
them simultaneously
- click on an area of interest and go directly to the
Genome Browser at that position
- set a significance threshold for your data and view
only regions that meet that threshold
- view the genes that exist in areas where your data
meet your significance threshold
To return to Genome Graphs from any other location on the
Genome Browser website, use your browser's Back button, or
press Home on the blue navigation bar, then
press the Genome Graphs link.
Note that only the "standard" chromosomes are
displayed in the Genome Graphs display; haplotype and
mitochondrial chromosomes are not displayed.
This User's Guide is aimed at both novice and advanced Genome Graphs
users. If you are new to the Genome Graphs tool, read the Quick Start
section to learn the basics using some sample data. Advanced users
may want to proceed directly to the section that addresses
a particular area of functionality in detail.
Formatting, Uploading & Importing Data
Formatting Data
Genome Graphs allows you to upload data from files that reside
on your computer. Several file formats are accepted by the
program. For all formats there is a single line for each marker.
Each line starts with information on the marker, and ends with
the numerical values associated with that marker. Each marker
can be one of the following types:
- chromosome base: e.g. chr1 130000 (the first base in a chromosome is considered position 0)
- STS Marker: e.g. RH75228
- dbSNP rsID: e.g. rs12345
- Affymetrix 500k Gene Chip: e.g. SNP_A-1780270
- Affymetrix Genome-Wide SNP Array 6: e.g. SNP_A-8575125
- Affymetrix SNP Array 6 Structural-Variation: e.g. CN_47396
- Illumina HumanHap300 Bead Chip: e.g. rs3934834
- Illumina HumanHap550 Bead Chip: e.g. rs3094315
- Illumina HumanHap650 Bead Chip: e.g. rs3094315
- Agilent CGH 244A: e.g. A_14_P112718
The fields in each line of the file (the marker and its values) can be
separated with a single space, a tab, or a comma. The file can contain
multiple values for each marker. In that case, a separate graph will be
created for each value column in the input file.
For example, a chromosome base marker with a single value would be
entered like this:
chrX 100000 1.23
A dbSNP rsID marker with two values would be entered like this:
rs10218492 0.384 0.882
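The line format above can be sketched as a small Python parser. This is a minimal, hypothetical illustration and not Genome Graphs' own code. The name `parse_line` is invented here. The parser also assumes that a first field starting with `chr` marks a chromosome base marker and that any other first field is an ID marker.

```python
import re

# A single space, tab, or comma separates the fields on each line.
SEPARATOR = re.compile(r"[ \t,]")

def parse_line(line):
    """Return (marker, values) for one line of an input file (hypothetical helper).

    marker is a (chrom, base) tuple for chromosome base lines, or an ID
    string (rsID, STS name, array probe ID) for ID-based lines.
    """
    fields = SEPARATOR.split(line.strip())
    if fields[0].startswith("chr"):
        # Assumption: "chr..." in the first field means chromosome + 0-based base.
        marker = (fields[0], int(fields[1]))
        value_fields = fields[2:]
    else:
        # Otherwise the first field is a marker ID.
        marker = fields[0]
        value_fields = fields[1:]
    # One float per value column; each column would become its own graph.
    return marker, [float(v) for v in value_fields]
```

For example, `parse_line("rs10218492,0.384,0.882")` yields the ID and two values, which would be drawn as two graphs.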
Genome Graphs will map the marker IDs to the genome.
In cases where a marker maps to more than one location in
the genome, the value(s) in your input file will be associated
with each location.
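The sketch below roughly illustrates this one-to-many mapping by expanding each marker into every location it maps to. The lookup table `MARKER_LOCATIONS` and its marker names and coordinates are hypothetical stand-ins for the assembly's marker tracks. Skipping markers that have no known location is also an assumption of this sketch.

```python
# Hypothetical lookup table: marker ID -> list of (chrom, base) locations.
# The names and coordinates are made up for illustration only.
MARKER_LOCATIONS = {
    "markerA": [("chr1", 1000)],
    "markerB": [("chr2", 5000), ("chr7", 9000)],  # maps to two places
}

def expand_markers(rows, locations):
    """Yield (chrom, base, values) for every location of every marker.

    A marker that maps to several places contributes the same values to
    each of them; markers with no known location are skipped (assumption).
    """
    for marker, values in rows:
        for chrom, base in locations.get(marker, []):
            yield chrom, base, values
```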
If the value associated with your marker is positive, do not
include a sign (e.g. '+'). Include a sign ('-') only if the value is
negative.
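If you generate the input file with a script, Python's `g` number format already follows this sign rule. It never writes a leading '+', so only negative values carry a sign. The helper `format_line` is hypothetical. Note that `g` keeps six significant digits, so use a wider format such as `.10g` if you need more precision.

```python
def format_line(marker, values, sep="\t"):
    """Build one input-file line (hypothetical helper).

    format(v, "g") emits no '+' for positive numbers, so only
    negative values are written with a sign ('-').
    """
    return sep.join([marker] + [format(v, "g") for v in values])
```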
Note that markers can only be mapped to assemblies that already
have a track containing your marker type. You cannot, for example,
use dbSNP rsID markers for the cow genome, as it does not have a
SNP track.
Uploading Data
Once you have created your input file, you must upload it to
Genome Graphs. From the main Genome Graphs page, choose the
clade, genome, and assembly to which your data pertains.
If you are unsure of the UCSC assembly name, you can check
this page.
Now, press the upload button to go to the upload page.
To upload a file in any of the supported formats, locate the
file on your computer using the controls next to file
name, and then press submit. The other controls on this form are
optional, though filling them out will sometimes enhance the
display. In general, the controls that default to "best guess"
can be left alone, since the guess is almost always
correct.
The controls for display min and max values and connecting
lines can be set later via the configuration page as well.
Here is a description of each control.
- name of data set: Displayed in graph drop-down in
Genome Graphs and as the track name in Genome Browser. Only the
first 16 characters are visible in some contexts. For data
sets with multiple graphs, this is the first part of the
name, shared with all members of the data set.
- description: A short sentence describing the
data set. Displayed in the Genome Graphs and Genome Browser
configuration pages, and as the center label in the Genome Browser.
- file format: Controls whether the upload file is
a tab-separated, comma-separated, or space-separated table.
- markers are: Describes how to map the data to
chromosomes. Either the first column of the file is an ID
of some sort, or the first column is a chromosome
and the next a base. The IDs can be SNP rs numbers, STS marker
names, or IDs from any of the supported genotyping platforms.
- column labels: Controls whether the first row of the
upload file is interpreted as labels or data. If the
first row contains text in the numerical fields, or if the
mapping fields are empty, it is interpreted by "best
guess" as labels. This is generally correct, but you can
override this interpretation by explicitly setting the control.
- display min value/max value: Set the range of the
data set that will be plotted. If left blank, the range will
be taken from the min/max values in the data set itself. For
all data sets to share the same scale, you will usually need
to set this.
- label values: A comma-separated list of numbers
for the vertical axis. If left blank, the axis will be
labeled at the 1/3 and 2/3 points of your data range.
- draw connecting lines: Lines are drawn connecting data points
that are separated by this number of bases or fewer.
- file name, or Paste URLs or data: Specify the data to upload by
entering the name of a file on your local computer, entering a URL at
which the data file can be found, or simply pasting in the data. If
entries are made in both fields, the file name will take precedence.
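Two of the defaults described above can be sketched in a few lines: the "best guess" check for column labels, and the default display range with its label values. Both functions are hypothetical illustrations. Reading "the 1/3 and 2/3 points of your data range" as `min + (max - min)/3` and `min + 2(max - min)/3` is our interpretation of the text.

```python
def looks_like_labels(fields, n_marker_fields):
    """Best-guess test for whether a first row holds labels (hypothetical).

    n_marker_fields is 1 for ID markers or 2 for chromosome + base.
    The row counts as labels if a mapping field is empty or a
    numerical field contains text that does not parse as a number.
    """
    if any(f.strip() == "" for f in fields[:n_marker_fields]):
        return True
    for f in fields[n_marker_fields:]:
        try:
            float(f)
        except ValueError:
            return True
    return False

def default_axis(values, display_min=None, display_max=None):
    """Return (min, max, label_values) using the upload-page defaults.

    Blank min/max fall back to the data's own min/max; labels are placed
    1/3 and 2/3 of the way through that range (our reading of the text).
    """
    lo = min(values) if display_min is None else display_min
    hi = max(values) if display_max is None else display_max
    span = hi - lo
    return lo, hi, [lo + span / 3, lo + 2 * span / 3]
```

Setting `display_min` and `display_max` explicitly for every data set is what gives them a shared scale, as noted above.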
Importing Data
In addition to supplying your own genome-wide data files, you can also
import existing database tables from an assembly into the
Genome Graphs tool. Any table containing positional information can be
imported. This includes tables of the following types: BED, PSL, wiggle,
MAF, and bedGraph. Custom track tables can be imported
as well. Tables made by Genome Graphs (chromGraph tables) cannot be
imported: they are already in the format used by the tool, so no
conversion is necessary. All tables imported into Genome Graphs will be
converted into a custom track of type chromGraph using a window size of
10,000 bases.
To import a table or custom track, choose the group, track, and table
from the lists, then press the submit button. The other controls are optional,
though completing them will enhance the display. The controls for display
min and max values and connecting lines can be set later via the configuration
page as well. Here is a description of each control.
- name of data set: This will be displayed in the graph list
in the Genome Graphs tool and as the track name in the Genome Browser.
Only the first 16 characters are visible in some contexts. For data
sets with multiple graphs, this is the first part of the name, shared
with all members of the data set.
- description: Enter a short sentence describing the data set.
It will be displayed in the Genome Graphs tool and in the Genome Browser.
- display min value/max value: Set the range of the data set to
be plotted. If left blank, the range will be taken from the min and max
values in the data set itself. If you would like all of your data sets to
share the same scale, you will need to set this.
- label values: A comma-separated list of numbers for the vertical
axis. If left blank, the axis will be labeled at the 1/3 and 2/3 points
of the data range.
- draw connecting lines: Lines are drawn connecting data points that
are separated by this number of bases or fewer.
- depth or coverage: When importing positional tables, you
can choose to convert those tables to the chromGraph format by using
either the depth or coverage conversion method. Both
conversion methods use a non-overlapping window size of 10,000 bases
when converting to the chromGraph format. In the depth method,
the weighted average for each 10,000 base window is assigned to a single
point in the center of this window. The coverage method, by contrast, is
binary: if there is even one point in the input table in that 10,000 base
window, the resulting graph will have a value of 1 for that range.
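The two conversion methods can be sketched for intervals on a single chromosome. The guide does not say how the depth average is weighted. This sketch assumes each interval is weighted by the number of bases it shares with the window. The function names and the half-open `[start, end)` interval convention are also assumptions.

```python
WINDOW = 10_000  # non-overlapping window size used by both methods

def windows_touched(start, end):
    """Indexes of the windows that a half-open interval [start, end) overlaps."""
    if end <= start:
        return range(0)  # empty interval: touches no window
    return range(start // WINDOW, (end - 1) // WINDOW + 1)

def depth(intervals):
    """Assumed depth method: overlap-weighted mean value per window.

    Each (start, end, value) interval contributes value * (bases shared
    with the window); the sum is divided by the total shared bases.
    Results are keyed by the window's center position.
    """
    weighted, bases = {}, {}
    for start, end, value in intervals:
        for w in windows_touched(start, end):
            overlap = min(end, (w + 1) * WINDOW) - max(start, w * WINDOW)
            weighted[w] = weighted.get(w, 0.0) + value * overlap
            bases[w] = bases.get(w, 0) + overlap
    return {w * WINDOW + WINDOW // 2: weighted[w] / bases[w] for w in sorted(bases)}

def coverage(intervals):
    """Coverage method: value 1 for every window that holds any input data."""
    covered = set()
    for start, end, _value in intervals:
        covered.update(windows_touched(start, end))
    return {(w * WINDOW, (w + 1) * WINDOW): 1 for w in sorted(covered)}
```

With intervals `(0, 1000, 2.0)` and `(1000, 4000, 4.0)`, the first window's depth is (2.0 × 1000 + 4.0 × 3000) / 4000 = 3.5, while its coverage is simply 1.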
Quick Start
Use the examples in this section of the User's Guide to get a feel for how
the tool works. Refer to other sections
in this User's Guide for details and instructions for more advanced features.
The Genome Graphs tool comes pre-loaded with sample data. These sample
data sets are from real-world genome-wide studies. Use these data sets to
quickly see what the tool looks like when data is displayed. To
view the sample data, choose a data set from the
graph drop-down list, then choose your desired display color from
the "in" drop-down list. The tool will display the data set
directly above the chromosomes in Genome Graphs.
Read on to learn how to
customize the display.
Example #1 — SNPs on chr22
Follow these steps to display in Genome Graphs all of the highest-quality
SNPs on chromosome 22 of the hg18 assembly whose predicted functional
role is "coding non-synonymous" (that is, the allele changes the peptide
with respect to the reference assembly). Note that there are no SNPs on
the p-arm of chromosome 22.
This data set is formatted in the "marker value" style. The markers are
dbSNP rsIDs. The associated value is +1 if the SNP is on the positive
strand and -1 if the SNP is on the negative strand. Here are the first
ten rows of the data file:
rs1007298 +1
rs1007863 +1
rs10154509 +1
rs10154678 +1
rs10154785 +1
rs1018448 +1
rs10212022 +1
rs1022478 +1
rs1042311 +1
rs1042435 +1
Step 1. Upload the data into the Genome Graphs tool
Copy the entire sample data set
into a text editor and save the file to your computer. This data
set is associated with the human assembly: hg18
(Mar. 2006). Be sure to
configure the Genome Graphs tool to use the hg18 assembly
as follows:
clade: Vertebrate
genome: Human
assembly: Mar. 2006
Upload the file into the Genome Graphs tool.
You can configure each control on the upload
page, or just leave them set to their default values.
The upload process may take some time, as the program is actually
mapping each rsID in the input file to its location(s) in the genome.
Step 2. Display the graph in Genome Graphs
Now that your input file has been uploaded to the server, you will want
to display it in the Genome Graphs tool. To display your uploaded data,
simply choose the graph name from the
graph drop-down list, then choose your desired display color
from the "in" drop-down list.
Your graph will be displayed directly above the chromosomes
in Genome Graphs. You should see the data plotted directly
above chromosome 22.
Step 3. View the graph in the Genome Browser
From the Genome Graphs display, press anywhere on the graph
or on chromosome 22 to open the
Genome Browser for hg18 centered at that location on chr22.
The graph will be drawn as a track near the top of the Genome
Browser display.
Displaying Data in Genome Graphs
Once you have uploaded your data, you will want to display it in the
Genome Graphs tool. To display your uploaded data, simply choose the
graph name from the graph drop-down list, then choose the color in which
you would like it to be displayed from the "in" drop-down list.
Your graph will be displayed directly above the chromosomes
in Genome Graphs. Read on to learn how to
customize the display.
Configuring the Display
Configuring the graphs display
To go to the configuration page, press the configure button
on the main Genome Graphs page. This is the page from which you can
configure many overall aspects of the Genome Graphs display.
Individual graphs can also be configured (see the next section
for help on that).
On this page you will find the following controls:
- image width - controls the overall width of the graphs
display on the main Genome Graphs page. The default is
620 pixels.
- graph height - controls the height of the graph(s) in
the space above each chromosome. The default is
27 pixels.
- graphs per line - controls how many graphs are displayed
on each line in the space above each chromosome. For example,
if you set this value to two, the display will superimpose
two graphs on top of each other on one line. The axis label
for the first graph will appear on the left side of the display
and the axis for the second graph on the right side.
- lines of graphs - controls how many sets of graphs will
appear above each chromosome. For example, if you set this
value to 2, the display will make room for two lines of
graphs (each at the graph height above) in the space
above each chromosome.
- chromosome layout - controls how the chromosomes are laid
out in the Genome Graphs display. You can choose to view one
or two chromosomes on each horizontal line in the display.
Alternatively, you can set up the display such that all of the
chromosomes appear in one long line. If you choose this layout,
you may want to adjust the width of the image (image width
above).
- numerical labels - check this box if you would like to see
axis labels to the right/left of the display. If you did not specify
label values when you uploaded your file, the numerical labels will
default to the points 1/3 and 2/3 of the way between the min and max
values in your data input file.
- highlight missing - check this box if you would like to
see the areas in your graph where there is no data. Note that if
you are displaying more than one graph, this attribute only
pertains to the first graph.
- region padding - controls the size of the data
regions. The data points in your graphs that exceed
the significance threshold are padded by this number
of bases on either side.
The default is 25,000 bases on each side.
When you have completed configuring the display, press the submit
button to return to the Genome Graphs display.
Configuring individual graphs
Near the bottom of the Configuration page, you will see a list of
the graphs that you have uploaded. Click on the hyperlinked graph
name to configure that graph. This configuration pertains to
the Genome Graphs view.
You can set the range of the display by editing the display min/max
value fields. This will restrict the Genome Graphs display for this
graph to that data range. The axis will be labeled at 1/3 and 2/3 of
the data range that you set.
If your data is sparse, you may want to draw lines between
your data points. You can configure that by editing the
draw connecting lines between markers separated by
up to ... bases value. The default value is 25,000,000
bases.
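The connecting-lines rule amounts to joining neighboring points whose spacing falls within the configured distance. This is a minimal sketch. It assumes that points on one chromosome are joined only to their immediate neighbor in position order, and `connecting_segments` is a hypothetical name.

```python
def connecting_segments(points, max_gap):
    """Pairs of neighboring points that would be joined by a line.

    points: (base, value) pairs on one chromosome. Consecutive points
    (by position) are joined only if they are at most max_gap bases apart.
    """
    pts = sorted(points)
    return [(a, b) for a, b in zip(pts, pts[1:]) if b[0] - a[0] <= max_gap]
```

With the default of 25,000,000 bases, almost every pair of neighboring markers is joined. Lowering the value leaves gaps where the data is sparse.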
When you have completed configuring the display, press the submit
button twice to return to the Genome Graphs display.
Setting a Significance Threshold
Most genome-wide data has some amount of noise and is only interesting when
the data values are above a certain value. You can set this value using the
significance threshold input box. Enter a decimal number in this input box
and press Enter. The display will now have a light gray line across the
graph at this data value. If you have more than one graph displayed, the
significance threshold only pertains to the graphs that contain the significance
threshold in the displayed data range.
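The rule for which graphs a threshold applies to can be sketched as a simple range check. The graph names and the inclusive comparison at the range ends are assumptions.

```python
def graphs_using_threshold(graphs, threshold):
    """Names of graphs whose displayed range [min, max] contains the threshold.

    graphs maps a (hypothetical) graph name to its (display_min, display_max).
    """
    return [name for name, (lo, hi) in graphs.items() if lo <= threshold <= hi]
```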
The significance threshold works in concert with the browse regions and
sort genes buttons; it will affect the regions that are displayed once you
press either of these two buttons.
To open the Genome Browser with a view of all of the regions in your graph
that include data points that pass the significance threshold, press the
browse regions button. This will open the Genome Browser with a navigation
pane on the left side of the screen. This pane will contain links to all
regions that pass your significance threshold. Note that if you are
displaying more than one graph, the significant regions are based only on
the first graph in the display list.
To view a list of genes that are in regions that pass the significance
threshold, press the sort genes button. This will open the Gene Sorter
with only the genes that are in significant locations with respect to
your data.
If you would rather view all of your regions without restricting the
output to only those regions that pass the significance threshold, simply
delete any values from the significance threshold input box and press
Enter before pressing browse regions.
Setting a Data Region
The data region is the span of bases that will be added to either side
of the data points in your graphs which exceed the significance
threshold. Set the data region by editing the region
padding value on the configuration page.
The combination of setting the data region and the significance
threshold will affect two things:
- the regions displayed in the Genome Browser after you press the
browse regions button,
- the genes displayed in the Gene Sorter after you press the sort
genes button.
For example, take a data set that contains the following data:
chr2 100100000 2.3
chr2 100100500 4.5
chr2 100101000 1.2
If you set the significance threshold at 4.0, one data point in the data
set (the value 4.5) passes that threshold. If you then set the region
padding to 200, the one significant data point will be padded on each
side by 200 base pairs. In that case, the only resulting significant data
region will be chr2:100,100,300-100,100,700.
If instead you set the region padding to 2,000, the one significant data
point will be padded on each side by 2,000 base pairs. In that case, the
resulting significant data region will be
chr2:100,098,500-100,102,500.
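The worked example above can be reproduced with a short sketch. It assumes that "exceed" means strictly greater than the threshold and that a missing threshold (an empty box) keeps every point. It also merges padded spans that overlap or touch into one region, which the guide does not specify. `significant_regions` and `as_position` are hypothetical names.

```python
def significant_regions(points, threshold, padding):
    """Padded regions around points whose value exceeds the threshold.

    points: (chrom, base, value) tuples. threshold=None keeps every point,
    like leaving the significance threshold box empty. Overlapping or
    touching padded spans are merged (an assumption of this sketch).
    """
    spans = sorted(
        (chrom, max(0, base - padding), base + padding)
        for chrom, base, value in points
        if threshold is None or value > threshold
    )
    merged = []
    for chrom, start, end in spans:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of starting a new one.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def as_position(region):
    """Format a region in the chr:start-end style used above."""
    chrom, start, end = region
    return f"{chrom}:{start:,}-{end:,}"
```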
Viewing Data in the Genome Browser
To view your graphs in the Genome Browser, press the
browse regions button. This will open the Genome
Browser with your graph(s) displayed as track(s). You can
configure and edit your track
as you can any other track in the Genome Browser.
In addition to the Genome Browser, you will also see a pane
on the left-hand side, which contains links to all of the
significant regions in your data.
Please note that if you are displaying more than one graph in
Genome Graphs, the significant regions are based only on the
first graph in the display list.
You can also navigate to the Genome Browser by clicking directly
on a graph or chromosome in Genome Graphs. The Genome Browser
will open with a 1,000,000 bp window centered on the location
on which you clicked.
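The click-to-browse behavior can be sketched as a 1,000,000 bp window centered on the clicked base. Clamping the window to the chromosome ends is an assumption of this sketch, as is the name `browser_window`.

```python
def browser_window(chrom, click_base, chrom_size, width=1_000_000):
    """Window of `width` bases centered on the clicked base.

    Clamping to [0, chrom_size] is an assumption of this sketch; near a
    chromosome end the window shifts instead of being centered.
    """
    start = click_base - width // 2
    start = max(0, min(start, chrom_size - width))
    end = min(start + width, chrom_size)
    return chrom, start, end
```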
Viewing Data in the Gene Sorter
To view the set of genes that are in
significant regions in your data,
press the sort genes button. This will open the Gene
Sorter with a filter to include only genes that are located
in regions in your input data that are above the significance
threshold. Please note that if you are displaying more than
one graph in Genome Graphs, the significant genes are based
only on the first graph in the display list.
If the graph was uploaded using markers, then a custom
Gene Sorter column with the same name as the graph
will be created. For each gene, this column will list all markers
whose values are above the significance threshold.
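The Gene Sorter filter and the custom marker column can be sketched as interval-overlap checks. This is a hypothetical illustration. The gene and marker tuples, the half-open intervals, and the choice to list only markers inside each gene's own span are all assumptions.

```python
def genes_in_regions(genes, regions):
    """Names of genes overlapping at least one significant region.

    genes: (name, chrom, start, end); regions: (chrom, start, end);
    all intervals are treated as half-open [start, end).
    """
    return [
        name
        for name, g_chrom, g_start, g_end in genes
        if any(r_chrom == g_chrom and g_start < r_end and r_start < g_end
               for r_chrom, r_start, r_end in regions)
    ]

def markers_per_gene(genes, markers, threshold):
    """Hypothetical marker column: markers inside each gene with value > threshold.

    markers: (marker_id, chrom, base, value). Genes with no such marker
    are left out of the result.
    """
    column = {}
    for name, g_chrom, g_start, g_end in genes:
        hits = [m for m, m_chrom, base, value in markers
                if m_chrom == g_chrom and g_start <= base < g_end and value > threshold]
        if hits:
            column[name] = hits
    return column
```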
Deleting Data
There are several ways to delete your data once it has
been uploaded. If you are viewing your data as a track in
the Genome Browser, you can click on the mini-button
or track control for the track and delete the track
using the Remove custom track button. You can also
reset your cart, which will reset the browser interface settings
to their defaults and delete all custom tracks and data. Do this by
visiting the gateway page and pressing the hyperlink
"Click here to reset".
Your data will be saved on our server for at least 48 hours
from the time you last access it, unless it is saved in a
Session.
Correlating Data Sets
To calculate how well your data sets correlate with one another,
press the correlate button. This will calculate and display the
correlation coefficient (R) between each pair of your data sets.
R, also known as Pearson's correlation coefficient, measures the
extent to which two graphs move together. The value of R ranges
from -1 to 1. A positive R indicates that the graphs tend to move
in the same direction, while a negative R indicates that they tend
to move in opposite directions.
R-squared (simply R*R) measures how much of the variation in one
graph can be explained by a linear dependence on the other graph.
R-squared ranges from 0, when there is no linear relationship
between the two graphs, to 1, when one graph is an exact linear
function of the other.
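For reference, R can be computed directly from its definition. This sketch assumes the two graphs have already been paired value by value (for example, at shared marker positions). How Genome Graphs pairs them is not described here. The name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient R of two paired value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and variances, all without the common 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

R-squared is then `pearson_r(xs, ys) ** 2`. A constant graph has zero variance, so R is undefined for it and the sketch raises `ZeroDivisionError`. On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same Pearson coefficient.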
To return to Genome Graphs, press the return to
graphs button.