Submission of ENCODE data

We look forward to receiving your ENCODE-related
data and are ready to help throughout the
process. We recommend the following steps:
- Choose an appropriate format (BED, GFF, GTF,
MAF, or WIG) for your data from the descriptions
below and create a file in that format. The
submitted data file should be in plain-text (or
compressed plain-text) format.
- Data coordinates should be based on the NCBI
Build 35 assembly (May 2004, hg17). If necessary,
data in BED format from previous assemblies may be
transformed to the Build 35 assembly with the UCSC
liftOver tools.
- Test your file by loading it into the browser
as a custom track.
- Generate a description page to accompany
your data. The page should contain a brief
description of the data, the methods used
to generate the data, the techniques used to
verify the data, acknowledgements of the
individuals or organizations involved in data
collection and analysis, and, optionally, any
related literature references. Use the
HTML template to organize and format your text.
- Once your data displays as desired, please
email us at encode@soe.ucsc.edu to initiate
loading the tracks into the database. This makes
the data available to the rest of the ENCODE
community and to the public.
- The status of your track at UCSC can be checked on the
ENCODE data status page.
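
As an illustration of the first and third steps above, the following Python sketch builds the text of a small BED custom track that could be saved to a plain-text file and loaded into the browser for testing. The region names, coordinates, track name, and the function name bed_custom_track are hypothetical and chosen only for illustration. The layout (a track line followed by tab-separated chrom, start, end, and name fields, with a 0-based start and an end position that is not included) follows the BED format description.

```python
# Hypothetical regions: (chrom, start, end, name). Not real ENCODE data.
EXAMPLE_REGIONS = [
    ("chr7", 115404471, 115405000, "example1"),
    ("chr7", 115410000, 115410750, "example2"),
]


def bed_custom_track(regions, name="myTrack", description="My ENCODE data"):
    """Return custom track text: one track line followed by BED4 lines."""
    lines = [f'track name={name} description="{description}"']
    for chrom, start, end, label in regions:
        # BED starts are 0-based and the end is exclusive, so start < end.
        if not (0 <= start < end):
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the track text; redirect to a file to obtain a plain-text upload.
    print(bed_custom_track(EXAMPLE_REGIONS), end="")
```

The coordinates written this way are assumed to refer to the Build 35 (hg17) assembly. Data from an earlier assembly would first need to be converted, for example with the liftOver tools mentioned above.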

Browser Extensible Data Format (BED)

The BED format was defined for the efficient
storage and retrieval of genomic annotations.
It provides a flexible way to define the data
lines that are displayed in an annotation
track. Please see the
BED format description and the
custom tracks page for more details.

General Feature Format (GFF)

GFF is used for data that consist of a set of
linked features, such as gene models that have
introns, exons, promoters, and transcription
start/end sites. GFF lines have nine required
fields that must be tab-separated. If the
fields are separated by spaces instead of
tabs, the track will not display correctly.
Please see the
GFF format description and the
custom tracks page for more details.
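
Because a space-separated GFF file will not display correctly, it can help to check the field separators before uploading. The following Python sketch counts the tab-separated fields on a data line and flags lines that look space-separated. The function name check_gff_line and the sample line are hypothetical, and the check covers only the nine-field, tab-separated rule stated above, not the contents of the fields.

```python
def check_gff_line(line):
    """Return a list of problems found in one GFF data line (empty if none)."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        problems.append(
            f"expected 9 tab-separated fields, found {len(fields)}"
        )
        # Nine or more whitespace-separated tokens suggest spaces were used.
        if len(line.split()) >= 9:
            problems.append("fields may be separated by spaces instead of tabs")
    return problems


if __name__ == "__main__":
    # Hypothetical feature line, for illustration only.
    sample = "chr22\tmySource\texon\t1000\t2000\t.\t+\t.\tgene1"
    print(check_gff_line(sample))                     # tab-separated line
    print(check_gff_line(sample.replace("\t", " ")))  # space-separated line
```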

Gene Transfer Format (GTF)

GTF is a refinement of GFF that tightens the
specification and allows arbitrary
supplemental information for each gene.
Please see the
GTF format description and the
custom tracks page for more details.
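
In GTF, the supplemental information is carried in the final column as a list of attributes, each written as a name followed by a quoted value and a semicolon (for example gene_id and transcript_id). The following Python sketch splits such a column into a dictionary. The function name parse_gtf_attributes and the sample values are hypothetical, and the sketch assumes every value is double-quoted; unquoted values are simply skipped.

```python
import re

# One attribute: a name, whitespace, a double-quoted value, then a semicolon.
_ATTRIBUTE = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_attributes(column):
    """Return a dict of the quoted attributes in a GTF attribute column."""
    return dict(_ATTRIBUTE.findall(column))


if __name__ == "__main__":
    # Hypothetical attribute column, for illustration only.
    column = 'gene_id "geneA"; transcript_id "geneA.1"; note "demo";'
    print(parse_gtf_attributes(column))
```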

Multiple Alignment Format (MAF)

The multiple alignment format stores a series
of multiple alignments, at the DNA level between
entire genomes, in a form that is easy to parse
and relatively easy to read.
Please see the
MAF format description and the
custom tracks page for more details.
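
To give a sense of how simple the format is to parse, the following Python sketch reads MAF text into a list of alignment blocks. It assumes the usual layout in which a line beginning with "a" opens a block and each "s" line holds the source name, start, size, strand, source size, and aligned text, separated by whitespace. The function name parse_maf and the sample alignment are hypothetical, and other line types are ignored.

```python
def parse_maf(text):
    """Return a list of blocks; each block is a list of dicts, one per 's' line."""
    blocks = []
    current = None
    for line in text.splitlines():
        if line == "a" or line.startswith("a "):
            # An 'a' line starts a new alignment block.
            current = []
            blocks.append(current)
        elif line.startswith("s ") and current is not None:
            _, src, start, size, strand, src_size, seq = line.split()
            current.append({
                "src": src,
                "start": int(start),
                "size": int(size),
                "strand": strand,
                "src_size": int(src_size),
                "text": seq,
            })
    return blocks


if __name__ == "__main__":
    # Hypothetical two-species alignment, for illustration only.
    sample = (
        "##maf version=1\n"
        "a score=10.0\n"
        "s hg17.chr1 100 4 + 1000 ACGT\n"
        "s mm5.chr2 200 3 - 2000 AC-T\n"
        "\n"
    )
    for block in parse_maf(sample):
        print([row["src"] for row in block])
```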

Wiggle Format (WIG)

The wiggle (WIG) format allows display of
continuous-valued data in track format. This
is useful for GC percent, probability scores,
and transcriptome data.
Please see the
WIG format description and the
custom tracks page for more details.
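
As an example of continuous-valued data, the following Python sketch computes GC percent over non-overlapping windows of a sequence and writes the values as a fixedStep wiggle track. The function name gc_percent_wiggle, the track name, the window size, and the sample sequence are hypothetical. The sketch assumes the track type wiggle_0 and a 1-based start position in the fixedStep declaration, as given in the WIG format description.

```python
def gc_percent_wiggle(chrom, start, sequence, window=5, name="gcPercent"):
    """Return fixedStep wiggle text with GC percent per non-overlapping window.

    `start` is the 1-based position of the first base of `sequence`.
    A trailing partial window is dropped.
    """
    lines = [
        f"track type=wiggle_0 name={name}",
        f"fixedStep chrom={chrom} start={start} step={window} span={window}",
    ]
    for i in range(0, len(sequence) - window + 1, window):
        chunk = sequence[i:i + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        lines.append(f"{100.0 * gc / window:.1f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sequence and position, for illustration only.
    print(gc_percent_wiggle("chr1", 1001, "GGCCAATATA"), end="")
```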