ENCODE Project at UCSC: Data Submission

Submission of ENCODE data

We welcome submissions of ENCODE-related data and are ready to help throughout the process. We recommend the following steps:

1. Choose an appropriate format (BED, GFF, GTF, MAF, or WIG) for your data from the descriptions below, and create a file in that format. The submitted file should be plain text, optionally compressed.
2. Base your data coordinates on the NCBI Build 35 assembly (May 2004, hg17). If necessary, BED-format data from earlier assemblies can be converted to Build 35 coordinates with the UCSC liftOver tools.
3. Test your file by loading it into the browser as a custom track.
4. Write a description page to accompany your data. The page should briefly describe the data, the methods used to generate it, and the techniques used to verify it; acknowledge the individuals or organizations involved in data collection and analysis; and, optionally, list related literature references. Use the HTML template to organize and format your text.
5. When your data displays as intended, email encode@soe.ucsc.edu to have the tracks loaded into the database. This makes the data available to the rest of the ENCODE community and to the public.
The status of your track at UCSC can be checked on the ENCODE data status page.
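Because a submission may arrive either as plain text or compressed, a local sanity check before uploading can accept both forms. The sketch below assumes gzip compression (the instructions above say only "compressed") and uses hypothetical helper names, `decode_submission` and `read_submission`; it is not part of any UCSC tool.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream begins with these two bytes


def decode_submission(raw: bytes) -> list:
    """Return the text lines of plain or gzip-compressed (assumed) file contents."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # Strict ASCII decoding raises UnicodeDecodeError on any byte above 0x7F,
    # a rough signal that the content is not plain text.
    return raw.decode("ascii").splitlines()


def read_submission(path) -> list:
    """Hypothetical convenience wrapper that reads a file from disk."""
    return decode_submission(Path(path).read_bytes())
```

Calling `read_submission("mydata.bed.gz")` (a placeholder file name) should return the same lines as the uncompressed file would, which makes it easy to run the format checks for each data type on either form.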

Browser Extensible Data Format (BED)


	The BED format was defined for the efficient storage and retrieval of genomic annotations. It provides a flexible way to define the data lines that are displayed in an annotation track. Please see the BED format description and the custom tracks page for more details.
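The most common BED pitfall is its coordinate convention: chromStart is 0-based and chromEnd is exclusive, so a feature's length is simply chromEnd - chromStart. The sketch below parses the first six BED fields (chrom, chromStart, chromEnd, name, score, strand) of a tab-separated line and converts a 1-based, fully closed interval to BED. The names `BedRecord`, `parse_bed_line`, and `from_one_based` are illustrative, and the sketch assumes tab-separated fields and ignores the optional columns after strand.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    chrom_start: int             # 0-based, included in the feature
    chrom_end: int               # 0-based, NOT included (half-open interval)
    name: Optional[str] = None
    score: Optional[int] = None  # UCSC BED scores range from 0 to 1000
    strand: Optional[str] = None

    @property
    def length(self) -> int:
        # Half-open coordinates make the length a plain subtraction.
        return self.chrom_end - self.chrom_start


def parse_bed_line(line: str) -> BedRecord:
    """Parse the first six fields of one tab-separated BED data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs chrom, chromStart and chromEnd")
    start, end = int(fields[1]), int(fields[2])
    if not 0 <= start <= end:
        raise ValueError(f"invalid interval {start}-{end}")
    name = fields[3] if len(fields) > 3 else None
    score = int(fields[4]) if len(fields) > 4 else None
    if score is not None and not 0 <= score <= 1000:
        raise ValueError("score must lie between 0 and 1000")
    strand = fields[5] if len(fields) > 5 else None
    if strand is not None and strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    return BedRecord(fields[0], start, end, name, score, strand)


def from_one_based(chrom: str, first: int, last: int) -> str:
    """Convert a 1-based, fully closed interval (as in GFF) to a BED3 line."""
    return f"{chrom}\t{first - 1}\t{last}"
```

For example, bases 101 through 200 in 1-based coordinates become `chr1 100 200` in BED, a feature of length 100.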

General Feature Format (GFF)


	GFF is used for data consisting of a set of linked features, such as gene models with introns, exons, promoters, and transcription start and end sites. Each GFF line has nine required fields, which must be separated by tabs; if spaces are used instead, the track will not display correctly. Please see the GFF format description and the custom tracks page for more details.
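Since space-separated fields are the usual reason a GFF track fails to display, generating lines with an explicit tab join and checking the field count catches the problem early. The field names below (seqname, source, feature, start, end, score, strand, frame, group) follow the UCSC GFF description; the helper names `format_gff_line` and `check_gff_line` are hypothetical, and the sketch checks only the field count and that the 1-based start does not exceed the end.

```python
GFF_FIELDS = ("seqname", "source", "feature", "start", "end",
              "score", "strand", "frame", "group")


def format_gff_line(seqname, source, feature, start, end,
                    score=".", strand=".", frame=".", group=""):
    """Join the nine GFF fields with tabs; '.' marks an unknown value."""
    values = [seqname, source, feature, start, end, score, strand, frame, group]
    return "\t".join(str(v) for v in values)


def check_gff_line(line):
    """Return the fields as a dict, or raise if the line is not valid GFF."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF_FIELDS):
        raise ValueError(
            f"expected 9 tab-separated fields, found {len(fields)}; "
            "were spaces used instead of tabs?")
    start, end = int(fields[3]), int(fields[4])
    if not 1 <= start <= end:  # GFF coordinates are 1-based and inclusive
        raise ValueError(f"invalid interval {start}-{end}")
    return dict(zip(GFF_FIELDS, fields))
```

A line such as `chr22 ENCODE exon 1000 1200 . + . gene1` typed with spaces splits into a single field and is rejected, while the same values passed through `format_gff_line` pass the check.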

Gene Transfer Format (GTF)


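	GTF is a refinement of GFF that tightens the specification. The first eight fields are the same as in GFF; the ninth holds a list of attributes, which must begin with gene_id and transcript_id and can carry arbitrary supplemental information for each gene. Please see the GTF format description and the custom tracks page for more details.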
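The attribute field is where GTF differs from GFF: each attribute is a type/value pair ending in a semicolon, attributes are separated by a single space, and gene_id and transcript_id come first. The sketch below builds and reads such a field with double-quoted values. The helper names are hypothetical, and the parser is a simplification: it does not handle semicolons inside quoted values, and a repeated attribute type keeps only its last value.

```python
def format_gtf_attributes(gene_id, transcript_id, **extra):
    """Build a GTF attribute field; gene_id and transcript_id always lead."""
    pairs = [("gene_id", gene_id), ("transcript_id", transcript_id),
             *extra.items()]
    # Each attribute ends in ';' and is separated from the next by one space.
    return " ".join(f'{key} "{value}";' for key, value in pairs)


def parse_gtf_attributes(text):
    """Parse an attribute field into an insertion-ordered dict of strings."""
    attrs = {}
    for item in text.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # the final ';' leaves an empty trailing piece
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    if list(attrs)[:2] != ["gene_id", "transcript_id"]:
        raise ValueError("GTF attributes must start with gene_id and transcript_id")
    return attrs
```

For example, `format_gtf_attributes("g1", "g1.1", exon_number=1)` produces `gene_id "g1"; transcript_id "g1.1"; exon_number "1";`, and parsing that string recovers the three pairs in order.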

Multiple Alignment Format (MAF)


	The Multiple Alignment Format (MAF) stores a series of DNA-level multiple alignments between entire genomes in a form that is easy to parse and relatively easy to read. Please see the MAF format description and the custom tracks page for more details.
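A MAF file is a sequence of blocks: an "a" line (optionally carrying a score) starts each alignment, one "s" line per sequence follows, and a blank line ends the block. The sketch below reads "a" and "s" lines using the s-line fields from the MAF description (src, start, size, strand, srcSize, text) and checks that size matches the number of non-gap bases in the aligned text. It is a minimal reader under those assumptions; the class and function names are hypothetical, and other line types such as "i", "e", and "q" are skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MafRow:
    src: str       # source sequence, e.g. "assembly.chrom"
    start: int     # zero-based; counted on the reverse complement if strand is "-"
    size: int      # number of non-gap bases in this row
    strand: str
    src_size: int
    text: str      # aligned bases, with "-" marking gaps


@dataclass
class MafBlock:
    score: Optional[float]
    rows: List[MafRow] = field(default_factory=list)


def parse_maf(lines) -> List[MafBlock]:
    """Collect 'a' blocks and their 's' rows; other line types are skipped."""
    blocks: List[MafBlock] = []
    current: Optional[MafBlock] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            current = None          # a blank line ends the current block
            continue
        if line.startswith("#"):
            continue                # "##maf" header and comment lines
        fields = line.split()
        if fields[0] == "a":
            score = None
            for pair in fields[1:]:
                key, _, value = pair.partition("=")
                if key == "score":
                    score = float(value)
            current = MafBlock(score)
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            src, start, size, strand, src_size, text = fields[1:7]
            row = MafRow(src, int(start), int(size), strand, int(src_size), text)
            # The size field must equal the count of non-gap characters.
            if len(text) - text.count("-") != row.size:
                raise ValueError(f"size mismatch for {src}")
            current.rows.append(row)
    return blocks
```

Checking size against the aligned text is a cheap way to catch rows whose coordinates and sequence have drifted apart before the file is tested as a custom track.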

Wiggle Format (WIG)


	The wiggle (WIG) format displays continuous-valued data in a track, which is useful for data such as GC percent, probability scores, and transcriptome data. Please see the WIG format description and the custom tracks page for more details.
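Wiggle data lines are interpreted through the most recent declaration line: after `fixedStep chrom=... start=... step=... [span=...]`, the i-th value covers position start + i * step, while after `variableStep chrom=... [span=...]`, each line gives its own position and value. Positions are 1-based and span defaults to one base. The sketch below expands both forms into explicit (chrom, first, last, value) intervals; the function name is hypothetical, and the BED-style variant of wiggle data is not handled.

```python
from typing import Iterator, Optional, Tuple


def expand_wig(lines) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (chrom, first, last, value) per data line, 1-based and inclusive."""
    mode: Optional[str] = None
    chrom, pos, step, span = "", 0, 0, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments, and custom-track header lines
        if line.startswith(("fixedStep", "variableStep")):
            words = line.split()
            mode = words[0]
            params = dict(word.split("=", 1) for word in words[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))  # span defaults to one base
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params["step"])
            continue
        if mode == "fixedStep":
            # One value per line; each value advances the position by step.
            yield chrom, pos, pos + span - 1, float(line)
            pos += step
        elif mode == "variableStep":
            # Each line carries its own 1-based start position.
            start_text, value_text = line.split()
            start = int(start_text)
            yield chrom, start, start + span - 1, float(value_text)
        else:
            raise ValueError("data line appears before any step declaration")
```

With `fixedStep chrom=chr19 start=49307401 step=300 span=200`, the first value covers chr19:49307401-49307600 and the second covers chr19:49307701-49307900, which shows how step and span together determine the gaps or overlaps between windows.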