YQFC

Compare Quantitative Features Between Two Yeast Gene Lists

Step1. Input Two Gene Lists (L1 vs. L2)

Input First Gene List (L1)
Sample: rESR (585 genes)

Input Second Gene List (L2)
User Input
Sample: iESR (281 genes)
Union of the Chosen Pre-compiled Gene Lists

6604 ORFs

299 tRNA genes

91 transposable elements

77 snoRNA genes

50 LTR retrotransposons

27 rRNA genes

18 ncRNA genes

12 pseudogenes

6 snRNA genes

1 telomerase RNA gene

Step2. Select the Quantitative Feature to be Analyzed

12 Gene Features

Select All

CDS length

5'UTR length

3'UTR length

# of publications

# of GO terms

# of GO slim terms

# of pathways

# of mutant phenotypes

# of mRNA isoforms

# of transcriptional regulators

# of fungal homologs

# of non-fungal and S. cerevisiae homologs

4 mRNA Features

Select All

mRNA level (3 datasets)

mRNA half-life

Transcriptional plasticity

Translational efficiency (5 datasets)

52 Protein Features

Select All

Amino acid composition (20 kinds)

Atomic composition (5 kinds)

Protein abundance (in the normal growth condition, 23 datasets)

Protein abundance (in various stress conditions, 11 kinds)

Extinction coefficient at 280nm (2 kinds)

# of protein domains

# of PTMs

Protein half-life

Protein Physical Details

Protein length

Molecular weight

Isoelectric point

Aliphatic index

Instability index

Coding Region Translation Calculations

Codon bias

Codon adaptation index

Frequency of optimal codons

Hydropathicity of protein

Aromaticity score

17 Network Features

Select All
# of interactors in the

PI network (3 datasets)

GI network

CC network

CX network

DC network

GN network

GT network

PG network

TS network

PIA network

GIA network

TFBA network

TFRA network

EPA network

FAA network

LEA network

MPA network

Step3. Specify P-value Cutoff for Multiple Hypotheses Testing

Bonferroni correction: p-value cutoff = 10^-
FDR (False Discovery Rate): p-value cutoff = 10^-
No correction: p-value cutoff = 10^-

Reset

Submit

Comparison Results

User's Specification

# of genes in L1	input1Gene_length	# of genes in L2	input2Gene_length
Multiple hypotheses testing	p-value cutoff = 10^-
See the testing result of a chosen quantitative feature
Gene Features
mRNA Features
Protein Features
Network Features

Testing Result

Step 1

Users need to input two yeast gene lists to be compared.
Standard names, systematic names, or aliases are all acceptable.
If users have only one input gene list, they can use our pre-compiled gene lists (e.g. 6604 ORF genes, 299 tRNA genes, 27 rRNA genes, etc.) to generate the second input gene list, which is the union of all the selected gene lists.

Step 2

Users need to define the sets of genes (in the yeast genome) whose promoters/coding regions contain specific histone modifications by setting the thresholds.
For example, by setting log₂(H3K9ac/H3)≥1 (meaning the two-fold enrichment over the background) in the promoters, a set of 2129 yeast genes whose promoters contain H3K9ac could be defined.
Then the expected ratio of promoters having H3K9ac in the yeast genome is equal to 0.32 (2129/6572).
Further, by intersecting the input list of N genes with the set of 2129 genes, the number (denoted as M) of input genes whose promoters have H3K9ac can be calculated.
Then the observed ratio of promoters having H3K9ac in the input list of genes is equal to M/N.
Finally, the input list of N genes is said to be enriched with H3K9ac in the promoters if the observed ratio (M/N) is much larger than the expected ratio (2129/6572).
The statistical significance is calculated using hypergeometric testing.
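The enrichment calculation described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name and the example values N = 100 and M = 50 are hypothetical, and the p-value is taken to be the upper tail of the hypergeometric distribution (the probability of seeing M or more marked genes by chance).

```python
from math import comb

def hypergeom_enrichment_pvalue(genome_size, marked_in_genome, list_size, marked_in_list):
    """Upper-tail hypergeometric p-value: P(X >= marked_in_list).

    X is the number of marked genes found when list_size genes are drawn
    without replacement from a genome containing marked_in_genome marked genes.
    """
    upper = min(marked_in_genome, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(marked_in_genome, k) * comb(genome_size - marked_in_genome, list_size - k)
        for k in range(marked_in_list, upper + 1)
    )
    return tail / total

# Values from the text: 2129 of 6572 yeast genes have H3K9ac in their promoters.
expected_ratio = 2129 / 6572

# Hypothetical input list (assumption): N = 100 genes, M = 50 with H3K9ac promoters.
N, M = 100, 50
observed_ratio = M / N
p_value = hypergeom_enrichment_pvalue(6572, 2129, N, M)

print(f"expected ratio = {expected_ratio:.2f}, observed ratio = {observed_ratio:.2f}")
print(f"p-value = {p_value:.3g}")
```

Exact integer arithmetic with `math.comb` avoids an external statistics library; a dedicated statistics package would be the more usual choice for large-scale use.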

H3K14ac [H2O2]: The yeast cells are grown in rich medium with added H2O2.
log2(H2AK5ac / Input): "Input" means the control experiment, which is the ChIP-chip/ChIP-seq experiment without using any anti-histone modification (e.g. anti-H3K79me2) antibody.
MAT score (H3K79me2 / Input): MAT stands for Model-based Analysis of Tiling-arrays, which is an algorithm for reliably detecting enriched regions. The higher the MAT score, the higher the enrichment.

Step 3

Since YQFC tests many quantitative features (i.e. multiple hypotheses testing), users have to select a statistical method (Bonferroni correction or FDR) for multiple hypotheses correction and set the p-value threshold.
Bonferroni correction is more conservative than FDR. That is, Bonferroni correction yields a smaller type I error rate than FDR does, at the cost of lower power.
The p-value threshold determines how statistically significant the difference in a quantitative feature between the two input gene lists must be.
The more stringent the p-value threshold, the higher the statistical significance of the identified distinct quantitative feature.
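The difference between the two corrections can be illustrated with a small sketch. The page does not say which FDR procedure is used, so the Benjamini-Hochberg step-up procedure is assumed here; the function names and the five p-values are hypothetical.

```python
def bonferroni_significant(p_values, alpha):
    """Flag p-values that pass the Bonferroni-adjusted cutoff alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_significant(p_values, alpha):
    """Flag p-values rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every hypothesis ranked at or below that cutoff is rejected.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

# Hypothetical p-values for five tested features, with a cutoff of 10^-2.
p_values = [1e-5, 0.003, 0.005, 0.02, 0.6]
alpha = 10 ** -2

bonf = bonferroni_significant(p_values, alpha)
bh = benjamini_hochberg_significant(p_values, alpha)
print("Bonferroni:", bonf)
print("Benjamini-Hochberg:", bh)
```

With these example values Bonferroni keeps only the smallest p-value, while the assumed FDR procedure keeps three, which is the sense in which Bonferroni is the more conservative choice.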

Introduction of Bonferroni correction
Introduction of FDR

Warning

Input genes contain names with multiple IDs or unknown names.

Please modify your input gene list.

Names with multiple IDs	IDs
INPUT 1
Unknown names

Names with multiple IDs	IDs
INPUT 2
Unknown names

Proof page

PIA (Physical Interaction Association) network

Two genes have a link if the PIA score of this gene pair is within the top 5% of the PIA scores of all gene pairs in the yeast genome.

(From YAGM)
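The "top 5%" rule, and the "# of interactors" feature derived from the resulting network, can be sketched as below. This is an illustration only: the score dictionary, gene names, and function names are hypothetical, and the handling of ties and rounding (the link count is rounded down) is an assumption, not something the page specifies.

```python
from itertools import combinations

def top_fraction_links(pair_scores, fraction=0.05):
    """Return the gene pairs whose score ranks within the top `fraction` of all pairs."""
    ranked = sorted(pair_scores.items(), key=lambda item: item[1], reverse=True)
    n_links = int(len(ranked) * fraction)  # assumption: round down
    return [pair for pair, _ in ranked[:n_links]]

def interactor_counts(links):
    """Count the distinct network neighbors (interactors) of each linked gene."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}

# Toy data (hypothetical): 10 genes give 45 gene pairs, each with a distinct score.
genes = [f"G{i}" for i in range(10)]
pair_scores = {
    (genes[i], genes[j]): i * 10 + j
    for i, j in combinations(range(10), 2)
}

links = top_fraction_links(pair_scores)  # top 5% of 45 pairs -> 2 links
degrees = interactor_counts(links)
print(links)
print(degrees)
```

The same thresholding applies to each of the association networks listed in this section, with only the score changing.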

GIA (Genetic Interaction Association) network

Two genes have a link if the GIA score of this gene pair is within the top 5% of the GIA scores of all gene pairs in the yeast genome.

(From YAGM)

TFBA (Transcription Factor Binding Association) network

Two genes have a link if the TFBA score of this gene pair is within the top 5% of the TFBA scores of all gene pairs in the yeast genome.

(From YAGM)

TFRA (Transcription Factor Regulation Association) network

Two genes have a link if the TFRA score of this gene pair is within the top 5% of the TFRA scores of all gene pairs in the yeast genome.

(From YAGM)

EPA (Expression Profile Association) network

Two genes have a link if the EPA score of this gene pair is within the top 5% of the EPA scores of all gene pairs in the yeast genome.

(From YAGM)

FAA (Functional Annotation Association) network

Two genes have a link if the FAA score of this gene pair is within the top 5% of the FAA scores of all gene pairs in the yeast genome.

(From YAGM)

LEA (Literature Evidence Association) network

Two genes have a link if the LEA score of this gene pair is within the top 5% of the LEA scores of all gene pairs in the yeast genome.

(From YAGM)

MPA (Mutant Phenotype Association) network

Two genes have a link if the MPA score of this gene pair is within the top 5% of the MPA scores of all gene pairs in the yeast genome.

(From YAGM)

PI (Physical Interaction) network

Two genes have a link if they have physical interaction.

(From BioGRID and YeastNet)

The physical interaction data named BioGRID were retrieved from BioGRID.

The physical interaction data named YeastNet_HT and YeastNet_LC were retrieved from YeastNet.

GI (Genetic Interaction) network

Two genes have a link if they have genetic interaction.

(From BioGRID)

CC network

Inferred links by co-citation of two genes across 46,111 PubMed MEDLINE article abstracts for yeast biology

(From YeastNet)

CX network

Inferred links by co-expression pattern of two genes (based on high-dimensional gene expression data)

(From YeastNet)

DC network

Inferred links by co-occurrence of protein domains between two coding genes

(From YeastNet)

GN network

Inferred links by similar genomic context of bacterial orthologs of two yeast genes

(From YeastNet)

GT network

Inferred links by similar profiles of genetic interaction partners

(From YeastNet)

PG network

Inferred links by similar phylogenetic profiles between two yeast genes

(From YeastNet)

TS network

Inferred links by 3-D protein structure of interacting orthologous proteins between two yeast proteins

(From YeastNet)

mRNA level (3 datasets)

The data of mRNA expression level were retrieved from Table S4 of Nagalakshmi (2008)

The data of transcription level and transcriptional frequency were retrieved from Holstege (1998)

Transcriptional plasticity

The capacity for a gene to change its transcriptional level under different conditions

(From Lin 2010)

Translational efficiency

The rate of mRNA translation into proteins within cells

(From WIKIPEDIA)

To know the details of each dataset, please check Csárdi (2015)

Codon bias

Codon Bias Index (CBI) is a measure of directional codon bias; it measures the extent to which a gene uses a subset of optimal codons.

In a gene with extreme codon bias, CBI will equal 1.0; in a gene with random codon usage, CBI will equal 0.0.

Note that it is possible for the number of optimal codons to be less than expected by random chance.

This results in a negative value for CBI.

(From CodonW)
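The behavior described above (1.0, 0.0, and negative values) follows from the usual definition CBI = (N_opt - N_rand) / (N_tot - N_rand), where N_rand is the number of optimal codons expected under random synonymous codon usage. The sketch below assumes that definition. The two-amino-acid codon table and the choice of optimal codons are a toy illustration, not the full table used by the tool.

```python
# Toy table (assumption): two amino acids, each with two synonymous codons.
SYNONYMOUS = {"K": ["AAA", "AAG"], "F": ["TTT", "TTC"]}
OPTIMAL = {"AAG", "TTC"}  # illustrative choice of optimal codons

def codon_bias_index(codons):
    """CBI = (N_opt - N_rand) / (N_tot - N_rand) over codons in the toy table."""
    codon_to_aa = {c: aa for aa, cs in SYNONYMOUS.items() for c in cs}
    counted = [c for c in codons if c in codon_to_aa]
    n_tot = len(counted)
    n_opt = sum(c in OPTIMAL for c in counted)
    # Expected number of optimal codons if each synonymous codon were equally likely.
    n_rand = 0.0
    for codon in counted:
        family = SYNONYMOUS[codon_to_aa[codon]]
        n_rand += sum(c in OPTIMAL for c in family) / len(family)
    if n_tot == n_rand:
        raise ValueError("CBI is undefined for this codon set")
    return (n_opt - n_rand) / (n_tot - n_rand)

cbi_extreme = codon_bias_index(["AAG", "TTC", "AAG", "TTC"])  # only optimal codons
cbi_random = codon_bias_index(["AAA", "AAG", "TTT", "TTC"])   # even usage
cbi_negative = codon_bias_index(["AAA", "TTT"])               # no optimal codons
print(cbi_extreme, cbi_random, cbi_negative)
```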

Codon adaptation index

The Codon Adaptation Index (CAI) is the most widespread technique for analyzing codon usage bias.

CAI measures the deviation of a given protein coding gene sequence with respect to a reference set of genes.

CAI is used as a quantitative method of predicting the level of expression of a gene based on its codon sequence.

(From WIKIPEDIA)
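As a rough illustration, CAI is commonly computed as the geometric mean of per-codon "relative adaptiveness" weights, where each weight is a codon's count in the reference gene set divided by the count of the most used synonymous codon. The sketch below assumes that definition; the toy codon table and the reference counts are hypothetical.

```python
from math import exp, log

# Toy table (assumption): two amino acids, each with two synonymous codons.
SYNONYMOUS = {"K": ["AAA", "AAG"], "F": ["TTT", "TTC"]}

def relative_adaptiveness(reference_counts):
    """Weight of each codon: its count divided by the top count in its synonymous family."""
    weights = {}
    for family in SYNONYMOUS.values():
        max_count = max(reference_counts[c] for c in family)
        for c in family:
            weights[c] = reference_counts[c] / max_count
    return weights

def codon_adaptation_index(codons, weights):
    """Geometric mean of the weights of the codons that appear in the weight table."""
    logs = [log(weights[c]) for c in codons if c in weights]
    return exp(sum(logs) / len(logs))

# Hypothetical codon counts in a reference set of highly expressed genes.
reference_counts = {"AAA": 20, "AAG": 80, "TTT": 25, "TTC": 75}
weights = relative_adaptiveness(reference_counts)

cai_optimal = codon_adaptation_index(["AAG", "TTC", "AAG"], weights)
cai_mixed = codon_adaptation_index(["AAA", "TTC"], weights)
print(f"{cai_optimal:.2f} {cai_mixed:.2f}")
```

A codon with a zero reference count would make the logarithm undefined; real implementations typically substitute a small pseudo-count, which this sketch omits.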

Frequency of optimal codons

This index is the ratio of optimal codons to synonymous codons (genetic code dependent).

(From CodonW)
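A minimal sketch of this ratio, assuming that codons for amino acids without synonymous alternatives (such as ATG for Met) are excluded from the count. The toy codon table and the set of optimal codons are hypothetical.

```python
# Toy table (assumption): codons that have synonymous alternatives, and which are optimal.
SYNONYMOUS_CODONS = {"AAA", "AAG", "TTT", "TTC"}
OPTIMAL = {"AAG", "TTC"}

def frequency_of_optimal_codons(codons):
    """Fraction of the counted synonymous codons that are optimal."""
    counted = [c for c in codons if c in SYNONYMOUS_CODONS]
    return sum(c in OPTIMAL for c in counted) / len(counted)

# ATG is ignored because it has no synonymous codon in this toy table.
fop = frequency_of_optimal_codons(["ATG", "AAG", "AAA", "TTC", "TTC"])
print(fop)
```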

Hydropathicity of protein

Hydrophobicity scales are values that define the relative hydrophobicity or hydrophilicity of amino acid residues.

The more positive the value, the more hydrophobic are the amino acids located in that region of the protein.

These scales are commonly used to predict the transmembrane alpha-helices of membrane proteins.

When consecutively measuring amino acids of a protein, changes in value indicate attraction of specific protein regions towards the hydrophobic region inside lipid bilayer.

(From WIKIPEDIA)
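One common way to turn a hydrophobicity scale into a single protein score is the grand average of hydropathy (GRAVY): the mean scale value over all residues. The sketch below assumes the Kyte-Doolittle scale, which the page does not name explicitly, and adds a sliding-window profile to illustrate how values change along a sequence. The example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; more positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(protein):
    """Mean hydropathy value over all residues of the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)

def windowed_hydropathy(protein, window=3):
    """Mean hydropathy of each consecutive window along the sequence."""
    return [
        gravy(protein[i:i + window])
        for i in range(len(protein) - window + 1)
    ]

hydrophobic = gravy("AILV")   # aliphatic residues give a positive score
hydrophilic = gravy("KDER")   # charged residues give a negative score
profile = windowed_hydropathy("KKAILV", window=3)
print(round(hydrophobic, 3), round(hydrophilic, 3))
print([round(v, 2) for v in profile])
```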

Aromaticity score

The frequency of aromatic amino acids (Phe, Tyr, Trp) in the hypothetical translated gene product.

The hydropathicity and aromaticity protein scores are indices of amino acid usage.

(From CodonW)
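The aromaticity score as defined above can be computed directly from a protein sequence. The function name and example sequence below are hypothetical.

```python
def aromaticity(protein):
    """Fraction of residues that are aromatic (Phe = F, Tyr = Y, Trp = W)."""
    return sum(protein.count(aa) for aa in "FYW") / len(protein)

score = aromaticity("MFYWAK")  # 3 aromatic residues out of 6
print(score)
```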

Protein abundance (in the normal growth condition, 23 datasets)

These 23 datasets were retrieved from Table S4 of Ho (2018)

Protein abundance (in various stress conditions, 11 kinds)

The protein abundance data in 11 kinds of stress conditions were retrieved from Table S8 of Ho (2018)

Extinction coefficient at 280nm

The extinction coefficient at 280nm indicates how much light a protein absorbs at the wavelength of 280nm.

It is useful to have an estimate of this coefficient for following a protein with a spectrophotometer when purifying it.

Two values are provided, both for proteins measured in water at 280 nm.

The first one shows the computed value based on the assumption that all cysteine residues appear as half cystines (i.e. all pairs of Cys residues form cystines), and the second one assuming that no cysteine appears as half cystine (i.e. assuming all Cys residues are reduced).

Note: Cystine is the amino acid formed when a pair of cysteine molecules is joined by a disulfide bond.

(From ExPASy)
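The two values can be sketched as a sum of per-residue contributions. The molar extinction values used below (Tyr 1490, Trp 5500, cystine 125, in M^-1 cm^-1) are the ones commonly documented for the ExPASy ProtParam tool and are an assumption here, since the page does not list them. The function name and example sequence are hypothetical.

```python
# Assumed molar extinction values at 280 nm in water (M^-1 cm^-1).
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125

def extinction_coefficients(protein):
    """Return (all Cys paired as cystines, all Cys reduced) extinction coefficients."""
    base = protein.count("Y") * EXT_TYR + protein.count("W") * EXT_TRP
    n_cystines = protein.count("C") // 2  # each cystine uses two Cys residues
    return base + n_cystines * EXT_CYSTINE, base

# Hypothetical sequence with 1 Trp, 2 Tyr, and 2 Cys.
with_cystines, reduced = extinction_coefficients("MWYYCCA")
print(with_cystines, reduced)
```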

Isoelectric point

The isoelectric point (pI, pH(I), IEP) is the pH at which a molecule carries no net electrical charge or is electrically neutral in the statistical mean

(From WIKIPEDIA)

Aliphatic index

The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine).

It may be regarded as a positive factor for the increase of thermostability of globular proteins.

(From ExPASy)
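A minimal sketch of the aliphatic index, assuming the commonly cited formula AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent of each residue. The coefficients are not stated on this page and are taken as an assumption; the example sequence is hypothetical.

```python
def aliphatic_index(protein):
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    n = len(protein)
    mole_percent = {aa: 100.0 * protein.count(aa) / n for aa in "AVIL"}
    # Assumed coefficients: 2.9 for Val, 3.9 for Ile and Leu (relative side-chain volumes).
    return (
        mole_percent["A"]
        + 2.9 * mole_percent["V"]
        + 3.9 * (mole_percent["I"] + mole_percent["L"])
    )

# Hypothetical 10-residue sequence: 20% Ala, 10% Val, 10% Ile, 10% Leu.
index = aliphatic_index("AAVILGGSTK")
print(round(index, 1))
```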

Instability index

The instability index provides an estimate of the stability of a protein in a test tube.

(From ExPASy)