INSTRUCTIONS

TRANSCRIPTION FACTOR ENRICHMENT

Description:
TFBSenrich calculates transcription factor binding site (TFBS) enrichment for a user-provided gene list.

Usage:
TFBSenrich(user.file = system.file("data/Sample_userfile", package = "RegFacEnc"),
           TF.db = system.file("data/JASPAR_CLOVER", package = "RegFacEnc"),
           TF.nome = system.file("data/JASPAR_NOMENCLATURE_TABLE", package = "RegFacEnc"),
           db.seq = system.file("data/UP1000_Protien_Coding_HUMANS_unique.fasta", package = "RegFacEnc"),
           cpg.seq = system.file("data/HUMAN_CpG.fa", package = "RegFacEnc"),
           chro.seq = system.file("data/HUMAN_chr20.fa", package = "RegFacEnc"),
           option = "-t", pval = "0.05", species = "Human",
           TF_motifs = "JASPAR", BF_Type = "Protein Coding")

Arguments:
user.file: Specifies the name of the gene list file. The file must be in comma-separated values (CSV) format. Example gene list:

Sno,	Genename,	EnsemblID
1,	ELK3,	ENSG00000111145
2,	RAB10,	ENSG00000084733
3,	ELK4,	ENSG00000158711
4,	SLC31A2,	ENSG00000136867
5,	SLC31A1,	ENSG00000136868

NOTE: The column headers in the user file must have exactly the names and order shown above. The file must not contain NA values, duplicate EnsemblIDs, or duplicate Genenames.
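
The requirements in the NOTE above can be checked before running an analysis. The following is a minimal sketch in base R, not part of RegFacEnc: the function name check_gene_list is hypothetical, and it assumes that stray spaces or tabs after the commas should be ignored.

```r
# Hypothetical helper (not part of RegFacEnc): returns a character vector of
# problems found in a gene list file; an empty vector means the file passed.
check_gene_list <- function(path) {
  genes <- read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)

  # Trim stray spaces or tabs around headers and character values.
  names(genes) <- trimws(names(genes))
  genes[] <- lapply(genes, function(col) if (is.character(col)) trimws(col) else col)

  # Headers must match the documented names and order exactly.
  expected <- c("Sno", "Genename", "EnsemblID")
  if (!identical(names(genes), expected)) {
    return("column headers must be exactly: Sno, Genename, EnsemblID")
  }

  problems <- character(0)
  if (any(is.na(genes)) || any(genes == "", na.rm = TRUE)) {
    problems <- c(problems, "file contains NA or empty values")
  }
  if (anyDuplicated(genes$EnsemblID) > 0) {
    problems <- c(problems, "duplicate EnsemblID entries")
  }
  if (anyDuplicated(genes$Genename) > 0) {
    problems <- c(problems, "duplicate Genename entries")
  }
  problems
}

# Usage: write a small gene list to a temporary file and check it.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("Sno,Genename,EnsemblID",
             "1,ELK3,ENSG00000111145",
             "2,RAB10,ENSG00000084733"), demo_file)
check_gene_list(demo_file)  # character(0) when no problems are found
```

Returning the list of problems, rather than stopping at the first one, lets all issues in a file be fixed in one pass.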

TF.db: The position weight matrix (PWM) file, formatted for the Clover algorithm using the Perl script provided on the Clover home page. A custom PWM file can also be passed through this parameter, provided it is Clover-compatible.

TF.nome: Specifies the name or path of the nomenclature file. Each PWM file has a corresponding nomenclature file, which holds the Genenames and MotifIds of the transcription factors present in the PWM file.

db.seq: Specifies the background sequence file. Users can supply their own background file by giving its full path, including the file name, e.g. db.seq="/home/data/UP_1000_FULL_GENOME_GENES_RAT.fasta.txt". The sequence headers in the background file must be Ensembl gene IDs.
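
A custom background file can be screened for headers that are not Ensembl gene IDs. The following is a minimal sketch in base R, not part of RegFacEnc: the function name bad_fasta_headers is hypothetical, and the ID pattern (ENS, up to three species letters, G, 11 digits, optional version suffix) is an assumption that covers human, mouse, and rat IDs such as ENSG00000111145.

```r
# Hypothetical helper (not part of RegFacEnc): returns the FASTA headers
# whose first token does not look like an Ensembl gene ID.
bad_fasta_headers <- function(path) {
  lines <- readLines(path, warn = FALSE)

  # Header lines start with ">"; drop that marker.
  headers <- sub("^>", "", lines[startsWith(lines, ">")])

  # Keep only the first whitespace-delimited token of each header.
  ids <- sub("[[:space:]].*$", "", headers)

  # Assumed ID shape, e.g. ENSG..., ENSMUSG..., ENSRNOG...
  pattern <- "^ENS[A-Z]{0,3}G[0-9]{11}(\\.[0-9]+)?$"
  headers[!grepl(pattern, ids)]
}

# Usage: a small FASTA file with one non-conforming header.
demo_fasta <- tempfile(fileext = ".fa")
writeLines(c(">ENSG00000111145", "ACGTACGT",
             ">ELK3", "ACGTACGT"), demo_fasta)
bad_fasta_headers(demo_fasta)  # "ELK3"
```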

cpg.seq: Specifies the second background file. We use the FASTA sequences of the CpG islands of the respective species.

chro.seq: Specifies the third background file. We use the FASTA sequence of the smallest chromosomes (with appropriate GC content) of the respective species.

pval: Specifies the p-value threshold. The default is 0.05.

species: Specifies the species for which the analysis is run. Accepted values: "Human", "Mouse", or "Rat".

TF_motifs: Selects the motif library of interest. Accepted values: "JASPAR", "HOCOMOCO", or "TRANSFAC".

BF_Type: Selects the background file of interest. Accepted values: "Protein Coding" or "Whole Genome". "Protein Coding" uses the 1000 bp upstream of the TSS of protein-coding genes only (for that species); "Whole Genome" uses the 1000 bp upstream regions of protein-coding and lincRNA genes.

Example:

In the following example, the user provides a specific set of background files.

TFBSenrich (user.file="DRG_DARKRED_Ids.csv", TF_motifs="TRANSFAC", db.seq="/home/Sazariah/data/UP_1000_FULL_GENOME_GENES_RAT.fasta.txt", cpg.seq="/home/Sazariah/data/RAT_CpG.fa.txt", chro.seq="/home/Sazariah/data/RAT_chr20.fa.txt", species = "Rat")

miRNA SEED SEQUENCE ENRICHMENT

Description:
miRNAenrich calculates microRNA seed sequence enrichment for a user-provided gene list.

Usage:
miRNAenrich(user.file = system.file("data/Sample_userfile", package = "RegFacEnc"),
            PWM.db = system.file("data/PWM_miRNA_Humans_FINAL", package = "RegFacEnc"),
            miRNA_DB = system.file("data/miRNA_seed_HUMANs_NAMES", package = "RegFacEnc"),
            miR.seq = system.file("data/3pUTR_Human_Genes_sequences_unique.fasta", package = "RegFacEnc"),
            cpg.seq = system.file("data/HUMAN_CpG.fa", package = "RegFacEnc"),
            chro.seq = system.file("data/HUMAN_chr20.fa", package = "RegFacEnc"),
            option = "-t", pval = "0.05", species = "Human")

Arguments:
user.file: Specifies the name of the gene list file. The file must be in comma-separated values (CSV) format. Example gene list:

Sno,	Genename,	EnsemblID
1,	ELK3,	ENSG00000111145
2,	RAB10,	ENSG00000084733
3,	ELK4,	ENSG00000158711
4,	SLC31A2,	ENSG00000136867
5,	SLC31A1,	ENSG00000136868

PWM.db: Specifies the file containing the species-specific miRNA seed sequences.

miRNA_DB: Specifies the name of the nomenclature file. Each species-specific miRNA seed file has a corresponding nomenclature file, which holds the miRNA_NAME and site of each seed.

miR.seq: Specifies the background 3' UTR FASTA sequence file. The default file contains the 3' UTR regions of all transcripts in the genome. Users can supply their own background file by giving its full path, including the file name, e.g. miR.seq="/home/data/UP_1000_FULL_GENOME_GENES_RAT.fasta.txt". The sequence headers in the background file must be Ensembl gene IDs.