riboraptor
library is a Python library for analysis of Ribo-seq data.
It assumes the reads have already been aligned to a reference and are available as BAM/SAM for input.
Current capabilities of riboraptor
include:
Visualization: |
|
---|---|
Utilities: |
|
$ Usage: riboraptor [OPTIONS] COMMAND [ARGS]...
riboraptor: Tool for ribosome profiling analysis
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
bam-to-bedgraph Convert bam to bedgraph
bedgraph-to-bigwig Convert bedgraph to bigwig
export-bed-fasta Export gene level fasta from specified bed
export-gene-coverages Export gene level coverage for all genes for given region
export-metagene-coverage Export metagene coverage for given region
export-read-length Calculate read length distribution
periodicity Calculate periodicity
plot-metagene Plot metagene profile
plot-read-length Plot read length distribution
uniq-bam Create a new bam with unique mapping reads only
uniq-mapping-count Count number of unique mapping reads
Contents:
To install riboraptor, run this command in your terminal:
$ pip install riboraptor
This is the preferred method to install riboraptor, as it will always install the most recent stable release.
If you don’t have pip installed, this Python installation guide can guide you through the process.
The sources for riboraptor can be downloaded from the Github repo.
You can either clone the public repository:
$ git clone git://github.com/saketkc/riboraptor
Or download the tarball:
$ curl -OL https://github.com/saketkc/riboraptor/tarball/master
Once you have a copy of the source, you can install it with:
$ python setup.py install
We will be working with the first published Ribo-seq dataset GSE13750 from Ingolia et al. (2009) which has samples for both mRNA-seq and Ribo-seq from Yeast grown in starved and nutrient rich media.
At this point, we assume you have already completed all the steps under “Installing dependencies” section of the README.
We will download all SRA files corresponding to GSE13750.
cd riboraptor
download_sra_data --sradb=../riboraptor-data/SRAmetadb.sqlite \
--geodb=../riboraptor-data/GEOmetadb.sqlite GSE13750
GEO IDs are automatatiicaly converted to corresponding SRP IDs. GSE13750 corresponds to SRP000637.
There are 6 experiments in total (SRX003184-SRX003191), but we will be working with only two: SRX003187`and `SRX003191 one of which is mRNA-seq while other is Ribo-seq. (We will figure out which is which later.) You can delete are SRX directories except the above two. We will be using sacCerR64 as our reference.
We will now use Snakemake to run all the downstream steps. Here is what the overall workflow looks like:
cd snakemake
cp configs/SRP000637.py.sample configs/SRP000637.py
Edit the paths inside SRP000637.py to point to your RAW data, GTF and BED files. BED files for a lot of assemblies are inbuilt into riboraptor. In most cases, those should suffice. However, you stil need to provide paths to fasta, chromosome sizes and the GTF. The BED files are created from a particular version of the GTF, so if you are using your own GTF, you should ideally be using your own BED files too.
An example of a config would be:
## Path to SRP directory
RAWDATA_DIR = '/staging/as/skchoudh/SRA_datasets/SRP000637'
## Output directory (will be created if does not exist)
OUT_DIR = '/staging/as/skchoudh/riboraptor-analysis/SRP000637'
## Genome fasta location
GENOME_FASTA = '/home/cmb-06/as/skchoudh/genomes/sacCerR64/fasta/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa'
## Chromosome sizes location
CHROM_SIZES = '/home/cmb-06/as/skchoudh/genomes/sacCerR64/fasta/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.sizes'
## Path to STAR index (will be generated if does not exist)
STAR_INDEX = '/home/cmb-06/as/skchoudh/genomes/sacCerR64/star_annotated'
## GTF path
GTF = '/home/cmb-06/as/skchoudh/genomes/sacCerR64/annotation/Saccharomyces_cerevisiae.R64-1-1.91.gtf'
## Path to bed file containing Intron coordinates
INTRON_BED = '/home/cmb-panasas2/skchoudh/github_projects/riboraptor/riboraptor/annotation/sacCerR64/intron.bed'
## Path to bed file containing CDS coordinates
CDS_BED = '/home/cmb-panasas2/skchoudh/github_projects/riboraptor/riboraptor/annotation/sacCerR64/cds.bed'
## Path to bed file containing 5'UTR coordinates
UTR5_BED = '/home/cmb-panasas2/skchoudh/github_projects/riboraptor/riboraptor/annotation/sacCerR64/utr5.bed'
## Path to bed file containing 3'UTR coordinates
UTR3_BED = '/home/cmb-panasas2/skchoudh/github_projects/riboraptor/riboraptor/annotation/sacCerR64/utr3.bed'
Edit Line4 snakemake/jobscript.sh pointing to your conda root directory.
An example path would be:
export PATH="/home/cmb-panasas2/wenzhenl/miniconda3/bin:$PATH"
Edit Line6 snakemake/cluster.yaml and Line7 snakemake/cluster.yaml to point to your log directory error log file.
An example path would be:
logout: '/home/cmb-06/as/skchoudh/logs/{rule}.{wildcards}.out'
logerror: '/home/cmb-06/as/skchoudh/logs/{rule}.{wildcards}.err'
You would want to just edit the directory path leading to /home/cmb-06/as/skchoudh/logs/ and leave the rest as it is.
bash submitall.sh SRP000637
The submitall.sh looks for a file named SRP000637.py in the configs directory, so make sure SRP000637.py exists inside configs/ directory.
When the entire pipeline as run, it will create an html file riboraptor_report.html as output. You can copy it locally to visualize metagene profiles and read length distribution for all samples.
roboraptor is a comprehensive pipeline and set of tools for analyzing Ribo-seq data. This manual explains the stages in our pipeline, how to use the analysis tools and how to modify the pipeline for your specific context.
Our pipeline was designed to run in a cluster computing context, with many processing nodes available, and a job submission system like PBS or SGE. Much of this analysis is computationally intensive. We assume that individual nodes will have several GB of memory available for processing.
Ribo-seq experiments are always single-end sequenced. Ribosome protected fragments range from 28-32 nucleotides and hence most experiments involve 50 bp single end reads. Before mapping, we need to get rid of the adapters ligated at the 3’ end of the fragments as part of the library protocol.
We use trim_galore for trimming. It automates adapter trimming:
$ trim_galore -o <out_dir> -q <min_quality> <input.fq.gz>
-o out_dir | Output directory |
-q min_quality | Trim low-quality ends from reads in addition to adapter removal |
We use STAR to map reads, though other splice-aware aligners can also be used. The first step is to create an index, preferably using a GTF file. If the index step is run without a GTF file (which is optional), STAR will not be splice-aware.
$ STAR --runThreadN <threads>\
--runMode genomeGenerate\
--genomeDir <index_out_dir>\
--genomeSAindexNbases <SA_INDEX_Nbases>\
--genomeFastaFiles <input.fasta>\
--sjdbGTFfile <input.gtf>
--runThreadN | Number of threads to use |
--runMode | Flag to set for index mode |
--genomeDir | Directory to write index files to |
--genomeSAindexNbases | |
min(14, log2(GenomeLength)/2 - 1), this must be scaled down for small genomes | |
--genomeFastaFiles | |
Path to reference fasta | |
--sjdbGTFfile | Path to GTF file |
Often Ribo-seq experiments have a matching RNA-seq library. Ideally, the RNA-seq library should be as similar to Ribo-seq library and hence will often be single-ended. We recommend both RNA-seq and Ribo-seq samples be mapped using the following parameters:
$ STAR --runThreadN <threads>\
--genomeDir <input.index>\
--outFilterMismatchNmax 2\
--alignIntronMin <ALIGN_INTRON_Nmin>\
--alignIntronMax <ALIGN_INTRON_Nmax>\
--outFileNamePrefix <params.prefix> --readFilesIn <input.R1>\
--outSAMtype BAM Unsorted\
--readFilesCommand zcat\
--quantMode TranscriptomeSAM\
--outTmpDir /tmp/<params.name>_tmp\
--outReadsUnmapped Fastx\
--runThreadN | Number of threads to use |
--genomeDir | Path to index directory |
--outFilterMismatchNmax | |
Allow a maximum of mismatches=2 | |
--alignIntronMin | |
Minimum intron size. Any genomic gap is considered intron if its length >= alignIntronMin. (Default = 20) | |
--alignIntronMax | |
Maximum intron size (Default = 1000000) | |
--outFileNamePrefix | |
Prefix for output files | |
--readFilesIn | Path to input fastq.gz |
--outSAMtype | Output an unsorted BAM file (outtype=BAM Unsorted) |
--readFilesCommand | |
cat/zcat depending on input is fq/fq.gz | |
--quantMode | Also output BAM aligned to the transcriptome |
--outTmpDir | Directory to use for writing temporary files |
--outReadsUnmapped | |
Write unmapped reads to separate fastq file |
STAR outputted BAM files are not sorted. We need a BAM file sorted by coordinates.
$ samtools sort <prefix>Aligned.out.bam -o <output.bam> -T <tmpdir>_sort &&\
$ samtools index <prefix>Aligned.out.bam
Additionaly, we also need BAM file sorted by name, since htseq-counts (and featureCounts) prefer a BAM sorted by name in their default mode.
$ samtools sort -on <input.bam> -T <tmpdir> -o <output.bam> &&\
$ samtools index <output.bam>
Once we have the bams, we are ready for downstream analysis. The downstream step often involves a number of steps. The following list summarises these steps along with their recommended values (wherever applicable):
The first step is to simply caculate the number of uniquely mapped reads. We recommend a minimum of 5 million reads for any downstream analysis. TODO: list different recommendation for different species
$ riboraptor uniq-mapping-count --bam <input.bam>
–bam input.bam Path to bam file
An ideal Ribo-seq library is expected to have 28-32 nt long fragments most enriched. We can calculate enrichment and plot the fragment size distribution using riboraptor.
Readd length distribution can be calculated using the read-length-dist subcommand:
$ riboraptor read-length-dist --bam <input.bam>
This will print out the read length and associated counts on the console. In order to visualize thhese counts as a barplot, we can use the plot-read-dist subcommand:
$ riboraptor read-length-dist --bam <input.bam>\
| riboraptor plot-read-dist --saveto <output.png>
A metagene plot is used as a summary statistic to visualize the distribution of ribosome protected fragments along the positions of a gene often starting (ending) at the start (stop) codon. This is useful for estimating P-site offsets. The ribosome subunuits are known to protect 28-32 nt and hence the P-site is often located 12 nt downstream the 5’ position of the mapped read.
To perform metagene analysis, we will work with bigWig format. in order to do that, we need an intermediate bedGraph file. This can be done using bam-to-bedgraph subcommand:
$ riboraptor bam-to-bedgraph --bam <input.bam>
This will print the bedGraph to console. this cna be piped to bedgraph-to-bigwig subcommand:
$ riboraptor bam-to-bedgraph --bam <input.bam> \
| riboraptor bedgraph-to-bigwig --sizes <genome.sizes> --saveto <output.bw>
We now have <output.bw> ready for further downstream analysis.
TODO (See Example)
TODO (See Example)
We will use two samples from GSE94454 , one RNA-seq sample (SRR5227310) and one Ribo-seq sample (SRR5227306) as examples for examples that follow.
$ riboraptor uniq-mapping-count --bam data/SRR5227310.bam
28637667
This is a pretty deep library.
$ riboraptor read-length-dist --bam data/SRR5227310.bam\
| riboraptor plot-read-dist --saveto SRR5227310.png
How enriched is it in 27-32 nt fragment range?
$ riboraptor read-length-dist --bam data/SRR5227310.bam\
| riboraptor read-enrichment
(Enrichment: 1.52768004237, pval: 0.458943823895)
So the fragment length distribution doesn’t seem to be enriched. We next perform metagene analysis. Ribo-seq data is expected to have an inherent periodicity of 3, since ribosomes move one codon at a time during active translation.
$ riboraptor bedgraph-to-bigwig -bg data/SRR5227310.bg -s hg38 -o data/SRR5227310.bw
$ riboraptor metagene-coverage -bw data/SRR5227310.bw \
--region_bed hg38_cds --max-positions 500 \
--prefix data/SRR5227310.metagene --offset 60 --ignore_tx_version
$ riboraptor plot-read-counts \
--counts data/SRR5227310.metagene_metagene_normalized.pickle\
--saveto data/SRR5227310.metagene.png
Since metagene gives a summary statistic, we can also look at the abolute counts distribution per frame:
$ riboraptor plot-framewise-counts --counts data/SRR5227310.metagene_metagene_raw.pickle\
--saveto data/SRR5227310.framewise.png
Let’s try another sample: SRR5227306 and compare it with SRR5227310 with respect to distribution of reads.
$ riboraptor uniq-mapping-count --bam data/SRR5227306.bam
10658208
$ riboraptor read-length-dist --bam data/SRR5227306.bam | riboraptor plot-read-dist --saveto SRR5227306.png
$ riboraptor read-length-dist --bam data/SRR5227306.bam | riboraptor read-enrichment
(Enrichment: 14.0292145986, pval: 0.135220082438)
As compared to SRR5227310, the enrichment in this case is almost 10 times higher.
$ riboraptor plot-framewise-counts --counts data/SRR5227306.metagene_metagene_raw.pickle\
--saveto data/SRR5227306.framewise.png
We can see the framewise distribution of reads in SRR5227310 is more or less uniform, but not so in SRR5227306.
$ riboraptor bedgraph-to-bigwig -bg data/SRR5227306.bg -s hg38 -o data/SRR5227306.bw
$ riboraptor metagene-coverage -bw data/SRR5227306.bw \
--region_bed hg38_cds --max-positions 500 \
--prefix data/SRR5227306.metagene --offset 60 --ignore_tx_version
$ riboraptor plot-read-counts \
--counts data/SRR5227306.metagene_metagene_normalized.pickle\
--saveto data/SRR5227306.metagene.png
The metagene of a Ribo-seq sample will show periodicity as in the case of SRR5227306 sample. On the other hand a RNA-seq sample like SRR5227310 will tend to have a flat profile.
TODO
from riboraptor import read_length_distribution
from riboraptor import fragment_enrichment
from riboraptor.plotting import plot_read_counts
from riboraptor.plotting import plot_fragment_dist
ribo_bam_f = '../data/U251_ribo.bam'
fragment_lengths = read_length_distribution(ribo_bam_f)
ax, fig = plot_fragment_dist(fragment_lengths)
The abosve is a good example of an ideal Ribo-seq library where the fragment lengths are concentrated between 28 and 31 nucelotides.
We can also “quantify” this enrichment:
ribo_enrichment = fragment_enrichment(ribo_bam_f)
print(ribo_enrichment)
>>> (75.19322097821424, 0.045749574004642399)
Let’s compare this with a RNA-Seq sample from the same sample.
rna_bam_f = '../data/U251_rna.bam'
fragment_lengths = read_length_distribution(rna_bam_f)
ax, fig = plot_fragment_dist(fragment_lengths)
rna_enrichment = fragment_enrichment(rnabam_f)
print(rna_enrichment)
>>> (2.334333862251336e-05, 1.0)
In order to visualize genewise coverage, we need to create a bigwig file first. This inturn requires a bedgraph file.
from riboraptor import create_bedgraph
create_bedgraph(ribo_bam_f, strand='both', end_type='5prime', outfile='../data/U251_ribo.bg')
from riboraptor import bedgraph_to_bigwig
bedgraph_f = '../data/U251_ribo.bg'
chrom_sizes = '../data/hg38.sizes'
bedgraph_to_bigwig(bedgraph_f, chrom_sizes, '../data/U251_ribo.bw')
from riboraptor import gene_coverage
cds_bed = '../data/hg38.cds.bed'
bw = '../data/U251_ribo.bw'
coverage, _, _, _ = gene_coverage('ENSG00000080824', cds_bed, bw, 60)
The last argument 60 here specifies the number of upstream bases to count. We visualize only the first 100 bases:
ax, fig, peak = plot_read_counts(coverage[range(-60,100)],
majorticks=10,
minorticks=5,
marker='o',
millify_labels=False)
TODO
TODO
A score to determine if protein translation is complete. Defined as the ratio between reads in coding region to reads in the 3’UTR region normalized by the corresponding ratio in mRNA data allowing for incorrect 3’UTR annotations (CDS to 3’UTR ratio in mRNA is not 1 and is reflective of different protein producing potential)
where $F_i$ represents number of reads in frame $i$ and $bar{F}$ represents $mean(F1,F2,F3)$
where the $f_{ref}$ is contructed by counting the number of fragments of a particular read length over only annotated protein-coding genes. Cutiff is determined by identifying outliers using Tukey’s method.
The FLOSS cutoff, calculated as a function of the total number of reads in the transcript histogram, was established by considering a rolling window of individual annotated genes and the computing the upper extreme outlier cutoff for each window using Tukey’s method (Q3 + 3*IQR, where Q3 is the 3rd quartile and IQR is the interquartile range).
riboraptor.coherence.
get_periodicity
(values, input_is_stream=False)[source]¶Calculate periodicty wrt 1-0-0 signal.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.coherence.
naive_periodicity
(values, identify_peak=False)[source]¶Calculate periodicity in a naive manner
Take ratio of frame1 over avg(frame2+frame3) counts. By default the first value is treated as the first frame as well
Parameters: |
|
---|---|
Returns: |
|
Utilities for read counting operations.
riboraptor.count.
bam_to_bedgraph
(bam, strand=u'both', end_type=u'5prime', saveto=None)[source]¶Create bigwig from bam.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
bedgraph_to_bigwig
(bedgraph, sizes, saveto, input_is_stream=False)[source]¶Convert bedgraph to bigwig.
Parameters: |
|
---|
riboraptor.count.
collapse_gene_coverage_to_metagene
(gene_coverages, target_length, outfile=None)[source]¶Collapse gene coverages to specific target length.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
count_feature_genewise
(feature_bed, bam, force_strandedness=False, use_multiprocessing=False)[source]¶Count features genewise.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
count_reads_bed
(bam, region_bed_f, saveto)[source]¶Count number of reads following in each region.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
count_reads_in_features
(feature_bed, bam, force_strandedness=False, use_multiprocessing=False)[source]¶Count reads overlapping features.
Parameters: |
|
---|
riboraptor.count.
count_reads_per_gene
(bw, bed, prefix=None, n_cores=16, collapse_intervals=True)[source]¶Count number of reads following in each region.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
count_utr5_utr3_cds
(bam, utr5_bed=None, cds_bed=None, utr3_bed=None, genome=None, force_strandedness=False, genewise=False, saveto=None, use_multiprocessing=False)[source]¶One shot counts over UTR5/UTR3/CDS.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
diff_region_enrichment
(numerator, denominator, prefix)[source]¶Calculate enrichment of counts of one region over another.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
export_gene_coverages
(bigwig, region_bed_f, saveto, offset_5p=60, offset_3p=0, ignore_tx_version=True)[source]¶Export all gene coverages.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
export_metagene_coverage
(bigwig, region_bed_f, max_positions=None, saveto=None, offset_5p=60, offset_3p=0, ignore_tx_version=True)[source]¶Calculate metagene coverage.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
extract_uniq_mapping_reads
(inbam, outbam)[source]¶Extract only uniquely mapping reads from a bam.
Parameters: |
|
---|
riboraptor.count.
gene_coverage
(gene_name, bed, bw, gene_group=None, offset_5p=0, offset_3p=0, collapse_intervals=True)[source]¶Get gene coverage.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
gene_coverage_sum
(gene_name, bed, bw, collapse_intervals=True)[source]¶Keep track of only the sum
Parameters: |
|
---|
riboraptor.count.
get_fasta_sequence
(fasta, intervals)[source]¶Extract fasta sequence given a list of intervals.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
get_region_sizes
(bed)[source]¶Get collapsed lengths of gene in bed.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
htseq_to_cpm
(htseq_f, saveto=None)[source]¶Convert HTSeq counts to CPM.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
htseq_to_tpm
(htseq_f, cds_bed_f, saveto=None)[source]¶Convert HTSeq counts to TPM.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
interval_coverage
(bw, intervals)[source]¶Get coverage at custom intervals
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
mapping_reads_summary
(bam, prefix)[source]¶Count number of mapped reads.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
pickle_bed_file
(bed, collapse_intervals=True)[source]¶Create a lookup pickle file for genewise CDS/UTR coordinates.
In order to prevent recalculating the coordinates that should be fetched for each genes’ CDS or UTR regions, they can be stored in a pickle file.
Parameters: |
|
---|
riboraptor.count.
read_enrichment
(read_lengths, enrichment_range=[28, 29, 30, 31, 32], input_is_stream=False, input_is_file=False)[source]¶Calculate read enrichment for a certain range of lengths
Parameters: |
|
---|---|
Returns: |
|
riboraptor.count.
read_htseq
(htseq_f)[source]¶Read HTSeq file.
Parameters: |
|
---|---|
Returns: |
|
Utilities to download data from NCBI SRA
riboraptor.download.
run_download_sra_script
(download_root_location=None, ascp_key_path=None, srp_id_file=None, srp_id_list=None)[source]¶Download data from SRA.
Parameters: |
|
---|
riboraptor.dtw.
dtw
(X, Y, metric=u'euclidean', ddtw=False, ddtw_order=1)[source]¶Parameters: |
|
---|
riboraptor.dtw.
get_path
(D)[source]¶Traceback path of minimum cost
Given accumulated cost matrix D, trace back the minimum cost path
Parameters: |
|
---|---|
Returns: |
|
riboraptor.fasta.
complete_gene_fasta
(utr5_bed_f, cds_bed_f, utr3_bed_f, fasta_f, prefix)[source]¶Merge Utr5, CDS, UTR3 coordinates to get one fasta.
Parameters: |
|
---|
riboraptor.fasta.
export_all_fasta
(region_bed_f, chrom_sizes, fasta, prefix, offset_5p=60, offset_3p=0, ignore_tx_version=True)[source]¶Export all gene coverages.
Parameters: |
|
---|
riboraptor.fasta.
export_fasta_from_bed
(gene_name, bed, chrom_sizes, fasta_f, gene_group=None, offset_5p=0, offset_3p=0)[source]¶Extract fasta genewise given coordinates in bed file
Parameters: |
|
---|---|
Returns: |
|
riboraptor.fasta.
get_fasta_sequence
(fasta_f, intervals)[source]¶Extract fasta sequence given a list of intervals.
Parameters: |
|
---|---|
Returns: |
|
All functions that are not so useful, but still useful.
riboraptor.helpers.
check_file_exists
(filepath)[source]¶Check if file exists.
Parameters: |
|
---|
riboraptor.helpers.
codon_to_anticodon
(codon)[source]¶Codon to anticodon.
Parameters: |
|
---|
riboraptor.helpers.
collapse_bed_intervals
(intervals, chromosome_lengths=None, offset_5p=0, offset_3p=0)[source]¶Collapse intervals into non overlapping manner
# NOTE # TODO : This function has a subtle bug that it will be offset by 1 # position when the gene is on negative strand # So essentially if you have CDS on a negative strand # The first position should be discarded # Similary for the last position in the gene on + strand # you have an extra position in the end
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
create_ideal_periodic_signal
(signal_length)[source]¶Create ideal ribo-seq signal.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
get_strandedness
(filepath)[source]¶Parse output of infer_experiment.py from RSeqC to get strandedness.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
identify_peaks
(coverage)[source]¶Given coverage array, find the site of maximum density
riboraptor.helpers.
list_to_ranges
(list_of_int)[source]¶Convert a list to a list of range object
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
millify
(n)[source]¶Convert integer to human readable format.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
pad_five_prime_or_truncate
(some_list, offset_5p, target_len)[source]¶Pad first the 5prime end and then the 3prime end or truncate
Parameters: |
|
---|
riboraptor.helpers.
pad_or_truncate
(some_list, target_len)[source]¶Pad or truncate a list upto given target length
Parameters: |
|
---|
riboraptor.helpers.
parse_star_logs
(infile, outfile=None)[source]¶Parse star logs into a dict
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
r2
(x, y)[source]¶Calculate pearson correlation between two vectors.
Parameters: |
|
---|
riboraptor.helpers.
round_to_nearest
(x, base=5)[source]¶Round to nearest base.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
set_xrotation
(ax, degrees)[source]¶Rotate labels on x-axis.
Parameters: |
|
---|
riboraptor.helpers.
summarize_counters
(samplewise_dict)[source]¶Summarize gene counts for a collection of samples.
Parameters: |
|
---|---|
Returns: |
|
riboraptor.helpers.
summary_stats_two_arrays_welch
(old_mean_array, new_array, old_var_array=None, old_n_counter=None, carried_forward_observations=None)[source]¶Average two arrays using welch’s method
Parameters: |
|
---|---|
Returns: |
|
Plotting methods.
riboraptor.plotting.
plot_featurewise_barplot
(utr5_counts, cds_counts, utr3_counts, ax=None, saveto=None, **kwargs)[source]¶Plot barplots for 5’UTR/CDS/3’UTR counts.
Parameters: |
|
---|
riboraptor.plotting.
plot_framewise_counts
(counts, frames_to_plot=u'all', ax=None, title=None, millify_labels=False, position_range=None, saveto=None, ascii=False, input_is_stream=False, **kwargs)[source]¶Plot framewise distribution of reads.
Parameters: |
|
---|
riboraptor.plotting.
plot_read_counts
(counts, ax=None, marker=None, color=u'royalblue', title=None, label=None, millify_labels=False, identify_peak=True, saveto=None, position_range=None, ascii=False, input_is_stream=False, ylabel=u'Normalized RPF density', **kwargs)[source]¶Plot RPF density aro und start/stop codons.
Parameters: |
|
---|
riboraptor.plotting.
plot_read_length_dist
(read_lengths, ax=None, millify_labels=True, input_is_stream=False, title=None, saveto=None, ascii=False, **kwargs)[source]¶Plot read length distribution.
Parameters: |
|
---|
riboraptor.plotting.
setup_axis
(ax, axis=u'x', majorticks=5, minorticks=1, xrotation=45, yrotation=0)[source]¶Setup axes defaults
Parameters: |
|
---|
riboraptor.statistics.
KDE
(values)[source]¶Perform Univariate Kernel Density Estimation.
Wrapper utility around statsmodels for quick KDE TODO: scikit-learn has a faster implementation (?)
Parameters: |
|
---|---|
Returns: |
|
riboraptor.statistics.
KS_test
(a, b)[source]¶Perform KS test between a and b values
Parameters: |
|
---|---|
Returns: |
|
riboraptor.wig.
WigReader
(wig_location)[source]¶Bases: object
Class for reading and querying wigfiles.
get_chromosomes
¶Return list of chromsome and their sizes as in the wig file.
Returns: |
|
---|
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Report bugs at https://github.com/saketkc/riboraptor/issues.
If you are reporting a bug, please include:
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.
riboraptor could always use more documentation, whether as part of the official riboraptor docs, in docstrings, or even on the web in blog posts, articles, and such.
The best way to send feedback is to file an issue at https://github.com/saketkc/riboraptor/issues.
If you are proposing a feature:
Ready to contribute? Here’s how to set up riboraptor for local development.
Fork the riboraptor repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/riboraptor.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv riboraptor
$ cd riboraptor/
$ python setup.py develop
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:
$ flake8 riboraptor tests
$ python setup.py test or py.test
$ tox
To get flake8 and tox, just pip install them into your virtualenv.
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Before you submit a pull request, check that it meets these guidelines:
None yet. Why not be the first?