riboraptor package

Submodules

riboraptor.cli module

riboraptor.coherence module

riboraptor.coherence.get_periodicity(values, input_is_stream=False)[source]

Calculate periodicity with respect to an ideal 1-0-0 signal.

Parameters:
values : array like

List of values

Returns:
periodicity : float

Periodicity, calculated as the cross-correlation between the input and an ideal 1-0-0 signal
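A minimal sketch of the idea, assuming "cross-correlation" means the normalized (Pearson) correlation at zero lag between the profile and an ideal 1-0-0 signal of the same length. `ideal_signal` and `periodicity_by_correlation` are hypothetical names, not riboraptor's API, and the package's own implementation may compute this differently.

```python
import numpy as np


def ideal_signal(length):
    """Hypothetical helper: the repeating 1-0-0-1-0-0... pattern of a given length."""
    signal = np.zeros(length)
    signal[::3] = 1.0
    return signal


def periodicity_by_correlation(values):
    """Zero-lag normalized cross-correlation (Pearson r) with an ideal 1-0-0 signal."""
    values = np.asarray(values, dtype=float)
    return float(np.corrcoef(values, ideal_signal(len(values)))[0, 1])
```

A profile that peaks in the first frame of every codon scores close to 1, while one that peaks in the second frame scores negatively (-0.5 for a perfect 0-1-0 pattern).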

riboraptor.coherence.naive_periodicity(values, identify_peak=False)[source]

Calculate periodicity in a naive manner.

Take the ratio of frame 1 counts over the mean of frame 2 and frame 3 counts. By default, the first value is treated as frame 1.

Parameters:
values : Series

Metagene profile

Returns:
periodicity : float

Periodicity
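The ratio described above can be sketched as follows, assuming each frame's count is the sum over every third position of the profile and that `values[0]` belongs to frame 1 (the documented default). `naive_periodicity_sketch` is a hypothetical name; peak identification (`identify_peak`) is not modelled.

```python
import numpy as np


def naive_periodicity_sketch(values):
    """Frame-1 counts divided by the mean of frame-2 and frame-3 counts."""
    values = np.asarray(values, dtype=float)
    frame1 = values[0::3].sum()  # positions 0, 3, 6, ...
    frame2 = values[1::3].sum()  # positions 1, 4, 7, ...
    frame3 = values[2::3].sum()  # positions 2, 5, 8, ...
    return frame1 / ((frame2 + frame3) / 2.0)
```

Note that a profile with no reads in frames 2 and 3 gives a division by zero (NumPy returns `inf` with a warning), so a real implementation would need to decide how to handle that case.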

riboraptor.count module

Utilities for read counting operations.

riboraptor.count.bam_to_bedgraph(bam, strand=u'both', end_type=u'5prime', saveto=None)[source]

Create a bedGraph from a BAM file.

Parameters:
bam : str

Path to bam file

strand : str, optional

Use reads mapping to ‘+/-/both’ strands

end_type : str, optional

Read end to use: ‘5prime’ (5’ end) or ‘3prime’ (3’ end)

saveto : str, optional

Path to write bedgraph

Returns:
genome_cov : str

Bedgraph output
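To illustrate what `strand` and `end_type` select, here is a pure-Python sketch that works on alignments already reduced to hypothetical `(chrom, start, end, strand)` tuples with 0-based, half-open coordinates, rather than on a real BAM file. It emits one bedGraph line per covered position; the real function's output format may aggregate positions differently.

```python
from collections import Counter


def reads_to_bedgraph(alignments, strand="both", end_type="5prime"):
    """Count one read end per alignment and return bedGraph text."""
    counts = Counter()
    for chrom, start, end, read_strand in alignments:
        if strand != "both" and read_strand != strand:
            continue
        # The 5' end is the leftmost base on '+' and the rightmost base on '-';
        # the 3' end is the opposite.
        if (end_type == "5prime") == (read_strand == "+"):
            pos = start
        else:
            pos = end - 1
        counts[(chrom, pos)] += 1
    return "\n".join(
        f"{chrom}\t{pos}\t{pos + 1}\t{n}" for (chrom, pos), n in sorted(counts.items())
    )
```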

riboraptor.count.bedgraph_to_bigwig(bedgraph, sizes, saveto, input_is_stream=False)[source]

Convert bedgraph to bigwig.

Parameters:
bedgraph : str

Path to bedgraph file

sizes : str

Path to genome chromosome sizes file or genome name

saveto : str

Path to write bigwig file

input_is_stream : bool

True if input is sent through stdin

riboraptor.count.collapse_gene_coverage_to_metagene(gene_coverages, target_length, outfile=None)[source]

Collapse gene coverages to specific target length.

Parameters:
gene_coverages : string

Path to gene coverages.tsv

target_length : int

Collapse to target length

Returns:
collapsed_gene_coverage : Series like

Collapsed version
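A minimal sketch of collapsing per-gene coverages into a metagene profile, assuming the collapse is a position-wise mean after truncating longer profiles and NaN-padding shorter ones to `target_length`. It takes in-memory lists instead of the coverages TSV, and `collapse_to_metagene` is a hypothetical name.

```python
import pandas as pd


def collapse_to_metagene(gene_profiles, target_length):
    """Position-wise mean over genes after truncating/padding to target_length."""
    # Ragged rows are padded with NaN by the DataFrame constructor
    df = pd.DataFrame(gene_profiles)
    # Keep exactly target_length columns: drops extra positions, adds NaN columns if short
    df = df.reindex(columns=range(target_length))
    # mean() skips NaN, so short genes only contribute where they have data
    return df.mean(axis=0)
```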

riboraptor.count.count_feature_genewise(feature_bed, bam, force_strandedness=False, use_multiprocessing=False)[source]

Count features genewise.

Parameters:
bam : str

Path to bam file

feature_bed : str

Path to features bed file

Returns:
counts : dict

Genewise feature counts

riboraptor.count.count_reads_bed(bam, region_bed_f, saveto)[source]

Count the number of reads falling in each region.

Parameters:
bam : str

Path to bam file (unique mapping only)

region_bed_f : pybedtools.BedTool or str

Genomic regions in which to count reads

saveto : str

Prefix to output pickle files

Returns:
counts_by_region : Series

Series with counts indexed by gene id

region_lengths : Series

Series with gene lengths

counts_normalized_by_length : Series

Series with normalized counts
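The three returned Series relate as sketched below, assuming reads are reduced to hypothetical `(chrom, position)` pairs and each region is a single `(chrom, start, end)` interval (0-based, half-open). `count_reads_in_regions` is illustrative only and ignores strand and multi-interval genes.

```python
import pandas as pd


def count_reads_in_regions(read_positions, regions):
    """Return (counts_by_region, region_lengths, counts_normalized_by_length)."""
    counts, lengths = {}, {}
    for name, (chrom, start, end) in regions.items():
        lengths[name] = end - start
        counts[name] = sum(1 for c, p in read_positions if c == chrom and start <= p < end)
    counts = pd.Series(counts)
    lengths = pd.Series(lengths)
    # Normalizing by length gives reads per base, comparable across regions
    return counts, lengths, counts / lengths
```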

riboraptor.count.count_reads_in_features(feature_bed, bam, force_strandedness=False, use_multiprocessing=False)[source]

Count reads overlapping features.

Parameters:
feature_bed : str

Path to features bed file

bam : str

Path to bam file

force_strandedness : bool

Count reads only if they are on the same strand as the feature

use_multiprocessing : bool

True to use multiprocessing

Returns:
counts : int

Number of intersections between the bam and bed files

riboraptor.count.count_reads_per_gene(bw, bed, prefix=None, n_cores=16, collapse_intervals=True)[source]

Count the number of reads falling in each region.

Parameters:
bw : str

Path to bigWig file

bed : pybedtools.BedTool or str

Genomic regions in which to count reads

prefix : str

Prefix to output pickle files

n_cores : int

Use multiple cores (Default: 16). Set to 1 to disable multiprocessing

collapse_intervals : bool

Should the intervals be collapsed based on the ‘name’ column? This should be set to False for features such as tRNAs, where the same name can span multiple chromosomes.

Returns:
counts_by_region : Series

Series with counts indexed by gene id

region_lengths : Series

Series with gene lengths

counts_normalized_by_length : Series

Series with normalized counts

riboraptor.count.count_utr5_utr3_cds(bam, utr5_bed=None, cds_bed=None, utr3_bed=None, genome=None, force_strandedness=False, genewise=False, saveto=None, use_multiprocessing=False)[source]

One shot counts over UTR5/UTR3/CDS.

Parameters:
bam : str

Path to bam file

utr5_bed : str

Path to 5’UTR feature bed file

utr3_bed : str

Path to 3’UTR feature bed file

cds_bed : str

Path to CDS feature bed file

saveto : str, optional

Path to output file

use_multiprocessing : bool

Should multiprocessing be used? It has not been well tested whether it actually helps.

Returns:
counts : dict

Dict with keys as feature type and counts as values

riboraptor.count.diff_region_enrichment(numerator, denominator, prefix)[source]

Calculate enrichment of counts of one region over another.

Parameters:
numerator : str

Path to pickle file

denominator : str

Path to pickle file

prefix : str

Prefix to save pickles to

Returns:
enrichment : series
riboraptor.count.export_gene_coverages(bigwig, region_bed_f, saveto, offset_5p=60, offset_3p=0, ignore_tx_version=True)[source]

Export all gene coverages.

Parameters:
bigwig : str

Path to bigwig file

region_bed_f : str

Path to region bed file (CDS/3’UTR/5’UTR) with bed name column as gene or a genome name (hg38_utr5, hg38_cds, hg38_utr3)

saveto : str

Path to write output tsv file

offset_5p : int

Number of bases to count upstream (5’)

offset_3p : int

Number of bases to count downstream (3’)

ignore_tx_version : bool

Should versions be ignored for gene names

Returns:
gene_profiles : file

Tab-separated file with one row per gene, in the following format:

gene1 5poffset1 3poffset1 length1 mean1 median1 stdev1 cnt1_1 cnt1_2 cnt1_3 …

gene2 5poffset2 3poffset2 length2 mean2 median2 stdev2 cnt2_1 cnt2_2 cnt2_3 cnt2_4 …

riboraptor.count.export_metagene_coverage(bigwig, region_bed_f, max_positions=None, saveto=None, offset_5p=60, offset_3p=0, ignore_tx_version=True)[source]

Calculate metagene coverage.

Parameters:
bigwig : str

Path to bigwig file

region_bed_f : str

Path to region bed file (CDS/3’UTR/5’UTR) or a genome name (hg38_utr5, hg38_cds, hg38_utr3)

max_positions : int

Number of positions to consider when calculating the normalized coverage. Higher values make the computation slower.

saveto : str

Path to write output tsv file

offset_5p : int

Number of bases to offset upstream (5’)

offset_3p : int

Number of bases to offset downstream (3’)

ignore_tx_version : bool

Should versions be ignored for gene names

Returns:
metagene_profile : series

Metagene profile

riboraptor.count.extract_uniq_mapping_reads(inbam, outbam)[source]

Extract only uniquely mapping reads from a bam.

Parameters:
inbam : string

Path to input bam file

outbam : string

Path to write unique reads bam to

riboraptor.count.gene_coverage(gene_name, bed, bw, gene_group=None, offset_5p=0, offset_3p=0, collapse_intervals=True)[source]

Get gene coverage.

Parameters:
gene_name : str

Gene name

bed : str

Path to CDS or 5’UTR or 3’UTR bed

bw : str

Path to bigwig to fetch the scores from

offset_5p : int (positive)

Number of bases to count upstream (5’)

offset_3p : int (positive)

Number of bases to count downstream (3’)

collapse_intervals : bool

Should bed be collapsed based on gene name

Returns:
coverage_combined : series

Series with index as position and value as coverage

intervals_for_fasta_read : list

List of tuples

index_to_genomic_pos_map : series

gene_offset : int

Gene wise offsets

riboraptor.count.gene_coverage_sum(gene_name, bed, bw, collapse_intervals=True)[source]

Get gene coverage, keeping track of only the sum.

Parameters:
gene_name : str

Name of gene

bed : str

Path to bed file

bw : str

Path to bigwig file

collapse_intervals : bool

Should the intervals be collapsed based on the ‘name’ column? This should be set to False for features such as tRNAs, where the same name can span multiple chromosomes.

riboraptor.count.get_fasta_sequence(fasta, intervals)[source]

Extract fasta sequence given a list of intervals.

Parameters:
fasta : str

Path to fasta file

intervals : list(tuple)

A list of tuples in the form [(chrom, start, stop, strand)]

Returns:
seq : list

List of sequences at intervals

riboraptor.count.get_region_sizes(bed)[source]

Get the collapsed length of each gene in a bed file.

Parameters:
bed : str

Path to bed file

Returns:
region_sizes : dict

Region sizes, with gene names as keys and the size of each named region as values

riboraptor.count.htseq_to_cpm(htseq_f, saveto=None)[source]

Convert HTSeq counts to CPM.

Parameters:
htseq_f : str

Path to HTseq counts file

saveto : str, optional

Path to output file

Returns:
cpm : dataframe

CPM
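CPM (counts per million) rescales each gene's count by the library total. A minimal sketch on an in-memory mapping of gene to count; `counts_to_cpm` is a hypothetical name and skips reading the HTSeq file.

```python
import pandas as pd


def counts_to_cpm(counts):
    """Counts per million: each gene's count divided by the total, times 1e6."""
    counts = pd.Series(counts, dtype=float)
    return counts / counts.sum() * 1e6
```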

riboraptor.count.htseq_to_tpm(htseq_f, cds_bed_f, saveto=None)[source]

Convert HTSeq counts to TPM.

Parameters:
htseq_f : str

Path to HTseq counts file

cds_bed_f : str

Path to CDS bed file from which the length of each gene is derived

saveto : str, optional

Path to output file

Returns:
tpm : dataframe

TPM
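TPM (transcripts per million) first normalizes counts by gene length and only then scales to one million, so longer genes do not dominate. A minimal sketch assuming gene lengths are already available as a mapping (for instance as produced by `get_region_sizes`); `counts_to_tpm` is a hypothetical name.

```python
import pandas as pd


def counts_to_tpm(counts, lengths):
    """Transcripts per million from raw counts and per-gene lengths."""
    counts = pd.Series(counts, dtype=float)
    lengths = pd.Series(lengths, dtype=float).reindex(counts.index)
    # Reads per base: normalize for gene length first ...
    rate = counts / lengths
    # ... then scale so the rates sum to one million
    return rate / rate.sum() * 1e6
```

Here a gene with 10 reads over 100 bases and one with 60 reads over 200 bases receive TPMs of 250,000 and 750,000: the second gene has six times the reads but only three times the per-base rate.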

riboraptor.count.interval_coverage(bw, intervals)[source]

Get coverage at custom intervals

Parameters:
bw : str

Path to bigwig file

intervals : list of tuples

[(chrom, start, stop, strand)]

Returns:
coverage : list of series

Coverage for each interval, ordered according to the interval’s strand orientation

riboraptor.count.mapping_reads_summary(bam, prefix)[source]

Count number of mapped reads.

Parameters:
bam : str

Path to bam file

prefix : str

Prefix to save pickle to (optional)

Returns:
counts : counter

Counter with the number of times a read maps as keys and the number of reads that map that many times as values
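The shape of this Counter can be sketched from a list of read names, assuming one entry per alignment record so that a read mapping to k loci appears k times. `mapping_multiplicity` is a hypothetical name; the real function reads the BAM directly.

```python
from collections import Counter


def mapping_multiplicity(read_names):
    """Counter of {times a read maps: number of reads that map that many times}."""
    alignments_per_read = Counter(read_names)
    return Counter(alignments_per_read.values())
```

For example, reads r1 and r4 mapping once, r2 twice and r3 three times give `{1: 2, 2: 1, 3: 1}`.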

riboraptor.count.pickle_bed_file(bed, collapse_intervals=True)[source]

Create a lookup pickle file for genewise CDS/UTR coordinates.

To avoid recalculating the coordinates that must be fetched for each gene’s CDS or UTR regions, they can be stored in a pickle file.

Parameters:
bed : string

Path to bed file

collapse_intervals : bool

Should the intervals be collapsed based on the ‘name’ column? This should be set to False for features such as tRNAs, where the same name can span multiple chromosomes.

riboraptor.count.read_enrichment(read_lengths, enrichment_range=[28, 29, 30, 31, 32], input_is_stream=False, input_is_file=False)[source]

Calculate read enrichment for a certain range of lengths

Parameters:
read_lengths : Counter

A counter with read lengths and their counts

enrichment_range : range or str

Range of read lengths to focus on, given as a string (‘28-32’) or a range (range(28, 33))

input_is_stream : bool

True if input is sent through stdin

Returns:
ratio : float

Enrichment in this range (scale 0-1)
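A minimal sketch, assuming "enrichment" is the fraction of all reads whose length falls inside the range, and that a string such as ‘28-32’ is inclusive on both ends (matching range(28, 33) above). `length_enrichment` is a hypothetical name, and stream or file input is not handled.

```python
from collections import Counter


def length_enrichment(read_lengths, enrichment_range="28-32"):
    """Fraction of reads whose length falls in enrichment_range (0-1 scale)."""
    if isinstance(enrichment_range, str):
        # "28-32" is treated as inclusive on both ends, i.e. range(28, 33)
        low, high = (int(x) for x in enrichment_range.split("-"))
        enrichment_range = range(low, high + 1)
    in_range = sum(read_lengths.get(length, 0) for length in enrichment_range)
    return in_range / sum(read_lengths.values())
```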

riboraptor.count.read_htseq(htseq_f)[source]

Read HTSeq file.

Parameters:
htseq_f : str

Path to htseq counts file

Returns:
htseq_df : dataframe

HTSeq counts as a dataframe
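HTSeq-count writes two tab-separated columns (feature, count) and appends summary rows whose names start with `__` (such as `__no_feature`). A minimal reader sketch that drops those rows; `read_htseq_sketch` is a hypothetical name, and whether riboraptor keeps or drops the summary rows is not stated here.

```python
import io

import pandas as pd


def read_htseq_sketch(path_or_buffer):
    """Read an HTSeq-count table into a DataFrame indexed by gene id."""
    df = pd.read_csv(path_or_buffer, sep="\t", header=None,
                     names=["gene_id", "count"], index_col=0)
    # Drop HTSeq's summary counters (__no_feature, __ambiguous, ...)
    keep = [not str(name).startswith("__") for name in df.index]
    return df.loc[keep]


# In-memory example standing in for a counts file on disk
counts = read_htseq_sketch(io.StringIO("geneA\t10\ngeneB\t5\n__no_feature\t3\n"))
```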

riboraptor.count.read_length_distribution(bam, saveto)[source]

Count read lengths.

Parameters:
bam : str

Path to bam file

saveto : str

Path to write output tsv file

Returns:
lengths : counter

Counter of read length and counts

riboraptor.count.unique_mapping_reads_count(bam)[source]

Count the number of uniquely mapped reads.

Parameters:
bam : str

Path to bam file

Returns:
n_mapped : int

Count of uniquely mapped reads

riboraptor.download module

Utilities to download data from NCBI SRA

riboraptor.download.run_download_sra_script(download_root_location=None, ascp_key_path=None, srp_id_file=None, srp_id_list=None)[source]

Download data from SRA.

Parameters:
download_root_location : string

Path to download SRA files

ascp_key_path : string

Path to the Aspera private key

srp_id_list : list

List of SRP ids for download

srp_id_file : string

File containing list of SRP Ids, one per line

riboraptor.dtw module

riboraptor.dtw.dtw(X, Y, metric=u'euclidean', ddtw=False, ddtw_order=1)[source]
Parameters:
X : array_like

M x D matrix

Y : array_like

N x D matrix

metric : string

The distance metric to use. Can be : ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’. See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

ddtw : bool

Should derivative DTW be used? In derivative DTW, the distance matrix is created from the derivative at each point rather than from the points themselves.

ddtw_order : int [1,2]

First order uses a one-step difference method. Second order uses np.gradient for an approximation up to second order.

Returns:
total_cost : float

Total (minimum) cost of warping

pointwise_cost : array_like

M x N matrix with cost at each (i, j)

accumulated_cost : array_like

M x N matrix with the (minimum) cost accumulated up to (i, j), starting from (0, 0)
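A minimal dynamic-programming sketch of how the three returned quantities relate, restricted to the Euclidean metric and without derivative DTW (`ddtw`). `dtw_sketch` is a hypothetical name, not riboraptor's implementation: each accumulated cell adds its pointwise cost to the cheapest of its three predecessors (up, left, diagonal).

```python
import numpy as np


def dtw_sketch(X, Y):
    """Return (total_cost, pointwise_cost, accumulated_cost) using Euclidean distance."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)  # M x D
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)  # N x D
    # Pointwise Euclidean distance between every X[i] and Y[j]: M x N
    C = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    M, N = C.shape
    D = np.full((M, N), np.inf)
    D[0, 0] = C[0, 0]
    for i in range(M):
        for j in range(N):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                D[i - 1, j] if i > 0 else np.inf,
                D[i, j - 1] if j > 0 else np.inf,
                D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            D[i, j] = C[i, j] + best_prev
    return D[-1, -1], C, D
```

Warping lets repeated values align at no cost: [0, 1, 2] and [0, 1, 1, 2] have a total cost of 0.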

riboraptor.dtw.get_path(D)[source]

Traceback path of minimum cost

Given accumulated cost matrix D, trace back the minimum cost path

Parameters:
D : array_like

M x N accumulated cost matrix, as returned by: total_cost, pointwise_cost, accumulated_cost = dtw(X, Y, metric=‘euclidean’)

Returns:
traceback_x, traceback_y : array_like

M x 1 and N x 1 array containing indices of movement starting from (0, 0) going to (M-1, N-1)
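A minimal traceback sketch: start at (M-1, N-1) and repeatedly step to the cheapest of the up, left and diagonal predecessors until reaching (0, 0), then reverse. Preferring the diagonal on ties is an assumption of this sketch, and `get_path_sketch` is a hypothetical name.

```python
import numpy as np


def get_path_sketch(D):
    """Trace the minimum-cost warping path back from (M-1, N-1) to (0, 0)."""
    D = np.asarray(D, dtype=float)
    i, j = D.shape[0] - 1, D.shape[1] - 1
    path_x, path_y = [i], [j]
    while i > 0 or j > 0:
        if i == 0:  # on the first row, we can only move left
            j -= 1
        elif j == 0:  # on the first column, we can only move up
            i -= 1
        else:
            # min() keeps the first candidate on ties, so the diagonal is preferred
            i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda ij: D[ij])
        path_x.append(i)
        path_y.append(j)
    return np.array(path_x[::-1]), np.array(path_y[::-1])
```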

riboraptor.dtw.plot_warped_timeseries(x, y, pointwise_cost, accumulated_cost, path, colormap=matplotlib.pyplot.cm.Blues, linecolor=u'#D55E00')[source]

riboraptor.fasta module

riboraptor.fasta.complete_gene_fasta(utr5_bed_f, cds_bed_f, utr3_bed_f, fasta_f, prefix)[source]

Merge 5’UTR, CDS and 3’UTR coordinates to get one FASTA.

Parameters:
utr5_bed_f : str

Path to 5’UTR bed

cds_bed_f : str

Path to CDS bed

utr3_bed_f : str

Path to 3’UTR bed

riboraptor.fasta.export_all_fasta(region_bed_f, chrom_sizes, fasta, prefix, offset_5p=60, offset_3p=0, ignore_tx_version=True)[source]

Export FASTA sequences for all genes.

Parameters:
region_bed_f : str

Path to region bed file (CDS/3’UTR/5’UTR) with bed name column as gene

chrom_sizes : str

Path to chrom.sizes file

prefix : str

Prefix to write output file

offset_5p : int

Number of bases to count upstream (5’)

offset_3p : int

Number of bases to count downstream (3’)

ignore_tx_version : bool

Should versions be ignored for gene names

riboraptor.fasta.export_fasta_from_bed(gene_name, bed, chrom_sizes, fasta_f, gene_group=None, offset_5p=0, offset_3p=0)[source]

Extract fasta genewise given coordinates in bed file

Parameters:
gene_name : str

Gene name

bed : str

Path to CDS or 5’UTR or 3’UTR bed

fasta_f : str

Path to fasta file

chrom_sizes : str

Path to chrom.sizes file

offset_5p : int (positive)

Number of bases to count upstream (5’)

offset_3p : int (positive)

Number of bases to count downstream (3’)

Returns:
gene_offset : int

Gene wise offsets

riboraptor.fasta.get_fasta_sequence(fasta_f, intervals)[source]

Extract fasta sequence given a list of intervals.

Parameters:
fasta_f : str

Path to fasta file

intervals : list(tuple)

A list of tuples in the form [(chrom, start, stop, strand)]. NOTE: start and stop must be 1-based!

Returns:
seq : list

List of sequences at intervals
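A sketch of interval extraction with 1-based, inclusive coordinates, using an in-memory dict of chromosome sequences as a stand-in for a FASTA file. Returning minus-strand intervals as their reverse complement is an assumption of this sketch; `fetch_sequences` and `COMPLEMENT` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fetch_sequences(genome, intervals):
    """genome: dict of chrom -> sequence; intervals: 1-based inclusive (chrom, start, stop, strand)."""
    sequences = []
    for chrom, start, stop, strand in intervals:
        # 1-based inclusive [start, stop] becomes the Python slice [start - 1, stop)
        seq = genome[chrom][start - 1:stop]
        if strand == "-":
            # Assumed: minus-strand intervals are reported as the reverse complement
            seq = seq.translate(COMPLEMENT)[::-1]
        sequences.append(seq)
    return sequences
```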

riboraptor.genome module

riboraptor.helpers module

All functions that are not so useful, but still useful.

riboraptor.helpers.check_file_exists(filepath)[source]

Check if file exists.

Parameters:
filepath : str

Path to file

riboraptor.helpers.codon_to_anticodon(codon)[source]

Codon to anticodon.

Parameters:
codon : string

Input codon
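A minimal sketch, assuming the anticodon is reported as the reverse complement of the codon in the DNA alphabet, written 5’ to 3’ (so ATG pairs with CAT). `codon_to_anticodon_sketch` is a hypothetical name; the package may use RNA letters or another orientation.

```python
def codon_to_anticodon_sketch(codon):
    """Reverse complement of a DNA codon, written 5'->3' (e.g. ATG -> CAT)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(codon.upper()))
```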

riboraptor.helpers.collapse_bed_intervals(intervals, chromosome_lengths=None, offset_5p=0, offset_3p=0)[source]

Collapse intervals into non-overlapping intervals.

NOTE/TODO: This function has a subtle bug: it is offset by 1 position when the gene is on the negative strand. So if you have a CDS on the negative strand, the first position should be discarded. Similarly, for a gene on the + strand, you have an extra position at the end.

Parameters:
intervals : list of tuples

Like [(‘chr1’, 310, 320, ‘+’), (‘chr1’, 321, 330, ‘+’)]

chromosome_lengths : dict

A map of each chromosome’s length. Only used when offset_5p or offset_3p > 0

offset_5p : int (positive)

Number of bases to count upstream (5’)

offset_3p : int (positive)

Number of bases to count downstream (3’)

Returns:
interval_combined : list of tuples

A collapsed version of the intervals. This is useful when annotations overlap. Example:

chr1 310 320 gene1 +
chr1 319 324 gene1 +

returns:

chr1 310 324 gene1 +

intervals_for_fasta_read : list of tuples

This list can be used to fetch FASTA directly with pyfaidx. NOTE: DO NOT apply offset adjustments, as these are already adjusted for the pyfaidx format (1-based start and end)

gene_offset_5p, gene_offset_3p : int

Gene-wise offsets. These may differ from offset_5p in cases where offset_5p would lead to a negative coordinate

riboraptor.helpers.create_ideal_periodic_signal(signal_length)[source]

Create ideal ribo-seq signal.

Parameters:
signal_length : int

Length of signal to create

Returns:
signal : array_like

1-0-0 signal

riboraptor.helpers.get_strandedness(filepath)[source]

Parse output of infer_experiment.py from RSeqC to get strandedness.

Parameters:
filepath : str

Path to infer_experiment.py output

Returns:
strandedness : str

reverse or forward or none

riboraptor.helpers.identify_peaks(coverage)[source]

Given coverage array, find the site of maximum density

riboraptor.helpers.list_to_ranges(list_of_int)[source]

Convert a list of integers into a list of range objects.

Parameters:
list_of_int : list

List of integers to be compressed into ranges

Returns:
list_of_range : list

List of range objects
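A minimal sketch, assuming each maximal run of consecutive integers becomes one range and duplicates are ignored. `list_to_ranges_sketch` is a hypothetical name.

```python
def list_to_ranges_sketch(list_of_int):
    """Compress integers into ranges of consecutive values."""
    ranges = []
    for value in sorted(set(list_of_int)):
        if ranges and value == ranges[-1].stop:
            # Value continues the current run: extend the last range by one
            ranges[-1] = range(ranges[-1].start, value + 1)
        else:
            ranges.append(range(value, value + 1))
    return ranges
```

For instance, [1, 2, 3, 7, 8, 10] becomes [range(1, 4), range(7, 9), range(10, 11)].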

riboraptor.helpers.load_pickle(filepath)[source]

Read pickled files easily in both Python 2 and 3.

riboraptor.helpers.millify(n)[source]

Convert integer to human readable format.

Parameters:
n : int
Returns:
millidx : str

Formatted integer

riboraptor.helpers.mkdir_p(path)[source]

Python equivalent of mkdir -p.

Parameters:
path : str
riboraptor.helpers.pad_five_prime_or_truncate(some_list, offset_5p, target_len)[source]

Pad first the 5’ end and then the 3’ end, or truncate.

Parameters:
some_list : list

Input list

offset_5p : int

5’ offset

target_len : int

Final length of list

If being extended, returns list padded with NAs.
riboraptor.helpers.pad_or_truncate(some_list, target_len)[source]

Pad or truncate a list up to the given target length.

Parameters:
some_list : list

Input list

target_len : int

Final length of list

If being extended, returns list padded with NAs.
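A minimal sketch of this behaviour, using `math.nan` as the NA value (an assumption; the package may use a different missing-value marker). `pad_or_truncate_sketch` is a hypothetical name.

```python
import math


def pad_or_truncate_sketch(some_list, target_len):
    """Truncate to target_len, or extend with NaN up to target_len."""
    # A negative padding count yields an empty list, so truncation needs no special case
    return list(some_list[:target_len]) + [math.nan] * (target_len - len(some_list))
```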
riboraptor.helpers.parse_star_logs(infile, outfile=None)[source]

Parse star logs into a dict

Parameters:
infile : str

Path to starlogs.final.out file

Returns:
star_info : dict

Dict with necessary records parsed
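STAR’s final log lists one `label | value` record per line, interleaved with section headers that contain no `|`. A minimal parsing sketch that keeps every record as a string; which records riboraptor actually selects is not shown, and `parse_star_log_lines` is a hypothetical name (pass it an open file handle to read a log on disk).

```python
def parse_star_log_lines(lines):
    """Parse 'label | value' lines from a STAR final log into a dict of strings."""
    info = {}
    for line in lines:
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:"
        key, value = (part.strip() for part in line.split("|", 1))
        if value:
            info[key] = value
    return info
```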

riboraptor.helpers.path_leaf(path)[source]

Get the tail (last component) of a file path.

riboraptor.helpers.r2(x, y)[source]

Calculate Pearson correlation between two vectors.

Parameters:
x : array_like

Input

y : array_like

Input

riboraptor.helpers.round_to_nearest(x, base=5)[source]

Round to nearest base.

Parameters:
x : float

Input

Returns:
v : int

Output
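Rounding to the nearest base amounts to scaling down by the base, rounding, and scaling back up. A minimal sketch; `round_to_nearest_sketch` is a hypothetical name.

```python
def round_to_nearest_sketch(x, base=5):
    """Round x to the nearest multiple of base."""
    return int(base * round(float(x) / base))
```

Python's `round` uses round-half-to-even, so an exact midpoint such as 12.5 with base 5 rounds down to 10; the package's own rounding rule at midpoints may differ.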

riboraptor.helpers.set_xrotation(ax, degrees)[source]

Rotate labels on x-axis.

Parameters:
ax : matplotlib.Axes

Axes object

degrees : int

Rotation degrees

riboraptor.helpers.summarize_counters(samplewise_dict)[source]

Summarize gene counts for a collection of samples.

Parameters:
samplewise_dict : dict

A dictionary with key as sample name and value as another dictionary of counts for each gene

Returns:
totals : dict

A dictionary with key as sample name and value as total gene count

riboraptor.helpers.summary_stats_two_arrays_welch(old_mean_array, new_array, old_var_array=None, old_n_counter=None, carried_forward_observations=None)[source]

Average two arrays using Welch’s method.

Parameters:
old_mean_array : Series

Series of previous means with index as positions

old_var_array : Series

Series of previous variances with index as positions

new_array : array like

Series of new observations

old_n_counter : array like

Number of observations counted at each index

Returns:
m : array like

Column wise Mean array

var : array like

Column wise variance

Consider an example with the arrays [1,2,3], [1,2,3,4] and [1,2,3,4,5]:

old = [1,2,3]
new = [1,2,3,4]
counter = [1,1,1]
mean = [1,2,3,4]
var = [na, na, na, na]
carried_forward = [[1,1], [2,2], [3,3], [4]]

old = [1,2,3,4]
new = [1,2,3,4,5]
counter = [2,2,2,1]
mean = [1,2,3,4,5]
var = [0,0,0, na, na]
carried_forward = [[], [], [], [4,4], [5]]

riboraptor.normalization module

riboraptor.normalization.deseq2_normalization(list_of_profiles)[source]

Perform DESeq2-like normalization of position-specific scores.

Parameters:
list_of_profiles : array-like

Array of profiles across samples for one gene

Returns:
normalized_profiles : array-like

array of profiles across samples
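A sketch of DESeq2’s median-of-ratios normalization applied to position-specific scores, assuming each row is one sample’s profile and each column (position) plays the role DESeq2 gives to a gene. As in DESeq2, positions with a zero in any sample are excluded when estimating size factors; how riboraptor itself treats zeros is not stated here, and `median_of_ratios` is a hypothetical name.

```python
import numpy as np


def median_of_ratios(profiles):
    """Return (normalized_profiles, size_factors) for a samples x positions array."""
    counts = np.asarray(profiles, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)  # log(0) -> -inf
    # Geometric mean per position, computed in log space
    log_geo_means = log_counts.mean(axis=0)
    usable = np.isfinite(log_geo_means)  # drop positions with a zero in any sample
    # Size factor: median ratio of each sample to the geometric means
    size_factors = np.exp(np.median(log_counts[:, usable] - log_geo_means[usable], axis=1))
    return counts / size_factors[:, None], size_factors
```

Two profiles that differ only by a constant depth factor, such as [1, 2, 4] and [2, 4, 8], become identical after normalization. If every position contains a zero somewhere, no size factor can be estimated and the result is NaN.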

riboraptor.plotting module

Plotting methods.

riboraptor.plotting.create_wavelet(data, ax)[source]
riboraptor.plotting.plot_featurewise_barplot(utr5_counts, cds_counts, utr3_counts, ax=None, saveto=None, **kwargs)[source]

Plot barplots for 5’UTR/CDS/3’UTR counts.

Parameters:
utr5_counts : int or dict

Total number of reads in 5’UTR region or alternatively a dictionary/series with genes as key and 5’UTR counts as values

cds_counts : int or dict

Total number of reads in CDS region or alternatively a dictionary/series with genes as key and CDS counts as values

utr3_counts : int or dict

Total number of reads in 3’UTR region or alternatively a dictionary/series with genes as key and 3’UTR counts as values

saveto : str

Path to save output file to (<filename>.png/<filename>.pdf)

riboraptor.plotting.plot_framewise_counts(counts, frames_to_plot=u'all', ax=None, title=None, millify_labels=False, position_range=None, saveto=None, ascii=False, input_is_stream=False, **kwargs)[source]

Plot framewise distribution of reads.

Parameters:
counts : Series

A series with position as index and value as counts

frames_to_plot : str or range

A comma-separated list of frames to highlight, or a range

ax : matplotlib.Axes

Axes to plot on (default: None)

saveto : str

Path to save output file to (<filename>.png/<filename>.pdf)

riboraptor.plotting.plot_read_counts(counts, ax=None, marker=None, color=u'royalblue', title=None, label=None, millify_labels=False, identify_peak=True, saveto=None, position_range=None, ascii=False, input_is_stream=False, ylabel=u'Normalized RPF density', **kwargs)[source]

Plot RPF density around start/stop codons.

Parameters:
counts : Series/Counter

A series with coordinates as index and counts as values

ax : matplotlib.Axes

Axis to create object on

marker : string

‘o’/’x’

color : string

Line color

label : string

Label (useful only if plotting multiple objects on same axes)

millify_labels : bool

True if labels should be formatted to read millions/trillions etc

saveto : str

Path to save output file to (<filename>.png/<filename>.pdf)

riboraptor.plotting.plot_read_length_dist(read_lengths, ax=None, millify_labels=True, input_is_stream=False, title=None, saveto=None, ascii=False, **kwargs)[source]

Plot read length distribution.

Parameters:
read_lengths : array_like

Array of read lengths

ax : matplotlib.Axes

Axis object

millify_labels : bool

True if labels should be formatted to read millions/trillions etc

input_is_stream : bool

True if input is sent through stdin

saveto : str

Path to save output file to (<filename>.png/<filename>.pdf)

riboraptor.plotting.setup_axis(ax, axis=u'x', majorticks=5, minorticks=1, xrotation=45, yrotation=0)[source]

Setup axes defaults

Parameters:
ax : matplotlib.Axes
axis : str

Setup ‘x’ or ‘y’ axis

majorticks : int

Length of interval between two major ticks

minorticks : int

Length of interval between two minor ticks

xrotation : int

Rotate x axis labels by xrotation degrees

yrotation : int

Rotate y axis labels by yrotation degrees

riboraptor.plotting.setup_plot()[source]

Setup plotting defaults

riboraptor.statistics module

riboraptor.statistics.KDE(values)[source]

Perform Univariate Kernel Density Estimation.

Wrapper around statsmodels for quick KDE. TODO: scikit-learn may have a faster implementation.

Parameters:
values : array like
Returns:
support : array_like
cdf : array_like
riboraptor.statistics.KS_test(a, b)[source]

Perform KS test between a and b values

Parameters:
a, b : array-like

Input

Returns:
D : float

KS D statistic

effect_size : float

maximum difference at point of D-statistic

cdf_a, cdf_b : float

CDF of a, b

Note: By default, this method tests the one-sided alternative (“lesser”), so the test rejects H0 when the CDF of b lies above the CDF of a.
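The D statistic itself is the largest vertical gap between the two empirical CDFs. The sketch below computes the two-sided D only, with no p-value and no one-sided alternative, so it is not equivalent to the documented default; `ecdf` and `ks_statistic` are hypothetical names. SciPy’s `scipy.stats.ks_2samp` provides p-values and an `alternative` argument.

```python
import numpy as np


def ecdf(sample, points):
    """Empirical CDF of sample at points: fraction of values <= each point."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, points, side="right") / len(sample)


def ks_statistic(a, b):
    """Two-sided two-sample KS D: the largest gap between the two ECDFs."""
    # Both ECDFs are step functions that only change at observed values
    points = np.concatenate([a, b])
    return float(np.max(np.abs(ecdf(a, points) - ecdf(b, points))))
```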
riboraptor.statistics.calculate_cdf(data)[source]

Calculate CDF given data points

Parameters:
data : array-like

Input values

Returns:
cdf : series

Cumulative distribution function calculated at indexed points

riboraptor.statistics.series_cdf(series)[source]

Calculate cdf of series preserving the index

Parameters:
series : series like
Returns:
cdf : series

riboraptor.utils module

riboraptor.utils.determine_cell_type(sample_attribute)[source]
riboraptor.utils.get_cell_line_or_tissue(row)[source]
riboraptor.utils.get_enrichment_cds_stats(pickle_file)[source]
riboraptor.utils.get_fragment_enrichment_score(txt_file)[source]
riboraptor.utils.get_strain_type(sample_attribute)[source]
riboraptor.utils.get_tissue_type(sample_attribute)[source]
riboraptor.utils.load_tpm(path)[source]
riboraptor.utils.summary_starlogs_over_runs(directory, list_of_srr)[source]

riboraptor.wig module

class riboraptor.wig.WigReader(wig_location)[source]

Bases: object

Class for reading and querying wigfiles.

get_chromosomes

Return the chromosomes and their sizes as recorded in the wig file.

Returns:
chroms : dict

Dictionary with {“chr”: “Length”} format

query(intervals)[source]

Query regions for scores.

Parameters:
intervals : list(tuple)
A list of tuples with the following format:

(chr, chrStart, chrEnd, strand)

Returns:
scores : array_like

A numpy array containing scores for each tuple

Module contents