pimkl package¶

Submodules¶

pimkl.analysis module¶

pimkl.analysis.plot_aucs_to_buffer(df, save=False)[source]¶: Plot AUC for a multi-indexed pandas.DataFrame where df.columns.names = ['data', 'kind'].

pimkl.analysis.plot_weights_significant_correlations_to_buffer(weights_df, correlation_type, save=False)[source]¶: Plot a heatmap showing the correlation between different molecular signatures wherever it is significant. weights_df.index.names is ['fold', 'class'].

pimkl.analysis.plot_weights_to_buffer(weights_df, save=False, plot_correlations=False)[source]¶: Plot the molecular signature over many folds in pathway-wise boxes. For non-binary problems, each one-versus-rest signature is plotted. weights_df.index.names should be ['fold'] or ['fold', 'class'].

pimkl.analysis.significant_correlations(rho, sample_length)[source]¶

pimkl.analysis.significant_pathways(df, alpha=0.001)[source]¶: Significance test against the average weight.

pimkl.cli module¶

pimkl.data module¶

Split data into training and test.

pimkl.data.get_learning_data(X, labels=None, max_per_class=30)[source]¶: Return test and training data split for a single data type.

pimkl.data.get_learning_data_in_dict_mode(X, labels=None, data_types=None, max_per_class=30)[source]¶: Return test and training data split for multiple data types.

pimkl.data.get_learning_data_in_dict_mode_fraction(X, labels=None, data_types=None, fraction=0.5)[source]¶: Return test and training data split for multiple data types, using a fraction.

pimkl.data.get_learning_data_indices_fraction(X, fraction=0.5)[source]¶: Return data in dict mode, split using a fraction.
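As a rough illustration of a fraction-based split, the sketch below shuffles sample indices and cuts them at `fraction`. It is not pimkl's implementation: the helper name `split_indices_by_fraction`, the `seed` parameter, and the choice of which side of the cut is the training set are all assumptions.

```python
import random


def split_indices_by_fraction(n_samples, fraction=0.5, seed=0):
    """Hypothetical helper: shuffle indices and cut them at `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    indices = list(range(n_samples))
    # A local random generator keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * fraction))
    # Assumption: the first part is the training set, the rest is the test set.
    return indices[:cut], indices[cut:]


train_idx, test_idx = split_indices_by_fraction(10, fraction=0.5)
```

The two index lists are disjoint and together cover every sample exactly once.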

pimkl.evaluation module¶

pimkl.evaluation.performances(y_true, y_score)[source]¶

pimkl.evaluation.roc_analysis(y_test, y_score)[source]¶

pimkl.evaluation.roc_multiclass(y_test, y_score, n_classes=None)[source]¶

pimkl.evaluation.roc_two_classes(y_test, y_score)[source]¶

pimkl.evaluation.sensitivity(tp, fn)[source]¶

pimkl.evaluation.specificity(tn, fp)[source]¶
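For reference, the textbook definitions of these two metrics from confusion-matrix counts are sketched below. That pimkl's functions follow exactly these formulas is an assumption; only the signatures are taken from this page.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# 8 of 10 positives detected, 9 of 10 negatives correctly rejected.
sens = sensitivity(tp=8, fn=2)
spec = specificity(tn=9, fp=1)
```

Both functions raise ZeroDivisionError when the corresponding class has no samples; this sketch leaves that case to the caller.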

pimkl.inducers module¶

pimkl.inducers.get_matching_data_and_network(data, network)[source]¶: Intersect data labels with network node labels.

pimkl.inducers.get_pathway_inducer(network, gene_set, normed=True)[source]¶: Get a Laplacian-based pathway inducer.
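To make "Laplacian-based" concrete, the sketch below restricts a small adjacency matrix to the nodes of a gene set and computes the symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} of that sub-network. It is an illustration under assumptions, not pimkl's code: the helper names are hypothetical, the matrices are plain nested lists, the graph is taken to be undirected without self-loops, and degrees are computed inside the sub-network.

```python
import math


def sub_adjacency(adjacency, labels, gene_set):
    """Hypothetical helper: keep only rows and columns whose label is in gene_set."""
    keep = [i for i, label in enumerate(labels) if label in gene_set]
    sub = [[adjacency[i][j] for j in keep] for i in keep]
    return sub, [labels[i] for i in keep]


def normalized_laplacian(adjacency):
    """Return I - D^{-1/2} A D^{-1/2} for a symmetric matrix without self-loops."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    laplacian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Isolated nodes get a zero row instead of dividing by zero.
                laplacian[i][j] = 1.0 if degrees[i] > 0 else 0.0
            elif adjacency[i][j] and degrees[i] > 0 and degrees[j] > 0:
                laplacian[i][j] = -adjacency[i][j] / math.sqrt(degrees[i] * degrees[j])
    return laplacian


# Toy network: a path A - B - C plus an extra edge C - D.
labels = ["A", "B", "C", "D"]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
sub, sub_labels = sub_adjacency(adjacency, labels, {"A", "B", "C"})
inducer = normalized_laplacian(sub)
```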

pimkl.inducers.get_pathway_selector(network, gene_set)[source]¶: Get a pathway selector.

pimkl.inducers.read_gmt(gmt_file)[source]¶: Read a .gmt file.

pimkl.inducers.read_gmt_from_file_pointer(fp)[source]¶
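A .gmt file stores one gene set per line as tab-separated fields: the set name, a description, then the member genes. The sketch below parses that layout from an open file object. The function name `parse_gmt` and the returned dict of lists are assumptions for illustration; the return type of pimkl's readers is not stated here.

```python
import io


def parse_gmt(fp):
    """Hypothetical parser: map each gene set name to its list of genes."""
    gene_sets = {}
    for line in fp:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            # Skip blank lines and sets without any gene.
            continue
        name, _description, *genes = fields
        # Drop empty entries left behind by trailing tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


example = io.StringIO("PATHWAY_A\tdesc\tTP53\tBRCA1\nPATHWAY_B\tdesc\tEGFR\n")
gene_sets = parse_gmt(example)
```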

pimkl.inducers.read_inducer(filename, size, header=None, sep=',')[source]¶: Read inducer in CSC format.

pimkl.inducers.write_inducer(inducer, filename, sep=',')[source]¶: Write an inducer in COO format.
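The sketch below shows one plausible coordinate (COO) text layout, with one `row,col,value` triple per nonzero entry, and why a reader of such a file needs the matrix size passed in separately. The exact on-disk layout used by pimkl is an assumption, and `write_coo` and `read_coo` are hypothetical stand-ins working on nested lists.

```python
import csv
import io


def write_coo(matrix, fp, sep=","):
    """Hypothetical writer: emit one row,col,value triple per nonzero entry."""
    writer = csv.writer(fp, delimiter=sep, lineterminator="\n")
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                writer.writerow([i, j, value])


def read_coo(fp, size, sep=","):
    """Hypothetical reader: rebuild a dense size x size matrix from triples."""
    # The triples alone do not reveal trailing all-zero rows or columns,
    # so the size has to be supplied by the caller.
    matrix = [[0.0] * size for _ in range(size)]
    for i, j, value in csv.reader(fp, delimiter=sep):
        matrix[int(i)][int(j)] = float(value)
    return matrix


buffer = io.StringIO()
write_coo([[0.0, 0.5, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]], buffer)
text = buffer.getvalue()
restored = read_coo(io.StringIO(text), size=3)
```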

pimkl.inducers.write_inducers(data, network, gene_sets, data_type, network_type, output_dir, selection_only=False, gene_set_type='')[source]¶: Write inducers and data for a specific data-network combination.

pimkl.inducers.write_preprocessed(data, data_name, network, network_name, gene_sets, gene_sets_name, output_dir)[source]¶: Write inducers and data for a specific data-network combination.

pimkl.network module¶

class pimkl.network.Network(graph, labels)[source]¶

Bases: object

get_laplacian(normed=True, return_diag=False)[source]¶

get_sub_network(labels)[source]¶

pimkl.network.filter_interaction_table_by_labels(interaction_table, labels)[source]¶

pimkl.network.force_undirected_coo_matrix_input(row, col, values)[source]¶

pimkl.network.generate_random_sets(number_of_sets, max_nodes, nodes_labels, number_of_nodes=None)[source]¶

pimkl.network.get_fantom5_network(fantom5_filename, **kwargs)[source]¶

pimkl.network.get_network_from_csv(filename, sep=',', **kwargs)[source]¶

pimkl.network.get_network_from_pandas_interactions_list(data, adjacency=False, threshold=None, force_undirected=True)[source]¶

pimkl.network.get_random_scale_free_interaction_df(nodes_labels, m=5)[source]¶

pimkl.network.get_string_network(string_filename, **kwargs)[source]¶

pimkl.network.get_unique_rows(matrix, return_index=True)[source]¶

pimkl.network.is_symmetric(m)[source]¶

pimkl.network.scale(array)[source]¶

pimkl.network.selected_set_to_weighted_adjacency(interaction_table, selected_set, all_nodes_labels)[source]¶

pimkl.pimkl module¶

Main module.

pimkl.run module¶

pimkl.run.fold_generator(number_of_folds, data, labels, max_per_class, transformer_class=<class 'pimkl.utils.preprocessing.standardizer.Standardizer'>)[source]¶: Generate class-balanced splits of data and labels.
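One way to obtain a class-balanced split is to cap the number of training samples drawn from each class, as sketched below. This is only an illustration: the helper `balanced_split` is hypothetical, and reading `max_per_class` as a per-class cap on training samples is an assumption about its meaning, not something this page states.

```python
import random
from collections import defaultdict


def balanced_split(labels, max_per_class=30, seed=0):
    """Hypothetical helper: at most max_per_class training indices per class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Assumption: samples beyond the cap go to the test set.
        train.extend(indices[:max_per_class])
        test.extend(indices[max_per_class:])
    return sorted(train), sorted(test)


toy_labels = ["a"] * 5 + ["b"] * 3
train_idx, test_idx = balanced_split(toy_labels, max_per_class=2)
```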

pimkl.run.run_model(inducers, induction_name, mkl_name, estimator_name, mkl_parameters, estimator_parameters, induction_parameters, inducers_extended_names, fold_parameters)[source]¶

Run a single fold of the model with data splits from fold_generator.

Arguments are those passed to PIMKL, followed by the inducer names and a dict containing the fold-specific arguments. In conjunction with partial and fold_generator, it can be used to run folds in parallel: `list(pool.imap(run_fold, fold_generator(...)))`
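The pattern can be sketched with toy stand-ins: bind the fold-independent arguments with functools.partial, then map the resulting one-argument callable over the generator. Everything prefixed with `toy_` is hypothetical and replaces the real run_model and fold_generator, whose argument values are not shown here. A thread-backed pool is used to keep the example self-contained; that choice is also an assumption.

```python
from functools import partial
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing API


def toy_fold_generator(number_of_folds):
    """Hypothetical stand-in yielding one dict of fold-specific arguments per fold."""
    for fold in range(number_of_folds):
        yield {"fold": fold, "train": [fold, fold + 1], "test": [fold + 2]}


def toy_run_model(model_name, scale, fold_parameters):
    """Hypothetical stand-in: fixed arguments first, fold parameters last."""
    score = scale * sum(fold_parameters["train"])
    return (model_name, fold_parameters["fold"], score)


# Bind everything except the fold-specific dict.
run_fold = partial(toy_run_model, "toy", 10)

with Pool(2) as pool:
    # imap yields results in the order the folds were generated.
    results = list(pool.imap(run_fold, toy_fold_generator(3)))
```

With a process-based multiprocessing.Pool, the same call works as long as the bound function and its arguments can be pickled.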

Module contents¶

Top-level package for pimkl.