Data Science

Data analysis and machine learning.

372 resources, 29 categories

Data Manipulation (31 items)

Arctic

High-performance datastore for time series and tick data.

blaze

NumPy and pandas interface to Big Data.

cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

cuDF

GPU DataFrame Library.

dataprep

Collect, clean, and visualize your data in Python with a few lines of code.

Dataset

Helps you conveniently work with random or sequential batches of your data and define data processing.

datatable

Data.table for Python.

dopanda

Hints and tips for using pandas in an analysis environment.

Dplython

Dplyr for Python.

Hamilton

A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions.

meza

A Python toolkit for processing tabular data.

modin

Speed up your pandas workflows by changing a single line of code.

pandas

Powerful Python data analysis toolkit.

pandas-gbq

pandas integration with Google BigQuery.

pandas-log

A package that provides feedback about basic pandas operations and finds both business logic and performance issues.

pandas-ply

Functional data manipulation for pandas.

pandasql

Allows you to query pandas DataFrames using SQL syntax.

pandas_profiling

Create HTML profiling reports from pandas DataFrame objects.

pdpipe

Easy pipelines for pandas DataFrames.

polars

A fast multi-threaded, hybrid-out-of-core DataFrame library.

Prodmodel

Build system for data science pipelines.

pyjanitor

Clean APIs for data cleaning.

pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

sklearn-pandas

pandas integration with sklearn.

snorkel

A system for quickly generating training data with weak supervision.

SSPipe

Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.

swifter

A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.

vaex

Out-of-core DataFrames for Python and ML: visualize and explore big tabular data at a billion rows per second.

xarray

Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines.

xpandas

Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.

ydata-synthetic

A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models.

Deep Learning (31 items)

autograd

Efficiently computes derivatives of numpy code.

Caffe

A fast open framework for deep learning.

Catalyst

High-level utils for PyTorch DL & RL research.

ChemicalX

A PyTorch-based deep learning library for drug pair scoring.

Elephas

Distributed Deep learning with Keras & Spark.

FLAX

A neural network library for JAX that is designed for flexibility.

Hyperas

Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization.

ignite

High-level library to help with training neural networks in PyTorch.

JAX

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.

Keras

A high-level neural networks API running on top of TensorFlow.

keras-contrib

Keras community contributions.

Ludwig

A toolbox that allows one to train and test deep learning models without the need to write code.

Mesh TensorFlow

Model Parallelism Made Easier.

nnabla

Neural Network Libraries by Sony.

Optax

A gradient processing and optimization library for JAX.

Polyaxon

A platform that helps you build, manage and monitor deep learning models.

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration.

pytorch-lightning

PyTorch Lightning is just organized PyTorch.

qkeras

A quantization deep learning library.

skorch

A scikit-learn compatible neural network library that wraps PyTorch.

Sonnet

TensorFlow-based neural network library.

Tangent

Source-to-Source Debuggable Derivatives in Pure Python.

TensorFlow

Computation using data flow graphs for scalable machine learning by Google.

TensorFlow Fold

Deep learning with dynamic computation graphs in TensorFlow.

tensorflow-upstream

TensorFlow ROCm port.

TensorLayer

Deep Learning and Reinforcement Learning Library for Researchers and Engineers.

TensorLight

A high-level framework for TensorFlow.

tensorpack

A Neural Net Training Interface on TensorFlow.

tfdeploy

Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy.

TFLearn

Deep learning library featuring a higher-level API for TensorFlow.

transformers

State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Feature Engineering (16 items)

BoostARoota

A fast xgboost feature selection algorithm.

boruta_py

Implementations of the Boruta all-relevant feature selection method.

dirty_cat

Machine learning on dirty tabular data (especially string-based variables for classification and regression).

Feature Engine

Feature engineering package with sklearn-like functionality.

Feature Forge

A set of tools for creating and testing machine learning features.

Featuretools

Automated feature engineering.

few

A feature engineering wrapper for sklearn.

NitroFE

Moving window features.

OpenFE

Automated feature generation with expert-level performance.

scikit-feature

Feature selection repository in Python.

scikit-mdr

A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.

scikit-rebate

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

sk-transformer

A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps.

skl-groups

A scikit-learn addon to operate on set/"group"-based features.

tsfresh

Automatic extraction of relevant features from time series.

zoofs

A feature selection library based on evolutionary algorithms.

Machine Learning (41 items)

CatBoost

An open-source gradient boosting on decision trees library.

causalml

Uplift modeling and causal inference with machine learning algorithms.

cuML

RAPIDS Machine Learning Library.

dlib

Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).

fastFM

A library for Factorization Machines.

hyperlearn

A rewrite of Sklearn and Statsmodels that is 50%+ faster, uses 50%+ less RAM, and supports GPUs.

imbalanced-algorithms

Python-based implementations of algorithms for learning on imbalanced data.

imbalanced-learn

Module to perform under-sampling and over-sampling with various techniques.

LightGBM

A fast, distributed, high-performance gradient boosting framework.

liquidSVM

An implementation of SVMs.

metric-learn

Metric learning algorithms in Python.

ML-Ensemble

High performance ensemble learning.

mlpack

A scalable C++ machine learning library (Python bindings).

MLxtend

Extension and helper modules for Python's data analysis and machine learning libraries.

modAL

Modular active learning framework for Python3.

NGBoost

Natural Gradient Boosting for Probabilistic Prediction.

PyCaret

An open-source, low-code machine learning library in Python.

pyFM

Factorization machines in Python.

pyGAM

Generalized Additive Models in Python.

pystruct

Simple structured learning framework for Python.

Reproducible Experiment Platform (REP)

Machine Learning toolbox for Humans.

rgf_python

Python Wrapper of Regularized Greedy Forest.

rpforest

A forest of random projection trees.

RuleFit

Implementation of the RuleFit algorithm.

scikit-learn

Machine learning in Python.

scikit-multilearn

Multi-label classification for Python.

scikit-rvm

Relevance Vector Machine implementation using the scikit-learn API.

seqlearn

Sequence classification toolkit for Python.

Shogun

Machine learning toolbox.

sklearn-expertsys

Highly interpretable classifiers for scikit-learn.

sklearn-random-bits-forest

Wrapper of the Random Bits Forest program written by Wang et al. (2016).

Sparkit-learn

PySpark + scikit-learn = Sparkit-learn.

stacked_generalization

Library for machine learning stacked generalization.

Stacking

Simple and useful stacking library written in Python.

TensorFlow Decision Forests

A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.

tffm

TensorFlow implementation of an arbitrary order Factorization Machine.

ThunderGBM

Fast GBDTs and Random Forests on GPUs.

ThunderSVM

A fast SVM Library on GPUs and CPUs.

vecstack

Python package for stacking (machine learning technique).

XGBoost

Scalable, Portable, and Distributed Gradient Boosting.

xLearn

High Performance, Easy-to-use, and Scalable Machine Learning Package.

Model Explanation (26 items)

aequitas

Bias and Fairness Audit Toolkit.

AI Explainability 360

Interpretability and explainability of data and machine learning models.

Alibi

Algorithms for monitoring and explaining machine learning models.

anchor

Code for "High-Precision Model-Agnostic Explanations" paper.

Auralisation

Auralisation of learned features in CNN (for audio).

CapsNet-Visualization

A visualization of the CapsNet layers to better understand how it works.

Contrastive Explanation

Contrastive Explanation (Foil Trees).

dalex

moDel Agnostic Language for Exploration and explanation.

ELI5

A library for debugging/inspecting machine learning classifiers and explaining their predictions.

FairML

A Python toolbox for auditing machine learning models for bias.

FlashLight

Visualization tool for your neural network.

L2X

Code for replicating the experiments in the paper "Learning to Explain: An Information-Theoretic Perspective on Model Interpretation".

Lime

Explaining the predictions of any machine learning classifier.

lucid

A collection of infrastructure and tools for research in neural network interpretability.

model-analysis

Model analysis tools for TensorFlow.

Netron

Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).

PDPbox

Partial dependence plot toolbox.

PyCEbox

Python Individual Conditional Expectation Plot Toolbox.

scikit-plot

An intuitive library to add plotting functionality to scikit-learn objects.

shap

A unified approach to explain the output of any machine learning model.

Shapley

A data-driven framework to quantify the value of classifiers in a machine learning ensemble.

Skater

Python Library for Model Interpretation.

tensorboard-pytorch

TensorBoard for PyTorch (and Chainer, MXNet, NumPy, ...).

themis-ml

A library that implements fairness-aware machine learning algorithms.

treeinterpreter

Interpreting scikit-learn's decision tree and random forest predictions.

yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.

Optimization (24 items)

Bayesian Optimization

A Python implementation of global optimization with Gaussian processes.

BoTorch

Bayesian optimization in PyTorch.

GPflowOpt

Bayesian Optimization using GPflow.

hyperopt

Distributed Asynchronous Hyperparameter Optimization in Python.

hyperopt-sklearn

Hyper-parameter optimization for sklearn.

nlopt

Library for nonlinear optimization (global and local, constrained or unconstrained).

Optuna

A hyperparameter optimization framework.

Optunity

A library containing various optimizers for hyperparameter tuning.

OR-Tools

An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.

Platypus

A Free and Open Source Python Library for Multiobjective Optimization.

POT

Python Optimal Transport library.

pycma

Python implementation of CMA-ES.

pymoo

Multi-objective Optimization in Python.

PySwarms

A research toolkit for particle swarm optimization in Python.

SafeOpt

Safe Bayesian Optimization.

scikit-opt

Heuristic Algorithms for optimization.

scikit-optimize

Sequential model-based optimization with a `scipy.optimize` interface.

sigopt_sklearn

SigOpt wrappers for scikit-learn methods.

sklearn-deap

Use evolutionary algorithms instead of gridsearch in scikit-learn.

sklearn-genetic-opt

Hyperparameter tuning and feature selection using evolutionary algorithms.

SMAC3

Sequential Model-based Algorithm Configuration.

Solid

A comprehensive gradient-free optimization framework written in Python.

Spearmint

Bayesian optimization.

Talos

Hyperparameter Optimization for Keras Models.

Reinforcement Learning (24 items)

Acme

A library of reinforcement learning components and agents.

Catalyst-RL

PyTorch framework for RL research.

cleanrl

High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG).

d3rlpy

An offline deep reinforcement learning library.

DI-engine

OpenDILab Decision AI Engine.

Dopamine

A research framework for fast prototyping of reinforcement learning algorithms.

EnvPool

C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.

garage

A toolkit for reproducible reinforcement learning research.

Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym).

Horizon

A platform for Applied Reinforcement Learning.

Imitation

Clean PyTorch implementations of imitation and reward learning algorithms.

keras-rl

Deep Reinforcement Learning for Keras.

Machin

A reinforcement learning library designed for PyTorch.

MAgent2

An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments.

PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities.

RLlib

Scalable Reinforcement Learning.

rlpyt

Reinforcement Learning in PyTorch.

Shimmy

An API conversion tool for popular external reinforcement learning environments.

SKRL

Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym.

Stable Baselines3

A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.

TensorForce

A TensorFlow library for applied reinforcement learning.

TF-Agents

A library for Reinforcement Learning in TensorFlow.

Tianshou

An elegant PyTorch deep reinforcement learning library.

TRFL

TensorFlow Reinforcement Learning.

Visualization (19 items)

Altair

Declarative statistical visualization library for Python. Many data transformations can be done within the plotting code itself to create a graph.

animatplot

A Python package for animating plots built on matplotlib.

AutoViz

Visualize data automatically with 1 line of code (ideal for machine learning).

Bokeh

Interactive Web Plotting for Python.

bqplot

Plotting library for IPython/Jupyter notebooks.

chartify

Python library that makes it easy for data scientists to create charts.

folium

Makes it easy to visualize data on an interactive open street map.

geemap

Python package for interactive mapping with Google Earth Engine (GEE).

HoloViews

Stop plotting your data - annotate your data and let it visualize itself.

Matplotlib

Plotting with Python.

missingno

Missing data visualization module for Python.

physt

Improved histograms.

plotly

A Python library that makes interactive and publication-quality graphs.

prettyplotlib

Painlessly create beautiful matplotlib plots.

pyecharts

A Python interactive charting library ported from ECharts, a charting and visualization library.

pyLDAvis

Visualize interactive topic models.

python-ternary

Ternary plotting library for Python with matplotlib.

seaborn

Statistical data visualization using matplotlib.

SweetViz

Visualize and compare datasets, target values and associations, with one line of code.
