###################################
 Welcome to PyThresh Documentation
###################################

**Deployment, Stats, & License**

|badge_pypi| |badge_anaconda| |badge_docs| |badge_testing|
|badge_coverage| |badge_maintainability| |badge_stars| |badge_downloads|
|badge_versions| |badge_licence| |badge_citation|

.. |badge_pypi| image:: https://img.shields.io/pypi/v/pythresh.svg?color=brightgreen&logo=pypi&logoColor=white
   :alt: PyPI version
   :target: https://pypi.org/project/pythresh/

.. |badge_anaconda| image:: https://img.shields.io/conda/vn/conda-forge/pythresh?color=brightgreen&logo=conda-forge&logoColor=white
   :alt: Anaconda version
   :target: https://anaconda.org/conda-forge/pythresh

.. |badge_docs| image:: https://img.shields.io/readthedocs/pythresh.svg?version=latest&logo=read-the-docs&logoColor=white
   :alt: Documentation status
   :target: http://pythresh.readthedocs.io/?badge=latest

.. |badge_testing| image:: https://github.com/KulikDM/pythresh/actions/workflows/ci.yml/badge.svg
   :alt: testing
   :target: https://github.com/KulikDM/pythresh/actions/workflows/ci.yml

.. |badge_coverage| image:: https://codecov.io/gh/KulikDM/pythresh/branch/main/graph/badge.svg?token=8ZAPXTLW9Y
   :alt: Codecov
   :target: https://codecov.io/gh/KulikDM/pythresh

.. |badge_maintainability| image:: https://api.codeclimate.com/v1/badges/3e2de42b48701c731ef6/maintainability
   :alt: Maintainability
   :target: https://codeclimate.com/github/KulikDM/pythresh/maintainability

.. |badge_stars| image:: https://img.shields.io/github/stars/KulikDM/pythresh.svg?logo=github&logoColor=white&style=flat
   :alt: GitHub stars
   :target: https://github.com/KulikDM/pythresh/stargazers

.. |badge_downloads| image:: https://img.shields.io/badge/dynamic/xml?url=https%3A%2F%2Fstatic.pepy.tech%2Fbadge%2Fpythresh&query=%2F%2F*%5Blocal-name()%20%3D%20%27text%27%5D%5Blast()%5D&logo=data%3Aimage%2Fsvg%2Bxml%3Bbase64%2CPHN2ZyBzdHlsZT0iZW5hYmxlLWJhY2tncm91bmQ6bmV3IDAgMCAyNCAyNDsiIHZlcnNpb249IjEuMSIgdmlld0JveD0iMCAwIDI0IDI0IiB4bWw6c3BhY2U9InByZXNlcnZlIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIj48ZyBpZD0iaW5mbyIvPjxnIGlkPSJpY29ucyI%2BPGcgaWQ9InNhdmUiPjxwYXRoIGQ9Ik0xMS4yLDE2LjZjMC40LDAuNSwxLjIsMC41LDEuNiwwbDYtNi4zQzE5LjMsOS44LDE4LjgsOSwxOCw5aC00YzAsMCwwLjItNC42LDAtN2MtMC4xLTEuMS0wLjktMi0yLTJjLTEuMSwwLTEuOSwwLjktMiwyICAgIGMtMC4yLDIuMywwLDcsMCw3SDZjLTAuOCwwLTEuMywwLjgtMC44LDEuNEwxMS4yLDE2LjZ6IiBmaWxsPSIjZWJlYmViIi8%2BPHBhdGggZD0iTTE5LDE5SDVjLTEuMSwwLTIsMC45LTIsMnYwYzAsMC42LDAuNCwxLDEsMWgxNmMwLjYsMCwxLTAuNCwxLTF2MEMyMSwxOS45LDIwLjEsMTksMTksMTl6IiBmaWxsPSIjZWJlYmViIi8%2BPC9nPjwvZz48L3N2Zz4%3D&label=downloads
   :alt: Downloads
   :target: https://pepy.tech/project/pythresh

.. |badge_versions| image:: https://img.shields.io/pypi/pyversions/pythresh.svg?logo=python&logoColor=white
   :alt: Python versions
   :target: https://pypi.org/project/pythresh/

.. |badge_licence| image:: https://img.shields.io/github/license/KulikDM/pythresh.svg?logo=data:image/svg+xml;base64,PHN2ZyBoZWlnaHQ9IjMyIiBpZD0iaWNvbiIgdmlld0JveD0iMCAwIDMyIDMyIiB3aWR0aD0iMzIiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+PGRlZnMgZmlsbD0iI2ViZjJlZSI+PHN0eWxlPgogICAgICAuY2xzLTEgewogICAgICAgIGZpbGw6IG5vbmU7CiAgICAgIH0KICAgIDwvc3R5bGU+PC9kZWZzPjxyZWN0IGhlaWdodD0iMiIgd2lkdGg9IjEyIiB4PSI4IiB5PSI2IiBmaWxsPSIjZWJmMmVlIi8+PHJlY3QgaGVpZ2h0PSIyIiB3aWR0aD0iMTIiIHg9IjgiIHk9IjEwIiBmaWxsPSIjZWJmMmVlIi8+PHJlY3QgaGVpZ2h0PSIyIiB3aWR0aD0iNiIgeD0iOCIgeT0iMTQiIGZpbGw9IiNlYmYyZWUiLz48cmVjdCBoZWlnaHQ9IjIiIHdpZHRoPSI0IiB4PSI4IiB5PSIyNCIgZmlsbD0iI2ViZjJlZSIvPjxwYXRoIGQ9Ik0yOS43MDcsMTkuMjkzbC0zLTNhLjk5OTQuOTk5NCwwLDAsMC0xLjQxNCwwTDE2LDI1LjU4NTlWMzBoNC40MTQxbDkuMjkyOS05LjI5M0EuOTk5NC45OTk0LDAsMCwwLDI5LjcwNywxOS4yOTNaTTE5LjU4NTksMjhIMThWMjYuNDE0MWw1LTVMMjQuNTg1OSwyM1pNMjYsMjEuNTg1OSwyNC40MTQxLDIwLDI2LDE4LjQxNDEsMjcuNTg1OSwyMFoiIGZpbGw9IiNlYmYyZWUiLz48cGF0aCBkPSJNMTIsMzBINmEyLjAwMjEsMi4wMDIxLDAsMCwxLTItMlY0QTIuMDAyMSwyLjAwMjEsMCwwLDEsNiwySDIyYTIuMDAyMSwyLjAwMjEsMCwwLDEsMiwyVjE0SDIyVjRINlYyOGg2WiIgZmlsbD0iI2ViZjJlZSIvPjxyZWN0IGNsYXNzPSJjbHMtMSIgZGF0YS1uYW1lPSImbHQ7VHJhbnNwYXJlbnQgUmVjdGFuZ2xlJmd0OyIgaGVpZ2h0PSIzMiIgaWQ9Il9UcmFuc3BhcmVudF9SZWN0YW5nbGVfIiB3aWR0aD0iMzIiIGZpbGw9IiNlYmYyZWUiLz48L3N2Zz4=
   :alt: License
   :target: https://github.com/KulikDM/pythresh/blob/main/LICENSE

.. |badge_citation| image:: https://zenodo.org/badge/497683169.svg
   :alt: Zenodo DOI
   :target: https://zenodo.org/badge/latestdoi/497683169

----

PyThresh is a comprehensive and scalable **Python toolkit** for
**thresholding outlier detection likelihood scores** in
univariate/multivariate data. It was written to work in tandem with
PyOD and uses similar syntax and data structures, but it is not
limited to that single library. PyThresh thresholds the likelihood
scores generated by an outlier detector, which removes the need to set
a contamination level or to guess beforehand how many outliers the
dataset contains. Its non-parametric methods are designed to reduce
user input and guesswork by relying on statistics to threshold the
outlier likelihood scores. For thresholding to be applied correctly,
the outlier detection likelihood scores must follow one rule: the
higher the score, the higher the probability that the point is an
outlier in the dataset. All threshold functions return a binary array
in which inliers are represented by 0 and outliers by 1.

PyThresh includes more than 30 thresholding algorithms. These algorithms
range from using simple statistical analysis like the Z-score to more
complex mathematical methods that involve graph theory and topology.
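
As a rough illustration of the simplest end of that range, the sketch
below applies a Z-score rule to a handful of made-up scores using only
the standard library. The function name ``zscore_labels``, the
``cutoff`` of 2.0, and the sample scores are assumptions for
illustration; this is not PyThresh's ``ZSCORE`` implementation.

.. code:: python

   from statistics import mean, pstdev


   def zscore_labels(scores, cutoff=2.0):
       """Label a score 1 (outlier) when its z-score exceeds ``cutoff``."""
       mu = mean(scores)
       sigma = pstdev(scores)
       if sigma == 0:
           # All scores are identical, so nothing stands out.
           return [0 for _ in scores]
       return [1 if (s - mu) / sigma > cutoff else 0 for s in scores]


   # Hypothetical likelihood scores: higher means more outlying.
   scores = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12, 0.10, 0.11, 0.95]
   labels = zscore_labels(scores)

Only the last score lies more than two standard deviations above the
mean, so it alone receives the label 1.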

**API Demo**:

.. code:: python

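   # train the KNN detector
   import numpy as np
   from pyod.models.knn import KNN
   from pythresh.thresholds.clust import CLUST

   # placeholder training data (an assumption for this demo):
   # any 2D array of shape (n_samples, n_features) works here
   X_train = np.random.default_rng(42).normal(size=(200, 2))

   clf = KNN()
   clf.fit(X_train)

   # get outlier likelihood scores
   decision_scores = clf.decision_scores_

   # get outlier labels
   thres = CLUST()
   thres.fit(decision_scores)

   labels = thres.labels_  # or thres.predict(decision_scores)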

----

**************************
 Benchmarking & Utilities
**************************

All thresholders have been benchmarked. The ``MIXMOD`` thresholder
performed best, while the ``CLF`` thresholder showed the smallest
uncertainty about its mean and was the most robust (its least accurate
prediction was the best among the thresholders). However, for
interpretability and general performance, the ``MIXMOD``, ``FILTER``,
and ``META`` thresholders are good fits.

Further utilities are available to assist in selecting the most
suitable outlier detection and thresholding methods (see `ranking
<https://pythresh.readthedocs.io/en/latest/ranking.html>`_) and in
determining the confidence in the selected thresholding method (see
`thresholding confidence
<https://pythresh.readthedocs.io/en/latest/confidence.html>`_).

----

************************
 External Feature Cases
************************

**Towards Data Science**: `Thresholding Outlier Detection Scores with
PyThresh
<https://towardsdatascience.com/thresholding-outlier-detection-scores-with-pythresh-f26299d14fa>`_

**Towards Data Science**: `When Outliers are Significant: Weighted
Linear Regression
<https://towardsdatascience.com/when-outliers-are-significant-weighted-linear-regression-bcdc8389ab10>`_

**ArXiv**: `Estimating the Contamination Factor's Distribution in
Unsupervised Anomaly Detection <https://arxiv.org/abs/2210.10487>`_

----

***********************************
 Available Thresholding Algorithms
***********************************

+-----------+----------------------------------------------------------------+-----------------------------------+
| Abbr      | Description                                                    | References                        |
+===========+================================================================+===================================+
| AUCP      | Area Under Curve Percentage                                    | :cite:`ren2018aucp`               |
+-----------+----------------------------------------------------------------+-----------------------------------+
| BOOT      | Bootstrapping                                                  | :cite:`martin2006boot`            |
+-----------+----------------------------------------------------------------+-----------------------------------+
| CHAU      | Chauvenet's Criterion                                          | :cite:`bolshev2016chau`           |
+-----------+----------------------------------------------------------------+-----------------------------------+
| CLF       | Trained Linear Classifier                                      | :cite:`aggarwal2017clf`           |
+-----------+----------------------------------------------------------------+-----------------------------------+
| CLUST     | Clustering Based                                               | :cite:`klawonn2008clust`          |
+-----------+----------------------------------------------------------------+-----------------------------------+
| CPD       | Change Point Detection                                         | :cite:`fearnhead2016cpd`          |
+-----------+----------------------------------------------------------------+-----------------------------------+
| DECOMP    | Decomposition                                                  | :cite:`boente2002decomp`          |
+-----------+----------------------------------------------------------------+-----------------------------------+
| DSN       | Distance Shift from Normal                                     | :cite:`amagata2021dsn`            |
+-----------+----------------------------------------------------------------+-----------------------------------+
| EB        | Elliptical Boundary                                            | :cite:`friendly2013eb`            |
+-----------+----------------------------------------------------------------+-----------------------------------+
| FGD       | Fixed Gradient Descent                                         | :cite:`qi2021fgd`                 |
+-----------+----------------------------------------------------------------+-----------------------------------+
| FILTER    | Filtering Based                                                | :cite:`hashemi2019filter`         |
+-----------+----------------------------------------------------------------+-----------------------------------+
| FWFM      | Full Width at Full Minimum                                     | :cite:`joneidi2013fwfm`           |
+-----------+----------------------------------------------------------------+-----------------------------------+
| GAMGMM    | Bayesian Gamma GMM                                             | :cite:`perini2023gamgmm`          |
+-----------+----------------------------------------------------------------+-----------------------------------+
| GESD      | Generalized Extreme Studentized Deviate                        | :cite:`alrawashdeh2021gesd`       |
+-----------+----------------------------------------------------------------+-----------------------------------+
| HIST      | Histogram Based                                                | :cite:`thanammal2015hist`         |
+-----------+----------------------------------------------------------------+-----------------------------------+
| IQR       | Inter-Quartile Region                                          | :cite:`bardet2015iqr`             |
+-----------+----------------------------------------------------------------+-----------------------------------+
| KARCH     | Karcher mean (Riemannian Center of Mass)                       | :cite:`afsari2011karch`           |
+-----------+----------------------------------------------------------------+-----------------------------------+
| MAD       | Median Absolute Deviation                                      | :cite:`archana2015mad`            |
+-----------+----------------------------------------------------------------+-----------------------------------+
| MCST      | Monte Carlo Shapiro Tests                                      | :cite:`coin2008mcst`              |
+-----------+----------------------------------------------------------------+-----------------------------------+
| META      | Metamodel Trained Classifier                                   | :cite:`zhao2022meta`              |
+-----------+----------------------------------------------------------------+-----------------------------------+
| MIXMOD    | Normal & Non-Normal Mixture Models                             | :cite:`veluw2023mixmod`           |
+-----------+----------------------------------------------------------------+-----------------------------------+
| MOLL      | Friedrichs' Mollifier                                          | :cite:`keyzer1997moll`            |
+-----------+----------------------------------------------------------------+-----------------------------------+
| MTT       | Modified Thompson Tau Test                                     | :cite:`rengasamy2020mtt`          |
+-----------+----------------------------------------------------------------+-----------------------------------+
| OCSVM     | One-Class Support Vector Machine                               | :cite:`barbado2022ocsvm`          |
+-----------+----------------------------------------------------------------+-----------------------------------+
| QMCD      | Quasi-Monte Carlo Discrepancy                                  | :cite:`iouchtchenko2019qmcd`      |
+-----------+----------------------------------------------------------------+-----------------------------------+
| REGR      | Regression Based                                               | :cite:`aggarwal2017clf`           |
+-----------+----------------------------------------------------------------+-----------------------------------+
| VAE       | Variational Autoencoder                                        | :cite:`xiao2020vae`               |
+-----------+----------------------------------------------------------------+-----------------------------------+
| WIND      | Topological Winding Number                                     | :cite:`jacobson2013wind`          |
+-----------+----------------------------------------------------------------+-----------------------------------+
| YJ        | Yeo-Johnson Transformation                                     | :cite:`raymaekers2021yj`          |
+-----------+----------------------------------------------------------------+-----------------------------------+
| ZSCORE    | Z-score                                                        | :cite:`bagdonavicius2020zscore`   |
+-----------+----------------------------------------------------------------+-----------------------------------+
| COMB      | Thresholder Combination                                        |                                   |
+-----------+----------------------------------------------------------------+-----------------------------------+
| DUMMY     | Dummy Percentile Based                                         |                                   |
+-----------+----------------------------------------------------------------+-----------------------------------+
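
The difference between a percentile cut, which needs a guessed
contamination level, and a statistic-driven cut can be sketched in
plain Python. The helper names, the ``contamination`` and ``k``
defaults, and the sample scores are hypothetical; the code only
illustrates the two ideas and does not reproduce the ``DUMMY`` or
``IQR`` thresholders.

.. code:: python

   from statistics import quantiles


   def percentile_labels(scores, contamination=0.1):
       """Flag the top ``contamination`` fraction of scores (user guess)."""
       n_outliers = max(1, round(contamination * len(scores)))
       cutoff = sorted(scores)[-n_outliers]
       return [1 if s >= cutoff else 0 for s in scores]


   def iqr_labels(scores, k=1.5):
       """Flag scores above Q3 + k * IQR (cutoff derived from the data)."""
       q1, _, q3 = quantiles(scores, n=4)
       upper = q3 + k * (q3 - q1)
       return [1 if s > upper else 0 for s in scores]


   # Hypothetical likelihood scores with two clearly elevated values.
   scores = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12, 0.10, 0.11, 0.80, 0.95]
   guessed = percentile_labels(scores)
   data_driven = iqr_labels(scores)

With these scores, a guessed contamination of 10% flags only the
largest value, while the quartile-based fence flags both 0.80 and 0.95.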

**Tutorial Notebooks**

+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+
| Notebook                                                                                                          | Description                                                                                         |
+===================================================================================================================+=====================================================================================================+
| `Introduction <https://github.com/KulikDM/pythresh/tree/main/notebooks/00_Introduction.ipynb>`_                   | Basic intro into outlier thresholding                                                               |
+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+
| `Advanced Thresholding <https://github.com/KulikDM/pythresh/tree/main/notebooks/01_Advanced.ipynb>`_              | Additional thresholding options for more advanced use                                               |
+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+
| `Threshold Confidence <https://github.com/KulikDM/pythresh/tree/main/notebooks/02_Confidence.ipynb>`_             | Calculating the confidence levels around the threshold point                                        |
+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+
| `Outlier Ranking <https://github.com/KulikDM/pythresh/tree/main/notebooks/03_Ranking.ipynb>`_                     | Assisting in selecting the best performing outlier and thresholding method combo using ranking      |
+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+


**A comparison of the implemented models** is shown below:

.. thumbnail:: figs/All.png
   :alt: Comparison of selected models

############################
 API Cheatsheet & Reference
############################

The following APIs are available on all thresholders for ease of use.

-  :func:`pythresh.thresholds.base.BaseThresholder.eval`: evaluate a single
   outlier or multiple outlier detection likelihood score set (Legacy method).

-  :func:`pythresh.thresholds.base.BaseThresholder.fit`: fit a
   thresholder for a single outlier or multiple outlier detection
   likelihood score set.

-  :func:`pythresh.thresholds.base.BaseThresholder.predict`: predict the
   binary labels using the fitted thresholder on a single outlier or
   multiple outlier detection likelihood score set.

Key attributes of a fitted thresholder:

-  :attr:`pythresh.thresholds.base.BaseThresholder.thresh_`: Return the
   threshold value that separates inliers from outliers. All values above
   this threshold are considered outliers. Note that the threshold
   value is derived from likelihood scores normalized between 0 and 1.

-  :attr:`pythresh.thresholds.base.BaseThresholder.labels_`: Return a binary
   array of labels for the fitted thresholder on the fitted dataset.

-  :attr:`pythresh.thresholds.base.BaseThresholder.confidence_interval_`:
   Return the lower and upper confidence interval of the contamination level.
   Only applies to the COMB thresholder.

-  :attr:`pythresh.thresholds.base.BaseThresholder.dscores_`: 1D array of the
   TruncatedSVD decomposed decision scores if multiple outlier detector score
   sets are passed.

-  :attr:`pythresh.thresholds.mixmod.MIXMOD.mixture_`: fitted mixture model class
   of the selected model used for thresholding. Only applies to MIXMOD. Attributes
   include: components, weights, params. Functions include: fit, loglikelihood,
   pdf, and posterior.
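
Because the threshold value refers to scores normalized between 0 and 1,
comparing it against raw detector scores requires normalizing them
first. The sketch below assumes min-max scaling, which is an assumption
made here for illustration (this page states only the target range);
the helper names and the value 0.5 are hypothetical stand-ins.

.. code:: python

   def min_max_normalize(scores):
       """Rescale scores linearly so the smallest is 0 and the largest is 1."""
       lo, hi = min(scores), max(scores)
       if hi == lo:
           # Constant scores carry no ranking information.
           return [0.0 for _ in scores]
       return [(s - lo) / (hi - lo) for s in scores]


   def apply_threshold(scores, thresh):
       """Label normalized scores above ``thresh`` as outliers (1)."""
       return [1 if s > thresh else 0 for s in min_max_normalize(scores)]


   raw_scores = [2.0, 3.0, 2.5, 4.0, 12.0]
   thresh = 0.5  # hypothetical stand-in for a fitted threshold value
   labels = apply_threshold(raw_scores, thresh)

Here only the score 12.0, which maps to 1.0 after scaling, exceeds the
threshold.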

----

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Getting Started

   install
   example
   benchmark
   ranking
   confidence

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Documentation

   api_cc
   pythresh

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Additional Information

   FAQ

----

.. rubric:: References

.. bibliography::
   :cited:
   :labelprefix:
   :keyprefix: a-