Outlier detection using Hampel (2024)

In the realm of time series data analysis, the identification and handling of anomalies are crucial tasks. Anomalies, or outliers, are data points that deviate significantly from the expected patterns, potentially indicating errors, fraud, or valuable insights.

One effective technique for addressing this challenge is the Hampel Filter.

In this article, we will explore how to apply this outlier detection technique using my hampel library.

Let’s begin!

1. The Hampel Filter Demystified

The Hampel Filter is a robust method for detecting and handling outliers in time series data. It relies on the Median Absolute Deviation (MAD) and employs a rolling window for the identification of outliers. MAD is a robust measure of data dispersion, calculated as the median of the absolute deviations from the median value.
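
To make the MAD definition concrete, here is a minimal sketch that computes it with Python's standard library. The function name median_absolute_deviation and the sample values are made up for illustration; they are not part of the hampel package.

```python
from statistics import median, pstdev

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)

sample = [10, 11, 9, 10, 80]  # 80 is an obvious outlier
mad = median_absolute_deviation(sample)

print(mad)             # → 1
print(pstdev(sample))  # roughly 28, inflated by the single outlier
```

The single extreme value barely moves the MAD, while the standard deviation is dominated by it. This is why the MAD is called a robust measure of dispersion.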

Configuring the Hampel filter involves two parameters:

  • Window Size: This parameter determines the size of the moving window used to evaluate each data point. It essentially defines the scope within which we look for outliers.
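  • Threshold: This parameter sets how far a data point may deviate from the median of its window, measured in multiples of the MAD, before it is flagged as an outlier. Careful selection of the threshold is essential to avoid flagging valuable data as outliers.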
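
The way these two parameters interact can be sketched in a few lines of plain Python. This is an illustrative version only, not the library's implementation: the name hampel_sketch, the centred window, the truncation at the edges and the 1.4826 scale factor (which makes the MAD comparable to a standard deviation for normally distributed data) are all assumptions of this sketch.

```python
from statistics import median

GAUSSIAN_SCALE = 1.4826  # assumed MAD-to-sigma factor for normal data

def hampel_sketch(values, window_size=5, n_sigma=3.0):
    """Illustrative Hampel filter; returns (filtered_values, outlier_indices)."""
    half = window_size // 2
    filtered = list(values)
    outliers = []
    for i in range(len(values)):
        # Window centred on i, truncated at the edges of the series
        window = values[max(0, i - half): i + half + 1]
        center = median(window)
        mad = median(abs(v - center) for v in window)
        threshold = n_sigma * GAUSSIAN_SCALE * mad
        if abs(values[i] - center) > threshold:
            outliers.append(i)
            filtered[i] = center  # replace the outlier with the window median
    return filtered, outliers

series = [1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 0.8, 1.1, 1.0]
cleaned, flagged = hampel_sketch(series, window_size=5)

print(flagged)     # → [4]
print(cleaned[4])  # → 1.0
```

A larger window makes the median and MAD more stable but less sensitive to local changes, while a larger n_sigma widens the band of values considered normal.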

2. Hampel meets Python 🐍

To use the Hampel filter in your Python project, first install the package via pip:

pip install hampel

And import it in your Python script using:

from hampel import hampel

The hampel function has three available parameters:

  • data: The input 1-dimensional data to be filtered (pandas.Series or numpy.ndarray).
  • window_size (optional): The size of the moving window for outlier detection (default is 5).
  • n_sigma (optional): The number of standard deviations for outlier detection (default is 3.0). It is related to the threshold concept discussed in the previous section, i.e. by tuning this parameter we can have more or less tolerance to possible outliers.
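
To get a feel for how n_sigma changes the tolerance, the sketch below evaluates a single window by hand. It assumes the threshold is computed as n_sigma times the MAD scaled by 1.4826; the window values and the helper name flagged are hypothetical, and the library's exact formula may differ.

```python
from statistics import median

SCALE = 1.4826  # assumed MAD-to-sigma factor for normally distributed data

window = [10, 11, 9, 10, 13, 10, 9]  # one window of made-up readings
center = median(window)
mad = median(abs(v - center) for v in window)

def flagged(n_sigma):
    """Values in the window farther than n_sigma scaled MADs from the median."""
    threshold = n_sigma * SCALE * mad
    return [v for v in window if abs(v - center) > threshold]

print(flagged(2.0))  # → [13]
print(flagged(3.0))  # → []
```

With the stricter setting the reading 13 is flagged, while the default of 3.0 tolerates it.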

Now let’s generate synthetic data, in which we will introduce four outliers at positions 20, 40, 60, 80 (of course in real situations the problem will not be so easy, but it is a good example to understand how hampel works 😅).

import matplotlib.pyplot as plt
import numpy as np
from hampel import hampel

original_data = np.sin(np.linspace(0, 10, 100)) + np.random.normal(0, 0.1, 100)

# Add outliers to the original data
for index, value in zip([20, 40, 60, 80], [2.0, -1.9, 2.1, -0.5]):
    original_data[index] = value

Plotting original_data you should see something like this:

[Figure: plot of original_data]

It is very easy to detect the four outliers we have introduced visually, but let’s see if Hampel is also capable🤞.

result = hampel(original_data, window_size=10)

The hampel function returns a Result dataclass, which contains the following attributes:

  • filtered_data: The data with outliers replaced.
  • outlier_indices: Indices of the detected outliers.
  • medians: Median values within the sliding window.
  • median_absolute_deviations: Median Absolute Deviation (MAD) values within the sliding window.
  • thresholds: Threshold values for outlier detection.

We can access these attributes as simply as this:

filtered_data = result.filtered_data
outlier_indices = result.outlier_indices
medians = result.medians
mad_values = result.median_absolute_deviations
thresholds = result.thresholds
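
One way to read medians and thresholds together is as a per-point band of median ± threshold, with points outside the band being the candidates for outliers. The sketch below uses small hypothetical lists as stand-ins for original_data, result.medians and result.thresholds; whether the library uses a strict or inclusive comparison at the band edge is an assumption here.

```python
# Hypothetical stand-ins for original_data, result.medians and result.thresholds
data = [1.0, 1.1, 5.0, 0.9]
medians = [1.0, 1.1, 1.0, 1.0]
thresholds = [0.4, 0.4, 0.5, 0.4]

# Upper and lower limits of the band at each point
upper = [m + t for m, t in zip(medians, thresholds)]
lower = [m - t for m, t in zip(medians, thresholds)]

# Indices of points that fall outside the band
outside_band = [i for i, x in enumerate(data) if not lower[i] <= x <= upper[i]]

print(outside_band)  # → [2]
```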

If we now print, for example, filtered_data, we’ll have a cleaned version of original_data, that is, one with the detected outliers replaced.

[Figure: filtered_data, the cleaned series]

That’s really cool! Hampel managed to remove the outliers we added previously! 💪

However, we can take advantage of the information provided by hampel to design a much more interesting graph. In my case, I’ll draw the outliers as red dots and will also add a grey band representing the threshold used by the algorithm at each point. In addition, I’ll create another plot below the first one showing the filtered data.

This is very easy to do using matplotlib:

fig, axes = plt.subplots(2, 1, figsize=(8, 6))

# Plot the original data with the median +- threshold band in the first subplot
axes[0].plot(original_data, label='Original Data', color='b')
axes[0].fill_between(range(len(original_data)), medians + thresholds,
                     medians - thresholds, color='gray', alpha=0.5, label='Median +- Threshold')
axes[0].set_xlabel('Data Point')
axes[0].set_ylabel('Value')
axes[0].set_title('Original Data with Bands representing Upper and Lower limits')

for i in outlier_indices:
    axes[0].plot(i, original_data[i], 'ro', markersize=5)  # Mark as red

axes[0].legend()

# Plot the filtered data in the second subplot
axes[1].plot(filtered_data, label='Filtered Data', color='g')
axes[1].set_xlabel('Data Point')
axes[1].set_ylabel('Value')
axes[1].set_title('Filtered Data')
axes[1].legend()

# Adjust spacing between subplots
plt.tight_layout()

# Show the plots
plt.show()

After running the snippet, you should see this beautiful figure 😍.

[Figure: original data with outliers marked as red dots and the grey threshold band (top); filtered data (bottom)]

And just in case you want to copy-paste the full Python script… 👇👇👇

import matplotlib.pyplot as plt
import numpy as np
from hampel import hampel

original_data = np.sin(np.linspace(0, 10, 100)) + np.random.normal(0, 0.1, 100)

# Add outliers to the original data
for index, value in zip([20, 40, 60, 80], [2.0, -1.9, 2.1, -0.5]):
    original_data[index] = value

result = hampel(original_data, window_size=10)

filtered_data = result.filtered_data
outlier_indices = result.outlier_indices
medians = result.medians
thresholds = result.thresholds

fig, axes = plt.subplots(2, 1, figsize=(8, 6))

# Plot the original data with the median +- threshold band in the first subplot
axes[0].plot(original_data, label='Original Data', color='b')
axes[0].fill_between(range(len(original_data)), medians + thresholds,
                     medians - thresholds, color='gray', alpha=0.5, label='Median +- Threshold')
axes[0].set_xlabel('Data Point')
axes[0].set_ylabel('Value')
axes[0].set_title('Original Data with Bands representing Upper and Lower limits')

for i in outlier_indices:
    axes[0].plot(i, original_data[i], 'ro', markersize=5)  # Mark as red

axes[0].legend()

# Plot the filtered data in the second subplot
axes[1].plot(filtered_data, label='Filtered Data', color='g')
axes[1].set_xlabel('Data Point')
axes[1].set_ylabel('Value')
axes[1].set_title('Filtered Data')
axes[1].legend()

# Adjust spacing between subplots
plt.tight_layout()

# Show the plots
plt.show()

I hope this tutorial has been helpful in explaining how to apply hampel to clean our time series. If you are interested in seeing the details of the algorithm implementation (in my case it’s implemented using Cython), you are more than welcome to take a look at the repository 😛.

See you next time! 👋👋👋

Article information

Author: Nathanael Baumbach
