An Introduction to Sequential Pattern Mining (2024)

In this blog post, I will give an introduction to sequential pattern mining, an important data mining task with a wide range of applications from text analysis to market basket analysis. This blog post is intended as a short introduction. If you want to read a more detailed introduction to sequential pattern mining, you can read a survey paper that I recently wrote on this topic.

What is sequential pattern mining?

Data mining consists of extracting information from data stored in databases to understand the data and/or make decisions. Some of the most fundamental data mining tasks are clustering, classification, outlier analysis, and pattern mining. Pattern mining consists of discovering interesting, useful, and unexpected patterns in databases. Various types of patterns can be discovered in databases, such as frequent itemsets, associations, subgraphs, sequential rules, and periodic patterns.

Sequential pattern mining is a data mining task specialized for analyzing sequential data to discover sequential patterns. More precisely, it consists of discovering interesting subsequences in a set of sequences, where the interestingness of a subsequence can be measured in terms of various criteria such as its occurrence frequency, length, and profit. Sequential pattern mining has numerous real-life applications because data is naturally encoded as sequences of symbols in many fields such as bioinformatics, e-learning, market basket analysis, text analysis, and webpage click-stream analysis.

I will now explain the task of sequential pattern mining with an example. Consider the following sequence database, representing the purchases made by customers in a retail store.

[Figure: the example sequence database, containing four customer sequences]

This database contains four sequences. Each sequence represents the items purchased by a customer at different times. A sequence is an ordered list of itemsets (sets of items bought together). For example, in this database, the first sequence (SID 1) indicates that a customer bought items a and b together, then purchased an item c, then purchased items f and g together, then purchased an item g, and then finally purchased an item e.

Traditionally, sequential pattern mining is used to find subsequences that appear often in a sequence database, i.e. that are common to several sequences. Those subsequences are called the frequent sequential patterns. For example, in the context of our example, sequential pattern mining can be used to find the sequences of items frequently bought by customers. This can be useful for understanding the behavior of customers and making marketing decisions.

To do sequential pattern mining, a user must provide a sequence database and specify a parameter called the minimum support threshold. This parameter indicates the minimum number of sequences in which a pattern must appear to be considered frequent and be shown to the user. For example, if a user sets the minimum support threshold to 2 sequences, the task of sequential pattern mining consists of finding all subsequences appearing in at least 2 sequences of the input database. In the example database, many subsequences meet this requirement. Some of these sequential patterns are shown in the table below, where the number of sequences containing each pattern (called the support) is indicated in the right column of the table.

Note that the pattern <{a}, {f, g}> could also be put in this table, as well as the patterns <{f, g}, {e}>, <{a}, {f, g}, {e}> and <{b}, {f, g}, {e}> with support = 2 … (2020/03)

For example, the patterns <{a}> and <{a}, {g}> are frequent and have a support of 3 and 2 sequences, respectively. In other words, these patterns appear in 3 and 2 sequences of the input database, respectively. The pattern <{a}> appears in sequences 1, 2 and 3, while the pattern <{a}, {g}> appears in sequences 1 and 3. These patterns are interesting as they represent some behavior common to several customers. Of course, this is a toy example. Sequential pattern mining can actually be applied to databases containing hundreds of thousands of sequences.
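The definitions of support and minimum support threshold can be sketched in a few lines of Python. This is a minimal, brute-force illustration and not one of the published algorithms. Only the first sequence (SID 1) is spelled out in the text above; the other three sequences are assumed for illustration, chosen so that the supports quoted in this post hold. The names `DATABASE`, `contains`, `support` and `mine` are hypothetical.

```python
# Toy sequence database: SID -> ordered list of itemsets.
# SID 1 is the sequence described in the post; SIDs 2-4 are ASSUMED for illustration.
DATABASE = {
    1: [{"a", "b"}, {"c"}, {"f", "g"}, {"g"}, {"e"}],
    2: [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    3: [{"a"}, {"b"}, {"f", "g"}, {"e"}],
    4: [{"b"}, {"f", "g"}],
}


def contains(sequence, pattern):
    """True if each itemset of `pattern` is included in a distinct itemset
    of `sequence`, in the same order (greedy left-to-right matching)."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def support(pattern, database=DATABASE):
    """Number of sequences of the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database.values())


def mine(database, minsup):
    """Naive depth-first enumeration of all patterns with support >= minsup."""
    if minsup < 1:
        raise ValueError("minsup must be at least 1")
    items = sorted({i for seq in database.values() for s in seq for i in s})
    frequent = {}

    def grow(pattern):
        # Extension 1: append a new itemset holding a single item.
        candidates = [pattern + [frozenset([x])] for x in items]
        # Extension 2: add a larger item to the last itemset of the pattern.
        if pattern:
            last = pattern[-1]
            candidates += [pattern[:-1] + [last | {x}]
                           for x in items if x > max(last)]
        for cand in candidates:
            sup = support(cand, database)
            if sup >= minsup:  # an infrequent pattern cannot be extended
                frequent[tuple(cand)] = sup
                grow(cand)

    grow([])
    return frequent


print(support([{"a"}]))         # 3
print(support([{"a"}, {"g"}]))  # 2
patterns = mine(DATABASE, minsup=2)
for pattern, sup in patterns.items():
    print([sorted(s) for s in pattern], sup)
```

Each candidate is checked by rescanning the whole database, which is fine for a toy example but far too slow for large databases. Avoiding that cost is the purpose of dedicated algorithms.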

Another application of sequential pattern mining is text analysis. In this context, a set of sentences from a text can be viewed as a sequence database, and the goal of sequential pattern mining is then to find subsequences of words frequently used in the text. If such sequences are contiguous, they are called "n-grams" in this context. If you want to know more about this application, you can read this blog post, where sequential patterns are discovered in a Sherlock Holmes novel.
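As a small illustration of the contiguous case, the sketch below counts n-grams by the number of sentences that contain them, which mirrors the notion of support. The sample sentences and the function name `frequent_ngrams` are made up for this example and are not taken from the novel mentioned above.

```python
from collections import Counter


def frequent_ngrams(sentences, n, minsup):
    """Return the n-grams (tuples of words) found in at least `minsup` sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # A set, so that each n-gram is counted at most once per sentence.
        ngrams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(ngrams)
    return {gram: c for gram, c in counts.items() if c >= minsup}


# Hypothetical sentences, each viewed as a sequence of words.
sentences = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "a dog chased the cat away",
]

for gram, sup in sorted(frequent_ngrams(sentences, n=2, minsup=2).items()):
    print(" ".join(gram), sup)
```

Here splitting on whitespace stands in for real tokenization. A general sequential pattern miner would also find non-contiguous word patterns, which this sketch does not.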

Can sequential pattern mining be applied to time series?

Besides sequences, sequential pattern mining can also be applied to time series (e.g. stock data), when discretization is performed as a pre-processing step. For example, the figure below shows a time series (an ordered list of numbers) on the left. On the right, a sequence of symbols is shown representing the same data after applying a transformation. Various transformations can be used to convert a time series to a sequence, such as the popular SAX transformation. After performing the transformation, any sequential pattern mining algorithm can be applied.
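To give an idea of what such a transformation does, here is a simplified SAX-style sketch: the series is z-normalized, averaged over equal-length segments, and each segment average is mapped to a symbol. It assumes a four-letter alphabet whose breakpoints are the quartiles of the standard normal distribution, and a series length divisible by the number of segments. The function name `sax` and these choices are assumptions for illustration, not a full SAX implementation.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Breakpoints splitting the standard normal distribution into 4 equal-probability
# regions (alphabet size 4 is an assumption made for this sketch).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax(series, n_segments):
    """Convert a list of numbers into a string of n_segments symbols."""
    if n_segments < 1 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    mu, sigma = mean(series), pstdev(series)
    # Step 1: z-normalize (a constant series is mapped to all zeros).
    normalized = [(x - mu) / sigma if sigma else 0.0 for x in series]
    # Step 2: average each segment (piecewise aggregate approximation).
    seg_len = len(series) // n_segments
    averages = [mean(normalized[i * seg_len:(i + 1) * seg_len])
                for i in range(n_segments)]
    # Step 3: map each average to the symbol of the region it falls in.
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, avg)] for avg in averages)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))  # abcd
```

The resulting string can then be treated as a sequence of symbols and given to a sequential pattern mining algorithm.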

Where can I get sequential pattern mining implementations?

To try sequential pattern mining with your datasets, you may try the open-source SPMF data mining software, which provides implementations of numerous sequential pattern mining algorithms: https://www.philippe-fournier-viger.com/spmf/
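Before running such a tool, a dataset usually has to be converted to the tool's input format. The sketch below writes a sequence database in the integer-based text format that, to my understanding, SPMF uses for sequence databases: items are positive integers, -1 closes an itemset and -2 closes a sequence. Treat the format details and the command shown in the comment as assumptions to verify against the SPMF documentation. The function name `to_spmf_format` is hypothetical.

```python
def to_spmf_format(database):
    """Encode a list of sequences (lists of itemsets) as text lines.

    Returns the item-to-integer mapping and the encoded text.
    """
    items = sorted({i for seq in database for itemset in seq for i in itemset})
    item_ids = {item: n for n, item in enumerate(items, start=1)}
    lines = []
    for seq in database:
        tokens = []
        for itemset in seq:
            tokens += [str(item_ids[i]) for i in sorted(itemset)]
            tokens.append("-1")  # end of itemset (assumed convention)
        tokens.append("-2")      # end of sequence (assumed convention)
        lines.append(" ".join(tokens))
    return item_ids, "\n".join(lines)


# The first sequence of the example database (SID 1).
ids, text = to_spmf_format([[{"a", "b"}, {"c"}, {"f", "g"}, {"g"}, {"e"}]])
print(ids)
print(text)

# The encoded text could then be saved to a file and mined from the command
# line, e.g. (assumed invocation, check the SPMF documentation):
#   java -jar spmf.jar run PrefixSpan input.txt output.txt 50%
```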

It provides implementations of several algorithms for sequential pattern mining, as well as several variations of the problem such as discovering maximal sequential patterns, closed sequential patterns and sequential rules. Sequential rules are especially useful for the purpose of performing predictions, as they also include the concept of confidence.
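To make the idea of confidence concrete, here is a minimal sketch under one common formulation of sequential rules: a rule X -> Y holds in a sequence if all items of X appear before all items of Y, and its confidence is the number of sequences where the rule holds divided by the number of sequences containing X. The exact definition varies between algorithms, so this is an assumption. As before, only the first sequence of the database comes from the text; the other three are assumed for illustration, and the function names are hypothetical.

```python
# SID 1 is described in the post; the other sequences are ASSUMED for illustration.
DATABASE = [
    [{"a", "b"}, {"c"}, {"f", "g"}, {"g"}, {"e"}],
    [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    [{"a"}, {"b"}, {"f", "g"}, {"e"}],
    [{"b"}, {"f", "g"}],
]


def rule_occurs(sequence, antecedent, consequent):
    """True if some split point puts all antecedent items strictly before
    all consequent items."""
    for k in range(1, len(sequence)):
        before = set().union(*sequence[:k])
        after = set().union(*sequence[k:])
        if antecedent <= before and consequent <= after:
            return True
    return False


def rule_stats(database, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    sup_rule = sum(rule_occurs(s, antecedent, consequent) for s in database)
    sup_antecedent = sum(antecedent <= set().union(*s) for s in database)
    confidence = sup_rule / sup_antecedent if sup_antecedent else 0.0
    return sup_rule, confidence


print(rule_stats(DATABASE, {"a"}, {"e"}))  # (3, 1.0)
print(rule_stats(DATABASE, {"b"}, {"e"}))  # (3, 0.75)
```

Under these assumptions, every customer who bought a later bought e, while only three of the four customers who bought b did. This kind of information is what a prediction can be based on.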

What are the current best algorithms for sequential pattern mining?

There exist several sequential pattern mining algorithms. Some of the classic algorithms for this problem are PrefixSpan, Spade, SPAM, and GSP. However, in the last decade, several novel and more efficient algorithms have been proposed such as CM-SPADE and CM-SPAM (2014), and FCloSM and FGenSM (2017), to name a few. Besides, numerous algorithms have been proposed for extensions of the problem of sequential pattern mining, such as finding the sequential patterns that generate the most profit (high utility sequential pattern mining).
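To give a feel for how a pattern-growth algorithm such as PrefixSpan avoids rescanning whole sequences, here is a heavily simplified sketch. It assumes each sequence is a list of single items (no itemsets), and it keeps only positions into the original sequences instead of copying projected databases. It illustrates the general idea and is not the published algorithm nor the SPMF implementation. The names `prefixspan` and `toy_db` are hypothetical.

```python
def prefixspan(db, minsup):
    """Find all frequent sequential patterns in `db`, a list of item lists.

    Returns a list of (pattern, support) pairs.
    """
    results = []

    def mine(prefix, projected):
        # `projected` holds (sequence index, start position) pairs: the part
        # of each sequence that follows the first occurrence of `prefix`.
        counts = {}
        for sid, start in projected:
            for item in set(db[sid][start:]):  # count once per sequence
                counts[item] = counts.get(item, 0) + 1
        for item, sup in sorted(counts.items()):
            if sup < minsup:
                continue
            pattern = prefix + [item]
            results.append((pattern, sup))
            # Project: keep what follows the first occurrence of `item`.
            next_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((sid, pos + 1))
                        break
            mine(pattern, next_projected)

    mine([], [(sid, 0) for sid in range(len(db))])
    return results


# Hypothetical toy database of three sequences of single items.
toy_db = [list("abc"), list("acb"), list("ab")]
for pattern, sup in prefixspan(toy_db, minsup=2):
    print("".join(pattern), sup)
```

Because each recursive call only looks at the suffixes that follow the current prefix, the amount of data scanned shrinks as patterns grow longer.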

A video introduction to sequential pattern mining

If you think this blog post is a little bit short, you can also watch my video that gives an introduction to sequential pattern mining (23 min, MP4 format).

Conclusion

In this blog post, I have given a brief overview of sequential pattern mining, a very useful set of techniques for analyzing sequential data. If you want to know more about this topic, you may read the following recent survey paper that I wrote, which gives an easy-to-read overview of this topic, including the algorithms for sequential pattern mining, extensions, research challenges and opportunities.

Fournier-Viger, P., Lin, J. C.-W., Kiran, R. U., Koh, Y. S., Thomas, R. (2017). A Survey of Sequential Pattern Mining. Data Science and Pattern Recognition, vol. 1(1), pp. 54-77.


Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 120 data mining algorithms.
