bupaR Docs | Create Logs (2024)

Transforming your raw data into an event log object is one of the most challenging tasks in process analysis. On this page, we cover all the possible situations and challenges that you can encounter.

Logs: eventlog vs activitylog

bupaR supports two different kinds of log formats, both of which are an extension of the R data.frame:

  • eventlog: Event logs are created from a data.frame in which each row represents a single event. This means that each row has a single timestamp.
  • activitylog: Activity logs are created from a data.frame in which each row represents a single activity instance. This means that each row can have multiple timestamps, stored in different columns.

The data model below shows the difference between these two levels of observation, i.e. activity instances vs events.

[Figure: data model relating activity instances and events]

The example below shows an excerpt of an event log containing 6 events. It can be seen that each event is linked to a single timestamp. As there can be more events within a single activity instance, each event also needs to be linked to a lifecycle status (here the registration_type). Furthermore, an activity instance identifier (handling_id) is needed to indicate which events belong to the same activity instance.

handling               patient  employee  handling_id  registration_type  time
Registration           333      r1        333          start              2017-11-15 16:50:59
Registration           333      r1        333          complete           2017-11-15 18:45:18
Triage and Assessment  333      r2        833          start              2017-11-16 20:37:26
Triage and Assessment  333      r2        833          complete           2017-11-17 08:21:08
Blood test             333      r3        1152         start              2017-11-17 22:27:09
Blood test             333      r3        1152         complete           2017-11-18 03:16:03

Transactional lifecycle?

An event is an atomic registration related to an activity instance. It thus contains one (and only one) timestamp. Additionally, the event should include a reference to a lifecycle transition. More specifically, multiple events can describe different lifecycle transitions of a single activity instance. For example, one event might record when a surgery is scheduled, another when it is started, yet another when it is completed, etc.

The table below shows the same data as above, but now using the activitylog format. It can be seen that there are now just 3 rows instead of 6, but each row has 2 timestamps, representing 2 events. The lifecycle status represented by each timestamp is now given by the column name of that variable.

handling               patient  employee  handling_id  complete             start
Registration           333      r1        333          2017-11-15 18:45:18  2017-11-15 16:50:59
Triage and Assessment  333      r2        833          2017-11-17 08:21:08  2017-11-16 20:37:26
Blood test             333      r3        1152         2017-11-18 03:16:03  2017-11-17 22:27:09
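The relation between the two tables can be sketched in a few lines of base R. This is only an illustration of the reshaping idea, not how bupaR implements it; in practice the package's own constructors and to_activitylog() should be used. The data frame is rebuilt by hand from the excerpt above, and the variable names events, starts, completes and activities are chosen for this example only.

```r
# Event-level data modelled on the excerpt above: one row per event.
events <- data.frame(
  handling = rep(c("Registration", "Triage and Assessment", "Blood test"), each = 2),
  patient = "333",
  employee = rep(c("r1", "r2", "r3"), each = 2),
  handling_id = rep(c("333", "833", "1152"), each = 2),
  registration_type = rep(c("start", "complete"), times = 3),
  time = as.POSIXct(c("2017-11-15 16:50:59", "2017-11-15 18:45:18",
                      "2017-11-16 20:37:26", "2017-11-17 08:21:08",
                      "2017-11-17 22:27:09", "2017-11-18 03:16:03"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Columns that describe the activity instance rather than a single event.
keys <- c("handling", "patient", "employee", "handling_id")

# Split the events by lifecycle status.
starts <- events[events$registration_type == "start", c(keys, "time")]
completes <- events[events$registration_type == "complete", c(keys, "time")]

# One row per activity instance, one timestamp column per lifecycle status.
activities <- merge(starts, completes, by = keys,
                    suffixes = c("_start", "_complete"))
names(activities) <- sub("^time_", "", names(activities))

print(activities)
```

Note that the merge only works cleanly here because every key column, including employee, has the same value for the start and complete event of an activity instance.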

As these examples show, both formats can often be used for representing the same process data. However, there are some important differences between them:

  • the eventlog format has much more flexibility in terms of lifecycle. There is no limit to the number of events that can occur in a single activity instance. If your data contains lifecycle statuses such as suspend, resume or reassign, they can be recorded multiple times within a single activity instance. In the activitylog format, as each lifecycle status gets its own column, it isn't possible to have two events of the same lifecycle status in a single activity instance.
  • the level of observation in an eventlog is an event. As a result, attribute values can be stored at the event level. In an activitylog, the level of observation is an activity instance. This means that all additional attributes that you have about your process should be at this higher level. For example, an activity instance can only be connected to a single resource in the activitylog format, whereas in an eventlog different events within the same activity instance can have different resources, or different values for any other attribute.
  • because of the limited flexibility, an activitylog is easier to make, and typically closer to the format that your data is already in (see further below on how to construct log objects). As a result, there are many situations in which the analysis of an activitylog will be much faster than that of an eventlog, where a lot of additional complexity needs to be taken into account.

The right log for the job

Functionalities in bupaR core packages support both formats. As such, the goal of your analysis does not impact the decision. Only the complexity of your data is important to make this decision. The precise format your raw data is in will further define the preparatory steps that are needed. We can distinguish between 3 typical scenarios. The flowchart below helps you on your way.

[Figure: flowchart for choosing between activitylog and eventlog]

An activitylog is the best option when each row in your data is an activity instance, or when events belonging to the same activity instance have equal attribute values (e.g. all events are executed by the same resource). When these two criteria do not hold, you can create an eventlog object.
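Whether events of the same activity instance share their attribute values can be checked before choosing a format. The following base R sketch is one way to do so; the helper attributes_consistent() is hypothetical (it is not part of bupaR), and the two small data frames are made up for illustration, using the column names from the examples on this page.

```r
# Hypothetical helper: TRUE when every listed attribute takes exactly one
# value within each activity instance.
attributes_consistent <- function(events, instance_col, attribute_cols) {
  all(vapply(attribute_cols, function(col) {
    n_values <- tapply(events[[col]], events[[instance_col]],
                       function(x) length(unique(x)))
    all(n_values == 1)
  }, logical(1)))
}

# Same resource for start and complete: an activitylog is sufficient.
same_resource <- data.frame(
  handling_id = c("227", "227", "727", "727"),
  employee = c("r1", "r1", "r2", "r2"),
  stringsAsFactors = FALSE
)

# Different resources within one instance: an eventlog is needed.
mixed_resource <- data.frame(
  handling_id = c("116", "116", "616", "616"),
  employee = c("r2", "r6", "r1", "r7"),
  stringsAsFactors = FALSE
)

attributes_consistent(same_resource, "handling_id", "employee")
attributes_consistent(mixed_resource, "handling_id", "employee")
```

The first call returns TRUE and the second FALSE. Several attribute columns can be passed at once as a character vector.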

Scenario 1

If each row in your data.frame is already an activity instance, the activitylog format is the best way to go. Consider the data sample below.

patient  handling               activity_started     activity_ended
464      Blood test             2018-04-06 20:04:09  2018-04-07 01:18:17
464      Check-out              2018-04-12 19:02:11  2018-04-12 21:41:01
464      Discuss Results        2018-04-12 11:00:16  2018-04-12 13:59:44
464      MRI SCAN               2018-04-07 06:30:56  2018-04-07 09:37:26
464      Registration           2018-03-20 19:07:17  2018-03-20 21:15:41
464      Triage and Assessment  2018-03-21 15:58:55  2018-03-22 05:21:56

As each row contains multiple timestamps, i.e. activity_started and activity_ended, it is clear that each row represents an activity instance. Turning this dataset into an activitylog requires the following steps:

  1. Timestamp variables should be named in correspondence with the standard Transactional lifecycle.
  2. Timestamp variables should be of type Date or POSIXct.
  3. Use the activitylog constructor function.
library(bupaR)
library(lubridate) # provides the ymd_hms parser used below

data %>%
    # rename timestamp variables appropriately
    dplyr::rename(start = activity_started,
                  complete = activity_ended) %>%
    # convert the timestamp columns with the ymd_hms parser
    convert_timestamps(columns = c("start", "complete"), format = ymd_hms) %>%
    activitylog(case_id = "patient",
                activity_id = "handling",
                timestamps = c("start", "complete"))
## # Log of 12 events consisting of:
## 1 trace 
## 1 case 
## 6 instances of 6 activities 
## 0 resources 
## Events occurred from 2018-03-20 19:07:17 until 2018-04-12 21:41:01 
## 
## # Variables were mapped as follows:
## Case identifier: patient 
## Activity identifier: handling 
## Resource identifier: employee 
## Timestamps: start, complete 
## 
## # A tibble: 6 × 5
##   patient handling              start               complete            .order
##   <chr>   <fct>                 <dttm>              <dttm>               <int>
## 1 464     Blood test            2018-04-06 20:04:09 2018-04-07 01:18:17      1
## 2 464     Check-out             2018-04-12 19:02:11 2018-04-12 21:41:01      2
## 3 464     Discuss Results       2018-04-12 11:00:16 2018-04-12 13:59:44      3
## 4 464     MRI SCAN              2018-04-07 06:30:56 2018-04-07 09:37:26      4
## 5 464     Registration          2018-03-20 19:07:17 2018-03-20 21:15:41      5
## 6 464     Triage and Assessment 2018-03-21 15:58:55 2018-03-22 05:21:56      6

Note that in case a resource identifier is available, this information can be added in the activitylog call.

Scenario 2

If each row in your data.frame is an event, but all events that belong to the same activity instance share the same attribute values, the activitylog format is again the best way to go. Consider the data sample below.

patient  handling               employee  handling_id  registration_type  time
227      Registration           r1        227          started            2017-08-09 19:55:30
227      Triage and Assessment  r2        727          started            2017-08-09 22:17:43
227      Registration           r1        227          completed          2017-08-09 22:17:43
227      Triage and Assessment  r2        727          completed          2017-08-10 15:21:30
227      Blood test             r3        1109         started            2017-08-17 03:01:24
227      Blood test             r3        1109         completed          2017-08-17 09:17:20
227      MRI SCAN               r4        1346         started            2017-08-17 13:15:04
227      MRI SCAN               r4        1346         completed          2017-08-17 18:47:44
227      Discuss Results        r6        1961         started            2017-08-22 13:33:38
227      Check-out              r7        2456         started            2017-08-22 15:38:38
227      Discuss Results        r6        1961         completed          2017-08-22 15:38:38
227      Check-out              r7        2456         completed          2017-08-22 17:12:46

The resource identifier (employee) has been added as an additional attribute. Note that though each row is an event, they can be grouped into activity instances using the handling_id column, which we will call the activity instance id. Using the latter, we can see that the resource attribute is the same within each activity instance, which allows us to create an activitylog. The steps to do so are the following.

  1. The values of the lifecycle variable should be named in correspondence with the standard Transactional lifecycle.
  2. Timestamp variable should be of type Date or POSIXct.
  3. Use the eventlog constructor function.
  4. Convert to activitylog using to_activitylog for reduced memory usage and improved performance.
data %>%
    # recode lifecycle variable appropriately
    dplyr::mutate(registration_type = forcats::fct_recode(registration_type,
                                                          "start" = "started",
                                                          "complete" = "completed")) %>%
    convert_timestamps(columns = "time", format = ymd_hms) %>%
    eventlog(case_id = "patient",
             activity_id = "handling",
             activity_instance_id = "handling_id",
             lifecycle_id = "registration_type",
             timestamp = "time",
             resource_id = "employee") %>%
    to_activitylog() -> tmp_act

Note that the resource identifier is optional, and can be left out of the eventlog call if such an attribute does not exist in your data. If the activity instance id does not exist, some heuristics are available to generate it: see Missing activity instance id.

Scenario 3

If each row is an event, and events of the same activity instance have differing attribute values, the flexibility of eventlog objects is required. Consider the data sample below.

patient  handling               employee  handling_id  registration_type  time
116      Registration           r2        116          started            2017-04-29 03:24:59
116      Registration           r6        116          completed          2017-04-29 06:23:09
116      Triage and Assessment  r1        616          started            2017-04-29 15:41:27
116      Triage and Assessment  r7        616          completed          2017-04-30 03:04:21
116      Blood test             r4        1054         started            2017-04-30 15:13:28
116      Blood test             r6        1054         completed          2017-04-30 21:24:18
116      MRI SCAN               r1        1291         started            2017-05-01 01:12:51
116      MRI SCAN               r4        1291         completed          2017-05-01 05:32:37
116      Discuss Results        r3        1850         started            2017-05-01 09:44:20
116      Discuss Results        r7        1850         completed          2017-05-01 14:00:48
116      Check-out              r3        2345         started            2017-05-03 04:02:35
116      Check-out              r2        2345         completed          2017-05-03 06:16:03

In this example, different resources (employees) sometimes perform the start and complete event of the same activity instance. Therefore, we resort to the eventlog format, which has no problems storing this. The steps to take are the following:

  1. The values of the lifecycle variable should be named in correspondence with the standard Transactional lifecycle.
  2. Timestamp variable should be of type Date or POSIXct.
  3. Use the eventlog constructor function.
data %>%
    # recode lifecycle variable appropriately
    dplyr::mutate(registration_type = forcats::fct_recode(registration_type,
                                                          "start" = "started",
                                                          "complete" = "completed")) %>%
    convert_timestamps(columns = "time", format = ymd_hms) %>%
    eventlog(case_id = "patient",
             activity_id = "handling",
             activity_instance_id = "handling_id",
             lifecycle_id = "registration_type",
             timestamp = "time",
             resource_id = "employee")
## Warning in validate_eventlog(eventlog): The following activity instances are
## connected to more than one resource: 1054,116,1291,1850,2345,616
## # Log of 12 events consisting of:
## 1 trace 
## 1 case 
## 6 instances of 6 activities 
## 6 resources 
## Events occurred from 2017-04-29 03:24:59 until 2017-05-03 06:16:03 
## 
## # Variables were mapped as follows:
## Case identifier: patient 
## Activity identifier: handling 
## Resource identifier: employee 
## Activity instance identifier: handling_id 
## Timestamp: time 
## Lifecycle transition: registration_type 
## 
## # A tibble: 12 × 7
##    patient handling   employee handling_id registration_type time               
##    <chr>   <fct>      <fct>    <chr>       <fct>             <dttm>             
##  1 116     Registrat… r2       116         start             2017-04-29 03:24:59
##  2 116     Registrat… r6       116         complete          2017-04-29 06:23:09
##  3 116     Triage an… r1       616         start             2017-04-29 15:41:27
##  4 116     Triage an… r7       616         complete          2017-04-30 03:04:21
##  5 116     Blood test r4       1054        start             2017-04-30 15:13:28
##  6 116     Blood test r6       1054        complete          2017-04-30 21:24:18
##  7 116     MRI SCAN   r1       1291        start             2017-05-01 01:12:51
##  8 116     MRI SCAN   r4       1291        complete          2017-05-01 05:32:37
##  9 116     Discuss R… r3       1850        start             2017-05-01 09:44:20
## 10 116     Discuss R… r7       1850        complete          2017-05-01 14:00:48
## 11 116     Check-out  r3       2345        start             2017-05-03 04:02:35
## 12 116     Check-out  r2       2345        complete          2017-05-03 06:16:03
## # ℹ 1 more variable: .order <int>

Note that we need an eventlog irrespective of which attribute values are differing, i.e. it can be resources, but also any additional variables you have in your data set. For the special case of resource values, it might be that a different resource executing events in the same activity instance is a data quality issue. If so, some functions can help you to identify this issue: Inconsistent Resources.

Again, if the activity instance id does not exist, some heuristics are available to generate it: see Missing activity instance id.

Typical problems

Missing activity instance id

In order to be able to correlate events which belong to the same activity instance, an activity instance identifier is required. For example, in the data shown below, it is possible that a patient has gone through different surgeries, each with their own start and complete event. The activity instance identifier then allows you to distinguish which events belong together and which do not. It is important to note that this instance identifier should be unique, also among different cases and activities.

patient   activity   timestamp            status    activity_instance
John Doe  check-in   2017-05-10 08:33:26  complete  1
John Doe  surgery    2017-05-10 08:53:16  start     2
John Doe  surgery    2017-05-10 09:25:19  complete  2
John Doe  treatment  2017-05-10 10:01:25  start     3
John Doe  treatment  2017-05-10 10:35:18  complete  3
John Doe  surgery    2017-05-10 10:41:35  start     4
John Doe  surgery    2017-05-10 11:05:56  complete  4
John Doe  check-out  2017-05-11 14:52:36  complete  5

If the activity instance identifier is not available, you can use the assign_instance_id() function, which uses a heuristic to create the missing identifier. Alternatively, you can try to create the identifier on your own using dplyr::mutate() and other manipulation functions.
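As an illustration of building the identifier by hand, the sketch below pairs the n-th start of an activity within a case with its n-th complete. This pairing rule is an assumption made for the example and is not necessarily the heuristic that assign_instance_id() applies; it also breaks down when instances of the same activity overlap within a case. Base R is used to keep the sketch free of dependencies, and the same logic can be written with dplyr::mutate().

```r
# The example data from the table above, without the identifier column.
events <- data.frame(
  patient = "John Doe",
  activity = c("check-in", "surgery", "surgery", "treatment",
               "treatment", "surgery", "surgery", "check-out"),
  timestamp = as.POSIXct(c("2017-05-10 08:33:26", "2017-05-10 08:53:16",
                           "2017-05-10 09:25:19", "2017-05-10 10:01:25",
                           "2017-05-10 10:35:18", "2017-05-10 10:41:35",
                           "2017-05-10 11:05:56", "2017-05-11 14:52:36"),
                         tz = "UTC"),
  status = c("complete", "start", "complete", "start",
             "complete", "start", "complete", "complete"),
  stringsAsFactors = FALSE
)

# The pairing relies on chronological order within each case.
events <- events[order(events$patient, events$timestamp), ]

# Count, per case, activity and lifecycle status, which occurrence this is.
occurrence <- ave(seq_len(nrow(events)),
                  events$patient, events$activity, events$status,
                  FUN = seq_along)

# Events with the same case, activity and occurrence number form one instance.
key <- paste(events$patient, events$activity, occurrence, sep = "|")

# Number the instances in order of first appearance, so the identifier is
# unique across cases and activities.
events$activity_instance <- match(key, unique(key))

print(events)
```

For this data the result matches the activity_instance column shown above: the two surgeries receive different identifiers (2 and 4).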

Large Datasets and Validation

By default, bupaR validates certain properties of the activity instances that are supplied when creating an event log:

  • a single activity instance identifier must not be connected to multiple cases,
  • a single activity instance identifier must not be connected to multiple activity labels.

However, these checks are not efficient and may lead to considerable performance issues for large data frames. It is possible to deactivate the validation in case you already know that your data fulfills all the requirements, using the argument validate = FALSE when creating the eventlog. Note that when the activity instance id was created with the assign_instance_id() function, you can assume the above properties hold.
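If you want to verify the two properties yourself before passing validate = FALSE, a check along the following lines may help. It is a minimal base R sketch and not bupaR's own validation code; the helper instances_with_multiple() and the toy data are hypothetical.

```r
# Hypothetical helper: activity instance ids that are linked to more than one
# distinct value of another column (e.g. the case or activity column).
instances_with_multiple <- function(events, instance_col, other_col) {
  n_values <- tapply(events[[other_col]], events[[instance_col]],
                     function(x) length(unique(x)))
  names(n_values)[n_values > 1]
}

# Toy data: instance "a2" spans two cases, instance "a3" spans two activities.
events <- data.frame(
  instance = c("a1", "a1", "a2", "a2", "a3", "a3"),
  case = c("c1", "c1", "c1", "c2", "c2", "c2"),
  activity = c("A", "A", "B", "B", "C", "D"),
  stringsAsFactors = FALSE
)

bad_cases <- instances_with_multiple(events, "instance", "case")
bad_activities <- instances_with_multiple(events, "instance", "activity")

print(bad_cases)
print(bad_activities)
```

Both results should be empty (character(0)) before validation is switched off; here they flag "a2" and "a3" respectively.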

Inconsistent Resources

Each event can contain the notion of a resource. It can be the case that different events belonging to the same activity instance are executed by different resources, as in the eventlog below.

patient  handling               employee  handling_id  registration_type  time                 .order
206      Registration           r4        206          start              2017-07-19 15:48:14  1
206      Triage and Assessment  r6        706          start              2017-07-19 17:03:44  2
206      Registration           r3        206          complete           2017-07-19 17:03:44  3
206      Triage and Assessment  r7        706          complete           2017-07-20 07:28:53  4
206      Blood test             r1        1100         start              2017-07-25 03:02:14  5
206      Blood test             r3        1100         complete           2017-07-25 08:14:46  6
206      MRI SCAN               r6        1337         start              2017-07-25 12:37:36  7
206      MRI SCAN               r2        1337         complete           2017-07-25 16:52:16  8
206      Discuss Results        r2        1940         start              2017-07-26 07:36:36  9
206      Discuss Results        r4        1940         complete           2017-07-26 11:08:03  10
206      Check-out              r1        2435         start              2017-07-28 02:54:17  11
206      Check-out              r7        2435         complete           2017-07-28 03:55:13  12

If you have a large dataset, and want to have an overview of the activity instances that have more than one resource connected to them, you can use the detect_resource_inconsistencies() function.

log %>% detect_resource_inconsistencies()
## # A tibble: 6 × 5
##   patient handling              handling_id complete start
##   <chr>   <fct>                 <chr>       <chr>    <chr>
## 1 206     Blood test            1100        r3       r1   
## 2 206     Check-out             2435        r7       r1   
## 3 206     Discuss Results       1940        r4       r2   
## 4 206     MRI SCAN              1337        r2       r6   
## 5 206     Registration          206         r3       r4   
## 6 206     Triage and Assessment 706         r7       r6

If you want to remove these inconsistencies, a quick fix is to merge the resource labels together with fix_resource_inconsistencies(). Note that this is not needed for an eventlog, but it is for an activitylog. While the creation of the eventlog will emit a warning when resource inconsistencies exist, this should mostly be seen as a data quality warning. That said, there might be analyses related to the counting of resources where such inconsistencies lead to odd results.

log %>% fix_resource_inconsistencies()
## *** OUTPUT ***
## A total of 6 activity executions in the event log are classified as inconsistencies.
## They are spread over the following cases and activities:
## # A tibble: 6 × 5
##   patient handling              handling_id complete start
##   <chr>   <fct>                 <chr>       <chr>    <chr>
## 1 206     Blood test            1100        r3       r1   
## 2 206     Check-out             2435        r7       r1   
## 3 206     Discuss Results       1940        r4       r2   
## 4 206     MRI SCAN              1337        r2       r6   
## 5 206     Registration          206         r3       r4   
## 6 206     Triage and Assessment 706         r7       r6
## Inconsistencies solved succesfully.
## # Log of 12 events consisting of:
## 1 trace 
## 1 case 
## 6 instances of 6 activities 
## 6 resources 
## Events occurred from 2017-07-19 15:48:14 until 2017-07-28 03:55:13 
## 
## # Variables were mapped as follows:
## Case identifier: patient 
## Activity identifier: handling 
## Resource identifier: employee 
## Activity instance identifier: handling_id 
## Timestamp: time 
## Lifecycle transition: registration_type 
## 
## # A tibble: 12 × 7
##    patient handling   employee handling_id registration_type time               
##    <chr>   <fct>      <chr>    <chr>       <fct>             <dttm>             
##  1 206     Registrat… r3 - r4  206         start             2017-07-19 15:48:14
##  2 206     Triage an… r7 - r6  706         start             2017-07-19 17:03:44
##  3 206     Registrat… r3 - r4  206         complete          2017-07-19 17:03:44
##  4 206     Triage an… r7 - r6  706         complete          2017-07-20 07:28:53
##  5 206     Blood test r3 - r1  1100        start             2017-07-25 03:02:14
##  6 206     Blood test r3 - r1  1100        complete          2017-07-25 08:14:46
##  7 206     MRI SCAN   r2 - r6  1337        start             2017-07-25 12:37:36
##  8 206     MRI SCAN   r2 - r6  1337        complete          2017-07-25 16:52:16
##  9 206     Discuss R… r4 - r2  1940        start             2017-07-26 07:36:36
## 10 206     Discuss R… r4 - r2  1940        complete          2017-07-26 11:08:03
## 11 206     Check-out  r7 - r1  2435        start             2017-07-28 02:54:17
## 12 206     Check-out  r7 - r1  2435        complete          2017-07-28 03:55:13
## # ℹ 1 more variable: .order <int>
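The idea of merging resource labels can be approximated in base R as follows. This is only a sketch of the concept, with a hypothetical helper merge_resource_labels(); it joins labels in order of first appearance, so the order within a merged label can differ from what fix_resource_inconsistencies() produces (for instance "r4 - r3" here versus "r3 - r4" in the output above).

```r
# Hypothetical helper: give every event of an activity instance one merged
# resource label built from all distinct resources seen in that instance.
merge_resource_labels <- function(resource, instance_id) {
  ave(resource, instance_id,
      FUN = function(x) paste(unique(x), collapse = " - "))
}

# Two activity instances: one with two resources, one with a single resource.
employee <- c("r4", "r3", "r1", "r1")
handling_id <- c("206", "206", "1100", "1100")

merged <- merge_resource_labels(employee, handling_id)
print(merged)
```

Instances that already have a single resource keep their original label, so only the inconsistent instances change.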
