# Process Mining for Python (pm4py)

A clean and simple python library for process mining.

Disclaimer:
We aim to keep the stable features as-is: they are relatively mature and unlikely to be refactored. The experimental features are more likely to be refactored in the near future.

For Windows users: a Python 3.6.x version is required! Please read the Installation section for more details.

# Stable Features

Importing

• Importing event logs from IEEE XES files
• Importing event logs from comma-separated-values (csv) files
• Import Petri nets with initial and final marking from PNML files

Exporting

• Exporting event logs to IEEE XES files
• Exporting event logs to csv files
• Exporting Petri nets with initial and final marking to PNML files

Process Discovery

• Discovery of Directly-Follows Graphs (including frequency and performance decoration)
• Discovery of Petri nets with the Alpha Miner
• Discovery of Petri nets with the Inductive Miner (DFG-Based)

Replay/Conformance Checking techniques

• Token-based replay
• Alignments-based replay

Process model quality evaluation

• Replay fitness
• Precision
• Generalization
• Simplicity

# Installation

pm4py has been tested successfully on Windows (64/32 bit) and on different Linux environments. Note that, in order to use some internally required Python libraries, pm4py needs a 3.6.x version of Python. This tutorial provides guidance on how to prepare an environment for Windows (64/32 bit) and for Linux. We start with the environment setup on Windows (one part applies to both 32 and 64 bit architectures, followed by a separate architecture-dependent part). Subsequently, we discuss the environment setup for Linux.

## Windows (64/32 bit)

For some dependencies of the project, i.e. ciso8601 and cvxopt, you need a working Windows C/C++ compiler. We suggest installing the Microsoft Visual Studio 2017 compiler, which is available free of charge through the following link:

Microsoft Visual Studio

During installation, it is vital to install all C++ related development tools. Note that this is a large download, around 6 GB.

Moreover, for visualization purposes, pm4py requires a GraphViz installation. To install GraphViz, the installer on this page can be used:

Graphviz

After installation, GraphViz is located in the Program Files directory (on 64 bit systems, it is likely to be found in the x86 Program Files folder). The bin\ subfolder of the GraphViz directory, e.g. C:\Program Files (x86)\Graphviz2.38\bin, needs to be added manually to the system path. For Windows 10, we suggest the following article for an explanation of how to add a new folder to the system path:

Adding a folder to the system path

### Windows (64 bit)

We suggest installing Miniconda (Python 3.6.x) as the Python distribution. Miniconda is a Python distribution focused on data science and machine learning applications, and can be retrieved through the following link:

Miniconda with Python 3.6.x

During the installation of the Miniconda distribution, it is important to select the option that adds Miniconda Python to the system path. This gives easy access to Python from the command line / Powershell.

pm4py requires additional packages to work. To install them, open a command line / Powershell and browse to the folder that contains the pm4py sources. An example instruction to reach a specific folder in a Windows environment is the following (replace the path as needed):

C:\>cd C:\Users\johndoe\pm4py-source


The additional packages of the project are installed by issuing the following command:

pip install -r requirements.txt


### Windows (32 bit)

Due to the limitations of a 32 bit architecture (limited memory available to threads/processes), it is suggested to use a Windows (64 bit) installation whenever possible.

A full Anaconda (Python 3.6.x) installation is suggested. Anaconda is a free and open-source distribution of the Python programming language for data science and machine learning applications. Use the following link to get Anaconda:

Then, install Anaconda. To get easy access to Python from the command line / Powershell, it is important to select the option that adds Anaconda Python to the system path. pm4py requires additional packages to work. To install them, open a command line / Powershell and reach the folder that contains the pm4py sources. An example instruction to reach a specific folder in a Windows environment is the following (replace the path as needed):

C:\>cd C:\Users\johndoe\pm4py-source


The following command installs cvxopt (a linear/integer programming solver; for further information see this link). Here the conda package manager provided by Anaconda is used:

conda install cvxopt


The other requirements needed to execute pm4py can then be installed through the following command:

pip install -r requirements.txt


## Linux

In order to use pm4py under Linux, a C/C++ compiler is required. The default installation of most distributions already includes the gcc and g++ compilers, which compile C and C++ code respectively.

To check the presence of gcc and g++ on your current distribution, along with their versions, the following commands can be issued:

gcc -v
g++ -v


The output is a longer text that contains, at the end, the version of the compiler that is currently installed:

Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)


If they are not installed, refer to your distribution's documentation for instructions on how to install them. We provide the commands for the most widely used distributions:

Debian / Ubuntu

apt-get install gcc g++


Fedora

yum install gcc gcc-c++


Moreover, GraphViz is required on the system. To check whether GraphViz is present, issue the following command:

dot -h


If GraphViz is not installed, the command will return an error. To install GraphViz, issue the command appropriate for your distribution. We provide the commands for the most widely used distributions:

Debian / Ubuntu

apt-get install graphviz


Fedora

yum install graphviz


Python 3.6.x is also required on Linux. Since the stable versions of Linux distributions generally include older versions of Python 3.x, it is suggested to install the Miniconda Python distribution for Linux. Miniconda is a Python distribution focused on data science and machine learning applications, and can be retrieved through the following link:

https://conda.io/miniconda.html

The 64 bit installer (the 32 bit architecture has limited memory allocation for threads/processes) can be executed from the command line using the following instruction:

root@debian:~# bash Miniconda3-latest-Linux-x86_64.sh


Important note: root privileges are not required; any user account suffices.

As the first step of the Miniconda installation, the license agreement must be read. Press the Enter key to open it, move through it with the up and down arrow keys, and press q to quit the license agreement and accept/deny it (yes/no). Then, a path for the Miniconda installation is requested. The proposed path, inside the user directory, can be accepted as-is. Finally, Miniconda asks whether the executables should be added to the user path; it is convenient to answer yes here.

The additional requirements needed to run pm4py can be installed by reaching the pm4py directory and using pip:

root@debian:~# cd /root/pm4py-source/
root@debian:~/pm4py-source# pip install -r requirements.txt


Testing pm4py after installation

To test whether everything is installed correctly, the following script can be run from the pm4py folder:

python imdf_example.py


On Linux environments, it may be necessary to call the proper version of Python by replacing python with python3.6.

The script works as follows:

1. The data/input_data/running-example.xes XES log file is loaded
2. A version of the Inductive Miner process discovery algorithm is applied to retrieve a sound workflow net describing the process, along with an initial and final marking
3. The single cases in the log are aligned with the process model.

If no error occurs during the execution of the script, then everything is installed correctly.

# Working with Event Data

Event data, usually recorded in so-called event logs, are the primary source of data of any process mining project and/or algorithm. As such, they play a vital role in process mining.

Within pm4py we distinguish between two major event data object types:

• Event Logs: An event log represents a single sequence of events, executed in the context of a process. Multiple events at different indices of the event log potentially refer to the same process instance. The Event Log object in pm4py resembles the finite version of an event stream.
• Trace Logs: A trace log is a collection of sequences of events. Each sequence in a trace log represents the execution of a process instance (typically these events describe the same case id). The trace log object is equivalent to the notion of event logs, commonly used in process mining.

In the remainder of this section, we describe how pm4py supports access and manipulation of event log data through the IEEE XES and CSV formats.

## Importing IEEE XES files

IEEE XES is a standard format in which Process Mining logs are expressed. For further information about the format, please study the IEEE Website.

The following example code aims to import a log, given a path to a log file.

from pm4py.objects.log.importer.xes import factory as xes_import_factory
log = xes_import_factory.apply("<path_to_xes_file>")


A fully working version of the example script can be found in the pm4py-source project, in the examples/web/import_xes_log.py file.

The IEEE XES log is imported as a trace log, hence, the events are already grouped in traces. Trace logs are stored as an extension of the Python list: to access a given trace in the log, it is enough to provide its index in the log. Consider the following examples of how to access the different objects stored in the imported trace log:

• log[0] refers to the first trace in the log
• log[0][0] refers to the first event of the first trace in the log
• log[0][1] refers to the second event of the first trace in the log
• log[1] refers to the second trace in the log
• log[1][0] refers to the first event of the second case in the log
• log[1][1] refers to the second event of the second case in the log
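The indexing scheme above can be pictured with plain Python: a trace log behaves like a list of traces, each trace being a list of event dictionaries. The following self-contained sketch (the activity names are illustrative, not read from a real log) mimics that structure:

```python
# Illustrative stand-in for a trace log: a list of traces,
# where each trace is a list of event dictionaries
trace_log = [
    [  # first trace (log[0])
        {"concept:name": "register request"},
        {"concept:name": "examine casually"},
    ],
    [  # second trace (log[1])
        {"concept:name": "register request"},
        {"concept:name": "check ticket"},
    ],
]

print(trace_log[0][0]["concept:name"])  # register request
print(trace_log[1][1]["concept:name"])  # check ticket
```

The same bracket notation used on this toy list works on a trace log imported from an XES file.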

The apply method of the xes_import_factory, located in the pm4py.objects.log.importer.xes.factory file, accepts two additional parameters:

• variant: allows us to select a specific importer variant.
• parameters: allows us to specify additional parameters to the underlying importer.

Observe that throughout pm4py, we often use the notion of factories, which contain an apply method that takes some objects as input, an optional parameters object and an optional variant object. The parameters object is always a dictionary that contains the parameters in a key-value fashion. The variant is typically a string-valued argument.

Currently, we support two different variants, with corresponding (different) parameters:

• iterparse: (default; use variant="iterparse" to invoke) uses the iterparse library internally for XML parsing. Complies with the IEEE XES standard. Supported parameters:
  • timestamp_sort: (boolean) specifies whether the log should be sorted by timestamp
  • timestamp_key: (string) if timestamp_sort is true, sort the log using this event-attribute key
  • reverse_sort: (boolean) specifies the direction in which the log should be sorted
  • index_trace_indexes: (boolean) specifies whether the trace index should be added as an event attribute for each event
  • max_no_traces_to_import: (integer) specifies the maximum number of traces to import from the log (in the order in which they occur in the XML file)
• nonstandard: (use variant="nonstandard" to invoke) a custom implementation that reads the XES file line by line (for improved performance). It does not follow the standard and is able to import traces, simple trace attributes, events, and simple event attributes. Supported parameters:
  • same as iterparse

It is possible to access a specific value of a trace / event attribute. For example (the "concept:name" key at trace level represents the case ID, while at event level it typically represents the performed activity):

first_trace_concept_name = log[0].attributes["concept:name"]
first_event_first_trace_concept_name = log[0][0]["concept:name"]


The following code iterates over all the traces in the log, writing the case ID and, for each event, the performed activity:

for case_index, case in enumerate(log):
    print("\n case index: %d  case id: %s" % (case_index, case.attributes["concept:name"]))
    for event_index, event in enumerate(case):
        print("event index: %d  event activity: %s" % (event_index, event["concept:name"]))

 case index: 4  case id: 5
event index: 0  event activity: register request
event index: 1  event activity: examine casually
event index: 2  event activity: check ticket
event index: 3  event activity: decide
event index: 4  event activity: reinitiate request
event index: 5  event activity: check ticket
event index: 6  event activity: examine casually
event index: 7  event activity: decide
event index: 8  event activity: reinitiate request
event index: 9  event activity: examine casually
event index: 10  event activity: check ticket
event index: 11  event activity: decide
event index: 12  event activity: reject request


An example of invoking the non-standard variant, along with the specification of the timestamp_sort parameter, is contained in the following code:

import os
from pm4py.objects.log.importer.xes import factory as xes_import_factory

parameters = {"timestamp_sort": True}

log = xes_import_factory.apply("<path_to_xes_file>", variant="nonstandard", parameters=parameters)


## Exporting IEEE XES files

Exporting takes a trace log as input and produces an XML document that is saved into a file.

To export a trace log into a file exportedLog.xes, the following code could be used:

from pm4py.objects.log.exporter.xes import factory as xes_exporter

xes_exporter.export_log(log, "exportedLog.xes")


## Importing logs from CSV files

CSV is a tabular format often used to store event logs. Excluding the first row, which contains the headers, each row in the CSV file corresponds to an event. Events in a CSV file are not grouped into traces: a grouping is obtained by specifying a column as case ID; events that share the same value in that column are then grouped into the same case.

Process Mining algorithms implemented in pm4py usually take a trace log as input. The logical steps to obtain a trace log from a CSV file are:

• Using Pandas to ingest the CSV file into a dataframe
• Converting the dataframe into the event log structure
• Converting the event log into a trace log, specifying the column corresponding to the case ID
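The grouping step at the heart of this conversion can be sketched with the standard library alone (no pm4py involved): events, represented as dictionaries, are collected into traces keyed by the chosen case ID column. The rows below are illustrative; the column name case:concept:name follows the convention used in the rest of this section:

```python
from collections import OrderedDict

# Illustrative rows, as they might come out of a CSV file (one event per row)
events = [
    {"case:concept:name": "1", "concept:name": "register request"},
    {"case:concept:name": "2", "concept:name": "register request"},
    {"case:concept:name": "1", "concept:name": "examine casually"},
    {"case:concept:name": "2", "concept:name": "check ticket"},
]

# Group events sharing the same case-ID value into the same trace,
# preserving the order in which cases first appear
traces = OrderedDict()
for event in events:
    traces.setdefault(event["case:concept:name"], []).append(event)

trace_log = list(traces.values())
print(len(trace_log))  # 2
```

pm4py performs the same kind of grouping internally when converting an event log into a trace log.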

In the following piece of code, the CSV file running-example.csv that can be found in the directory tests/input_data is imported into an event log structure:

from pm4py.objects.log.importer.csv import factory as csv_importer

event_log = csv_importer.import_log("tests\\input_data\\running-example.csv")


The previous code covers both the import of the CSV file through Pandas and its conversion into the event log structure. Additional parameters for the import_log method can be specified inside a dictionary passed as the optional parameters argument:

• sep expresses the delimiter (comma is the default)
• quotechar expresses the quote character used in the CSV file
• nrows limits the number of rows that should be read from the CSV file
• sort expresses whether the log should be sorted according to the values of the field specified in sort_field (usually, the timestamp)
• insert_event_indexes is a boolean value that specifies whether the event indexes should be inserted as an additional attribute of each event

In an event log structure, events are not grouped into cases, so the length of an event log is the number of events. Moreover, each event in an event log is stored as a dictionary whose keys are the column names:

event_log_length = len(event_log)
print(event_log_length)
for event in event_log:
    print(event)


In particular, this is an event of the running-example.csv log:

{'Unnamed: 0': 10, 'Activity': 'check ticket', 'Costs': 100, 'Resource': 'Mike', 'case:concept:name': 2, 'case:creator': 'Fluxicon Nitro', 'concept:name': 'check ticket', 'org:resource': 'Mike', 'time:timestamp': Timestamp('2010-12-30 11:12:00')}


To eventually convert the event log structure into a trace log structure (where events are grouped into cases), the case ID column must be identified by the user (in the previous example, the case ID column is case:concept:name). The conversion can be performed with the following instructions:

from pm4py.objects.log import transform

trace_log = transform.transform_event_log_to_trace_log(event_log, case_glue="case:concept:name")


Sometimes it is useful to ingest the CSV file into a dataframe using Pandas, apply some pre-filtering on the dataframe, and afterwards convert it into an event log (and then a trace log) structure. The following code covers the ingestion, the conversion into the event log structure and eventually the conversion into a trace log.

from pm4py.objects.log.adapters.pandas import csv_import_adapter
from pm4py.objects.log.importer.csv.versions import pandas_df_imp
from pm4py.objects.log import transform

# ingest the CSV into a Pandas dataframe (pre-filtering could be applied here)
dataframe = csv_import_adapter.import_dataframe_from_path("tests\\input_data\\running-example.csv")
event_log = pandas_df_imp.convert_dataframe_to_event_log(dataframe)
trace_log = transform.transform_event_log_to_trace_log(event_log, case_glue="case:concept:name")


## Exporting logs to CSV files

Exporting capabilities into CSV files are provided for both event log and trace log formats.

The following example covers exporting of event logs into CSV. Hereby, the event log structure is converted into a Pandas dataframe, which is then exported to a CSV file:

from pm4py.objects.log.exporter.csv import factory as csv_exporter

csv_exporter.export_log(event_log, "outputFile1.csv")


The exporting of trace logs into CSV is similar. The trace log is converted into an event log (the case attributes are copied onto the events, adding the case: prefix to them), then the event log structure is converted into a Pandas dataframe and the dataframe is exported to a CSV file:

from pm4py.objects.log.exporter.csv import factory as csv_exporter

csv_exporter.export_log(trace_log, "outputFile2.csv")
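The flattening described above can be sketched in plain Python (a didactic illustration, not pm4py's exporter): each trace attribute is copied onto its events under a case:-prefixed key, producing one flat row per event:

```python
# Illustrative trace log: traces carry attributes, events are dictionaries
trace_log = [
    {"attributes": {"concept:name": "1"},
     "events": [{"concept:name": "register request"},
                {"concept:name": "decide"}]},
]

rows = []
for trace in trace_log:
    # copy each trace attribute onto the event, prefixed with "case:"
    case_attrs = {"case:" + k: v for k, v in trace["attributes"].items()}
    for event in trace["events"]:
        rows.append({**case_attrs, **event})

print(rows[0])
```

Each dictionary in rows corresponds to one row of the resulting CSV file.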


# Petri net management

Petri nets are one of the most common formalisms to express a process model. A Petri net is a directed bipartite graph whose nodes represent transitions and places. Arcs connect places to transitions and transitions to places, and have an associated weight. A transition can fire if each of its input places contains a number of tokens at least equal to the weight of the arc connecting that place to the transition. When a transition fires, tokens are removed from the input places according to the weights of the input arcs, and are added to the output places according to the weights of the output arcs.

A marking is a state of the Petri net that assigns a number of tokens to each place; it uniquely determines the set of enabled transitions that can be fired in that marking.
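As an illustration of the firing rule described above, the following toy token game in plain Python (a didactic sketch, not pm4py's semantics module) represents a marking as a place-to-token-count mapping and fires a transition by consuming and producing tokens according to the arc weights:

```python
from collections import Counter

def is_enabled(marking, in_arcs):
    """A transition is enabled if every input place holds at least
    as many tokens as the weight of the connecting arc."""
    return all(marking[place] >= weight for place, weight in in_arcs.items())

def fire(marking, in_arcs, out_arcs):
    """Consume tokens from the input places and produce tokens
    in the output places, returning the new marking."""
    new_marking = Counter(marking)
    for place, weight in in_arcs.items():
        new_marking[place] -= weight
    for place, weight in out_arcs.items():
        new_marking[place] += weight
    return new_marking

# Toy net: source --(t)--> sink, all arc weights equal to 1
marking = Counter({"source": 1})
t_in, t_out = {"source": 1}, {"sink": 1}

if is_enabled(marking, t_in):
    marking = fire(marking, t_in, t_out)

print(marking["sink"])  # 1
```

Firing the transition moves the single token from source to sink, which is exactly the token game that Petri net semantics formalize.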

Process Discovery algorithms implemented in pm4py return a Petri net along with an initial marking and a final marking. The initial marking is the state at the start of a process execution; the final marking is the state that should be reached at the end of the execution.

## Importing and exporting

Petri nets, along with their initial and final markings, can be imported from and exported to the PNML file format. The following code imports a Petri net along with its initial and final marking. In particular, the Petri net related to the running-example process is loaded from the tests folder:

import os
from pm4py.objects.petri.importer import pnml as pnml_importer

net, initial_marking, final_marking = pnml_importer.import_net(os.path.join("tests","input_data","running-example.pnml"))


The Petri net is visualized using the Petri net visualizer:

from pm4py.visualization.petrinet import factory as pn_vis_factory

gviz = pn_vis_factory.apply(net, initial_marking, final_marking)
pn_vis_factory.view(gviz)


A Petri net can be exported along with only its initial marking:

from pm4py.objects.petri.exporter import pnml as pnml_exporter

pnml_exporter.export_net(net, initial_marking, "petri.pnml")


And along with both its initial marking and final marking:

pnml_exporter.export_net(net, initial_marking, "petri_final.pnml", final_marking=final_marking)


## Petri Net properties

The list of transitions enabled in a particular marking can be obtained using the following code:

from pm4py.objects.petri import semantics

transitions = semantics.enabled_transitions(net, initial_marking)


Printing the transitions variable shows that only the transition register request is enabled in the initial marking of the given Petri net. To obtain all places, transitions, and arcs of the Petri net, the following code can be used:

places = net.places
transitions = net.transitions
arcs = net.arcs


Each place has a name and a set of input/output arcs (connected at source/target to a transition). Each transition has a name, a label and a set of input/output arcs (connected at source/target to a place). The following code prints, for each place, its name and, for each input arc of the place, the name and the label of the corresponding transition:

for place in places:
    print("\nPLACE: " + place.name)
    for arc in place.in_arcs:
        print(arc.source.name, arc.source.label)


The output starts with the following:

PLACE: sink 47
n10 register request
n16 reinitiate request

PLACE: source 45
...


Similarly, the following code prints, for each transition, its name and label and, for each output arc of the transition, the name of the corresponding place:

for trans in transitions:
    print("\nTRANS: ", trans.name, trans.label)
    for arc in trans.out_arcs:
        print(arc.target.name)


For the running example the output starts with the following:

TRANS:  n14 examine thoroughly
sink 54

TRANS:  n15 decide
middle 49
...


## Creating a new Petri net

In this section, an overview of the code necessary to create a new Petri net with places, transitions, and arcs is provided. A Petri net object in pm4py should be created with a name. For example, the following creates a Petri net named new_petri_net:

from pm4py.objects.petri.petrinet import PetriNet

# creating an empty Petri net
net = PetriNet("new_petri_net")


Also places need to be named upon their creation:

# creating source, p_1 and sink place
source = PetriNet.Place("source")
sink = PetriNet.Place("sink")
p_1 = PetriNet.Place("p_1")


To be part of the Petri net, the places need to be added to it:

net.places.add(source)
net.places.add(sink)
net.places.add(p_1)


Similar to the places, transitions can be created. However, they need to be assigned a name and a label:

t_1 = PetriNet.Transition("name_1", "label_1")
t_2 = PetriNet.Transition("name_2", "label_2")


They should also be added to the Petri net:

net.transitions.add(t_1)
net.transitions.add(t_2)


The following code adds arcs to the Petri net. Arcs can go from a place to a transition or from a transition to a place. The first parameter specifies the source of the arc, the second parameter its target and the last parameter the Petri net it belongs to.

from pm4py.objects.petri import utils

utils.add_arc_from_to(source, t_1, net)
utils.add_arc_from_to(t_1, p_1, net)
utils.add_arc_from_to(p_1, t_2, net)
utils.add_arc_from_to(t_2, sink, net)

To complete the Petri net an initial and possibly a final marking need to be defined. In the following, we define the initial marking to contain 1 token in the source place and the final marking to contain 1 token in the sink place:

from pm4py.objects.petri.petrinet import Marking

initial_marking = Marking()
initial_marking[source] = 1
final_marking = Marking()
final_marking[sink] = 1


The resulting Petri net along with the initial and final marking could be exported:

from pm4py.objects.petri.exporter import pnml as pnml_exporter

pnml_exporter.export_net(net, initial_marking, "createdPetriNet1.pnml", final_marking=final_marking)


Or visualized:

from pm4py.visualization.petrinet import factory as pn_vis_factory

gviz = pn_vis_factory.apply(net, initial_marking, final_marking)
pn_vis_factory.view(gviz)


To obtain a specific output format (e.g. svg or png), a format parameter should be provided to the algorithm. The following code shows how to obtain an SVG representation of the Petri net:

from pm4py.visualization.petrinet import factory as pn_vis_factory

parameters = {"format":"svg"}
gviz = pn_vis_factory.apply(net, initial_marking, final_marking, parameters=parameters)
pn_vis_factory.view(gviz)


Instead of opening the visualization of the model directly, it can also be saved using the following code:

from pm4py.visualization.petrinet import factory as pn_vis_factory

parameters = {"format":"svg"}
gviz = pn_vis_factory.apply(net, initial_marking, final_marking, parameters=parameters)
pn_vis_factory.save(gviz, "alpha.svg")


# Process Discovery

## Discovery Algorithms

Process Discovery using the Alpha Algorithm

Process Discovery algorithms aim to find a suitable process model that describes the order of the events/activities executed during a process. The Alpha Algorithm is one of the best-known Process Discovery algorithms and is able to find:

• A Petri net model in which all transitions are visible, unique, and correspond to classified events (for example, to activities).
• An initial marking that describes the state of the Petri net model when an execution starts.
• A final marking that describes the state of the Petri net model when an execution ends.

We provide an example where a log is read, the Alpha algorithm is applied and the Petri net along with the initial and the final marking are found. The log we take as input is the running-example.xes XES log that can be found in the folder tests/input_data.

The following code imports the running-example.xes log:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer

log = xes_importer.import_log(os.path.join("tests","input_data","running-example.xes"))


Once the log is loaded in memory, the Alpha Miner algorithm can be applied:

from pm4py.algo.discovery.alpha import factory as alpha_miner

net, initial_marking, final_marking = alpha_miner.apply(log)


To export the process model, to visualize it or to save the visualization of the model, the functions presented in the Petri net management section can be used.

The following picture represents the Petri net mined from the running-example.xes log by applying the Alpha Miner:

The place colored green is the source place and belongs to the initial marking. In the initial marking, a token is assigned to that place (indicated by the number 1 on the place). The place colored orange is the sink place and belongs to the final marking. We see that transitions here correspond to activities in the log. Models extracted by the Alpha Miner often have deadlock problems, so it is not guaranteed that each trace is replayable on the model.

Process Discovery using Inductive Miner

Mining a Petri net

The Inductive Miner is a Process Discovery algorithm that aims to construct a sound workflow net with fitness guarantees: by construction, every trace in the log is replayable on the model. The basic idea of the Inductive Miner is to detect a "cut" in the log (e.g. sequential cut, parallel cut, concurrent cut, loop cut) and then recurse on the sublogs obtained by applying the cut, until a base case is found. In pm4py, a variant of the Inductive Miner is implemented (IMDF; for further details see this link) that avoids the recursion on the sublogs and instead uses the Directly-Follows graph.

Models generated by the Inductive Miner generally have greater fitness and generalization than models extracted by the Alpha Miner. Inductive Miner models usually make extensive use of hidden transitions, especially for skipping/looping over a portion of the model. Furthermore, each visible transition has a unique label (no two transitions in the model share the same label).

We provide an example where a log is read, the Inductive Miner is applied and the Petri net along with the initial and the final marking are found. The log we take as input is the running-example.xes XES log that can be found in the folder tests/input_data.

To read the running-example.xes log, the following Python code can be used:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer

log = xes_importer.import_log(os.path.join("tests","input_data","running-example.xes"))


Once the log is loaded in memory, the Inductive Miner algorithm is applied:

from pm4py.algo.discovery.inductive import factory as inductive_miner

net, initial_marking, final_marking = inductive_miner.apply(log)


To export the process model, to visualize it or to save the visualization of the model, the functions presented in the Petri net management section can be used.

The following picture represents the Petri net obtained on running-example.xes log by applying Inductive Miner:

The place colored green is the source place and belongs to the initial marking. In the initial marking, a token is assigned to that place (indicated by the number 1 on the place). The place colored orange is the sink place and belongs to the final marking. We see that visible transitions here correspond to activities in the log, and there are some hidden transitions.

Mining a process tree

It is also possible to obtain a process tree from the event log using the Inductive Miner. The following code can be used in order to mine a process tree from an event log:

from pm4py.algo.discovery.inductive import factory as inductive_miner

tree = inductive_miner.apply_tree(log)

from pm4py.visualization.process_tree import factory as pt_vis_factory

gviz = pt_vis_factory.apply(tree)
pt_vis_factory.view(gviz)


The following representation is obtained:

If needed, the process tree could be printed through print(tree).

It is also possible to convert a process tree to a Petri net:

from pm4py.objects.conversion.tree_to_petri import factory as tree_petri_converter

net, initial_marking, final_marking = tree_petri_converter.apply(tree)


Process Discovery using Directly-Follows Graphs

Process models expressed as Petri nets have well-defined semantics: a process execution starts from the places included in the initial marking and finishes at the places included in the final marking. In this section, another class of process models, Directly-Follows Graphs, is introduced. Directly-Follows graphs are graphs in which the nodes represent the events/activities in the log, and a directed edge connects two nodes if there is at least one trace in the log where the source event/activity is directly followed by the target event/activity. On top of these directed edges, it is easy to represent metrics like frequency (the number of times the source event/activity is followed by the target event/activity) and performance (some aggregation, for example the mean, of the time elapsed between the two events/activities).
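As a sketch of the frequency metric, the directly-follows counts can be computed from a toy trace log (each trace reduced to its list of activity names) with the standard library alone; pm4py's dfg_factory performs the equivalent computation on real logs:

```python
from collections import Counter

# Toy trace log: each trace is a list of activity names
trace_log = [
    ["register request", "examine casually", "decide"],
    ["register request", "check ticket", "decide"],
    ["register request", "examine casually", "decide"],
]

# Count every pair of directly-following activities across all traces
dfg = Counter()
for trace in trace_log:
    for source, target in zip(trace, trace[1:]):
        dfg[(source, target)] += 1

print(dfg[("register request", "examine casually")])  # 2
print(dfg[("register request", "check ticket")])      # 1
```

Each key of the resulting counter is an edge of the Directly-Follows graph and its value is the frequency decoration of that edge.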

We extract a Directly-Follows graph from the log running-example.xes.

To read the running-example.xes log, the following Python code could be used:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer

log = xes_importer.import_log(os.path.join("tests","input_data","running-example.xes"))


Then, the following code could be used to extract a Directly-Follows graph from the log:

from pm4py.algo.discovery.dfg import factory as dfg_factory

dfg = dfg_factory.apply(log)


A colored visualization of the Directly-Follows graph decorated with the frequency of activities and edges can be then obtained by using the following code:

from pm4py.visualization.dfg import factory as dfg_vis_factory

gviz = dfg_vis_factory.apply(dfg, log=log, variant="frequency")
dfg_vis_factory.view(gviz)


To get a Directly-Follows graph decorated with the performance of the edges, the following code can replace the previous two pieces of code. The performance variant must be specified both when mining the Directly-Follows graph and when visualizing it:

from pm4py.algo.discovery.dfg import factory as dfg_factory
from pm4py.visualization.dfg import factory as dfg_vis_factory

dfg = dfg_factory.apply(log, variant="performance")
gviz = dfg_vis_factory.apply(dfg, log=log, variant="performance")
dfg_vis_factory.view(gviz)


To save the DFG decorated with frequency or performance to an SVG file, instead of displaying it on screen, the following code can be used:

from pm4py.algo.discovery.dfg import factory as dfg_factory
from pm4py.visualization.dfg import factory as dfg_vis_factory

dfg = dfg_factory.apply(log, variant="performance")
parameters = {"format":"svg"}
gviz = dfg_vis_factory.apply(dfg, log=log, variant="performance", parameters=parameters)
dfg_vis_factory.save(gviz, "dfg.svg")


Adding frequency or performance information to Petri nets

Similar to the Directly-Follows graph, it is also possible to decorate the Petri net with frequency or performance information. This is done by using a replay technique on the model and then assigning frequency/performance to the paths. The variant parameter of the factory specifies which annotation should be used. The values for the variant parameter are the following:

• wo_decoration: This is the default value and indicates that the Petri net is not decorated.
• frequency: This indicates that the model should be decorated according to frequency information obtained by applying replay.
• performance: This indicates that the model should be decorated according to performance (aggregated by mean) information obtained by applying replay.

If the frequency or performance decoration is chosen, the log must be passed as a parameter of the visualization (it needs to be replayed).

The following code can be used to obtain the Petri net mined by the Inductive Miner decorated with frequency information:

from pm4py.visualization.petrinet import factory as pn_vis_factory

parameters = {"format":"png"}
gviz = pn_vis_factory.apply(net, initial_marking, final_marking, parameters=parameters, variant="frequency", log=log)
pn_vis_factory.save(gviz, "inductive_frequency.png")


Changing the variant to performance, we obtain the following process schema:

## Using different activity keys

Specifying a different activity key in a Process Mining algorithm

Algorithms implemented in pm4py classify events based on their activity name, which is usually stored in the concept:name event attribute. In some contexts, it is useful to use another event attribute as the activity:

• Importing an event log from a CSV file does not guarantee the presence of a concept:name event attribute
• Multiple events in a case may refer to different lifecycles of the same activity

The following example shows the specification of an activity key for the Alpha Miner algorithm:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.algo.discovery.alpha import factory as alpha_miner
from pm4py.util import constants

log = xes_importer.import_log(os.path.join("tests","input_data","running-example.xes"))

parameters = {constants.PARAMETER_CONSTANT_ACTIVITY_KEY: "concept:name"}
net, initial_marking, final_marking = alpha_miner.apply(log, parameters=parameters)


For logs imported from the XES format, a list of fields that can be used to classify events and apply Process Mining algorithms is usually reported in the classifiers section. The Standard classifier usually includes the activity name (the concept:name attribute) and the lifecycle (the lifecycle:transition attribute); the Event name classifier includes only the activity name.

In pm4py, algorithms are assumed to work on a single activity key. To use multiple fields, a new attribute containing their concatenation should be inserted for each event.

Classifiers: retrieval and insertion of a corresponding attribute

The following example demonstrates the retrieval of the classifiers inside a log file, using the receipt.xes log:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer

log = xes_importer.import_log(os.path.join("tests","input_data","receipt.xes"))
print(log.classifiers)


The classifiers are then printed to the screen:

{'Activity classifier': ['concept:name', 'lifecycle:transition'], 'Resource classifier': ['org:resource'], 'Group classifier': ['org:group']}


To use the classifier Activity classifier and write a new attribute for each event in the log, the following code can be used:

from pm4py.objects.log.util import insert_classifier

log, activity_key = insert_classifier.insert_activity_classifier_attribute(log, "Activity classifier")
print(activity_key)


Then, as before, the Alpha Miner can be applied on the log specifying the newly inserted activity key:

from pm4py.algo.discovery.alpha import factory as alpha_miner
from pm4py.util import constants

parameters = {constants.PARAMETER_CONSTANT_ACTIVITY_KEY: activity_key}
net, initial_marking, final_marking = alpha_miner.apply(log, parameters=parameters)


Inserting a new attribute manually

If the XES specifies no classifiers and a different field should be used as the activity key, it can be specified manually. For example, the following piece of code reads the receipt.xes log and creates a new attribute, customClassifier, that concatenates the activity name and the lifecycle transition:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.util import constants

log = xes_importer.import_log(os.path.join("tests","input_data","receipt.xes"))

for trace in log:
    for event in trace:
        event["customClassifier"] = event["concept:name"] + event["lifecycle:transition"]


Then, for example, the Alpha Miner can be applied specifying customClassifier as the activity key:

from pm4py.algo.discovery.alpha import factory as alpha_miner

parameters = {constants.PARAMETER_CONSTANT_ACTIVITY_KEY: "customClassifier"}
net, initial_marking, final_marking = alpha_miner.apply(log, parameters=parameters)


# Conformance Checking

## Evaluating Petri nets

Now that it is clear how to apply a Process Discovery algorithm and obtain a Petri net along with an initial and a final marking, the question is how to evaluate the quality of the extracted models in the 4 dimensions of Fitness, Precision, Generalization, and Simplicity. In pm4py we provide algorithms to evaluate all 4 dimensions.

For the examples reported in the following sections, we work with the running-example.xes log located in the folder tests/input_data, applying both the Alpha Miner and the Inductive Miner:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.algo.discovery.alpha import factory as alpha_miner
from pm4py.algo.discovery.inductive import factory as inductive_miner

log = xes_importer.import_log(os.path.join("tests","input_data","running-example.xes"))
alpha_petri, alpha_initial_marking, alpha_final_marking = alpha_miner.apply(log)
inductive_petri, inductive_initial_marking, inductive_final_marking = inductive_miner.apply(log)


Fitness

Fitness is a measure of the replayability of the traces used to mine the model. A fitness evaluation could provide:

• An average fitness value for the log according to the model, between 0 and 1, that indicates how well the model can represent the behavior seen in the traces.
• The percentage of traces in the log that are perfectly fitting according to the model.
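The two quantities can be sketched in plain Python from per-trace fitness values (the values below are made up; pm4py obtains them via replay):

```python
def summarize_fitness(trace_fitness_values):
    """Aggregate per-trace fitness values into log-level measures."""
    n = len(trace_fitness_values)
    average_fitness = sum(trace_fitness_values) / n
    # a trace is perfectly fitting when its fitness value is exactly 1
    perc_fit_traces = 100.0 * sum(1 for f in trace_fitness_values if f == 1.0) / n
    return {"averageFitness": average_fitness, "percFitTraces": perc_fit_traces}

# three perfectly fitting traces and one completely non-fitting trace
print(summarize_fitness([1.0, 1.0, 1.0, 0.0]))
# {'averageFitness': 0.75, 'percFitTraces': 75.0}
```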

In pm4py we provide the following algorithms to replay traces on a process model: token-based replay and alignment-based replay.

The following code is useful to get the average fitness value and the percentage of fit traces according to the token replayer:

from pm4py.evaluation.replay_fitness import factory as replay_factory

fitness_alpha = replay_factory.apply(log, alpha_petri, alpha_initial_marking, alpha_final_marking)
fitness_inductive = replay_factory.apply(log, inductive_petri, inductive_initial_marking, inductive_final_marking)
print("fitness_alpha=",fitness_alpha)
print("fitness_inductive=",fitness_inductive)


The output shows that for the running-example log and both Alpha Miner and Inductive Miner we have perfect fitness:

fitness_alpha= {'percFitTraces': 100.0, 'averageFitness': 1.0}
fitness_inductive= {'percFitTraces': 100.0, 'averageFitness': 1.0}


Since the Alpha Miner does not produce a sound workflow net, alignment-based replay cannot be applied to its model. To use alignment-based replay and get the fitness values on the Inductive Miner model, the following code can be used:

fitness_inductive = replay_factory.apply(log, inductive_petri, inductive_initial_marking, inductive_final_marking, variant="alignments")


Alignments use multiprocessing to improve performance; therefore, it is mandatory to guard the entry point of the script with the following condition in order to compute alignments:

if __name__ == "__main__":


Precision

Precision is a comparison between the behavior activated in the model at a given state and the behavior activated in the log. A model is precise when it does not allow for paths that are not present in the log. An approach to measure precision has been proposed in the following paper and is called ETConformance:

Muñoz-Gama, Jorge, and Josep Carmona. “A fresh look at precision in process conformance.” International Conference on Business Process Management. Springer, Berlin, Heidelberg, 2010.

Basically, the idea is to build an automaton from the log where the states are represented by prefixes of the traces in the log and transitions are inserted in the automaton if they are present in some trace of the log.

Each state of the automaton is replayed in the Petri net (assuming that it is fit according to the Petri net) and then we have:

• The reflected tasks, that are the output transitions of that state in the log automaton
• The activated transitions, that are the transitions enabled (but not necessarily executed) in the Petri net after the trace prefix has been replayed

A set of escaping edges is defined as the difference between the activated transitions and the reflected tasks. The following sums are computed:

• the sum of the number of activated transitions in the Petri net over all states of the log automaton (SUM_AT)
• the sum of the number of escaping edges over all states (SUM_EE)

The precision measure is then computed as 1 - SUM_EE/SUM_AT.
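The computation can be sketched in plain Python (the per-state counts below are made up for illustration):

```python
def etc_precision(states):
    """states: list of (activated_transitions, escaping_edges) counts,
    one pair per state of the log automaton.
    Returns 1 - SUM_EE / SUM_AT."""
    sum_at = sum(at for at, _ in states)
    sum_ee = sum(ee for _, ee in states)
    return 1.0 - sum_ee / sum_at

# three prefix states with (activated, escaping) counts
print(etc_precision([(3, 1), (2, 0), (5, 2)]))  # 1 - 3/10
```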

The following code measures the precision of the Alpha and Inductive Miner models on the running-example.xes log:

from pm4py.evaluation.precision import factory as precision_factory

precision_alpha = precision_factory.apply(log, alpha_petri, alpha_initial_marking, alpha_final_marking)
precision_inductive = precision_factory.apply(log, inductive_petri, inductive_initial_marking, inductive_final_marking)

print("precision_alpha=",precision_alpha)
print("precision_inductive=",precision_inductive)


We obtain the following values:

precision_alpha= 0.10416666666666663
precision_inductive= 0.10416666666666663


In this case, the Alpha and Inductive Miner models yield the same precision value on this log.

Generalization

Generalization indicates the characteristic of a process model of not hosting components that are too specific, i.e. used only in a few executions of the process. Models that overfit the log generally have many components that are too specific.

In measuring generalization on a Petri net, the components taken into account are the transitions (both visible and hidden). In particular, the token replayer returns for each trace the list of transitions that have been activated during the replay; note that the implementation provided in pm4py is able to take hidden transitions into account. It is therefore easy to measure how many times each transition has been activated during the replay of the log.

The implemented approach is suggested in the paper:

Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst. “Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity.” International Journal of Cooperative Information Systems 23.01 (2014): 1440001.

Accordingly, generalization is computed on the Petri net from the number of times each transition is executed during the replay of the log.
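A sketch of one common formulation, following the cited paper: transitions executed only a few times contribute a 1/√(executions) penalty (plain Python; whether pm4py uses exactly this variant is an assumption, and the execution counts below are made up):

```python
import math

def generalization(transition_executions):
    """transition_executions: number of times each transition was activated
    during replay. Rarely executed transitions lower generalization
    (assumed penalty form: 1/sqrt(executions) per transition)."""
    n = len(transition_executions)
    penalty = sum(1.0 / math.sqrt(e) for e in transition_executions if e > 0)
    return 1.0 - penalty / n

# four transitions, executed 100, 100, 4 and 1 times (made-up counts)
print(round(generalization([100, 100, 4, 1]), 3))  # 0.575
```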

The following code measures the generalization of the Alpha and Inductive Miner models on the running-example.xes log:

from pm4py.evaluation.generalization import factory as generalization_factory

generalization_alpha = generalization_factory.apply(log, alpha_petri, alpha_initial_marking, alpha_final_marking)
generalization_inductive = generalization_factory.apply(log, inductive_petri, inductive_initial_marking, inductive_final_marking)

print("generalization_alpha=",generalization_alpha)
print("generalization_inductive=",generalization_inductive)


We obtain the following values:

generalization_alpha= 0.5259294594558881
generalization_inductive= 0.4158076884525792


The generalization value of the Inductive Miner model on this log is slightly lower than that of the Alpha Miner model, because of the presence of skip/loop transitions that are visited less often than visible transitions. In comparison, the Petri net constructed by the Alpha Miner contains only visible transitions.

Simplicity

A model is simple when the end user can easily understand it, i.e. when the execution paths of the model are clear. For Petri nets, the execution semantics is related to firing transitions, which removes tokens from their input places and adds tokens to their output places. A model can therefore be considered simpler when the number of transitions (the possible ways to consume/insert tokens) is low in comparison to the number of places. The approach implemented in pm4py is inspired by this idea; it has been reported in the following paper and is called ‘inverse arc degree’:

Blum, Fabian Rojas. Metrics in process discovery. Technical Report TR/DCC-2015-6, Computer Science Department, University of Chile, 2015.

The simplicity formula is based on the inverse of the average arc degree of the nodes of the Petri net.
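A plain-Python sketch of an inverse-arc-degree style measure (the exact form and constants used by pm4py are assumptions here):

```python
def simplicity_arc_degree(node_degrees, k=2):
    """Inverse-arc-degree style simplicity: when the mean arc degree of the
    nodes exceeds a reference degree k, simplicity drops below 1
    (sketch; the reference degree k=2 is an assumption)."""
    mean_degree = sum(node_degrees) / len(node_degrees)
    return 1.0 / (1.0 + max(mean_degree - k, 0))

# a net whose places/transitions have these arc degrees (made-up values)
print(simplicity_arc_degree([2, 2, 3, 3]))  # mean degree 2.5 -> 1 / 1.5
```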

The following code measures the simplicity of the Alpha and Inductive Miner models mined from the running-example.xes log:

from pm4py.evaluation.simplicity import factory as simplicity_factory

simplicity_alpha = simplicity_factory.apply(alpha_petri)
simplicity_inductive = simplicity_factory.apply(inductive_petri)

print("simplicity_alpha=",simplicity_alpha)
print("simplicity_inductive=",simplicity_inductive)


We obtain the following values:

simplicity_alpha= 0.5333333333333333
simplicity_inductive= 0.6956521739130435


The simplicity of the Inductive Miner model is higher than the simplicity provided by the Alpha Miner on this log.

Getting all measures in one line

In the previous sections, methods to calculate fitness, precision, generalization and simplicity of a process model have been provided. In this section, some code to retrieve all the measures at once is provided:

from pm4py.evaluation import factory as evaluation_factory
alpha_evaluation_result = evaluation_factory.apply(log, alpha_petri, alpha_initial_marking, alpha_final_marking)
print("alpha_evaluation_result=",alpha_evaluation_result)

inductive_evaluation_result = evaluation_factory.apply(log, inductive_petri, inductive_initial_marking, inductive_final_marking)
print("inductive_evaluation_result=",inductive_evaluation_result)


We obtain the following values:

alpha_evaluation_result= {'fitness': {'percFitTraces': 100.0, 'averageFitness': 1.0}, 'precision': 0.10416666666666663, 'generalization': 0.5259294594558881, 'simplicity': 0.5333333333333333, 'metricsAverageWeight': 0.540857364863972}
inductive_evaluation_result= {'fitness': {'percFitTraces': 100.0, 'averageFitness': 1.0}, 'precision': 0.10416666666666663, 'generalization': 0.4158076884525792, 'simplicity': 0.6956521739130435, 'metricsAverageWeight': 0.5539066322580724}


These values match those reported previously. In addition, the average of the 4 measures is provided under the key ‘metricsAverageWeight’ and summarizes the overall quality of the process model. In this case, we see that the overall quality of the model extracted by the Inductive Miner is greater than that of the model extracted by the Alpha Miner.

## Conformance checking techniques

Token-based replayer

Token-based replay matches a trace against a Petri net model, starting from the initial marking, to discover which transitions are executed and in which places there are remaining or missing tokens for the given process instance. Token-based replay is useful for Conformance Checking: indeed, a trace fits the model if, during its execution, the transitions can be fired without the need to insert any missing token. If reaching the final marking is imposed, a trace fits if it reaches the final marking without any missing or remaining tokens.

Token-based replay permits both global and local Conformance Checking. Each trace can be assigned a fitness value between 0 and 1, defined in terms of the tokens produced, consumed, missing and remaining during the replay.
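The classic per-trace formula from the token-based replay literature, in terms of produced (p), consumed (c), missing (m) and remaining (r) tokens, can be sketched as follows (plain Python; the token counts below are made up):

```python
def token_replay_fitness(p, c, m, r):
    """Per-trace token-replay fitness:
    0.5 * (1 - missing/consumed) + 0.5 * (1 - remaining/produced)."""
    return 0.5 * (1.0 - m / c) + 0.5 * (1.0 - r / p)

# perfectly fitting trace: no missing, no remaining tokens
print(token_replay_fitness(p=10, c=10, m=0, r=0))  # 1.0
# a deviating trace with 2 missing and 1 remaining token
print(round(token_replay_fitness(p=10, c=10, m=2, r=1), 2))  # 0.85
```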

In pm4py there is an implementation of a token replayer that is able to go across hidden transitions (calculating shortest paths between places) and can be used with any Petri net model with unique visible transitions and hidden transitions. When a visible transition needs to be fired and not all places in its preset contain the required number of tokens, it is checked whether, starting from the current marking, there is a sequence of hidden transitions that can be fired to enable the visible transition. The hidden transitions are then fired, reaching a marking that enables the visible transition. The following picture provides an example of the algorithm:

Visible transition TRANS can be enabled from the current marking by firing hidden transitions ht1 and ht2.

Aside from the fitness value, the replay algorithm can be configured to consider a trace completely fitting even if there are remaining tokens, as long as all visible transitions corresponding to events in the trace can be fired. Moreover, it can be configured to reach the final marking through hidden transitions: this is useful when the final marking is not reached after the last activity but can be reached by executing hidden transitions.

We provide the following example showing the application of token-based replay. The example starts as usual with the import of the running-example.xes log and the application of the Inductive Miner.

import os
from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.algo.discovery.inductive import factory as inductive_miner

log = xes_importer.import_log(os.path.join("tests", "input_data", "running-example.xes"))
net, initial_marking, final_marking = inductive_miner.apply(log)


To apply token-based replay, the following code is applied:

from pm4py.algo.conformance.tokenreplay import factory as token_replay

replay_result = token_replay.apply(log, net, initial_marking, final_marking)


The print(replay_result) command prints the results:

[{'trace_is_fit': True, 'trace_fitness': 1.0, 'activated_transitions': [register request, examine casually, check ticket, decide, reinitiate request, loop_2, examine thoroughly, check ticket, decide, pay compensation, tau_1], 'reached_marking': ['sink:1'], 'enabled_transitions_in_marking': set(), 'transitions_with_problems': []}, {'trace_is_fit': True, 'trace_fitness': 1.0, 'activated_transitions': [register request, skip_5, check ticket, loop_4, examine casually, skip_6, decide, pay compensation, tau_1], 'reached_marking': ['sink:0'], 'enabled_transitions_in_marking': set(), 'transitions_with_problems': []}, {'trace_is_fit': True, 'trace_fitness': 1.0, 'activated_transitions': [register request, examine thoroughly, check ticket, decide, reject request, tau_1], 'reached_marking': ['sink:1'], 'enabled_transitions_in_marking': set(), 'transitions_with_problems': []}, {'trace_is_fit': True, 'trace_fitness': 1.0, 'activated_transitions': [register request, examine casually, check ticket, decide, pay compensation, tau_1], 'reached_marking': ['sink:0'], 'enabled_transitions_in_marking': set(), 'transitions_with_problems': []}, {'trace_is_fit': True, 'trace_fitness': 1.0, 'activated_transitions': [register request, examine casually, check ticket, decide, reinitiate request, loop_2, skip_5, check ticket, loop_4, examine casually, skip_6, decide, reinitiate request, loop_2, examine casually, check ticket, decide, reject request, tau_1], 'reached_marking': ['sink:0'], 'enabled_transitions_in_marking': set(), 'transitions_with_problems': []}, {'trace_is_fit': True, 'trace_fitness': 1.0, 'activated_transitions': [register request, skip_5, check ticket, loop_4, examine thoroughly, skip_6, decide, reject request, tau_1], 'reached_marking': ['sink:0'], 'enabled_transitions_in_marking': set(), 'transitions_with_problems': []}]


There is one dictionary in the list for each trace; the keys provided in the dictionary for each trace are:

• trace_is_fit -> Indicates if the trace is completely fit to the model
• trace_fitness -> The fitness value calculated for the trace
• activated_transitions -> List of transitions in the model that are activated during the replay
• reached_marking -> Marking that is reached at the end of the replay
• enabled_transitions_in_marking -> Transitions that are enabled in the marking reached at the end of the replay
• transitions_with_problems -> Transitions for which problems (e.g. missing tokens) were encountered during the replay

The following code provides the overall log fitness value:

from pm4py.evaluation.replay_fitness import factory as replay_fitness_factory

log_fitness = replay_fitness_factory.evaluate(replay_result, variant="token_replay")


If we execute print(log_fitness) then the following result is obtained:

{'percFitTraces': 100.0, 'averageFitness': 1.0}


The token-based replayer can also be configured to return local conformance information about places. This is achieved through the enable_placeFitness parameter. The following code could be applied:

from pm4py.algo.conformance.tokenreplay import factory as token_replay

replay_result, place_fitness = token_replay.apply(log, net, initial_marking, final_marking, parameters={"enable_placeFitness": True})


If we do print(place_fitness), the following result is obtained:

{({'check ticket'}, {'decide'}): {'underfedTraces': set(), 'overfedTraces': set()}, ({'examine thoroughly', 'examine casually'}, {'decide'}): {'underfedTraces': set(), 'overfedTraces': set()}, ({'reinitiate request', 'register request'}, {'examine thoroughly', 'examine casually'}): {'underfedTraces': set(), 'overfedTraces': set()}, ({'decide'}, {'reinitiate request', 'pay compensation', 'reject request'}): {'underfedTraces': set(), 'overfedTraces': set()}, start: {'underfedTraces': set(), 'overfedTraces': set()}, end: {'underfedTraces': set(), 'overfedTraces': set()}, ({'reinitiate request', 'register request'}, {'check ticket'}): {'underfedTraces': set(), 'overfedTraces': set()}}


The keys of this dictionary are places, the values are dictionaries containing the set of traces for which the place is underfed and the set of traces for which the place is overfed.
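Such a dictionary can be post-processed to isolate the problematic places; a plain-Python sketch over a made-up dictionary of the same shape (the helper function itself is hypothetical, not part of pm4py):

```python
def problematic_places(place_fitness):
    """Return the places that are underfed or overfed in at least one trace."""
    return {place: info for place, info in place_fitness.items()
            if info["underfedTraces"] or info["overfedTraces"]}

# made-up place-fitness dictionary with the same shape as the pm4py output
place_fitness = {
    "p1": {"underfedTraces": set(), "overfedTraces": set()},
    "p2": {"underfedTraces": {"trace_3"}, "overfedTraces": set()},
}
print(list(problematic_places(place_fitness)))  # ['p2']
```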

Additional parameters of the token-replay algorithm, that could be passed in the parameters dictionary, are:

• consider_remaining_in_fitness -> Default True. When deciding whether a trace fits the model, remaining tokens are also taken into account (that is, the final marking must be reached exactly)
• try_to_reach_final_marking_through_hidden -> Default True. When all visible transitions corresponding to events in the trace have been fired and the final marking is not yet reached, but can be reached by firing hidden transitions, the hidden transitions are fired in order to reach the final marking
• stop_immediately_unfit -> Default False. If True, the replay stops at the first deviation encountered
• walk_through_hidden_trans -> Default True. Enables going across hidden transitions during token-based replay
• activity key (pm4py.util.constants.PARAMETER_CONSTANT_ACTIVITY_KEY) -> Must be specified if a classifier different from the activity name (concept:name) is needed

To use a different classifier, we recall the Classifiers section in the documentation of Process Discovery:

for trace in log:
    for event in trace:
        event["customClassifier"] = event["concept:name"] + event["lifecycle:transition"]


A parameters dictionary containing the activity key is constructed:

# import constants
from pm4py.util import constants
# define the activity key in the parameters
parameters = {constants.PARAMETER_CONSTANT_ACTIVITY_KEY: "customClassifier"}


Then the process model is calculated:

# calculate process model using the given classifier
net, initial_marking, final_marking = inductive_miner.apply(log, parameters=parameters)


And eventually the replay is done:

# apply token-based replay
replay_result = token_replay.apply(log, net, initial_marking, final_marking, parameters=parameters)


Alignments

Alignment-based replay aims to find one of the best alignments between a trace and the model. For each trace, the output of an alignment is a list of couples where the first element is an event of the trace (or ») and the second element is a transition of the model (or »). Each couple can be classified as follows:

• Sync move: the activity of the event corresponds to the transition label; in this case, both the trace and the model advance in the same way during the replay.
• Move on log: for couples where the second element is », the move in the trace is not mimicked in the model. This kind of move is unfit and signals a deviation between the trace and the model.
• Move on model: for couples where the first element is », the move in the model is not mimicked in the trace. For moves on model, the following distinction can be made:
• Moves on model involving hidden transitions: in this case, even if it is not a sync move, the move is fit.
• Moves on model not involving hidden transitions: in this case, the move is unfit and signals a deviation between the trace and the model.
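The classification can be sketched in plain Python, representing the skip symbol as ">>" and hidden-transition labels as None, matching the alignment output shown later (the helper function itself is hypothetical, not part of pm4py):

```python
SKIP = ">>"

def classify_move(log_part, model_part):
    """Classify one alignment couple (event or >>, transition label or >>)."""
    if log_part != SKIP and model_part != SKIP:
        return "sync move"
    if model_part == SKIP:
        return "move on log"  # unfit: the model cannot mimic the event
    if log_part == SKIP and model_part is None:
        return "move on model (hidden transition)"  # fit
    return "move on model"  # unfit: a visible transition has no event

# a made-up alignment fragment
alignment = [("register request", "register request"), (">>", None),
             ("check ticket", ">>")]
print([classify_move(a, b) for a, b in alignment])
```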

The following code implements an example for obtaining alignments. First, the running-example.xes log is loaded and the Inductive Miner is applied:

import os
from pm4py.objects.log.importer.xes import factory as xes_importer
from pm4py.algo.discovery.inductive import factory as inductive_miner

log = xes_importer.import_log(os.path.join("tests", "input_data", "running-example.xes"))

net, initial_marking, final_marking = inductive_miner.apply(log)


And the alignments can be obtained by this piece of code:

import pm4py
from pm4py.algo.conformance.alignments import factory as align_factory

alignments = align_factory.apply_log(log, net, initial_marking, final_marking)


If we execute print(alignments) we get the following output:

[{'alignment': [('register request', 'register request'), ('examine casually', 'examine casually'), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('decide', 'decide'), ('reinitiate request', 'reinitiate request'), ('>>', None), ('>>', None), ('examine thoroughly', 'examine thoroughly'), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('decide', 'decide'), ('pay compensation', 'pay compensation'), ('>>', None)], 'cost': 7, 'visited_states': 18, 'queued_states': 50, 'traversed_arcs': 100, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('check ticket', 'check ticket'), ('>>', None), ('examine casually', 'examine casually'), ('>>', None), ('decide', 'decide'), ('pay compensation', 'pay compensation'), ('>>', None)], 'cost': 3, 'visited_states': 9, 'queued_states': 26, 'traversed_arcs': 45, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('examine thoroughly', 'examine thoroughly'), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('decide', 'decide'), ('reject request', 'reject request'), ('>>', None)], 'cost': 3, 'visited_states': 9, 'queued_states': 26, 'traversed_arcs': 45, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('examine casually', 'examine casually'), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('decide', 'decide'), ('pay compensation', 'pay compensation'), ('>>', None)], 'cost': 3, 'visited_states': 9, 'queued_states': 26, 'traversed_arcs': 45, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('examine casually', 'examine casually'), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('decide', 'decide'), ('reinitiate request', 'reinitiate request'), ('>>', None), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('examine casually', 'examine casually'), ('>>', None), ('decide', 'decide'), ('reinitiate request', 'reinitiate request'), ('>>', None), ('>>', None), 
('examine casually', 'examine casually'), ('>>', None), ('check ticket', 'check ticket'), ('>>', None), ('decide', 'decide'), ('reject request', 'reject request'), ('>>', None)], 'cost': 11, 'visited_states': 29, 'queued_states': 75, 'traversed_arcs': 157, 'fitness': 1.0}, {'alignment': [('register request', 'register request'), ('check ticket', 'check ticket'), ('>>', None), ('examine thoroughly', 'examine thoroughly'), ('>>', None), ('decide', 'decide'), ('reject request', 'reject request'), ('>>', None)], 'cost': 3, 'visited_states': 9, 'queued_states': 26, 'traversed_arcs': 45, 'fitness': 1.0}]


This list reports, for each trace, the corresponding alignment along with its statistics. Each trace is associated with a dictionary containing, among others, the following information:

• alignment: contains the alignment (sync moves, moves on log, moves on model)
• cost: contains the cost of the alignment according to the provided cost function
• fitness: is equal to 1 if the trace is perfectly fitting

To use a different classifier, we recall the Classifiers section in the documentation of Process Discovery. Indeed, the following code defines a custom classifier for each event of each trace in the log:

for trace in log:
    for event in trace:
        event["customClassifier"] = event["concept:name"] + event["lifecycle:transition"]


A parameters dictionary containing the activity key can be formed:

# import constants
from pm4py.util import constants
# define the activity key in the parameters
parameters = {constants.PARAMETER_CONSTANT_ACTIVITY_KEY: "customClassifier"}


Then the process model could be calculated:

# calculate process model using the given classifier
net, initial_marking, final_marking = inductive_miner.apply(log, parameters=parameters)


And eventually the replay is done:

alignments = align_factory.apply_log(log, net, initial_marking, final_marking, parameters=parameters)


To get the overall log fitness value, the following code can be used:

from pm4py.evaluation.replay_fitness import factory as replay_fitness_factory

log_fitness = replay_fitness_factory.evaluate(alignments, variant="alignments")


Using print(log_fitness) the following result is obtained:

{'percFitTraces': 100.0, 'averageFitness': 1.0}


The following parameters can also be provided to the alignments:

• Model cost function: associates with each transition in the Petri net the cost of a move-on-model.
• Sync cost function: associates with each visible transition in the Petri net the cost of a sync move.

Implementation of a custom model cost function and sync cost function:

model_cost_function = dict()
sync_cost_function = dict()
for t in net.transitions:
    # if the label is not None, we have a visible transition
    if t.label is not None:
        # associate cost 1000 to each move-on-model on a visible transition
        model_cost_function[t] = 1000
        # associate cost 0 to each sync move
        sync_cost_function[t] = 0
    else:
        # associate cost 1 to each move-on-model on a hidden transition
        model_cost_function[t] = 1


Insertion of the model cost function and sync cost function in the parameters:

parameters[pm4py.algo.conformance.alignments.versions.state_equation_a_star.PARAM_MODEL_COST_FUNCTION] = model_cost_function
parameters[pm4py.algo.conformance.alignments.versions.state_equation_a_star.PARAM_SYNC_COST_FUNCTION] = sync_cost_function


The replay is then performed again, with the custom cost functions:

alignments = align_factory.apply_log(log, net, initial_marking, final_marking, parameters=parameters)


# Experimental Features

Discovery

Frequency/Performance analysis on Petri nets:

• By token-based replay
• By projecting DFG graphs

Visualization and decoration of entities:

• Possibility to get a DFG graph, decorated by frequency/performance, in several formats (PNG, PDF, SVG)
• Possibility to get a Petri net, decorated by frequency/performance, in several formats (PNG, PDF, SVG)
• Possibility to get a Transition System in several formats (PNG, PDF, SVG)

Limiting/Filtering/Sampling entities:

• Possibility to read a given number of cases from the XES file
• Possibility to read a given number of rows from the CSV file
• Filtering on timeframe (both logs and Pandas dataframes)
• Filtering on case performance (both logs and Pandas dataframes)
• Filtering on start activities and end activities (both logs and Pandas dataframes)
• Filtering on attributes values (both logs and Pandas dataframes)
• Filtering on variants (both logs and Pandas dataframes)
• Filtering on paths (both logs and Pandas dataframes)
• Auto filtering on start activities, end activities, activities, variants (for logs and Pandas dataframes) and paths (only for logs)
• Possibility to convert timestamp columns in the CSV only after a filtering
• Possibility to sample the event logs
• Noise removal from DFG graphs

Case management:

• Possibility to retrieve variants along with their count, from logs and Pandas dataframes
• Possibility to retrieve cases ordered by start timestamp, completion timestamp, and duration, from logs and Pandas dataframes
• Possibility to retrieve the events belonging to a case with a specified case ID
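
As a conceptual sketch of variant retrieval (pure Python, not the pm4py API): a variant is the sequence of activity names of a case, and the variants of a log can be counted by grouping cases on that sequence:

```python
from collections import Counter

# a toy log: each case is a list of events, each event a dict with an activity name
log = [
    [{"concept:name": "register request"}, {"concept:name": "decide"}],
    [{"concept:name": "register request"}, {"concept:name": "decide"}],
    [{"concept:name": "register request"}, {"concept:name": "check ticket"},
     {"concept:name": "decide"}],
]

# a variant is the comma-separated sequence of activities of a case;
# counting identical sequences yields the variants along with their count
variants = Counter(
    ",".join(event["concept:name"] for event in case) for case in log
)
print(variants)
```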

Other:

• Petri net generation
• Log playout on a Petri net