IT Ticket Classification

1.   Background

Customers contact companies through multiple channels, from social media platforms and review sites to email and live chat, at any time of day and from anywhere. With the growing number of mobile users, these channels are easier than ever to reach, adding to the volume of tickets generated. An IT ticket is the general term for a record of work performed (or needing to be performed) by an organization to operate the company’s technology environment, fix issues, and resolve user requests. Tickets can be raised through the web, mobile apps, email, phone calls, or customer care centers. When an issue or support ticket lands at the IT helpdesk, it first needs to be processed and assigned a tag or category. The ticket is then routed to the agent best suited to resolve it, based on the department that matches its label. The ticket therefore needs to be labeled correctly; otherwise, time and resources are wasted before it reaches the right agent, which only adds to its resolution time. When ticket classification is a manual process, there are usually too many tickets to tag and too many classes to choose from. As a result, helpdesk personnel often end up selecting the ’Others’/’Miscellaneous’ tag, which delays the entire routing process because a lot of time is then spent reprocessing these incorrectly tagged tickets. Ticket classification is an essential task for every organization, yet it is too mundane to justify allocating manual resources to. This is one of the main reasons why automating ticket classification is so essential today.

2.   Problem Statement

The problem statement at hand is the three-tier hierarchical classification of IT tickets using natural language processing and machine learning techniques.

3.   Solution Methodology

The pipeline followed to get the required results was as follows:

A.    Exploratory Data Analysis:

Distribution of tickets across the three layers of classification:
  • Distribution across the first layer: ‘Service’
  • Distribution across the second layer: ‘Category’
  • Distribution across the third layer: ‘Sub Category’
  • Word cloud of the ticket text

B.    Pre-processing of Raw Data

The pre-processing pipeline involved several decisions about the data. Initially, a large number of tickets had been tagged under the ‘Others’ or ‘Miscellaneous’ category. After speaking with the IT team, there was more clarity on which classes should be considered going forward, and a final set of classes to carry forward was agreed upon. Feature Selection: From the raw data, only columns where at least 70% of the values were not ‘NA’ were considered. Feature selection techniques were then applied to these columns to see which of them were useful. As the independent and dependent variables were categorical in nature, the chi-square test was used to measure dependence. The ‘SelectKBest’ algorithm was applied with the chi-square score; in the resulting plot, the y-axis is the chi-square statistic, and the higher this score, the stronger the dependence of ‘Service’ on the particular variable.
  • Only ‘Department’, ‘Category’, ‘Sub-Category’, and ‘User Location’ showed a significant level of dependence on the first layer of the hierarchy: ‘Service’.
  • As ‘Category’ and ‘Sub Category’ are the subsequent layers of the hierarchy, they are expected to show a significant level of dependence. Hence, they were not considered while building the model.
The code for this technique is attached below:  
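Below is a minimal sketch of this step, assuming an illustrative dataframe named `tickets_df` whose columns match the fields described above; the dataframe and column names are assumptions, not the exact project code.

```python
# A minimal sketch of the chi-square feature-selection step described above.
# The dataframe name `tickets_df` and the exact column names are illustrative assumptions.
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

feature_cols = ['Department', 'Category', 'Sub Category', 'User Location']

# Encode the categorical features and the first-tier target as integers
X = OrdinalEncoder().fit_transform(tickets_df[feature_cols].astype(str))
y = LabelEncoder().fit_transform(tickets_df['Service'].astype(str))

# Score each feature against 'Service' using the chi-square statistic
selector = SelectKBest(score_func=chi2, k='all').fit(X, y)

# Higher score => stronger dependence between the feature and 'Service'
scores = pd.Series(selector.scores_, index=feature_cols).sort_values(ascending=False)
print(scores)
```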

C.    Embeddings of Textual Data

Word embeddings are required to use character-based text as input to a machine learning model by embedding the text into a vector space (vectorization). This converts the textual data into vectors that can then be used as features in machine learning algorithms. A variety of embeddings were explored to find the one best suited to our data. The embedding techniques tried can be broadly classified into 3 categories:
  1. Static/Basic Embeddings:
These are basic embedding techniques where words are mapped to a fixed vector representation derived after training on a large corpus. The methods tried were:
  • Tf-Idf
  • Word2Vec
  • GloVe
  2. Dynamic/Contextualized Embeddings:
Here, the vector representation of a word is dynamic and depends on the context in which the word is used in the sentence. This is accomplished using LSTM, Bi-LSTM, and attention networks. The methods tried were:
  • BERT
  • ELMo
  3. Sentence Embeddings:
These techniques return a vector representation of the whole sentence rather than of individual words, and hence are very useful where sentence-level representations are required. Under the hood, these models are trained using LSTM, GRU, attention networks, and hierarchical ConvNets. The methods tried were:
  • Facebook’s InferSent
  • Google’s Universal Sentence Encoder
The evaluation of the different embedding approaches is listed below (for evaluation, the data was split 80 : 20 into train and test sets). From the results, it is evident that ELMo and BERT outperform the static models, while InferSent and USE outperform all of them; InferSent is the best-performing embedding technique for our data.

InferSent visualizations: These visualizations give a clear idea of how well the InferSent embeddings recognize the keywords of a sentence. They show that InferSent correctly treats ‘MySQL’ and ‘Laptop’ as more important than ‘Need’, ‘on’, or ‘my’. Another key observation is that generic words such as ‘on’ and ‘my’ receive very little importance. This is where the embedding technique becomes truly robust: there is no need to follow the traditional pre-processing pipeline for textual data (tokenizing, lemmatizing, removing stop words, etc.). The textual data was passed in as-is, and the technique still performed very well.

After this, combinations of various embeddings were tried to see if an ensemble of embeddings gave better results; the final combination we arrived at was the best-performing embedding approach. The code for the functions used is attached below:
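The references at the end cite Sentence-BERT (the sentence-transformers library) as the package used in our code, so the sketch below shows how the raw ticket text could be embedded with it. The checkpoint name and the `tickets_df['Description']` column are illustrative assumptions.

```python
# A hedged sketch of generating sentence embeddings for the raw ticket text.
# The sentence-transformers library is cited in the references as used in our code;
# the checkpoint name and the dataframe/column names are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # any SBERT checkpoint works here

def embed_sentences(texts):
    """Return an (n_tickets, embedding_dim) matrix for a list of raw ticket strings.

    Note: no tokenizing, lemmatizing, or stop-word removal is applied, mirroring the
    observation above that sentence embeddings work well on the raw text.
    """
    return np.asarray(model.encode(list(texts), show_progress_bar=False))

ticket_vectors = embed_sentences(tickets_df['Description'].astype(str))
print(ticket_vectors.shape)
```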

D.    Dimensionality Reduction

After applying the embeddings, the dataset grew significantly: the number of columns exceeded the number of rows, so it was essential to try dimensionality-reduction methods on the dataset. Principal Component Analysis gave the desired results: the data was transformed into 100 columns capturing 99% of the original data’s variability.
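A minimal sketch of this reduction step is shown below; `ticket_vectors` is the embedding matrix from the previous step, and the variable names are illustrative assumptions.

```python
# Project the (wide) embedding matrix onto 100 principal components.
from sklearn.decomposition import PCA

pca = PCA(n_components=100, random_state=42)
reduced_vectors = pca.fit_transform(ticket_vectors)

# Verify how much of the original variance the 100 components retain (~99% as reported above)
print(f"Explained variance retained: {pca.explained_variance_ratio_.sum():.4f}")
print(reduced_vectors.shape)  # (n_tickets, 100)
```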

E.    Model Building

With the final data ready, several machine learning algorithms were explored. The best-performing method used a top-down approach in which the classifier for the first level of the hierarchy was built first and the subsequent layers were worked on afterwards. The target classes were converted into concatenated labels using the following methodology:
  • Tier 1: Service
  • Tier 2: Service + Category
  • Tier 3: Service + Category + Sub Category
  After conversion, simple classification models predicting tiers 1, 2, and 3 respectively were built to complete the top-down approach. The data was split 80 : 20 into train and test sets, and the evaluation metric used was the F1 score. The best model was chosen from several machine learning algorithms based on performance on the test (unseen to the model) data; the XGBoost model performed best for each tier. The results were largely consistent across the validation and test sets, which shows that the model was not overfitting and performed well even on unseen data. The test set was 20% of the original dataset (other evaluation metrics can be found in the code). The train-set scores touched 99.99%, which might hint at overfitting, but that was not the case in our scenario: the data provided was repetitive, which explains the high scores, and the consistent validation and test scores confirm it. The code for the tier-wise models is as follows:
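The following is a hedged sketch of the tier-wise training with concatenated labels and XGBoost. Dataframe and column names mirror the description above but are assumptions, hyperparameters are left at their defaults, and weighted F1 averaging is an assumption for the multi-class setting.

```python
# Top-down training: one XGBoost classifier per tier on concatenated labels.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

# Concatenated targets for the three tiers (top-down approach)
tickets_df['tier1'] = tickets_df['Service']
tickets_df['tier2'] = tickets_df['Service'] + ' > ' + tickets_df['Category']
tickets_df['tier3'] = tickets_df['tier2'] + ' > ' + tickets_df['Sub Category']

for tier in ['tier1', 'tier2', 'tier3']:
    y = LabelEncoder().fit_transform(tickets_df[tier])
    X_train, X_test, y_train, y_test = train_test_split(
        reduced_vectors, y, test_size=0.20, random_state=42)

    clf = XGBClassifier(eval_metric='mlogloss')
    clf.fit(X_train, y_train)

    score = f1_score(y_test, clf.predict(X_test), average='weighted')
    print(f"{tier}: weighted F1 on the 20% test set = {score:.3f}")
```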

F.     User Integration

After building the complete pipeline, we saved the trained models and used them to build a Flask app that takes the IT ticket, Department, and User Location as input and classifies the ticket into the 3 tiers by invoking the saved models, acting as an API for the ticket-classification service for the IT team.

Flask Integration: INPUT

Flask Integration: OUTPUT

As this example shows, the ticket raised was “Email ID deletion”, which the Flask app correctly classified into all 3 tiers; the time taken was ~10 seconds, which also makes the service very efficient in terms of time and resources.
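A minimal Flask sketch of such an API is shown below. The endpoint name, JSON fields, pickle file names, and embedding checkpoint are illustrative assumptions, and the handling of the Department and User Location inputs is omitted for brevity.

```python
# A minimal sketch of the ticket-classification API; not the exact project app.
import pickle
from flask import Flask, request, jsonify
from sentence_transformers import SentenceTransformer

app = Flask(__name__)

embedder = SentenceTransformer('all-MiniLM-L6-v2')  # assumed embedding model
with open('pca.pkl', 'rb') as f:                    # assumed artefacts saved from training
    pca = pickle.load(f)
with open('tier_models.pkl', 'rb') as f:
    tier_models = pickle.load(f)  # e.g. {'tier1': clf1, 'tier2': clf2, 'tier3': clf3}

@app.route('/classify', methods=['POST'])
def classify_ticket():
    payload = request.get_json()
    text = payload['ticket']                         # e.g. "Email ID deletion"
    vector = pca.transform(embedder.encode([text]))  # embed, then reduce to 100 dims
    # Each model returns an encoded class index; the real app would map it back to the label name
    return jsonify({tier: int(model.predict(vector)[0]) for tier, model in tier_models.items()})

if __name__ == '__main__':
    app.run(port=5000)
```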

4.   Conclusion:

This study highlights the fact that IT ticket routing/tagging, and similar tasks, can be automated with very high precision and accuracy by integrating NLP and machine learning within our systems or on the cloud. InferSent performed best among the embedding techniques, which was expected: it is a sentence-level encoder built on LSTM, GRU, and attention networks, whereas the other methods are essentially word-embedding techniques. The best classification model for each tier in our case was XGBoost, a tree-based model that handles class imbalance in the data well. Ticket classification is an essential part of ticket routing, and here are the key advantages that will help in implementing a more efficient customer care service:
  • It will save hours of manpower, especially for large B2C organizations as they have a huge volume of tickets generated each day.
  • It will largely help in taking data-backed decisions for resource allocation to optimize customer care.
  • It will significantly reduce the overhead spent on routing tickets to the wrong departments.
  • It can help detect patterns and seasonal trends and hence, estimate the load beforehand, or identify a larger underlying issue.
 

5.   Future Scope:

Ticket Classification is the first and most crucial step for Ticket Routing. Once this has been implemented, it opens up opportunities for a lot of different AI integrations for the Ticket Routing/ Customer Service pipeline. A few of these options are:
  • Automation of the entire ticket-routing process with minimal human supervision. Once a ticket is generated, it will be routed to the most appropriate agent based on criteria such as:
  1. Urgency
  2. Location
  3. Current load of the agent
  • Chatbot integration, to give customers around-the-clock response whenever they generate a ticket, even if there is no human available to address the issue at that time.
  • Ticket classification as a reporting tool:
This method can also be used to analyze trends and seasonality in the kinds of tickets generated, which can help in assigning the required personnel to the busiest departments accordingly.

6.   References:

Articles for embedding techniques:
  1. Sentence-BERT (used in our code):
     a. Paper: https://arxiv.org/abs/1908.10084
     b. Implementation: https://github.com/UKPLab/sentence-transformers
  2. Universal Sentence Encoder:
     a. Code: https://tfhub.dev/google/universal-sentence-encoder/4
     b. Paper: https://arxiv.org/abs/1803.11175
  3. Facebook’s InferSent for sentence embeddings:
     a. Code: https://github.com/facebookresearch/InferSent
     b. Understanding: https://medium.com/analytics-vidhya/sentence-embeddings-facebooks-infersent-6ac4a9fc2001
     c. Paper: https://arxiv.org/abs/1705.02364
  4. ELMo:
     a. Paper: https://arxiv.org/pdf/1802.05365.pdf
     b. Understanding and Implementation: https://www.analyticsvidhya.com/blog/2019/03/learn-to-use-elmo-to-extract-features-from-text/

 

Acknowledgements:

The authors wish to express their gratitude to Paulami Das, Head of Data Science CoE @ Brillio, and Anish Roychowdhury, Senior Analytics Leader @ Brillio, for their mentoring and guidance in shaping this study. The authors also wish to acknowledge the immense support of Soubhik Chandaa, Manager IT Operations @ Brillio, and his team for their help with data and problem understanding.

Authors:

  1. Aarushi Bansal-

Aarushi Bansal is currently a Data Scientist at Brillio with more than 4 years of industry experience. Prior to this, she worked at Infoedge (Naukri.com), India’s leading internet company. Her areas of expertise include machine learning, deep learning, NLP, recommendations, and predictive modelling, and she has had exposure across both product and services domains. Her major achievements include contributions towards lead-scoring and revenue models that yielded more than 60% growth in revenue. She holds a B.Tech in IT from IGDTUW (formerly IGIT). LinkedIn: https://www.linkedin.com/in/bansalaarushi
  2. Muskan Gupta-

Muskan Gupta is a final-year UG student at NMIMS University, pursuing Data Science Engineering at the Mukesh Patel School of Technology, Management & Engineering. She is currently interning with Brillio and worked on the IT Ticket Classification project. Her key interests lie in machine learning and NLP. She has worked on projects in these areas throughout her education and during her internship with Ernst & Young, where she worked on employee attrition and sentiment analysis. LinkedIn: https://www.linkedin.com/in/muskan-gupta-330085153
  3. Praveen Kumar-

Praveen Kumar is a final-year UG student at the Indian Institute of Technology, Kharagpur. He has worked on several projects in the machine learning and deep learning domains, such as IT ticket classification (an NLP task) at Brillio, building a real-time recommendation engine at Express Analytics, and building a bilingual Rasa chatbot using NLP at Gramophone. His key interests lie in deep learning, reinforcement learning, machine learning, NLP, and image recognition. LinkedIn: https://www.linkedin.com/in/praveen-kumar-79aaba134

