>A sentiment polarity analysis of airline reviews using the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon as provided by NLTK. VADER is a rule-based sentiment analyzer: each term in its lexicon carries a valence score reflecting its positive or negative semantic orientation, and the scores for a text are combined into a single `compound` value between -1 and 1.
This airline reviews dataset contains 23,171 reviews for 497 airlines and was imported from Kaggle. It was collected by scraping the website https://www.airlinequality.com/ with the Python library Beautiful Soup. The website is a platform where travelers submit reviews and ratings of different airlines.
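To make the idea of a rule-based, lexicon-driven score concrete, here is a minimal toy sketch. The mini-lexicon and its valences are hypothetical, invented for illustration, and are not VADER's actual entries. The normalization step is modeled on the one used in the VADER reference implementation (with `alpha = 15`). The real analyzer also handles negation, intensifiers, punctuation and capitalization, which this sketch ignores.

```python
import math

# Hypothetical mini-lexicon: valences are made up for illustration
# and are NOT taken from VADER's actual lexicon.
TOY_LEXICON = {
    "good": 1.9,
    "great": 3.1,
    "bad": -2.5,
    "terrible": -2.1,
    "delayed": -1.0,
}


def normalize(score, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(text):
    """Sum the valences of known words, then normalize the total."""
    words = (w.strip(".,!?") for w in text.lower().split())
    total = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return normalize(total)


print(toy_compound("Great crew, good food."))                  # positive score
print(toy_compound("Terrible service and a delayed flight!"))  # negative score
print(toy_compound("We flew from Sofia to Amsterdam."))        # no known words
```

A sentence with no lexicon hits sums to zero and therefore normalizes to zero, which is why purely factual reviews tend to land near the middle of the scale.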
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import warnings
warnings.filterwarnings("ignore")
reviews_df = pd.read_csv('Airline_Reviews.csv')
reviews_df.head()
| | Unnamed: 0 | Airline Name | Overall_Rating | Review_Title | Review Date | Verified | Review | Aircraft | Type Of Traveller | Seat Type | Route | Date Flown | Seat Comfort | Cabin Staff Service | Food & Beverages | Ground Service | Inflight Entertainment | Wifi & Connectivity | Value For Money | Recommended |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | AB Aviation | 9 | "pretty decent airline" | 11th November 2019 | True | Moroni to Moheli. Turned out to be a pretty ... | NaN | Solo Leisure | Economy Class | Moroni to Moheli | November 2019 | 4.0 | 5.0 | 4.0 | 4.0 | NaN | NaN | 3.0 | yes |
1 | 1 | AB Aviation | 1 | "Not a good airline" | 25th June 2019 | True | Moroni to Anjouan. It is a very small airline... | E120 | Solo Leisure | Economy Class | Moroni to Anjouan | June 2019 | 2.0 | 2.0 | 1.0 | 1.0 | NaN | NaN | 2.0 | no |
2 | 2 | AB Aviation | 1 | "flight was fortunately short" | 25th June 2019 | True | Anjouan to Dzaoudzi. A very small airline an... | Embraer E120 | Solo Leisure | Economy Class | Anjouan to Dzaoudzi | June 2019 | 2.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | 2.0 | no |
3 | 3 | Adria Airways | 1 | "I will never fly again with Adria" | 28th September 2019 | False | Please do a favor yourself and do not fly wi... | NaN | Solo Leisure | Economy Class | Frankfurt to Pristina | September 2019 | 1.0 | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | no |
4 | 4 | Adria Airways | 1 | "it ruined our last days of holidays" | 24th September 2019 | True | Do not book a flight with this airline! My fr... | NaN | Couple Leisure | Economy Class | Sofia to Amsterdam via Ljubljana | September 2019 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | no |
reviews_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23171 entries, 0 to 23170
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   Unnamed: 0              23171 non-null  int64
 1   Airline Name            23171 non-null  object
 2   Overall_Rating          23171 non-null  object
 3   Review_Title            23171 non-null  object
 4   Review Date             23171 non-null  object
 5   Verified                23171 non-null  bool
 6   Review                  23171 non-null  object
 7   Aircraft                7129 non-null   object
 8   Type Of Traveller       19433 non-null  object
 9   Seat Type               22075 non-null  object
 10  Route                   19343 non-null  object
 11  Date Flown              19417 non-null  object
 12  Seat Comfort            19016 non-null  float64
 13  Cabin Staff Service     18911 non-null  float64
 14  Food & Beverages        14500 non-null  float64
 15  Ground Service          18378 non-null  float64
 16  Inflight Entertainment  10829 non-null  float64
 17  Wifi & Connectivity     5920 non-null   float64
 18  Value For Money         22105 non-null  float64
 19  Recommended             23171 non-null  object
dtypes: bool(1), float64(7), int64(1), object(11)
memory usage: 3.4+ MB
nltk.download('vader_lexicon') # Download the VADER lexicon
sia = SentimentIntensityAnalyzer()
# Initialize lists to store sentiment labels and scores
sentiment_labels = []
sentiment_scores = []
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\hp\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
# Run sentiment analysis on each review in a loop
for i in range(len(reviews_df)):
    review = reviews_df['Review'][i]
    # The compound score is VADER's normalized overall polarity in [-1, 1]
    sentiment_score = sia.polarity_scores(review)['compound']
    sentiment_scores.append(sentiment_score)
    # Determine sentiment label based on sentiment score
    if sentiment_score > 0.05:
        sentiment_labels.append('Positive')
    elif sentiment_score < -0.05:  # negative cutoff mirrors the positive one
        sentiment_labels.append('Negative')
    else:
        sentiment_labels.append('Neutral')
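The labeling rule can also be isolated into a small function with a symmetric neutral band around zero, which makes the cutoffs easy to test and reuse. This is a sketch under assumptions: the name `label_from_compound` and the `threshold` parameter are hypothetical, and the strict comparisons follow the notebook (the VADER authors' own guidance uses `>= 0.05` and `<= -0.05`).

```python
def label_from_compound(score, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label.

    Scores inside [-threshold, threshold] are treated as Neutral.
    """
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"


# The first three scores appear in the notebook's printed results;
# 0.0 and 0.03 are hypothetical values inside the neutral band.
for score in (0.9192, -0.9242, -0.1416, 0.0, 0.03):
    print(score, label_from_compound(score))
```

With a function like this, the labels could be derived from an existing score column in one step, for example `sentiment_df['Sentiment_Score'].apply(label_from_compound)`, instead of appending to a list inside the loop.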
# Create a new DataFrame with the sentiment analysis results
sentiment_df = pd.DataFrame({
    'Airline': reviews_df['Airline Name'],
    'Review': reviews_df['Review'],
    'Sentiment_Label': sentiment_labels,
    'Sentiment_Score': sentiment_scores
})
print("Sentiment Analysis Results:")
print(sentiment_df.head(10))
Sentiment Analysis Results:
         Airline                                            Review  \
0    AB Aviation  Moroni to Moheli. Turned out to be a pretty ...
1    AB Aviation  Moroni to Anjouan. It is a very small airline...
2    AB Aviation  Anjouan to Dzaoudzi. A very small airline an...
3  Adria Airways  Please do a favor yourself and do not fly wi...
4  Adria Airways  Do not book a flight with this airline! My fr...
5  Adria Airways  Had very bad experience with rerouted and ca...
6  Adria Airways  Ljubljana to Zürich. Firstly, Ljubljana airp...
7  Adria Airways  First of all, I am not complaining about a s...
8  Adria Airways  Worst Airline ever! They combined two flight...
9  Adria Airways  Ljubljana to Munich. The homebase airport of ...

  Sentiment_Label  Sentiment_Score
0        Positive           0.9192
1        Negative          -0.9242
2        Positive           0.7569
3        Negative          -0.9600
4        Negative          -0.1416
5        Negative          -0.6106
6        Negative          -0.9617
7        Negative          -0.8216
8        Negative          -0.2942
9        Positive           0.8514
plt.figure(figsize=(10,150))
g = sns.stripplot(
    data=sentiment_df,
    x="Sentiment_Score", y="Airline", hue="Sentiment_Label",
    jitter=True, dodge=False, size=3)
g.set(xlabel="", ylabel="")
plt.legend(loc='upper left', bbox_to_anchor=(1, 1))
plt.show()