Product Sales Analysis using SQL

Analyzing and visualizing data to make better informed decision.
I did some self-reflection, and I realized that the part of my job I enjoy the most is getting insights from analyzing the performances of employees where I work. I’ve been doing this for a couple of years using Excel. So I’d say I am proficient in using Excel/Google Sheets to collect data, analyze it, get insights and present it to management in form of a report. It’s something I always look forward to. Hence, I’ve decided to be intentional about my journey into becoming a Data Analyst. In that spirit, I’ve been studying materials, following professionals in the field, and taking some courses.
To help me keep track of my progress, I’ll be using this platform to record the things I practice, and my first project will be working on a product sales analysis using SQL. By the way, I’m new to SQL, so this was an exciting challenge.
Overview
The data set I used for this project can be found here. And it contains three years worth of sales data of different product lines across different countries. The aim of this project is to understand the overall sales performance of this company(let’s call it Moveit Limited).
Problem Statement
Moveit Ltd. has been in operations for a little over 2 years and they want to understand the following:
- Across all their Product Line, which one is the best selling?
- What was the best year for sales?
- What was the best month for sales?
- Which country sold the most product?
- Who is the best customer?
Methodology
Below is a breakdown of the steps taken to execute this project. Microsoft SQL Server Management Studio is what I used for this analysis

Data Cleaning
Looking at the entire data set, I could see that there were a lot of duplicates and missing values. The duplicates were expected because of the nature of the business. For instance, the column for ‘status’, ‘country’, ‘year_id’ had lots of duplicates. You can see it in the image below.

What I decided to do in the case was to identify the unique value for some of the columns by using the distinct function in SQL. To be able to give answers to the problems stated above, I figured that it would be more relevant to find unique values for only a few columns.
From doing this, I discovered that the sales status was categorized into either “Resolved, On Hold, Cancelled, Shipped, Disputed & in Progress.” Data was collected over 3 years of operations, that is 2003, 2004 & 2005. The company had 7 different products, operated in 19 different countries and had a total of 92 customers. There were no missing values in the columns of interest.
Data Exploration/Analysis
I decided to start answering the questions stated in the problem statement above.
Across all their Product Line, which one is the best selling?: For me to be able to do this, I had to group the sales by product line using the syntax below:

This revealed that of the 7 Products the company deals with, “Classic Cars” was the one that had the most sales and “Trains” had the least sales.
What was the best year for sales? : Following the previous method, sales was grouped by the year. This revealed that 2004 had the most sales. But there was also an interesting discovery here. The data also revealed 2005 as the year with the least sales and it begged the question, ‘What could be the reason behind the extremely low sales in 2005?’
Further data exploration was done by digging beyond the years to find out how much sales happened monthly for each of the years. This is where it was discovered that unlike 2003 and 2004 that made sales in all 12 months of the year, sales data for 2005 was only captured for the first 5 months of the year. This confirm the reason behind the low sales rate in that year.
What was the best month for sales? : To find out the month that had the highest sales, I grouped the total sales and number of order by the month. We’ve already established that the data collected was for a period of three years. So I had to identify the best sales month for each year.

The discovery here was that November was the month that had the highest sales in both years that the company operated all year round. Remember that the last year, 2005, was not considered since we only had data for 5 months out of the 12 months of the year.
Which country sold the most product? Using the query syntax below, the data revealed that United States of America sold the most product out of the 19 other countries.

Who is the best customer? The company had over 90 customers across all regions and from a business perspective, it is important to know those customers who have consistently purchased from the company. The outcome of this can potentially improve marketing strategy of the company. Therefore, it made sense to write a query that segmented the customers into different categories. This was done using the RFM analysis to group customers based on the recency, frequency and monetary total of their recent transactions to identify the best customers.
From this analysis, we could see how frequent customers purchased from the company, how recent their purchase was and the monetary value of their purchase.
For more clarity on the customers, I went further to write a case statement, basically segmenting the customers into different categories. i.e. ‘new customers’, ‘new customers’, ‘potential churners’, ‘active’, ‘loyal’, and ‘slipping away, cannot lose’. Doing this gave a better insight into the customers which I believe can be valuable to the sales and marketing team at the company.
To answer this question, the customers that fell in the category ‘loyal’ can be considered the best customers.
Data Visualization
For better understanding of the data analysis, I made use of Tableau to create a user-friendly interactive dashboard. This dashboard is a brief visual summary of the report on this data.
Conclusion
Thank you for staying with me till this point. This was my first try using SQL for data analysis. Looking at the dataset again, I believe there’s a lot more exploration that can be done and more insight that can be gotten which I intend to do to hone my data analytics skills. However, here an insight from this analysis.
In both years when the company operated full through out the year, the month of November had the highest sales. This may be due to the approaching holiday season. Studies have found that customers are more likely to make purchases towards the end of the year.
References: All the references used were hyperlinked within the article. The query codes can be found here.