1. Outline
In this project, we will continue from our Project 1 where we implemented a malicious credit card transaction detection system. But instead of
implementing the features (which we completed in Project 1), we will now
focus on data analysis and visualisation skills to better present what our datasets contain. For this project, you will be given a dataset (CreditCard_2024_Project2.csv) that contains credit card transactions already labelled as normal or malicious. Your task is to perform the following steps (more details in the Tasks section):
• Data analysis
• Data visualisation
• Write data analysis and visualisation report
• (bonus) use machine learning to implement detection
Note 1: This is an individual project, so please refrain from sharing your code or files with others. However, you can have high-level discussions with others about the syntax of the formula or the use of modules, using other examples.
Please note that if it is discovered that you have submitted work that is not your own, you may face penalties. It is also important to keep in mind that ChatGPT and other similar tools are limited in their ability to generate
outputs, and it is easy to detect if you use their outputs without understanding the underlying principles. The main goal of this project is to demonstrate your understanding of programming principles and how they can be applied in practical contexts.
Note 2: You do not necessarily have to complete Project 1 to do this project, as it focuses on data analysis and visualisation of the datasets you are given.
2. Tasks
To begin, define a function main(filename, filter_value, type_of_card) that reads the dataset, stores the transaction records in data, and calls the functions below to display the appropriate results.
Sample Input:
main('CreditCard_2024_Project2.csv', 'Port Lincoln', 'ANZ')
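The overall flow described above can be sketched as follows. This is only a minimal outline under stated assumptions: load_data is a hypothetical helper name, the file is assumed to be a comma-separated file with a header row, and task1 is left as a placeholder because its questions are specified separately. Returning the results from main (in addition to printing them) is also an assumption made here for convenience.

```python
import csv


def load_data(filename):
    """Read the CSV file into a list of rows (header row first).

    Assumption: a plain comma-separated file; the real column layout
    is defined by the dataset, not by this sketch.
    """
    with open(filename, newline="") as f:
        return list(csv.reader(f))


def task1(data, filter_value, type_of_card):
    # Placeholder only: the real function answers the five NumPy questions.
    return []


def main(filename, filter_value, type_of_card):
    # Read the dataset once and keep all transaction records in `data`.
    data = load_data(filename)
    # Pass the same records to each task function and display the results.
    results = task1(data, filter_value, type_of_card)
    print(results)
    return results


# Example call (requires the dataset file to be present):
# main('CreditCard_2024_Project2.csv', 'Port Lincoln', 'ANZ')
```

Reading the file once in main and passing data to each task function avoids re-reading the CSV for every task.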
Task 1: Data Analysis using NumPy
Mark: 15
Answer the following five NumPy-related tasks for data analysis. These will require the use of NumPy functions and methods, matrix manipulations, vectorized computations, NumPy statistics, the NumPy where function, etc. To
complete this task, write a function called task1(data, filter_value, type_of_card), where
data contains all records from the dataset, filter_value is an area name, and type_of_card is the
name of the card provider. The function should return a list containing the values that answer the following questions.
Return all results rounded to two decimal places.
Input:
cos_dist, var, median, corr, pca = task1(data, 'Port Lincoln', 'ANZ')
Output:
[0.06, 1337142.45, [5.75, 7.21], -0.06, [0.73, 0.81, 0.7, 0.93, 0.72, …]]
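As a rough illustration of the NumPy techniques named for this task (boolean filtering, the where function, statistics, and rounding to two decimal places), the sketch below uses small invented arrays rather than the real dataset. The names round2, areas and amounts are hypothetical, and using the population variance (ddof=0) is an assumption, since the text here does not state which variance is expected.

```python
import numpy as np


def round2(value):
    """Round a scalar or array to two decimal places and return plain Python types."""
    return np.round(np.asarray(value, dtype=float), 2).tolist()


# Toy stand-ins for two dataset columns (values invented for illustration).
areas = np.array(["Port Lincoln", "Perth", "Port Lincoln", "Perth"])
amounts = np.array([10.0, 20.0, 30.0, 45.5])

# Boolean mask: keep only the rows that match the requested area.
mask = areas == "Port Lincoln"
selected = amounts[mask]

# NumPy statistics on the filtered values.
variance = round2(np.var(selected))    # population variance (ddof=0), an assumption
median = round2(np.median(selected))

# Vectorised conditional with np.where: 1 where the amount exceeds 25, else 0.
flagged = np.where(amounts > 25.0, 1, 0)

# Collect the rounded values in a list, mirroring the return shape of task1.
results = [variance, median]
print(results)
```

Converting with tolist() keeps the returned list free of NumPy scalar types, so nested results such as a pair of medians come out as ordinary Python lists of floats.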