CMPF104: Data Cleaning and Preprocessing: Data Science and Data Analytics: Programming For Foundation In Engineering, Assignment, UNITEN, Malaysia
| University | Universiti Tenaga Nasional (UNITEN) |
| Subject | CMPF104: Programming For Foundation In Engineering |
Data Science and Data Analytics
Download the dataset from BRIGHTEN. If your student ID ends with an odd digit, select the Concrete_Data_A dataset; if it ends with an even digit, select the Concrete_Data_B dataset. Use Python attributes, functions and libraries to solve the following problems.
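The odd/even rule can be expressed as a small helper that picks the dataset name from the last digit of a student ID. The function name `dataset_for` is hypothetical. The sketch assumes the ID ends in a digit.

```python
def dataset_for(student_id: str) -> str:
    """Return the dataset name: odd last digit -> Concrete_Data_A, even -> Concrete_Data_B."""
    last_digit = int(student_id.strip()[-1])  # assumes the ID ends in a digit
    return "Concrete_Data_A" if last_digit % 2 == 1 else "Concrete_Data_B"
```

For example, an ID ending in 7 selects Concrete_Data_A, and one ending in 0 selects Concrete_Data_B.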
a) Data Cleaning and Preprocessing:
- Use Pandas to load the dataset. Name the DataFrame concrete_df_XXX.
- Remove the ‘Number’ column using the .drop() function and display the first ten (10) rows of the data.
- Handle any missing values by dropping or replacing the empty cells. Check for missing values using functions such as .info() or .isnull().sum().
- Convert the DataFrame to a NumPy array using the .to_numpy() function.
- Divide the data into two sets with an 80%/20% split for training and test data, respectively. Name the datasets train_data_XXX and test_data_XXX.
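The cleaning steps above can be sketched as follows. This is a minimal sketch, not a reference solution. It makes three assumptions:

- The BRIGHTEN file is a CSV.
- Every column other than ‘Number’ is numeric.
- Empty cells are filled with the column mean. Dropping those rows with .dropna() is the other option the brief allows.

The helper names `load_concrete`, `clean_concrete` and `split_80_20` are hypothetical. Shuffling the rows with a fixed seed before the 80/20 split is a design choice, not a requirement of the brief.

```python
import numpy as np
import pandas as pd


def load_concrete(path):
    """Load the dataset into a DataFrame (assumes a CSV file; use pd.read_excel for .xlsx)."""
    return pd.read_csv(path)


def clean_concrete(df):
    """Drop the 'Number' column, show the first ten rows and fill empty cells."""
    cleaned = df.drop(columns=["Number"])
    print(cleaned.head(10))          # first ten (10) rows
    cleaned.info()                   # dtypes and non-null counts per column
    print(cleaned.isnull().sum())    # number of missing values per column
    # Replace empty cells with the column mean; use cleaned.dropna() instead to drop those rows.
    return cleaned.fillna(cleaned.mean(numeric_only=True))


def split_80_20(data, seed=0):
    """Split array rows into 80% train and 20% test; rows are shuffled unless seed is None."""
    if seed is not None:
        data = np.random.default_rng(seed).permutation(data)  # shuffles rows (axis 0)
    cut = int(0.8 * len(data))
    return data[:cut], data[cut:]


# Example usage (the file name is an assumption; XXX is the placeholder from the brief):
# concrete_df_XXX = clean_concrete(load_concrete("Concrete_Data_A.csv"))
# train_data_XXX, test_data_XXX = split_80_20(concrete_df_XXX.to_numpy())
```

Passing `seed=None` keeps the original row order, so the first 80% of rows become the training set.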
b) Data Analysis:
- Calculate the correlation between the variables in the DataFrame.
- Use NumPy and Pandas to calculate summary statistics of the data, such as the maximum, minimum, standard deviation, mean, median and mode of each category.
- Use Pandas functions such as .describe() for an overview of the summary statistics, and apply NumPy functions for specific calculations.
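One possible way to combine Pandas and NumPy for part b) is sketched below. It assumes the input is the cleaned, all-numeric DataFrame from part a). The names `summarize` and `analyse` are hypothetical helpers.

Two details are worth noting:

- NumPy has no built-in mode function, so the mode is taken from pandas `Series.mode()`.
- `ddof=1` makes the NumPy standard deviation match the sample standard deviation reported by `.describe()`.

```python
import numpy as np
import pandas as pd


def summarize(df):
    """Max, min, std, mean, median and mode of each numeric column, computed with NumPy."""
    stats = {}
    for col in df.select_dtypes(include="number").columns:
        values = df[col].to_numpy(dtype=float)
        stats[col] = {
            "max": np.max(values),
            "min": np.min(values),
            "std": np.std(values, ddof=1),   # ddof=1: sample std, same as pandas .std()
            "mean": np.mean(values),
            "median": np.median(values),
            # NumPy has no mode function; pandas .mode() returns all modes, keep the first.
            "mode": df[col].mode().iloc[0],
        }
    return pd.DataFrame(stats).T   # one row per column, one column per statistic


def analyse(df):
    """Correlation matrix, pandas overview and NumPy summary for a cleaned DataFrame."""
    corr = df.corr(numeric_only=True)   # pairwise Pearson correlation between columns
    overview = df.describe()            # count, mean, std, min, quartiles, max
    return corr, overview, summarize(df)
```

For example, `corr, overview, stats = analyse(concrete_df_XXX)` returns three tables: the correlation matrix, the `.describe()` output and a per-column summary.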
c) Visualization:
- Use Matplotlib to create visualizations such as line plots of the train and test data across all categories.
- Generate histograms and box plots for all variables.
- Ensure that the visualizations are clear, informative and aesthetically pleasing.
- Customize your plots by adding titles, labels and legends.
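The plots for part c) might be built along these lines. The sketch makes two assumptions. First, `train` and `test` are the NumPy arrays from the 80/20 split. Second, `names` lists the DataFrame's column names in the same order. The function names are hypothetical.

In the line plots, each test-data line starts where the training rows end on the x-axis. This lets both sets be compared on one subplot per category.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_train_test_lines(train, test, names):
    """One subplot per column: train and test rows drawn as lines against row index."""
    fig, axes = plt.subplots(len(names), 1, figsize=(10, 2.5 * len(names)), squeeze=False)
    for i, (ax, name) in enumerate(zip(axes[:, 0], names)):
        ax.plot(np.arange(len(train)), train[:, i], label="train")
        # Offset the test index so it continues after the training rows.
        ax.plot(np.arange(len(train), len(train) + len(test)), test[:, i], label="test")
        ax.set_title(name)
        ax.set_xlabel("Row index")
        ax.set_ylabel(name)
        ax.legend()
    fig.tight_layout()
    return fig


def plot_hist_and_box(df: pd.DataFrame):
    """Histogram (top row) and box plot (bottom row) for every column of df."""
    cols = list(df.columns)
    fig, axes = plt.subplots(2, len(cols), figsize=(3 * len(cols), 6), squeeze=False)
    for j, col in enumerate(cols):
        values = df[col].dropna().to_numpy()
        axes[0, j].hist(values, bins=20, color="steelblue", edgecolor="black")
        axes[0, j].set_title(f"Histogram: {col}")
        axes[0, j].set_xlabel(col)
        axes[0, j].set_ylabel("Frequency")
        axes[1, j].boxplot(values)
        axes[1, j].set_title(f"Box plot: {col}")
        axes[1, j].set_ylabel(col)
    fig.tight_layout()
    return fig
```

To use the sketch:

1. Call `plot_train_test_lines(train_data_XXX, test_data_XXX, list(concrete_df_XXX.columns))`.
2. Call `plot_hist_and_box(concrete_df_XXX)`.
3. Call `plt.show()` to display the figures.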