CMPF104: Data Cleaning and Preprocessing, Data Science and Data Analytics: Programming For Foundation In Engineering Assignment, UNITEN, Malaysia
| Item | Details |
|---|---|
| University | Universiti Tenaga Nasional (UNITEN) |
| Subject | CMPF104: Programming For Foundation In Engineering |
Data Science and Data Analytics
Download the dataset from BRIGHTEN. If your student ID ends with an odd number, select the Concrete_Data_A dataset; if it ends with an even number, select the Concrete_Data_B dataset. Use Python attributes, functions and libraries to solve the following problems.
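The dataset choice depends only on whether the last digit of the student ID is odd or even. A minimal sketch, assuming the ID is a string that ends in a digit; the function name `choose_dataset` and the example IDs are hypothetical.

```python
def choose_dataset(student_id: str) -> str:
    """Return the dataset name for a student ID (hypothetical helper).

    Odd last digit -> Concrete_Data_A, even last digit -> Concrete_Data_B.
    """
    last_char = student_id.strip()[-1]
    if not last_char.isdigit():
        raise ValueError(f"Student ID must end with a digit: {student_id!r}")
    return "Concrete_Data_A" if int(last_char) % 2 == 1 else "Concrete_Data_B"


# Example IDs are made up for illustration.
print(choose_dataset("SW01081234"))  # Concrete_Data_B (ends in 4, even)
print(choose_dataset("SW01081237"))  # Concrete_Data_A (ends in 7, odd)
```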
a) Data Cleaning and Preprocessing:
- Use Pandas to load the dataset. Name the DataFrame concrete_df_XXX.
- Remove the ‘Number’ column using the .drop() function and display the first ten (10) rows of the data.
- Check for missing values using functions such as .info() or .isnull().sum(), then handle them by dropping or replacing the empty cells.
- Convert the DataFrame to an array using the .to_numpy() function.
- Divide the data into two sets, 80% for training and 20% for testing. Name the datasets train_data_XXX and test_data_XXX, respectively.
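The part (a) steps can be chained in one short script. This is a sketch, not a model answer: the stand-in DataFrame, its column names (`Cement`, `Water`, `Age`, `Strength`) and the file name in the commented `pd.read_csv` line are assumptions, since the real BRIGHTEN files are not reproduced here. Missing cells are filled with column means (`dropna()` is the other option the brief allows), and rows are shuffled with a fixed seed before the 80/20 split, which is a design choice rather than a requirement of the brief.

```python
import numpy as np
import pandas as pd

# With the real file you would load it like this (file name is an assumption):
# concrete_df_XXX = pd.read_csv("Concrete_Data_A.csv")
# Hypothetical stand-in data so the sketch runs on its own; column names are assumed.
concrete_df_XXX = pd.DataFrame({
    "Number": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Cement": [540.0, 540.0, 332.5, np.nan, 198.6, 266.0, 380.0, 380.0, 266.0, 475.0],
    "Water": [162.0, 162.0, 228.0, 228.0, 192.0, 228.0, np.nan, 228.0, 228.0, 228.0],
    "Age": [28, 28, 270, 365, 360, 90, 365, 28, 28, 28],
    "Strength": [79.99, 61.89, 40.27, 41.05, 44.30, 47.03, 43.70, 36.45, 45.85, 39.29],
})

# Remove the 'Number' column and show the first ten rows.
concrete_df_XXX = concrete_df_XXX.drop(columns=["Number"])
print(concrete_df_XXX.head(10))

# Inspect column types and count missing values per column.
concrete_df_XXX.info()
print(concrete_df_XXX.isnull().sum())

# Replace empty cells with each column's mean.
concrete_df_XXX = concrete_df_XXX.fillna(concrete_df_XXX.mean())

# Convert the cleaned DataFrame to a NumPy array.
data_XXX = concrete_df_XXX.to_numpy()

# Shuffle rows reproducibly, then take the first 80% for training.
rng = np.random.default_rng(seed=42)
shuffled = rng.permutation(data_XXX)  # shuffles along axis 0 (rows)
split_index = int(0.8 * len(shuffled))
train_data_XXX = shuffled[:split_index]
test_data_XXX = shuffled[split_index:]
print(train_data_XXX.shape, test_data_XXX.shape)  # (8, 4) (2, 4)
```

If scikit-learn is permitted, `train_test_split(data_XXX, test_size=0.2, random_state=42)` from `sklearn.model_selection` performs an equivalent shuffled split.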
b) Data Analysis:
- Calculate the correlation between the variables in the dataframe.
- Use NumPy and Pandas to calculate summary statistics of the data, such as the maximum, minimum, standard deviation, average, median and mode of each category (column).
- Use Pandas functions such as .describe() for an overview of the summary statistics, and apply NumPy functions for specific calculations.
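One way to produce the part (b) numbers, again on a small hypothetical stand-in for the cleaned `concrete_df_XXX` (column names assumed). `DataFrame.corr()` computes Pearson correlation by default, and NumPy supplies the maximum, minimum, standard deviation, mean and median column by column (`axis=0`). NumPy has no mode function, so the sketch borrows pandas' `DataFrame.mode()` and keeps the first value when several values tie.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned concrete_df_XXX (column names assumed).
concrete_df_XXX = pd.DataFrame({
    "Cement": [540.0, 540.0, 332.5, 332.5, 198.6],
    "Water": [162.0, 162.0, 228.0, 228.0, 192.0],
    "Age": [28, 28, 270, 365, 28],
    "Strength": [79.99, 61.89, 40.27, 41.05, 44.30],
})

# Pairwise Pearson correlation between all columns.
correlation = concrete_df_XXX.corr()
print(correlation.round(2))

# Overview: count, mean, std, min, quartiles and max per column.
print(concrete_df_XXX.describe())

# Specific statistics per column with NumPy (axis=0 works down each column).
values = concrete_df_XXX.to_numpy()
stats = pd.DataFrame({
    "max": np.max(values, axis=0),
    "min": np.min(values, axis=0),
    "std": np.std(values, axis=0, ddof=1),  # sample std, matching describe()
    "mean": np.mean(values, axis=0),
    "median": np.median(values, axis=0),
}, index=concrete_df_XXX.columns)

# Mode via pandas; .mode() can return several rows when values tie.
stats["mode"] = concrete_df_XXX.mode().iloc[0]
print(stats)
```

`ddof=1` is set because NumPy's `np.std` defaults to the population formula (`ddof=0`), while pandas' `.std()` and `.describe()` report the sample standard deviation.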
c) Visualization:
- Use Matplotlib to create visualizations such as line plots of the train and test data across all categories.
- Generate histograms and box plots for all variables.
- Ensure that the visualizations are clear, informative and aesthetically pleasing.
- Customize your plots by adding titles, axis labels and legends.
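A sketch of the part (c) plots with Matplotlib. The column names and the random stand-in arrays are hypothetical placeholders for the `train_data_XXX` and `test_data_XXX` arrays from part (a). Each variable gets its own panel, because variables measured in different units rarely share a useful axis range, and the line plots use the row index as the x-axis (an assumption; the brief does not say what the x-axis should be).

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical column names and stand-in arrays; replace them with your own
# train_data_XXX / test_data_XXX from part (a).
columns = ["Cement", "Water", "Age", "Strength"]
rng = np.random.default_rng(seed=0)
train_data_XXX = rng.uniform(0, 100, size=(80, len(columns)))
test_data_XXX = rng.uniform(0, 100, size=(20, len(columns)))

# Line plots: one panel per variable, train and test overlaid with a legend.
fig_line, axes = plt.subplots(len(columns), 1, figsize=(10, 2.5 * len(columns)))
for i, (ax, name) in enumerate(zip(axes, columns)):
    ax.plot(train_data_XXX[:, i], label="Train", color="tab:blue")
    ax.plot(test_data_XXX[:, i], label="Test", color="tab:orange")
    ax.set_title(f"{name}: train vs test")
    ax.set_xlabel("Sample index")
    ax.set_ylabel(name)
    ax.legend()
fig_line.tight_layout()

# Histograms of every variable, using train and test rows combined.
all_data = np.vstack([train_data_XXX, test_data_XXX])
fig_hist, axes = plt.subplots(1, len(columns), figsize=(4 * len(columns), 3.5))
for i, (ax, name) in enumerate(zip(axes, columns)):
    ax.hist(all_data[:, i], bins=15, color="tab:green", edgecolor="black")
    ax.set_title(f"Histogram of {name}")
    ax.set_xlabel(name)
    ax.set_ylabel("Frequency")
fig_hist.tight_layout()

# Box plots: one panel per variable so each keeps its own y-axis scale.
fig_box, axes = plt.subplots(1, len(columns), figsize=(4 * len(columns), 3.5))
for i, (ax, name) in enumerate(zip(axes, columns)):
    ax.boxplot(all_data[:, i])
    ax.set_title(f"Box plot of {name}")
    ax.set_xticks([1])
    ax.set_xticklabels([name])
    ax.set_ylabel(name)
fig_box.tight_layout()

plt.show()
```

Tick labels are set with `set_xticks`/`set_xticklabels` rather than the `labels` argument of `boxplot`, which newer Matplotlib releases have renamed.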