CMPF104: Data Cleaning and Preprocessing: Data Science and Data Analytics: Programming For Foundation In Engineering, Assignment, UNITEN, Malaysia
| University | Universiti Tenaga Nasional (UNITEN) |
| Subject | CMPF104: Programming For Foundation In Engineering |
Data Science and Data Analytics
Download the dataset from BRIGHTEN. If your student ID ends with an odd number, select the Concrete_Data_A dataset; if it ends with an even number, select the Concrete_Data_B dataset. Use Python attributes, functions and libraries to solve the following problems.
a) Data Cleaning and Preprocessing:
- Use Pandas to load the dataset. Name the dataframe as concrete_df_XXX.
- Remove ‘Number’ column using .drop() function and visualize the first ten (10) rows of the data.
- Handle any missing values by dropping or replacing the empty cells. Check for missing values using functions like .info() or .isnull().sum()
- Convert the dataframe to an array using the to_numpy() function.
- Split the data into train and test sets of 80% and 20%, respectively. Name the sets train_data_XXX and test_data_XXX.
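The steps in part (a) could be sketched roughly as follows. This is only an illustration under stated assumptions: the small synthetic dataframe stands in for the BRIGHTEN file, and the column names other than ‘Number’ (Cement, Water, Strength), the commented-out file name and the choice of mean imputation are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded file. With the real data the load
# might look like: concrete_df_XXX = pd.read_csv("Concrete_Data_A.csv")
# (file name and format are assumptions).
rng = np.random.default_rng(0)
concrete_df_XXX = pd.DataFrame({
    "Number": np.arange(1, 21),
    "Cement": rng.uniform(100, 500, 20),    # assumed column name
    "Water": rng.uniform(120, 250, 20),     # assumed column name
    "Strength": rng.uniform(10, 80, 20),    # assumed column name
})
concrete_df_XXX.loc[3, "Water"] = np.nan    # inject one missing value for the demo

# Remove the 'Number' column and show the first ten rows
concrete_df_XXX = concrete_df_XXX.drop(columns=["Number"])
print(concrete_df_XXX.head(10))

# Count missing values per column, then replace them with the column mean
# (concrete_df_XXX.dropna() would drop the affected rows instead)
print(concrete_df_XXX.isnull().sum())
concrete_df_XXX = concrete_df_XXX.fillna(concrete_df_XXX.mean())

# Convert the dataframe to a NumPy array
data_array = concrete_df_XXX.to_numpy()

# 80% / 20% split in row order
split_index = int(0.8 * len(data_array))
train_data_XXX = data_array[:split_index]
test_data_XXX = data_array[split_index:]
print(train_data_XXX.shape, test_data_XXX.shape)
```

The split here keeps the original row order. If a shuffled split is preferred, scikit-learn's train_test_split is one alternative, although the brief does not ask for it.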
b) Data Analysis:
- Calculate the correlation between the variables in the dataframe.
- Utilize NumPy and Pandas to calculate summary statistics of the data such as maximum, minimum, standard deviation, average, median and mode of each category.
- Use Pandas functions like .describe() for an overview of summary statistics and apply NumPy functions for specific calculations.
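One possible way to approach part (b) is sketched below. The five-row dataframe and its column names are made up for illustration and are not taken from the assignment dataset. The sample standard deviation (ddof=1) is used so that the NumPy result agrees with .describe().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; replace with the cleaned concrete_df_XXX
df = pd.DataFrame({
    "Cement": [540.0, 332.5, 332.5, 198.6, 266.0],
    "Water": [162.0, 228.0, 228.0, 192.0, 228.0],
    "Strength": [79.99, 40.27, 41.05, 44.30, 45.85],
})

# Pairwise (Pearson) correlation between all numeric columns
correlation = df.corr()
print(correlation)

# Overview of count, mean, std, min, quartiles and max
overview = df.describe()
print(overview)

# Specific statistics computed column-wise with NumPy
values = df.to_numpy()
summary = pd.DataFrame({
    "max": np.max(values, axis=0),
    "min": np.min(values, axis=0),
    "std": np.std(values, axis=0, ddof=1),   # sample standard deviation
    "mean": np.mean(values, axis=0),
    "median": np.median(values, axis=0),
}, index=df.columns)

# NumPy has no mode function, so Pandas is used here. When a column has
# several modes, .mode() returns them sorted and .iloc[0] keeps the smallest.
summary["mode"] = df.mode().iloc[0]
print(summary)
```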
c) Visualization:
- Use Matplotlib to create visualizations such as line plots for train and test data across all categories.
- Generate histogram plots and box plots for all variables.
- Ensure that the visualizations are clear, informative, and aesthetically pleasing.
- Customize your plots by adding titles, labels and legends.
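The plots in part (c) might be laid out along the following lines. This is a minimal sketch, not a required layout: the random data, the column names and the figure sizes are assumptions, and the train and test arrays are stand-ins for train_data_XXX and test_data_XXX from part (a).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for the arrays produced in part (a)
rng = np.random.default_rng(0)
columns = ["Cement", "Water", "Strength"]   # assumed column names
data = rng.uniform(0, 100, size=(50, 3))
train_data_XXX, test_data_XXX = data[:40], data[40:]

# Line plots: one subplot per variable, train and test on a shared index axis
fig_lines, axes = plt.subplots(len(columns), 1, figsize=(8, 8), sharex=True)
test_index = np.arange(len(train_data_XXX), len(data))
for i, (ax, name) in enumerate(zip(axes, columns)):
    ax.plot(train_data_XXX[:, i], label="Train")
    ax.plot(test_index, test_data_XXX[:, i], label="Test")
    ax.set_title(f"{name}: train and test data")
    ax.set_ylabel(name)
    ax.legend()
axes[-1].set_xlabel("Sample index")
fig_lines.tight_layout()

# Histograms: one subplot per variable
fig_hist, hist_axes = plt.subplots(1, len(columns), figsize=(12, 4))
for i, (ax, name) in enumerate(zip(hist_axes, columns)):
    ax.hist(data[:, i], bins=10, edgecolor="black")
    ax.set_title(f"Histogram of {name}")
    ax.set_xlabel(name)
    ax.set_ylabel("Frequency")
fig_hist.tight_layout()

# Box plots: one box per column of the 2D array
fig_box, box_ax = plt.subplots(figsize=(8, 4))
box_ax.boxplot(data)
box_ax.set_xticks(range(1, len(columns) + 1))
box_ax.set_xticklabels(columns)
box_ax.set_title("Box plots of all variables")
box_ax.set_ylabel("Value")

# To view the results, call fig_lines.savefig("lines.png") (file name is
# illustrative) or remove the Agg line above and call plt.show().
```

If the variables have very different scales, separate box plots per variable may be easier to read than a single shared axis.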