Yi Lin Sim Manager • almost 3 years ago
Barclays
Explore Airfoil Self Noise Dataset
- Import Libraries/Dataset
- Download the dataset
- Import the required libraries
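A minimal sketch of the loading step, assuming the dataset has been downloaded as a tab-separated text file with no header row. The column names below are assumed labels chosen for readability, `load_airfoil` is a hypothetical helper, and the two sample rows are made up only to show the call. In a notebook, pandas would normally do this in one line.

```python
import csv
import io

# Assumed column labels; the raw file is expected to have no header row.
COLUMNS = [
    "frequency_hz",
    "angle_of_attack_deg",
    "chord_length_m",
    "free_stream_velocity_mps",
    "displacement_thickness_m",
    "sound_pressure_level_db",  # target
]

def load_airfoil(source):
    """Read tab-separated numeric rows from an open text file into a list of dicts."""
    rows = []
    for fields in csv.reader(source, delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        values = [float(v) for v in fields]
        if len(values) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
        rows.append(dict(zip(COLUMNS, values)))
    return rows

# Two made-up rows in the same layout, used only to demonstrate the call.
sample = io.StringIO(
    "1000\t0\t0.3\t70.0\t0.002\t125.0\n"
    "2000\t4\t0.3\t70.0\t0.003\t120.5\n"
)
data = load_airfoil(sample)
for row in data[:2]:  # two-row sanity check
    print(row)
```

With a real download, the same function would be called as `load_airfoil(open(path))`, where `path` points to the local copy of the file.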
1. Data Visualization and Exploration
- Print 2 rows as a sanity check to identify all the features present in the dataset and to confirm that the target matches them.
- Comment on class imbalance with appropriate visualization method.
- Provide appropriate visualizations to get an insight about the dataset.
- Perform a correlation analysis on the dataset and provide a visualization for it. Will this correlation analysis affect the feature selection that you will perform in the next step? Justify your answer. Answers without justification will not be awarded marks.
- Any other visualization specific to the problem statement.
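The correlation step above can be sketched as a plain Pearson correlation matrix. This is an illustrative version under stated assumptions: the helper names `pearson` and `correlation_matrix` are hypothetical, and the three short columns are made-up values, not rows from the dataset. A heatmap of the resulting matrix is one common way to visualize it.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant column has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up values, chosen so the expected correlations are easy to check by hand.
cols = {
    "frequency": [1.0, 2.0, 3.0, 4.0],
    "velocity": [2.0, 4.0, 6.0, 8.0],
    "sound_level": [8.0, 6.0, 4.0, 2.0],
}
corr = correlation_matrix(cols)
for name, row in corr.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

When reading such a matrix for feature selection, two things are usually worth checking: how strongly each feature correlates with the target, and whether any pair of features is strongly correlated with each other, since that makes linear regression coefficients less stable.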
2. Data Pre-processing and cleaning
- Pre-process the data appropriately: identify NULL or missing values if any, handle outliers if present in the dataset, deal with skewed data, etc.
- Mention the pre-processing steps performed in the markdown cell. Explore a few latest data balancing tasks and their effect on model evaluation parameters.
- Apply appropriate feature engineering techniques to the features. Apply feature transformation techniques such as Standardization, Normalization, etc. You can apply the appropriate transformations depending on your dataset’s structure and complexity. Provide proper justification. Techniques used without justification will not be awarded marks. Explore a few techniques for identifying feature importance for your feature engineering task.
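Two of the pre-processing steps above, outlier detection and standardization, can be sketched as follows. This is an illustrative version, not a prescribed method: the 1.5 × IQR rule is one common convention for flagging outliers, the helper names are hypothetical, and all values are made up. The scaling statistics are computed from the training values only and then reused on the test values, so that no information from the test set leaks into the transformation.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def fit_standardizer(train_values):
    """Compute the mean and population standard deviation of the training values."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)

def standardize(values, mean, std):
    """Apply z-score scaling with previously fitted statistics."""
    if std == 0:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Made-up values for illustration only.
outliers = iqr_outliers([1, 2, 3, 4, 5, 100])
print("flagged as outliers:", outliers)

train = [2.0, 4.0, 6.0]
test = [8.0]
mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
test_z = standardize(test, mean, std)  # reuses the training statistics
print(train_z, test_z)
```

scikit-learn's `StandardScaler` follows the same fit-on-train, transform-on-test pattern.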
3. Model Building
- Split the dataset into training and test sets. Answer without justification will not be awarded marks.
Case 1: Train = 80%, Test = 20%: [x_train1, y_train1] = 80%; [x_test1, y_test1] = 20%
Case 2: Train = 10%, Test = 90%: [x_train2, y_train2] = 10%; [x_test2, y_test2] = 90%
- Explore k-fold cross-validation.
- Build model(s) using: 1) Linear Regression (scikit-learn or other libraries can be used)
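The two split cases and the k-fold step above can be sketched with index lists. This is a minimal illustration under stated assumptions: the sample count of 100, the seed, and k = 5 are arbitrary choices for the demonstration, and the helper names are hypothetical. scikit-learn's `train_test_split` and `KFold` provide the same functionality.

```python
import random

def train_test_split_indices(n_samples, test_fraction, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

n = 100  # assumed sample count, for illustration only

train1, test1 = train_test_split_indices(n, test_fraction=0.20)  # Case 1
train2, test2 = train_test_split_indices(n, test_fraction=0.90)  # Case 2
print(len(train1), len(test1), len(train2), len(test2))

folds = list(k_fold_indices(n, k=5))
for fold_number, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train={len(train_idx)} validation={len(val_idx)}")
```

Comparing the two cases is a way to see how the amount of training data affects the fitted model: with only 10% of the rows for training, the test error is generally expected to be higher and to vary more between random splits. K-fold cross-validation reduces the dependence on any single split, because every row is used for validation exactly once.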
4. Performance Evaluation
- Do the prediction for the test data and display the results for the inference. Calculate all the evaluation metrics and choose the best for your model. Justify your answer. Answers without justification will not be awarded marks.
- List out the performance measures in a tabular format. (Accuracy, F1-score, Efficiency, sensitivity or specificity)
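Because a linear regression model predicts a continuous value, its predictions are usually scored with regression metrics. The sketch below computes MAE, MSE, RMSE and R² and prints them as a small table. The helper name `regression_metrics` is hypothetical and the true and predicted values are made up for illustration. scikit-learn offers the same measures as `mean_absolute_error`, `mean_squared_error` and `r2_score`.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}

# Made-up values for illustration only.
y_true = [120.0, 125.0, 130.0, 135.0]
y_pred = [121.0, 124.0, 131.0, 133.0]
metrics = regression_metrics(y_true, y_pred)

print(f"{'Metric':<8}{'Value':>10}")
for name, value in metrics.items():
    print(f"{name:<8}{value:>10.3f}")
```

Running the same function on the predictions from both split cases gives a side-by-side table for comparing them.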
------------------------------------------------------
Dataset: Airfoil Self Noise Dataset
UCI Machine Learning Repository: Airfoil Self-Noise Data Set
Instructions for Assignment Evaluation
1. Please follow the naming convention as _.ipynb
E.g., for group 1 working on this dataset, your notebook should be named Group1_Airfoil_Self_noise.ipynb.
2. Inside each jupyter notebook, you are required to mention your name, Group details and the Assignment dataset you will be working on.
3. Organize your code in separate sections for each task. Add comments to make the code readable.
4. Deep Learning models are strictly not allowed. You are encouraged to learn classical Machine Learning techniques and experience their behavior. If needed, you may use a Deep Learning model only to compare its output with that of a classical model.
5. Notebooks without output shall not be considered for evaluation.
6. Delete unnecessary error messages and long outputs.
7. Display the analysis of attributes in one frame rather than one after another. However, attributes that need special treatment can be displayed separately.
8. Prepare a jupyter notebook (recommended – Azure ML) to build, train and evaluate a Machine Learning model on the given dataset. Please read the instructions carefully.
9. Only two files should be uploaded, without zipping them: the ipynb file and the HTML output of the ipynb file. No other files should be uploaded.