How many False Positives? What do these numbers represent? What are the potential costs to the business if we were to make these mistakes in practice? How many False Negatives? What do these numbers represent? What are the potential costs to the business if we were to make these mistakes in practice? Which prediction mistakes do you consider to be more costly?

Find the CSV files and the sample code in the provided Drive link: https://drive.google.com/drive/folders/1hRh-yI6DL0…

QUESTION DETAILS:

Gain valuable experience interpreting and critically evaluating various classification performance measures. This assignment is set up to force you to think critically about the different performance indicators and what they mean in various business domains. You are not expected to be an expert in any of the business contexts presented, but you ARE expected to show evidence that you are thinking deeply about what you would do IF you were.

There are 3 parts to this assignment – you will follow the same steps for each part. For each part you will create a Jupyter Notebook (3 in total) that includes the following…

High Level Outline
Brief background section: What is the problem? Why is it important? Who are the key stakeholders?
Data Section: note that you don’t need to do much with this section. The data has already been cleaned up a bit. There are no missing values and categorical encoding is done. There may be some outliers but you don’t need to worry about those for this problem set – that is NOT our focus right now.
You DO need to do a train/test split and you may want to normalize the numerical variables if you notice significant scaling issues.
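As a rough illustration of the split-and-normalize step, here is a minimal sketch using scikit-learn. The synthetic data from `make_classification` is only a stand-in for the assignment CSVs, and the 30% test size and `random_state` values are assumptions, not requirements.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an assignment CSV (assumption: numeric features, binary target).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# Hold out 30% for testing; stratify keeps the class ratio similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits,
# so nothing from the test set leaks into preprocessing.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Scaling matters most for kNN, because distances are dominated by whichever feature has the largest numeric range.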
Classification Modeling:
kNN
Find the best “k” value to move forward with – loop through a list of odd numbers, e.g. [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21], and choose the “k” that produces the most accurate model
Train your final kNN model using the “k” from the step above.
Create a table of performance measures that vary over a range of possible probability thresholds. Each row will correspond to a probability threshold. Columns should include the following: TN, TP, FN, FP, Precision, Recall, F1, and Accuracy
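The search for “k” described above could be sketched as follows. The assignment does not say which data the accuracy should be measured on; this sketch assumes 5-fold cross-validation on the training split, which is one reasonable choice rather than the prescribed one. The synthetic data is again a stand-in for the assignment CSVs.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an assignment CSV (assumption).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

k_values = list(range(1, 22, 2))  # odd values 1, 3, ..., 21
cv_accuracy = {}
for k in k_values:
    # Scaling lives inside the pipeline so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_accuracy[k] = scores.mean()

# max() returns the first best entry, so ties go to the smallest k.
best_k = max(cv_accuracy, key=cv_accuracy.get)

# Train the final kNN model with the chosen k and check it on the held-out split.
final_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final_knn.fit(X_train, y_train)
test_accuracy = final_knn.score(X_test, y_test)
```

Odd values of “k” avoid tied votes in a two-class problem, which is likely why the suggested list contains only odd numbers.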
Logistic Model
Train a Logistic model
Create a table of performance measures as described above – but for your Logistic Model
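One possible way to build the table of performance measures is sketched below for the Logistic model; the same helper works for kNN, since both models expose `predict_proba`. The helper name `threshold_table`, the threshold grid, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

def threshold_table(y_true, proba_pos, thresholds):
    """Hypothetical helper: one row of performance measures per threshold."""
    rows = []
    for t in thresholds:
        # Predict the positive class when its probability reaches the threshold.
        y_pred = (proba_pos >= t).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        # Precision and recall are undefined with a zero denominator; report 0.0 here.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        accuracy = (tp + tn) / (tn + fp + fn + tp)
        rows.append({"Threshold": t, "TN": tn, "TP": tp, "FN": fn, "FP": fp,
                     "Precision": precision, "Recall": recall,
                     "F1": f1, "Accuracy": accuracy})
    return pd.DataFrame(rows)

# Synthetic stand-in for an assignment CSV (assumption).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba_pos = logit.predict_proba(X_test)[:, 1]  # probability of class 1

thresholds = np.round(np.linspace(0.1, 0.9, 9), 2)
table = threshold_table(y_test, proba_pos, thresholds)
print(table)
```

Raising the threshold can only remove positive predictions, so FP never increases and FN never decreases as you read down the table.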
Pick a “winning” model
Based on the various performance measures, decide which of the two modeling frameworks to move forward with. Reminder – there may not be an obvious “right” or “wrong” choice here. The important thing is that you make a choice and do your best to justify it.
Careful evaluation of winning model performance measures: Using the previously created table of performance measures, do the following…
Pick a relatively low probability threshold and then discuss the potential business ramifications of the corresponding performance measures. Specific questions you should try to answer…
How many False Positives? What do these numbers represent? What are the potential costs to the business if we were to make these mistakes in practice?
How many False Negatives? What do these numbers represent? What are the potential costs to the business if we were to make these mistakes in practice?
Which prediction mistakes do you consider to be more costly?
Choose 2 more probability thresholds and repeat the evaluation steps above.
Based on your careful consideration of probability threshold options and the corresponding speculated risks/costs – which probability threshold do you recommend going forward with?
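To make the cost discussion concrete, one option is to attach a speculative cost to each kind of mistake and compare thresholds by total cost. The sketch below uses a tiny hand-made set of labels and probabilities purely for illustration; the cost figures and the helper names `total_cost` and `cheapest_threshold` are hypothetical, and real costs would come from your reasoning about the business context.

```python
import numpy as np

def total_cost(y_true, proba_pos, threshold, cost_fp, cost_fn):
    """Hypothetical helper: total cost of the mistakes made at one threshold."""
    y_pred = proba_pos >= threshold
    fp = int(np.sum(y_pred & (y_true == 0)))   # predicted positive, actually negative
    fn = int(np.sum(~y_pred & (y_true == 1)))  # predicted negative, actually positive
    return fp * cost_fp + fn * cost_fn

def cheapest_threshold(y_true, proba_pos, thresholds, cost_fp, cost_fn):
    """Return the lowest-cost threshold together with the cost at every threshold."""
    costs = {t: total_cost(y_true, proba_pos, t, cost_fp, cost_fn) for t in thresholds}
    return min(costs, key=costs.get), costs

# Illustrative toy data only: six actual negatives, four actual positives.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
proba_pos = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.65, 0.30, 0.55, 0.75, 0.95])
thresholds = [0.2, 0.5, 0.8]

# Case A: a False Negative is assumed to cost 5x a False Positive.
best_a, costs_a = cheapest_threshold(y_true, proba_pos, thresholds, cost_fp=1, cost_fn=5)
print(best_a, costs_a)  # 0.2 {0.2: 4, 0.5: 6, 0.8: 15}

# Case B: the cost ratio is reversed.
best_b, costs_b = cheapest_threshold(y_true, proba_pos, thresholds, cost_fp=5, cost_fn=1)
print(best_b, costs_b)  # 0.8 {0.2: 20, 0.5: 6, 0.8: 3}
```

The same predictions lead to opposite recommendations once the cost ratio flips, which is why the choice of threshold depends on which mistake you judge to be more costly.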